Data Scientist with 2+ years of experience in ML pipelines, causal inference, and LLM applications, including custom PyTorch transformers, agentic LLM pipelines, and production FastAPI deployments on GCP. Currently pursuing an MS in Business Analytics at USC Marshall.
# Custom transformer + MLOps layer
class RageQuitTransformer:
    # 22 behavioral tokens:
    # KILL, DEATH, DEATH_STREAK,
    # ACTION_DROUGHT, XP_FALLING_BEHIND
    encoder = TransformerEncoder(
        layers=4, heads=4, dim=128
    )  # 849K parameters
    # game-time positional encoding
    pos = GameTimeEncoding(max_min=90)

mlflow: v1 → Production
kafka: match-events · 4 partitions
s3: 4-tier data lake · 10MB
psi: 5 features · >0.2 = FAIL
amp: 2.47x speedup · 32ms/step

AUC-PR:  0.269  # primary metric
AUC-ROC: 0.928
Recall:  0.454
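The PSI drift gate above can be sketched as follows. This is a minimal illustration, not the project's actual monitoring code: the quantile binning scheme and the function names are assumptions; only the "5 features, fail if PSI > 0.2" rule comes from the card.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline and a live sample.

    Buckets are the baseline's quantiles; PSI sums
    (actual% - expected%) * ln(actual% / expected%) over buckets.
    """
    # internal cut points from the baseline distribution
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))[1:-1]

    def pct(x):
        # bucket index 0..bins-1 for each value, then bucket shares
        idx = np.searchsorted(edges, x, side="right")
        shares = np.bincount(idx, minlength=bins) / len(x)
        return np.clip(shares, 1e-6, None)  # avoid log(0)

    e, a = pct(expected), pct(actual)
    return float(np.sum((a - e) * np.log(a / e)))

def drift_check(baseline, live, threshold=0.2):
    """FAIL if any monitored feature's PSI exceeds the threshold."""
    scores = {f: psi(baseline[f], live[f]) for f in baseline}
    return all(s <= threshold for s in scores.values()), scores
```

A stable feature scores near 0; a distribution shifted by a couple of standard deviations blows well past the 0.2 gate.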
# 8-task Airflow DAG + NCF model
# DAG (@weekly schedule)
ingest → validate → spark_features
  → s3_upload → [train_mf || train_ncf]
  → evaluate_models → drift_check (PSI)
  → promote OR block_alert

# Two-tower NCF
model = NeuralCF(
    gmf_dim=64,
    mlp_layers=[128, 64, 32],
    dropout=0.2
)  # ~25.6M params

HR@10:   NCF +13% over MF baseline
NDCG@10: NCF +17% over MF baseline
PSI: blocks if >0.2 on 5 features
# 32M ratings · 32 tests passing
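The HR@10 and NDCG@10 numbers above can be computed as below. This is a hedged sketch assuming the common leave-one-out protocol (one held-out positive per user, ranked against candidates); the function and variable names are illustrative, not the project's.

```python
import math

def hit_rate_at_k(ranked, held_out, k=10):
    """HR@K: share of users whose held-out item appears in their top-K list."""
    hits = sum(1 for u, items in ranked.items() if held_out[u] in items[:k])
    return hits / len(ranked)

def ndcg_at_k(ranked, held_out, k=10):
    """NDCG@K with a single held-out positive per user:
    DCG = 1/log2(rank + 2) if the item is in the top-K, else 0; IDCG = 1."""
    total = 0.0
    for u, items in ranked.items():
        top = items[:k]
        if held_out[u] in top:
            total += 1.0 / math.log2(top.index(held_out[u]) + 2)
    return total / len(ranked)
```

The "+13% / +17% over MF" figures then fall out of evaluating both models' ranked lists with these two functions.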
# LangGraph agent with verification
def analyze(repo, question):
    plan = planner.understand(question)
    files = executor.read_code(plan)
    insight = analyzer.extract(files)
    report = summarizer.generate(insight)
    result = verifier.validate(report)
    # must include file_path + line_range
    # or verifier rejects and retries
    while not result.schema_valid:
        report = summarizer.retry(result)
        result = verifier.validate(report)
    return result
# eval: 33/33 (core + adversarial)
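The verifier gate in that loop can be sketched framework-free. The `file_path` + `line_range` requirement is from the card; the `claims` report shape, the feedback string, and the bounded retry budget are assumptions for illustration.

```python
def validate(report: dict) -> bool:
    """Accept only reports whose every claim cites file_path + line_range."""
    claims = report.get("claims", [])
    return bool(claims) and all(
        "file_path" in c and "line_range" in c for c in claims
    )

def generate_with_verification(summarize, max_retries=3):
    """Regenerate until the report passes validation, with bounded retries."""
    report = summarize(feedback=None)
    for _ in range(max_retries):
        if validate(report):
            return report
        # reject and retry with structured feedback
        report = summarize(feedback="missing file_path/line_range citations")
    raise ValueError("report failed verification after retries")
```

Bounding the retry loop matters in production: an LLM that never converges to the schema should fail loudly rather than spin.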
# Churn timing + causal framework
features = [
    "session_drift_7d",
    "engagement_decay_rate",
    "feature_usage_entropy",
    "days_since_peak_activity",
]
model = XGBClassifier(
    objective="binary:logistic",
    eval_metric="auc"
)

# Treatment-control evaluation
lift_7d = 2.25  # responsive cohort
# Segmented: high / medium / low
# → targeted retention actions
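The treatment-control lift behind `lift_7d = 2.25` can be sketched as a simple rate ratio. This assumes binary 7-day retention outcomes per user; the function name and cohort handling are illustrative, not the project's evaluation code.

```python
def lift(treated_outcomes, control_outcomes):
    """Ratio of treated response rate to control response rate.

    Outcomes are 0/1 flags (e.g. retained at 7 days). A lift of 2.25
    means treated users respond 2.25x as often as controls.
    """
    treated_rate = sum(treated_outcomes) / len(treated_outcomes)
    control_rate = sum(control_outcomes) / len(control_outcomes)
    return treated_rate / control_rate
```

Computing this per risk segment (high / medium / low) is what surfaces the "responsive cohort": the segment where the ratio is largest is where retention actions pay off.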
# 3-model census system
# Model 1: Arrivals (Random Forest)
features = [
    "lag_1d", "lag_7d", "lag_14d",
    "roll_7d_mean", "roll_14d_mean",
    "is_weekend", "day_of_week",
]
# shift-then-roll: no leakage
MAE: 2.1
MAPE: 9.85%

# Model 2: LOS (Ridge + log1p)
los = expm1(ridge.predict(X_los))

# Model 3: Survival hazard table
census_t1 = (census_t0
             + rf_arrivals * 0.15
             - sum(hazard_probs))

# FastAPI 6 ep · 12 tests ✓
# Docker → GCP Cloud Run · live
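The shift-then-roll features and the one-step census recursion can be sketched in plain Python. The 0.15 same-day admission fraction and the recursion come from the card; the helper names and the 7-day window arithmetic are illustrative assumptions.

```python
def lag_roll_features(series, t):
    """Leakage-free features for day t: every lag and rolling mean
    uses only values strictly before day t (shift first, then roll)."""
    past = series[:t]
    return {
        "lag_1d": past[-1],
        "lag_7d": past[-7],
        "roll_7d_mean": sum(past[-7:]) / 7,
    }

def project_census(census_t0, predicted_arrivals, hazard_probs,
                   admit_frac=0.15):
    """One-step census recursion: next census = current census
    + admitted share of predicted arrivals
    - expected discharges (sum of per-patient hazard probabilities)."""
    return census_t0 + predicted_arrivals * admit_frac - sum(hazard_probs)
```

Chaining `project_census` forward, feeding each day's output in as the next day's `census_t0`, yields a multi-day census forecast from the three models.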
Currently seeking full-time Data Science and ML roles starting May 2026. Always open to interesting conversations.