Open to full-time opportunities · May 2026

Building intelligent
systems with data & ML

Data Scientist with 2+ years of experience in ML pipelines, causal inference, and LLM applications — including custom PyTorch transformers, agentic LLM pipelines, and production FastAPI deployments on GCP. Currently pursuing an MS in Business Analytics at USC Marshall.

2+
Years Experience
Capgemini, Full-time
500K+
Records Processed
Patient ML pipelines
18%
Readmission Reduction
XGBoost, 3 departments
$4.2M
Decision Support
Causal inference, pharma trials
Yashraj Jadhav
Python · PyTorch · TensorFlow · XGBoost · MLflow · Apache Airflow · Apache Kafka · AWS S3 · GCP Cloud Run · FastAPI · Docker · LangGraph · RAG · Causal Inference · SQL · PySpark · scikit-learn · Streamlit

Projects

01 — DEEP LEARNING + MLOPS

Rage Quit Predictor

Problem: Predict player disengagement from behavioral event sequences before it happens, the same user-retention problem faced by Spotify, Uber, and Airbnb. Event ordering matters: a death streak followed by an action drought is a different signal than the same events spread across 20 minutes.
Model: Custom PyTorch transformer (849K params) with a 22-token behavioral vocabulary, game-time positional encoding, and dual embedding (discrete events + continuous features). Trained on 20,000 matches from the OpenDota API. Outperforms LogReg, XGBoost, and LSTM baselines on AUC-PR.
MLOps: Full production layer: MLflow experiment tracking + model registry (v1 → Production), 2.47x training speedup via torch.cuda.amp, AWS S3 4-tier data lake, Apache Kafka event streaming (4 partitions), PSI drift monitoring across 5 behavioral features.
Result: AUC-PR 0.269 (the primary metric, appropriate for a 0.61% positive rate), AUC-ROC 0.928, F1 0.422, Recall 0.454. Attention heatmaps confirm interpretable focus on death streaks and action droughts.
PyTorch · Transformer · MLflow · Apache Kafka · AWS S3 · PSI Drift · AMP · Streamlit · Behavioral Analytics
# Custom transformer + MLOps layer

class RageQuitTransformer:
  # 22 behavioral tokens
  # KILL, DEATH, DEATH_STREAK,
  # ACTION_DROUGHT, XP_FALLING_BEHIND

  encoder = TransformerEncoder(
    layers=4, heads=4, dim=128
  )  # 849K parameters

  # game-time positional encoding
  pos = GameTimeEncoding(max_min=90)

  # mlflow: v1 → Production
  # kafka:  match-events · 4 partitions
  # s3:     4-tier data lake · 10MB
  # psi:    5 features · >0.2 = FAIL
  # amp:    2.47x speedup · 32ms/step

  # AUC-PR:  0.269  (primary metric)
  # AUC-ROC: 0.928
  # Recall:  0.454
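The PSI drift gate sketched in the panel can be written in a few lines. A minimal illustration, assuming 10 percentile bins and the >0.2 failure threshold from the panel; the `psi` helper and the synthetic data are illustrative, not the production code:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference (training)
    sample and a live sample of one feature."""
    # bin edges come from the reference distribution
    edges = np.percentile(expected, np.linspace(0, 100, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    a_pct = np.histogram(actual, edges)[0] / len(actual)
    # small floor avoids log(0) on empty bins
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train   = rng.normal(0, 1, 10_000)
stable  = rng.normal(0, 1, 10_000)   # same distribution → low PSI
shifted = rng.normal(1, 1, 10_000)   # drifted mean → high PSI
assert psi(train, stable)  < 0.2     # passes the gate
assert psi(train, shifted) > 0.2     # FAIL per the >0.2 rule
```

Run per feature on each new batch; any feature crossing 0.2 blocks promotion.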
02 — ML ENGINEERING + RECSYS

Movie Recommender Pipeline

Problem: Build a production-grade recommendation engine covering the full ML lifecycle: distributed feature engineering → deep learning training → experiment governance → automated pipeline orchestration → drift-gated model promotion.
Model: Two-tower Neural Collaborative Filtering (GMF + MLP, ~25.6M params) in TensorFlow/Keras on MovieLens 32M (200K users, 87K movies, 99.82% sparsity). Implicit feedback with 4:1 negative sampling. Outperforms the Matrix Factorization baseline by ~13% HR@10 and ~17% NDCG@10.
Pipeline: PySpark feature engineering on 32M records (15 user/item behavioral features, temporal per-user train/val/test split, cold-start filtering). Apache Airflow 8-task DAG with BranchPythonOperator gating promotion on PSI drift across 5 features.
Infra: MLflow experiment tracking + model registry with an NDCG@10 quality gate before Production promotion. AWS S3 4-tier data lake. Docker Compose running Airflow + MLflow. 32 unit and integration tests, all passing.
PySpark · TensorFlow · Apache Airflow · MLflow · AWS S3 · PSI Drift · RecSys · Docker · NCF
# 8-task Airflow DAG + NCF model

# DAG (@weekly schedule)
ingest → validate → spark_features
  → s3_upload
    → [train_mf || train_ncf]
      → evaluate_models
        → drift_check (PSI)
          → promote OR block_alert

# Two-tower NCF
model = NeuralCF(
  gmf_dim=64,
  mlp_layers=[128,64,32],
  dropout=0.2  # ~25.6M params
)
# HR@10:   NCF +13% over MF baseline
# NDCG@10: NCF +17% over MF baseline
# PSI:     blocks if >0.2 on 5 features
# 32M ratings · 32 tests passing
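The 4:1 negative sampling step for implicit feedback can be sketched directly. A minimal illustration with NumPy only; `sample_negatives` and the toy interaction list are assumptions for the example, not the PySpark job itself:

```python
import numpy as np

def sample_negatives(positives, n_items, ratio=4, seed=0):
    """For each observed (user, item) pair, draw `ratio` items the
    user has never interacted with and label them 0 (implicit
    feedback: observed = 1, sampled unobserved = 0)."""
    rng = np.random.default_rng(seed)
    seen = {}
    for u, i in positives:
        seen.setdefault(u, set()).add(i)
    rows = [(u, i, 1) for u, i in positives]
    for u, _ in positives:
        drawn = 0
        while drawn < ratio:
            j = int(rng.integers(n_items))
            if j not in seen[u]:
                rows.append((u, j, 0))
                drawn += 1
    return rows

rows = sample_negatives([(0, 1), (0, 2), (1, 3)], n_items=100)
# 3 positives + 3 * 4 negatives = 15 training rows
```

Rejection sampling works well here because the matrix is 99.82% sparse, so almost every random item is a valid negative.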
03 — AI AGENT

AI Repo Co-Pilot

Problem: Understanding unfamiliar codebases is slow and error-prone; LLMs hallucinate file paths and line numbers.
Approach: 5-node LangGraph agent: plan, read, analyze, summarize, verify. Citations require file_path + line_range or the verifier rejects and retries. Powered by GPT-4o-mini for cost/latency efficiency.
Result: 33/33 eval tests pass, including adversarial cases (hallucinated paths, ambiguous queries, multi-file dependencies). Supports 40+ languages across any local or cloned repo.
LangGraph · GPT-4o-mini · OpenAI API · Python · AI Agent · 40+ Languages
# LangGraph agent with verification

def analyze(repo, question):
  plan    = planner.understand(question)
  files   = executor.read_code(plan)
  insight = analyzer.extract(files)
  report  = summarizer.generate(insight)
  result  = verifier.validate(report)

  # must include file_path + line_range
  # or verifier rejects and retries
  while not result.schema_valid:
    report = summarizer.retry(result)
    result = verifier.validate(report)

  return result
# eval: 33/33 (core + adversarial)
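The verifier's citation check can be sketched as a schema validation. A hedged illustration only: the `path:start-end` citation format, the regex, and `schema_valid` are assumptions for the example, not the agent's actual schema:

```python
import re

# assumed citation shape: src/agent/planner.py:10-42
CITATION = re.compile(r"(?P<path>[\w./-]+\.\w+):(?P<start>\d+)-(?P<end>\d+)")

def schema_valid(report: str, repo_files: set) -> bool:
    """A report passes only if every claim cites a file that exists
    plus a sane line range; otherwise the verifier rejects and the
    summarizer retries."""
    cites = list(CITATION.finditer(report))
    if not cites:
        return False                      # no grounding at all
    for m in cites:
        if m["path"] not in repo_files:
            return False                  # hallucinated path
        if int(m["start"]) > int(m["end"]):
            return False                  # impossible line range
    return True

files = {"src/agent/planner.py"}
assert schema_valid("see src/agent/planner.py:10-42", files)
assert not schema_valid("see src/utils/magic.py:1-5", files)
assert not schema_valid("no citation at all", files)
```

Checking citations against the real file tree is what catches the adversarial hallucinated-path cases.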
04 — LLM SYSTEM

Retention Decision Twin

Problem: CRM teams lack a fast, evidence-based way to choose retention actions for different user cohorts.
Approach: RAG pipeline: 15+ sources chunked and embedded into ChromaDB, retrieved via cosine HNSW, generated by Gemini 2.0 Flash with structured prompts.
Result: 4-config A/B evaluation (RAG vs No-RAG × Prompt v1 vs v2) scored by LLM-as-judge across 15 scenarios on relevance, faithfulness, citation quality, and actionability.
Gemini · ChromaDB · RAG · LLM-as-Judge · Streamlit
15+ Sources → Chunking (1500 char) → Gemini Embeddings (768d) → ChromaDB (HNSW index) → Top-K Retrieval → Gemini 2.0 Flash → Cited Recommendation
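The 1500-character chunking stage of the flow above can be sketched as follows. A minimal illustration; the 200-character overlap and the `chunk_text` name are assumptions for the example, not the pipeline's exact parameters:

```python
def chunk_text(doc: str, size: int = 1500, overlap: int = 200):
    """Split a source document into ~1500-character chunks with a
    small overlap so retrieval does not cut evidence mid-sentence."""
    chunks, start = [], 0
    while start < len(doc):
        chunks.append(doc[start:start + size])
        start += size - overlap
    return chunks

doc = "x" * 4000
chunks = chunk_text(doc)
assert all(len(c) <= 1500 for c in chunks)
assert chunks[0][-200:] == chunks[1][:200]  # overlapping boundary
```

Each chunk is then embedded (768d) and upserted into ChromaDB, where the HNSW index serves cosine top-K retrieval.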
05 — CAUSAL ML

Churn Timing & Notification Impact

Problem: Long-tenured users churn silently; raw activity counts miss early disengagement signals.
Approach: XGBoost churn model with drift-based features. Randomized treatment-control framework to measure causal notification impact. Uplift-based user segmentation.
Result: ~225% lift in 7-day post-notification sessions for responsive cohorts. Identified high-response personas for targeted retention.
XGBoost · Causal Inference · A/B Testing · Clustering · Uplift Analysis
# Churn timing + causal framework

features = [
  "session_drift_7d",
  "engagement_decay_rate",
  "feature_usage_entropy",
  "days_since_peak_activity",
]

model = XGBClassifier(
  objective="binary:logistic",
  eval_metric="auc"
)

# Treatment-control evaluation
lift_7d = 2.25  # responsive cohort
# Segmented: high / medium / low
# → targeted retention actions
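The treatment-control lift in the panel reduces to a simple difference of means under random assignment. A sketch on synthetic data (the Poisson session counts and the `lift` helper are illustrative assumptions, not the project's data):

```python
import numpy as np

def lift(treated_sessions, control_sessions):
    """Relative lift of mean 7-day session counts, treatment vs
    control; random assignment lets the gap read as causal."""
    t, c = np.mean(treated_sessions), np.mean(control_sessions)
    return (t - c) / c

rng = np.random.default_rng(1)
control = rng.poisson(2.0, 5_000)  # synthetic session counts
treated = rng.poisson(6.5, 5_000)  # synthetic responsive cohort
assert lift(treated, control) > 2.0
```

Segmenting users by their individual response (uplift) rather than raw churn risk is what isolates the high / medium / low personas for targeting.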
06 — PRODUCTION ML

ICU Census Forecasting System

Problem: Hospitals over- or under-staff ICUs for lack of reliable short-term bed occupancy forecasts. Built a 3-model inference system to answer: how many ICU beds will we need tomorrow?
Models: Random Forest for arrivals (400 trees, 11 lag + calendar features, 14-day temporal holdout; MAE 2.1, MAPE 9.85%); Ridge regression for LOS with a log1p transform; an empirical survival hazard table for conditional discharge probability. Census = today + arrivals − expected discharges.
Production: FastAPI REST API (6 endpoints, Pydantic v2 validation, auto-generated Swagger at /docs); 12 pytest integration tests, all passing. Containerized with Docker, deployed to GCP Cloud Run with auto-scaling and 30s health checks.
Design: No-leakage enforced: all rolling features shift by 1 day before rolling; the 14-day holdout is never seen during training. The short-stay classifier uses class_weight="balanced" for minority recall (0.69).
scikit-learn · FastAPI · Docker · GCP Cloud Run · Pydantic · pytest · Survival Analysis · Time Series
# 3-model census system

# Model 1: Arrivals (Random Forest)
features = [
  "lag_1d", "lag_7d", "lag_14d",
  "roll_7d_mean", "roll_14d_mean",
  "is_weekend", "day_of_week",
]  # shift-then-roll: no leakage
# MAE: 2.1   MAPE: 9.85%

# Model 2: LOS (Ridge + log1p)
los = expm1(ridge.predict(X_los))

# Model 3: Survival hazard table
census_t1 = (census_t0
           + rf_arrivals * 0.15
           - sum(hazard_probs))
# FastAPI 6 ep · 12 tests ✓
# Docker → GCP Cloud Run · live
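The shift-then-roll rule from the design notes can be sketched in pandas. A minimal illustration of the no-leakage pattern; `arrival_features` and the toy series are assumptions for the example, not the production feature code:

```python
import pandas as pd

def arrival_features(daily_arrivals: pd.Series) -> pd.DataFrame:
    """Lag and rolling features that only see data through yesterday:
    shift(1) happens BEFORE rolling, so day t never contributes to
    its own row."""
    shifted = daily_arrivals.shift(1)
    return pd.DataFrame({
        "lag_1d":       daily_arrivals.shift(1),
        "lag_7d":       daily_arrivals.shift(7),
        "roll_7d_mean": shifted.rolling(7).mean(),
        "is_weekend":   (daily_arrivals.index.dayofweek >= 5).astype(int),
    })

idx = pd.date_range("2024-01-01", periods=30, freq="D")
s = pd.Series(range(30), index=idx)
X = arrival_features(s)
# day 8's rolling mean averages days 1..7, excluding day 8 itself
assert X["roll_7d_mean"].iloc[7] == s.iloc[:7].mean()
```

Rolling on the already-shifted series is the whole trick: `s.rolling(7).mean().shift(1)` would give the same values, but shifting first makes the intent explicit and audit-friendly.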

Experience

JULY 2022 — AUGUST 2024
Data Scientist
Capgemini Technology Services · Pune, India
  • Led ML analysis on pharmaceutical trial data, applying subgroup analysis and causal inference to support $4.2M clinical go/no-go decisions
  • Built production scikit-learn pipelines (Random Forest, XGBoost) on 500K+ patient records, reducing readmissions by 18% across three departments
  • Translated ML outputs into Power BI dashboards spanning 1.2M+ records, cutting operational costs by 15%
  • Performed large-scale clustering on 2.5TB healthcare data, reducing misdiagnosis rates by 23%
  • Optimized SQL ingestion pipelines processing 50K+ daily records, improving latency by 35%
MARCH 2022 — JULY 2022
Data Science Intern
Capgemini Technology Services · Pune, India
  • Profiled 300K+ patient records using Python and SQL to assess feature distributions, cardinality, null rates, and class balance; findings directly shaped feature selection for readmission and clustering models
  • Conducted hypothesis testing and statistical analysis on patient safety event logs across 3 clinical units; findings informed feature prioritization for the readmission model

Skills & Tools

Languages & Tools
Python · SQL · R · Git · Docker · FastAPI · Streamlit · Tableau · Power BI
🧠
ML & Modeling
PyTorch · TensorFlow · XGBoost · Random Forest · scikit-learn · Survival Analysis · Causal Inference · A/B Testing
🔧
MLOps & Infrastructure
MLflow · Apache Airflow (familiar) · Apache Kafka · AWS S3 · GCP Cloud Run · PySpark (familiar) · PSI Drift Monitoring · Model Registry · Pydantic · pytest
🤖
LLM & AI
LangGraph · RAG · Gemini API · OpenAI API (GPT-4o-mini) · ChromaDB · Prompt Engineering · LLM Evaluation

Academic Background

MS in Business Analytics
USC Marshall School of Business
STEM · Los Angeles, CA · May 2026
Machine Learning · LLMs · Statistics · Product Development
BE in Electronics & Telecom
Savitribai Phule Pune University
Pimpri-Chinchwad College of Engineering · April 2022
IEEE Publication: ML-based Ration Dispensing System

Let's connect

Currently seeking full-time Data Science and ML roles starting May 2026. Always open to interesting conversations.