Data Scientist · ML Engineer · Agentic AI
Years Experience
Cost Savings Delivered
Records Forecasted
GPA at ASU
About
I began my career in 2013 at Infosys, progressing from Software Engineer to Data Scientist over six years — building production ML models and data systems for BNSF Railway, one of North America's largest freight networks.
After relocating to the United States, I invested 1,200+ hours in self-directed upskilling in Generative AI, LLMs, RAG architectures, and cloud-native ML, building hands-on expertise in PyTorch, AWS, and Docker before enrolling at ASU.
I'm completing an M.S. in Data Science (Computing & Decision Analytics) at Arizona State University with a perfect 4.0 GPA. I was also a Gold Medalist during my undergraduate studies at Anna University.
Most recently, I served as a Data Scientist Intern at the NM Department of IT — architecting a first-of-its-kind GenAI conversational analytics platform using LangGraph + RAG, and deploying ensemble anomaly detection models achieving 85% threat detection accuracy at statewide scale.
Gym sessions and outdoor activities.
Painting, sewing, and intricate craftwork.
Books that expand perspective beyond the tech world.
Capabilities
End-to-end agentic system design — multi-step reasoning pipelines, RAG over financial and operational data, vector stores, semantic search, structured outputs, and production LLM deployment in regulated environments.
Classification, regression, anomaly detection, time-series forecasting, demand planning, and ensemble methods in high-volume production environments.
End-to-end ML lifecycle: experiment tracking, model versioning, automated pipelines, and cloud-native deployment at scale.
Distributed ETL frameworks, real-time pipelines, SQL optimization, and data quality monitoring — 5TB+ daily at 99.9% uptime.
Statistical modeling, hypothesis testing, executive dashboards, and data-driven storytelling for cross-functional stakeholders.
Cloud-native ML deployment, serverless architectures, container orchestration, and CI/CD pipelines across the AWS ecosystem.
Career
Data Scientist Intern
Jun 2025 – Aug 2025 · State Government · Albuquerque, NM
GenAI Conversational Analytics Platform
Architected a GenAI platform using LangGraph + RAG on AWS Lambda & S3 to process 10K+ cybersecurity incidents, reducing intelligence retrieval time by 70% (hours → seconds).
Ensemble Anomaly Detection
Deployed an XGBoost + Isolation Forest ensemble achieving 85% threat detection accuracy with 60% fewer false positives, integrated via FastAPI microservices.
Real-Time ETL & Data Quality
Built Airflow-orchestrated ETL pipelines with automated validation, improving data quality by 40% and cutting failure rate by 30%.
Technology Analyst — Data & Analytics
Jan 2017 – Jan 2019 · Freight Network · North America
ML-Powered Fraud Detection
Built an XGBoost + Random Forest fraud detection system processing 1M+ daily transactions, delivering $1M+ annual savings and earning a formal CFO commendation.
SHAP-Based Model Explainability
Engineered a SHAP explainability layer for high-stakes financial decisions, reducing stakeholder review cycles by 30% and ensuring regulatory transparency.
A/B Testing & Model Iteration
Applied causal inference & A/B testing across 5+ model iterations, improving F1 score by 18% and reducing false negatives in production.
Data Scientist
Jun 2015 – Jan 2017 · Predictive Maintenance & Scale
Predictive Maintenance for Locomotives
Designed time-series predictive maintenance models for 500+ locomotives, reducing unplanned downtime by 25% and saving an estimated $400K annually.
Distributed ETL at Scale
Orchestrated PySpark + Hadoop ETL handling 5TB+ daily ingestion at 99.9% uptime, powering Tableau & Power BI dashboards for 10+ stakeholders.
MLflow & ML Lifecycle
Implemented MLflow experiment tracking and versioning, cutting deployment time by 35% and standardizing the ML lifecycle across a 6-person team.
Software Engineer
Oct 2013 – May 2015 · Data Engineering & Pipelines
Production Data Pipelines
Engineered ML data pipelines processing 500K+ daily records, maintaining 99%+ data completeness across sprint cycles.
Automated Data Validation
Built automated validation and quality monitoring frameworks, reducing data defect rates by 45% across 3 production pipelines.
Self-Directed Research — Generative AI & Data Science
Jan 2019 – May 2024 · United States
Structured Upskilling in GenAI & Cloud ML
Completed 1,200+ hours of study in GenAI, LLMs, RAG architectures, and cloud-native ML, gaining hands-on expertise in PyTorch, AWS, and Docker ahead of M.S. enrollment at ASU.
Certifications & Personal Projects
Earned ML Specialization, Deep Learning Specialization, and Gen AI with AWS certifications. Built multiple projects across computer vision, NLP, and RAG systems.
Portfolio
Automated sanctions screening system checking entities against a U.S. government blacklist of 18,700+ entries using keyword, semantic, and knowledge graph search. All data processed locally — deployed via FastAPI with a bulk-screening dashboard and tamper-proof audit log.
End-to-end ML forecasting + GenAI reasoning system for service parts demand planning. Powered by the M5 Forecasting Dataset (46M+ real Walmart records) remapped to a semiconductor equipment supply chain schema covering NPI launches, reliability signals, field operations, and service campaigns. A GenAI reasoning layer (LangChain + RAG) enables plain-English stakeholder queries over forecast outputs.
CV pipeline that processes 500+ hours of footage to detect safety-critical defects. Real-time Streamlit dashboard with defect-type filtering and exportable reports.
Deep learning pipeline fusing RGB & thermal imagery via Vision Transformers with attention heatmap explainability.
Time-aware fraud pipeline on 300K+ real-time transactions with leakage-safe temporal modeling and SHAP interpretability.
Education
Arizona State University
Computing & Decision Analytics
Aug 2024 – May 2026 (Expected)
GPA: 4.0 / 4.0
Anna University
Information Technology
2009 – 2013
GPA: 3.6 / 4.0 · Gold Medalist