Praveen Kumar Yadav

SYSTEM ANALYSIS

Advanced Data Engineering Specialist with 2+ years of experience architecting next-generation data pipelines and ETL frameworks. Expert in Apache Spark, Databricks, and multi-cloud infrastructure (AWS, GCP). Proven track record in optimizing distributed systems, implementing real-time streaming architectures, and integrating machine learning workflows for predictive analytics. Currently engineering mission-critical data infrastructure at Aidetic Software for Myntra's platform, processing 400+ million records daily with sub-second latency and 99.99% reliability.

MISSION LOG

Data Engineer II

Aidetic Software Private Limited - Bangalore, India

FEB 2025 - PRESENT

Orchestrated complete Apache Superset migration for Myntra (v1.3.2 → v4.1.2), ensuring zero data loss across all metadata layers
Deployed comprehensive monitoring infrastructure using Grafana, Prometheus, and Flower for real-time system observability
Achieved a 20% performance improvement through configuration tuning and metadata optimization
Engineered unified auto-migration framework supporting multi-source data transfer to Databricks with intelligent SQL transpilation
Architected large-scale migration from BigQuery to Delta Lake with Z-ordering, compaction, and schema evolution strategies
Built real-time streaming pipelines using Delta Live Tables with automated data quality validation systems
Scaled infrastructure to process 400M+ daily records with low latency and SLA compliance

Data Engineer I

Johnson & Johnson - Remote, USA

NOV 2023 - FEB 2025

Designed advanced genealogy data model for complex material hierarchy analysis using graph theory algorithms
Architected scalable OLAP infrastructure using SparkSQL and PySpark for processing enterprise SAP datasets
Implemented incremental ingestion framework with efficient upsert mechanisms for real-time data synchronization
Built optimized ETL pipelines using Prophecy framework compatible with Graph Database architectures
Integrated Neo4j Spark connector to ingest 10M+ nodes/edges for high-performance hierarchical visualization
Scaled graph ingestion to 100M+ nodes/edges using advanced Cypher query optimization techniques
Delivered ML-ready data pipelines, achieving an 11% improvement in predictive accuracy through enhanced data insights

Data Engineer I

Vanguard Supply Chain Solutions - Remote, USA

JAN 2023 - NOV 2023

Developed intelligent dashboards for Distribution Centre Managers with advanced driver performance analytics
Implemented automated weekly truck log analysis using Databricks, improving time efficiency by 30%
Executed monthly Distribution Centre performance evaluations with actionable improvement recommendations
Deployed proactive alert systems for animal welfare route compliance with temperature monitoring

TECHNICAL ARSENAL

◢ PROGRAMMING

◢ BIG DATA

◢ ANALYTICS

◢ AI/ML

◢ CLOUD

◢ DEVOPS

TRAINING PROTOCOL

Bachelor of Technology

ACHIEVEMENTS