
PRAVEEN KUMAR YADAV

[ DATA ENGINEER II ]

Processing 400M+ Records Daily...

SYSTEM ANALYSIS

Advanced Data Engineering Specialist with 2+ years of experience architecting data pipelines and ETL frameworks. Expert in Apache Spark, Databricks, and multi-cloud infrastructure (AWS, GCP), with a proven track record of optimizing distributed systems, implementing real-time streaming architectures, and integrating machine learning workflows for predictive analytics. Currently building mission-critical data infrastructure at Aidetic Software for Myntra's platform, processing 400M+ records daily with sub-second latency and 99.99% reliability.

TECHNICAL ARSENAL

◢ PROGRAMMING

Python | SQL (Advanced) | C/C++ | Java | Cypher | PySpark

◢ BIG DATA

Hadoop | Databricks | Apache Spark | Prophecy | Neo4j | Cosmos DB

◢ ANALYTICS

Power BI | Grafana | Prometheus | Apache Superset

◢ AI/ML

NumPy | Pandas | Matplotlib | Seaborn

◢ CLOUD

AWS | GCP | BigQuery | Pub/Sub | Docker

◢ DEVOPS

Jira | Confluence | Bitbucket | FastAPI

MISSION LOG

Data Engineer II

Aidetic Software Private Limited - Bangalore, India

FEB 2025 - PRESENT
  • Led a complete Apache Superset migration for Myntra (v1.3.2 → v4.1.2) with zero data loss across all metadata layers
  • Deployed monitoring infrastructure using Grafana, Prometheus, and Flower for real-time system observability
  • Improved performance by 20% through configuration tuning and metadata optimization
  • Engineered a unified auto-migration framework that transfers data from multiple sources to Databricks with automated SQL transpilation
  • Architected a large-scale migration from BigQuery to Delta Lake using Z-ordering, compaction, and schema evolution strategies
  • Built real-time streaming pipelines with Delta Live Tables, including automated data quality validation
  • Scaled infrastructure to process 400M+ records daily with low latency while meeting SLAs

Data Engineer I

Johnson & Johnson - Remote, USA

NOV 2023 - FEB 2025
  • Designed a genealogy data model for analyzing complex material hierarchies using graph algorithms
  • Architected scalable OLAP infrastructure with Spark SQL and PySpark to process enterprise SAP datasets
  • Implemented an incremental ingestion framework with efficient upserts for real-time data synchronization
  • Built optimized ETL pipelines in Prophecy, designed for compatibility with graph database architectures
  • Integrated the Neo4j Spark connector to ingest 10M+ nodes and edges for high-performance hierarchical visualization
  • Scaled graph ingestion to 100M+ nodes and edges through Cypher query optimization
  • Delivered ML-ready data pipelines that improved predictive accuracy by 11%

Data Engineer I

Vanguard Supply Chain Solutions - Remote, USA

JAN 2023 - NOV 2023
  • Developed dashboards for Distribution Centre managers featuring driver performance analytics
  • Automated weekly truck log analysis in Databricks, improving time efficiency by 30%
  • Conducted monthly Distribution Centre performance evaluations and delivered actionable improvement recommendations
  • Deployed proactive alerting for animal welfare route compliance, including temperature monitoring

TRAINING PROTOCOL

Bachelor of Technology

Motilal Nehru National Institute of Technology, Prayagraj

2019 - 2023

SPECIALIZATION: BIOTECHNOLOGY | CGPA: 7.38

ACHIEVEMENTS

⭐ HACKERRANK ELITE

5 Star SQL | 4 Star Python

💻 PROBLEM SOLVER

500+ Problems | LeetCode & GeeksforGeeks

🎓 DATABRICKS CERTIFIED

Data Engineer Associate

✅ CERTIFIED PROFESSIONAL

SQL Advanced | Python Certified