SENIOR DATA ENGINEER · DUBAI
6+ years architecting data lakehouses, ETL frameworks, and ML systems that power enterprise supply chain decisions. 50+ pipelines. 95%+ reliability. 10,000+ SKUs.
WHO I AM
I'm not just a data engineer—I'm the person teams call when they need data they can trust, delivered fast.
With 6+ years in data engineering across UAE companies, I've built my career on turning messy data into reliable infrastructure. Currently at Landmark Group in Dubai, I build pipelines that power warehouse and delivery operations—the data infrastructure behind analytical and business intelligence reports that drive real decisions.
My work sits between raw chaos and actionable intelligence: 10,000+ SKUs flowing through pipelines I designed, 50+ ETL jobs that run while everyone sleeps, and a 3-tier anomaly detection system that catches problems before they become crises.
The thing I'm proudest of? Reducing data delivery from 4 hours to 30 minutes. That's not a vanity metric—it's the difference between same-day decisions and playing catch-up.
REAL IMPACT
Numbers that moved the needle for supply chain operations
WHAT I DO
End-to-end data engineering for enterprise supply chain operations
Bronze → Silver → Gold architecture with Hive-partitioned Parquet files. Designed for scalability, data lineage tracking, and incremental processing patterns that handle 10,000+ SKUs daily.
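To make the layout concrete, here is a minimal sketch of how Hive-style partition paths can be built for each medallion layer. The helper name and directory scheme are illustrative, not the production code:

```python
from datetime import date

# Illustrative medallion layers; real layer names follow the same pattern.
LAYERS = ("bronze", "silver", "gold")

def partition_path(layer: str, dataset: str, run_date: date) -> str:
    """Build a Hive-partitioned output path for one daily run."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    # key=value directories enable partition pruning and let an
    # incremental run reprocess a single day without touching the rest.
    return f"data/{layer}/{dataset}/dt={run_date.isoformat()}"

print(partition_path("silver", "orders", date(2024, 1, 15)))
# → data/silver/orders/dt=2024-01-15
```

Writing each day into its own `dt=` directory is what makes incremental reprocessing cheap: a late correction rewrites one partition, not the whole table.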
Configuration-driven pipelines with Abstract Base Classes for extractors, transformers, and loaders. 50+ production pipelines with 95%+ reliability, supporting FULL and INCREMENTAL load patterns.
Star schema design with 15+ fact tables and 6+ dimension tables. Slowly Changing Dimensions (SCD Type 2), conformed dimensions, and optimized for both analytical queries and reporting.
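The SCD Type 2 mechanics can be shown with a minimal in-memory sketch. In production this would run as a warehouse MERGE; the function and column names here are illustrative assumptions:

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # conventional open-ended validity marker

def apply_scd2(dim, updates, key, tracked, as_of):
    """Illustrative SCD Type 2 merge: expire changed rows, append new versions.

    dim     -- list of dicts carrying valid_from / valid_to / is_current
    updates -- incoming source rows keyed by the natural key
    tracked -- attributes whose change triggers a new version
    """
    current = {row[key]: row for row in dim if row["is_current"]}
    for upd in updates:
        old = current.get(upd[key])
        if old and all(old[col] == upd[col] for col in tracked):
            continue  # unchanged: keep the current version as-is
        if old:  # attribute changed: close out the old version
            old["valid_to"] = as_of
            old["is_current"] = False
        dim.append({**upd, "valid_from": as_of,
                    "valid_to": HIGH_DATE, "is_current": True})
    return dim

dim = [{"sku": "A1", "warehouse": "DXB", "valid_from": date(2024, 1, 1),
        "valid_to": HIGH_DATE, "is_current": True}]
apply_scd2(dim, [{"sku": "A1", "warehouse": "AUH"}],
           key="sku", tracked=["warehouse"], as_of=date(2024, 6, 1))
```

After the merge, the old row is expired with `valid_to` set to the change date, and the new row is current, so point-in-time reports can join against whichever version was true on a given day.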
3-tier validation framework: schema validation → business rule checks → statistical anomaly detection. Z-score and IQR-based outlier detection that's caught 500+ issues before they reached stakeholders.
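The tier-3 statistical checks can be sketched with the standard library alone. Thresholds here (3σ for Z-score, 1.5×IQR) are the common defaults, assumed for illustration:

```python
from statistics import mean, stdev, quantiles

def zscore_outliers(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # constant series: nothing can be an outlier
    return [v for v in values if abs(v - mu) / sigma > threshold]

def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]; robust to extreme values."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

daily_volumes = [9, 10, 10, 11, 12, 100]
```

Running both catches complementary failure modes: the IQR check stays reliable even when a huge spike inflates the mean and standard deviation enough to hide itself from a pure Z-score test.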
Ensemble of 15+ forecasting algorithms driven by ADI/CV² demand-pattern classification. Prophet, XGBoost, and statistical methods are selected automatically to match each SKU's demand characteristics.
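The routing step can be sketched as the classic Syntetos-Boylan quadrant classification: ADI (average inter-demand interval) against CV² (squared coefficient of variation of demand sizes), with the standard 1.32 / 0.49 cutoffs assumed here for illustration:

```python
from statistics import mean, pstdev

def classify_demand(series, adi_cut=1.32, cv2_cut=0.49):
    """Classify a SKU's demand pattern from its per-period history.

    series: per-period demand quantities; zeros mark periods with no demand.
    Cutoffs are the standard Syntetos-Boylan values, assumed here.
    """
    nonzero = [x for x in series if x > 0]
    if not nonzero:
        return "no_demand"
    adi = len(series) / len(nonzero)                 # avg periods between demands
    cv2 = (pstdev(nonzero) / mean(nonzero)) ** 2     # variability of demand sizes
    if adi < adi_cut:
        return "smooth" if cv2 < cv2_cut else "erratic"
    return "intermittent" if cv2 < cv2_cut else "lumpy"
```

Each quadrant then maps to a family of forecasters: smooth SKUs suit classical time-series models, while intermittent and lumpy patterns call for methods built for sparse demand.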
5+ Selenium/PyAutoGUI bots that extract data from systems without APIs. When there's no clean way in, I build one—automating what others think can't be automated.
WHAT I'VE BUILT
Real production systems powering supply chain operations
Bronze/Silver/Gold architecture processing 10,000+ SKUs with 95%+ pipeline reliability.
Production-grade ETL framework with Abstract Base Classes, data quality validation, and 3-tier anomaly detection.
Intelligent forecasting engine with 15+ ensemble algorithms and ADI/CV² pattern classification.
3-tier statistical system for data quality monitoring: schema validation → outlier detection → volatility tracking.
HOW I BUILD
Real code from production systems—declarative configs that enable schema validation, quality checks, and flexible load modes
PIPELINE = {
    "load": {"mode": "FULL"},
    "paths": {"output": "data/landing/orders"},
    "schema": {
        "order_id": "string",
        "amount": "decimal(10,2)",
        "country": "string",
        "created_at": "timestamp",
    },
    "quality": {
        "required_columns": ["order_id", "amount"],
        "not_null": ["order_id"],
    },
}
from config import PIPELINE, DB_CONFIG

# Ingest each configured database table
for table in DB_CONFIG["tables"]:
    df = (
        spark.read
        .format("jdbc")
        .option("url", DB_CONFIG["url"])
        .option("dbtable", table)
        .load()
    )
    # Validate against the config-declared schema and quality rules
    validate_schema(df, PIPELINE["schema"])
    check_quality(df, PIPELINE["quality"])
    # One output directory per table so loads don't overwrite each other
    df.write.mode("overwrite").parquet(f'{PIPELINE["paths"]["output"]}/{table}')
✓ Schema validation • ✓ Quality checks • ✓ FULL/INCREMENTAL modes
TOOLS I USE
Production-tested technologies for building reliable data systems
WHERE I'M HEADING
Cloud certifications, leadership growth, and enterprise-scale platforms
Follow along as I build in public. New case studies, learnings, and projects coming regularly.