SENIOR DATA ENGINEER · DUBAI

Building Production Data Platforms at Scale

6+ years architecting data lakehouses, ETL frameworks, and ML systems that power enterprise supply chain decisions. 50+ pipelines. 95%+ reliability. 10,000+ SKUs.

Godson Kurishinkal - Data Engineer

WHO I AM

Building Data Infrastructure That Matters

I'm not just a data engineer—I'm the person teams call when they need data they can trust, delivered fast.

With 6+ years in data engineering across UAE companies, I've built my career on turning messy data into reliable infrastructure. Currently at Landmark Group in Dubai, I build pipelines that power warehouse and delivery operations—the data infrastructure behind analytical and business intelligence reports that drive real decisions.

My work sits between raw chaos and actionable intelligence: 10,000+ SKUs flowing through pipelines I designed, 50+ ETL jobs that run while everyone sleeps, and a 3-tier anomaly detection system that catches problems before they become crises.

The thing I'm proudest of? Reducing data delivery from 4 hours to 30 minutes. That's not a vanity metric—it's the difference between same-day decisions and playing catch-up.

🏢 Landmark Group
Dubai, UAE
🎓 IIT Madras
BS Data Science
🎯 Next Goal
Cloud Platform Engineering

REAL IMPACT

Before & After Transformations

Numbers that moved the needle for supply chain operations

Data Delivery Speed

Before: 4 hours (manual extraction & Excel processing)
After: 30 minutes (automated Polars pipelines)
87% faster delivery
🔄

Data Freshness

Before: 48 hours (next-day reporting only)
After: 2-4 hours (same-day operational decisions)
90% fresher data
🛡️

Data Quality Incidents

Before: weekly incidents (reactive firefighting)
After: monthly (proactive anomaly detection)
500+ anomalies caught
6+
Years Experience
50+
ETL Pipelines
95%+
Reliability
15+
Fact Tables

WHAT I DO

Core Specializations

End-to-end data engineering for enterprise supply chain operations

Medallion Data Lakehouse

Bronze → Silver → Gold architecture with Hive-partitioned Parquet files. Designed for scalability, data lineage tracking, and incremental processing patterns that handle 10,000+ SKUs daily.
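As a sketch of how the Hive-style partition paths compose across layers (the `hive_partition_path` helper, the `data` root, and the `load_date` partition key are illustrative stand-ins, not the production scheme):

```python
from datetime import date
from pathlib import Path

def hive_partition_path(layer: str, table: str, load_date: date,
                        root: str = "data") -> Path:
    """Build a Hive-style partitioned path for one medallion layer,
    e.g. data/bronze/orders/load_date=2025-01-15."""
    return Path(root) / layer / table / f"load_date={load_date.isoformat()}"

# Each layer gets the same table/partition layout under its own prefix
bronze = hive_partition_path("bronze", "orders", date(2025, 1, 15))
silver = hive_partition_path("silver", "orders", date(2025, 1, 15))
print(bronze.as_posix())  # data/bronze/orders/load_date=2025-01-15
```

Because the partition key is encoded in the directory name, engines like Spark, Polars, and DuckDB can prune partitions and infer `load_date` as a column without reading file contents.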

ETL Pipeline Architecture

Configuration-driven pipelines with Abstract Base Classes for extractors, transformers, and loaders. 50+ production pipelines with 95%+ reliability, supporting FULL and INCREMENTAL load patterns.
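A minimal sketch of the ABC pattern described above; the class and method names (`Extractor.extract`, etc.) and the toy concrete stages are hypothetical stand-ins for the production interfaces:

```python
from abc import ABC, abstractmethod

class Extractor(ABC):
    @abstractmethod
    def extract(self) -> list[dict]: ...

class Transformer(ABC):
    @abstractmethod
    def transform(self, rows: list[dict]) -> list[dict]: ...

class Loader(ABC):
    @abstractmethod
    def load(self, rows: list[dict]) -> int: ...

class Pipeline:
    """Wire the three stages together; concrete classes are chosen via config."""
    def __init__(self, extractor: Extractor, transformer: Transformer, loader: Loader):
        self.extractor, self.transformer, self.loader = extractor, transformer, loader

    def run(self) -> int:
        rows = self.extractor.extract()          # E
        rows = self.transformer.transform(rows)  # T
        return self.loader.load(rows)            # L

# Toy concrete stages, just to show the composition
class StaticExtractor(Extractor):
    def extract(self):
        return [{"order_id": "A1", "amount": 10}, {"order_id": None, "amount": 5}]

class DropNullIds(Transformer):
    def transform(self, rows):
        return [r for r in rows if r["order_id"] is not None]

class CountLoader(Loader):
    def load(self, rows):
        return len(rows)

loaded = Pipeline(StaticExtractor(), DropNullIds(), CountLoader()).run()  # → 1
```

Swapping a JDBC extractor for an API extractor then touches only one class, which is what makes 50+ pipelines maintainable from shared code.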

Dimensional Modeling

Star schema design with 15+ fact tables and 6+ dimension tables. Slowly Changing Dimensions (SCD Type 2), conformed dimensions, and optimized for both analytical queries and reporting.
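A simplified sketch of the SCD Type 2 mechanics on plain dict rows; the `scd2_apply` helper and its bookkeeping columns (`is_current`, `start_date`, `end_date`) are illustrative, and production runs against warehouse tables rather than Python lists:

```python
from datetime import date

def scd2_apply(dim_rows, incoming, key, tracked, today=None):
    """Expire changed current rows and append new versions (SCD Type 2)."""
    today = today or date.today()
    by_key = {r[key]: r for r in incoming}
    result, seen = [], set()
    for row in dim_rows:
        new = by_key.get(row[key])
        seen.add(row[key])
        if row.get("is_current") and new and any(row[c] != new[c] for c in tracked):
            # Close out the old version, then append the new current version
            result.append({**row, "is_current": False, "end_date": today})
            result.append({**new, "is_current": True,
                           "start_date": today, "end_date": None})
        else:
            result.append(row)  # unchanged or historical rows pass through
    for k, new in by_key.items():
        if k not in seen:  # brand-new business key
            result.append({**new, "is_current": True,
                           "start_date": today, "end_date": None})
    return result
```

The key property: history is never updated in place, so a query filtered to any past date reconstructs the dimension exactly as it was then.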

Data Quality Engineering

3-tier validation framework: schema validation → business rule checks → statistical anomaly detection. Z-score and IQR-based outlier detection that's caught 500+ issues before they reached stakeholders.
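The statistical tier can be sketched with the standard library alone; the 3σ and 1.5×IQR thresholds below are the textbook defaults, not necessarily the production tuning:

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mean, stdev = statistics.fmean(values), statistics.stdev(values)
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]

def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

daily_orders = [100, 102, 98, 101, 99, 100, 103, 500]
print(iqr_outliers(daily_orders))  # [500]
```

Running both checks matters in practice: IQR is robust when the outlier itself inflates the standard deviation (as in the example above, where 500 drags the Z-score mean and spread enough to hide itself at a 3σ cutoff).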

ML-Powered Forecasting

15+ algorithm ensemble using ADI/CV² demand pattern classification. Prophet, XGBoost, and statistical methods automatically selected based on demand characteristics for optimal accuracy.
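The ADI/CV² routing can be sketched as follows, using the standard Syntetos-Boylan cutoffs (ADI 1.32, CV² 0.49); the `classify_demand` helper is an illustrative stand-in for the production selector:

```python
def classify_demand(series: list[float]) -> str:
    """Classify a demand series by ADI (average inter-demand interval)
    and CV² (squared coefficient of variation of nonzero demand)."""
    nonzero = [x for x in series if x > 0]
    if not nonzero:
        return "no_demand"
    adi = len(series) / len(nonzero)
    mean = sum(nonzero) / len(nonzero)
    var = sum((x - mean) ** 2 for x in nonzero) / len(nonzero)
    cv2 = var / mean ** 2
    if adi < 1.32:
        return "smooth" if cv2 < 0.49 else "erratic"
    return "intermittent" if cv2 < 0.49 else "lumpy"

print(classify_demand([10, 0, 0, 10, 0, 0, 10, 0, 0]))  # intermittent
```

Each quadrant then maps to a forecasting family, e.g. smooth demand to Prophet or classical smoothing, intermittent to Croston-style methods, and erratic or lumpy to tree-based models such as XGBoost.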

RPA & Legacy Integration

5+ Selenium/PyAutoGUI bots that extract data from systems without APIs. When there's no clean way in, I build one—automating what others think can't be automated.

HOW I BUILD

Configuration-Driven Pipelines

Real code from production systems—declarative configs that enable schema validation, quality checks, and flexible load modes

config.py Python
PIPELINE = {
    "load": {"mode": "FULL"},
    "paths": {"output": "data/landing/orders"},
    
    "schema": {
        "order_id": "string",
        "amount": "decimal(10,2)",
        "country": "string",
        "created_at": "timestamp"
    },
    
    "quality": {
        "required_columns": ["order_id", "amount"],
        "not_null": ["order_id"]
    }
}
ingestion.py Python
from config import PIPELINE, DB_CONFIG

# Database tables ingestion
for table in DB_CONFIG["tables"]:
    df = (
        spark.read
        .format("jdbc")
        .option("url", DB_CONFIG["url"])
        .option("dbtable", table)
        .load()
    )

    # Validate against config schema
    validate_schema(df, PIPELINE["schema"])
    check_quality(df, PIPELINE["quality"])

    # One landing path per table; overwrite on FULL loads
    df.write.mode("overwrite").parquet(
        f"{PIPELINE['paths']['output']}/{table}"
    )
Read Config → Validate Schema → Quality Check → Load Data

✓ Schema validation  •  ✓ Quality checks  •  ✓ FULL/INCREMENTAL modes
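The `validate_schema` and `check_quality` helpers referenced in ingestion.py might look like this simplified sketch; it operates on plain dicts rather than DataFrames, and the signatures are assumptions, not the production API:

```python
def validate_schema(columns: dict[str, str], expected: dict[str, str]) -> None:
    """Fail fast if extracted columns don't match the declared config schema."""
    missing = expected.keys() - columns.keys()
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    mismatched = {c: (columns[c], t) for c, t in expected.items()
                  if columns[c] != t}
    if mismatched:
        raise ValueError(f"Type mismatches (actual, expected): {mismatched}")

def check_quality(rows: list[dict], rules: dict) -> None:
    """Apply the 'quality' block of the pipeline config to extracted rows."""
    for col in rules.get("required_columns", []):
        if any(col not in r for r in rows):
            raise ValueError(f"Required column absent: {col}")
    for col in rules.get("not_null", []):
        if any(r.get(col) is None for r in rows):
            raise ValueError(f"Nulls found in: {col}")
```

Raising instead of logging is deliberate: a pipeline that halts on a bad batch is cheaper than a dashboard built on silently corrupted data.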

TOOLS I USE

Tech Stack

Production-tested technologies for building reliable data systems

Python SQL Apache Airflow Docker Git GitHub SQL Server Polars DuckDB Azure Data Factory Databricks Apache Kafka PySpark Microsoft Fabric Terraform

WHERE I'M HEADING

2026 Roadmap

Cloud certifications, leadership growth, and enterprise-scale platforms

🎓

Certifications

  • Fabric Data Engineer (DP-700) Q1
  • Databricks DE Associate Q2
  • Databricks DE Professional Q3
💼

Portfolio

  • 5+ detailed case studies Ongoing
  • Architecture diagrams Q1
  • Open-source ETL toolkit Q2
🚀

Career

  • Senior/Staff Data Engineer Q1-Q2
  • Cloud-native data platforms Ongoing
  • Technical leadership & mentoring Ongoing

🚀 Let's Connect

Follow along as I build in public. New case studies, learnings, and projects coming regularly.