3-Tier Anomaly Detection System
A multi-layer statistical system that catches data quality issues at ingestion and during processing, and monitors long-term volatility, before bad data impacts downstream analytics.
The Challenge
In supply chain analytics, data quality issues don't just cause wrong reports—they cause wrong decisions. A single corrupted inventory record can trigger unnecessary purchase orders worth thousands of dollars.
We were constantly fighting data quality fires:
- Source systems sending null values where quantities should exist
- Dimension tables with duplicate or missing keys breaking joins
- Sudden spikes in sales data from system glitches (not actual demand)
- Gradual data drift going unnoticed until quarterly reviews
- No visibility into which specific records were problematic
The Solution
I designed a 3-tier anomaly detection system that operates at different stages of the data pipeline, catching issues at the right level with the right technique:
The 3 Tiers
Schema Validation
Gate: Bronze → Silver
Rule-based checks that ensure data conforms to expected schemas, constraints, and business rules before entering the Silver layer.
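A minimal sketch of the kind of rule-based checks this gate runs. The column names, rule set, and warehouse IDs here are illustrative assumptions, not the production schema:

```python
# Hypothetical validation rules for an inventory feed (names are illustrative).
RULES = {
    "quantity": lambda v: v is not None and v >= 0,
    "sku": lambda v: isinstance(v, str) and len(v) > 0,
    "warehouse_id": lambda v: v in {"WH1", "WH2", "WH3"},
}

def validate_record(record: dict) -> list[str]:
    """Return the names of the rules a single record fails."""
    return [col for col, rule in RULES.items() if not rule(record.get(col))]

good = {"quantity": 10, "sku": "A-100", "warehouse_id": "WH1"}
bad = {"quantity": None, "sku": "", "warehouse_id": "WH9"}
```

Records that fail any rule are held back at the gate instead of entering the Silver layer.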
Statistical Outlier Detection
Gate: Silver → Gold
Statistical methods that identify individual records that deviate significantly from expected distributions—catching data that's valid but suspicious.
Volatility Monitoring
Continuous: Gold Layer
Time-series analysis that detects aggregate-level anomalies, trend breaks, and gradual drift that individual outlier detection would miss.
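A sketch of the rolling coefficient-of-variation check this tier relies on (window size and sample data are illustrative):

```python
import numpy as np

def rolling_cv(series: np.ndarray, window: int = 7) -> np.ndarray:
    """Coefficient of variation (sigma/mu) over a trailing window."""
    out = np.full(len(series), np.nan)
    for i in range(window - 1, len(series)):
        w = series[i - window + 1 : i + 1]
        mu = w.mean()
        out[i] = w.std() / mu if mu != 0 else np.nan
    return out

# A stable stretch, then a volatile one: the CV jumps when volatility rises,
# even though no single point is an extreme outlier on its own.
daily = np.array([100, 102, 98, 101, 99, 100, 103,
                  60, 150, 40, 160, 55, 145, 50], dtype=float)
cv = rolling_cv(daily, window=7)
```

A sustained rise in the rolling CV, rather than any one value, is what triggers a volatility alert.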
Detection Methods
Z-Score
Measures standard deviations from mean. Best for normally distributed data.
z = (x - μ) / σ
Modified Z-Score
Uses median instead of mean. Robust to existing outliers.
M = 0.6745(x - x̃) / MAD
IQR Fencing
Quartile-based bounds. No distribution assumptions.
[Q1 - 1.5×IQR, Q3 + 1.5×IQR]
Rolling CV
Tracks coefficient of variation over time windows.
CV = σ / μ (rolling)
Grubbs Test
Statistical test for single outliers in univariate data.
G = max|x - x̄| / s
DBSCAN
Density-based clustering for multivariate anomalies.
ε-neighborhood density
Implementation
The anomaly detector is configurable per metric with multiple detection strategies:
from dataclasses import dataclass
from enum import Enum

import numpy as np


class DetectionMethod(Enum):
    ZSCORE = "zscore"
    MODIFIED_ZSCORE = "modified_zscore"
    IQR = "iqr"


@dataclass
class AnomalyConfig:
    method: DetectionMethod
    threshold: float = 3.0
    min_samples: int = 30


class AnomalyDetector:
    """Multi-method statistical anomaly detection."""

    def __init__(self, config: AnomalyConfig):
        self.config = config
        self._methods = {
            DetectionMethod.ZSCORE: self._zscore,
            DetectionMethod.MODIFIED_ZSCORE: self._modified_zscore,
            DetectionMethod.IQR: self._iqr_fence,
        }

    def detect(self, data: np.ndarray) -> np.ndarray:
        """Return boolean mask of anomalies."""
        if len(data) < self.config.min_samples:
            return np.zeros(len(data), dtype=bool)
        return self._methods[self.config.method](data)

    def _zscore(self, data: np.ndarray) -> np.ndarray:
        z = (data - np.mean(data)) / np.std(data)
        return np.abs(z) > self.config.threshold

    def _modified_zscore(self, data: np.ndarray) -> np.ndarray:
        median = np.median(data)
        mad = np.median(np.abs(data - median))
        m = 0.6745 * (data - median) / (mad + 1e-10)
        return np.abs(m) > self.config.threshold

    def _iqr_fence(self, data: np.ndarray) -> np.ndarray:
        q1, q3 = np.percentile(data, [25, 75])
        iqr = q3 - q1
        return (data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)
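To see what the modified z-score path flags in practice, here is its formula applied directly to a synthetic series with one injected glitch (the data is made up for illustration):

```python
import numpy as np

# Synthetic daily order counts with one injected glitch at index 5.
data = np.array([100, 98, 102, 101, 99, 500, 100, 97, 103, 101, 99, 100],
                dtype=float)

# Same modified z-score logic as the detector's _modified_zscore method:
# median/MAD instead of mean/std, so the glitch itself doesn't skew the baseline.
median = np.median(data)
mad = np.median(np.abs(data - median))
m = 0.6745 * (data - median) / (mad + 1e-10)
mask = np.abs(m) > 3.0  # only the injected spike is flagged
```

Note the detector's default `min_samples=30` guard would suppress detection on a series this short; the point here is the formula, not a full end-to-end run.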
Key Learnings
Layers Beat Single Methods
No single detection method catches everything. The 3-tier approach improved detection by 3x over Z-score alone.
Context Matters
A 50% sales spike on Black Friday is expected. Time-aware thresholds reduced false positives by 60%.
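One way a time-aware threshold can work (a sketch under assumed data, not the production logic): score each point against history from the same weekday, so a recurring Friday spike is judged against past Fridays rather than the global mean.

```python
import numpy as np

def weekday_zscore(values: np.ndarray, weekdays: np.ndarray) -> np.ndarray:
    """Z-score each point against other points from the same weekday."""
    z = np.zeros(len(values))
    for d in range(7):
        idx = weekdays == d
        if idx.sum() >= 2:
            mu, sd = values[idx].mean(), values[idx].std()
            z[idx] = (values[idx] - mu) / (sd + 1e-10)
    return z

# Four weeks where weekday 4 (Friday) always doubles: a global z-score
# would flag every Friday, but the weekday-aware score stays near zero.
weekdays = np.tile(np.arange(7), 4)
values = np.where(weekdays == 4, 200.0, 100.0)
z = weekday_zscore(values, weekdays)
```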
Quarantine, Don't Delete
Anomalies go to a quarantine table for review—sometimes "anomalies" are real events worth investigating.
Alert Fatigue is Real
Started with 100+ daily alerts. Tuned thresholds and added severity levels to reduce to ~10 actionable alerts.
Want to Build Better Data Quality?
Let's discuss how layered anomaly detection can save your analytics from bad data.