Problem Statement
Processing millions of log events per minute from microservices, extracting insights, and detecting anomalies in real-time.
Architecture
App Servers → Kafka → Stream Processor → ClickHouse → Grafana
Components
- Producers: Apps send structured logs to Kafka topics
- Stream Processor: Python consumers aggregate and enrich data
- Storage: ClickHouse for column-oriented analytics
- Visualization: Grafana dashboards for monitoring
from kafka import KafkaConsumer
import clickhouse_driver
consumer = KafkaConsumer('app-logs', bootstrap_servers=['localhost:9092'])
client = clickhouse_driver.Client('localhost')
for message in consumer:
log = json.loads(message.value)
# Extract metrics
if log['level'] == 'ERROR':
client.execute(
'INSERT INTO error_logs VALUES',
[(log['timestamp'], log['service'], log['message'])]
)
Features
- Real-time Processing: Sub-second latency from log to dashboard
- Anomaly Detection: Statistical models flag unusual patterns
- Alerting: PagerDuty integration for critical errors
- Retention: 30-day hot storage, 1-year cold storage in S3
Scale
- Processes 5M events/minute at peak
- 99.9% delivery guarantee
- Sub-100ms query latency on ClickHouse