Events / Crowd management · Prototype

Crowd Flow Optimization — real-time crowd flow prediction and orchestration

Real-time system combining field sensors, stream processing and predictive models to anticipate and prevent congestion at mega-events.

Executive summary

Prototype crowd flow orchestration system designed for mega-events (concerts, stadiums, festivals). The goal: predict congestion 10 minutes ahead, trigger automated recommendations for field teams and maintain 99.9% availability during critical phases. End-to-end event-driven architecture, from sensor to operator dashboard.

Business problem

Mega-event organisers manage crowds of 50,000 to 100,000 people with reactive, non-predictive tools. Congestion incidents are detected too late, redirection decisions take several minutes, and field teams lack real-time contextual information. The human and reputational cost of a major incident is considerable.

Solution

Real-time 6-layer pipeline: multi-source collection (IoT sensors, anonymised cameras, ticketing), transport via Kafka (AWS MSK), Apache Flink stream processing (30 s windows), ML scoring via TensorFlow Serving, a decision engine with deterministic YAML fallback, and a Next.js operator dashboard with 1 s WebSocket updates. Each layer is decoupled and independently resilient.
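
To make the pipeline concrete, here is a minimal sketch of what a per-zone metrics event travelling on a Kafka topic might look like. The field names and values are illustrative assumptions; the real contract is enforced by the Schema Registry.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ZoneMetrics:
    """One per-zone measurement event, as it might travel on a Kafka topic.

    Field names are illustrative; the actual contract lives in the Schema Registry.
    """
    zone_id: str        # e.g. "zone_A"
    ts_epoch_ms: int    # sensor timestamp
    count_in: int       # entries observed in the sampling interval
    count_out: int      # exits observed in the sampling interval
    density: float      # people per square metre, from sensor fusion

    def to_kafka_value(self) -> bytes:
        # MSK carries opaque bytes; JSON keeps the sketch dependency-free
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def from_kafka_value(cls, raw: bytes) -> "ZoneMetrics":
        return cls(**json.loads(raw.decode("utf-8")))

event = ZoneMetrics("zone_A", 1_700_000_000_000, 412, 88, 2.7)
roundtrip = ZoneMetrics.from_kafka_value(event.to_kafka_value())
```

Keeping the payload as a small, flat record is what lets each downstream layer stay decoupled: Flink, the scorer and the decision engine only need to agree on this contract.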

Target KPIs

  • 10 min: congestion prediction horizon
  • < 200 ms: end-to-end decision latency
  • 40%: reduction in operational incidents
  • 99.9%: target availability for live events

Technical architecture

Event-driven architecture with six decoupled layers. The collection layer aggregates sensor streams via AWS IoT Core and MQTT. The transport layer uses AWS MSK (managed multi-AZ Kafka) with a Schema Registry for contract validation. Apache Flink on Amazon Kinesis Data Analytics (KDA) handles stream processing with 30-second sliding windows and an exactly-once guarantee. TensorFlow Serving on ECS Fargate handles inference with auto-scaling. The FastAPI decision engine combines ML scoring with deterministic YAML fallback rules. The Next.js dashboard receives pushes via an API Gateway WebSocket.
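
The windowed aggregation at the heart of the Flink job can be illustrated with a pure-Python stand-in. This is not the actual Flink code, only a sketch of the 30-second window logic; the feature chosen (net inflow per zone) and the tuple layout are assumptions.

```python
WINDOW_MS = 30_000   # 30 s window, matching the Flink job

def window_features(events, now_ms, window_ms=WINDOW_MS):
    """Aggregate per-zone features over the last `window_ms`.

    Pure-Python stand-in for the Flink windowed job; `events` is an
    iterable of (ts_ms, zone_id, count_in, count_out) tuples.
    """
    totals = {}
    for ts, zone, c_in, c_out in events:
        # keep only events inside the window (now - 30 s, now]
        if now_ms - window_ms < ts <= now_ms:
            f = totals.setdefault(zone, {"in": 0, "out": 0})
            f["in"] += c_in
            f["out"] += c_out
    # net inflow per zone is one plausible input feature for the model
    return {z: f["in"] - f["out"] for z, f in totals.items()}

events = [
    (1_000, "zone_A", 50, 10),
    (15_000, "zone_A", 80, 20),
    (29_000, "zone_B", 30, 5),
    (40_000, "zone_A", 10, 0),   # outside a window ending at t = 30 s
]
net = window_features(events, now_ms=30_000)
```

In the real pipeline Flink maintains these windows incrementally with exactly-once state; the sketch recomputes from scratch for clarity.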

General architecture

Crowd Flow architecture (general view), six layers:

1. Field collection: IoT sensors (counting, pressure), anonymised cameras, real-time ticketing API, access gates (inbound/outbound flows)
2. Edge and transport: edge gateway (local aggregation), Kafka MSK (one topic per zone), Schema Registry (contracts, validation)
3. Stream processing: Apache Flink (30 s windows), real-time feature store, real-time anomaly detection
4. Model scoring: TensorFlow Serving (GPU/CPU inference), 10-minute congestion forecasting, risk score 0.0 to 1.0 per zone
5. Decision and alerts: decision engine (ML scoring + rules), recommendations for ops and field teams, deterministic YAML fallback rules, operator manual override
6. Interface and observability: Next.js dashboard (WebSocket, 1 s), Prometheus + Grafana (infra metrics), Alertmanager (routing, escalation), audit log (traceability)
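
The real-time anomaly detection in layer 3 can, in its simplest form, flag a zone count that deviates sharply from its recent baseline. Below is a rolling z-score sketch; the window size and threshold are illustrative defaults, not the prototype's tuned values.

```python
from collections import deque
from statistics import mean, stdev

class RollingAnomalyDetector:
    """Flag a zone count as anomalous when it sits more than `z_threshold`
    standard deviations away from the rolling baseline.

    Window size and threshold are illustrative, not production-tuned.
    """
    def __init__(self, window=20, z_threshold=3.0):
        self.history = deque(maxlen=window)
        self.z_threshold = z_threshold

    def observe(self, value: float) -> bool:
        baseline = list(self.history)
        self.history.append(value)
        if len(baseline) < 5:          # not enough baseline yet
            return False
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma == 0:
            return value != mu
        return abs(value - mu) / sigma > self.z_threshold

det = RollingAnomalyDetector()
# steady counts around 100, then a sudden spike
flags = [det.observe(v) for v in [100, 102, 98, 101, 99, 100, 450]]
```

A production detector (or the managed AWS anomaly-detection service named in the stack) would handle seasonality and sensor dropout, but the contract is the same: a boolean signal per zone, per window.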

Recommended stack

Crowd Flow architecture, recommended concrete stack:

- Sources (AWS IoT + field): AWS IoT Core (MQTT, Lambda@Edge), cameras anonymised at source, real-time ticketing REST API, NFC access gates (inbound flows)
- Edge and transport (AWS MSK): IoT Core gateway (pre-aggregation), AWS MSK (managed Kafka, multi-AZ), Confluent Schema Registry (schema validation)
- Stream processing (Flink on KDA): Flink on KDA (30 s windows, exactly-once), DynamoDB feature store, AWS anomaly-detection service
- Model scoring (TF Serving on ECS): TF Serving on Fargate (auto-scaling, blue/green), 10-minute congestion prediction, risk API (score per zone)
- Decision (FastAPI on ECS): FastAPI on ECS (< 50 ms P95), dashboard push via API Gateway WebSocket, YAML fallback (works offline), operator override UI
- Interface and observability: Next.js on Vercel (WebSocket, SSE), Grafana Cloud dashboards, PagerDuty (SMS to operators), S3 + RDS archives

Stack justifications:

- Edge: AWS IoT Core for native MQTT, Lambda@Edge pre-aggregation, < 50 ms latency
- Transport: AWS MSK for managed Kafka, zero ops, multi-AZ replication, 99.9% SLA
- Stream: Flink on KDA for 30 s windows with an exactly-once guarantee
- Model: TF Serving on Fargate for auto-scaling, blue/green deployment, optional GPU
- Decision: FastAPI for < 50 ms P95, with YAML fallback rules if the model is unavailable
- Dashboard: Next.js + WebSocket for 1 s updates and a degraded offline mode
- Observability: Grafana Cloud + PagerDuty for the 99.9% SLA and SMS alerts to operators

Congestion alert sequence

Sequence: congestion alert

1. Sensor → Edge gateway: flux_count (every 500 ms)
2. Edge gateway → Kafka: publish(zone_A_metrics)
3. Kafka → Flink: consume (30 s window)
4. Flink → TF Serving: score_request(features)
5. TF Serving → Decision engine: risk_score = 0.87
6. Decision engine → Dashboard: alert(zone_A, HIGH), push(recommendation)
7. Dashboard → Ops: alert + recommended action
8. Ops → Decision engine: override(REROUTE)
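
The mapping from the per-zone risk score (0.0 to 1.0) to an alert level, as in the risk_score = 0.87 → HIGH step of the sequence, could look like the sketch below. The threshold values are assumptions, not the prototype's calibrated ones.

```python
def alert_level(risk_score: float) -> str:
    """Map a per-zone congestion risk score (0.0 to 1.0) to an alert level.

    Thresholds are illustrative; the real system exposes them as
    configurable confidence thresholds.
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk score must be in [0, 1]")
    if risk_score >= 0.8:
        return "HIGH"
    if risk_score >= 0.5:
        return "MEDIUM"
    return "LOW"

level = alert_level(0.87)   # the score from the sequence above
```

Keeping this mapping in the decision engine, rather than in the model, is what allows operators to tune alert sensitivity without redeploying the scorer.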

Competitive advantages

No SaaS solution on the market combines 10-minute prediction, a decision engine with deterministic YAML fallback, and a real-time operator dashboard at under 200ms end-to-end latency. The design prioritises resilience: if the ML model is unavailable, YAML rules guarantee operational continuity. The architecture is designed for the constraints of live events: predictable load spikes, zero tolerance for outages during critical phases.
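
The deterministic YAML fallback can be sketched as a first-match rules engine. The rule shapes, metric names and actions below are hypothetical; in the prototype the rules would be loaded with something like yaml.safe_load from a versioned file.

```python
# Rules as they might look after yaml.safe_load("fallback_rules.yaml");
# metric names, thresholds and actions are hypothetical.
FALLBACK_RULES = [
    {"if": {"metric": "net_inflow", "gt": 2000}, "then": "CLOSE_GATES"},
    {"if": {"metric": "density", "gt": 4.0},     "then": "REROUTE"},
    {"if": {"metric": "density", "gt": 3.0},     "then": "SLOW_ENTRY"},
]

def fallback_decision(zone_metrics: dict) -> str:
    """Deterministic fallback: the first matching rule wins.

    No ML dependency, so the decision path keeps producing actions
    even when the scoring service is down.
    """
    for rule in FALLBACK_RULES:
        cond = rule["if"]
        if zone_metrics.get(cond["metric"], 0) > cond["gt"]:
            return rule["then"]
    return "NO_ACTION"

action = fallback_decision({"net_inflow": 800, "density": 3.4})
```

Because rule evaluation is ordered and side-effect free, the same inputs always yield the same action, which is exactly the property that makes the fallback auditable during an incident review.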

Risks and mitigations

Primary risk: quality and availability of field sensors. Mitigation: multi-source architecture with graceful degradation if a stream becomes unavailable. Second risk: network latency in large venues (saturated Wi-Fi). Mitigation: edge gateway with local buffer and batch transmission. Third risk: model false positives generating unnecessary alerts. Mitigation: configurable confidence threshold and human validation for critical-level alerts. Fourth risk: adoption by field teams. Mitigation: simplified operator UX and offline mode.
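
The edge-gateway mitigation for saturated venue Wi-Fi (local buffer plus batch transmission) can be sketched as follows. Batch size and flush interval are illustrative assumptions, as is the whole class shape.

```python
import time

class EdgeBuffer:
    """Local buffer on the edge gateway: accumulate sensor readings and
    release them in batches, so a congested uplink costs latency,
    not data loss. Batch size and max age are illustrative defaults.
    """
    def __init__(self, max_batch=100, max_age_s=5.0):
        self.max_batch = max_batch
        self.max_age_s = max_age_s
        self._items = []
        self._oldest_ts = None

    def add(self, reading, now=None):
        now = time.monotonic() if now is None else now
        if self._oldest_ts is None:
            self._oldest_ts = now
        self._items.append(reading)

    def flush_if_due(self, now=None):
        """Return the pending batch when it is full or too old, else None."""
        now = time.monotonic() if now is None else now
        full = len(self._items) >= self.max_batch
        stale = (self._oldest_ts is not None
                 and now - self._oldest_ts >= self.max_age_s)
        if self._items and (full or stale):
            batch, self._items, self._oldest_ts = self._items, [], None
            return batch
        return None

buf = EdgeBuffer(max_batch=3, max_age_s=5.0)
buf.add("r1", now=0.0); buf.add("r2", now=1.0)
early = buf.flush_if_due(now=1.0)    # neither full nor stale yet
buf.add("r3", now=2.0)
batch = buf.flush_if_due(now=2.0)    # full: three readings released
```

The age-based flush bounds the extra latency the buffer can introduce, which matters given the < 200 ms end-to-end budget during normal operation.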

Impact

  • Prototype / evaluation in progress.
  • Detailed impact data available on request.


Project scope

Pilot scope: 1 event, 1 venue, capacity 20,000 people. POC duration: 8 weeks (4 sprints). Environment: AWS eu-west-1 + ECS Fargate. Governance: data anonymised at source, no personal data stored, GDPR compliance by design.

Hosting and resilience

Deployment: AWS ECS Fargate (backend) + Vercel (dashboard) + AWS MSK multi-AZ (transport). Target availability: 99.9% SLA during live events. RTO < 5 minutes, RPO < 1 minute. YAML fallback automatically activated if the ML model exceeds 500ms latency. Grafana Cloud monitoring + PagerDuty alerts for operators.
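
The "fallback if the model exceeds 500 ms" behaviour amounts to a latency guard around the scoring call. A minimal sketch, assuming the TF Serving client and rules engine are injected as plain callables (both names are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout
import time

MODEL_BUDGET_S = 0.5   # the 500 ms latency budget from the resilience plan

def score_with_fallback(model_call, fallback_call, features,
                        budget_s=MODEL_BUDGET_S):
    """Call the ML scorer under a hard latency budget; past the budget,
    answer from the deterministic rules instead.

    `model_call` and `fallback_call` stand in for the real TF Serving
    and rules-engine clients.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(model_call, features)
        try:
            return future.result(timeout=budget_s), "model"
        except FuturesTimeout:
            future.cancel()
            return fallback_call(features), "fallback"

def slow_model(features):
    time.sleep(2)          # simulate a stalled scorer
    return 0.9

def rules(features):
    return 0.5

score, source = score_with_fallback(slow_model, rules, {})
```

Returning the source ("model" or "fallback") alongside the score lets the dashboard and audit log show operators which path produced each recommendation.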

Role

Real-time architecture, stream processing design, ML pipeline, ops dashboard design

Next steps

Multi-site extension, seasonal model calibration, organiser SSO integration.

Tech stack

AWS IoT Core · Kafka MSK · Apache Flink · TensorFlow Serving · FastAPI · Next.js · WebSocket · Prometheus · Grafana · PagerDuty · PostgreSQL · Docker · Kubernetes · ECS Fargate

Timeline

1. S1–S2 (Integration): sensor integration and Kafka pipeline
2. S3–S4 (Model): predictive model and feature engineering
3. S5–S6 (Dashboard): ops dashboard and load testing
4. S7–S8 (Pilot): live event pilot, calibration, go/no-go