Traffic Monitoring System
Developed an end-to-end computer vision pipeline for Hong Kong tunnel traffic monitoring, processing 100+ camera feeds in real time with 95%+ detection accuracy using camera-LiDAR (2D/3D) sensor fusion.
- 95%+ detection accuracy (mAP)
- 100+ camera feeds
- 30-60 FPS
- <100 ms detection latency
The Challenge
Hong Kong's tunnel infrastructure handles millions of vehicles daily. Traditional monitoring relied on human operators watching dozens of camera feeds, an approach prone to fatigue, inconsistency, and delayed incident response. The client needed an automated system that could detect vehicles, classify them, track their movement, and identify incidents in real time.
Key Challenges
- Processing 100+ simultaneous camera feeds with minimal latency
- Varying lighting conditions inside tunnels (bright entrances, dark interiors, flickering lights)
- Occlusion from large vehicles blocking smaller ones
- Accurate vehicle classification (cars, trucks, motorcycles, emergency vehicles)
- Real-time incident detection (stopped vehicles, wrong-way drivers, debris)
The Solution
I designed a multi-modal perception system that fuses 2D camera imagery with 3D LiDAR point clouds for robust vehicle detection and tracking. The system processes all feeds in real-time, maintaining a unified world model of tunnel traffic that enables sophisticated incident detection and traffic analytics.
- Built custom detection pipeline using PaddlePaddle's PP-YOLOE for 2D detection with domain-specific training
- Implemented 3D LiDAR processing using the PointPillars architecture for depth-accurate vehicle localization
- Developed sensor fusion algorithm combining 2D detections with 3D point clouds using calibrated projection
- Created multi-object tracking system using DeepSORT with custom re-identification features for tunnel environments
- Deployed on edge devices with TensorRT optimization for real-time performance
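At a high level, each frame tick composes these stages in sequence. The skeleton below is an illustrative Python sketch of that orchestration; the stage interfaces and names are assumptions for clarity, not the production API.

```python
# Illustrative per-frame orchestration of the fusion pipeline.
# Stage interfaces (Detector2D, Detector3D, fuse, tracker) are
# hypothetical names, not the production API.
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence
import numpy as np


@dataclass
class Detection:
    box_2d: np.ndarray                    # (4,) pixel-space [x1, y1, x2, y2]
    label: str
    score: float
    box_3d: Optional[np.ndarray] = None   # (7,) [x, y, z, l, w, h, yaw] after fusion


class Detector2D(Protocol):
    def __call__(self, frame: np.ndarray) -> Sequence[Detection]: ...


class Detector3D(Protocol):
    def __call__(self, points: np.ndarray) -> Sequence[Detection]: ...


def process_frame(frame, points, detect_2d, detect_3d, fuse, tracker):
    """One pipeline tick: detect in both modalities, fuse, then track."""
    dets_2d = detect_2d(frame)       # PP-YOLOE-style camera detections
    dets_3d = detect_3d(points)      # PointPillars-style LiDAR detections
    fused = fuse(dets_2d, dets_3d)   # associate via calibrated 3D->2D projection
    return tracker.update(fused)     # DeepSORT-style identity assignment
```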
System Architecture
The architecture follows a distributed edge-cloud pattern. Edge devices handle real-time inference while the cloud aggregates data for analytics and long-term storage.
Edge Processing Units
NVIDIA Jetson AGX Xavier devices deployed at each tunnel section. Each unit processes 8-12 camera feeds and 2-4 LiDAR sensors with on-device inference.
2D Detection Module
PP-YOLOE model fine-tuned on 50,000+ tunnel-specific images. Handles vehicle detection, classification, and attribute recognition (color, size, type).
3D Perception Module
PointPillars-based LiDAR processing for accurate 3D bounding boxes. Provides ground-truth depth information for fusion.
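To make the pillar idea concrete, here is a minimal numpy sketch of the binning step PointPillars performs before its learned encoder; the grid extents and pillar size are illustrative values, not the deployed configuration.

```python
# Scatter raw LiDAR points into a sparse grid of vertical "pillars"
# on the ground plane -- the first stage of PointPillars. Ranges and
# sizes below are illustrative, not the deployed values.
import numpy as np

def bin_into_pillars(points, x_range=(0.0, 69.12), y_range=(-39.68, 39.68),
                     pillar_size=0.16, max_points_per_pillar=32):
    """points: (N, 4) array of [x, y, z, intensity]. Returns a dict
    mapping (ix, iy) pillar indices to per-pillar point arrays."""
    # Keep only points inside the detection range.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]
    # Integer pillar coordinates on the bird's-eye-view grid.
    ix = ((pts[:, 0] - x_range[0]) / pillar_size).astype(np.int32)
    iy = ((pts[:, 1] - y_range[0]) / pillar_size).astype(np.int32)
    pillars = {}
    for idx, key in enumerate(zip(ix.tolist(), iy.tolist())):
        bucket = pillars.setdefault(key, [])
        if len(bucket) < max_points_per_pillar:   # cap point density per pillar
            bucket.append(pts[idx])
    # Each pillar then gets a learned PointNet-style encoding (not shown).
    return {k: np.stack(v) for k, v in pillars.items()}
```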
Fusion Engine
Proprietary algorithm that projects 3D LiDAR detections into 2D camera space using extrinsic calibration. Resolves conflicts and improves detection confidence.
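The core of that projection step can be sketched in a few lines of numpy; the matrix names (T_cam_lidar for the extrinsic transform, K for the intrinsics) are illustrative assumptions, not the production calibration format.

```python
# Map LiDAR-frame 3D points into pixel coordinates using the extrinsic
# transform (LiDAR -> camera) and the camera intrinsics.
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_lidar, K):
    """points_lidar: (N, 3); T_cam_lidar: (4, 4); K: (3, 3).
    Returns (M, 2) pixel coords for points in front of the camera."""
    n = points_lidar.shape[0]
    homog = np.hstack([points_lidar, np.ones((n, 1))])   # (N, 4) homogeneous
    pts_cam = (T_cam_lidar @ homog.T).T[:, :3]           # camera frame
    in_front = pts_cam[:, 2] > 0.1                       # drop points behind camera
    pts_cam = pts_cam[in_front]
    uv = (K @ pts_cam.T).T                               # perspective projection
    return uv[:, :2] / uv[:, 2:3]                        # normalize by depth
```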
Tracking & Analytics
Multi-object tracker maintaining vehicle identities across camera handoffs. Feeds into traffic flow analytics and incident detection system.
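As a rough illustration of how tracker output can drive incident detection, the snippet below checks for stopped vehicles and wrong-way drivers; the track fields and thresholds are assumptions for the sketch, not the production rules.

```python
# Illustrative incident checks over tracker output. Track fields and
# thresholds are assumed for this sketch.
import numpy as np

STOP_SPEED = 0.5      # m/s below which a vehicle counts as stopped
STOP_SECONDS = 10.0   # dwell time before raising a stopped-vehicle alert

def detect_incidents(tracks, lane_direction, now):
    """tracks: iterable with .velocity (2,) m/s, .stopped_since (timestamp
    or None), .track_id; lane_direction: unit (2,) vector of legal flow."""
    alerts = []
    for t in tracks:
        speed = float(np.linalg.norm(t.velocity))
        if speed < STOP_SPEED:
            # Stopped vehicle: persistently near-zero speed.
            if t.stopped_since is not None and now - t.stopped_since > STOP_SECONDS:
                alerts.append(("stopped_vehicle", t.track_id))
        elif np.dot(t.velocity / speed, lane_direction) < -0.7:
            # Wrong-way driver: moving against the lane direction.
            alerts.append(("wrong_way", t.track_id))
    return alerts
```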
Key Implementation Details
Handling Extreme Lighting Variations
Tunnel environments present unique lighting challenges—from bright daylight at entrances to near-darkness inside. I implemented adaptive histogram equalization preprocessing and trained the model with aggressive augmentation including synthetic lighting variations. This improved detection accuracy in low-light conditions by 23%.
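A minimal version of this preprocessing, assuming OpenCV's CLAHE implementation with illustrative parameters:

```python
# Contrast-limited adaptive histogram equalization (CLAHE) on the
# luminance channel only, leaving chroma untouched. Clip limit and
# tile size are illustrative defaults.
import cv2

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))

def normalize_lighting(bgr_frame):
    lab = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    l = clahe.apply(l)   # adaptive equalization on the L channel
    return cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)
```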
Sensor Calibration Pipeline
Accurate fusion requires precise calibration between cameras and LiDAR sensors. I built an automated calibration pipeline using checkerboard patterns and developed a continuous calibration monitoring system that detects drift and triggers recalibration alerts.
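One way to sketch the drift check is to reproject the known target geometry through the stored calibration and compare against freshly detected corners; the alert threshold below is an assumed value for illustration.

```python
# Continuous calibration check: reprojection error of known 3D target
# points against newly observed 2D corners. Threshold is assumed.
import cv2
import numpy as np

DRIFT_THRESHOLD_PX = 2.0  # mean reprojection error that triggers an alert

def calibration_drift(object_pts, detected_px, rvec, tvec, K, dist):
    """object_pts: (N, 3) target geometry; detected_px: (N, 2) observed
    corners; rvec/tvec/K/dist: the stored camera calibration."""
    projected, _ = cv2.projectPoints(object_pts, rvec, tvec, K, dist)
    errors = np.linalg.norm(projected.reshape(-1, 2) - detected_px, axis=1)
    mean_err = float(errors.mean())
    return mean_err, mean_err > DRIFT_THRESHOLD_PX  # (error, needs_recalibration)
```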
Real-Time Multi-Camera Tracking
Vehicles must be tracked consistently as they move through the tunnel, appearing in multiple cameras. I designed a hierarchical tracking system: local trackers per camera, and a global tracker that maintains identity across camera handoffs using learned appearance embeddings.
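The handoff step can be illustrated as an assignment problem over appearance embeddings; the gating threshold and helper below are hypothetical, not the deployed logic.

```python
# Match tracks leaving one camera to tracks appearing in the next using
# cosine distance between appearance embeddings, solved as an assignment
# problem. Gating threshold is an assumed value.
import numpy as np
from scipy.optimize import linear_sum_assignment

MAX_COSINE_DIST = 0.3  # assumed gate for a valid handoff

def match_handoff(emb_exit, emb_enter):
    """emb_exit: (M, D), emb_enter: (N, D) L2-normalized embeddings.
    Returns (exit_idx, enter_idx) pairs that pass the gate."""
    cost = 1.0 - emb_exit @ emb_enter.T        # cosine distance matrix
    rows, cols = linear_sum_assignment(cost)   # Hungarian matching
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < MAX_COSINE_DIST]
```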
Edge Deployment Optimization
To achieve 30-60 FPS on Jetson devices, I implemented INT8 quantization with calibration, layer fusion, and dynamic batching. Custom CUDA kernels for preprocessing further reduced latency by eliminating CPU-GPU transfers.
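A hedged sketch of the INT8 engine build, assuming a TensorRT 8.x Python environment and an ONNX export of the detector; the file names and calibrator are placeholders, not the production artifacts.

```python
# Build an INT8 TensorRT engine from an ONNX model. Assumes TensorRT 8.x;
# "detector.onnx" and TunnelCalibrator are hypothetical placeholders.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("detector.onnx", "rb") as f:   # hypothetical exported model
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)    # enable INT8 kernels
# int8_calibrator expects an IInt8EntropyCalibrator2 subclass that feeds
# representative tunnel frames; implementation omitted here.
# config.int8_calibrator = TunnelCalibrator("calib_frames/")
engine_bytes = builder.build_serialized_network(network, config)
with open("detector_int8.plan", "wb") as f:
    f.write(engine_bytes)
```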
Tech Stack
- Deep Learning: PaddlePaddle (PP-YOLOE), PointPillars, DeepSORT
- Computer Vision: 2D/3D sensor fusion, adaptive histogram equalization, camera-LiDAR calibration
- Edge Computing: NVIDIA Jetson AGX Xavier, TensorRT, custom CUDA kernels
- Backend: distributed edge-cloud aggregation for analytics and storage
Key Learnings
- Multi-modal fusion is only as good as your calibration; invest heavily in robust calibration pipelines
- Edge deployment constraints force creative optimization that often improves overall system design
- Domain-specific training data is more valuable than larger generic datasets
- Real-time systems need extensive stress testing under degraded conditions
Interested in Similar Solutions?
Let's discuss how I can help bring your AI ideas to production.
Get in Touch