---
license: mit
tags:
- point-cloud
- semantic-segmentation
- autonomous-driving
- lane-detection
---

# Model Card for Point Transformer V3 Lane Detection

This model performs semantic segmentation of lane lines on LiDAR point cloud data, detecting and segmenting lane markings for autonomous vehicle navigation.

## Model Details

### Model Description

A Point Transformer V3 model adapted for lane detection from LiDAR point clouds, featuring a hierarchical encoder-decoder architecture with self-attention mechanisms for point cloud processing.

- **Developed by:** Bryan Chang
- **Model type:** Point Transformer V3 (PT-v3m1)
- **License:** MIT
- **Finetuned from model:** nuScenes-pretrained model

### Model Sources

- **Repository:** https://github.com/Bryan1203/LiDAR-Based-Lane-Navigation
- **Demo:** https://www.youtube.com/watch?v=cCTi2zFftlY

## Uses

### Direct Use

The model can be directly used for:

- Lane detection from LiDAR point cloud data (Ouster LiDAR with the signal attribute)
- Semantic segmentation of road surfaces
- Real-time autonomous navigation systems

### Downstream Use

It can be integrated into:

- Autonomous vehicle navigation systems
- Road infrastructure mapping
- Traffic monitoring systems
- Path planning algorithms

### Out-of-Scope Use

This model should not be used for:

- Non-LiDAR point cloud data
- Indoor navigation
- Object detection tasks
- High-speed autonomous driving without additional safety systems

## Bias, Risks, and Limitations

- Performance may degrade in adverse weather conditions
- Requires high-quality LiDAR data
- Limited to ground-level lane markings
- May struggle with unusual road geometries
- Real-time performance depends on hardware capabilities

### Recommendations

Users should:

- Validate model performance in their specific deployment environment
- Implement appropriate safety fallbacks
- Consider sensor fusion for robust operation
- Monitor inference time for real-time applications
- Regularly evaluate model performance on new data

## How to Get Started with the Model

Refer to `src/pointcept151/inference_ros_filter.py` in the repository for the ROS-based reference implementation. A minimal offline usage sketch is shown below.
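The following is a minimal, untested sketch of offline single-scan inference. It assumes the SemanticKITTI-style binary scan format and Pointcept-style input keys described under Training Details; the `lane_seg.load_lane_model` helper, file names, and dictionary keys are placeholders rather than the repository's actual API.

```python
import numpy as np
import torch

# Hypothetical helper: stands in for however inference_ros_filter.py builds the
# PT-v3m1 segmentor from its Pointcept config and loads the trained weights.
from lane_seg import load_lane_model  # hypothetical import, not part of the repo API

GRID_SIZE = 0.05  # grid-sampling size used during training (assumed to be metres)

# One SemanticKITTI-format scan: N x 4 float32 columns (x, y, z, signal/intensity).
points = np.fromfile("scan.bin", dtype=np.float32).reshape(-1, 4)
coord, signal = points[:, :3], points[:, 3:4]

# Grid sampling: keep the first point in every occupied grid cell.
voxel = np.floor(coord / GRID_SIZE).astype(np.int64)
_, keep = np.unique(voxel, axis=0, return_index=True)
coord, signal, voxel = coord[keep], signal[keep], voxel[keep]

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pointcept-style input dict; the key names ("coord", "feat", "grid_coord", "offset")
# follow common Pointcept conventions and should be checked against the repository.
data = {
    "coord": torch.from_numpy(coord).float().to(device),
    "feat": torch.from_numpy(np.hstack([coord, signal])).float().to(device),
    "grid_coord": torch.from_numpy(voxel - voxel.min(axis=0)).to(device),
    "offset": torch.tensor([coord.shape[0]], device=device),
}

model = load_lane_model("model_best.pth").to(device).eval()

with torch.no_grad():
    logits = model(data)           # assumed (N, 2) per-point logits
    labels = logits.argmax(dim=1)  # 0 = background, 1 = lane

lane_points = coord[labels.cpu().numpy() == 1]
print(f"Detected {lane_points.shape[0]} lane points out of {coord.shape[0]}")
```

In the live system, `inference_ros_filter.py` presumably performs the equivalent per-frame steps on the incoming ROS point cloud stream.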
## Training Details

### Training Data

- Based on the SemanticKITTI dataset format
- Binary segmentation labels: background (0) and lane (1)
- Point cloud data with 4 channels: x, y, z, intensity (signal)

### Training Procedure

#### Preprocessing

- Grid sampling with a grid size of 0.05
- Random rotation, scaling, and flipping augmentations
- Random jittering (σ = 0.005, clip = 0.02)

#### Training Hyperparameters

- **Training regime:** Mixed precision (fp16)
- Batch size: 4
- Epochs: 50
- Optimizer: AdamW (lr = 0.004, weight_decay = 0.005)
- Scheduler: OneCycleLR
- Loss functions: Cross-entropy + Lovász loss

#### Speeds, Sizes, Times

- Inference time: 300–400 ms per frame on an RTX A4000
- Model size: ~500 MB
- Training time: ~24 hours on a single GPU

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- Custom labeled high-bay dataset (UIUC testing facility)
- Test split from the training data

#### Factors

- Time of day
- Weather conditions
- Road surface types
- Lane marking visibility

#### Metrics

- Mean IoU
- Per-class accuracy
- Inference time
- Memory usage

### Results

Performance metrics on the test set:

- Mean IoU: [Pending final evaluation]
- Background accuracy: [Pending final evaluation]
- Lane accuracy: [Pending final evaluation]

## Environmental Impact

- **Hardware Type:** NVIDIA RTX A4000
- **Hours used:** ~24 for training
- **Cloud Provider:** Local computation
- **Carbon Emitted:** [To be calculated]

## Technical Specifications

### Model Architecture and Objective

As detailed in the model configuration (a hedged sketch of the corresponding backbone configuration appears at the end of this card):

- Encoder depths: (2, 2, 2, 6, 2)
- Encoder channels: (32, 64, 128, 256, 512)
- Decoder depths: (2, 2, 2, 2)
- MLP ratio: 4
- Attention heads: varies by layer

### Compute Infrastructure

#### Hardware

- NVIDIA RTX A4000 (16 GB VRAM)
- 32 GB RAM minimum
- Multi-core CPU

#### Software

- Python 3.8+
- PyTorch 1.10+
- CUDA 11.3+
- ROS Noetic
- Pointcept framework

## Model Card Authors

Bryan Chang

## Model Card Contact

bryanchang1234@gmail.com
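For convenience, the listing below sketches the backbone portion of a Pointcept configuration consistent with the Technical Specifications above. Only the numbers taken from this card are meaningful; every other field is a placeholder and must be read from the training config in the repository.

```python
# Hedged sketch of the PT-v3m1 backbone config implied by this card's
# Technical Specifications; unlisted fields are intentionally omitted.
backbone_cfg = dict(
    type="PT-v3m1",                        # Point Transformer V3, variant m1
    in_channels=4,                         # x, y, z, signal
    enc_depths=(2, 2, 2, 6, 2),            # from "Encoder depths"
    enc_channels=(32, 64, 128, 256, 512),  # from "Encoder channels"
    dec_depths=(2, 2, 2, 2),               # from "Decoder depths"
    mlp_ratio=4,                           # from "MLP ratio"
    # Attention heads, patch sizes, decoder channels, etc.: see the repo config.
)
```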