---
license: mit
tags:
- point-cloud
- semantic-segmentation
- autonomous-driving
- lane-detection
---

# Model Card for Point Transformer V3 Lane Detection

This model performs semantic segmentation of lane lines on LiDAR point cloud data, detecting and segmenting lane markings for autonomous vehicle navigation.

## Model Details

### Model Description

A Point Transformer V3 model adapted for lane detection from LiDAR point clouds, featuring a hierarchical encoder-decoder architecture with self-attention mechanisms for point cloud processing.

- **Developed by:** Bryan Chang
- **Model type:** Point Transformer V3 (PT-v3m1)
- **License:** MIT
- **Finetuned from model:** nuScenes-pretrained model

### Model Sources

- **Repository:** https://github.com/Bryan1203/LiDAR-Based-Lane-Navigation
- **Demo:** https://www.youtube.com/watch?v=cCTi2zFftlY

## Uses

### Direct Use

The model can be directly used for:

- Lane detection from LiDAR point cloud data (Ouster LiDAR with the signal attribute)
- Semantic segmentation of road surfaces
- Real-time autonomous navigation systems

### Downstream Use

It can be integrated into:

- Autonomous vehicle navigation systems
- Road infrastructure mapping
- Traffic monitoring systems
- Path planning algorithms

### Out-of-Scope Use

This model should not be used for:

- Non-LiDAR point cloud data
- Indoor navigation
- Object detection tasks
- High-speed autonomous driving without additional safety systems

## Bias, Risks, and Limitations

- Performance may degrade in adverse weather conditions
- Requires high-quality LiDAR data
- Limited to ground-level lane markings
- May struggle with unusual road geometries
- Real-time performance depends on hardware capabilities

### Recommendations

Users should:

- Validate model performance in their specific deployment environment
- Implement appropriate safety fallbacks
- Consider sensor fusion for robust operation
- Monitor inference time for real-time applications
- Regularly evaluate model performance on new data

## How to Get Started with the Model

Refer to `src/pointcept151/inference_ros_filter.py` in the repository for the ROS-based reference implementation. A minimal offline usage sketch is shown below.
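The following is a minimal, untested sketch of offline single-scan inference. It assumes the SemanticKITTI-style binary scan format and Pointcept-style input keys described under Training Details; the `lane_seg.load_lane_model` helper, file names, and dictionary keys are placeholders rather than the repository's actual API.

```python
import numpy as np
import torch

# Hypothetical helper: stands in for however inference_ros_filter.py builds the
# PT-v3m1 segmentor from its Pointcept config and loads the trained weights.
from lane_seg import load_lane_model  # hypothetical import, not part of the repo API

GRID_SIZE = 0.05  # grid-sampling size used during training (assumed to be metres)

# One SemanticKITTI-format scan: N x 4 float32 columns (x, y, z, signal/intensity).
points = np.fromfile("scan.bin", dtype=np.float32).reshape(-1, 4)
coord, signal = points[:, :3], points[:, 3:4]

# Grid sampling: keep the first point in every occupied grid cell.
voxel = np.floor(coord / GRID_SIZE).astype(np.int64)
_, keep = np.unique(voxel, axis=0, return_index=True)
coord, signal, voxel = coord[keep], signal[keep], voxel[keep]

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pointcept-style input dict; the key names ("coord", "feat", "grid_coord", "offset")
# follow common Pointcept conventions and should be checked against the repository.
data = {
    "coord": torch.from_numpy(coord).float().to(device),
    "feat": torch.from_numpy(np.hstack([coord, signal])).float().to(device),
    "grid_coord": torch.from_numpy(voxel - voxel.min(axis=0)).to(device),
    "offset": torch.tensor([coord.shape[0]], device=device),
}

model = load_lane_model("model_best.pth").to(device).eval()

with torch.no_grad():
    logits = model(data)           # assumed (N, 2) per-point logits
    labels = logits.argmax(dim=1)  # 0 = background, 1 = lane

lane_points = coord[labels.cpu().numpy() == 1]
print(f"Detected {lane_points.shape[0]} lane points out of {coord.shape[0]}")
```

In the live system, `inference_ros_filter.py` presumably performs the equivalent per-frame steps on the incoming ROS point cloud stream.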
## Training Details

### Training Data

- Based on the SemanticKITTI dataset format
- Binary segmentation labels: background (0) and lane (1)
- Point cloud data with 4 channels: x, y, z, intensity (signal)

### Training Procedure

#### Preprocessing

- Grid sampling with a grid size of 0.05
- Random rotation, scaling, and flipping augmentations
- Random jittering (σ = 0.005, clip = 0.02)

#### Training Hyperparameters

- **Training regime:** Mixed precision (fp16)
- Batch size: 4
- Epochs: 50
- Optimizer: AdamW (lr = 0.004, weight_decay = 0.005)
- Scheduler: OneCycleLR
- Loss functions: Cross-entropy + Lovász loss

#### Speeds, Sizes, Times

- Inference time: 300–400 ms per frame on an RTX A4000
- Model size: ~500 MB
- Training time: ~24 hours on a single GPU

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- Custom labeled high-bay dataset (UIUC testing facility)
- Test split from the training data

#### Factors

- Time of day
- Weather conditions
- Road surface types
- Lane marking visibility

#### Metrics

- Mean IoU
- Per-class accuracy
- Inference time
- Memory usage

### Results

Performance metrics on the test set:

- Mean IoU: [Pending final evaluation]
- Background accuracy: [Pending final evaluation]
- Lane accuracy: [Pending final evaluation]

## Environmental Impact

- **Hardware Type:** NVIDIA RTX A4000
- **Hours used:** ~24 for training
- **Cloud Provider:** Local computation
- **Carbon Emitted:** [To be calculated]

## Technical Specifications

### Model Architecture and Objective

As detailed in the model configuration (a hedged sketch of the corresponding backbone configuration appears at the end of this card):

- Encoder depths: (2, 2, 2, 6, 2)
- Encoder channels: (32, 64, 128, 256, 512)
- Decoder depths: (2, 2, 2, 2)
- MLP ratio: 4
- Attention heads: varies by layer

### Compute Infrastructure

#### Hardware

- NVIDIA RTX A4000 (16 GB VRAM)
- 32 GB RAM minimum
- Multi-core CPU

#### Software

- Python 3.8+
- PyTorch 1.10+
- CUDA 11.3+
- ROS Noetic
- Pointcept framework

## Model Card Authors

Bryan Chang

## Model Card Contact

bryanchang1234@gmail.com
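For convenience, the listing below sketches the backbone portion of a Pointcept configuration consistent with the Technical Specifications above. Only the numbers taken from this card are meaningful; every other field is a placeholder and must be read from the training config in the repository.

```python
# Hedged sketch of the PT-v3m1 backbone config implied by this card's
# Technical Specifications; unlisted fields are intentionally omitted.
backbone_cfg = dict(
    type="PT-v3m1",                        # Point Transformer V3, variant m1
    in_channels=4,                         # x, y, z, signal
    enc_depths=(2, 2, 2, 6, 2),            # from "Encoder depths"
    enc_channels=(32, 64, 128, 256, 512),  # from "Encoder channels"
    dec_depths=(2, 2, 2, 2),               # from "Decoder depths"
    mlp_ratio=4,                           # from "MLP ratio"
    # Attention heads, patch sizes, decoder channels, etc.: see the repo config.
)
```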