---
license: mit
language:
- en
library_name: pytorch
tags:
- computer-vision
- autonomous-driving
- self-driving
- interfuser
- carla
- object-detection
- trajectory-prediction
datasets:
- PDM-Lite-CARLA
pipeline_tag: object-detection
---
# 🚗 InterFuser Model for Autonomous Driving (Fine-tuned for Baseer API)

## 📝 Model Description

This repository contains the fine-tuned weights for the **InterFuser** model, a state-of-the-art transformer-based architecture for autonomous driving. This version has been fine-tuned to power the **[Baseer Self-Driving API](https://huggingface.co/spaces/Adam-IT/Baseer_Server)**, with a primary focus on robust traffic object detection and safe trajectory planning within the CARLA simulation environment.

The model processes a single front-facing camera view together with vehicle state measurements to build a comprehensive understanding of the driving scene. It simultaneously predicts the future path of the ego-vehicle and detects and classifies surrounding traffic participants.

This model serves as the core "brain" of the Baseer project, demonstrating an end-to-end approach to autonomous driving perception and planning.
## ✨ Key Features

* **Transformer-Based Fusion:** Uses a transformer to fuse image features with vehicle state information.
* **Multi-Task Learning:** Simultaneously performs two critical tasks:
  1. **Traffic Object Detection:** Identifies cars, motorcycles, and pedestrians in a 20x20 meter grid in front of the vehicle.
  2. **Waypoint Prediction:** Predicts a safe, drivable trajectory over the next 10 waypoints.
* **Scene Understanding:** Provides logits for crucial environmental factors, including the presence of junctions, red-light hazards, and stop signs.
* **Optimized for CARLA:** Fine-tuned on the `PDM_Lite_Carla` dataset, making it highly effective for scenarios within the CARLA simulator.
## 🛠️ Model Architecture

This is a variant of the original InterFuser architecture with the following specifications:

* **Image Backbone:** `ResNet-50` (pretrained on ImageNet)
* **LiDAR Backbone:** `ResNet-18` (architecture defined, but LiDAR input is disabled in this version)
* **Transformer:**
  * **Embedding Dimension:** 256
  * **Encoder Depth:** 6 layers
  * **Decoder Depth:** 6 layers
  * **Attention Heads:** 8
* **Prediction Heads:**
  * **Waypoints:** Gated Recurrent Unit (GRU) based predictor.
  * **Traffic Detection:** A detection head that outputs a `20x20x7` grid representing object confidence, position offsets, dimensions, and orientation.
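For orientation, these specifications might map onto the project's `Interfuser` constructor roughly as sketched below. The parameter names are hypothetical, not the project's actual API; check the model definition in the Baseer codebase for the real signature.

```python
# Sketch only: hypothetical keyword names summarizing the specs above.
model_config = dict(
    rgb_backbone="resnet50",     # image backbone, pretrained on ImageNet
    lidar_backbone="resnet18",   # defined in the architecture, input disabled here
    embed_dim=256,               # transformer embedding dimension
    enc_depth=6,                 # encoder layers
    dec_depth=6,                 # decoder layers
    num_heads=8,                 # attention heads
    num_waypoints=10,            # horizon of the GRU waypoint predictor
    detection_grid=(20, 20, 7),  # confidence, offsets, dimensions, orientation
)
```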
## 🚀 How to Use

This model is intended to be used as part of the **Baseer Self-Driving API**, but you can also load it directly in PyTorch for your own projects.

**1. Installation**
```bash
pip install torch torchvision timm huggingface_hub
# You will also need the model definition class (Interfuser) and helper functions from the project.
```
**2. Loading the Model**

The recommended way to load the model is with the project's custom `load_and_prepare_model` function, which handles configuration and weight loading automatically.

```python
import torch

# Assuming your helper functions are in a file named 'config_loader.py'
from config_loader import load_and_prepare_model

# Set the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the model from the Hub
# The function automatically downloads from "Adam-IT/Interfuser-Baseer-v1"
try:
    model = load_and_prepare_model(device)
    model.eval()
    print("Model loaded successfully!")
except Exception as e:
    print(f"Error loading model: {e}")

# Now you can use the model for inference:
# dummy_input = ...
# with torch.no_grad():
#     outputs = model(dummy_input)
```
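Once you have `outputs`, the traffic-detection head's `20x20x7` grid can be thresholded into discrete detections. The sketch below is illustrative only: it assumes channel 0 is object confidence and that the remaining channels carry offsets, dimensions, and orientation, so verify the actual channel layout against the project's post-processing code.

```python
import torch

def decode_traffic_grid(grid: torch.Tensor, threshold: float = 0.5) -> list:
    """Turn a (20, 20, 7) detection grid into a list of candidate objects.

    Assumes (hypothetically) that channel 0 is object confidence and
    channels 1-6 encode position offsets, dimensions, and orientation.
    """
    conf = torch.sigmoid(grid[..., 0])  # per-cell objectness score
    detections = []
    for i, j in (conf > threshold).nonzero(as_tuple=False).tolist():
        detections.append({
            "cell": (i, j),                         # cell in the 20x20 m grid
            "confidence": conf[i, j].item(),
            "attributes": grid[i, j, 1:].tolist(),  # offsets, dims, orientation
        })
    return detections
```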
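Alternatively, if you would rather fetch the checkpoint yourself instead of going through `load_and_prepare_model`, the `huggingface_hub` client can download it directly. Note that the checkpoint filename below is an assumption; check the repository's file listing for the actual name.

```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hub (cached locally after the first call).
# "model.pth" is an assumed filename; confirm it in the repo's file listing.
ckpt_path = hf_hub_download(
    repo_id="Adam-IT/Interfuser-Baseer-v1",
    filename="model.pth",
)

# This gives you the raw state dict; you still need the project's Interfuser
# class to instantiate the model and call load_state_dict on it.
state_dict = torch.load(ckpt_path, map_location="cpu")
```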
## 📊 Training and Fine-tuning

This model was fine-tuned from a pretrained InterFuser checkpoint.

* **Dataset:** `PDM_Lite_Carla`, a dataset generated with the CARLA simulator, focusing on diverse urban driving scenarios.
* **Training Objective:** The fine-tuning process prioritized the traffic detection task. The loss function was weighted heavily towards improving the Intersection over Union (IoU) of predicted bounding boxes and the accuracy of the traffic map, as sketched below.
* **Framework:** PyTorch
* **Training Run:** `Finetune_Focus_on_Detection_v5`
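To illustrate what such a detection-weighted objective looks like (the coefficients below are invented for illustration; the run's actual weights are not published):

```python
def detection_focused_loss(traffic_map_loss, waypoint_loss, scene_flags_loss,
                           w_det=2.0, w_wp=0.5, w_scene=0.5):
    """Multi-task loss with the detection term up-weighted (illustrative only)."""
    return w_det * traffic_map_loss + w_wp * waypoint_loss + w_scene * scene_flags_loss
```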
## ⚠️ Limitations and Bias

* **Simulation-Only:** This model is trained exclusively on simulated data from CARLA. Its performance in real-world scenarios is untested and likely to be poor without significant domain adaptation and further training on real-world datasets.
* **Single Camera View:** The model relies solely on a front-facing camera. It has blind spots and cannot perceive objects to the sides or rear of the vehicle.
* **No LiDAR:** Although the architecture supports LiDAR, this version was trained without it. It may struggle in adverse weather conditions (e.g., rain, fog) or poor lighting where vision is compromised.
* **Dataset Bias:** The model's behavior is limited by the scenarios present in the `PDM_Lite_Carla` dataset. It may not handle rare or out-of-distribution events correctly.
## 👨‍💻 Developed By

**Adam-IT**

This model is a core component of a graduation project in the field of Artificial Intelligence and Autonomous Driving.
## 📄 License

This project is licensed under the MIT License. See the `LICENSE` file for more details.