---
license: mit
language: en
library_name: pytorch
tags:
  - computer-vision
  - autonomous-driving
  - self-driving-car
  - end-to-end
  - transformer
  - attention
  - positional-encoding
  - carla
  - object-detection
  - trajectory-prediction
datasets:
  - PDM-Lite-CARLA
pipeline_tag: object-detection
---

# HDPE: A Foundational Perception Model with Hyper-Dimensional Positional Encoding

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?style=flat&logo=pytorch&logoColor=white)](https://pytorch.org/)
[![CARLA](https://img.shields.io/badge/CARLA-Simulator-blue)](https://carla.org/)
[![Demo](https://img.shields.io/badge/🚀-Live%20Demo-brightgreen)](https://huggingface.co/spaces/BaseerAI/Baseer_Server)

**📖 Research Paper (Coming Soon)** | **🚀 [Live Demo API (Powered by this Model)](https://huggingface.co/spaces/BaseerAI/Baseer_Server)**

---

## 📖 Overview: A New Foundation for Perception in Autonomous Driving

This repository contains the pre-trained weights for a novel autonomous driving perception model, the core of our **Interfuser-HDPE** system. This is **not a standard Interfuser model**; it incorporates fundamental innovations in its architecture and learning framework to achieve a more robust, accurate, and geometrically aware understanding of driving scenes from camera-only inputs.

The innovations baked into these weights make this model a powerful foundation for building complete self-driving systems. It is designed to output rich perception data (object detection grids and waypoints) that can be consumed by downstream modules like trackers and controllers.

---

## 💡 Key Innovations in This Model

The weights in this repository are the result of training a model with the following scientific contributions:

### 1. Hyper-Dimensional Positional Encoding (HDPE) - (Core Contribution)
*   **What it is:** We replace the standard Sinusoidal Positional Encoding with **HDPE**, a novel, first-principles approach inspired by the geometric properties of n-dimensional spaces.
*   **Why it matters:** HDPE generates an interpretable spatial prior that biases the model's attention towards the center of the image (the road ahead). This leads to more stable, contextually aware feature extraction and has been shown to improve performance significantly, especially in multi-camera fusion scenarios (see the sketch below).
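
The exact HDPE formulation is reserved for the upcoming paper, but a minimal sketch can illustrate the kind of center-biased spatial prior described above. Everything in it (the Gaussian radial prior, the module name, and the way the prior modulates a learnable patch embedding) is an illustrative assumption, not the released implementation:

```python
import torch
import torch.nn as nn

class CenterBiasedPositionalEncoding(nn.Module):
    """Illustrative stand-in for HDPE: a learnable 2D positional embedding
    modulated by a radial prior that peaks at the image center (the road
    ahead). The actual HDPE formulation differs; see the upcoming paper."""

    def __init__(self, dim: int, grid_h: int, grid_w: int, sigma: float = 0.5):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, grid_h * grid_w, dim))
        # Normalized patch coordinates in [-1, 1]
        ys = torch.linspace(-1.0, 1.0, grid_h)
        xs = torch.linspace(-1.0, 1.0, grid_w)
        yy, xx = torch.meshgrid(ys, xs, indexing="ij")
        # Gaussian radial prior: 1.0 at the center, decaying outward
        prior = torch.exp(-(xx**2 + yy**2) / (2 * sigma**2))
        self.register_buffer("prior", prior.reshape(1, -1, 1))

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, grid_h * grid_w, dim)
        return patch_tokens + self.prior * self.pos
```

The point of such a prior is that positional information near the image center carries more weight, nudging attention toward the road ahead while the learnable component remains free to adapt during training.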

### 2. Advanced Multi-Task Loss Framework
*   **What it is:** This model was trained using a specialized combination of **Focal Loss** and **Enhanced-IoU (EIoU) Loss**.
*   **Why it matters:** This framework is purpose-built to tackle the primary challenges in perception: **Focal Loss** addresses the severe class imbalance in object detection, while **EIoU Loss** ensures highly accurate bounding box regression by optimizing for geometric overlap.
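
As a rough sketch of how such an objective can be assembled, the two terms below follow their standard published formulations (focal loss for classification; an IoU term plus center, width, and height penalties for EIoU). The relative weighting, and whether the released weights were trained with exactly these forms, are assumptions:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Focal loss: down-weights easy examples to fight class imbalance."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

def eiou_loss(pred, target, eps=1e-7):
    """EIoU: 1 - IoU plus penalties on center distance and on the
    width/height gaps, each normalized by the smallest enclosing box.
    Boxes are (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred.unbind(-1)
    tx1, ty1, tx2, ty2 = target.unbind(-1)
    iw = (torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(min=0)
    ih = (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(min=0)
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter + eps
    iou = inter / union
    cw = torch.max(px2, tx2) - torch.min(px1, tx1)  # enclosing box width
    ch = torch.max(py2, ty2) - torch.min(py1, ty1)  # enclosing box height
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4
    dw2 = ((px2 - px1) - (tx2 - tx1)) ** 2
    dh2 = ((py2 - py1) - (ty2 - ty1)) ** 2
    return (1 - iou + rho2 / (cw**2 + ch**2 + eps)
            + dw2 / (cw**2 + eps) + dh2 / (ch**2 + eps)).mean()
```

A combined objective would then look like `loss = focal_loss(cls_logits, cls_targets) + lam * eiou_loss(pred_boxes, gt_boxes)`, with `lam` a tuning choice rather than a documented value.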

### 3. High-Resolution, Camera-Only Architecture
*   **What it is:** This model is vision-based (**camera-only**) and uses a **ResNet-50** backbone with a smaller patch size (`patch_size=8`) for high-resolution analysis.
*   **Why it matters:** It demonstrates that strong perception performance can be achieved without costly sensors like LiDAR, aligning with modern, cost-effective approaches to autonomous driving.
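
For intuition, here is a simplified sketch of how a camera-only front end can turn a stride-8 ResNet-50 feature map into transformer tokens (stride 8 loosely matching `patch_size=8`). Which backbone stage the released model actually tokenizes is an assumption here:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class CameraTokenizer(nn.Module):
    """Sketch: tokenize the stride-8 stage of a ResNet-50. The real model's
    wiring between backbone and transformer may differ."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        backbone = resnet50(weights=None)
        # Keep conv1 .. layer2 (output stride 8, 512 channels); drop the rest
        self.stem = nn.Sequential(*list(backbone.children())[:-4])
        self.proj = nn.Conv2d(512, embed_dim, kernel_size=1)

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        feats = self.proj(self.stem(rgb))        # (B, D, H/8, W/8)
        return feats.flatten(2).transpose(1, 2)  # (B, N, D) token sequence
```

On a 224×224 input this yields a 28×28 = 784-token sequence, which is where a positional encoding such as HDPE would be applied.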

---

## πŸ—οΈ Model Architecture vs. Baseline

| Component                 | Original Interfuser (Baseline) | **Interfuser-HDPE (This Model)**  |
|:--------------------------|:-------------------------------|:----------------------------------|
| **Positional Encoding**   | Sinusoidal PE                  | ✅ **Hyper-Dimensional PE (HDPE)**  |
| **Perception Backbone**   | ResNet-26, LiDAR               | ✅ **Camera-Only, ResNet-50**       |
| **Training Objective**    | Standard BCE + L1 Loss         | ✅ **Focal Loss + EIoU Loss**       |
| **Model Outputs**         | Waypoints, Traffic Grid, States| Same (Optimized for higher accuracy) |

---

## 🚀 How to Use These Weights

These weights are intended to be loaded into a model class that incorporates our architectural changes, primarily the `HyperDimensionalPositionalEncoding` module.

```python
import torch
from huggingface_hub import hf_hub_download
# You need to provide the model class definition; let's call it InterfuserHDPE
from your_model_definition_file import InterfuserHDPE

# Download the pre-trained model weights
model_path = hf_hub_download(
    repo_id="BaseerAI/Interfuser-Baseer-v1",
    filename="pytorch_model.bin"
)

# Instantiate your model architecture. model_config is a dict you define;
# it must match the architecture these weights were trained on
# (e.g. the ResNet-50 backbone and patch_size=8 described above)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = InterfuserHDPE(**model_config).to(device)

# Load the state dictionary
state_dict = torch.load(model_path, map_location=device)
model.load_state_dict(state_dict)
model.eval()

# Now the model is ready for inference
with torch.no_grad():
    # The model expects a dictionary of sensor data
    # (e.g., {'rgb': camera_tensor, ...})
    perception_outputs = model(input_data)
```
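
Since `input_data` above depends on your model definition, a quick smoke test with a dummy batch can confirm the weights loaded and the forward pass runs. The key name `rgb` and the input resolution below are assumptions for illustration; match them to your `InterfuserHDPE` implementation:

```python
# Hypothetical smoke test: the key name and shape are assumptions and must
# match whatever your InterfuserHDPE definition actually expects.
dummy_input = {"rgb": torch.randn(1, 3, 224, 224, device=device)}
with torch.no_grad():
    outputs = model(dummy_input)
print(type(outputs))  # inspect the returned heads (grid, waypoints, states)
```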

## 📊 Performance Highlights

When integrated into a full driving stack (like our **[Baseer Self-Driving API](https://huggingface.co/spaces/BaseerAI/Baseer_Server)**), this perception model is the foundation for:

- **Significantly Improved Detection Accuracy**: Achieves higher mAP than the Interfuser baseline on the PDM-Lite-CARLA dataset.
- **Superior Driving Score**: Leads to a higher overall Driving Score with fewer infractions compared to baseline models.
- **Proven Scalability**: Performance demonstrably improves when scaling from single-camera to multi-camera inputs, showcasing the robustness of the HDPE-based architecture.

*(Detailed metrics and ablation studies will be available in our upcoming research paper.)*

## πŸ› οΈ Integration with a Full System

This model provides the core perception outputs. To build a complete autonomous agent, you need to combine it with:

- **A Temporal Tracker**: To maintain object identity across frames.
- **A Decision-Making Controller**: To translate perception outputs into vehicle commands.

An example of such a complete system, including our custom-built **Hierarchical, Memory-Enhanced Controller**, can be found in our **[Live Demo API Space](https://huggingface.co/spaces/BaseerAI/Baseer_Server)**.
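
To make the data flow concrete, here is a toy single-frame sketch. `GreedyTracker` and `control_step` are deliberately naive placeholders (the real tracker and the Hierarchical, Memory-Enhanced Controller live in the demo Space), and the detection format is assumed:

```python
import torch

class GreedyTracker:
    """Naive placeholder: assigns fresh IDs every frame. A real tracker
    would match detections across frames (e.g., IoU matching + Kalman)."""
    def __init__(self):
        self.next_id, self.tracks = 0, {}

    def update(self, detections):
        for det in detections:
            self.tracks[self.next_id] = det
            self.next_id += 1
        return self.tracks

def control_step(tracked_scene, waypoints):
    """Naive placeholder: brake if anything is tracked, otherwise creep
    forward. Stands in for a real decision-making controller."""
    brake = 1.0 if tracked_scene else 0.0
    return (0.0 if brake else 0.3), 0.0, brake  # throttle, steer, brake

# One frame of the loop, with dummy perception outputs standing in for
# what the model would produce:
tracker = GreedyTracker()
detections = [{"box": torch.tensor([1.0, 2.0, 3.0, 4.0])}]
waypoints = torch.zeros(10, 2)
throttle, steer, brake = control_step(tracker.update(detections), waypoints)
```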

## 📚 Citation

If you use the HDPE concept or this model in your research, please cite our upcoming paper. For now, you can cite this model repository:

```bibtex
@misc{interfuser-hdpe-2024,
  title={HDPE: Hyper-Dimensional Positional Encoding for End-to-End Self-Driving Systems},
  author={Altawil, Adam},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/BaseerAI/Interfuser-Baseer-v1}}
}
```

## πŸ‘¨β€πŸ’» Development

**Lead Researcher**: Adam Altawil  
**Project Type**: Graduation Project - AI & Autonomous Driving  
**Contact**: [Your Contact Information]

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🤝 Contributing & Support

For questions, contributions, and support:
- **🚀 Try the Live Demo**: **[Baseer Server Space](https://huggingface.co/spaces/BaseerAI/Baseer_Server)**
- **📧 Contact**: [Your Contact Information]
- **🐛 Issues**: Create an issue in this repository

---

<div align="center">
  <strong>🚗 Driving the Future with Hyper-Dimensional Intelligence 🚗</strong>
</div>