HDPE: A Foundational Perception Model with Hyper-Dimensional Positional Encoding

License: MIT | PyTorch | CARLA Demo

📖 Research Paper (Coming Soon) | 🚀 Live Demo API (Powered by this Model)


📖 Overview: A New Foundation for Perception in Autonomous Driving

This repository contains the pre-trained weights for a novel autonomous driving perception model, the core of our Interfuser-HDPE system. This is not a standard Interfuser model; it incorporates fundamental innovations in its architecture and learning framework to achieve a more robust, accurate, and geometrically aware understanding of driving scenes from camera-only inputs.

The innovations baked into these weights make this model a powerful foundation for building complete self-driving systems. It is designed to output rich perception data (object detection grids and waypoints) that can be consumed by downstream modules like trackers and controllers.


💡 Key Innovations in This Model

The weights in this repository are the result of training a model with the following scientific contributions:

1. Hyper-Dimensional Positional Encoding (HDPE) - Core Contribution

  • What it is: We replace the standard Sinusoidal Positional Encoding with HDPE, a novel, first-principles approach inspired by the geometric properties of n-dimensional spaces.
  • Why it matters: HDPE generates an interpretable spatial prior that biases the model's attention towards the center of the image (the road ahead). This leads to more stable, contextually aware feature extraction and has been shown to improve performance significantly, especially in multi-camera fusion scenarios. A sketch of such a center-biased prior follows this item.
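
To make the idea concrete, here is a minimal, hypothetical sketch of a center-biased positional prior. The actual HDPE formulation is reserved for the upcoming paper; the radial-Gaussian falloff, the module name, and every parameter below are illustrative assumptions only.

import torch
import torch.nn as nn

class CenterBiasedPositionalEncoding(nn.Module):
    """Hypothetical stand-in for HDPE: a learnable per-cell embedding
    scaled by a radial prior that peaks at the image center."""
    def __init__(self, d_model: int, grid_h: int, grid_w: int, sigma: float = 0.5):
        super().__init__()
        self.embed = nn.Parameter(torch.randn(grid_h, grid_w, d_model) * 0.02)
        # Normalized cell coordinates in [-1, 1] over the feature grid
        ys = torch.linspace(-1.0, 1.0, grid_h)
        xs = torch.linspace(-1.0, 1.0, grid_w)
        yy, xx = torch.meshgrid(ys, xs, indexing="ij")
        # Gaussian falloff with distance from the center, so attention is
        # nudged toward the middle of the image (the road ahead)
        prior = torch.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
        self.register_buffer("prior", prior.unsqueeze(-1))  # (H, W, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, H, W, d_model) feature grid from the backbone
        return feats + self.prior * self.embed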

2. Advanced Multi-Task Loss Framework

  • What it is: This model was trained using a specialized combination of Focal Loss and Enhanced-IoU (EIoU) Loss.
  • Why it matters: This framework is purpose-built to tackle the primary challenges in perception: Focal Loss addresses the severe class imbalance in object detection, while EIoU Loss ensures highly accurate bounding box regression by optimizing for geometric overlap. A sketch of the combined objective follows this item.
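
As a rough illustration of this objective, the sketch below combines torchvision's sigmoid_focal_loss with a hand-rolled EIoU term. The loss weights, box format, and function names are assumptions; the exact training configuration is not published here.

import torch
from torchvision.ops import sigmoid_focal_loss

def eiou_loss(pred, target, eps=1e-7):
    """Enhanced-IoU loss for (N, 4) boxes in (x1, y1, x2, y2) format."""
    # Intersection area
    ix1, iy1 = torch.max(pred[:, 0], target[:, 0]), torch.max(pred[:, 1], target[:, 1])
    ix2, iy2 = torch.min(pred[:, 2], target[:, 2]), torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # Smallest enclosing box, used to normalize the penalty terms
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    # Center-distance, width, and height penalties (the "E" in EIoU)
    d_cx = (pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) / 2
    d_cy = (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) / 2
    d_w = (pred[:, 2] - pred[:, 0]) - (target[:, 2] - target[:, 0])
    d_h = (pred[:, 3] - pred[:, 1]) - (target[:, 3] - target[:, 1])
    eiou = (1 - iou
            + (d_cx ** 2 + d_cy ** 2) / (cw ** 2 + ch ** 2 + eps)
            + d_w ** 2 / (cw ** 2 + eps)
            + d_h ** 2 / (ch ** 2 + eps))
    return eiou.mean()

def detection_loss(cls_logits, cls_targets, box_preds, box_targets,
                   w_cls=1.0, w_box=1.0):  # loss weights are assumptions
    """Focal classification loss + EIoU box-regression loss."""
    cls_loss = sigmoid_focal_loss(cls_logits, cls_targets.float(), reduction="mean")
    return w_cls * cls_loss + w_box * eiou_loss(box_preds, box_targets)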

3. High-Resolution, Camera-Only Architecture

  • What it is: This model is vision-based (camera-only) and uses a ResNet-50 backbone with a smaller patch size (patch_size=8) for high-resolution analysis.
  • Why it matters: It demonstrates that strong perception performance can be achieved without costly sensors like LiDAR, aligning with modern, cost-effective approaches to autonomous driving. A rough sketch of camera-only tokenization follows this item.
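For intuition, this is what turning a single camera frame into transformer tokens can look like. The wiring below (torchvision ResNet-50, a 1x1 projection, an assumed 256-dim token width) is a generic illustration, not the exact InterfuserHDPE stem; in the real model the token granularity is governed by patch_size=8.

import torch
import torch.nn as nn
from torchvision.models import resnet50

# Illustrative camera-only tokenizer -- NOT the exact InterfuserHDPE stem.
# Truncate ResNet-50 before its pooling/classifier head to keep a spatial map.
backbone = nn.Sequential(*list(resnet50(weights=None).children())[:-2])
proj = nn.Conv2d(2048, 256, kernel_size=1)  # 256-dim token width is an assumption

frame = torch.randn(1, 3, 224, 224)       # single RGB camera frame
fmap = proj(backbone(frame))              # (1, 256, 7, 7) spatial feature grid
tokens = fmap.flatten(2).transpose(1, 2)  # (1, 49, 256) token sequence
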

πŸ—οΈ Model Architecture vs. Baseline

| Component | Original Interfuser (Baseline) | Interfuser-HDPE (This Model) |
| --- | --- | --- |
| Positional Encoding | Sinusoidal PE | ✅ Hyper-Dimensional PE (HDPE) |
| Perception Backbone | ResNet-26, LiDAR | ✅ Camera-Only, ResNet-50 |
| Training Objective | Standard BCE + L1 Loss | ✅ Focal Loss + EIoU Loss |
| Model Outputs | Waypoints, Traffic Grid, States | Same (Optimized for higher accuracy) |

🚀 How to Use These Weights

These weights are intended to be loaded into a model class that incorporates our architectural changes, primarily the HyperDimensionalPositionalEncoding module.

import torch
from huggingface_hub import hf_hub_download
# You must supply the model class definition; here we call it InterfuserHDPE
from your_model_definition_file import InterfuserHDPE 

# Download the pre-trained model weights
model_path = hf_hub_download(
    repo_id="BaseerAI/Interfuser-Baseer-v1",
    filename="interfuser_hdpe_v1.pth"
)

# Instantiate your model architecture.
# The config must match the architecture these weights were trained on;
# model_config is a dict of constructor arguments for InterfuserHDPE.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = InterfuserHDPE(**model_config).to(device)

# Load the state dictionary
state_dict = torch.load(model_path, map_location=device)
model.load_state_dict(state_dict)
model.eval()

# Now the model is ready for inference
with torch.no_grad():
    # The model expects a dictionary of sensor data
    # (e.g., {'rgb': camera_tensor, ...})
    perception_outputs = model(input_data)
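
For a quick smoke test, something like the following can be used. The input key "rgb" and the tensor shape are assumptions made for illustration; consult the InterfuserHDPE class for the exact input contract and output structure.

# Hypothetical smoke test -- the key name and shape below are assumptions.
dummy_input = {"rgb": torch.randn(1, 3, 224, 224, device=device)}
with torch.no_grad():
    outputs = model(dummy_input)
# Outputs typically bundle the object/traffic grid and predicted waypoints;
# inspect the returned structure to see what this checkpoint exposes.
print(type(outputs))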

📊 Performance Highlights

When integrated into a full driving stack (like our Baseer Self-Driving API), this perception model is the foundation for:

  • Significantly Improved Detection Accuracy: Achieves higher mAP than the baseline Interfuser on the PDM-Lite-CARLA dataset.
  • Superior Driving Score: Leads to a higher overall Driving Score with fewer infractions compared to baseline models.
  • Proven Scalability: Performance demonstrably improves when scaling from single-camera to multi-camera inputs, showcasing the robustness of the HDPE-based architecture.

(Detailed metrics and ablation studies will be available in our upcoming research paper.)

πŸ› οΈ Integration with a Full System

This model provides the core perception outputs. To build a complete autonomous agent, you need to combine it with:

  • A Temporal Tracker: To maintain object identity across frames.
  • A Decision-Making Controller: To translate perception outputs into vehicle commands.

An example of such a complete system, including our custom-built Hierarchical, Memory-Enhanced Controller, can be found in our Live Demo API Space. A minimal sketch of how the pieces fit together is shown below.
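
The following is hypothetical glue code under stated assumptions: TemporalTracker and HierarchicalController are stand-in stubs for your own modules, and `model` and `device` are the InterfuserHDPE instance and device from the loading snippet above.

import torch

class TemporalTracker:
    """Stub: a real tracker would associate detections across frames."""
    def update(self, perception):
        return perception

class HierarchicalController:
    """Stub: a real controller would fuse tracks, route, and vehicle state."""
    def step(self, tracks):
        return {"throttle": 0.0, "steer": 0.0, "brake": 1.0}  # safe default

tracker, controller = TemporalTracker(), HierarchicalController()

for frame in [torch.randn(1, 3, 224, 224, device=device)]:  # stand-in camera stream
    with torch.no_grad():
        perception = model({"rgb": frame})  # detection grid + waypoints
    tracks = tracker.update(perception)
    command = controller.step(tracks)
    # forward `command` to the vehicle or simulator interface here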

📚 Citation

If you use the HDPE concept or this model in your research, please cite our upcoming paper. For now, you can cite this model repository:

@misc{interfuser-hdpe-2024,
  title={HDPE: Hyper-Dimensional Positional Encoding for End-to-End Self-Driving Systems},
  author={Altawil, Adam},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/BaseerAI/Interfuser-Baseer-v1}}
}

πŸ‘¨β€πŸ’» Development

Lead Researcher: Adam Altawil
Project Type: Graduation Project - AI & Autonomous Driving
Contact: [Your Contact Information]

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing & Support

For questions, contributions, and support:

  • 🚀 Try the Live Demo: Baseer Server Space
  • 📧 Contact: [Your Contact Information]
  • 🐛 Issues: Create an issue in this repository

🚗 Driving the Future with Hyper-Dimensional Intelligence 🚗