---
library_name: transformers
tags:
- code
- cybersecurity
- vulnerability
- cpp
license: apache-2.0
datasets:
- lemon42-ai/minified-diverseful-multilabels
metrics:
- accuracy
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
---

# Model Card for ThreatDetect-C-Cpp

This is a derivative version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base).
We fine-tuned ModernBERT-base to detect vulnerabilities in C/C++ code.
The current version has an accuracy of 82%.
## Model Details

### Model Description

ThreatDetect-C-Cpp can be used as a code classifier. It classifies the input code into 7 labels: `safe` (no vulnerability detected) and six CWE weaknesses:

| Label   | Description                                                             |
|---------|-------------------------------------------------------------------------|
| CWE-119 | Improper Restriction of Operations within the Bounds of a Memory Buffer |
| CWE-125 | Out-of-bounds Read                                                      |
| CWE-20  | Improper Input Validation                                               |
| CWE-416 | Use After Free                                                          |
| CWE-703 | Improper Check or Handling of Exceptional Conditions                    |
| CWE-787 | Out-of-bounds Write                                                     |
| safe    | Safe code                                                               |

- **Developed by:** [lemon42-ai](https://github.com/lemon42-ai)
- **Contributors:** [Abdellah Oumida](https://www.linkedin.com/in/abdellah-oumida-ab9082234/) & [Mohamed Sbaihi](https://www.linkedin.com/in/mohammed-sbaihi-aa6493254/)
- **Model type:** [ModernBERT, Encoder-only Transformer](https://arxiv.org/abs/2412.13663)
- **Supported Programming Languages:** C/C++
- **License:** Apache 2.0 (see the original license of ModernBERT-base)
- **Finetuned from model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)

### Model Sources

- **Repository:** [The official lemon42-ai GitHub repository](https://github.com/lemon42-ai/ThreatDetect-code-vulnerability-detection)
- **Technical Blog Post:** Coming soon.

## Uses

ThreatDetect-C-Cpp can be integrated into code-related applications. For example, it can be paired with a code generator to detect vulnerabilities in the generated code.

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.
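The card does not yet include an official snippet, so the following is only a minimal sketch of one plausible way to run the model through the `transformers` `pipeline` API and to use it as a gate on generated code, as described under Uses. It assumes a Hub ID of `lemon42-ai/ThreatDetect-C-Cpp` (hypothetical, not stated in this card), that the checkpoint's labels are exactly the strings in the label table, and a recent `transformers` release that includes ModernBERT support. The helper names `load_classifier`, `top_prediction` and `gate_generated_code` are illustrative, and the 0.5 threshold is arbitrary rather than a value tuned by the authors.

```python
# Minimal sketch, not an official snippet from the model authors.
# ASSUMPTIONS: the Hub ID below is hypothetical, and the checkpoint is assumed
# to use exactly the label strings listed in this card's label table.
from typing import Any, Callable, Dict, List

Scores = List[Dict[str, Any]]  # one {"label": ..., "score": ...} dict per label
ClassifyFn = Callable[[List[str]], List[Scores]]

MODEL_ID = "lemon42-ai/ThreatDetect-C-Cpp"  # hypothetical Hub ID, replace if it differs

# Label descriptions copied from the label table in this card.
LABEL_DESCRIPTIONS: Dict[str, str] = {
    "CWE-119": "Improper Restriction of Operations within the Bounds of a Memory Buffer",
    "CWE-125": "Out-of-bounds Read",
    "CWE-20": "Improper Input Validation",
    "CWE-416": "Use After Free",
    "CWE-703": "Improper Check or Handling of Exceptional Conditions",
    "CWE-787": "Out-of-bounds Write",
    "safe": "Safe code",
}


def load_classifier(model_id: str = MODEL_ID) -> ClassifyFn:
    """Wrap a transformers text-classification pipeline (downloads the model)."""
    from transformers import pipeline  # lazy import: the helpers below also work with a stub

    clf = pipeline("text-classification", model=model_id, top_k=None)
    # With top_k=None and a list input, the pipeline returns one list of
    # label/score dicts (covering all labels) per input snippet.
    return lambda snippets: clf(snippets, truncation=True)


def top_prediction(scores: Scores) -> Dict[str, Any]:
    """Summarize one snippet's scores as its highest-scoring label."""
    best = max(scores, key=lambda item: item["score"])
    label = best["label"]
    return {
        "label": label,
        "score": best["score"],
        "description": LABEL_DESCRIPTIONS.get(label, "unknown label"),
        "vulnerable": label != "safe",
    }


def gate_generated_code(snippet: str, classify: ClassifyFn, threshold: float = 0.5) -> bool:
    """Accept a generated snippet only if no CWE label reaches the threshold.

    The 0.5 default is an illustrative choice, not a value tuned by the authors.
    """
    scores = classify([snippet])[0]
    return all(s["label"] == "safe" or s["score"] < threshold for s in scores)
```

For example, `gate_generated_code(source, load_classifier())` returns `False` when any CWE label scores at or above the threshold. Because the helpers accept any callable with the same output shape, they can be exercised offline with a stub in place of the real pipeline.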
[More Information Needed]

## Training Details

### Training Data

[More Information Needed]

### Training Procedure

#### Preprocessing

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed]

#### Speeds, Sizes, Times

[More Information Needed]

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]

#### Factors

[More Information Needed]

#### Metrics

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination

[More Information Needed]

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary

[More Information Needed]

## More Information

[More Information Needed]

## Model Card Authors

[More Information Needed]

## Model Card Contact

[More Information Needed]