---
language: en
license: mit
model_name: tbert-siamese-encoder
---

# Model Card for tbert-siamese-encoder

This repository contains the embedding model used to embed software artifacts for traceability link prediction.

## Model Details

### Model Description

This embedding model is the encoder portion of the siamese model used in the cited paper. The original architecture used a relational classifier to create similarity scores between text pairs, resembling a cross-encoder, and consistently ranked almost as high as the top performer.

- **Developed by:** Jinfeng Lin (translated by Alberto Rodriguez)
- **Model type:** RoBERTa encoder trained for automatic traceability link prediction.
- **Language(s) (NLP):** en
- **License:** mit
- **Finetuned from model:** See the cited paper.

### Model Sources

- **Repository:** https://github.com/jinfenglin/TraceBERT
- **Paper:** https://arxiv.org/abs/2102.04411

## Uses

Used to embed software artifacts that are intended to be compared via cosine similarity.

### Direct Use

Software traceability link prediction, Retrieval-Augmented Generation, and artifact clustering.

### Downstream Use

This model is intended to sit within a traceability link prediction pipeline, where it can be used to retrieve software artifacts for an LLM prompt and to cluster artifacts.

### Out-of-Scope Use

This model has not been trained or evaluated for other tasks, though it could provide a good set of starting weights for requirements classification.

## Bias, Risks, and Limitations

The training data comes from open-source Git repositories, which can be inaccurate and lead to unexpected results.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

```
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the encoder from this repository (the path below is a placeholder;
# substitute the actual checkpoint path or Hub ID).
model = SentenceTransformer("tbert-siamese-encoder")

texts = [
    "Display Artifacts",  # parent artifact
    "A table view should be provided to display all project artifacts.",  # child 1
    "The system should be able to generate documentation for a set of artifacts.",  # child 2
]

embeddings = model.encode(texts, convert_to_tensor=False)
parent_embedding = embeddings[0:1]
children_embeddings = embeddings[1:]

# Compute cosine similarity between the parent and each child artifact
sim_matrix = cosine_similarity(parent_embedding, children_embeddings)
```

## Training, Evaluation, and Results Details

Please see the cited paper for more information on the training method, evaluation, and results.
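
## Example: Ranking Retrieved Artifacts

As a follow-up to the getting-started snippet above, the sketch below ranks child artifacts by their cosine similarity to the parent, as one might do when retrieving artifacts for an LLM prompt (the downstream use described earlier). This is a minimal illustration, not part of the original TraceBERT pipeline; it assumes the `texts` and `sim_matrix` variables from the snippet above, plus NumPy.

```
import numpy as np

# `sim_matrix` has shape (1, num_children); flatten it to one score per child.
scores = sim_matrix.flatten()

# Indices of child artifacts, most similar to the parent first.
ranking = np.argsort(scores)[::-1]

# Keep the top-k children, e.g. to insert into an LLM prompt for RAG.
top_k = 2
for i in ranking[:top_k]:
    print(f"{scores[i]:.3f}  {texts[1 + i]}")
```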