---
language: en
license: mit
model_name: tbert-siamese-encoder
---
# Model Card for tbert-siamese-encoder
This repository contains the embedding model used to embed software artifacts for traceability link prediction.
## Model Details
This is the encoder used in the siamese models from the cited paper.
### Model Description
This embedding model is the encoder portion of the siamese model described in the cited paper. In the paper, the siamese model used a relational classifier
to produce similarity scores between text pairs, resembling a cross-encoder, and it consistently ranked almost as high as the top performer.
- **Developed by:** Jinfeng Lin (translated by Alberto Rodriguez)
- **Model type:** RoBERTa encoder trained on automatic traceability link prediction.
- **Language(s) (NLP):** en
- **License:** mit
- **Finetuned from model [optional]:** See cited paper.
### Model Sources [optional]
<!-- Provide the basic links for the model. -->
- **Repository:** https://github.com/jinfenglin/TraceBERT
- **Paper:** https://arxiv.org/abs/2102.04411
## Uses
This model embeds software artifacts so that they can be compared via cosine similarity.
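For reference, the cosine similarity comparison can be sketched in plain Python as below. The function name `cosine_sim` and the toy vectors are illustrative assumptions and not part of this repository; in practice the vectors would be embeddings produced by the model.

```python
import math
from typing import Sequence


def cosine_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (hypothetical values).
parent = [1.0, 0.0, 1.0]
child_close = [2.0, 0.0, 2.0]  # same direction as parent
child_far = [0.0, 1.0, 0.0]    # orthogonal to parent

score_close = cosine_sim(parent, child_close)
score_far = cosine_sim(parent, child_far)
```

Because cosine similarity ignores vector length, `child_close` scores as a near-perfect match even though its magnitude differs from `parent`.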
### Direct Use
Software traceability link prediction, Retrieval Augmented Generation, Artifact Clustering.
### Downstream Use [optional]
This model is intended to serve within a traceability link prediction pipeline, to retrieve software artifacts for an LLM prompt, and to cluster artifacts.
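The retrieval use case might look roughly like the sketch below, which picks the highest-scoring artifacts and formats them as prompt context. The helper names `top_k_artifacts` and `build_prompt`, the prompt layout, and the scores are all hypothetical; real scores would come from comparing model embeddings.

```python
from typing import List, Sequence, Tuple


def top_k_artifacts(
    scores: Sequence[float], artifacts: Sequence[str], k: int
) -> List[Tuple[str, float]]:
    """Return the k artifacts with the highest similarity scores."""
    if len(scores) != len(artifacts):
        raise ValueError("scores and artifacts must align")
    ranked = sorted(
        zip(artifacts, (float(s) for s in scores)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, retrieved: Sequence[Tuple[str, float]]) -> str:
    """Format retrieved artifacts as LLM context (hypothetical layout)."""
    context = "\n".join(f"- {text}" for text, _ in retrieved)
    return f"Context artifacts:\n{context}\n\nQuestion: {query}"


artifacts = [
    "A table view should be provided to display all project artifacts.",
    "The system should be able to generate documentation for a set of artifacts.",
    "Users should be able to reset their password.",
]
# Hypothetical similarity scores between a query embedding and each artifact.
scores = [0.82, 0.35, 0.10]

retrieved = top_k_artifacts(scores, artifacts, k=2)
prompt = build_prompt("How are artifacts displayed?", retrieved)
```

The value of `k` is a design choice that trades context size against the risk of omitting a relevant artifact.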
### Out-of-Scope Use
This model could be used as a good set of starting weights for requirements classification.
## Bias, Risks, and Limitations
The training data comes from open-source Git repositories, which can be inaccurate and lead to unexpected results.
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
## How to Get Started with the Model
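```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Assumption: the model loads with sentence-transformers; replace the
# placeholder with this repository's Hub ID or a local path.
model = SentenceTransformer("<model-id-or-local-path>")

parent_artifacts = [
    "Display Artifacts",
]
child_artifacts = [
    "A table view should be provided to display all project artifacts.",  # child 1
    "The system should be able to generate documentation for a set of artifacts.",  # child 2
]
texts = parent_artifacts + child_artifacts

embeddings = model.encode(texts, convert_to_tensor=False)
parent_embedding = embeddings[0:1]
children_embeddings = embeddings[1:]

# Compute cosine similarity between the parent and each child (shape: 1 x 2)
sim_matrix = cosine_similarity(parent_embedding, children_embeddings)
```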
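One way to turn the similarity scores from the snippet above into predicted trace links is to keep the pairs whose score clears a cutoff. This is a minimal sketch: the helper `predict_links`, the `0.5` threshold, and the example scores are assumptions for illustration, not values from the paper.

```python
from typing import List, Sequence, Tuple


def predict_links(
    parent: str,
    children: Sequence[str],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumption: tune on labeled trace links
) -> List[Tuple[str, str, float]]:
    """Keep (parent, child, score) triples at or above the threshold, highest first."""
    links = [
        (parent, child, float(score))
        for child, score in zip(children, scores)
        if score >= threshold
    ]
    return sorted(links, key=lambda link: link[2], reverse=True)


children = [
    "A table view should be provided to display all project artifacts.",
    "The system should be able to generate documentation for a set of artifacts.",
]
# Hypothetical scores standing in for sim_matrix[0].
example_scores = [0.71, 0.28]

links = predict_links("Display Artifacts", children, example_scores)
```

With real embeddings, `sim_matrix[0]` can be passed as `scores` directly, since the helper only iterates over the values.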
## Training, Evaluation, and Results Details
Please see the cited paper for more information on the training method, evaluation, and results.