---
language: en
license: mit
model_name: tbert-siamese-encoder
---

# Model Card for tbert-siamese-encoder

This repository contains the embedding model used to embed software artifacts for traceability link prediction.

## Model Details

This is the encoder used in the siamese models.

### Model Description

This embedding model is the encoder portion of the siamese model described in the cited paper. The siamese model used a relational classifier to produce similarity scores between text pairs, resembling a cross-encoder, and consistently ranked almost as high as the top performer.

- **Developed by:** Jinfeng Lin (translated by Alberto Rodriguez)
- **Model type:** RoBERTa encoder trained on automatic traceability link prediction.
- **Language(s) (NLP):** en
- **License:** mit
- **Finetuned from model [optional]:** See the cited paper.

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/jinfenglin/TraceBERT
- **Paper:** https://arxiv.org/abs/2102.04411

## Uses

This model embeds software artifacts so that they can be compared via cosine similarity.

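
For readers unfamiliar with the metric, cosine similarity between two embedding vectors is their dot product divided by the product of their lengths. The sketch below is a plain-Python illustration with a hypothetical helper name (`cosine`) and toy two-dimensional vectors. It is not part of the model's code, and returning 0.0 for zero vectors is a convention chosen here.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_u * norm_v)


# Toy vectors for illustration only; real embeddings have many more dimensions.
same_direction = cosine([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])
```

Scores near 1 indicate artifacts whose embeddings point in nearly the same direction, while scores near 0 indicate unrelated ones.
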
### Direct Use

Software traceability link prediction, retrieval-augmented generation (RAG), and artifact clustering.

### Downstream Use [optional]

This model is intended to be used within a traceability link prediction pipeline, to retrieve software artifacts for an LLM prompt, and for clustering.

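
As a rough illustration of the retrieval use case, the sketch below ranks precomputed artifact embeddings against a query embedding and keeps the top `k`. The function names (`top_k_artifacts`, `_cosine`) and the toy vectors are hypothetical. In practice the embeddings would come from this model, and a vector index would likely replace the linear scan.

```python
import math


def _cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_artifacts(query_embedding, artifact_embeddings, k=2):
    """Return (index, score) pairs for the k most similar artifacts, best first."""
    scored = [
        (i, _cosine(query_embedding, emb))
        for i, emb in enumerate(artifact_embeddings)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings for illustration only.
query = [1.0, 0.0]
artifacts = [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]]
best = top_k_artifacts(query, artifacts, k=2)
retrieved_indices = [i for i, _ in best]
```

The returned indices can then be mapped back to artifact texts and placed into an LLM prompt.
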
### Out-of-Scope Use

This model could serve as a good set of starting weights for requirements classification.

## Bias, Risks, and Limitations

The training data is drawn from open-source Git data, which can be inaccurate and lead to unexpected results.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

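## How to Get Started with the Model

```python
# Assumes the sentence-transformers and scikit-learn packages are installed.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder: replace with this repository's model ID or a local path.
model = SentenceTransformer("<model-id-or-path>")

parent_artifacts = [
    "Display Artifacts",
]
child_artifacts = [
    "A table view should be provided to display all project artifacts.",  # child 1
    "The system should be able to generate documentation for a set of artifacts.",  # child 2
]

# Embed parents and children in one batch; parents come first.
texts = parent_artifacts + child_artifacts
embeddings = model.encode(texts, convert_to_tensor=False)

parent_embedding = embeddings[: len(parent_artifacts)]
children_embeddings = embeddings[len(parent_artifacts):]

# Compute cosine similarity: one row per parent, one column per child
sim_matrix = cosine_similarity(parent_embedding, children_embeddings)
```
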
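
One possible way to turn a row of similarity scores such as `sim_matrix[0]` into candidate trace links is to keep only the pairs above a cutoff. The sketch below assumes a hypothetical helper `predict_links` and an arbitrary threshold of 0.5. The scores shown are made-up values, not output from this model, and a suitable threshold would need to be tuned on labeled data.

```python
def predict_links(parent_id, child_ids, scores, threshold=0.5):
    """Keep (parent, child, score) triples at or above the threshold, best first."""
    links = [
        (parent_id, child_id, score)
        for child_id, score in zip(child_ids, scores)
        if score >= threshold
    ]
    return sorted(links, key=lambda link: link[2], reverse=True)


# Illustrative scores only; in practice pass a row of the similarity matrix.
example_scores = [0.8, 0.3]
links = predict_links("parent", ["child 1", "child 2"], example_scores)
```

With the illustrative scores above, only the first child clears the threshold and is kept as a candidate link.
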
## Training, Evaluation, and Results Details

Please see the cited paper for more information on the training method, evaluation, and results.