---
language: en
license: mit
model_name: tbert-siamese-encoder
---

# Model Card for tbert-siamese-encoder

This repository contains the embedding model used to embed software artifacts for traceability link prediction.

## Model Details

### Model Description

This embedding model is the encoder portion of the siamese model used in the cited paper. The original architecture used a relational classifier to create similarity scores between text pairs, resembling a cross-encoder, and consistently ranked almost as high as the top performer.

- **Developed by:** Jinfeng Lin (translated by Alberto Rodriguez)
- **Model type:** RoBERTa encoder trained for automatic traceability link prediction.
- **Language(s) (NLP):** en
- **License:** mit
- **Finetuned from model:** See the cited paper.

### Model Sources

- **Repository:** https://github.com/jinfenglin/TraceBERT
- **Paper:** https://arxiv.org/abs/2102.04411

## Uses

Used to embed software artifacts that are intended to be compared via cosine similarity.

### Direct Use

Software traceability link prediction, Retrieval-Augmented Generation, and artifact clustering.

### Downstream Use

This model is intended to sit within a traceability link prediction pipeline, where it can be used to retrieve software artifacts for an LLM prompt and to cluster artifacts.

### Out-of-Scope Use

This model has not been trained or evaluated for other tasks, though it could provide a good set of starting weights for requirements classification.

## Bias, Risks, and Limitations

The training data comes from open-source Git repositories, which can be inaccurate and lead to unexpected results.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

```
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the encoder from this repository (the path below is a placeholder;
# substitute the actual checkpoint path or Hub ID).
model = SentenceTransformer("tbert-siamese-encoder")

texts = [
    "Display Artifacts",  # parent artifact
    "A table view should be provided to display all project artifacts.",  # child 1
    "The system should be able to generate documentation for a set of artifacts.",  # child 2
]

embeddings = model.encode(texts, convert_to_tensor=False)
parent_embedding = embeddings[0:1]
children_embeddings = embeddings[1:]

# Compute cosine similarity between the parent and each child artifact
sim_matrix = cosine_similarity(parent_embedding, children_embeddings)
```

## Training, Evaluation, and Results Details

Please see the cited paper for more information on the training method, evaluation, and results.
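
## Example: Ranking Retrieved Artifacts

As a follow-up to the getting-started snippet above, the sketch below ranks child artifacts by their cosine similarity to the parent, as one might do when retrieving artifacts for an LLM prompt (the downstream use described earlier). This is a minimal illustration, not part of the original TraceBERT pipeline; it assumes the `texts` and `sim_matrix` variables from the snippet above, plus NumPy.

```
import numpy as np

# `sim_matrix` has shape (1, num_children); flatten it to one score per child.
scores = sim_matrix.flatten()

# Indices of child artifacts, most similar to the parent first.
ranking = np.argsort(scores)[::-1]

# Keep the top-k children, e.g. to insert into an LLM prompt for RAG.
top_k = 2
for i in ranking[:top_k]:
    print(f"{scores[i]:.3f}  {texts[1 + i]}")
```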