---
language: en
license: mit
model_name: tbert-siamese-encoder
---

# Model Card for tbert-siamese-encoder
This repository contains the embedding model used to embed software artifacts for traceability link prediction.


## Model Details

This is the encoder used in the siamese models.

### Model Description
This embedding model is the encoder portion of the siamese model used in the cited paper. The siamese model used a relational classifier on top of this encoder to produce similarity scores for text pairs, resembling a cross-encoder, and it consistently ranked almost as high as the top-performing model.
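To make the siamese setup more concrete, here is a minimal NumPy sketch of how a relational classifier can turn two embeddings produced by one shared encoder into a link score. The feature layout `[u, v, |u - v|]` (borrowed from Sentence-BERT's classification objective), the single linear layer with a sigmoid, and the names `relational_features` and `link_score` are assumptions for illustration, not necessarily the classifier from the cited paper. The random vectors stand in for encoder outputs.

```python
import numpy as np


def relational_features(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Combine two embeddings into one feature vector: [u, v, |u - v|] (assumed layout)."""
    return np.concatenate([u, v, np.abs(u - v)])


def link_score(u: np.ndarray, v: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """Hypothetical one-layer relational classifier: sigmoid(w . features + b)."""
    logit = float(weights @ relational_features(u, v) + bias)
    return float(1.0 / (1.0 + np.exp(-logit)))


rng = np.random.default_rng(0)
dim = 8  # toy embedding size; the real encoder's dimension is larger
source_emb = rng.normal(size=dim)  # stand-in for the shared encoder's embedding of a parent artifact
target_emb = rng.normal(size=dim)  # stand-in for the shared encoder's embedding of a child artifact
w = rng.normal(size=3 * dim)       # untrained, randomly initialised classifier weights

print(link_score(source_emb, target_emb, w, bias=0.0))  # a probability between 0 and 1
```

Because the encoder is shared, each artifact can be embedded once and reused across many pairs; only the lightweight head runs per pair, whereas a cross-encoder must re-encode every pair jointly.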



- Developed by: Jinfeng Lin (translated by Alberto Rodriguez)
- Model type: RoBERTa encoder trained for automatic traceability link prediction.
- Language(s) (NLP): en
- License: mit
- Finetuned from model: see the cited paper.

### Model Sources


- Repository: https://github.com/jinfenglin/TraceBERT
- Paper: https://arxiv.org/abs/2102.04411

## Uses
Used to embed software artifacts so that they can be compared via cosine similarity.
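For reference, the cosine similarity between two artifact embeddings $\mathbf{u}$ and $\mathbf{v}$ (notation ours) is:

$$
\cos(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
$$

Scores range from -1 to 1, and higher values indicate more similar artifacts.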

### Direct Use
Software traceability link prediction, retrieval-augmented generation (RAG), and artifact clustering.
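For retrieval, such as selecting artifacts to place in a RAG prompt, one straightforward approach is to rank a corpus of artifact embeddings by cosine similarity to a query embedding and keep the top k. This sketch assumes the embeddings are NumPy arrays, such as those returned by `model.encode(..., convert_to_tensor=False)`; the function name `top_k_artifacts` and the toy 3-dimensional vectors are hypothetical.

```python
import numpy as np


def top_k_artifacts(query_emb: np.ndarray, corpus_embs: np.ndarray, k: int) -> list[tuple[int, float]]:
    """Return (index, cosine similarity) pairs for the k corpus rows most similar to the query."""
    # Normalise so that a dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q
    # Sort indices by descending score and keep the first k
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]


# Toy embeddings standing in for encoded artifacts
corpus = np.array([
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.1, 0.0])
print(top_k_artifacts(query, corpus, k=2))
```

With real artifacts, the query would be the embedding of the user's request and the corpus the embeddings of the project's artifacts; the returned indices select which artifact texts go into the prompt.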

### Downstream Use
The model is intended to be used within a traceability link prediction pipeline, to retrieve software artifacts for an LLM prompt, and to cluster artifacts.
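Within a traceability link prediction pipeline, the similarity matrix between parent and child artifacts can be turned into candidate trace links. This sketch flags every parent/child pair whose cosine similarity meets a threshold. The function name `predict_links`, the toy 2-dimensional embeddings, and the threshold of 0.5 are hypothetical; a real threshold would need to be tuned on labelled links for the project at hand.

```python
import numpy as np


def predict_links(parent_embs: np.ndarray, child_embs: np.ndarray, threshold: float) -> list[tuple[int, int, float]]:
    """Return (parent index, child index, score) for pairs with cosine similarity >= threshold."""
    p = parent_embs / np.linalg.norm(parent_embs, axis=1, keepdims=True)
    c = child_embs / np.linalg.norm(child_embs, axis=1, keepdims=True)
    sim = p @ c.T  # sim[i, j] = cosine similarity of parent i and child j
    links = []
    for i, j in zip(*np.nonzero(sim >= threshold)):
        links.append((int(i), int(j), float(sim[i, j])))
    return links


# Toy embeddings: two parents and three children
parents = np.array([[1.0, 0.0], [0.0, 1.0]])
children = np.array([[0.9, 0.1], [0.1, 0.9], [-1.0, 0.0]])
print(predict_links(parents, children, threshold=0.5))
```

Thresholding keeps the pipeline simple; ranking the children of each parent and keeping the top few is a common alternative when the number of true links per parent is roughly known.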

### Out-of-Scope Use
This model could serve as a good set of starting weights for requirements classification.

## Bias, Risks, and Limitations
The training data comes from open-source Git repositories, which can be inaccurate and lead to unexpected results.

### Recommendations


Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed to make further recommendations.

## How to Get Started with the Model

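```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the encoder from the Hugging Face Hub (assumes the repo loads with sentence-transformers)
model = SentenceTransformer("thearod5/pl-bert-siamese-encoder")

texts = [
    "Display Artifacts",  # parent artifact
    "A table view should be provided to display all project artifacts.",  # child 1
    "The system should be able to generate documentation for a set of artifacts.",  # child 2
]
# Returns a NumPy array of shape (len(texts), embedding_dim)
embeddings = model.encode(texts, convert_to_tensor=False)

parent_embedding = embeddings[0:1]    # shape (1, embedding_dim)
children_embeddings = embeddings[1:]  # shape (2, embedding_dim)

# Compute cosine similarity between the parent and each child; shape (1, 2)
sim_matrix = cosine_similarity(parent_embedding, children_embeddings)
print(sim_matrix)
```
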
## Training, Evaluation, and Results Details
Please see the cited paper for more information on the training method, evaluation, and results.