gabrielloiseau committed
Commit e7330e5 · verified · 1 Parent(s): 5e328ba

Update README.md

Files changed (1):
  1. README.md +24 -96

README.md CHANGED
@@ -1,44 +1,28 @@
  ---
- base_model: sentence-transformers/paraphrase-distilroberta-base-v1
  library_name: sentence-transformers
  pipeline_tag: sentence-similarity
  tags:
  - sentence-transformers
  - sentence-similarity
- - feature-extraction
  ---

- # SentenceTransformer based on sentence-transformers/paraphrase-distilroberta-base-v1
-
- This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/paraphrase-distilroberta-base-v1](https://huggingface.co/sentence-transformers/paraphrase-distilroberta-base-v1). It maps sentences & paragraphs to a 512-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
-
- ## Model Details
-
- ### Model Description
- - **Model Type:** Sentence Transformer
- - **Base model:** [sentence-transformers/paraphrase-distilroberta-base-v1](https://huggingface.co/sentence-transformers/paraphrase-distilroberta-base-v1) <!-- at revision 0520e7529d15c250345a95871495ea016ca93754 -->
- - **Maximum Sequence Length:** 128 tokens
- - **Output Dimensionality:** 512 tokens
- - **Similarity Function:** Cosine Similarity
- <!-- - **Training Dataset:** Unknown -->
- <!-- - **Language:** Unknown -->
- <!-- - **License:** Unknown -->

- ### Model Sources

- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

- ### Full Model Architecture
-
- ```
- SentenceTransformer(
-   (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: RobertaModel
-   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
-   (2): Dense({'in_features': 768, 'out_features': 512, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
- )
- ```

  ## Usage

@@ -54,7 +38,6 @@ Then you can load this model and run inference.
  ```python
  from sentence_transformers import SentenceTransformer

- # Download from the 🤗 Hub
  model = SentenceTransformer("gabrielloiseau/LUAR-CRUD-sentence-transformers")
  # Run inference
  sentences = [
@@ -65,78 +48,23 @@ sentences = [
  embeddings = model.encode(sentences)
  print(embeddings.shape)
  # [3, 512]
-
- # Get the similarity scores for the embeddings
- similarities = model.similarity(embeddings, embeddings)
- print(similarities.shape)
- # [3, 3]
  ```

- <!--
- ### Direct Usage (Transformers)
-
- <details><summary>Click to see the direct usage in Transformers</summary>
-
- </details>
- -->
-
- <!--
- ### Downstream Usage (Sentence Transformers)
-
- You can finetune this model on your own dataset.
-
- <details><summary>Click to expand</summary>
-
- </details>
- -->
-
- <!--
- ### Out-of-Scope Use
-
- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
- -->
-
- <!--
- ## Bias, Risks and Limitations
-
- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
- -->
-
- <!--
- ### Recommendations
-
- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
- -->
-
- ## Training Details
-
- ### Framework Versions
- - Python: 3.12.7
- - Sentence Transformers: 3.1.1
- - Transformers: 4.40.1
- - PyTorch: 2.4.1+cu121
- - Accelerate:
- - Datasets: 3.0.1
- - Tokenizers: 0.19.1
-

  ## Citation

- ### BibTeX
-
- <!--
- ## Glossary
-
- *Clearly define terms in order to be accessible across audiences.*
- -->
-
- <!--
- ## Model Card Authors
-
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
- -->
-
- <!--
- ## Model Card Contact
-
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
- -->
  ---
+ base_model:
+ - rrivera1849/LUAR-CRUD
  library_name: sentence-transformers
  pipeline_tag: sentence-similarity
  tags:
  - sentence-transformers
  - sentence-similarity
+ - LUAR
+ license: apache-2.0
+ language:
+ - en
  ---

+ # SentenceTransformer version of rrivera1849/LUAR-CRUD

+ All credit goes to [Rivera-Soto et al. (2021)](https://aclanthology.org/2021.emnlp-main.70/).

+ ---
+ Author style representations using [LUAR](https://aclanthology.org/2021.emnlp-main.70.pdf).

+ The LUAR training and evaluation repository can be found [here](https://github.com/llnl/luar).

+ This model was trained on a subsample of the Pushshift Reddit dataset (5 million users), covering comments published between January 2015 and October 2019 by authors who posted at least 100 comments during that period.

  ## Usage
 
  ```python
  from sentence_transformers import SentenceTransformer

  model = SentenceTransformer("gabrielloiseau/LUAR-CRUD-sentence-transformers")
  # Run inference
  sentences = [
  ...
  embeddings = model.encode(sentences)
  print(embeddings.shape)
  # [3, 512]
  ```
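To score pairs of texts directly, the `similarity` method of Sentence Transformers (available in the 3.x releases this wrapper targets, and shown in the earlier revision of this card) can be applied to the embeddings; a minimal sketch continuing the snippet above:

```python
# Pairwise cosine similarity between the three embeddings above
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```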
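Since LUAR produces author-level style representations, a natural pattern is to embed several texts per author and average them into one style vector before comparing authors. A hypothetical sketch (the comment texts and the mean-pooling step are illustrative assumptions, not part of the original card):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("gabrielloiseau/LUAR-CRUD-sentence-transformers")

# Hypothetical comment histories for two authors
author_a = ["TIL that the bridge was rebuilt twice.", "I think the real reason is funding."]
author_b = ["Nice shot!", "That game was unwatchable."]

# Embed each comment, then mean-pool into one 512-d style vector per author
style_a = model.encode(author_a).mean(axis=0)
style_b = model.encode(author_b).mean(axis=0)

# Cosine similarity between the two author-style vectors
score = np.dot(style_a, style_b) / (np.linalg.norm(style_a) * np.linalg.norm(style_b))
print(f"author style similarity: {score:.3f}")
```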
  ## Citation

+ If you find this model helpful, feel free to cite:

+ ```bibtex
+ @inproceedings{uar-emnlp2021,
+   author = {Rafael A. Rivera Soto and Olivia Miano and Juanita Ordonez and Barry Chen and Aleem Khan and Marcus Bishop and Nicholas Andrews},
+   title = {Learning Universal Authorship Representations},
+   booktitle = {EMNLP},
+   year = {2021},
+ }
+ ```

+ ## License

+ LUAR is distributed under the terms of the Apache License (Version 2.0).

+ All new contributions must be made under the Apache-2.0 license.