gabrielloiseau committed
Commit e7330e5 · verified · 1 Parent(s): 5e328ba

Update README.md

Files changed (1):
  1. README.md +24 -96

README.md CHANGED
@@ -1,44 +1,28 @@
  ---
- base_model: sentence-transformers/paraphrase-distilroberta-base-v1
  library_name: sentence-transformers
  pipeline_tag: sentence-similarity
  tags:
  - sentence-transformers
  - sentence-similarity
- - feature-extraction
  ---

- # SentenceTransformer based on sentence-transformers/paraphrase-distilroberta-base-v1
-
- This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/paraphrase-distilroberta-base-v1](https://huggingface.co/sentence-transformers/paraphrase-distilroberta-base-v1). It maps sentences & paragraphs to a 512-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
-
- ## Model Details
-
- ### Model Description
- - **Model Type:** Sentence Transformer
- - **Base model:** [sentence-transformers/paraphrase-distilroberta-base-v1](https://huggingface.co/sentence-transformers/paraphrase-distilroberta-base-v1) <!-- at revision 0520e7529d15c250345a95871495ea016ca93754 -->
- - **Maximum Sequence Length:** 128 tokens
- - **Output Dimensionality:** 512 tokens
- - **Similarity Function:** Cosine Similarity
- <!-- - **Training Dataset:** Unknown -->
- <!-- - **Language:** Unknown -->
- <!-- - **License:** Unknown -->

- ### Model Sources

- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

- ### Full Model Architecture
-
- ```
- SentenceTransformer(
-   (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: RobertaModel
-   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
-   (2): Dense({'in_features': 768, 'out_features': 512, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
- )
- ```

  ## Usage

@@ -54,7 +38,6 @@ Then you can load this model and run inference.
  ```python
  from sentence_transformers import SentenceTransformer

- # Download from the 🤗 Hub
  model = SentenceTransformer("gabrielloiseau/LUAR-CRUD-sentence-transformers")
  # Run inference
  sentences = [
@@ -65,78 +48,23 @@ sentences = [
  embeddings = model.encode(sentences)
  print(embeddings.shape)
  # [3, 512]
-
- # Get the similarity scores for the embeddings
- similarities = model.similarity(embeddings, embeddings)
- print(similarities.shape)
- # [3, 3]
  ```

- <!--
- ### Direct Usage (Transformers)
-
- <details><summary>Click to see the direct usage in Transformers</summary>
-
- </details>
- -->
-
- <!--
- ### Downstream Usage (Sentence Transformers)
-
- You can finetune this model on your own dataset.
-
- <details><summary>Click to expand</summary>
-
- </details>
- -->
-
- <!--
- ### Out-of-Scope Use
-
- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
- -->
-
- <!--
- ## Bias, Risks and Limitations
-
- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
- -->
-
- <!--
- ### Recommendations
-
- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
- -->
-
- ## Training Details
-
- ### Framework Versions
- - Python: 3.12.7
- - Sentence Transformers: 3.1.1
- - Transformers: 4.40.1
- - PyTorch: 2.4.1+cu121
- - Accelerate:
- - Datasets: 3.0.1
- - Tokenizers: 0.19.1
-

  ## Citation

- ### BibTeX
-
- <!--
- ## Glossary
-
- *Clearly define terms in order to be accessible across audiences.*
- -->
-
- <!--
- ## Model Card Authors
-
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
- -->
-
- <!--
- ## Model Card Contact
-
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
- -->
  ---
+ base_model:
+ - rrivera1849/LUAR-CRUD
  library_name: sentence-transformers
  pipeline_tag: sentence-similarity
  tags:
  - sentence-transformers
  - sentence-similarity
+ - LUAR
+ license: apache-2.0
+ language:
+ - en
  ---

+ # SentenceTransformer version of rrivera1849/LUAR-CRUD

+ All credit goes to [Rivera-Soto et al. (2021)](https://aclanthology.org/2021.emnlp-main.70/).

+ ---
+ Author style representations using [LUAR](https://aclanthology.org/2021.emnlp-main.70.pdf).

+ The LUAR training and evaluation repository can be found [here](https://github.com/llnl/luar).

+ This model was trained on a subsample of the Pushshift Reddit dataset (5 million users), covering comments published between January 2015 and October 2019 by authors who posted at least 100 comments during that period.

  ## Usage
 
  ```python
  from sentence_transformers import SentenceTransformer

  model = SentenceTransformer("gabrielloiseau/LUAR-CRUD-sentence-transformers")
  # Run inference
  sentences = [
  ...
  embeddings = model.encode(sentences)
  print(embeddings.shape)
  # [3, 512]
  ```
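To score pairs of texts directly, the `similarity` method of Sentence Transformers (available in the 3.x releases this wrapper targets, and shown in the earlier revision of this card) can be applied to the embeddings; a minimal sketch continuing the snippet above:

```python
# Pairwise cosine similarity between the three embeddings above
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```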
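Since LUAR produces author-level style representations, a natural pattern is to embed several texts per author and average them into one style vector before comparing authors. A hypothetical sketch (the comment texts and the mean-pooling step are illustrative assumptions, not part of the original card):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("gabrielloiseau/LUAR-CRUD-sentence-transformers")

# Hypothetical comment histories for two authors
author_a = ["TIL that the bridge was rebuilt twice.", "I think the real reason is funding."]
author_b = ["Nice shot!", "That game was unwatchable."]

# Embed each comment, then mean-pool into one 512-d style vector per author
style_a = model.encode(author_a).mean(axis=0)
style_b = model.encode(author_b).mean(axis=0)

# Cosine similarity between the two author-style vectors
score = np.dot(style_a, style_b) / (np.linalg.norm(style_a) * np.linalg.norm(style_b))
print(f"author style similarity: {score:.3f}")
```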
  ## Citation

+ If you find this model helpful, feel free to cite:

+ ```bibtex
+ @inproceedings{uar-emnlp2021,
+   author = {Rafael A. Rivera Soto and Olivia Miano and Juanita Ordonez and Barry Chen and Aleem Khan and Marcus Bishop and Nicholas Andrews},
+   title = {Learning Universal Authorship Representations},
+   booktitle = {EMNLP},
+   year = {2021},
+ }
+ ```

+ ## License

+ LUAR is distributed under the terms of the Apache License (Version 2.0).

+ All new contributions must be made under the Apache-2.0 license.