---
license: mit
language:
- en
widget:
- text: "You have the right to use CommunityConnect for its intended purpose of connecting with others, sharing content responsibly, and engaging in constructive dialogue. You are responsible for the content you post and must respect the rights and privacy of others."
  example_title: "Fair Clause"
- text: "We reserve the right to suspend, terminate, or restrict your access to the platform at any time and for any reason, without prior notice or explanation. This includes but is not limited to violations of our community guidelines or terms of service, as determined solely by ConnectWorld."
  example_title: "Unfair Clause"
metrics:
- accuracy
- precision
- f1
- recall
library_name: transformers
pipeline_tag: text-classification
---
# TOSRobertaV2: Terms of Service Fairness Classifier

## Model Description

TOSRobertaV2 is a fine-tuned RoBERTa-large model designed to classify clauses in Terms of Service (ToS) documents based on their fairness level. The model categorizes clauses into three classes: clearly fair, potentially unfair, and clearly unfair.

## Intended Use

This model is intended for:
- Analyzing Terms of Service documents for potentially unfair clauses
- Assisting legal professionals in reviewing contracts
- Helping consumers understand the fairness of agreements they're entering into
- Supporting researchers studying fairness in legal documents

## Training Data

The model was trained on the `CodeHima/TOS_DatasetV3` dataset, which contains labeled clauses from various Terms of Service documents.

## Training Procedure

- Base model: RoBERTa-large
- Training type: Fine-tuning
- Number of epochs: 5
- Optimizer: AdamW
- Learning rate: 2e-5
- Batch size: 8
- Weight decay: 0.01
- Final training loss: 0.3852

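The hyperparameters above can be collected into one place, using field names that mirror Hugging Face `TrainingArguments`. This is a sketch for reproducibility only; the exact training script is an assumption, and only the values come from this card:

```python
# Hyperparameters reported on this card, gathered into one config dict.
# Keys mirror Hugging Face TrainingArguments field names; the actual
# training script is not published, so treat this as an illustration.
training_config = {
    "model_name": "roberta-large",
    "num_train_epochs": 5,
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,
    "weight_decay": 0.01,
    "optim": "adamw_torch",
}

for key, value in training_config.items():
    print(f"{key}: {value}")
```
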
## Evaluation Results

### Validation Set Performance

- Accuracy: 0.8600
- F1 Score: 0.8588
- Precision: 0.8598
- Recall: 0.8600

### Test Set Performance

- Accuracy: 0.8651

### Training Progress

| Epoch | Training Loss | Validation Loss | Accuracy | F1     | Precision | Recall |
|-------|---------------|-----------------|----------|--------|-----------|--------|
| 1     | 0.5391        | 0.4940          | 0.7981   | 0.7997 | 0.8056    | 0.7981 |
| 2     | 0.4621        | 0.4900          | 0.8314   | 0.8320 | 0.8330    | 0.8314 |
| 3     | 0.3954        | 0.6748          | 0.8219   | 0.8250 | 0.8349    | 0.8219 |
| 4     | 0.3783        | 0.7175          | 0.8600   | 0.8588 | 0.8598    | 0.8600 |
| 5     | 0.1542        | 0.8811          | 0.8476   | 0.8490 | 0.8514    | 0.8476 |

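As a quick sanity check on the validation numbers, the harmonic mean of the reported precision and recall is close to the reported F1. The two need not match exactly, since a weighted F1 averages per-class F1 scores rather than combining aggregate precision and recall:

```python
# Harmonic mean of the reported validation precision and recall.
# The card's F1 (0.8588) is a weighted average over classes, so it is
# close to, but not exactly, this aggregate value.
precision = 0.8598
recall = 0.8600

f1_from_aggregates = 2 * precision * recall / (precision + recall)
print(f"{f1_from_aggregates:.4f}")  # ~0.8599
```
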
## Limitations

- The model's performance may vary on ToS documents from domains or industries not well represented in the training data.
- It may struggle with highly complex or ambiguous clauses.
- The model's understanding of "fairness" is based on the training data and may not capture every nuance of legal fairness.

## Ethical Considerations

- This model should not be used as a substitute for professional legal advice.
- Biases present in the training data may influence the model's judgments.
- Users should be aware that the concept of "fairness" in legal documents can be subjective and context-dependent.

## How to Use

You can use this model directly with the Hugging Face `transformers` library:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("CodeHima/TOSRobertaV2")
model = AutoModelForSequenceClassification.from_pretrained("CodeHima/TOSRobertaV2")

text = "Your clause here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

# Run the model in inference mode (no gradient tracking)
with torch.no_grad():
    logits = model(**inputs).logits

# Convert raw logits to class probabilities and pick the most likely class
probabilities = torch.softmax(logits, dim=1)
predicted_class = torch.argmax(probabilities, dim=1).item()

classes = ['clearly fair', 'potentially unfair', 'clearly unfair']
print(f"Predicted class: {classes[predicted_class]}")
print(f"Probabilities: {probabilities[0].tolist()}")
```

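The post-processing step above (softmax over the three logits, then argmax to a label) can also be written in plain Python, which makes the label mapping explicit. The logit values in the example call are made up for illustration:

```python
import math

# Same label order as in the snippet above.
CLASSES = ["clearly fair", "potentially unfair", "clearly unfair"]

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits):
    """Map raw logits for the three classes to (label, probability)."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[idx], probs[idx]

# Made-up logits that favour the last class.
label, prob = predict_label([-1.2, 0.3, 2.1])
print(label, round(prob, 3))
```
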
## Citation

If you use this model in your research, please cite:

```
@misc{TOSRobertaV2,
  author = {CodeHima},
  title = {TOSRobertaV2: Terms of Service Fairness Classifier},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/CodeHima/TOSRobertaV2}}
}
```

## License

This model is released under the MIT license.