---
license: apache-2.0
language:
- zh
pipeline_tag: text-classification
library_name: transformers
---

# yizhao-risk-zh-scorer 
## Introduction
This is a BERT model fine-tuned on a high-quality Chinese financial dataset. Given a text, it outputs a security risk score that can be used to identify and remove risky samples from financial datasets, reducing the proportion of illegal or undesirable data. For the complete data cleaning process, see [YiZhao](https://github.com/HITsz-TMG/YiZhao).
## Quickstart
The following snippet generates a security risk score for a single text with this model.
```python
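import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

text = "你是一个聪明的机器人"
# Model identifier or local path to the downloaded checkpoint.
risk_model_name = "risk-model-zh-v0.1"

risk_tokenizer = AutoTokenizer.from_pretrained(risk_model_name)
risk_model = AutoModelForSequenceClassification.from_pretrained(risk_model_name)

# Tokenize the text; inputs longer than the model's maximum length are truncated.
risk_inputs = risk_tokenizer(text, return_tensors="pt", padding="longest", truncation=True)

# Run inference without tracking gradients.
with torch.no_grad():
    risk_outputs = risk_model(**risk_inputs)

# Drop the trailing logit dimension and convert to a NumPy array.
risk_logits = risk_outputs.logits.squeeze(-1).float().detach().numpy()

# A single input yields a single scalar score.
risk_score = risk_logits.item()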

result = {
    "text": text,
    "risk_score": risk_score
}

print(result)
# {'text': '你是一个聪明的机器人', 'risk_score': 0.11226219683885574}
```
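
Once each text has a `risk_score`, removing risky data comes down to applying a cutoff. The sketch below is one possible way to do that, not the procedure used in YiZhao. It assumes that a higher score means higher risk, and the `RISK_THRESHOLD` value, the `split_by_risk` helper, and the sample scores are all hypothetical placeholders; this model card does not state a recommended threshold.

```python
from typing import Dict, List, Tuple

# Hypothetical cutoff: tune it on your own data.
RISK_THRESHOLD = 0.5


def split_by_risk(
    records: List[Dict], threshold: float = RISK_THRESHOLD
) -> Tuple[List[Dict], List[Dict]]:
    """Split scored records into (kept, removed).

    Assumes a higher risk_score means a riskier text.
    """
    kept, removed = [], []
    for record in records:
        if record["risk_score"] < threshold:
            kept.append(record)
        else:
            removed.append(record)
    return kept, removed


# Records in the same shape as the `result` dict from the Quickstart.
# These scores are made-up placeholders, not real model outputs.
scored = [
    {"text": "sample A", "risk_score": 0.11},
    {"text": "sample B", "risk_score": 0.87},
    {"text": "sample C", "risk_score": 0.42},
]

kept, removed = split_by_risk(scored)
print(len(kept), len(removed))
```

Returning the removed records alongside the kept ones makes it easy to inspect what a given threshold discards before committing to it.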