Update README.md
README.md
CHANGED
@@ -14,4 +14,65 @@ base_model:
tags:
- NeuralWave
- Hackathon
---
## Overview

This model detects personal information using a smaller label set than its base model. Reducing the label set is intended to improve labeling precision and accuracy when identifying personal information across multiple languages.

---

## Features

- **Improved Precision**: Reducing the label set relative to the base model is intended to make labeling more precise and the identification of sensitive information more reliable.
- **Model Versions**:
  - **Maximum Accuracy Focus**: Aims for the highest overall accuracy, for applications where minimizing errors of any kind is crucial.
  - **Maximum Precision Focus**: Aims for the highest precision, for scenarios where false positives are particularly undesirable.
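
The difference between the two variants may be easier to see with numbers. The following is a minimal sketch, not the project's evaluation code: the label names (`O`, `NAME`, `EMAIL`) and both label sequences are hypothetical, and precision is computed here at token level over every token flagged as personal information.

```python
def accuracy(y_true, y_pred):
    """Fraction of tokens whose predicted label matches the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true, y_pred, negative="O"):
    """Of the tokens flagged as personal information, the fraction labeled correctly."""
    flagged = [(t, p) for t, p in zip(y_true, y_pred) if p != negative]
    if not flagged:
        return 0.0
    return sum(t == p for t, p in flagged) / len(flagged)


# Hypothetical reference and predicted labels (not taken from the model card)
y_true = ["O", "NAME", "O", "EMAIL", "O", "O"]
y_pred = ["O", "NAME", "NAME", "O", "O", "O"]

acc = accuracy(y_true, y_pred)    # counts every token, including correct "O" tokens
prec = precision(y_true, y_pred)  # counts only tokens flagged as personal information
print(f"accuracy={acc:.2f} precision={prec:.2f}")
```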

---

## Installation

Install the dependencies needed to run the model (`accelerate` is required for the `device_map='auto'` option used when loading the model):

```bash
pip install torch transformers safetensors accelerate
```

---

## Usage

Load and run the model with PyTorch and `transformers`:

```python
import torch
from transformers import AutoTokenizer, AutoModel
from safetensors.torch import load_file

# Load the tokenizer of the multilingual BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-multilingual-cased")

# Load the model. from_pretrained expects a model directory or Hub repo ID,
# so replace the placeholder path accordingly.
model = AutoModel.from_pretrained('model-path/miniagent.pt', device_map='auto')
# Alternatively, for the precision-focused model
# model = AutoModel.from_pretrained('model-path/miniagent_precision', device_map='auto')
model.eval()

# Example input
text = "Your sensitive information string"

# Tokenize, move the tensors to the model's device, and run inference
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(**inputs)

# Process outputs for analysis...
```
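
The snippet above leaves output processing open. One possible approach is sketched below, under the assumption that the model yields one score per label for each token, as a token-classification head would. The `decode_spans` helper, the `id2label` mapping, the tokens, and the scores are all hypothetical and only illustrate the decoding step.

```python
def decode_spans(tokens, scores, id2label, outside="O"):
    """Pick the top-scoring label per token and merge adjacent tokens sharing a label."""
    spans = []
    previous = outside
    for token, row in zip(tokens, scores):
        # Index of the highest score in this token's row
        best = max(range(len(row)), key=row.__getitem__)
        label = id2label[best]
        if label != outside:
            if label == previous:
                spans[-1][1].append(token)     # extend the current span
            else:
                spans.append((label, [token]))  # start a new span
        previous = label
    return [(label, " ".join(words)) for label, words in spans]


# Hypothetical label set and scores (not the model's real labels or outputs)
id2label = {0: "O", 1: "NAME", 2: "EMAIL"}
tokens = ["Contact", "Ada", "Lovelace", "at", "ada@example.org"]
scores = [
    [2.1, 0.3, 0.1],
    [0.2, 3.0, 0.1],
    [0.1, 2.5, 0.4],
    [1.9, 0.2, 0.3],
    [0.3, 0.1, 2.8],
]

spans = decode_spans(tokens, scores, id2label)
print(spans)
```

With a real checkpoint, the scores would have to come from the model's per-token output and the tokens from the tokenizer, and WordPiece sub-tokens would need to be merged back into words before reporting spans.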

---

## Evaluation

- **Accuracy Model**: Tuned to minimize overall labeling errors, targeting the highest accuracy.
- **Precision Model**: Tuned to minimize false positives, for applications where precision matters most.

---