---
license: cc-by-nc-4.0
datasets:
- ai4privacy/pii-masking-400k
language:
- en
- de
- fr
- it
- es
- nl
base_model:
- iiiorg/piiranha-v1-detect-personal-information
tags:
- NeuralWave
- Hackathon
---
## Overview
This model detects personal information (PII) in text using a smaller label set than its base model, `iiiorg/piiranha-v1-detect-personal-information`. Reducing the label set is intended to improve labeling precision and accuracy across the supported languages (English, German, French, Italian, Spanish, and Dutch).
---
## Features
- **Improved Precision**: By reducing the label set size from the base model, the model enhances the precision of the labeling procedure, ensuring more reliable identification of sensitive information.
- **Model Versions**:
  - **Maximum Accuracy Focus**: This version aims to achieve the highest possible accuracy in the detection process, making it suitable for applications where minimizing errors is crucial.
  - **Maximum Precision Focus**: This variant is designed to maximize the precision of the detection, ideal for scenarios where false positives are particularly undesirable.
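
To illustrate what a reduced label set means in practice, the sketch below collapses fine-grained labels into coarser ones. The label names and the `REDUCED` mapping are hypothetical and chosen only for illustration; the actual reduced label set of this model is defined by its own label mapper.

```python
# Hypothetical mapping from fine-grained labels to a coarser label set.
# These names are assumptions, not the model's actual labels.
REDUCED = {
    "GIVENNAME": "NAME",
    "SURNAME": "NAME",
    "STREET": "ADDRESS",
    "CITY": "ADDRESS",
    "ZIPCODE": "ADDRESS",
}


def reduce_label(label: str) -> str:
    """Map a (possibly prefixed) label such as 'I-GIVENNAME' to its reduced form."""
    if label == "O":
        return "O"
    # Split off an optional 'B-' / 'I-' style prefix.
    prefix, _, base = label.rpartition("-")
    reduced = REDUCED.get(base, base)  # labels without a mapping are kept as-is
    return f"{prefix}-{reduced}" if prefix else reduced


print(reduce_label("I-GIVENNAME"))
print(reduce_label("I-EMAIL"))
```

Merging labels that are easily confused with one another removes that class of error from the predictions, which is one plausible way a smaller label set can raise precision.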
---
## Installation
To run this model, you will need to install the dependencies:
```bash
pip install torch transformers safetensors
```
---
## Usage
Load and run the model using PyTorch and transformers:
```python
import json

from safetensors.torch import load_file
from transformers import AutoConfig, AutoModelForTokenClassification, BertTokenizerFast


class LabelMapper:
    """Minimal stand-in container for the label mappings.

    Assumption: the original LabelMapper class is not shown in this card,
    so a plain attribute holder is used here.
    """


# Load the config
config = AutoConfig.from_pretrained("folder_to_model")
# Initialize the model with the config
model = AutoModelForTokenClassification.from_config(config)
# Load the safetensors weights
state_dict = load_file("folder_to_tensors")
# Load the state dict into the model
model.load_state_dict(state_dict)
model.eval()  # switch to inference mode (disables dropout)
# Load the tokenizer
tokenizer = BertTokenizerFast.from_pretrained("google-bert/bert-base-multilingual-cased")
# Load the label mapper if needed
with open("pii_model/label_mapper.json", "r") as f:
    label_mapper_data = json.load(f)
label_mapper = LabelMapper()
label_mapper.label_to_id = label_mapper_data["label_to_id"]
# JSON object keys are strings, so convert the ids back to integers
label_mapper.id_to_label = {int(k): v for k, v in label_mapper_data["id_to_label"].items()}
label_mapper.num_labels = label_mapper_data["num_labels"]
# Process outputs for analysis...
```
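
The output-processing step is left open above. As a minimal sketch, the helper below groups per-token predictions into labeled text spans. It assumes you already have `pred_ids` (for example from `logits.argmax(-1)`) and character `offsets` (for example from calling the tokenizer with `return_offsets_mapping=True`). The function name `group_entities` and the sample labels are hypothetical and not part of this repository.

```python
from typing import Dict, List, Tuple


def group_entities(
    text: str,
    offsets: List[Tuple[int, int]],
    pred_ids: List[int],
    id_to_label: Dict[int, str],
) -> List[Tuple[str, str]]:
    """Merge adjacent tokens that share a label into (label, text) spans."""
    spans = []      # finished (label, start, end) triples
    current = None  # span being built, or None
    for (start, end), pid in zip(offsets, pred_ids):
        if start == end:  # special tokens carry empty offsets
            continue
        label = id_to_label[pid]
        if label == "O":
            if current:
                spans.append(current)
                current = None
            continue
        base = label.split("-", 1)[-1]  # drop an optional 'B-' / 'I-' prefix
        # Extend the open span if the label matches and the tokens are
        # adjacent or separated by a single character (e.g. a space).
        if current and current[0] == base and start - current[2] <= 1:
            current = (base, current[1], end)
        else:
            if current:
                spans.append(current)
            current = (base, start, end)
    if current:
        spans.append(current)
    return [(label, text[s:e]) for label, s, e in spans]


# Toy inputs standing in for real tokenizer and model outputs.
text = "Email John Smith now"
offsets = [(0, 0), (0, 5), (6, 10), (11, 16), (17, 20), (0, 0)]
pred_ids = [0, 0, 1, 1, 0, 0]
id_to_label = {0: "O", 1: "I-NAME"}
print(group_entities(text, offsets, pred_ids, id_to_label))
```

With the loaded label mapper, `label_mapper.id_to_label` can be passed as `id_to_label`.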
---
## Evaluation
- **Accuracy Model**: Tuned to minimize overall labeling errors, targeting the highest accuracy.
- **Precision Model**: Designed to minimize false positives, optimizing for precision-driven applications.
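
The difference between the two objectives can be made concrete with token-level metrics. The sketch below is an assumption about how such metrics could be computed; this card does not state the evaluation procedure actually used, and the label sequences are made-up toy data.

```python
from typing import List


def token_accuracy(true: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def pii_precision(true: List[str], pred: List[str], outside: str = "O") -> float:
    """Among tokens predicted as PII, the fraction labeled correctly."""
    predicted_pii = [(t, p) for t, p in zip(true, pred) if p != outside]
    if not predicted_pii:
        return 0.0  # no PII predicted, so precision is reported as 0 here
    return sum(t == p for t, p in predicted_pii) / len(predicted_pii)


# Toy label sequences: one missed NAME token and one false positive.
true = ["O", "NAME", "NAME", "O", "EMAIL"]
pred = ["O", "NAME", "O", "NAME", "EMAIL"]
print(token_accuracy(true, pred))
print(pii_precision(true, pred))
```

A missed PII token lowers accuracy but not precision, while a false positive lowers both, which is why a precision-focused variant suits settings where false positives are the main concern.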
---
## Disclaimer
The publisher of this repository is not affiliated with Ai4Privacy or Ai Suisse SA.
## Honorary Mention
This repository was created during the hackathon organized by [NeuralWave](https://neuralwave.ch/#/).