---
license: mit
datasets:
- ai4privacy/pii-masking-400k
language:
- en
- de
- fr
- it
- es
- nl
base_model:
- iiiorg/piiranha-v1-detect-personal-information
tags:
- NeuralWave
- Hackathon
---

## Overview

This model aims to improve the precision and accuracy of personal-information detection by using a smaller label set than its base model, `iiiorg/piiranha-v1-detect-personal-information`. The reduced label set is intended to make labeling more precise when identifying personal information across multiple languages (English, German, French, Italian, Spanish, and Dutch).

---

## Features

- **Improved Precision**: Reducing the label set relative to the base model is intended to make labeling more precise, so that sensitive information is identified more reliably.
- **Model Versions**:
  - **Maximum Accuracy Focus**: Aims for the highest possible overall accuracy, making it suitable for applications where minimizing errors of any kind is crucial.
  - **Maximum Precision Focus**: Maximizes detection precision, making it suitable for scenarios where false positives are particularly undesirable.

---

## Installation

To run this model, install the dependencies (`accelerate` is needed for `device_map='auto'`):

```bash
pip install torch transformers safetensors accelerate
```

---

## Usage

Load and run the model with PyTorch and Transformers:

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-multilingual-cased")

# Load the accuracy-focused model together with its token-classification head
model = AutoModelForTokenClassification.from_pretrained('model-path/miniagent.pt', device_map='auto')
# Alternatively, for the precision-focused model
# model = AutoModelForTokenClassification.from_pretrained('model-path/miniagent_precision', device_map='auto')

# Example input
text = "Your sensitive information string"

# Tokenize and move the inputs to the device the model was placed on
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Map each token's highest-scoring class ID to its label name
predicted_ids = outputs.logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label_id in zip(tokens, predicted_ids):
    print(token, model.config.id2label[label_id])
```

---

## Evaluation

- **Accuracy Model**: Optimized to minimize labeling errors overall, targeting the highest accuracy.
- **Precision Model**: Optimized to minimize false positives, for precision-driven applications.

---
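To make the difference between the accuracy and precision targets concrete, here is a minimal sketch of token-level accuracy and PII precision in plain Python. It assumes the common convention that `"O"` marks tokens that are not personal information; the label names and the gold/predicted sequences are hypothetical toy data, not outputs or results of either model, and these metric definitions are illustrative rather than the ones used to evaluate the models.

```python
from typing import Sequence

OUTSIDE = "O"  # assumed label for tokens that are not personal information


def token_accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of tokens whose predicted label equals the gold label, OUTSIDE tokens included."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


def pii_precision(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Of the tokens predicted as PII (label != OUTSIDE), the fraction whose label is correct."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have equal length")
    flagged = [(g, p) for g, p in zip(gold, pred) if p != OUTSIDE]
    if not flagged:
        return 0.0  # convention chosen here: no PII predictions yields precision 0
    return sum(g == p for g, p in flagged) / len(flagged)


# Hypothetical gold labels for a six-token sentence
gold = ["O", "GIVENNAME", "SURNAME", "O", "CITY", "O"]
# "Eager" tagger: finds the city but also flags a non-PII token (a false positive)
pred_eager = ["O", "GIVENNAME", "SURNAME", "CITY", "CITY", "O"]
# "Cautious" tagger: misses the city but raises no false positives
pred_cautious = ["O", "GIVENNAME", "SURNAME", "O", "O", "O"]

for name, pred in [("eager", pred_eager), ("cautious", pred_cautious)]:
    print(f"{name}: accuracy={token_accuracy(gold, pred):.3f} "
          f"precision={pii_precision(gold, pred):.3f}")
```

Both toy predictions get 5 of 6 tokens right, so their accuracy is identical. The eager one loses precision (3 of its 4 PII flags are correct) because of its false positive, while the cautious one keeps precision at 1.0 despite missing a real entity, which is the behavior a precision-focused variant is described as favoring.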