---
license: mit
---

## Special Thanks

A special thanks to NationTech.io and to Cherry Republic for sponsoring the work.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6328952f798f8d122ce62a44/qZMLiG0-9A0AbtYCGNeev.png)

## Supported Tasks and Leaderboards

### Main Function

- **Address Parsing:** Extracting structured address components (e.g., street, city, postal code) from unstructured text.
- **Evaluation:** Medium quality. The model struggles with complex extractions that involve misspellings and duplicates.

### Sub Functions

- **Named Entity Recognition (NER):** Identifying and classifying entities within text, including personal names, organizations, locations, and other categories.
- **Data Anonymization:** Recognizing and extracting personally identifiable information (PII) in text data.
- **Domain Categorization:** Extracting domain information and document types.
- And more!

## Languages

The dataset primarily contains text in English but includes other languages due to the diversity of sources.

## Dataset Structure

### Data Instances

Each data instance consists of three main components:

- **System Message:** Instructions provided to the assistant (model) for the task.
- **User Input:** The textual content containing addresses or entities to be parsed.
- **Assistant Response:** The assistant's output, providing the extracted address components or entities in JSON format.

Example:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6328952f798f8d122ce62a44/dXm6gMNHBMEVGUnxBFCS0.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6328952f798f8d122ce62a44/UTwkuoL9bX0QuFYEwnEom.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6328952f798f8d122ce62a44/DSDVD1naT_1yZb9Pz3733.png)
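
The three-part data instance described above can be sketched in Python as follows. This is only an illustration: the key names (`system`, `user`, `assistant`), the sample text, and the address fields are assumptions made for this example, not the dataset's actual schema, and `parse_assistant_response` is a hypothetical helper. The sketch shows one way to check that an assistant response is valid JSON before using it.

```python
import json

# Hypothetical record. The key names and the address fields below are
# assumptions for illustration and may not match the dataset's real columns.
record = {
    "system": "Extract the address components from the text and reply in JSON.",
    "user": "Ship to Jane Doe, 123 Main St, Springfield, IL 62701.",
    "assistant": (
        '{"street": "123 Main St", "city": "Springfield", '
        '"state": "IL", "postal_code": "62701"}'
    ),
}


def parse_assistant_response(text):
    """Return the response as a dict, or None if it is not a JSON object."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    # Only accept JSON objects; lists or bare strings are treated as invalid.
    return parsed if isinstance(parsed, dict) else None


parsed = parse_assistant_response(record["assistant"])
print(parsed["city"])
```

Returning `None` for malformed output, instead of raising, makes it easy to count or filter out responses that fail to parse, which may be useful given the medium extraction quality noted above.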