---
license: mit
---
Special Thanks:
Thanks to NationTech.io and Cherry Republic for sponsoring this work.


![image/png](https://cdn-uploads.huggingface.co/production/uploads/6328952f798f8d122ce62a44/qZMLiG0-9A0AbtYCGNeev.png)

Supported Tasks and Leaderboards

Main Function:

Address Parsing: Extracting structured address components (e.g., street, city, postal code) from unstructured text.
Evaluation: Medium quality. The model struggles with complex extractions that involve misspellings and duplicates.

Sub Functions:

Named Entity Recognition (NER): Identifying and classifying entities within text, including personal names, organizations, locations, and other categories.

Data Anonymization: Recognizing and extracting personally identifiable information (PII) in text data.

Domain Categorization: Extracting domain information and document types.

And more!
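
To illustrate the data anonymization sub function, here is a minimal sketch of how extracted entities might be used to redact a text. The `redact` helper, the entity labels, and the sample sentence are all hypothetical and are not taken from the dataset.

```python
def redact(text, entities):
    """Replace each extracted entity value in `text` with a bracketed label.

    `entities` maps a label (hypothetical names such as "person") to the
    exact substring that was extracted from the text.
    """
    redacted = text
    # Replace longer values first so a short value nested inside a longer
    # one does not break the longer match.
    for label, value in sorted(entities.items(), key=lambda kv: len(kv[1]), reverse=True):
        if value:  # skip empty extractions
            redacted = redacted.replace(value, f"[{label.upper()}]")
    return redacted


# Made-up input and entities, for illustration only.
text = "Contact Jane Doe at 123 Main St, Springfield."
entities = {"person": "Jane Doe", "street": "123 Main St", "city": "Springfield"}
result = redact(text, entities)
print(result)
```

Plain substring replacement only works when the extracted value matches the source text exactly, so misspelled or duplicated entities would likely need fuzzier matching.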


Languages:

The dataset primarily contains text in English but includes other languages due to the diversity of sources.

Dataset Structure

Data Instances

Each data instance consists of three main components:

System Message: Instructions provided to the assistant (model) for the task.

User Input: The textual content containing addresses or entities to be parsed.

Assistant Response: The assistant's output, providing the extracted address components or entities in JSON format.
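
As a rough sketch of how the three components fit together, the snippet below builds one instance and decodes the assistant response. The field names (`system`, `user`, `assistant`), the JSON keys, and the sample values are assumptions made for illustration; the dataset's actual column names and output schema may differ.

```python
import json

# Hypothetical instance; field names and contents are illustrative only.
instance = {
    "system": "Extract the address components from the user's text and reply in JSON.",
    "user": "Please ship the order to 123 Main St, Springfield, IL 62704.",
    "assistant": (
        '{"street": "123 Main St", "city": "Springfield", '
        '"state": "IL", "postal_code": "62704"}'
    ),
}


def parse_response(record):
    """Decode the assistant's JSON response; return None if it is missing or invalid."""
    try:
        return json.loads(record["assistant"])
    except (KeyError, json.JSONDecodeError):
        return None


parsed = parse_response(instance)
print(parsed["city"])
```

Returning `None` for malformed responses makes it easy to filter out instances whose assistant output is not valid JSON before training or evaluation.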

Example:


![image/png](https://cdn-uploads.huggingface.co/production/uploads/6328952f798f8d122ce62a44/dXm6gMNHBMEVGUnxBFCS0.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6328952f798f8d122ce62a44/UTwkuoL9bX0QuFYEwnEom.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6328952f798f8d122ce62a44/DSDVD1naT_1yZb9Pz3733.png)