---
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
---

# Multi-Task Address Reasoning Model v1.0

This model is a **multi-task fine-tuned model** specialized for **address correction, component extraction, and geographic Q&A** with **Chain of Thought reasoning**. It is built on Llama-3.2-1B-Instruct with LoRA fine-tuning using Unsloth.

## Model Description

A multi-task Llama-3.2-1B model fine-tuned with LoRA for Indian address correction, component extraction, and geographic Q&A using Chain of Thought reasoning.

### Key Capabilities

- **Address Correction**: Fix spelling errors, formatting issues, and incomplete addresses
- **Component Extraction**: Extract and structure address components (building, locality, city, state, pincode)
- **Geographic Q&A**: Answer questions about locations, states, cities, and geographic relationships
- **Chain of Thought Reasoning**: Detailed step-by-step reasoning for address analysis
- **Multi-Task Learning**: A single model handles multiple address-related tasks

## Model Architecture & Training

- **Base Model**: unsloth/Llama-3.2-1B-Instruct
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation) via Unsloth (see the configuration sketch below)
- **LoRA Rank (r)**: 64
- **LoRA Alpha**: 128
- **LoRA Dropout**: 0.1
- **Target Modules**: q_proj, o_proj, k_proj, up_proj, v_proj, down_proj, gate_proj
- **Model Size**: ~276MB (adapter only)
- **Checkpoint**: 435
- **Max Sequence Length**: 1024 tokens (auto-optimized from sequence analysis)

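For reference, the hyperparameters above correspond roughly to a PEFT `LoraConfig` like the one below. This is a minimal sketch for readers who want to reproduce a similar setup; the actual run used Unsloth's wrapper, and anything not listed above (bias, task type) is an assumption.

```python
from peft import LoraConfig

# Hedged sketch: LoRA values taken from the list above; bias and task_type are assumptions.
lora_config = LoraConfig(
    r=64,                      # LoRA rank
    lora_alpha=128,            # scaling factor
    lora_dropout=0.1,
    bias="none",               # assumption
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```
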
### Training Configuration

- **Learning Rate**: 1e-4
- **Batch Size**: 32 (1 per device × 32 gradient accumulation steps)
- **Epochs**: 3
- **Optimizer**: adamw_8bit
- **Scheduler**: cosine
- **Weight Decay**: 0.01

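The settings above map onto standard `transformers` training arguments roughly as sketched below. Treat this as an illustration only: the output directory, logging, and precision flags are placeholders, not values taken from the original training script.

```python
from transformers import TrainingArguments

# Hedged sketch of the listed hyperparameters; paths and logging values are placeholders.
training_args = TrainingArguments(
    output_dir="outputs",              # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,    # effective batch size of 32
    num_train_epochs=3,
    optim="adamw_8bit",
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    fp16=True,                         # assumption, matching the float16 inference example
    logging_steps=10,                  # placeholder
)
```
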
## Usage Examples

```python
import torch
import warnings
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

warnings.filterwarnings("ignore")

# Base model and adapter repo (the base model actually used during training)
base_model_name = "unsloth/Llama-3.2-1B-Instruct"
model_name = "shiprocket-ai/multitask-address-reasoning-llama-1B-model"

print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Add a pad token if missing
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print("Loading base model...")
# Load the base model (non-quantized, as in the training script)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

print("Loading LoRA adapter...")
model = PeftModel.from_pretrained(base_model, model_name)
print("Model loaded successfully!")


def process_address_with_reasoning(prompt, max_new_tokens=400):
    """Run generation with Chain of Thought reasoning (matching training-time settings)."""
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)

    # Move inputs to the model's device
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.1,  # low temperature, as used when testing during training
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            use_cache=True,
        )

    # Decode only the newly generated tokens
    input_length = inputs["input_ids"].shape[1]
    generated_tokens = outputs[0][input_length:]
    return tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()


def fix_address_with_reasoning(address, max_new_tokens=400):
    """Fix an address with detailed Chain of Thought reasoning."""
    messages = [{"role": "user", "content": f"Fix and extract components from: {address}"}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    return process_address_with_reasoning(prompt, max_new_tokens)


def answer_geographic_question(question, max_new_tokens=150):
    """Answer geographic questions about locations."""
    messages = [{"role": "user", "content": question}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    return process_address_with_reasoning(prompt, max_new_tokens)


def extract_components(address, max_new_tokens=200):
    """Extract address components with reasoning."""
    messages = [{"role": "user", "content": f"Extract all components from this address: {address}"}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    return process_address_with_reasoning(prompt, max_new_tokens)


# Test cases based on the training script examples
print("\nMULTI-TASK ADDRESS MODEL EXAMPLES")
print("=" * 60)
print("Testing Chain of Thought reasoning + Geographic Q&A")
print("Model trained with LoRA r=64, alpha=128 for complex reasoning")
print("=" * 60)

# Address correction with reasoning (examples from training)
test_addresses = [
    "pandit nagla badi masjid moradabad 244001",
    "sec 14 gurgoan haryana 122001",
    "koramangala bangalor 560095",
    "dlf cyber city gurgaon haryana",
]

print("\nTESTING ADDRESS CORRECTION WITH CHAIN OF THOUGHT:")
print("-" * 50)
for i, test_address in enumerate(test_addresses, 1):
    print(f"\nTest {i}: {test_address}")
    result = fix_address_with_reasoning(test_address)
    print("Chain of Thought response:")
    print(f"   {result}")
    print("-" * 40)

# Geographic Q&A (examples from the training script)
qa_tests = [
    "Which state is Mumbai in?",
    "What is the pincode of Bangalore?",
    "Is Delhi a metro city?",
    "What tier city is Pune?",
    "Where is Connaught Place located?",
    "What state does Hyderabad belong to?",
    "Name a city in Karnataka.",
    "What is the postal code for Gurgaon?",
    "Which state is New Delhi in?",  # training example
    "What cities are in Maharashtra?",
]

print("\nTESTING GEOGRAPHIC Q&A:")
print("-" * 50)
for i, question in enumerate(qa_tests[:8], 1):  # test the first 8 questions
    print(f"\nQ{i}: {question}")
    result = answer_geographic_question(question)
    print(f"Answer: {result}")

# Component extraction
extraction_tests = [
    "Flat 203, Emerald Heights, Sector 15, Gurugram, Haryana 122001",
    "DLF Cyber City, Cyber Hub, Gurgaon, Haryana",
    "Connaught Place, New Delhi, Delhi 110001",
]

print("\nTESTING COMPONENT EXTRACTION:")
print("-" * 50)
for i, test_address in enumerate(extraction_tests, 1):
    print(f"\nExtract {i}: {test_address}")
    result = extract_components(test_address)
    print(f"Components: {result}")

print("\nALL TESTS COMPLETED!")
print("The model demonstrates Chain of Thought reasoning, geographic knowledge")
print("from the NER training data, and address correction with detailed analysis.")
```

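For deployment, the adapter can optionally be merged into the base weights so the result behaves like a plain `transformers` model. This is a hedged, optional step; it is not part of the published workflow, and the output path below is a placeholder.

```python
# Optional: merge the LoRA weights into the base model for simpler serving.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged-address-model")   # placeholder path
tokenizer.save_pretrained("merged-address-model")
```
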
## Training Methodology

This model was trained using a multi-task approach:

### **1. Data Preparation Strategy**

- **Source**: Address NER dataset with structured components (address → corrected_address → extracted_info)
- **Multi-task Split**: 70% Chain of Thought address correction + 30% Geographic Q&A
- **Data Augmentation**: Expanded the dataset to 584.8% of its original size
- **Reasoning Integration**: Each sample includes step-by-step analytical reasoning

### **2. Chain of Thought Address Correction**

- **Input**: Raw/incomplete addresses with potential errors
- **Process**: Model analyzes, identifies issues, and explains corrections
- **Output**: Detailed reasoning + structured JSON with address components
- **Examples**: Spelling fixes, state inference, component extraction

### **3. Geographic Q&A Generation**

From each address record's NER data, multiple Q&A pairs were generated (a hedged sketch of this step follows the list):

- **State-City relationships**: "Which state is Mumbai in?" → "Mumbai is in Maharashtra state."
- **Pincode queries**: "What is the pincode of Bangalore?" → "The pincode of Bangalore is 560001."
- **City tier classification**: "Is Delhi a metro city?" → "Yes, Delhi is a metropolitan city."
- **Locality mapping**: "Where is Connaught Place?" → "Connaught Place is in New Delhi."

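A minimal sketch of how such Q&A pairs could be derived from a structured record is shown below. The field names (`city`, `state`, `pincode`, `locality`) are assumptions for illustration; the actual dataset schema is not published here.

```python
def make_geo_qa_pairs(record: dict) -> list[dict]:
    """Turn one structured address record into simple geographic Q&A pairs.

    Assumed record shape (illustrative only):
    {"city": "Mumbai", "state": "Maharashtra", "pincode": "400001", "locality": "Andheri"}
    """
    pairs = []
    if record.get("city") and record.get("state"):
        pairs.append({
            "question": f"Which state is {record['city']} in?",
            "answer": f"{record['city']} is in {record['state']} state.",
        })
    if record.get("city") and record.get("pincode"):
        pairs.append({
            "question": f"What is the pincode of {record['city']}?",
            "answer": f"The pincode of {record['city']} is {record['pincode']}.",
        })
    if record.get("locality") and record.get("city"):
        pairs.append({
            "question": f"Where is {record['locality']} located?",
            "answer": f"{record['locality']} is in {record['city']}.",
        })
    return pairs
```
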
### **4. Sequence Optimization**

- **Dynamic Analysis**: Analyzed 1000+ samples to determine the optimal context length
- **Result**: 99% of samples fit within 768 tokens; the limit was rounded up to 1024
- **Context Window**: 1024 tokens, chosen to leave headroom for reasoning tasks

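An analysis of this kind can be reproduced with a short script like the one below. It is a hedged sketch: the candidate lengths and input texts are placeholders, and only the percentile logic reflects the description above.

```python
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-1B-Instruct")

def pick_max_seq_len(texts, percentile=99, candidates=(512, 768, 1024, 2048)):
    """Measure tokenized lengths and return the smallest candidate covering `percentile`% of samples."""
    lengths = [len(tokenizer(t)["input_ids"]) for t in texts]
    cutoff = int(np.percentile(lengths, percentile))
    for c in candidates:
        if c >= cutoff:
            return c, cutoff
    return candidates[-1], cutoff
```
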
## Training Performance

```
Final Training Loss: 0.5506
Training Runtime: 3701.74 seconds (~1 hour)
Training Samples/Second: 3.749
Training Steps/Second: 0.118
Total Epochs: 3.0
```

## Supported Tasks

### 1. **Address Correction with Reasoning**
- Fix spelling errors and formatting issues
- Infer missing components (state, city tier)
- Provide step-by-step reasoning for corrections

### 2. **Component Extraction**
- Extract building names, localities, cities, states, and pincodes
- Structure unstructured address data
- Identify address hierarchy and relationships

### 3. **Geographic Q&A**
- Answer questions about cities, states, and locations
- Provide geographic knowledge and relationships
- Handle location-based queries

### 4. **Address Standardization**
- Convert informal addresses to a structured format
- Normalize address formats
- Handle various input formats

## Use Cases

### 1. **E-commerce & Logistics**
- Correct customer addresses during checkout
- Extract delivery components for routing
- Answer location-based customer queries

### 2. **Data Processing & Migration**
- Clean legacy address databases with reasoning
- Extract structured data from unstructured addresses
- Provide explanations for address corrections

### 3. **Customer Support Automation**
- Answer geographic questions about locations
- Help customers correct their addresses
- Provide location-based information

### 4. **Address Intelligence**
- Analyze address patterns and relationships
- Infer missing address components
- Provide geographic context and reasoning

## Prompt Formats

The model works with the Llama-3.2 chat format:

### Address Correction
```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Fix and extract components from: [address]<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```

### Geographic Q&A
```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Which state is [location] in?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```

### Component Extraction
```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Extract all components from this address: [address]<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```

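These strings do not need to be assembled by hand: `tokenizer.apply_chat_template` produces the equivalent prompt, as in the usage example above. A quick check (note that, depending on the bundled chat template, a default system block may also be prepended):

```python
# Sanity-check the prompt layout produced by the bundled chat template.
messages = [{"role": "user", "content": "Which state is Mumbai in?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # expect <|start_header_id|>user<|end_header_id|> ... ending with the assistant header
```
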
## Performance Tips

1. **Temperature Settings**: Use 0.1-0.3 for factual tasks, 0.3-0.5 for reasoning tasks
2. **Context Management**: Keep prompts under 512 tokens for optimal performance
3. **Batch Processing**: Process multiple addresses efficiently with padded batches (see the sketch below)
4. **Device Placement**: Ensure all tensors are on the same device (GPU/CPU)
5. **Memory Management**: Use float16 for memory efficiency

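A minimal batched-inference sketch, reusing the `tokenizer` and `model` from the usage example above. The padding side and batch size are assumptions, not settings from the original scripts.

```python
def fix_addresses_batch(addresses, max_new_tokens=300, batch_size=8):
    """Run address correction on a list of addresses in padded batches."""
    tokenizer.padding_side = "left"  # left-pad so each row's generation starts right after its prompt
    results = []
    for start in range(0, len(addresses), batch_size):
        chunk = addresses[start:start + batch_size]
        prompts = [
            tokenizer.apply_chat_template(
                [{"role": "user", "content": f"Fix and extract components from: {a}"}],
                tokenize=False,
                add_generation_prompt=True,
            )
            for a in chunk
        ]
        inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True, max_length=512)
        inputs = {k: v.to(next(model.parameters()).device) for k, v in inputs.items()}
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                temperature=0.1,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id,
            )
        new_tokens = outputs[:, inputs["input_ids"].shape[1]:]
        results.extend(tokenizer.batch_decode(new_tokens, skip_special_tokens=True))
    return [r.strip() for r in results]
```
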
## Limitations

- **Model Size**: At 1B parameters, it may fall short of larger models on complex inputs
- **Training Data**: Trained on a specific dataset; may not generalize to all address formats
- **Geographic Scope**: Optimized for Indian addresses and geography
- **Reasoning Depth**: Chain of Thought reasoning may vary in depth and consistency
- **Device Compatibility**: Requires correct device placement for inference

## Model Files

- `adapter_config.json`: LoRA adapter configuration
- `adapter_model.safetensors`: LoRA adapter weights
- `tokenizer_config.json`: Tokenizer configuration
- `tokenizer.json`: Tokenizer vocabulary and settings
- `special_tokens_map.json`: Special tokens mapping
- `chat_template.jinja`: Chat template for conversations

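To inspect these files without loading the model, they can be fetched individually. A small sketch using `huggingface_hub` (the printed keys are standard fields of a PEFT adapter config):

```python
import json
from huggingface_hub import hf_hub_download

# Download only the adapter config and print the LoRA settings it records.
config_path = hf_hub_download(
    repo_id="shiprocket-ai/multitask-address-reasoning-llama-1B-model",
    filename="adapter_config.json",
)
with open(config_path) as f:
    adapter_config = json.load(f)
print(adapter_config.get("r"), adapter_config.get("lora_alpha"), adapter_config.get("target_modules"))
```
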
## Model Updates

- **Version**: 1.0 (Checkpoint 435)
- **Last Updated**: 2025-07-08
- **Training Framework**: Unsloth + LoRA
- **Base Model**: Llama-3.2-1B-Instruct

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{multitask-address-reasoning-model,
  title={Multi-Task Address Reasoning Model},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/shiprocket-ai/multitask-address-reasoning-llama-1B-model}
}
```

## Support & Contact

For questions, issues, or feature requests:

- Open an issue in this repository
- Contact: shiprocket-ai team
- Documentation: See the usage examples above

## License

This model is released under the Apache 2.0 License. See the LICENSE file for details.

---

*Multi-task address intelligence with reasoning - built by the shiprocket-ai team using Unsloth*