---
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
---
# Multi-Task Address Reasoning Model v1.0
A multi-task fine-tuned model specialized for address correction, component extraction, and geographic Q&A with Chain of Thought reasoning. Built on Llama-3.2-1B-Instruct and fine-tuned with LoRA using Unsloth.
## Model Description
Multi-task Llama-3.2-1B model fine-tuned with LoRA for Indian address correction, component extraction, and geographic Q&A using Chain of Thought reasoning.
### Key Capabilities
- Address Correction: Fix spelling errors, formatting issues, and incomplete addresses
- Component Extraction: Extract and structure address components (building, locality, city, state, pincode)
- Geographic Q&A: Answer questions about locations, states, cities, and geographic relationships
- Chain of Thought Reasoning: Detailed step-by-step reasoning for address analysis
- Multi-Task Learning: Single model handles multiple address-related tasks
## Model Architecture & Training
- Base Model: unsloth/Llama-3.2-1B-Instruct
- Fine-tuning Method: LoRA (Low-Rank Adaptation) via Unsloth
- LoRA Rank (r): 64
- LoRA Alpha: 128
- LoRA Dropout: 0.1
- Target Modules: q_proj, o_proj, k_proj, up_proj, v_proj, down_proj, gate_proj
- Model Size: ~276MB (adapter only)
- Checkpoint: 435
- Max Sequence Length: 1024 tokens (auto-optimized from sequence analysis)
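
For reference, the adapter settings above map onto a `peft` `LoraConfig` roughly as shown below. This is a minimal sketch, not the published training script; `bias` and `task_type` are assumptions.

```python
from peft import LoraConfig

# Illustrative LoRA configuration matching the hyperparameters listed above.
lora_config = LoraConfig(
    r=64,                      # LoRA rank
    lora_alpha=128,            # LoRA alpha (scaling)
    lora_dropout=0.1,          # LoRA dropout
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",               # assumption: not stated on this card
    task_type="CAUSAL_LM",     # assumption: standard for causal LM fine-tuning
)
```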
### Training Configuration
- Learning Rate: 1e-4
- Batch Size: 32 (1 per device × 32 gradient accumulation steps)
- Epochs: 3
- Optimizer: adamw_8bit
- Scheduler: cosine
- Weight Decay: 0.01
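
One way these hyperparameters could be expressed with `transformers` `TrainingArguments` is sketched below. The actual Unsloth training script is not published here, so `output_dir`, `fp16`, and `logging_steps` are assumptions.

```python
from transformers import TrainingArguments

# Illustrative mapping of the training hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="outputs",            # assumption: not specified on this card
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,  # effective batch size of 32
    num_train_epochs=3,
    optim="adamw_8bit",
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    fp16=True,                       # assumption: typical for 1B-scale LoRA runs
    logging_steps=10,                # assumption
)
```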
## Usage Examples
```python
import torch
import json
import warnings
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

warnings.filterwarnings("ignore")

# Load base model and tokenizer (using the actual base model from training)
base_model_name = "unsloth/Llama-3.2-1B-Instruct"  # actual base model used in training
model_name = "shiprocket-ai/multitask-address-reasoning-llama-1B-model"

print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Add pad token if missing
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print("Loading base model...")
# Load base model (non-quantized version, as in the training script)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

print("Loading LoRA adapter...")
model = PeftModel.from_pretrained(base_model, model_name)
print("Model loaded successfully!")


def process_address_with_reasoning(prompt, max_new_tokens=400):
    """Process an address prompt with Chain of Thought reasoning (as trained)."""
    # Tokenize and move inputs to the model device
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Generate with reasoning (matching the parameters used during training tests)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.1,  # low temperature, as used during training-time testing
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            use_cache=True,
        )

    # Decode only the newly generated tokens
    input_length = inputs["input_ids"].shape[1]
    generated_tokens = outputs[0][input_length:]
    response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
    return response.strip()


def fix_address_with_reasoning(address, max_new_tokens=400):
    """Fix an address with detailed Chain of Thought reasoning."""
    messages = [{"role": "user", "content": f"Fix and extract components from: {address}"}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    return process_address_with_reasoning(prompt, max_new_tokens)


def answer_geographic_question(question, max_new_tokens=150):
    """Answer geographic questions about addresses."""
    messages = [{"role": "user", "content": question}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    return process_address_with_reasoning(prompt, max_new_tokens)


def extract_components(address, max_new_tokens=200):
    """Extract address components with reasoning."""
    messages = [{"role": "user", "content": f"Extract all components from this address: {address}"}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    return process_address_with_reasoning(prompt, max_new_tokens)


# Test cases based on the training script examples
print("\nMULTI-TASK ADDRESS MODEL EXAMPLES")
print("=" * 60)
print("Testing Chain of Thought reasoning + Geographic Q&A")
print("Model trained with LoRA r=64, alpha=128 for complex reasoning")
print("=" * 60)

# Test address correction with reasoning (examples from training)
test_addresses = [
    "pandit nagla badi masjid moradabad 244001",
    "sec 14 gurgoan haryana 122001",
    "koramangala bangalor 560095",
    "dlf cyber city gurgaon haryana",
]

print("\nTESTING ADDRESS CORRECTION WITH CHAIN OF THOUGHT:")
print("-" * 50)
for i, test_address in enumerate(test_addresses, 1):
    print(f"\nTest {i}: {test_address}")
    result = fix_address_with_reasoning(test_address)
    print("Chain of Thought Response:")
    print(f"  {result}")
    print("-" * 40)

# Test geographic Q&A (examples from the training script)
qa_tests = [
    "Which state is Mumbai in?",
    "What is the pincode of Bangalore?",
    "Is Delhi a metro city?",
    "What tier city is Pune?",
    "Where is Connaught Place located?",
    "What state does Hyderabad belong to?",
    "Name a city in Karnataka.",
    "What is the postal code for Gurgaon?",
    "Which state is New Delhi in?",  # training example
    "What cities are in Maharashtra?",
]

print("\nTESTING GEOGRAPHIC Q&A:")
print("-" * 50)
for i, question in enumerate(qa_tests[:8], 1):  # test the first 8 questions
    print(f"\nQ{i}: {question}")
    result = answer_geographic_question(question)
    print(f"Answer: {result}")

# Test component extraction
print("\nTESTING COMPONENT EXTRACTION:")
print("-" * 50)
extraction_tests = [
    "Flat 203, Emerald Heights, Sector 15, Gurugram, Haryana 122001",
    "DLF Cyber City, Cyber Hub, Gurgaon, Haryana",
    "Connaught Place, New Delhi, Delhi 110001",
]
for i, test_address in enumerate(extraction_tests, 1):
    print(f"\nExtract {i}: {test_address}")
    result = extract_components(test_address)
    print(f"Components: {result}")

print("\nALL TESTS COMPLETED!")
print("Model demonstrates Chain of Thought reasoning")
print("Geographic knowledge from NER training data")
print("Address correction with detailed analysis")
```
## Training Methodology
The model was trained with a multi-task approach:
### 1. Data Preparation Strategy
- Source: Address NER dataset with structured components (address → corrected_address → extracted_info)
- Multi-task Split: 70% Chain of Thought address correction + 30% geographic Q&A
- Data Augmentation: Expanded the original dataset to roughly 584.8% of its initial size
- Reasoning Integration: Each sample includes step-by-step analytical reasoning
### 2. Chain of Thought Address Correction
- Input: Raw/incomplete addresses with potential errors
- Process: Model analyzes, identifies issues, and explains corrections
- Output: Detailed reasoning + structured JSON with address components
- Examples: Spelling fixes, state inference, component extraction
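
Because the correction output mixes free-form reasoning with a JSON block, it can help to parse the structured part out of the generated text. The helper below is a sketch that assumes the components appear as a single `{...}` object somewhere in the response; it is not part of the model's API.

```python
import json
import re

def extract_json_components(response):
    """Pull the structured JSON object out of a Chain of Thought response.

    Sketch only: assumes the components appear as one {...} block in the
    generated text; returns None if nothing parseable is found.
    """
    match = re.search(r"\{.*\}", response, flags=re.DOTALL)  # widest {...} span
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

# Example, assuming fix_address_with_reasoning from the usage section above:
# components = extract_json_components(fix_address_with_reasoning("sec 14 gurgoan haryana 122001"))
```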
### 3. Geographic Q&A Generation
From each address record's NER data, multiple Q&A pairs were generated:
- State-City relationships: "Which state is Mumbai in?" → "Mumbai is in Maharashtra state."
- Pincode queries: "What is the pincode of Bangalore?" → "The pincode of Bangalore is 560001."
- City tier classification: "Is Delhi a metro city?" → "Yes, Delhi is a metropolitan city."
- Locality mapping: "Where is Connaught Place?" → "Connaught Place is in New Delhi."
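
A minimal sketch of how such pairs could be derived from one NER-style record is shown below. The field names (`city`, `state`, `pincode`, `locality`) are assumptions about the dataset schema, which is not published with this card.

```python
def generate_qa_pairs(record):
    """Turn one NER-style address record into simple geographic Q&A pairs.

    Assumed record shape (hypothetical):
    {"city": "Mumbai", "state": "Maharashtra", "pincode": "400001", "locality": "Andheri"}
    """
    city = record.get("city")
    state = record.get("state")
    pincode = record.get("pincode")
    locality = record.get("locality")

    pairs = []
    if city and state:
        pairs.append((f"Which state is {city} in?", f"{city} is in {state} state."))
    if city and pincode:
        pairs.append((f"What is the pincode of {city}?", f"The pincode of {city} is {pincode}."))
    if locality and city:
        pairs.append((f"Where is {locality} located?", f"{locality} is in {city}."))
    return pairs
```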
### 4. Sequence Optimization
- Dynamic Analysis: Analyzed 1,000+ samples to determine the optimal context length
- Result: 99% of samples fit within 768 tokens
- Context Window: 1024 tokens, chosen to leave headroom for reasoning tasks
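
The length analysis can be reproduced with a simple token-length sweep over a sample of training texts, as in the sketch below; dataset handling is assumed, and only the percentile logic reflects what the card describes.

```python
import numpy as np

def pick_max_seq_length(texts, tokenizer, percentile=99, round_to=256):
    """Pick a context length covering the given percentile of token lengths,
    rounded up to a multiple of `round_to` (e.g. ~768 tokens -> 1024)."""
    lengths = [len(tokenizer(t)["input_ids"]) for t in texts]
    cutoff = int(np.percentile(lengths, percentile))
    return int(np.ceil(cutoff / round_to) * round_to)

# Example: max_seq_length = pick_max_seq_length(sample_texts, tokenizer)
```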
## Training Performance
- Final Training Loss: 0.5506
- Training Runtime: 3701.74 seconds (~1 hour)
- Training Samples/Second: 3.749
- Training Steps/Second: 0.118
- Total Epochs: 3.0
## Supported Tasks
### 1. Address Correction with Reasoning
- Fix spelling errors and formatting issues
- Infer missing components (state, city tier)
- Provide step-by-step reasoning for corrections
### 2. Component Extraction
- Extract building names, localities, cities, states, pincodes
- Structure unstructured address data
- Identify address hierarchy and relationships
### 3. Geographic Q&A
- Answer questions about cities, states, and locations
- Provide geographic knowledge and relationships
- Handle location-based queries
### 4. Address Standardization
- Convert informal addresses to structured format
- Normalize address formats
- Handle various input formats
## Use Cases
### 1. E-commerce & Logistics
- Correct customer addresses during checkout
- Extract delivery components for routing
- Answer location-based customer queries
### 2. Data Processing & Migration
- Clean legacy address databases with reasoning
- Extract structured data from unstructured addresses
- Provide explanations for address corrections
### 3. Customer Support Automation
- Answer geographic questions about locations
- Help customers correct their addresses
- Provide location-based information
### 4. Address Intelligence
- Analyze address patterns and relationships
- Infer missing address components
- Provide geographic context and reasoning
## Prompt Formats
The model works with Llama-3.2 chat format:
### Address Correction

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Fix and extract components from: [address]<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```

### Geographic Q&A

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Which state is [location] in?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```

### Component Extraction

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Extract all components from this address: [address]<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
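
In practice you do not need to assemble these strings by hand: the usage functions above build them with `tokenizer.apply_chat_template`. A minimal sketch (note that the rendered prompt may also include a default system header, depending on the chat template):

```python
messages = [{"role": "user", "content": "Fix and extract components from: sec 14 gurgoan haryana 122001"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # Llama-3.2 chat-formatted prompt, as shown above
```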
## Performance Tips
- Temperature Settings: Use 0.1-0.3 for factual tasks, 0.3-0.5 for reasoning tasks
- Context Management: Keep prompts under 512 tokens for optimal performance
- Batch Processing: Process multiple addresses efficiently with batching (see the sketch after this list)
- Device Placement: Ensure all tensors are on the same device (GPU/CPU)
- Memory Management: Use float16 for memory efficiency
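
For the batch-processing tip, the sketch below shows one way to run left-padded batched generation with the `model` and `tokenizer` loaded in the usage section; it is an illustration, not a tested pipeline.

```python
def fix_addresses_batched(addresses, max_new_tokens=400):
    """Batched address correction; assumes `model` and `tokenizer` from the usage section."""
    tokenizer.padding_side = "left"  # left-pad so generation continues right after each prompt
    prompts = [
        tokenizer.apply_chat_template(
            [{"role": "user", "content": f"Fix and extract components from: {a}"}],
            tokenize=False,
            add_generation_prompt=True,
        )
        for a in addresses
    ]
    inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True, max_length=512)
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    new_tokens = outputs[:, inputs["input_ids"].shape[1]:]
    return [t.strip() for t in tokenizer.batch_decode(new_tokens, skip_special_tokens=True)]
```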
## Limitations
- Model Size: 1B parameters - may have limitations compared to larger models
- Training Data: Based on specific dataset - may not generalize to all address formats
- Geographic Scope: Optimized for Indian addresses and geography
- Reasoning Depth: Chain of thought reasoning may vary in complexity
- Device Compatibility: Requires proper device placement for inference
## Model Files
- `adapter_config.json`: LoRA adapter configuration
- `adapter_model.safetensors`: LoRA adapter weights
- `tokenizer_config.json`: Tokenizer configuration
- `tokenizer.json`: Tokenizer vocabulary and settings
- `special_tokens_map.json`: Special tokens mapping
- `chat_template.jinja`: Chat template for conversations
## Model Updates
- Version: 1.0 (Checkpoint 435)
- Last Updated: 2025-07-08
- Training Framework: Unsloth + LoRA
- Base Model: Llama-3.2-1B-Instruct
## Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{multitask-address-reasoning-model,
  title={Multi-Task Address Reasoning Model},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/shiprocket-ai/multitask-address-reasoning-llama-1B-model}
}
```
## Support & Contact
For questions, issues, or feature requests:
- Open an issue in this repository
- Contact: shiprocket-ai team
- Documentation: See usage examples above
## License
This model is released under the Apache 2.0 License. See LICENSE file for details.
Multi-task address intelligence with reasoning - built by the shiprocket-ai team using Unsloth