---
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
---
# Multi-Task Address Reasoning Model v1.0
A multi-task fine-tuned model specialized for address correction, component extraction, and geographic Q&A with Chain of Thought reasoning. Built on Llama-3.2-1B-Instruct and fine-tuned with LoRA using Unsloth.
## Model Description
Multi-task Llama-3.2-1B model fine-tuned with LoRA for Indian address correction, component extraction, and geographic Q&A using Chain of Thought reasoning.
### Key Capabilities
- Address Correction: Fix spelling errors, formatting issues, and incomplete addresses
- Component Extraction: Extract and structure address components (building, locality, city, state, pincode)
- Geographic Q&A: Answer questions about locations, states, cities, and geographic relationships
- Chain of Thought Reasoning: Detailed step-by-step reasoning for address analysis
- Multi-Task Learning: Single model handles multiple address-related tasks
## Model Architecture & Training
- Base Model: unsloth/Llama-3.2-1B-Instruct
- Fine-tuning Method: LoRA (Low-Rank Adaptation) via Unsloth
- LoRA Rank (r): 64
- LoRA Alpha: 128
- LoRA Dropout: 0.1
- Target Modules: q_proj, o_proj, k_proj, up_proj, v_proj, down_proj, gate_proj
- Model Size: ~276MB (adapter only)
- Checkpoint: 435
- Max Sequence Length: 1024 tokens (auto-optimized from sequence analysis)
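
For reference, the adapter settings above map onto a `peft` `LoraConfig` roughly as shown below. This is a minimal sketch, not the published training script; `bias` and `task_type` are assumptions.

```python
from peft import LoraConfig

# Illustrative LoRA configuration matching the hyperparameters listed above.
lora_config = LoraConfig(
    r=64,                      # LoRA rank
    lora_alpha=128,            # LoRA alpha (scaling)
    lora_dropout=0.1,          # LoRA dropout
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",               # assumption: not stated on this card
    task_type="CAUSAL_LM",     # assumption: standard for causal LM fine-tuning
)
```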
### Training Configuration
- Learning Rate: 1e-4
- Batch Size: 32 (1 per device × 32 gradient accumulation steps)
- Epochs: 3
- Optimizer: adamw_8bit
- Scheduler: cosine
- Weight Decay: 0.01
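
One way these hyperparameters could be expressed with `transformers` `TrainingArguments` is sketched below. The actual Unsloth training script is not published here, so `output_dir`, `fp16`, and `logging_steps` are assumptions.

```python
from transformers import TrainingArguments

# Illustrative mapping of the training hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="outputs",            # assumption: not specified on this card
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,  # effective batch size of 32
    num_train_epochs=3,
    optim="adamw_8bit",
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    fp16=True,                       # assumption: typical for 1B-scale LoRA runs
    logging_steps=10,                # assumption
)
```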
## Usage Examples
```python
import torch
import json
import warnings
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

warnings.filterwarnings("ignore")

# Load base model and tokenizer (using the actual base model from training)
base_model_name = "unsloth/Llama-3.2-1B-Instruct"  # actual base model used in training
model_name = "shiprocket-ai/multitask-address-reasoning-llama-1B-model"

print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Add pad token if missing
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print("Loading base model...")
# Load base model (non-quantized version, as in the training script)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

print("Loading LoRA adapter...")
model = PeftModel.from_pretrained(base_model, model_name)
print("Model loaded successfully!")


def process_address_with_reasoning(prompt, max_new_tokens=400):
    """Process an address prompt with Chain of Thought reasoning (as trained)."""
    # Tokenize and move inputs to the model device
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Generate with reasoning (matching the parameters used during training tests)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.1,  # low temperature, as used during training-time testing
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            use_cache=True,
        )

    # Decode only the newly generated tokens
    input_length = inputs["input_ids"].shape[1]
    generated_tokens = outputs[0][input_length:]
    response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
    return response.strip()


def fix_address_with_reasoning(address, max_new_tokens=400):
    """Fix an address with detailed Chain of Thought reasoning."""
    messages = [{"role": "user", "content": f"Fix and extract components from: {address}"}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    return process_address_with_reasoning(prompt, max_new_tokens)


def answer_geographic_question(question, max_new_tokens=150):
    """Answer geographic questions about addresses."""
    messages = [{"role": "user", "content": question}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    return process_address_with_reasoning(prompt, max_new_tokens)


def extract_components(address, max_new_tokens=200):
    """Extract address components with reasoning."""
    messages = [{"role": "user", "content": f"Extract all components from this address: {address}"}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    return process_address_with_reasoning(prompt, max_new_tokens)


# Test cases based on the training script examples
print("\nMULTI-TASK ADDRESS MODEL EXAMPLES")
print("=" * 60)
print("Testing Chain of Thought reasoning + Geographic Q&A")
print("Model trained with LoRA r=64, alpha=128 for complex reasoning")
print("=" * 60)

# Test address correction with reasoning (examples from training)
test_addresses = [
    "pandit nagla badi masjid moradabad 244001",
    "sec 14 gurgoan haryana 122001",
    "koramangala bangalor 560095",
    "dlf cyber city gurgaon haryana",
]

print("\nTESTING ADDRESS CORRECTION WITH CHAIN OF THOUGHT:")
print("-" * 50)
for i, test_address in enumerate(test_addresses, 1):
    print(f"\nTest {i}: {test_address}")
    result = fix_address_with_reasoning(test_address)
    print("Chain of Thought Response:")
    print(f"  {result}")
    print("-" * 40)

# Test geographic Q&A (examples from the training script)
qa_tests = [
    "Which state is Mumbai in?",
    "What is the pincode of Bangalore?",
    "Is Delhi a metro city?",
    "What tier city is Pune?",
    "Where is Connaught Place located?",
    "What state does Hyderabad belong to?",
    "Name a city in Karnataka.",
    "What is the postal code for Gurgaon?",
    "Which state is New Delhi in?",  # training example
    "What cities are in Maharashtra?",
]

print("\nTESTING GEOGRAPHIC Q&A:")
print("-" * 50)
for i, question in enumerate(qa_tests[:8], 1):  # test the first 8 questions
    print(f"\nQ{i}: {question}")
    result = answer_geographic_question(question)
    print(f"Answer: {result}")

# Test component extraction
print("\nTESTING COMPONENT EXTRACTION:")
print("-" * 50)
extraction_tests = [
    "Flat 203, Emerald Heights, Sector 15, Gurugram, Haryana 122001",
    "DLF Cyber City, Cyber Hub, Gurgaon, Haryana",
    "Connaught Place, New Delhi, Delhi 110001",
]
for i, test_address in enumerate(extraction_tests, 1):
    print(f"\nExtract {i}: {test_address}")
    result = extract_components(test_address)
    print(f"Components: {result}")

print("\nALL TESTS COMPLETED!")
print("Model demonstrates Chain of Thought reasoning")
print("Geographic knowledge from NER training data")
print("Address correction with detailed analysis")
```
## Training Methodology
The model was trained with a multi-task approach:
### 1. Data Preparation Strategy
- Source: Address NER dataset with structured components (address → corrected_address → extracted_info)
- Multi-task Split: 70% Chain of Thought address correction + 30% geographic Q&A
- Data Augmentation: Expanded the original dataset to roughly 584.8% of its initial size
- Reasoning Integration: Each sample includes step-by-step analytical reasoning
### 2. Chain of Thought Address Correction
- Input: Raw/incomplete addresses with potential errors
- Process: Model analyzes, identifies issues, and explains corrections
- Output: Detailed reasoning + structured JSON with address components
- Examples: Spelling fixes, state inference, component extraction
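
Because the correction output mixes free-form reasoning with a JSON block, it can help to parse the structured part out of the generated text. The helper below is a sketch that assumes the components appear as a single `{...}` object somewhere in the response; it is not part of the model's API.

```python
import json
import re

def extract_json_components(response):
    """Pull the structured JSON object out of a Chain of Thought response.

    Sketch only: assumes the components appear as one {...} block in the
    generated text; returns None if nothing parseable is found.
    """
    match = re.search(r"\{.*\}", response, flags=re.DOTALL)  # widest {...} span
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

# Example, assuming fix_address_with_reasoning from the usage section above:
# components = extract_json_components(fix_address_with_reasoning("sec 14 gurgoan haryana 122001"))
```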
### 3. Geographic Q&A Generation
From each address record's NER data, multiple Q&A pairs were generated:
- State-City relationships: "Which state is Mumbai in?" → "Mumbai is in Maharashtra state."
- Pincode queries: "What is the pincode of Bangalore?" → "The pincode of Bangalore is 560001."
- City tier classification: "Is Delhi a metro city?" → "Yes, Delhi is a metropolitan city."
- Locality mapping: "Where is Connaught Place?" → "Connaught Place is in New Delhi."
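
A minimal sketch of how such pairs could be derived from one NER-style record is shown below. The field names (`city`, `state`, `pincode`, `locality`) are assumptions about the dataset schema, which is not published with this card.

```python
def generate_qa_pairs(record):
    """Turn one NER-style address record into simple geographic Q&A pairs.

    Assumed record shape (hypothetical):
    {"city": "Mumbai", "state": "Maharashtra", "pincode": "400001", "locality": "Andheri"}
    """
    city = record.get("city")
    state = record.get("state")
    pincode = record.get("pincode")
    locality = record.get("locality")

    pairs = []
    if city and state:
        pairs.append((f"Which state is {city} in?", f"{city} is in {state} state."))
    if city and pincode:
        pairs.append((f"What is the pincode of {city}?", f"The pincode of {city} is {pincode}."))
    if locality and city:
        pairs.append((f"Where is {locality} located?", f"{locality} is in {city}."))
    return pairs
```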
### 4. Sequence Optimization
- Dynamic Analysis: Analyzed 1,000+ samples to determine the optimal context length
- Result: 99% of samples fit within 768 tokens
- Context Window: 1024 tokens, chosen to leave headroom for reasoning tasks
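
The length analysis can be reproduced with a simple token-length sweep over a sample of training texts, as in the sketch below; dataset handling is assumed, and only the percentile logic reflects what the card describes.

```python
import numpy as np

def pick_max_seq_length(texts, tokenizer, percentile=99, round_to=256):
    """Pick a context length covering the given percentile of token lengths,
    rounded up to a multiple of `round_to` (e.g. ~768 tokens -> 1024)."""
    lengths = [len(tokenizer(t)["input_ids"]) for t in texts]
    cutoff = int(np.percentile(lengths, percentile))
    return int(np.ceil(cutoff / round_to) * round_to)

# Example: max_seq_length = pick_max_seq_length(sample_texts, tokenizer)
```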
## Training Performance
- Final Training Loss: 0.5506
- Training Runtime: 3701.74 seconds (~1 hour)
- Training Samples/Second: 3.749
- Training Steps/Second: 0.118
- Total Epochs: 3.0
## Supported Tasks
### 1. Address Correction with Reasoning
- Fix spelling errors and formatting issues
- Infer missing components (state, city tier)
- Provide step-by-step reasoning for corrections
### 2. Component Extraction
- Extract building names, localities, cities, states, pincodes
- Structure unstructured address data
- Identify address hierarchy and relationships
### 3. Geographic Q&A
- Answer questions about cities, states, and locations
- Provide geographic knowledge and relationships
- Handle location-based queries
### 4. Address Standardization
- Convert informal addresses to structured format
- Normalize address formats
- Handle various input formats
## Use Cases
### 1. E-commerce & Logistics
- Correct customer addresses during checkout
- Extract delivery components for routing
- Answer location-based customer queries
### 2. Data Processing & Migration
- Clean legacy address databases with reasoning
- Extract structured data from unstructured addresses
- Provide explanations for address corrections
### 3. Customer Support Automation
- Answer geographic questions about locations
- Help customers correct their addresses
- Provide location-based information
### 4. Address Intelligence
- Analyze address patterns and relationships
- Infer missing address components
- Provide geographic context and reasoning
## Prompt Formats
The model works with Llama-3.2 chat format:
### Address Correction

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Fix and extract components from: [address]<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```

### Geographic Q&A

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Which state is [location] in?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```

### Component Extraction

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Extract all components from this address: [address]<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
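
In practice you do not need to assemble these strings by hand: the usage functions above build them with `tokenizer.apply_chat_template`. A minimal sketch (note that the rendered prompt may also include a default system header, depending on the chat template):

```python
messages = [{"role": "user", "content": "Fix and extract components from: sec 14 gurgoan haryana 122001"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # Llama-3.2 chat-formatted prompt, as shown above
```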
## Performance Tips
- Temperature Settings: Use 0.1-0.3 for factual tasks, 0.3-0.5 for reasoning tasks
- Context Management: Keep prompts under 512 tokens for optimal performance
- Batch Processing: Process multiple addresses efficiently with batching (see the sketch after this list)
- Device Placement: Ensure all tensors are on the same device (GPU/CPU)
- Memory Management: Use float16 for memory efficiency
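
For the batch-processing tip, the sketch below shows one way to run left-padded batched generation with the `model` and `tokenizer` loaded in the usage section; it is an illustration, not a tested pipeline.

```python
def fix_addresses_batched(addresses, max_new_tokens=400):
    """Batched address correction; assumes `model` and `tokenizer` from the usage section."""
    tokenizer.padding_side = "left"  # left-pad so generation continues right after each prompt
    prompts = [
        tokenizer.apply_chat_template(
            [{"role": "user", "content": f"Fix and extract components from: {a}"}],
            tokenize=False,
            add_generation_prompt=True,
        )
        for a in addresses
    ]
    inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True, max_length=512)
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    new_tokens = outputs[:, inputs["input_ids"].shape[1]:]
    return [t.strip() for t in tokenizer.batch_decode(new_tokens, skip_special_tokens=True)]
```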
## Limitations
- Model Size: 1B parameters - may have limitations compared to larger models
- Training Data: Based on specific dataset - may not generalize to all address formats
- Geographic Scope: Optimized for Indian addresses and geography
- Reasoning Depth: Chain of thought reasoning may vary in complexity
- Device Compatibility: Requires proper device placement for inference
## Model Files
- `adapter_config.json`: LoRA adapter configuration
- `adapter_model.safetensors`: LoRA adapter weights
- `tokenizer_config.json`: Tokenizer configuration
- `tokenizer.json`: Tokenizer vocabulary and settings
- `special_tokens_map.json`: Special tokens mapping
- `chat_template.jinja`: Chat template for conversations
## Model Updates
- Version: 1.0 (Checkpoint 435)
- Last Updated: 2025-07-08
- Training Framework: Unsloth + LoRA
- Base Model: Llama-3.2-1B-Instruct
## Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{multitask-address-reasoning-model,
  title={Multi-Task Address Reasoning Model},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/shiprocket-ai/multitask-address-reasoning-llama-1B-model}
}
```
## Support & Contact
For questions, issues, or feature requests:
- Open an issue in this repository
- Contact: shiprocket-ai team
- Documentation: See usage examples above
## License
This model is released under the Apache 2.0 License. See LICENSE file for details.
Multi-task address intelligence with reasoning - built by the shiprocket-ai team using Unsloth