🧠 Multi-Task Address Reasoning Model v1.0

This model is a multi-task fine-tuned model specialized for address correction, component extraction, and geographic Q&A with Chain of Thought reasoning. Built on Llama-3.2-1B-Instruct with LoRA fine-tuning using Unsloth.

🎯 Model Description

Multi-task Llama-3.2-1B model fine-tuned with LoRA for Indian address correction, component extraction, and geographic Q&A using Chain of Thought reasoning.

Key Capabilities

  • 🔧 Address Correction: Fix spelling errors, formatting issues, and incomplete addresses
  • 📊 Component Extraction: Extract and structure address components (building, locality, city, state, pincode)
  • ❓ Geographic Q&A: Answer questions about locations, states, cities, and geographic relationships
  • 🧠 Chain of Thought Reasoning: Detailed step-by-step reasoning for address analysis
  • 🎯 Multi-Task Learning: Single model handles multiple address-related tasks

📊 Model Architecture & Training

  • Base Model: unsloth/Llama-3.2-1B-Instruct
  • Fine-tuning Method: LoRA (Low-Rank Adaptation) via Unsloth
  • LoRA Rank (r): 64
  • LoRA Alpha: 128
  • LoRA Dropout: 0.1
  • Target Modules: q_proj, o_proj, k_proj, up_proj, v_proj, down_proj, gate_proj
  • Model Size: ~276MB (adapter only)
  • Checkpoint: 435
  • Max Sequence Length: 1024 tokens (auto-optimized from sequence analysis)
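
For orientation, here is a minimal sketch of how these hyperparameters map onto a peft.LoraConfig (illustrative only; the published adapter already encodes them in adapter_config.json, so this is not needed to run the model):

from peft import LoraConfig

# Illustrative LoRA configuration mirroring the values listed above
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",            # assumption: standard LoRA setup without bias adaptation
    task_type="CAUSAL_LM",
)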

Training Configuration

  • Learning Rate: 1e-4
  • Batch Size: 32 (1 per device × 32 gradient accumulation)
  • Epochs: 3
  • Optimizer: adamw_8bit
  • Scheduler: cosine
  • Weight Decay: 0.01
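
These settings correspond roughly to the following transformers.TrainingArguments (a sketch for reproducibility; the exact Unsloth training script is not included in this repository):

from transformers import TrainingArguments

# Illustrative training arguments mirroring the configuration above
training_args = TrainingArguments(
    output_dir="outputs",
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,  # effective batch size of 32
    num_train_epochs=3,
    optim="adamw_8bit",
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    fp16=True,                       # assumption: mixed precision, matching the float16 inference setup
    logging_steps=10,
)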

🚀 Usage Examples

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import warnings
import json
warnings.filterwarnings("ignore")

# Load base model and tokenizer
base_model_name = "unsloth/Llama-3.2-1B-Instruct"  # base model used during training
model_name = "shiprocket-ai/multitask-address-reasoning-llama-1B-model"

print("πŸ“₯ Loading tokenizer...")
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Add pad token if missing
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print("πŸ“₯ Loading base model...")
# Load base model (non-quantized version as per training script)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

print("πŸ“₯ Loading LoRA adapter...")
# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, model_name)

print("βœ… Model loaded successfully!")

def process_address_with_reasoning(prompt, max_new_tokens=400):
    """Process address with Chain of Thought reasoning (as trained)"""

    # Tokenize
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)

    # Move inputs to model device
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Generate with reasoning (matching training parameters)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.1,  # low temperature, as used when testing during training
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            use_cache=True
        )

    # Decode only the new tokens
    input_length = inputs['input_ids'].shape[1]
    generated_tokens = outputs[0][input_length:]
    response = tokenizer.decode(generated_tokens, skip_special_tokens=True)

    return response.strip()

def fix_address_with_reasoning(address, max_new_tokens=400):
    """Fix address with detailed Chain of Thought reasoning"""

    messages = [
        {"role": "user", "content": f"Fix and extract components from: {address}"}
    ]

    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    return process_address_with_reasoning(prompt, max_new_tokens)

def answer_geographic_question(question, max_new_tokens=150):
    """Answer geographic questions about addresses"""

    messages = [
        {"role": "user", "content": question}
    ]

    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    return process_address_with_reasoning(prompt, max_new_tokens)

def extract_components(address, max_new_tokens=200):
    """Extract address components with reasoning"""

    messages = [
        {"role": "user", "content": f"Extract all components from this address: {address}"}
    ]

    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    return process_address_with_reasoning(prompt, max_new_tokens)

# Test cases based on training script examples
print("""
🏠 MULTI-TASK ADDRESS MODEL EXAMPLES""")
print("=" * 60)
print("""🧠 Testing Chain of Thought reasoning + Geographic Q&A""")
print("πŸ“Š Model trained with LoRA r=64, alpha=128 for complex reasoning")
print("=" * 60)

# Test address correction with reasoning (exact example from training)
test_addresses = [
    "pandit nagla badi masjid moradabad 244001",
    "sec 14 gurgoan haryana 122001",
    "koramangala bangalor 560095",
    "dlf cyber city gurgaon haryana"
]

print(f"""
πŸ”§ TESTING ADDRESS CORRECTION WITH CHAIN OF THOUGHT:""")
print("-" * 50)

for i, test_address in enumerate(test_addresses, 1):
    print(f"""
πŸ“ Test {i}: {test_address}""")
    result = fix_address_with_reasoning(test_address)
    print(f"πŸ€– Chain of Thought Response:")
    print(f"   {result}")
    print("-" * 40)

# Test geographic Q&A (examples from training script)
qa_tests = [
    "Which state is Mumbai in?",
    "What is the pincode of Bangalore?",
    "Is Delhi a metro city?",
    "What tier city is Pune?",
    "Where is Connaught Place located?",
    "What state does Hyderabad belong to?",
    "Name a city in Karnataka.",
    "What is the postal code for Gurgaon?",
    "Which state is New Delhi in?",  # Training example
    "What cities are in Maharashtra?"
]

print(f"""
❓ TESTING GEOGRAPHIC Q&A:""")
print("-" * 50)

for i, question in enumerate(qa_tests[:8], 1):  # Test first 8 questions
    print(f"""
❓ Q{i}: {question}""")
    result = answer_geographic_question(question)
    print(f"πŸ€– Answer: {result}")

# Test component extraction
print(f"""
πŸ“Š TESTING COMPONENT EXTRACTION:""")
print("-" * 50)

extraction_tests = [
    "Flat 203, Emerald Heights, Sector 15, Gurugram, Haryana 122001",
    "DLF Cyber City, Cyber Hub, Gurgaon, Haryana",
    "Connaught Place, New Delhi, Delhi 110001"
]

for i, test_address in enumerate(extraction_tests, 1):
    print(f"""
πŸ“Š Extract {i}: {test_address}""")
    result = extract_components(test_address)
    print(f"πŸ€– Components: {result}")

print(f"""
βœ… ALL TESTS COMPLETED!""")
print(f"""🧠 Model demonstrates Chain of Thought reasoning""")
print(f"""πŸ“ Geographic knowledge from NER training data""")
print(f"""πŸ”§ Address correction with detailed analysis""")

🧠 Training Methodology

This model was trained using a sophisticated multi-task approach:

1. Data Preparation Strategy

  • Source: Address NER dataset with structured components (address → corrected_address → extracted_info)
  • Multi-task Split: 70% Chain of Thought address correction + 30% Geographic Q&A
  • Data Augmentation: the augmented training set is roughly 5.8× (584.8%) the size of the original dataset
  • Reasoning Integration: Each sample includes step-by-step analytical reasoning

2. Chain of Thought Address Correction

  • Input: Raw/incomplete addresses with potential errors
  • Process: Model analyzes, identifies issues, and explains corrections
  • Output: Detailed reasoning + structured JSON with address components
  • Examples: Spelling fixes, state inference, component extraction

3. Geographic Q&A Generation

From each address record's NER data, the model generates multiple Q&A pairs:

  • State-City relationships: "Which state is Mumbai in?" → "Mumbai is in Maharashtra state."
  • Pincode queries: "What is the pincode of Bangalore?" → "The pincode of Bangalore is 560001."
  • City tier classification: "Is Delhi a metro city?" → "Yes, Delhi is a metropolitan city."
  • Locality mapping: "Where is Connaught Place?" → "Connaught Place is in New Delhi."
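
A minimal sketch of how such pairs could be generated from a single record (illustrative; the field names extracted_info, city, state, pincode, and locality are assumptions about the dataset schema):

def make_qa_pairs(record):
    """Generate simple geographic Q&A pairs from one extracted address record (illustrative)."""
    info = record.get("extracted_info", {})
    city, state = info.get("city"), info.get("state")
    pincode, locality = info.get("pincode"), info.get("locality")
    pairs = []
    if city and state:
        pairs.append((f"Which state is {city} in?", f"{city} is in {state} state."))
    if city and pincode:
        pairs.append((f"What is the pincode of {city}?", f"The pincode of {city} is {pincode}."))
    if locality and city:
        pairs.append((f"Where is {locality} located?", f"{locality} is in {city}."))
    return pairs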

4. Sequence Optimization

  • Dynamic Analysis: Analyzed 1000+ samples to determine optimal context length
  • Result: 99% of samples fit within 768 tokens; the limit was rounded up to 1024
  • Context Window: 1024 tokens, leaving headroom for longer reasoning-heavy samples
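
A rough sketch of this kind of analysis (illustrative; the texts and tokenizer arguments stand in for the formatted training samples and the model's tokenizer):

def estimate_max_seq_length(texts, tokenizer, percentile=0.99):
    """Pick a max sequence length that covers `percentile` of tokenized samples (illustrative)."""
    lengths = sorted(len(tokenizer(t)["input_ids"]) for t in texts)
    cutoff = lengths[int(percentile * len(lengths)) - 1]
    # Round the percentile cutoff up to a convenient bucket (e.g. 768 -> 1024)
    for bucket in (512, 768, 1024, 2048):
        if cutoff <= bucket:
            return bucket
    return 4096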

🔧 Training Performance

  • Final Training Loss: 0.5506
  • Training Runtime: 3701.74 seconds (~1 hour)
  • Training Samples/Second: 3.749
  • Training Steps/Second: 0.118
  • Total Epochs: 3.0

🎭 Supported Tasks

1. Address Correction with Reasoning

  • Fix spelling errors and formatting issues
  • Infer missing components (state, city tier)
  • Provide step-by-step reasoning for corrections

2. Component Extraction

  • Extract building names, localities, cities, states, pincodes
  • Structure unstructured address data
  • Identify address hierarchy and relationships

3. Geographic Q&A

  • Answer questions about cities, states, and locations
  • Provide geographic knowledge and relationships
  • Handle location-based queries

4. Address Standardization

  • Convert informal addresses to structured format
  • Normalize address formats
  • Handle various input formats

💡 Use Cases

1. E-commerce & Logistics

  • Correct customer addresses during checkout
  • Extract delivery components for routing
  • Answer location-based customer queries

2. Data Processing & Migration

  • Clean legacy address databases with reasoning
  • Extract structured data from unstructured addresses
  • Provide explanations for address corrections

3. Customer Support Automation

  • Answer geographic questions about locations
  • Help customers correct their addresses
  • Provide location-based information

4. Address Intelligence

  • Analyze address patterns and relationships
  • Infer missing address components
  • Provide geographic context and reasoning

🎯 Prompt Formats

The model works with Llama-3.2 chat format:

Address Correction

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Fix and extract components from: [address]<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Geographic Q&A

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Which state is [location] in?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Component Extraction

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Extract all components from this address: [address]<|eot_id|><|start_header_id|>assistant<|end_header_id|>
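
If you prefer not to rely on the bundled chat template, an equivalent prompt string can be assembled directly (a sketch that mirrors the formats shown above; tokenizer.apply_chat_template, as used in the usage example, remains the safer default):

def build_prompt(user_message):
    """Build a Llama-3.2 style prompt matching the formats shown above."""
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_prompt("Fix and extract components from: sec 14 gurgoan haryana 122001")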

⚡ Performance Tips

  1. Temperature Settings: Use 0.1-0.3 for factual tasks, 0.3-0.5 for reasoning tasks
  2. Context Management: Keep prompts under 512 tokens for optimal performance
  3. Batch Processing: Process multiple addresses efficiently with batching
  4. Device Placement: Ensure all tensors are on the same device (GPU/CPU)
  5. Memory Management: Use float16 for memory efficiency
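
As a sketch of tip 3, batched generation might look like this (assumes the model and tokenizer loaded in the usage example; decoder-only models need left padding for batched generation):

def fix_addresses_batch(addresses, max_new_tokens=400):
    """Correct a batch of addresses in one generate() call (illustrative)."""
    tokenizer.padding_side = "left"  # pad on the left so generated tokens line up at the end
    prompts = [
        tokenizer.apply_chat_template(
            [{"role": "user", "content": f"Fix and extract components from: {addr}"}],
            tokenize=False,
            add_generation_prompt=True,
        )
        for addr in addresses
    ]
    inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True, max_length=512)
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    new_tokens = outputs[:, inputs["input_ids"].shape[1]:]
    return [tokenizer.decode(t, skip_special_tokens=True).strip() for t in new_tokens]

corrected = fix_addresses_batch(["sec 14 gurgoan haryana 122001", "koramangala bangalor 560095"])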

⚠️ Limitations

  • Model Size: at 1B parameters, the model is less capable than larger models on complex or ambiguous inputs
  • Training Data: trained on a specific Indian address dataset; may not generalize to every address format
  • Geographic Scope: optimized for Indian addresses and geography
  • Reasoning Depth: the depth and quality of Chain of Thought reasoning varies across inputs
  • Device Compatibility: Requires proper device placement for inference

📋 Model Files

  • adapter_config.json: LoRA adapter configuration
  • adapter_model.safetensors: LoRA adapter weights
  • tokenizer_config.json: Tokenizer configuration
  • tokenizer.json: Tokenizer vocabulary and settings
  • special_tokens_map.json: Special tokens mapping
  • chat_template.jinja: Chat template for conversations

🔄 Model Updates

  • Version: 1.0 (Checkpoint 435)
  • Last Updated: 2025-07-08
  • Training Framework: Unsloth + LoRA
  • Base Model: Llama-3.2-1B-Instruct

📚 Citation

If you use this model in your research or applications, please cite:

@misc{multitask-address-reasoning-model,
  title={Multi-Task Address Reasoning Model},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/shiprocket-ai/multitask-address-reasoning-llama-1B-model}
}

📞 Support & Contact

For questions, issues, or feature requests:

  • Open an issue in this repository
  • Contact: shiprocket-ai team
  • Documentation: See usage examples above

📜 License

This model is released under the Apache 2.0 License. See LICENSE file for details.


Multi-task address intelligence with reasoning - Built with 🧠 by shiprocket-ai team using Unsloth
