---
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
---
# Multi-Task Address Reasoning Model v1.0
This model is a **multi-task fine-tuned model** specialized for **address correction, component extraction, and geographic Q&A** with **Chain of Thought reasoning**. Built on Llama-3.2-1B-Instruct with LoRA fine-tuning using Unsloth.
## Model Description
Multi-task Llama-3.2-1B model fine-tuned with LoRA for Indian address correction, component extraction, and geographic Q&A using Chain of Thought reasoning.
### Key Capabilities
- **Address Correction**: Fix spelling errors, formatting issues, and incomplete addresses
- **Component Extraction**: Extract and structure address components (building, locality, city, state, pincode)
- **Geographic Q&A**: Answer questions about locations, states, cities, and geographic relationships
- **Chain of Thought Reasoning**: Detailed step-by-step reasoning for address analysis
- **Multi-Task Learning**: Single model handles multiple address-related tasks
## Model Architecture & Training
- **Base Model**: unsloth/Llama-3.2-1B-Instruct
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation) via Unsloth (see the configuration sketch after the lists below)
- **LoRA Rank (r)**: 64
- **LoRA Alpha**: 128
- **LoRA Dropout**: 0.1
- **Target Modules**: q_proj, o_proj, k_proj, up_proj, v_proj, down_proj, gate_proj
- **Model Size**: ~276MB (adapter only)
- **Checkpoint**: 435
- **Max Sequence Length**: 1024 tokens (auto-optimized from sequence analysis)
### Training Configuration
- **Learning Rate**: 1e-4
- **Batch Size**: 32 effective (1 per device × 32 gradient accumulation steps)
- **Epochs**: 3
- **Optimizer**: adamw_8bit
- **Scheduler**: cosine
- **Weight Decay**: 0.01
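
The same hyperparameters can be written out with `peft` and `transformers` for reference. This is a minimal sketch, not the actual training script: the real run used Unsloth's wrappers and an SFT trainer, and the `output_dir` value here is purely illustrative.
```python
from peft import LoraConfig, TaskType
from transformers import TrainingArguments

# Adapter hyperparameters mirror the values listed above; the actual run used
# Unsloth's FastLanguageModel wrappers, which are not reproduced here.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=64,
    lora_alpha=128,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "up_proj", "down_proj", "gate_proj",
    ],
)

# Trainer-side hyperparameters from the Training Configuration list above.
training_args = TrainingArguments(
    output_dir="multitask-address-reasoning",  # illustrative output path
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,  # effective batch size of 32
    num_train_epochs=3,
    optim="adamw_8bit",
    lr_scheduler_type="cosine",
    weight_decay=0.01,
)
```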
## Usage Examples
```python
import warnings

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

warnings.filterwarnings("ignore")

# Load base model and tokenizer (using the actual base model from training)
base_model_name = "unsloth/Llama-3.2-1B-Instruct"  # base model used in training
model_name = "shiprocket-ai/multitask-address-reasoning-llama-1B-model"

print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Add pad token if missing
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print("Loading base model...")
# Load base model (non-quantized version, as in the training script)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

print("Loading LoRA adapter...")
model = PeftModel.from_pretrained(base_model, model_name)
print("Model loaded successfully!")


def process_address_with_reasoning(prompt, max_new_tokens=400):
    """Process an address prompt with Chain of Thought reasoning (as trained)."""
    # Tokenize and move inputs to the model's device
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Generate with reasoning (matching the training-time test parameters)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.1,  # low temperature, as used when testing during training
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            use_cache=True
        )

    # Decode only the newly generated tokens
    input_length = inputs["input_ids"].shape[1]
    generated_tokens = outputs[0][input_length:]
    response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
    return response.strip()


def fix_address_with_reasoning(address, max_new_tokens=400):
    """Fix an address with detailed Chain of Thought reasoning."""
    messages = [
        {"role": "user", "content": f"Fix and extract components from: {address}"}
    ]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    return process_address_with_reasoning(prompt, max_new_tokens)


def answer_geographic_question(question, max_new_tokens=150):
    """Answer geographic questions about addresses."""
    messages = [{"role": "user", "content": question}]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    return process_address_with_reasoning(prompt, max_new_tokens)


def extract_components(address, max_new_tokens=200):
    """Extract address components with reasoning."""
    messages = [
        {"role": "user", "content": f"Extract all components from this address: {address}"}
    ]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    return process_address_with_reasoning(prompt, max_new_tokens)


# Test cases based on the training script examples
print("\nMULTI-TASK ADDRESS MODEL EXAMPLES")
print("=" * 60)
print("Testing Chain of Thought reasoning + Geographic Q&A")
print("Model trained with LoRA r=64, alpha=128 for complex reasoning")
print("=" * 60)

# Address correction with reasoning (exact examples from training)
test_addresses = [
    "pandit nagla badi masjid moradabad 244001",
    "sec 14 gurgoan haryana 122001",
    "koramangala bangalor 560095",
    "dlf cyber city gurgaon haryana"
]

print("\nTESTING ADDRESS CORRECTION WITH CHAIN OF THOUGHT:")
print("-" * 50)
for i, test_address in enumerate(test_addresses, 1):
    print(f"\nTest {i}: {test_address}")
    result = fix_address_with_reasoning(test_address)
    print("Chain of Thought Response:")
    print(f"  {result}")
    print("-" * 40)

# Geographic Q&A (examples from the training script)
qa_tests = [
    "Which state is Mumbai in?",
    "What is the pincode of Bangalore?",
    "Is Delhi a metro city?",
    "What tier city is Pune?",
    "Where is Connaught Place located?",
    "What state does Hyderabad belong to?",
    "Name a city in Karnataka.",
    "What is the postal code for Gurgaon?",
    "Which state is New Delhi in?",  # training example
    "What cities are in Maharashtra?"
]

print("\nTESTING GEOGRAPHIC Q&A:")
print("-" * 50)
for i, question in enumerate(qa_tests[:8], 1):  # test the first 8 questions
    print(f"\nQ{i}: {question}")
    result = answer_geographic_question(question)
    print(f"Answer: {result}")

# Component extraction
print("\nTESTING COMPONENT EXTRACTION:")
print("-" * 50)
extraction_tests = [
    "Flat 203, Emerald Heights, Sector 15, Gurugram, Haryana 122001",
    "DLF Cyber City, Cyber Hub, Gurgaon, Haryana",
    "Connaught Place, New Delhi, Delhi 110001"
]
for i, test_address in enumerate(extraction_tests, 1):
    print(f"\nExtract {i}: {test_address}")
    result = extract_components(test_address)
    print(f"Components: {result}")

print("\nALL TESTS COMPLETED!")
print("Model demonstrates Chain of Thought reasoning")
print("Geographic knowledge from NER training data")
print("Address correction with detailed analysis")
```
## Training Methodology
This model was trained with a multi-task approach:
### **1. Data Preparation Strategy**
- **Source**: Address NER dataset with structured components (address → corrected_address → extracted_info)
- **Multi-task Split**: 70% Chain of Thought address correction + 30% Geographic Q&A
- **Data Augmentation**: Training data expanded to 584.8% of the original dataset size
- **Reasoning Integration**: Each sample includes step-by-step analytical reasoning
### **2. Chain of Thought Address Correction**
- **Input**: Raw/incomplete addresses with potential errors
- **Process**: Model analyzes, identifies issues, and explains corrections
- **Output**: Detailed reasoning plus structured JSON with the address components (illustrated below)
- **Examples**: Spelling fixes, state inference, component extraction
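
For illustration only, a correction target might look like the following; the exact reasoning phrasing and JSON field names the model emits may differ:
```
Input: sec 14 gurgoan haryana 122001

Reasoning:
1. "gurgoan" is a misspelling of Gurgaon (officially Gurugram).
2. "sec 14" expands to "Sector 14".
3. Pincode 122001 is consistent with Gurugram, Haryana.

{"locality": "Sector 14", "city": "Gurugram", "state": "Haryana", "pincode": "122001"}
```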
### **3. Geographic Q&A Generation**
From each address record's NER data, multiple Q&A pairs were generated for training (a generation sketch follows the list):
- **State-City relationships**: "Which state is Mumbai in?" β "Mumbai is in Maharashtra state."
- **Pincode queries**: "What is the pincode of Bangalore?" β "The pincode of Bangalore is 560001."
- **City tier classification**: "Is Delhi a metro city?" β "Yes, Delhi is a metropolitan city."
- **Locality mapping**: "Where is Connaught Place?" β "Connaught Place is in New Delhi."
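
A minimal sketch of how such Q&A pairs could be derived from a structured NER record. The field names used here (`city`, `state`, `pincode`, `locality`) are assumptions about the dataset schema, not its actual column names.
```python
def make_qa_pairs(record: dict) -> list[dict]:
    """Turn one structured address record into geographic Q&A training pairs.

    The keys used here are illustrative; the real dataset schema may differ.
    """
    pairs = []
    if record.get("city") and record.get("state"):
        pairs.append({
            "question": f"Which state is {record['city']} in?",
            "answer": f"{record['city']} is in {record['state']} state.",
        })
    if record.get("city") and record.get("pincode"):
        pairs.append({
            "question": f"What is the pincode of {record['city']}?",
            "answer": f"The pincode of {record['city']} is {record['pincode']}.",
        })
    if record.get("locality") and record.get("city"):
        pairs.append({
            "question": f"Where is {record['locality']} located?",
            "answer": f"{record['locality']} is in {record['city']}.",
        })
    return pairs

# Example: a single record yields several Q&A samples
print(make_qa_pairs({"city": "Bangalore", "state": "Karnataka", "pincode": "560001"}))
```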
### **4. Sequence Optimization**
- **Dynamic Analysis**: Analyzed 1000+ training samples to determine the optimal context length (see the sketch below)
- **Result**: 99% of samples fit within 768 tokens
- **Context Window**: 1024 tokens, chosen to leave headroom for longer reasoning outputs
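
A rough sketch of this kind of length analysis: tokenize a sample of formatted training texts and round the chosen percentile up to a convenient bucket. The function name, bucket values, and percentile are assumptions, not the actual analysis code.
```python
import numpy as np

def pick_max_seq_length(texts, tokenizer, percentile=99, buckets=(512, 768, 1024, 2048)):
    """Tokenize formatted training texts and round the chosen percentile
    length up to the next convenient bucket size."""
    lengths = [len(tokenizer(t)["input_ids"]) for t in texts]
    cutoff = int(np.percentile(lengths, percentile))
    for bucket in buckets:
        if cutoff <= bucket:
            return bucket, cutoff
    return max(buckets), cutoff

# e.g. pick_max_seq_length(sample_of_1000_prompts, tokenizer) -> (1024, 760)
```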
## Training Performance
```
Final Training Loss: 0.5506
Training Runtime: 3701.74 seconds (~1 hour)
Training Samples/Second: 3.749
Training Steps/Second: 0.118
Total Epochs: 3.0
```
## Supported Tasks
### 1. **Address Correction with Reasoning**
- Fix spelling errors and formatting issues
- Infer missing components (state, city tier)
- Provide step-by-step reasoning for corrections
### 2. **Component Extraction**
- Extract building names, localities, cities, states, pincodes
- Structure unstructured address data
- Identify address hierarchy and relationships
### 3. **Geographic Q&A**
- Answer questions about cities, states, and locations
- Provide geographic knowledge and relationships
- Handle location-based queries
### 4. **Address Standardization**
- Convert informal addresses to structured format
- Normalize address formats
- Handle various input formats
## Use Cases
### 1. **E-commerce & Logistics**
- Correct customer addresses during checkout
- Extract delivery components for routing
- Answer location-based customer queries
### 2. **Data Processing & Migration**
- Clean legacy address databases with reasoning
- Extract structured data from unstructured addresses
- Provide explanations for address corrections
### 3. **Customer Support Automation**
- Answer geographic questions about locations
- Help customers correct their addresses
- Provide location-based information
### 4. **Address Intelligence**
- Analyze address patterns and relationships
- Infer missing address components
- Provide geographic context and reasoning
## Prompt Formats
The model works with the Llama-3.2 chat format:
### Address Correction
```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Fix and extract components from: [address]<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
### Geographic Q&A
```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Which state is [location] in?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
### Component Extraction
```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Extract all components from this address: [address]<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
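
You generally do not need to assemble these strings by hand; rendering through the bundled chat template should produce the same layout. The snippet below is a quick sanity check (depending on the template, a default system header may also be prepended):
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("shiprocket-ai/multitask-address-reasoning-llama-1B-model")
messages = [{"role": "user", "content": "Which state is Mumbai in?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # the rendered prompt should follow the header/eot layout shown above
```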
## Performance Tips
1. **Temperature Settings**: Use 0.1-0.3 for factual tasks, 0.3-0.5 for reasoning tasks
2. **Context Management**: Keep prompts under 512 tokens for optimal performance
3. **Batch Processing**: Process multiple addresses efficiently with batching (see the sketch after this list)
4. **Device Placement**: Ensure all tensors are on the same device (GPU/CPU)
5. **Memory Management**: Use float16 for memory efficiency
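
A minimal batching sketch, building on the `tokenizer` and `model` loaded in the usage example above. Left padding, the batch size, and the generation settings are assumptions to adapt to your hardware, not a prescribed configuration.
```python
def fix_addresses_batched(addresses, batch_size=8, max_new_tokens=400):
    """Run address correction on a list of raw addresses in small batches."""
    tokenizer.padding_side = "left"  # left-pad so generation continues from the prompt end
    device = next(model.parameters()).device
    results = []
    for start in range(0, len(addresses), batch_size):
        batch = addresses[start:start + batch_size]
        prompts = [
            tokenizer.apply_chat_template(
                [{"role": "user", "content": f"Fix and extract components from: {a}"}],
                tokenize=False,
                add_generation_prompt=True,
            )
            for a in batch
        ]
        inputs = tokenizer(prompts, return_tensors="pt", padding=True,
                           truncation=True, max_length=512).to(device)
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                temperature=0.1,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id,
            )
        # With left padding, new tokens start right after the padded prompt length
        new_tokens = outputs[:, inputs["input_ids"].shape[1]:]
        results.extend(tokenizer.batch_decode(new_tokens, skip_special_tokens=True))
    return [r.strip() for r in results]
```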
## Limitations
- **Model Size**: 1B parameters, so reasoning depth and factual coverage are limited compared to larger models
- **Training Data**: Trained on a specific dataset and may not generalize to all address formats
- **Geographic Scope**: Optimized for Indian addresses and geography
- **Reasoning Depth**: Chain of thought reasoning may vary in complexity
- **Device Compatibility**: Requires proper device placement for inference
## Model Files
- `adapter_config.json`: LoRA adapter configuration
- `adapter_model.safetensors`: LoRA adapter weights
- `tokenizer_config.json`: Tokenizer configuration
- `tokenizer.json`: Tokenizer vocabulary and settings
- `special_tokens_map.json`: Special tokens mapping
- `chat_template.jinja`: Chat template for conversations
## Model Updates
- **Version**: 1.0 (Checkpoint 435)
- **Last Updated**: 2025-07-08
- **Training Framework**: Unsloth + LoRA
- **Base Model**: Llama-3.2-1B-Instruct
## Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{multitask-address-reasoning-model,
title={Multi-Task Address Reasoning Model},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/shiprocket-ai/multitask-address-reasoning-llama-1B-model}
}
```
## Support & Contact
For questions, issues, or feature requests:
- Open an issue in this repository
- Contact: shiprocket-ai team
- Documentation: See usage examples above
## License
This model is released under the Apache 2.0 License. See LICENSE file for details.
---
*Multi-task address intelligence with reasoning - built by the shiprocket-ai team using Unsloth*