---
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
---
# Multi-Task Address Reasoning Model v1.0
This model is a **multi-task fine-tuned model** specialized for **address correction, component extraction, and geographic Q&A** with **Chain of Thought reasoning**. Built on Llama-3.2-1B-Instruct with LoRA fine-tuning using Unsloth.
## Model Description
Multi-task Llama-3.2-1B model fine-tuned with LoRA for Indian address correction, component extraction, and geographic Q&A using Chain of Thought reasoning.
### Key Capabilities
- **Address Correction**: Fix spelling errors, formatting issues, and incomplete addresses
- **Component Extraction**: Extract and structure address components (building, locality, city, state, pincode)
- **Geographic Q&A**: Answer questions about locations, states, cities, and geographic relationships
- **Chain of Thought Reasoning**: Detailed step-by-step reasoning for address analysis
- **Multi-Task Learning**: Single model handles multiple address-related tasks
## Model Architecture & Training
- **Base Model**: unsloth/Llama-3.2-1B-Instruct
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation) via Unsloth (see the configuration sketch after the lists below)
- **LoRA Rank (r)**: 64
- **LoRA Alpha**: 128
- **LoRA Dropout**: 0.1
- **Target Modules**: q_proj, o_proj, k_proj, up_proj, v_proj, down_proj, gate_proj
- **Model Size**: ~276MB (adapter only)
- **Checkpoint**: 435
- **Max Sequence Length**: 1024 tokens (auto-optimized from sequence analysis)
### Training Configuration
- **Learning Rate**: 1e-4
- **Batch Size**: 32 effective (1 per device × 32 gradient accumulation steps)
- **Epochs**: 3
- **Optimizer**: adamw_8bit
- **Scheduler**: cosine
- **Weight Decay**: 0.01
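
The same hyperparameters can be written out with `peft` and `transformers` for reference. This is a minimal sketch, not the actual training script: the real run used Unsloth's wrappers and an SFT trainer, and the `output_dir` value here is purely illustrative.
```python
from peft import LoraConfig, TaskType
from transformers import TrainingArguments

# Adapter hyperparameters mirror the values listed above; the actual run used
# Unsloth's FastLanguageModel wrappers, which are not reproduced here.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=64,
    lora_alpha=128,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "up_proj", "down_proj", "gate_proj",
    ],
)

# Trainer-side hyperparameters from the Training Configuration list above.
training_args = TrainingArguments(
    output_dir="multitask-address-reasoning",  # illustrative output path
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,  # effective batch size of 32
    num_train_epochs=3,
    optim="adamw_8bit",
    lr_scheduler_type="cosine",
    weight_decay=0.01,
)
```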
## Usage Examples
```python
import warnings

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

warnings.filterwarnings("ignore")

# Load base model and tokenizer (using the actual base model from training)
base_model_name = "unsloth/Llama-3.2-1B-Instruct"  # base model used in training
model_name = "shiprocket-ai/multitask-address-reasoning-llama-1B-model"

print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Add pad token if missing
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print("Loading base model...")
# Load base model (non-quantized version, as in the training script)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

print("Loading LoRA adapter...")
model = PeftModel.from_pretrained(base_model, model_name)
print("Model loaded successfully!")


def process_address_with_reasoning(prompt, max_new_tokens=400):
    """Process an address prompt with Chain of Thought reasoning (as trained)."""
    # Tokenize and move inputs to the model's device
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Generate with reasoning (matching the training-time test parameters)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.1,  # low temperature, as used when testing during training
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            use_cache=True
        )

    # Decode only the newly generated tokens
    input_length = inputs["input_ids"].shape[1]
    generated_tokens = outputs[0][input_length:]
    response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
    return response.strip()


def fix_address_with_reasoning(address, max_new_tokens=400):
    """Fix an address with detailed Chain of Thought reasoning."""
    messages = [
        {"role": "user", "content": f"Fix and extract components from: {address}"}
    ]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    return process_address_with_reasoning(prompt, max_new_tokens)


def answer_geographic_question(question, max_new_tokens=150):
    """Answer geographic questions about addresses."""
    messages = [{"role": "user", "content": question}]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    return process_address_with_reasoning(prompt, max_new_tokens)


def extract_components(address, max_new_tokens=200):
    """Extract address components with reasoning."""
    messages = [
        {"role": "user", "content": f"Extract all components from this address: {address}"}
    ]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    return process_address_with_reasoning(prompt, max_new_tokens)


# Test cases based on the training script examples
print("\nMULTI-TASK ADDRESS MODEL EXAMPLES")
print("=" * 60)
print("Testing Chain of Thought reasoning + Geographic Q&A")
print("Model trained with LoRA r=64, alpha=128 for complex reasoning")
print("=" * 60)

# Address correction with reasoning (exact examples from training)
test_addresses = [
    "pandit nagla badi masjid moradabad 244001",
    "sec 14 gurgoan haryana 122001",
    "koramangala bangalor 560095",
    "dlf cyber city gurgaon haryana"
]

print("\nTESTING ADDRESS CORRECTION WITH CHAIN OF THOUGHT:")
print("-" * 50)
for i, test_address in enumerate(test_addresses, 1):
    print(f"\nTest {i}: {test_address}")
    result = fix_address_with_reasoning(test_address)
    print("Chain of Thought Response:")
    print(f"  {result}")
    print("-" * 40)

# Geographic Q&A (examples from the training script)
qa_tests = [
    "Which state is Mumbai in?",
    "What is the pincode of Bangalore?",
    "Is Delhi a metro city?",
    "What tier city is Pune?",
    "Where is Connaught Place located?",
    "What state does Hyderabad belong to?",
    "Name a city in Karnataka.",
    "What is the postal code for Gurgaon?",
    "Which state is New Delhi in?",  # training example
    "What cities are in Maharashtra?"
]

print("\nTESTING GEOGRAPHIC Q&A:")
print("-" * 50)
for i, question in enumerate(qa_tests[:8], 1):  # test the first 8 questions
    print(f"\nQ{i}: {question}")
    result = answer_geographic_question(question)
    print(f"Answer: {result}")

# Component extraction
print("\nTESTING COMPONENT EXTRACTION:")
print("-" * 50)
extraction_tests = [
    "Flat 203, Emerald Heights, Sector 15, Gurugram, Haryana 122001",
    "DLF Cyber City, Cyber Hub, Gurgaon, Haryana",
    "Connaught Place, New Delhi, Delhi 110001"
]
for i, test_address in enumerate(extraction_tests, 1):
    print(f"\nExtract {i}: {test_address}")
    result = extract_components(test_address)
    print(f"Components: {result}")

print("\nALL TESTS COMPLETED!")
print("Model demonstrates Chain of Thought reasoning")
print("Geographic knowledge from NER training data")
print("Address correction with detailed analysis")
```
## Training Methodology
This model was trained with a multi-task approach:
### **1. Data Preparation Strategy**
- **Source**: Address NER dataset with structured components (address → corrected_address → extracted_info)
- **Multi-task Split**: 70% Chain of Thought address correction + 30% Geographic Q&A
- **Data Augmentation**: Training data expanded to 584.8% of the original dataset size
- **Reasoning Integration**: Each sample includes step-by-step analytical reasoning
### **2. Chain of Thought Address Correction**
- **Input**: Raw/incomplete addresses with potential errors
- **Process**: Model analyzes, identifies issues, and explains corrections
- **Output**: Detailed reasoning plus structured JSON with the address components (illustrated below)
- **Examples**: Spelling fixes, state inference, component extraction
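
For illustration only, a correction target might look like the following; the exact reasoning phrasing and JSON field names the model emits may differ:
```
Input: sec 14 gurgoan haryana 122001

Reasoning:
1. "gurgoan" is a misspelling of Gurgaon (officially Gurugram).
2. "sec 14" expands to "Sector 14".
3. Pincode 122001 is consistent with Gurugram, Haryana.

{"locality": "Sector 14", "city": "Gurugram", "state": "Haryana", "pincode": "122001"}
```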
### **3. Geographic Q&A Generation**
From each address record's NER data, multiple Q&A pairs were generated for training (a generation sketch follows the list):
- **State-City relationships**: "Which state is Mumbai in?" β "Mumbai is in Maharashtra state."
- **Pincode queries**: "What is the pincode of Bangalore?" β "The pincode of Bangalore is 560001."
- **City tier classification**: "Is Delhi a metro city?" β "Yes, Delhi is a metropolitan city."
- **Locality mapping**: "Where is Connaught Place?" β "Connaught Place is in New Delhi."
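
A minimal sketch of how such Q&A pairs could be derived from a structured NER record. The field names used here (`city`, `state`, `pincode`, `locality`) are assumptions about the dataset schema, not its actual column names.
```python
def make_qa_pairs(record: dict) -> list[dict]:
    """Turn one structured address record into geographic Q&A training pairs.

    The keys used here are illustrative; the real dataset schema may differ.
    """
    pairs = []
    if record.get("city") and record.get("state"):
        pairs.append({
            "question": f"Which state is {record['city']} in?",
            "answer": f"{record['city']} is in {record['state']} state.",
        })
    if record.get("city") and record.get("pincode"):
        pairs.append({
            "question": f"What is the pincode of {record['city']}?",
            "answer": f"The pincode of {record['city']} is {record['pincode']}.",
        })
    if record.get("locality") and record.get("city"):
        pairs.append({
            "question": f"Where is {record['locality']} located?",
            "answer": f"{record['locality']} is in {record['city']}.",
        })
    return pairs

# Example: a single record yields several Q&A samples
print(make_qa_pairs({"city": "Bangalore", "state": "Karnataka", "pincode": "560001"}))
```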
### **4. Sequence Optimization**
- **Dynamic Analysis**: Analyzed 1000+ training samples to determine the optimal context length (see the sketch below)
- **Result**: 99% of samples fit within 768 tokens
- **Context Window**: 1024 tokens, chosen to leave headroom for longer reasoning outputs
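
A rough sketch of this kind of length analysis: tokenize a sample of formatted training texts and round the chosen percentile up to a convenient bucket. The function name, bucket values, and percentile are assumptions, not the actual analysis code.
```python
import numpy as np

def pick_max_seq_length(texts, tokenizer, percentile=99, buckets=(512, 768, 1024, 2048)):
    """Tokenize formatted training texts and round the chosen percentile
    length up to the next convenient bucket size."""
    lengths = [len(tokenizer(t)["input_ids"]) for t in texts]
    cutoff = int(np.percentile(lengths, percentile))
    for bucket in buckets:
        if cutoff <= bucket:
            return bucket, cutoff
    return max(buckets), cutoff

# e.g. pick_max_seq_length(sample_of_1000_prompts, tokenizer) -> (1024, 760)
```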
## Training Performance
```
Final Training Loss: 0.5506
Training Runtime: 3701.74 seconds (~1 hour)
Training Samples/Second: 3.749
Training Steps/Second: 0.118
Total Epochs: 3.0
```
## Supported Tasks
### 1. **Address Correction with Reasoning**
- Fix spelling errors and formatting issues
- Infer missing components (state, city tier)
- Provide step-by-step reasoning for corrections
### 2. **Component Extraction**
- Extract building names, localities, cities, states, pincodes
- Structure unstructured address data
- Identify address hierarchy and relationships
### 3. **Geographic Q&A**
- Answer questions about cities, states, and locations
- Provide geographic knowledge and relationships
- Handle location-based queries
### 4. **Address Standardization**
- Convert informal addresses to structured format
- Normalize address formats
- Handle various input formats
## Use Cases
### 1. **E-commerce & Logistics**
- Correct customer addresses during checkout
- Extract delivery components for routing
- Answer location-based customer queries
### 2. **Data Processing & Migration**
- Clean legacy address databases with reasoning
- Extract structured data from unstructured addresses
- Provide explanations for address corrections
### 3. **Customer Support Automation**
- Answer geographic questions about locations
- Help customers correct their addresses
- Provide location-based information
### 4. **Address Intelligence**
- Analyze address patterns and relationships
- Infer missing address components
- Provide geographic context and reasoning
## Prompt Formats
The model works with the Llama-3.2 chat format:
### Address Correction
```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Fix and extract components from: [address]<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
### Geographic Q&A
```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Which state is [location] in?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
### Component Extraction
```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Extract all components from this address: [address]<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
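
You generally do not need to assemble these strings by hand; rendering through the bundled chat template should produce the same layout. The snippet below is a quick sanity check (depending on the template, a default system header may also be prepended):
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("shiprocket-ai/multitask-address-reasoning-llama-1B-model")
messages = [{"role": "user", "content": "Which state is Mumbai in?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # the rendered prompt should follow the header/eot layout shown above
```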
## Performance Tips
1. **Temperature Settings**: Use 0.1-0.3 for factual tasks, 0.3-0.5 for reasoning tasks
2. **Context Management**: Keep prompts under 512 tokens for optimal performance
3. **Batch Processing**: Process multiple addresses efficiently with batching (see the sketch after this list)
4. **Device Placement**: Ensure all tensors are on the same device (GPU/CPU)
5. **Memory Management**: Use float16 for memory efficiency
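
A minimal batching sketch, building on the `tokenizer` and `model` loaded in the usage example above. Left padding, the batch size, and the generation settings are assumptions to adapt to your hardware, not a prescribed configuration.
```python
def fix_addresses_batched(addresses, batch_size=8, max_new_tokens=400):
    """Run address correction on a list of raw addresses in small batches."""
    tokenizer.padding_side = "left"  # left-pad so generation continues from the prompt end
    device = next(model.parameters()).device
    results = []
    for start in range(0, len(addresses), batch_size):
        batch = addresses[start:start + batch_size]
        prompts = [
            tokenizer.apply_chat_template(
                [{"role": "user", "content": f"Fix and extract components from: {a}"}],
                tokenize=False,
                add_generation_prompt=True,
            )
            for a in batch
        ]
        inputs = tokenizer(prompts, return_tensors="pt", padding=True,
                           truncation=True, max_length=512).to(device)
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                temperature=0.1,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id,
            )
        # With left padding, new tokens start right after the padded prompt length
        new_tokens = outputs[:, inputs["input_ids"].shape[1]:]
        results.extend(tokenizer.batch_decode(new_tokens, skip_special_tokens=True))
    return [r.strip() for r in results]
```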
## Limitations
- **Model Size**: 1B parameters, so reasoning depth and factual coverage are limited compared to larger models
- **Training Data**: Trained on a specific dataset and may not generalize to all address formats
- **Geographic Scope**: Optimized for Indian addresses and geography
- **Reasoning Depth**: Chain of thought reasoning may vary in complexity
- **Device Compatibility**: Requires proper device placement for inference
## Model Files
- `adapter_config.json`: LoRA adapter configuration
- `adapter_model.safetensors`: LoRA adapter weights
- `tokenizer_config.json`: Tokenizer configuration
- `tokenizer.json`: Tokenizer vocabulary and settings
- `special_tokens_map.json`: Special tokens mapping
- `chat_template.jinja`: Chat template for conversations
## Model Updates
- **Version**: 1.0 (Checkpoint 435)
- **Last Updated**: 2025-07-08
- **Training Framework**: Unsloth + LoRA
- **Base Model**: Llama-3.2-1B-Instruct
## Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{multitask-address-reasoning-model,
title={Multi-Task Address Reasoning Model},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/shiprocket-ai/multitask-address-reasoning-llama-1B-model}
}
```
## Support & Contact
For questions, issues, or feature requests:
- Open an issue in this repository
- Contact: shiprocket-ai team
- Documentation: See usage examples above
## License
This model is released under the Apache 2.0 License. See LICENSE file for details.
---
*Multi-task address intelligence with reasoning - built by the shiprocket-ai team using Unsloth*