---
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
---
# 🧠 Multi-Task Address Reasoning Model v1.0

This is a **multi-task fine-tuned model** specialized for **address correction, component extraction, and geographic Q&A** with **Chain of Thought reasoning**. It is built on Llama-3.2-1B-Instruct with LoRA fine-tuning using Unsloth.

## 🎯 Model Description

Multi-task Llama-3.2-1B model fine-tuned with LoRA for Indian address correction, component extraction, and geographic Q&A using Chain of Thought reasoning.

### Key Capabilities

- **🔧 Address Correction**: Fix spelling errors, formatting issues, and incomplete addresses
- **📊 Component Extraction**: Extract and structure address components (building, locality, city, state, pincode)
- **❓ Geographic Q&A**: Answer questions about locations, states, cities, and geographic relationships
- **🧠 Chain of Thought Reasoning**: Detailed step-by-step reasoning for address analysis
- **🎯 Multi-Task Learning**: Single model handles multiple address-related tasks

## 📊 Model Architecture & Training

- **Base Model**: unsloth/Llama-3.2-1B-Instruct
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation) via Unsloth
- **LoRA Rank (r)**: 64
- **LoRA Alpha**: 128
- **LoRA Dropout**: 0.1
- **Target Modules**: q_proj, o_proj, k_proj, up_proj, v_proj, down_proj, gate_proj
- **Model Size**: ~276MB (adapter only)
- **Checkpoint**: 435
- **Max Sequence Length**: 1024 tokens (auto-optimized from sequence analysis)
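
For reference, the adapter hyperparameters above correspond to a standard `peft` `LoraConfig`. The sketch below is an illustration, not the exact training call: the authoritative values live in `adapter_config.json`, and the `bias` and `task_type` fields are assumptions.

```python
from peft import LoraConfig

# Sketch of the adapter configuration described above; check
# adapter_config.json in this repo for the authoritative values.
lora_config = LoraConfig(
    r=64,                    # LoRA rank
    lora_alpha=128,          # scaling factor (alpha / r = 2.0)
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",             # assumption: common LoRA default
    task_type="CAUSAL_LM",
)
```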

### Training Configuration
- **Learning Rate**: 1e-4
- **Batch Size**: 32 (1 per device × 32 gradient accumulation)
- **Epochs**: 3
- **Optimizer**: adamw_8bit
- **Scheduler**: cosine
- **Weight Decay**: 0.01
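
These settings map roughly onto Hugging Face `TrainingArguments`; the original run used Unsloth's trainer, so treat this as a sketch, and any field not listed above (e.g. `output_dir`, `fp16`) is an assumption.

```python
from transformers import TrainingArguments

# Sketch of the training setup described above.
training_args = TrainingArguments(
    output_dir="outputs",            # assumption
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,  # effective batch size: 32
    num_train_epochs=3,
    optim="adamw_8bit",
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    fp16=True,                       # assumption, matching float16 inference
)
```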

## 🚀 Usage Examples

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import warnings
warnings.filterwarnings("ignore")

# Load base model and tokenizer (the actual base model used during training)
base_model_name = "unsloth/Llama-3.2-1B-Instruct"
model_name = "shiprocket-ai/multitask-address-reasoning-llama-1B-model"

print("πŸ“₯ Loading tokenizer...")
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Add pad token if missing
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print("πŸ“₯ Loading base model...")
# Load base model (non-quantized version as per training script)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

print("πŸ“₯ Loading LoRA adapter...")
# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, model_name)

print("βœ… Model loaded successfully!")

def process_address_with_reasoning(prompt, max_new_tokens=400):
    """Process address with Chain of Thought reasoning (as trained)"""

    # Tokenize
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)

    # Move inputs to model device
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Generate with reasoning (matching training parameters)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.1,  # low temperature, as used when testing during training
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            use_cache=True
        )

    # Decode only the new tokens
    input_length = inputs['input_ids'].shape[1]
    generated_tokens = outputs[0][input_length:]
    response = tokenizer.decode(generated_tokens, skip_special_tokens=True)

    return response.strip()

def fix_address_with_reasoning(address, max_new_tokens=400):
    """Fix address with detailed Chain of Thought reasoning"""

    messages = [
        {"role": "user", "content": f"Fix and extract components from: {address}"}
    ]

    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    return process_address_with_reasoning(prompt, max_new_tokens)

def answer_geographic_question(question, max_new_tokens=150):
    """Answer geographic questions about addresses"""

    messages = [
        {"role": "user", "content": question}
    ]

    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    return process_address_with_reasoning(prompt, max_new_tokens)

def extract_components(address, max_new_tokens=200):
    """Extract address components with reasoning"""

    messages = [
        {"role": "user", "content": f"Extract all components from this address: {address}"}
    ]

    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    return process_address_with_reasoning(prompt, max_new_tokens)

# Test cases based on training script examples
print("""
🏠 MULTI-TASK ADDRESS MODEL EXAMPLES""")
print("=" * 60)
print("""🧠 Testing Chain of Thought reasoning + Geographic Q&A""")
print("πŸ“Š Model trained with LoRA r=64, alpha=128 for complex reasoning")
print("=" * 60)

# Test address correction with reasoning (exact example from training)
test_addresses = [
    "pandit nagla badi masjid moradabad 244001",
    "sec 14 gurgoan haryana 122001",
    "koramangala bangalor 560095",
    "dlf cyber city gurgaon haryana"
]

print(f"""
πŸ”§ TESTING ADDRESS CORRECTION WITH CHAIN OF THOUGHT:""")
print("-" * 50)

for i, test_address in enumerate(test_addresses, 1):
    print(f"""
πŸ“ Test {i}: {test_address}""")
    result = fix_address_with_reasoning(test_address)
    print(f"πŸ€– Chain of Thought Response:")
    print(f"   {result}")
    print("-" * 40)

# Test geographic Q&A (examples from training script)
qa_tests = [
    "Which state is Mumbai in?",
    "What is the pincode of Bangalore?",
    "Is Delhi a metro city?",
    "What tier city is Pune?",
    "Where is Connaught Place located?",
    "What state does Hyderabad belong to?",
    "Name a city in Karnataka.",
    "What is the postal code for Gurgaon?",
    "Which state is New Delhi in?",  # Training example
    "What cities are in Maharashtra?"
]

print(f"""
❓ TESTING GEOGRAPHIC Q&A:""")
print("-" * 50)

for i, question in enumerate(qa_tests[:8], 1):  # Test first 8 questions
    print(f"""
❓ Q{i}: {question}""")
    result = answer_geographic_question(question)
    print(f"πŸ€– Answer: {result}")

# Test component extraction
print(f"""
πŸ“Š TESTING COMPONENT EXTRACTION:""")
print("-" * 50)

extraction_tests = [
    "Flat 203, Emerald Heights, Sector 15, Gurugram, Haryana 122001",
    "DLF Cyber City, Cyber Hub, Gurgaon, Haryana",
    "Connaught Place, New Delhi, Delhi 110001"
]

for i, test_address in enumerate(extraction_tests, 1):
    print(f"""
πŸ“Š Extract {i}: {test_address}""")
    result = extract_components(test_address)
    print(f"πŸ€– Components: {result}")

print(f"""
βœ… ALL TESTS COMPLETED!""")
print(f"""🧠 Model demonstrates Chain of Thought reasoning""")
print(f"""πŸ“ Geographic knowledge from NER training data""")
print(f"""πŸ”§ Address correction with detailed analysis""")
```

## 🧠 Training Methodology

This model was trained using a multi-task approach:

### **1. Data Preparation Strategy**
- **Source**: Address NER dataset with structured components (address → corrected_address → extracted_info)
- **Multi-task Split**: 70% Chain of Thought address correction + 30% Geographic Q&A
- **Data Augmentation**: Expanded the dataset to 584.8% of its original size (~5.8× the source data)
- **Reasoning Integration**: Each sample includes step-by-step analytical reasoning
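
A sketch of how such a 70/30 mix could be assembled; the actual preparation script is not published, and the record field names below simply follow the `address → corrected_address → extracted_info` structure described above:

```python
import random

def build_multitask_mix(records, correction_ratio=0.7, seed=42):
    """Illustrative 70/30 split between CoT correction and geographic Q&A."""
    random.seed(seed)
    samples = []
    for rec in records:
        if random.random() < correction_ratio:
            # Chain-of-Thought address correction sample
            samples.append({
                "prompt": f"Fix and extract components from: {rec['address']}",
                "response": rec["corrected_address"],  # plus step-by-step reasoning
            })
        else:
            # Geographic Q&A sample derived from the NER components
            info = rec["extracted_info"]
            samples.append({
                "prompt": f"Which state is {info['city']} in?",
                "response": f"{info['city']} is in {info['state']} state.",
            })
    return samples
```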

### **2. Chain of Thought Address Correction**
- **Input**: Raw/incomplete addresses with potential errors
- **Process**: Model analyzes, identifies issues, and explains corrections
- **Output**: Detailed reasoning + structured JSON with address components
- **Examples**: Spelling fixes, state inference, component extraction

### **3. Geographic Q&A Generation**
From each address record's NER data, the model generates multiple Q&A pairs:
- **State-City relationships**: "Which state is Mumbai in?" → "Mumbai is in Maharashtra state."
- **Pincode queries**: "What is the pincode of Bangalore?" → "The pincode of Bangalore is 560001."
- **City tier classification**: "Is Delhi a metro city?" → "Yes, Delhi is a metropolitan city."
- **Locality mapping**: "Where is Connaught Place?" → "Connaught Place is in New Delhi."
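
A sketch of this template-based expansion; the component keys and templates are assumptions inferred from the examples above:

```python
def make_geo_qa_pairs(info):
    """Expand one record's NER components into Q&A pairs (illustrative)."""
    pairs = []
    if info.get("city") and info.get("state"):
        pairs.append((f"Which state is {info['city']} in?",
                      f"{info['city']} is in {info['state']} state."))
    if info.get("city") and info.get("pincode"):
        pairs.append((f"What is the pincode of {info['city']}?",
                      f"The pincode of {info['city']} is {info['pincode']}."))
    if info.get("locality") and info.get("city"):
        pairs.append((f"Where is {info['locality']} located?",
                      f"{info['locality']} is in {info['city']}."))
    return pairs

# make_geo_qa_pairs({"city": "Mumbai", "state": "Maharashtra", "pincode": "400001"})
# -> [("Which state is Mumbai in?", "Mumbai is in Maharashtra state."),
#     ("What is the pincode of Mumbai?", "The pincode of Mumbai is 400001.")]
```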

### **4. Sequence Optimization**
- **Dynamic Analysis**: Analyzed 1000+ samples to determine optimal context length
- **Result**: 99% of samples fit within 768 tokens
- **Context Window**: rounded up to 1024 tokens to leave headroom for longer reasoning chains
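
A minimal sketch of this kind of length analysis, assuming the tokenizer loaded in the usage example above:

```python
import numpy as np

def pick_context_length(texts, tokenizer, percentile=99):
    """Sketch: find the token length covering `percentile`% of samples."""
    lengths = [len(tokenizer(t)["input_ids"]) for t in texts]
    p = int(np.percentile(lengths, percentile))
    print(f"{percentile}th percentile: {p} tokens (max: {max(lengths)})")
    return p  # here: ~768, rounded up to a 1024-token context window
```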

## 🔧 Training Performance

```
Final Training Loss: 0.5506
Training Runtime: 3701.74 seconds (~1 hour)
Training Samples/Second: 3.749
Training Steps/Second: 0.118
Total Epochs: 3.0
```
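
As a rough sanity check, these figures are internally consistent: 3,701.74 s × 0.118 steps/s ≈ 437 optimizer steps, which matches the reported checkpoint 435, and 3,701.74 s × 3.749 samples/s ≈ 13,900 samples processed over 3 epochs, i.e. roughly 4,600 unique training samples.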

## 🎭 Supported Tasks

### 1. **Address Correction with Reasoning**
- Fix spelling errors and formatting issues
- Infer missing components (state, city tier)
- Provide step-by-step reasoning for corrections

### 2. **Component Extraction**
- Extract building names, localities, cities, states, pincodes
- Structure unstructured address data
- Identify address hierarchy and relationships

### 3. **Geographic Q&A**
- Answer questions about cities, states, and locations
- Provide geographic knowledge and relationships
- Handle location-based queries

### 4. **Address Standardization**
- Convert informal addresses to structured format
- Normalize address formats
- Handle various input formats

## 💡 Use Cases

### 1. **E-commerce & Logistics**
- Correct customer addresses during checkout
- Extract delivery components for routing
- Answer location-based customer queries

### 2. **Data Processing & Migration**
- Clean legacy address databases with reasoning
- Extract structured data from unstructured addresses
- Provide explanations for address corrections

### 3. **Customer Support Automation**
- Answer geographic questions about locations
- Help customers correct their addresses
- Provide location-based information

### 4. **Address Intelligence**
- Analyze address patterns and relationships
- Infer missing address components
- Provide geographic context and reasoning

## 🎯 Prompt Formats

The model works with Llama-3.2 chat format:

### Address Correction
```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Fix and extract components from: [address]<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```

### Geographic Q&A
```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Which state is [location] in?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```

### Component Extraction
```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Extract all components from this address: [address]<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```
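
These strings normally don't need to be built by hand; `tokenizer.apply_chat_template` produces this layout from a plain message list, as in the usage example above:

```python
messages = [{"role": "user", "content": "Which state is Mumbai in?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# `prompt` follows the Llama-3.2 layout shown above; the exact headers and any
# default system block depend on the bundled chat_template.jinja.
```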

## ⚡ Performance Tips

1. **Temperature Settings**: Use 0.1-0.3 for factual tasks, 0.3-0.5 for reasoning tasks
2. **Context Management**: Keep prompts under 512 tokens for optimal performance
3. **Batch Processing**: Process multiple addresses efficiently with batching (see the sketch after this list)
4. **Device Placement**: Ensure all tensors are on the same device (GPU/CPU)
5. **Memory Management**: Use float16 for memory efficiency
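
For tip 3, a hedged batching sketch that reuses `model`, `tokenizer`, and `torch` from the usage example above; left padding is an assumption that generally suits decoder-only generation:

```python
def fix_addresses_batched(addresses, batch_size=8, max_new_tokens=200):
    """Sketch: run several correction prompts through one generate() call."""
    tokenizer.padding_side = "left"  # decoder-only models pad on the left
    device = next(model.parameters()).device
    results = []
    for i in range(0, len(addresses), batch_size):
        prompts = [
            tokenizer.apply_chat_template(
                [{"role": "user", "content": f"Fix and extract components from: {a}"}],
                tokenize=False, add_generation_prompt=True,
            )
            for a in addresses[i:i + batch_size]
        ]
        inputs = tokenizer(prompts, return_tensors="pt", padding=True,
                           truncation=True, max_length=512).to(device)
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                 temperature=0.1, do_sample=True,
                                 pad_token_id=tokenizer.eos_token_id)
        # Keep only the newly generated tokens for each row of the batch
        new_tokens = out[:, inputs["input_ids"].shape[1]:]
        results.extend(t.strip() for t in
                       tokenizer.batch_decode(new_tokens, skip_special_tokens=True))
    return results
```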

## ⚠️ Limitations

- **Model Size**: 1B parameters - may have limitations compared to larger models
- **Training Data**: Based on specific dataset - may not generalize to all address formats
- **Geographic Scope**: Optimized for Indian addresses and geography
- **Reasoning Depth**: Chain of thought reasoning may vary in complexity
- **Device Compatibility**: Requires proper device placement for inference

## 📋 Model Files

- `adapter_config.json`: LoRA adapter configuration
- `adapter_model.safetensors`: LoRA adapter weights
- `tokenizer_config.json`: Tokenizer configuration
- `tokenizer.json`: Tokenizer vocabulary and settings
- `special_tokens_map.json`: Special tokens mapping
- `chat_template.jinja`: Chat template for conversations

## 🔄 Model Updates

- **Version**: 1.0 (Checkpoint 435)
- **Last Updated**: 2025-07-08
- **Training Framework**: Unsloth + LoRA
- **Base Model**: Llama-3.2-1B-Instruct

## 📚 Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{multitask-address-reasoning-model,
  title={Multi-Task Address Reasoning Model},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/shiprocket-ai/multitask-address-reasoning-llama-1B-model}
}
```

## πŸ“ž Support & Contact

For questions, issues, or feature requests:
- Open an issue in this repository
- Contact: shiprocket-ai team
- Documentation: See usage examples above

## 📜 License

This model is released under the Apache 2.0 License. See LICENSE file for details.

---

*Multi-task address intelligence with reasoning - Built with 🧠 by shiprocket-ai team using Unsloth*