---
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
---

# Multi-Task Address Reasoning Model v1.0

This model is a **multi-task fine-tuned model** specialized for **address correction, component extraction, and geographic Q&A** with **Chain of Thought reasoning**. It is built on Llama-3.2-1B-Instruct with LoRA fine-tuning using Unsloth.

## Model Description

A multi-task Llama-3.2-1B model fine-tuned with LoRA for Indian address correction, component extraction, and geographic Q&A using Chain of Thought reasoning.

### Key Capabilities

- **Address Correction**: Fix spelling errors, formatting issues, and incomplete addresses
- **Component Extraction**: Extract and structure address components (building, locality, city, state, pincode)
- **Geographic Q&A**: Answer questions about locations, states, cities, and geographic relationships
- **Chain of Thought Reasoning**: Detailed step-by-step reasoning for address analysis
- **Multi-Task Learning**: A single model handles multiple address-related tasks

## Model Architecture & Training

- **Base Model**: unsloth/Llama-3.2-1B-Instruct
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation) via Unsloth (see the configuration sketch below)
- **LoRA Rank (r)**: 64
- **LoRA Alpha**: 128
- **LoRA Dropout**: 0.1
- **Target Modules**: q_proj, o_proj, k_proj, up_proj, v_proj, down_proj, gate_proj
- **Model Size**: ~276MB (adapter only)
- **Checkpoint**: 435
- **Max Sequence Length**: 1024 tokens (auto-optimized from sequence analysis)

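For reference, the hyperparameters above correspond roughly to a PEFT `LoraConfig` like the one below. This is a minimal sketch for readers who want to reproduce a similar setup; the actual run used Unsloth's wrapper, and anything not listed above (bias, task type) is an assumption.

```python
from peft import LoraConfig

# Hedged sketch: LoRA values taken from the list above; bias and task_type are assumptions.
lora_config = LoraConfig(
    r=64,                      # LoRA rank
    lora_alpha=128,            # scaling factor
    lora_dropout=0.1,
    bias="none",               # assumption
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```
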
### Training Configuration

- **Learning Rate**: 1e-4
- **Batch Size**: 32 (1 per device × 32 gradient accumulation steps)
- **Epochs**: 3
- **Optimizer**: adamw_8bit
- **Scheduler**: cosine
- **Weight Decay**: 0.01

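The settings above map onto standard `transformers` training arguments roughly as sketched below. Treat this as an illustration only: the output directory, logging, and precision flags are placeholders, not values taken from the original training script.

```python
from transformers import TrainingArguments

# Hedged sketch of the listed hyperparameters; paths and logging values are placeholders.
training_args = TrainingArguments(
    output_dir="outputs",              # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,    # effective batch size of 32
    num_train_epochs=3,
    optim="adamw_8bit",
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    fp16=True,                         # assumption, matching the float16 inference example
    logging_steps=10,                  # placeholder
)
```
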
## Usage Examples

```python
import torch
import warnings
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

warnings.filterwarnings("ignore")

# Base model and adapter repo (the base model actually used during training)
base_model_name = "unsloth/Llama-3.2-1B-Instruct"
model_name = "shiprocket-ai/multitask-address-reasoning-llama-1B-model"

print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Add a pad token if missing
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print("Loading base model...")
# Load the base model (non-quantized, as in the training script)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

print("Loading LoRA adapter...")
model = PeftModel.from_pretrained(base_model, model_name)
print("Model loaded successfully!")


def process_address_with_reasoning(prompt, max_new_tokens=400):
    """Run generation with Chain of Thought reasoning (matching training-time settings)."""
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)

    # Move inputs to the model's device
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.1,  # low temperature, as used when testing during training
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            use_cache=True,
        )

    # Decode only the newly generated tokens
    input_length = inputs["input_ids"].shape[1]
    generated_tokens = outputs[0][input_length:]
    return tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()


def fix_address_with_reasoning(address, max_new_tokens=400):
    """Fix an address with detailed Chain of Thought reasoning."""
    messages = [{"role": "user", "content": f"Fix and extract components from: {address}"}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    return process_address_with_reasoning(prompt, max_new_tokens)


def answer_geographic_question(question, max_new_tokens=150):
    """Answer geographic questions about locations."""
    messages = [{"role": "user", "content": question}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    return process_address_with_reasoning(prompt, max_new_tokens)


def extract_components(address, max_new_tokens=200):
    """Extract address components with reasoning."""
    messages = [{"role": "user", "content": f"Extract all components from this address: {address}"}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    return process_address_with_reasoning(prompt, max_new_tokens)


# Test cases based on the training script examples
print("\nMULTI-TASK ADDRESS MODEL EXAMPLES")
print("=" * 60)
print("Testing Chain of Thought reasoning + Geographic Q&A")
print("Model trained with LoRA r=64, alpha=128 for complex reasoning")
print("=" * 60)

# Address correction with reasoning (examples from training)
test_addresses = [
    "pandit nagla badi masjid moradabad 244001",
    "sec 14 gurgoan haryana 122001",
    "koramangala bangalor 560095",
    "dlf cyber city gurgaon haryana",
]

print("\nTESTING ADDRESS CORRECTION WITH CHAIN OF THOUGHT:")
print("-" * 50)
for i, test_address in enumerate(test_addresses, 1):
    print(f"\nTest {i}: {test_address}")
    result = fix_address_with_reasoning(test_address)
    print("Chain of Thought response:")
    print(f"   {result}")
    print("-" * 40)

# Geographic Q&A (examples from the training script)
qa_tests = [
    "Which state is Mumbai in?",
    "What is the pincode of Bangalore?",
    "Is Delhi a metro city?",
    "What tier city is Pune?",
    "Where is Connaught Place located?",
    "What state does Hyderabad belong to?",
    "Name a city in Karnataka.",
    "What is the postal code for Gurgaon?",
    "Which state is New Delhi in?",  # training example
    "What cities are in Maharashtra?",
]

print("\nTESTING GEOGRAPHIC Q&A:")
print("-" * 50)
for i, question in enumerate(qa_tests[:8], 1):  # test the first 8 questions
    print(f"\nQ{i}: {question}")
    result = answer_geographic_question(question)
    print(f"Answer: {result}")

# Component extraction
extraction_tests = [
    "Flat 203, Emerald Heights, Sector 15, Gurugram, Haryana 122001",
    "DLF Cyber City, Cyber Hub, Gurgaon, Haryana",
    "Connaught Place, New Delhi, Delhi 110001",
]

print("\nTESTING COMPONENT EXTRACTION:")
print("-" * 50)
for i, test_address in enumerate(extraction_tests, 1):
    print(f"\nExtract {i}: {test_address}")
    result = extract_components(test_address)
    print(f"Components: {result}")

print("\nALL TESTS COMPLETED!")
print("The model demonstrates Chain of Thought reasoning, geographic knowledge")
print("from the NER training data, and address correction with detailed analysis.")
```

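For deployment, the adapter can optionally be merged into the base weights so the result behaves like a plain `transformers` model. This is a hedged, optional step; it is not part of the published workflow, and the output path below is a placeholder.

```python
# Optional: merge the LoRA weights into the base model for simpler serving.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged-address-model")   # placeholder path
tokenizer.save_pretrained("merged-address-model")
```
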
## Training Methodology

This model was trained using a multi-task approach:

### **1. Data Preparation Strategy**

- **Source**: Address NER dataset with structured components (address → corrected_address → extracted_info)
- **Multi-task Split**: 70% Chain of Thought address correction + 30% Geographic Q&A
- **Data Augmentation**: Expanded the dataset to 584.8% of its original size
- **Reasoning Integration**: Each sample includes step-by-step analytical reasoning

### **2. Chain of Thought Address Correction**

- **Input**: Raw/incomplete addresses with potential errors
- **Process**: Model analyzes, identifies issues, and explains corrections
- **Output**: Detailed reasoning + structured JSON with address components
- **Examples**: Spelling fixes, state inference, component extraction

### **3. Geographic Q&A Generation**

From each address record's NER data, multiple Q&A pairs were generated (a hedged sketch of this step follows the list):

- **State-City relationships**: "Which state is Mumbai in?" → "Mumbai is in Maharashtra state."
- **Pincode queries**: "What is the pincode of Bangalore?" → "The pincode of Bangalore is 560001."
- **City tier classification**: "Is Delhi a metro city?" → "Yes, Delhi is a metropolitan city."
- **Locality mapping**: "Where is Connaught Place?" → "Connaught Place is in New Delhi."

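A minimal sketch of how such Q&A pairs could be derived from a structured record is shown below. The field names (`city`, `state`, `pincode`, `locality`) are assumptions for illustration; the actual dataset schema is not published here.

```python
def make_geo_qa_pairs(record: dict) -> list[dict]:
    """Turn one structured address record into simple geographic Q&A pairs.

    Assumed record shape (illustrative only):
    {"city": "Mumbai", "state": "Maharashtra", "pincode": "400001", "locality": "Andheri"}
    """
    pairs = []
    if record.get("city") and record.get("state"):
        pairs.append({
            "question": f"Which state is {record['city']} in?",
            "answer": f"{record['city']} is in {record['state']} state.",
        })
    if record.get("city") and record.get("pincode"):
        pairs.append({
            "question": f"What is the pincode of {record['city']}?",
            "answer": f"The pincode of {record['city']} is {record['pincode']}.",
        })
    if record.get("locality") and record.get("city"):
        pairs.append({
            "question": f"Where is {record['locality']} located?",
            "answer": f"{record['locality']} is in {record['city']}.",
        })
    return pairs
```
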
### **4. Sequence Optimization**

- **Dynamic Analysis**: Analyzed 1000+ samples to determine the optimal context length
- **Result**: 99% of samples fit within 768 tokens; the limit was rounded up to 1024
- **Context Window**: 1024 tokens, chosen to leave headroom for reasoning tasks

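An analysis of this kind can be reproduced with a short script like the one below. It is a hedged sketch: the candidate lengths and input texts are placeholders, and only the percentile logic reflects the description above.

```python
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-1B-Instruct")

def pick_max_seq_len(texts, percentile=99, candidates=(512, 768, 1024, 2048)):
    """Measure tokenized lengths and return the smallest candidate covering `percentile`% of samples."""
    lengths = [len(tokenizer(t)["input_ids"]) for t in texts]
    cutoff = int(np.percentile(lengths, percentile))
    for c in candidates:
        if c >= cutoff:
            return c, cutoff
    return candidates[-1], cutoff
```
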
## Training Performance

```
Final Training Loss: 0.5506
Training Runtime: 3701.74 seconds (~1 hour)
Training Samples/Second: 3.749
Training Steps/Second: 0.118
Total Epochs: 3.0
```

## Supported Tasks

### 1. **Address Correction with Reasoning**
- Fix spelling errors and formatting issues
- Infer missing components (state, city tier)
- Provide step-by-step reasoning for corrections

### 2. **Component Extraction**
- Extract building names, localities, cities, states, and pincodes
- Structure unstructured address data
- Identify address hierarchy and relationships

### 3. **Geographic Q&A**
- Answer questions about cities, states, and locations
- Provide geographic knowledge and relationships
- Handle location-based queries

### 4. **Address Standardization**
- Convert informal addresses to a structured format
- Normalize address formats
- Handle various input formats

## Use Cases

### 1. **E-commerce & Logistics**
- Correct customer addresses during checkout
- Extract delivery components for routing
- Answer location-based customer queries

### 2. **Data Processing & Migration**
- Clean legacy address databases with reasoning
- Extract structured data from unstructured addresses
- Provide explanations for address corrections

### 3. **Customer Support Automation**
- Answer geographic questions about locations
- Help customers correct their addresses
- Provide location-based information

### 4. **Address Intelligence**
- Analyze address patterns and relationships
- Infer missing address components
- Provide geographic context and reasoning

## Prompt Formats

The model works with the Llama-3.2 chat format:

### Address Correction
```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Fix and extract components from: [address]<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```

### Geographic Q&A
```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Which state is [location] in?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```

### Component Extraction
```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Extract all components from this address: [address]<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```

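These strings do not need to be assembled by hand: `tokenizer.apply_chat_template` produces the equivalent prompt, as in the usage example above. A quick check (note that, depending on the bundled chat template, a default system block may also be prepended):

```python
# Sanity-check the prompt layout produced by the bundled chat template.
messages = [{"role": "user", "content": "Which state is Mumbai in?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # expect <|start_header_id|>user<|end_header_id|> ... ending with the assistant header
```
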
## Performance Tips

1. **Temperature Settings**: Use 0.1-0.3 for factual tasks, 0.3-0.5 for reasoning tasks
2. **Context Management**: Keep prompts under 512 tokens for optimal performance
3. **Batch Processing**: Process multiple addresses efficiently with padded batches (see the sketch below)
4. **Device Placement**: Ensure all tensors are on the same device (GPU/CPU)
5. **Memory Management**: Use float16 for memory efficiency

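A minimal batched-inference sketch, reusing the `tokenizer` and `model` from the usage example above. The padding side and batch size are assumptions, not settings from the original scripts.

```python
def fix_addresses_batch(addresses, max_new_tokens=300, batch_size=8):
    """Run address correction on a list of addresses in padded batches."""
    tokenizer.padding_side = "left"  # left-pad so each row's generation starts right after its prompt
    results = []
    for start in range(0, len(addresses), batch_size):
        chunk = addresses[start:start + batch_size]
        prompts = [
            tokenizer.apply_chat_template(
                [{"role": "user", "content": f"Fix and extract components from: {a}"}],
                tokenize=False,
                add_generation_prompt=True,
            )
            for a in chunk
        ]
        inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True, max_length=512)
        inputs = {k: v.to(next(model.parameters()).device) for k, v in inputs.items()}
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                temperature=0.1,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id,
            )
        new_tokens = outputs[:, inputs["input_ids"].shape[1]:]
        results.extend(tokenizer.batch_decode(new_tokens, skip_special_tokens=True))
    return [r.strip() for r in results]
```
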
## Limitations

- **Model Size**: At 1B parameters, it may fall short of larger models on complex inputs
- **Training Data**: Trained on a specific dataset; may not generalize to all address formats
- **Geographic Scope**: Optimized for Indian addresses and geography
- **Reasoning Depth**: Chain of Thought reasoning may vary in depth and consistency
- **Device Compatibility**: Requires correct device placement for inference

## Model Files

- `adapter_config.json`: LoRA adapter configuration
- `adapter_model.safetensors`: LoRA adapter weights
- `tokenizer_config.json`: Tokenizer configuration
- `tokenizer.json`: Tokenizer vocabulary and settings
- `special_tokens_map.json`: Special tokens mapping
- `chat_template.jinja`: Chat template for conversations

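To inspect these files without loading the model, they can be fetched individually. A small sketch using `huggingface_hub` (the printed keys are standard fields of a PEFT adapter config):

```python
import json
from huggingface_hub import hf_hub_download

# Download only the adapter config and print the LoRA settings it records.
config_path = hf_hub_download(
    repo_id="shiprocket-ai/multitask-address-reasoning-llama-1B-model",
    filename="adapter_config.json",
)
with open(config_path) as f:
    adapter_config = json.load(f)
print(adapter_config.get("r"), adapter_config.get("lora_alpha"), adapter_config.get("target_modules"))
```
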
## Model Updates

- **Version**: 1.0 (Checkpoint 435)
- **Last Updated**: 2025-07-08
- **Training Framework**: Unsloth + LoRA
- **Base Model**: Llama-3.2-1B-Instruct

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{multitask-address-reasoning-model,
  title={Multi-Task Address Reasoning Model},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/shiprocket-ai/multitask-address-reasoning-llama-1B-model}
}
```

## Support & Contact

For questions, issues, or feature requests:

- Open an issue in this repository
- Contact: shiprocket-ai team
- Documentation: See the usage examples above

## License

This model is released under the Apache 2.0 License. See the LICENSE file for details.

---

*Multi-task address intelligence with reasoning - built by the shiprocket-ai team using Unsloth*