---
base_model: unsloth/Llama-3.2-3B-Instruct
tags:
- text-generation
- mongodb
- query-generation
- transformers
- unsloth
- llama
- trl
- gguf
- quantized
license: apache-2.0
language:
- en
datasets:
- skshmjn/mongo_prompt_query
pipeline_tag: text-generation
library_name: transformers
---

# MongoDB Query Generator - Llama-3.2-3B (Fine-tuned)  

- **Developed by:** skshmjn  
- **License:** apache-2.0  
- **Finetuned from model:** [unsloth/Llama-3.2-3B-Instruct](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct)  
- **Dataset Used:** [skshmjn/mongodb-chat-query](https://huggingface.co/datasets/skshmjn/mongodb-chat-query)  
- **Supports:** Transformers & GGUF (for fast inference on CPU/GPU)  

## 🚀 **Model Overview**  
This model is designed to **generate MongoDB queries** from natural language prompts. It supports:  
- **Basic CRUD operations:** `find`, `insert`, `update`, `delete`  
- **Aggregation Pipelines:** `$group`, `$match`, `$lookup`, `$sort`, etc.  
- **Indexing & Performance Queries**  
- **Nested Queries & Joins (`$lookup`)**  

Fine-tuned with **Unsloth** for efficiency, and quantized to **GGUF** for fast inference on CPU/GPU.  
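
To make the target concrete, the sketch below shows the kind of query the model aims to produce, written in its `pymongo` form. This is illustrative only; the collection and field names (`employees`, `departments`, `dept_id`) are hypothetical.

```python
from pymongo import MongoClient

# Assumes a local MongoDB instance; adjust the URI for your deployment.
client = MongoClient("mongodb://localhost:27017")
db = client["company"]

# "Find all employees older than 30" as a basic find query:
for doc in db.employees.find({"age": {"$gt": 30}}):
    print(doc)

# An aggregation pipeline with $match, $lookup (join), $unwind, and $group --
# the kind of multi-stage query the model also targets:
pipeline = [
    {"$match": {"age": {"$gt": 30}}},
    {"$lookup": {
        "from": "departments",
        "localField": "dept_id",
        "foreignField": "_id",
        "as": "department",
    }},
    {"$unwind": "$department"},
    {"$group": {"_id": "$department.name", "count": {"$sum": 1}}},
]
for doc in db.employees.aggregate(pipeline):
    print(doc)
```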

---

## 📌 **Example Usage (Transformers)**  
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "skshmjn/Llama-3.2-3B-Mongo-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
schema = {}  # Pass your MongoDB schema here; leave empty for generic queries. A sample schema is available in the Hugging Face repository.

prompt = f"Here is the MongoDB schema {schema}. Find all employees older than 30 in the 'employees' collection."
inputs = tokenizer(prompt, return_tensors="pt")

output = model.generate(**inputs, max_new_tokens=100)  # max_new_tokens bounds the generated query, not prompt + query
query = tokenizer.decode(output[0], skip_special_tokens=True)

print(query)
```
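
Llama-3.2 Instruct checkpoints ship with a chat template; if the fine-tune preserved it, formatting the prompt as a chat turn may match the training format more closely. A minimal sketch, assuming the template was kept:

```python
# Wrap the prompt as a user message and apply the model's chat template.
messages = [{"role": "user", "content": prompt}]
chat_inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(chat_inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

---

## 📌 **Example Usage (GGUF)**  
Since the card lists GGUF support, here is a minimal sketch using `llama-cpp-python`. The filename pattern is an assumption; check the repository's file list for the actual quantized file.

```python
from llama_cpp import Llama

# Download a GGUF file from the repo via the Hugging Face Hub.
# "*.gguf" is a glob pattern (an assumption); pin the exact quantization
# (e.g. Q4_K_M) once you know the real filename.
llm = Llama.from_pretrained(
    repo_id="skshmjn/Llama-3.2-3B-Mongo-Instruct",
    filename="*.gguf",
    n_ctx=2048,
)

prompt = "Find all employees older than 30 in the 'employees' collection."
out = llm(prompt, max_tokens=100)
print(out["choices"][0]["text"])
```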