---
base_model: unsloth/gemma-2b-it
library_name: peft
tags:
- text-to-mongodb
- LoRA
- instruction-tuning
- mongodb
- gemma
license: gemma
language:
- en
---
# 🧠 Gemma 2B - MongoDB Query Generator (LoRA)
This is a LoRA fine-tuned version of `unsloth/gemma-2b-it` that converts natural language instructions into **MongoDB query strings** like:
```js
db.users.find({ "isActive": true, "age": { "$gt": 30 } })
```
The model is instruction-tuned to support a text-to-query use case for MongoDB across typical collections like `users`, `orders`, and `products`.
---
## ✨ Model Details
- **Base model**: [`unsloth/gemma-2b-it`](https://huggingface.co/unsloth/gemma-2b-it)
- **Fine-tuned with**: LoRA (4-bit quantized)
- **Framework**: [Unsloth](https://github.com/unslothai/unsloth) + PEFT
- **Dataset**: Synthetic instructions paired with MongoDB queries (300+ examples; an illustrative record is sketched below)
- **Use case**: Text-to-MongoDB query generation
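The dataset itself is not published, but based on the prompt template used at inference, a single record plausibly looks like the following (values are illustrative, not taken from the actual data):

```python
# One illustrative training record (hypothetical; mirrors the inference prompt template)
example = {
    "instruction": "Convert to MongoDB query string.",
    "input": (
        "Collection: users\n"
        "Fields:\n"
        "- name (string)\n"
        "- age (int)\n"
        "- isActive (boolean)\n"
        "Question: Show all active users older than 30."
    ),
    "response": 'db.users.find({ "isActive": true, "age": { "$gt": 30 } })',
}
```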
---
## πŸ“¦ How to Use
```python
import torch
from unsloth import FastLanguageModel
from peft import PeftModel

# Load the 4-bit quantized base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2b-it",
    max_seq_length = 1024,
    dtype = torch.float16,
    load_in_4bit = True,
)

# Attach the trained LoRA adapter weights from this repo
model = PeftModel.from_pretrained(model, "kihyun1998/gemma-2b-it-mongodb-lora")
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode
prompt = """### Instruction:
Convert to MongoDB query string.
### Input:
Collection: users
Fields:
- name (string)
- age (int)
- isActive (boolean)
- country (string)
Question: Show all active users from Korea older than 30.
### Response:
"""
# Tokenize and generate; note the decoded output echoes the prompt (see below)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
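Since decoding the full sequence echoes the prompt, you can slice off the prompt tokens to keep only the generated query:

```python
# Drop the echoed prompt tokens and keep only the newly generated text
prompt_len = inputs["input_ids"].shape[1]
query = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True).strip()
print(query)
```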
---
## πŸ’‘ Example Output
```js
db.users.find({ "isActive": true, "country": "Korea", "age": { "$gt": 30 } })
```
---
## πŸ“š Intended Use
- Converting business-friendly questions into executable MongoDB queries (a parsing sketch follows this list)
- Powering internal dashboards, query builders, or no-code tools
- Works best on flat, structured fields and simple filter logic
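To execute a generated query from application code, the `db.<collection>.find(<filter>)` string has to be parsed first. A minimal sketch using `pymongo`, assuming the simple `find()` shape shown above (the helper name is illustrative, not part of this repo):

```python
import json
import re
from pymongo import MongoClient

def run_generated_find(client: MongoClient, db_name: str, query: str):
    """Parse a 'db.<collection>.find({...})' string and execute it read-only."""
    match = re.fullmatch(r"db\.(\w+)\.find\((.*)\)", query.strip(), flags=re.S)
    if match is None:
        raise ValueError("Only simple find() queries are executed")
    collection, filter_json = match.groups()
    # The model's filters are valid JSON (quoted keys, lowercase booleans)
    return client[db_name][collection].find(json.loads(filter_json))

# Example: run_generated_find(MongoClient(), "mydb", 'db.users.find({ "isActive": true })')
```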
### Out of scope
- Complex joins or aggregation pipelines
- Nested or dynamic schema reasoning
---
## πŸ“Š Training Details
- LoRA rank: 16 (alpha 32, dropout 0.05, applied to the attention projections)
- Epochs: 3
- Dataset: 300+ synthetic natural language → MongoDB query pairs
- Training hardware: Google Colab (T4 GPU); a minimal setup sketch follows below
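For reference, a minimal sketch of the training setup, assuming TRL's `SFTTrainer` and a local `mongodb_pairs.jsonl` file with a pre-formatted `text` column; the file name, batch size, and learning rate are assumptions, while the LoRA config, epoch count, and base model come from this card:

```python
import torch
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2b-it",
    max_seq_length = 1024,
    dtype = torch.float16,
    load_in_4bit = True,
)

# LoRA config matching this card: rank 16, alpha 32, attention projections only
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 32,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout = 0.05,
    bias = "none",
)

dataset = load_dataset("json", data_files = "mongodb_pairs.jsonl", split = "train")  # hypothetical file

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 1024,
    args = TrainingArguments(
        per_device_train_batch_size = 2,   # assumption: small batch for a T4
        gradient_accumulation_steps = 4,   # assumption
        num_train_epochs = 3,              # from this card
        learning_rate = 2e-4,              # assumption: common LoRA default
        fp16 = True,
        output_dir = "outputs",
    ),
)
trainer.train()
model.save_pretrained("gemma-2b-it-mongodb-lora")  # saves adapter weights only
```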
---
## 🚧 Limitations
- The model assumes the collection name and field schema are supplied in the prompt (e.g., retrieved via RAG)
- May hallucinate field names not present in the provided context; a simple post-check is sketched below
- Limited handling of advanced MongoDB features such as `$lookup` stages and aggregation pipelines
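Because hallucinated field names are the most likely failure, a lightweight post-check against the schema passed in the prompt is advisable before executing anything. A minimal sketch (the field set is illustrative):

```python
import re

# Fields declared in the prompt for the target collection (illustrative)
KNOWN_FIELDS = {"name", "age", "isActive", "country"}

def uses_only_known_fields(query: str) -> bool:
    """True if every quoted key in the query filter is a declared field."""
    keys = set(re.findall(r'"(\w+)"\s*:', query))  # $-operators like "$gt" are not matched
    return keys <= KNOWN_FIELDS

print(uses_only_known_fields('db.users.find({ "isActive": true, "agee": { "$gt": 30 } })'))  # False
```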
---
## 🧾 License
The base model is released under the [Gemma license](https://ai.google.dev/gemma#license); this LoRA adapter inherits the same terms.
---
## πŸ§‘β€πŸ’» Author
- 🐱 [@kihyun1998](https://huggingface.co/kihyun1998)
- πŸ’¬ Questions? Open an issue or contact via Hugging Face.
---
## 🏁 Citation
```bibtex
@misc{kihyun2025mongodb,
  title={Gemma 2B MongoDB Query Generator (LoRA)},
  author={Kihyun Park},
  year={2025},
  howpublished={\url{https://huggingface.co/kihyun1998/gemma-2b-it-mongodb-lora}}
}
```