|
--- |
|
base_model: unsloth/gemma-2b-it |
|
library_name: peft |
|
tags: |
|
- text-to-mongodb |
|
- LoRA |
|
- instruction-tuning |
|
- mongodb |
|
- gemma |
|
license: gemma
|
language: |
|
- en |
|
--- |
|
# Gemma 2B - MongoDB Query Generator (LoRA)
|
|
|
This is a LoRA fine-tuned version of `unsloth/gemma-2b-it` that converts natural language instructions into **MongoDB query strings** like: |
|
|
|
```js
db.users.find({ "isActive": true, "age": { "$gt": 30 } })
```
|
|
|
The model is instruction-tuned to support a text-to-query use case for MongoDB across typical collections like `users`, `orders`, and `products`. |
|
|
|
--- |
|
|
|
## Model Details
|
|
|
- **Base model**: [`unsloth/gemma-2b-it`](https://huggingface.co/unsloth/gemma-2b-it) |
|
- **Fine-tuned with**: LoRA on a 4-bit quantized base model (QLoRA-style)
|
- **Framework**: [Unsloth](https://github.com/unslothai/unsloth) + PEFT |
|
- **Dataset**: Synthetic instructions paired with MongoDB queries (300+ examples; one pair is sketched below)
|
- **Use case**: Text-to-MongoDB query generation |
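
The dataset itself is not published with this card; presumably each synthetic pair mirrors the Alpaca-style template used under "How to Use" below. A sketch of what one pair would look like under that assumption:

```python
# Hypothetical training example -- the exact dataset schema is an assumption
# inferred from the inference prompt template shown later in this card.
example = {
    "instruction": "Convert to MongoDB query string.",
    "input": (
        "Collection: users\n"
        "Fields:\n"
        "- name (string)\n"
        "- age (int)\n"
        "- isActive (boolean)\n"
        "- country (string)\n\n"
        "Question: Show all active users from Korea older than 30."
    ),
    "output": 'db.users.find({ "isActive": true, "country": "Korea", "age": { "$gt": 30 } })',
}
```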
|
|
|
--- |
|
|
|
## How to Use
|
|
|
```python
import torch
from unsloth import FastLanguageModel
from peft import PeftModel

# Load the 4-bit quantized base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2b-it",
    max_seq_length = 1024,
    dtype = torch.float16,
    load_in_4bit = True,
)

# Load the fine-tuned LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, "kihyun1998/gemma-2b-it-mongodb-lora")
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path

prompt = """### Instruction:
Convert to MongoDB query string.

### Input:
Collection: users
Fields:
- name (string)
- age (int)
- isActive (boolean)
- country (string)

Question: Show all active users from Korea older than 30.

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
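
Note that `tokenizer.decode` returns the prompt together with the completion. A small helper (not part of the card's original code) to keep only the generated query:

```python
def extract_query(decoded: str) -> str:
    # Everything after the response marker is the model's generated query.
    return decoded.split("### Response:")[-1].strip()

query_str = extract_query(tokenizer.decode(output[0], skip_special_tokens=True))
print(query_str)  # e.g. db.users.find({ "isActive": true, "age": { "$gt": 30 } })
```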
|
|
|
--- |
|
|
|
## Example Output
|
|
|
```js
db.users.find({ "isActive": true, "country": "Korea", "age": { "$gt": 30 } })
```
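
To run the generated string against a live database, one option is to parse out the collection name and filter and execute them with `pymongo`. A minimal sketch, assuming the model emits a single `find()` call whose filter is valid JSON (quoted keys, lowercase `true`/`false`), as in the output above; the connection string and database name are placeholders:

```python
import json
import re

from pymongo import MongoClient

def run_generated_query(query_str: str, db):
    # Match the db.<collection>.find({...}) shape the model is trained to emit.
    match = re.fullmatch(r"db\.(\w+)\.find\((.*)\)", query_str.strip(), re.DOTALL)
    if match is None:
        raise ValueError(f"Unexpected query shape: {query_str!r}")
    collection, filter_json = match.groups()
    return list(db[collection].find(json.loads(filter_json)))

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
results = run_generated_query(
    'db.users.find({ "isActive": true, "country": "Korea", "age": { "$gt": 30 } })',
    client["mydb"],  # placeholder database name
)
```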
|
|
|
--- |
|
|
|
## Intended Use
|
|
|
- Converting business-friendly questions into executable MongoDB queries |
|
- Powering internal dashboards, query builders, or no-code tools |
|
- Works best on structured fields and simple query logic |
|
|
|
### Out of Scope
|
|
|
- Complex joins or aggregation pipelines |
|
- Nested or dynamic schema reasoning |
|
|
|
--- |
|
|
|
## Training Details
|
|
|
- LoRA rank: 16 |
|
- Epochs: 3 |
|
- Dataset: 300+ synthetic natural language → MongoDB query pairs
|
- Training hardware: Google Colab (T4 GPU); a sketch of the training setup follows below
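
The full training script is not included with the card. Below is a sketch following Unsloth's usual SFT recipe, reusing the LoRA hyperparameters above; `dataset` and the remaining arguments (batch size, learning rate) are assumptions, not recorded values:

```python
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Attach trainable LoRA weights (rank 16, alpha 32) to the 4-bit base model
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,              # the 300+ synthetic pairs, loaded separately
    dataset_text_field="text",
    max_seq_length=1024,
    args=TrainingArguments(
        num_train_epochs=3,
        per_device_train_batch_size=2,  # assumption: small batch to fit a T4
        learning_rate=2e-4,             # assumption: common LoRA default
        fp16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```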
|
|
|
--- |
|
|
|
## Limitations
|
|
|
- The model assumes the collection name and field schema are supplied in the prompt (e.g., retrieved via RAG)

- May hallucinate field names not present in the provided context (see the validation sketch below)

- Limited handling of advanced MongoDB features such as `$lookup` stages and multi-stage aggregation pipelines
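
Because hallucinated fields are the most likely failure mode, it is worth checking generated filters against the schema you supplied in the prompt. A small illustrative guard (the helper name is hypothetical):

```python
def unknown_fields(filter_doc: dict, allowed: set) -> list:
    # Check top-level keys only; nested values hold operators like "$gt",
    # which are not field names.
    return [key for key in filter_doc if key not in allowed and not key.startswith("$")]

bad = unknown_fields(
    {"isActive": True, "age": {"$gt": 30}},
    allowed={"name", "age", "isActive", "country"},
)
assert not bad, f"Model referenced unknown fields: {bad}"
```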
|
|
|
--- |
|
|
|
## License
|
|
|
The base model is released under the [Gemma license](https://ai.google.dev/gemma#license), and this LoRA adapter inherits the same conditions.
|
|
|
--- |
|
|
|
## Author
|
|
|
- [@kihyun1998](https://huggingface.co/kihyun1998)

- Questions? Open a discussion on the model repo or contact via Hugging Face.
|
|
|
--- |
|
|
|
## Citation
|
|
|
```bibtex
@misc{kihyun2025mongodb,
  title={Gemma 2B MongoDB Query Generator (LoRA)},
  author={Kihyun Park},
  year={2025},
  howpublished={\url{https://huggingface.co/kihyun1998/gemma-2b-it-mongodb-lora}}
}
```
|
|