🧠 Gemma 2B - MongoDB Query Generator (LoRA)

This is a LoRA fine-tuned version of unsloth/gemma-2b-it that converts natural language instructions into MongoDB query strings like:

db.users.find({ "isActive": true, "age": { "$gt": 30 } })

The model is instruction-tuned for text-to-MongoDB-query generation over typical collections such as users, orders, and products.


✨ Model Details

  • Base model: unsloth/gemma-2b-it
  • Fine-tuned with: LoRA on a 4-bit quantized base
  • Framework: Unsloth + PEFT
  • Dataset: Synthetic instructions paired with MongoDB queries (300+ examples; a sample record is sketched below)
  • Use case: Text-to-MongoDB query generation
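
The dataset itself is not published. Judging from the prompt template in the usage section below, each record presumably pairs an instruction and a schema description with the target query. A hypothetical training example might look like:

### Instruction:
Convert to MongoDB query string.

### Input:
Collection: orders
Fields:
- orderId (string)
- status (string)
- total (double)

Question: Find all orders with a total over 100.

### Response:
db.orders.find({ "total": { "$gt": 100 } })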

πŸ“¦ How to Use

import torch
from unsloth import FastLanguageModel

# Load the 4-bit quantized base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2b-it",
    max_seq_length = 1024,
    dtype = torch.float16,
    load_in_4bit = True,
)

# Attach the fine-tuned LoRA adapter; its configuration (rank, alpha,
# target modules) is read from the adapter repo itself
from peft import PeftModel

model = PeftModel.from_pretrained(model, "kihyun1998/gemma-2b-it-mongodb-lora")
model.eval()


prompt = """### Instruction:
Convert to MongoDB query string.

### Input:
Collection: users
Fields:
- name (string)
- age (int)
- isActive (boolean)
- country (string)

Question: Show all active users from Korea older than 30.

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
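
The decoded text still contains the prompt. A minimal way to pull out only the generated query, assuming the model follows the "### Response:" delimiter used above, is:

# Keep the text after the last "### Response:" marker (assumption: the
# model emits the query right after the delimiter, as in training)
decoded = tokenizer.decode(output[0], skip_special_tokens=True)
query = decoded.split("### Response:")[-1].strip()
print(query)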

πŸ’‘ Example Output

db.users.find({ "isActive": true, "country": "Korea", "age": { "$gt": 30 } })
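
The model emits MongoDB shell syntax rather than a driver call. To execute a generated string directly (for example from a dashboard or query-builder backend), one option is a small parser on top of pymongo. This is only an illustrative sketch: the helper name, the database name, and the assumption that the filter part is valid JSON (as in the examples on this card) are not part of the model.

# Illustrative helper: run a generated 'db.COLLECTION.find({...})'
# string against MongoDB. Assumes the filter is valid JSON; this is
# not a general MongoDB shell parser.
import json
import re

from pymongo import MongoClient

def run_generated_query(query_string: str, db) -> list:
    match = re.fullmatch(r"db\.(\w+)\.find\((.*)\)", query_string.strip(), re.DOTALL)
    if match is None:
        raise ValueError(f"Unsupported query shape: {query_string!r}")
    collection_name, filter_json = match.groups()
    return list(db[collection_name].find(json.loads(filter_json)))

# Usage (connection string and database name are placeholders):
# db = MongoClient("mongodb://localhost:27017")["mydb"]
# docs = run_generated_query('db.users.find({ "isActive": true })', db)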

πŸ“š Intended Use

  • Converting business-friendly questions into executable MongoDB queries
  • Powering internal dashboards, query builders, or no-code tools
  • Works best on structured fields and simple query logic

Out-of-scope:

  • Complex joins or aggregation pipelines
  • Nested or dynamic schema reasoning

πŸ“Š Training Details

  • LoRA rank: 16 (full adapter configuration in the training sketch below)
  • Epochs: 3
  • Dataset: 300+ synthetic natural language β†’ MongoDB query pairs
  • Training hardware: Google Colab (T4 GPU)
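
The training script itself is not included in this repo. A minimal sketch in the style of the standard Unsloth notebooks, starting from a freshly loaded base model (as in the first block of the usage section) and using the LoRA configuration and epoch count listed on this card, would look roughly like the following. The dataset variable, batch size, learning rate, and other TrainingArguments are placeholders, and the exact SFTTrainer keyword arguments depend on your trl version.

# Rough training sketch, not the exact script used for this adapter.
# r=16, lora_alpha=32, lora_dropout=0.05, q/k/v/o target modules and
# num_train_epochs=3 come from this card; everything else is a placeholder.
from trl import SFTTrainer
from transformers import TrainingArguments

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 32,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout = 0.05,
    bias = "none",
)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,          # ~300 instruction-to-query pairs formatted into a "text" column
    dataset_text_field = "text",
    max_seq_length = 1024,
    args = TrainingArguments(
        per_device_train_batch_size = 2,    # placeholder
        gradient_accumulation_steps = 4,    # placeholder
        num_train_epochs = 3,
        learning_rate = 2e-4,               # placeholder
        fp16 = True,
        output_dir = "outputs",
    ),
)
trainer.train()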

🚧 Limitations

  • The model assumes the collection name and field schema are supplied in the prompt (e.g. retrieved via a RAG step)
  • May hallucinate field names that are not present in the provided schema
  • Limited handling of advanced MongoDB features such as aggregation pipelines and $lookup (a hypothetical example follows)
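
For example, a request that needs to join orders with users would require an aggregation pipeline along these lines (collection and field names are hypothetical), which is outside the query shapes the model was trained on:

db.orders.aggregate([
  { "$lookup": { "from": "users", "localField": "userId", "foreignField": "_id", "as": "user" } },
  { "$match": { "user.country": "Korea" } }
])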

🧾 License

The base model is released under the Gemma license (Gemma Terms of Use).
This LoRA adapter inherits the same license conditions.


πŸ§‘β€πŸ’» Author

  • 🐱 @kihyun1998
  • πŸ’¬ Questions? Open an issue or contact via Hugging Face.

🏁 Citation

@misc{kihyun2025mongodb,
  title={Gemma 2B MongoDB Query Generator (LoRA)},
  author={Kihyun Park},
  year={2025},
  howpublished={\url{https://huggingface.co/kihyun1998/gemma-2b-it-mongodb-lora}}
}