🧠 Gemma 2B - MongoDB Query Generator (LoRA)

This is a LoRA fine-tuned version of unsloth/gemma-2b-it that converts natural language instructions into MongoDB query strings like:

db.users.find({ "isActive": true, "age": { "$gt": 30 } })

The model is instruction-tuned for text-to-MongoDB-query generation over typical collections such as users, orders, and products.


✨ Model Details

  • Base model: unsloth/gemma-2b-it
  • Fine-tuned with: LoRA on a 4-bit quantized base
  • Framework: Unsloth + PEFT
  • Dataset: Synthetic instructions paired with MongoDB queries (300+ examples; a sample record is sketched below)
  • Use case: Text-to-MongoDB query generation
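
The dataset itself is not published. Judging from the prompt template in the usage section below, each record presumably pairs an instruction and a schema description with the target query. A hypothetical training example might look like:

### Instruction:
Convert to MongoDB query string.

### Input:
Collection: orders
Fields:
- orderId (string)
- status (string)
- total (double)

Question: Find all orders with a total over 100.

### Response:
db.orders.find({ "total": { "$gt": 100 } })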

πŸ“¦ How to Use

import torch
from unsloth import FastLanguageModel

# Load the 4-bit quantized base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2b-it",
    max_seq_length = 1024,
    dtype = torch.float16,
    load_in_4bit = True,
)

# Attach the fine-tuned LoRA adapter; its configuration (rank, alpha,
# target modules) is read from the adapter repo itself
from peft import PeftModel

model = PeftModel.from_pretrained(model, "kihyun1998/gemma-2b-it-mongodb-lora")
model.eval()


prompt = """### Instruction:
Convert to MongoDB query string.

### Input:
Collection: users
Fields:
- name (string)
- age (int)
- isActive (boolean)
- country (string)

Question: Show all active users from Korea older than 30.

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
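
The decoded text still contains the prompt. A minimal way to pull out only the generated query, assuming the model follows the "### Response:" delimiter used above, is:

# Keep the text after the last "### Response:" marker (assumption: the
# model emits the query right after the delimiter, as in training)
decoded = tokenizer.decode(output[0], skip_special_tokens=True)
query = decoded.split("### Response:")[-1].strip()
print(query)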

πŸ’‘ Example Output

db.users.find({ "isActive": true, "country": "Korea", "age": { "$gt": 30 } })
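
The model emits MongoDB shell syntax rather than a driver call. To execute a generated string directly (for example from a dashboard or query-builder backend), one option is a small parser on top of pymongo. This is only an illustrative sketch: the helper name, the database name, and the assumption that the filter part is valid JSON (as in the examples on this card) are not part of the model.

# Illustrative helper: run a generated 'db.COLLECTION.find({...})'
# string against MongoDB. Assumes the filter is valid JSON; this is
# not a general MongoDB shell parser.
import json
import re

from pymongo import MongoClient

def run_generated_query(query_string: str, db) -> list:
    match = re.fullmatch(r"db\.(\w+)\.find\((.*)\)", query_string.strip(), re.DOTALL)
    if match is None:
        raise ValueError(f"Unsupported query shape: {query_string!r}")
    collection_name, filter_json = match.groups()
    return list(db[collection_name].find(json.loads(filter_json)))

# Usage (connection string and database name are placeholders):
# db = MongoClient("mongodb://localhost:27017")["mydb"]
# docs = run_generated_query('db.users.find({ "isActive": true })', db)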

πŸ“š Intended Use

  • Converting business-friendly questions into executable MongoDB queries
  • Powering internal dashboards, query builders, or no-code tools
  • Works best on structured fields and simple query logic

Out-of-scope:

  • Complex joins or aggregation pipelines
  • Nested or dynamic schema reasoning

πŸ“Š Training Details

  • LoRA rank: 16 (full adapter configuration in the training sketch below)
  • Epochs: 3
  • Dataset: 300+ synthetic natural language β†’ MongoDB query pairs
  • Training hardware: Google Colab (T4 GPU)
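
The training script itself is not included in this repo. A minimal sketch in the style of the standard Unsloth notebooks, starting from a freshly loaded base model (as in the first block of the usage section) and using the LoRA configuration and epoch count listed on this card, would look roughly like the following. The dataset variable, batch size, learning rate, and other TrainingArguments are placeholders, and the exact SFTTrainer keyword arguments depend on your trl version.

# Rough training sketch, not the exact script used for this adapter.
# r=16, lora_alpha=32, lora_dropout=0.05, q/k/v/o target modules and
# num_train_epochs=3 come from this card; everything else is a placeholder.
from trl import SFTTrainer
from transformers import TrainingArguments

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 32,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout = 0.05,
    bias = "none",
)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,          # ~300 instruction-to-query pairs formatted into a "text" column
    dataset_text_field = "text",
    max_seq_length = 1024,
    args = TrainingArguments(
        per_device_train_batch_size = 2,    # placeholder
        gradient_accumulation_steps = 4,    # placeholder
        num_train_epochs = 3,
        learning_rate = 2e-4,               # placeholder
        fp16 = True,
        output_dir = "outputs",
    ),
)
trainer.train()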

🚧 Limitations

  • The model assumes the collection name and field schema are supplied in the prompt (e.g. retrieved via a RAG step)
  • May hallucinate field names that are not present in the provided schema
  • Limited handling of advanced MongoDB features such as aggregation pipelines and $lookup (a hypothetical example follows)
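
For example, a request that needs to join orders with users would require an aggregation pipeline along these lines (collection and field names are hypothetical), which is outside the query shapes the model was trained on:

db.orders.aggregate([
  { "$lookup": { "from": "users", "localField": "userId", "foreignField": "_id", "as": "user" } },
  { "$match": { "user.country": "Korea" } }
])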

🧾 License

The base model is released under the Gemma license (Gemma Terms of Use).
This LoRA adapter inherits the same license conditions.


πŸ§‘β€πŸ’» Author

  • 🐱 @kihyun1998
  • πŸ’¬ Questions? Open an issue or contact via Hugging Face.

🏁 Citation

@misc{kihyun2025mongodb,
  title={Gemma 2B MongoDB Query Generator (LoRA)},
  author={Kihyun Park},
  year={2025},
  howpublished={\url{https://huggingface.co/kihyun1998/gemma-2b-it-mongodb-lora}}
}