Introducing Completionist, an open-source command-line tool that automates synthetic dataset generation.
It works by iterating over an existing HF dataset and by using a LLM to create completions.
- Problem: You need a fast way to create custom datasets for fine-tuning or RAG, but you want the flexibility to use different LLM backends or your own infrastructure. - Solution: Completionist connects with any OpenAI-compatible endpoint, including Ollama and LM Studio, or a Hugging Face inference endpoint.
A simple CLI like Completionist gives you the possibility to take full control of your synthetic data generation workflow.