---
language: id
tags:
- bart
- id
license: mit
---
# Indonesian Recipe Ingredients Generator Model
**WARNING: Inference on Hugging Face might not run, since the tokenizer used is not a `transformers` tokenizer.**
Feel free to test the model [in this space](https://huggingface.co/spaces/haryoaw/id-recigen)
😎 **Have fun generating ingredients** 😎
This is a model fine-tuned to generate the ingredients of Indonesian dishes. It is one of the personal projects I worked on in my free time.
Basically, you give it the name of a dish and it produces the ingredients for that dish.
## Model
Data: [Indonesian Recipe Data on Kaggle](https://www.kaggle.com/datasets/canggih/indonesian-food-recipes)
Pre-trained Model: [IndoBART-v2](https://huggingface.co/indobenchmark/indobart-v2)
## How to use
This section shows how to load the tokenizer and the model.
### Tokenizer
Since the model is based on `indobart-v2`, it needs the tokenizer that ships with IndoBART.
First, install the tokenizer with `pip install indobenchmark-toolkit`.
After that, you can load the tokenizer:
```python
from indobenchmark.tokenization_indonlg import IndoNLGTokenizer
tokenizer = IndoNLGTokenizer.from_pretrained("haryoaw/id-recigen-bart")
```
**EDIT**:
It seems that the tokenizer in the package is not the same as the one I used to fine-tune the model.
There are some noticeable bugs, such as some subword tokens not being treated as subwords. Nevertheless, it still works!
### Model
The model can be loaded with `AutoModelForSeq2SeqLM`.
```python
from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("haryoaw/id-recigen-bart")
```
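Once the tokenizer and the model are loaded, generation could look roughly like the sketch below. This is not taken from the original training or demo code: the helper name `generate_ingredients` and the `max_length=256` default are assumptions for illustration, and it assumes that plain `encode` / `generate` / `decode` calls are enough for this checkpoint (the IndoNLG tokenizer also offers its own input-preparation helpers, which may match the fine-tuning setup better).

```python
def generate_ingredients(model, tokenizer, food_name, max_length=256):
    """Hypothetical helper: turn a food name into generated ingredient text.

    `model` and `tokenizer` are the objects loaded in the snippets above.
    `max_length` is an assumed cap on the generated sequence, not a tuned value.
    """
    # Encode the food name into a batch of one sequence of token ids.
    input_ids = tokenizer.encode(food_name, return_tensors="pt")
    # Let the seq2seq model generate the ingredient tokens.
    output_ids = model.generate(input_ids, max_length=max_length)
    # Decode the first (and only) generated sequence back into text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the objects from the loading snippets, a call such as `generate_ingredients(model, tokenizer, "nasi goreng ayam")` would return the generated ingredients as a single string.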
## Input Example
Make sure to input a **LOWERCASE** food name. The tokenizer is case-sensitive!
```
sayur asam
```
```
nasi goreng ayam
```
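Because the tokenizer is case-sensitive, it can help to normalize user input before passing it to the model. The helper below is a minimal sketch; the name `normalize_food_name` is hypothetical, and collapsing repeated whitespace is an extra assumption on top of the lowercase requirement stated above.

```python
def normalize_food_name(name: str) -> str:
    """Lowercase a food name and collapse surrounding/repeated whitespace."""
    # str.split() with no arguments drops leading, trailing and repeated
    # whitespace, so joining with a single space yields a clean name.
    return " ".join(name.lower().split())


print(normalize_food_name("  Nasi Goreng  Ayam "))
```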
~To be continued..