Could you elaborate a bit more on the details of the corpora you used to develop it?

#1
by abxda - opened

Could you please share more details about the corpora you used to develop it, and let us know if it would be possible to gain access to them in order to replicate the results and learn more about your process?

Hi! You can find the dataset here: Axolotl-Spanish-Nahuatl. The code is a standard SFT job on a quantized version of Mistral-7B. The point was to make a demo for this use case: an instruct-style model fine-tuned on an indigenous-language dataset. I can also share the code if you want, but I'd recommend using a more recent SFT codebase. My script dates back to ancient times (the summer of 2023, when Llama 2 came out).
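A minimal sketch of the data-preparation step such an SFT job implies: turning each Spanish–Nahuatl pair into an instruct-style prompt/response record that a trainer (e.g. TRL's SFTTrainer) can consume. The field names `sp` and `nah`, the prompt wording, and the example sentences are assumptions for illustration, not taken from the actual dataset card or the author's script.

```python
# Sketch: format a Spanish -> Nahuatl translation pair into an
# instruct-style SFT record. Field names "sp" and "nah" are assumed,
# not confirmed against the Axolotl-Spanish-Nahuatl schema.

def to_instruct_example(pair: dict) -> dict:
    """Turn a {"sp": ..., "nah": ...} pair into a single
    prompt/response training record for supervised fine-tuning."""
    prompt = (
        "Translate the following sentence from Spanish to Nahuatl.\n\n"
        f"Spanish: {pair['sp']}\nNahuatl:"
    )
    # Leading space so the response tokenizes cleanly after the colon.
    return {"prompt": prompt, "response": " " + pair["nah"]}

# Hypothetical example pair, for illustration only.
example = to_instruct_example({"sp": "buenos días", "nah": "cualli tonalli"})
print(example["prompt"])
print(example["response"])
```

From records like these, the rest of the job is standard: tokenize prompt + response, mask the prompt tokens in the loss, and fine-tune the quantized base model (typically via LoRA adapters).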
