Menlo
/

Ichigo-llama3.1-s-instruct-v0.4

Audio-Text-to-Text

sound language model

Model card Files Files and versions Community

jan-hq commited on Nov 11, 2024

Commit

a8a4bc7

·

verified ·

1 Parent(s): d0d3398

Update README.md

Files changed (1) hide show

README.md +2 -0

README.md CHANGED Viewed

@@ -15,6 +15,8 @@ We have developed and released the family [Ichigo-llama3s](https://huggingface.c
 We SFT [homebrewltd/Ichigo-llama3.1-s-base-v0.3](https://huggingface.co/homebrewltd/Ichigo-llama3.1-s-base-v0.3) with nearly 1B tokens from [Instruction Speech WhisperVQ v3](homebrewltd/mixed-instruction-speech-whispervq-v3-full) dataset.
 This is the model checkpoint from step 7000. Due to some noise in the training data, it has an artificially higher score on the Speech Instruction benchmark.
 **Model developers** Homebrew Research.
 **Input** Text and sound.

 We SFT [homebrewltd/Ichigo-llama3.1-s-base-v0.3](https://huggingface.co/homebrewltd/Ichigo-llama3.1-s-base-v0.3) with nearly 1B tokens from [Instruction Speech WhisperVQ v3](homebrewltd/mixed-instruction-speech-whispervq-v3-full) dataset.
 This is the model checkpoint from step 7000. Due to some noise in the training data, it has an artificially higher score on the Speech Instruction benchmark.
+This model is a supervised fine-tuned (SFT) version of homebrewltd/Ichigo-llama3.1-s-base-v0.3, trained on over 1 billion tokens from the [Instruction Speech WhisperVQ v4](jan-hq/mixed-instruction-speech-whispervq-v3-full-phase2-3) dataset which built upon [Instruction Speech WhisperVQ v3](homebrewltd/mixed-instruction-speech-whispervq-v3-full), adding multi-turn speech conversations and noise rejection capabilities for enhanced performance. This version, we introduce of noise-augmented multi-turn conversations, where we synthetically inject noise into both speech and text-only dialogue data. As a result, the model demonstrates improved robustness against noisy environmental inputs and enhanced multi-turn conversation capabilities, making it more reliable in real-world applications.
 **Model developers** Homebrew Research.
 **Input** Text and sound.