Update README.md
README.md CHANGED
@@ -19,6 +19,9 @@ We further release a set of stereophonic capable models. Those were fine tuned f
 from the mono models. The training data is otherwise identical and capabilities and limitations are shared with the base models. The stereo models work by getting 2 streams of tokens from the EnCodec model, and interleaving those using
 the delay pattern.
 
+Stereophonic sound, also known as stereo, is a technique used to reproduce sound with depth and direction.
+It uses two separate audio channels played through speakers or headphones arranged so that it sounds like you're listening from different angles.
+
 MusicGen is a text-to-music model capable of generating high-quality music samples conditioned on text descriptions or audio prompts.
 It is a single-stage auto-regressive Transformer model trained over a 32 kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz.
 Unlike existing methods, like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass.
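The "delay pattern" referenced in this hunk is the codebook interleaving from the MusicGen paper: codebook k is offset by k timesteps, so all codebooks of a frame can be predicted in a single autoregressive pass. A minimal sketch of the idea is below; the function name, pad id, and the order in which the two channel streams are stacked are illustrative assumptions, not MusicGen's actual implementation.

```python
import torch

def apply_delay_pattern(codes: torch.Tensor, pad_id: int = 2048) -> torch.Tensor:
    """Shift codebook k right by k steps, filling the exposed gaps with pad_id.

    codes: (num_codebooks, seq_len) integer token ids from EnCodec.
    Returns a (num_codebooks, seq_len + num_codebooks - 1) tensor.
    """
    num_q, seq_len = codes.shape
    out = torch.full((num_q, seq_len + num_q - 1), pad_id, dtype=codes.dtype)
    for k in range(num_q):
        # Codebook k starts k steps later than codebook 0.
        out[k, k : k + seq_len] = codes[k]
    return out

# Two EnCodec streams (left/right channel), 4 codebooks each.
left = torch.randint(0, 2048, (4, 8))
right = torch.randint(0, 2048, (4, 8))
# Hypothetical stacking of the two streams into 8 rows; the exact channel
# ordering inside the stereo models is not shown in this README.
stereo_codes = torch.cat([left, right], dim=0)   # (8, 8)
delayed = apply_delay_pattern(stereo_codes)      # (8, 15)
```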
@@ -79,11 +82,11 @@ import scipy
 import torch
 from transformers import pipeline
 
-synthesiser = pipeline("text-to-audio", "facebook/musicgen-stereo-small", device="cuda", torch_dtype=torch.float16)
+synthesiser = pipeline("text-to-audio", "facebook/musicgen-stereo-small", device="cuda:0", torch_dtype=torch.float16)
 
-music = synthesiser("lo-fi music with a soothing melody", forward_params={"
+music = synthesiser("lo-fi music with a soothing melody", forward_params={"max_new_tokens": 256})
 
-
+sf.write("musicgen_out.wav", music["audio"][0].T, music["sampling_rate"])
 ```
 
 3. Run inference via the Transformers modelling code. You can use the processor + generate code to convert text into a mono 32 kHz audio waveform for more fine-grained control.
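Note that the added line calls `sf.write`, while the visible diff context only shows `import scipy`; the snippet assumes `soundfile` is imported elsewhere in the README. A self-contained version of the fixed snippet, under that assumption, would be:

```python
import torch
import soundfile as sf  # assumed import; only `import scipy` is visible in the diff context
from transformers import pipeline

# Half-precision inference on the first CUDA device, as in the updated snippet.
synthesiser = pipeline("text-to-audio", "facebook/musicgen-stereo-small", device="cuda:0", torch_dtype=torch.float16)

# 256 new tokens at 50 Hz is roughly 5 seconds of audio.
music = synthesiser("lo-fi music with a soothing melody", forward_params={"max_new_tokens": 256})

# The pipeline returns audio as (batch, channels, samples); soundfile expects
# (samples, channels), hence the transpose of the first batch element.
sf.write("musicgen_out.wav", music["audio"][0].T, music["sampling_rate"])
```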
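For the processor + generate path referenced in the closing context line, the standard Transformers API looks roughly like this; the checkpoint name is taken from the snippet above, and `max_new_tokens=256` is an illustrative choice rather than a value from this diff.

```python
from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-stereo-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-stereo-small")

inputs = processor(text=["lo-fi music with a soothing melody"], padding=True, return_tensors="pt")
# Each generated token covers 1/50 s of audio, so 256 tokens is about 5 seconds.
audio_values = model.generate(**inputs, max_new_tokens=256)
sampling_rate = model.config.audio_encoder.sampling_rate  # 32 kHz
```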
|