🔥 Speech-to-speech model: Fish Agent v0.1 3B by FishAudio
> Trained on 700K hours of multilingual audio
> Continued pretraining of Qwen-2.5-3B-Instruct on 200B audio & text tokens
> Zero-shot voice cloning
> Text + audio input / audio output
> Ultra-fast inference with 200 ms TTFA (time to first audio)
> Models are on the Hub, fine-tuning code is on its way! 🚀
https://huggingface.co/fishaudio/fish-agent-v0.1-3b
@opendatascience