Free Open-Source Artificial Intelligence

Welcome to Free Open-Source Artificial Intelligence!

We are a community dedicated to advancing the availability of and access to:

Free Open Source Artificial Intelligence (F.O.S.A.I.)


Fish Speech 1.5, an open source voice cloning TTS that's actually good (github.com)

submitted 1 week ago by [email protected] to c/[email protected]


I've been waiting for an open source TTS model that was actually good enough to capture some of the subtleties of language and synthesize them in a natural-sounding way that makes sense. I think I finally found one that fits the requirements.

Model: https://huggingface.co/fishaudio/fish-speech-1.5

It uses an encoder rather than relying on phonemes, so generations sometimes vary, but the number of errors I've gotten is minimal, and the variations are all surprisingly natural in slightly different ways, which is very exciting.

Give it a spin if you are also looking for a TTS model that sounds good. It uses voice cloning, so find a good 10-20 second reference clip to have the generations use the same voice.
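If you want a quick sanity check on the reference clip length, here is a minimal sketch using only Python's standard `wave` module. It assumes the clip is an uncompressed WAV file; the function names and the 10-20 second bounds as defaults are my own illustration of the rule of thumb above, not anything from the Fish Speech project.

```python
import wave


def clip_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # duration = number of audio frames / frames per second
        return wav.getnframes() / wav.getframerate()


def is_good_reference_length(path, min_s=10.0, max_s=20.0):
    """Check a clip against the 10-20 second rule of thumb (hypothetical helper)."""
    return min_s <= clip_duration_seconds(path) <= max_s
```

Calling `is_good_reference_length("my_voice.wav")` (a hypothetical file name) tells you whether the clip falls in that range. Other formats such as MP3 or FLAC would need a different library to read.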

top 6 comments

[–] [email protected] 5 points 1 week ago (1 children)

For a minute I thought there were actual recordings of fish noises from underwater and that someone had put them into a TTS.

[–] [email protected] 2 points 1 week ago

But their logo is a whale!

[–] [email protected] 4 points 1 week ago (1 children)

How do you run this locally? What program does one use? I know you can take LLM models and throw them into ollama or gpt4all. What about this?

[–] [email protected] 4 points 1 week ago* (last edited 1 week ago)

I followed their instructions here: https://speech.fish.audio/

I am using the locally-run API server to do inference: https://speech.fish.audio/inference/#http-api-inference

I don't know about other ways. To be clear, this is not (necessarily) an LLM; it's just for speech synthesis, so you don't run it on ollama. That said, I think it does technically use Llama under the hood, since there are two models: one for encoding text and the other for decoding to audio. Honestly the paper is terrible, but it explains the architecture somewhat: https://arxiv.org/pdf/2411.01156
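Roughly, talking to a locally-run HTTP API server from Python could look like the sketch below. Treat every specific here as an assumption on my part: the address `http://127.0.0.1:8080`, the `/v1/tts` path, and the JSON field names `text` and `format` are placeholders, so check the linked inference docs for the real endpoint and request schema before relying on any of it. It uses only the standard library.

```python
import json
import urllib.request


def build_tts_request(text, base_url="http://127.0.0.1:8080", path="/v1/tts"):
    """Build (but do not send) a POST request for a local TTS server.

    The URL, path, and payload fields are ASSUMPTIONS for illustration;
    consult the project's docs for the actual API.
    """
    payload = {"text": text, "format": "wav"}  # assumed field names
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="output.wav", **kwargs):
    """Send the request and save the raw response bytes to out_path."""
    req = build_tts_request(text, **kwargs)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
    return out_path
```

With the server running, `synthesize("Hello there")` would write whatever audio the server returns to `output.wav`. Passing a voice-cloning reference clip is left out on purpose, since how the server expects it is something to read from the docs rather than guess.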

[–] [email protected] 4 points 1 week ago (1 children)

From the link:

We are very excited to announce that we have made our self-research agent demo open source, you can now try our agent demo online at demo for instant English chat and English and Chinese chat locally by following the docs.
You should mention that the content is released under a CC BY-NC-SA 4.0 licence.

So which is it, open source or CC-BY-NC-SA? NC restrictions are not compatible with either the free software or the open source definitions.

[–] [email protected] 3 points 1 week ago

You are right. Their description of "SOTA Open Source TTS" caused me to assume it was open source, but it's clear that

This codebase and all models are released under CC-BY-NC-SA-4.0 License.

So it's "source available", not released under an open source licence.