overview for 2dollarsim

Adobe Podcast: Best site for cleaning poor quality audio! in c/[email protected]

[–] [email protected] 1 points 8 months ago

Perhaps,

Check out https://github.com/gitmylo/audio-webui

7

What's your favorite 13B models for Ooba? (lemmy.world)

submitted 8 months ago by [email protected] to c/[email protected]

0 comments fedilink

It's been a while since I checked the latest. I'm sure tons have been released, but I have no idea what's good now.

Anatomy of a Decline in c/[email protected]

[–] [email protected] 1 points 1 year ago

I respect an article quoting Mike Tyson in the title! Let's have a read.

Text To Speech toolkit: TTS Generation Web UI (Bark, MusicGen, Tortoise) in c/[email protected]

[–] [email protected] 2 points 1 year ago (1 children)

Damn, I have this issue:

    File "/home/st/.pyenv/versions/3.10.12/lib/python3.10/json/encoder.py", line 179, in default
      raise TypeError(f'Object of type {o.__class__.__name__} '
  TypeError: Object of type PosixPath is not JSON serializable
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.

ERROR: Failed building wheel for pyworld
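For anyone hitting the same traceback: the TypeError itself is easy to reproduce, because Python's json module does not know how to encode pathlib objects. The sketch below only illustrates that general failure and one common workaround (passing default=str). The `config` dictionary and the helper name are made up, and nothing here shows where the pyworld build actually triggers the error.

```python
import json
from pathlib import Path

# Hypothetical data: any dict holding a Path object triggers the same error.
config = {"build_dir": Path("build")}

def dumps_paths_safe(data):
    """Serialize data to JSON, converting unsupported objects (such as Path) with str()."""
    return json.dumps(data, default=str)

try:
    json.dumps(config)
except TypeError as err:
    print(f"Plain json.dumps fails: {err}")

print(dumps_paths_safe(config))
```

If the failing call sits inside a package's own build script, the fix has to land there (or in a newer release of the package), not in your own code.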

Text To Speech toolkit: TTS Generation Web UI (Bark, MusicGen, Tortoise) in c/[email protected]

[–] [email protected] 2 points 1 year ago (1 children)

I'm going to try this out, but if it's using that much VRAM I might be out of luck

New Llama 2 models available now! in c/[email protected]

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago)

Asking the model for a recent event:

Thinks for a moment Well, one thing that comes to mind is the recent advancements in space exploration! Did you hear about NASA's Perseverance rover landing on Mars last month? Excitedly It's so cool to see how much we're learning about the red planet and the potential for life beyond Earth! And with private companies like SpaceX and Blue Origin leading the charge in space travel, it's an exciting time to be alive for anyone who loves science and tech. Her eyes shine with enthusiasm What's your take on all this, Matt? Are you interested in space exploration at all?

And the date of that was:

https://edition.cnn.com/2021/02/18/world/mars-perseverance-rover-landing-scn-trnd/index.html

The landing was on 18 February 2021 and the model calls it "last month", so it was trained on data from at least as recently as March 2021.

7

New Llama 2 models available now! (huggingface.co)

submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/[email protected]

2 comments fedilink

https://huggingface.co/TheBloke

Currently testing

https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ

5

NEW LLAMA 2 MODELS FROM THE BLOKE!! (huggingface.co)

submitted 1 year ago by [email protected] to c/[email protected]

0 comments fedilink

Giving this one a go!

https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ

5

Introducing Llama 2 - Meta's Next-Generation Commercially Viable Open-Source AI & LLM (lemmy.world)

submitted 1 year ago by [email protected] to c/[email protected]

1 comments fedilink

cross-posted from: https://lemmy.world/post/1750098

Introducing Llama 2 - Meta's Next Generation Free Open-Source Artificially Intelligent Large Language Model

It's incredible it's already here! This is great news for everyone in free open-source artificial intelligence.

Llama 2 unleashes Meta's (previously) closed model (Llama) to become free open-source AI, accelerating access and development for large language models (LLMs).

This marks a significant step in machine learning and deep learning technologies. With this move, a widely supported LLM can become a viable choice for businesses, developers, and entrepreneurs to innovate our future using a model that the community has been eagerly awaiting since its initial leak earlier this year.

Meta Announcement

Meta Overview

GitHub

Paper

Here are some highlights from the official Meta AI announcement:

Llama 2

In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases.

Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.

Llama 2 pretrained models are trained on 2 trillion tokens, and have double the context length of Llama 1. Its fine-tuned models have been trained on over 1 million human annotations.

Inside the Model

Technical details

With each model download you'll receive:

Model code

Model Weights

README (User Guide)

Responsible Use Guide

License

Acceptable Use Policy

Model Card

Benchmarks

Llama 2 outperforms other open source language models on many external benchmarks, including reasoning, coding, proficiency, and knowledge tests. It was pretrained on publicly available online data sources. The fine-tuned model, Llama-2-chat, leverages publicly available instruction datasets and over 1 million human annotations.

RLHF & Training

Llama-2-chat uses reinforcement learning from human feedback to ensure safety and helpfulness.

Training Llama-2-chat: Llama 2 is pretrained using publicly available online data. An initial version of Llama-2-chat is then created through the use of supervised fine-tuning. Next, Llama-2-chat is iteratively refined using Reinforcement Learning from Human Feedback (RLHF), which includes rejection sampling and proximal policy optimization (PPO).
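The rejection sampling step mentioned above can be pictured as "sample several responses, keep the one the reward model scores highest". The sketch below is a toy illustration under that assumption only. `toy_reward` and the sample strings are made-up stand-ins, not Meta's reward model or training code, and the PPO stage is not shown.

```python
from typing import Callable, Sequence

def best_of_n(candidates: Sequence[str], reward_fn: Callable[[str], float]) -> str:
    """Return the candidate with the highest reward (ties keep the earliest)."""
    if not candidates:
        raise ValueError("need at least one candidate response")
    return max(candidates, key=reward_fn)

def toy_reward(response: str) -> float:
    """Hypothetical stand-in for a learned reward model: favors polite, longer answers."""
    score = 0.1 * len(response.split())
    if "please" in response.lower() or "happy to help" in response.lower():
        score += 1.0
    return score

# Hypothetical samples for a single prompt; in practice these would come from the model.
samples = [
    "No.",
    "Sure, I am happy to help with that.",
    "Here is an answer.",
]
chosen = best_of_n(samples, toy_reward)
print(chosen)
```

In a real pipeline the selected responses would then be used as targets for another round of fine-tuning.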

The License

Our model and weights are licensed for both researchers and commercial entities, upholding the principles of openness. Our mission is to empower individuals and industry through this opportunity, while fostering an environment of discovery and ethical AI advancements.

Partnerships

We have a broad range of supporters around the world who believe in our open approach to today’s AI — companies that have given early feedback and are excited to build with Llama 2, cloud providers that will include the model as part of their offering to customers, researchers committed to doing research with the model, and people across tech, academia, and policy who see the benefits of Llama and an open platform as we do.

The/CUT

With the release of Llama 2, Meta has opened up new possibilities for the development and application of large language models. This free open-source AI not only accelerates access but also allows for greater innovation in the field.

Take Three:

Video Game Analogy: Just like getting a powerful, rare (or previously banned) item drop in a game, Llama 2's release gives developers a powerful tool they can use and customize for their unique quests in the world of AI.

Cooking Analogy: Imagine if a world-class chef decided to share their secret recipe with everyone. That's Llama 2, a secret recipe now open for all to use, adapt, and improve upon in the kitchen of AI development.

Construction Analogy: Llama 2 is like a top-grade construction tool now available to all builders. It opens up new possibilities for constructing advanced AI structures that were previously hard to achieve.

Links

Here are the key resources discussed in this post:

Meta Announcement

Meta Overview

GitHub

Paper

Technical details

Want to get started with free open-source artificial intelligence, but don't know where to begin?

Try starting here:

FOSAI Welcome Message

FOSAI Crash Course

FOSAI Nexus Resource Hub

If you found anything else about this post interesting - consider subscribing to [email protected] where I do my best to keep you in the know about the most important updates in free open-source artificial intelligence.

This particular announcement is exciting to me because it may popularize open-source principles and practices for other enterprises and corporations to follow.

We should see some interesting models emerge out of Llama 2. I for one am looking forward to seeing where this will take us next. Get ready for another wave of innovation! This one is going to be big.

Text To Speech toolkit: TTS Generation Web UI (Bark, MusicGen, Tortoise) in c/[email protected]

[–] [email protected] 2 points 1 year ago

I had issues getting it to run, so I'll come back to it. I have other ways to generate Bark audio. I found Bark to be by far the most natural sounding; it just sounds like it was recorded on a PC mic from 1999. Silero and ElevenLabs sound monotone to me.

I haven't tried Tortoise yet, I'll have to try that!

3

Adobe Podcast: Best site for cleaning poor quality audio! (podcast.adobe.com)

submitted 1 year ago by [email protected] to c/[email protected]

2 comments fedilink

Looking at you, Bark_TTS!

I tried a few and eventually settled on Adobe Podcast. It's easy to generate whatever audio clips you want and then clean them with Podcast, and the free limit is more than enough.

3

Parameters explained! Now you too can know what the hell you're doing! (huggingface.co)

submitted 1 year ago by [email protected] to c/[email protected]

0 comments fedilink

Not sure why it took me so long to find this.

Text To Speech toolkit: TTS Generation Web UI (Bark, MusicGen, Tortoise) in c/[email protected]

[–] [email protected] 2 points 1 year ago (5 children)

Leaving this here for later:

https://github.com/gitmylo/audio-webui

With some explanation

Baize exLlama SuperHOT is the best model I've used yet in c/[email protected]

[–] [email protected] 1 points 1 year ago

I ended up downloading all of the 13B SuperHOT models. Each one seems great, but I am still trying to work out how to set the parameters correctly.

2

Text To Speech toolkit: TTS Generation Web UI (Bark, MusicGen, Tortoise) (github.com)

submitted 1 year ago by [email protected] to c/[email protected]

9 comments fedilink

I just discovered this repo. It looks really useful for creating AI voices.

https://github.com/rsxdalv/tts-generation-webui

5

Baize exLlama SuperHOT is the best model I've used yet (huggingface.co)

submitted 1 year ago by [email protected] to c/[email protected]

3 comments fedilink

Huggingface

This has been the best so far. There is some weird behaviour sometimes, though maybe that's my parameters.

For some characters, this has been the best at keeping them in character and progressing the story!

2

Model download links (huggingface.co)

submitted 1 year ago by [email protected] to c/[email protected]

0 comments fedilink

https://huggingface.co/TheBloke

contains the latest exLlama SuperHOT 8K context models

[META] [Crosspost] Announcing a new Search Engine for Lemmy in c/[email protected]

[–] [email protected] 12 points 1 year ago (3 children)

Can't just search for communities across all instances?

6

Tutorial on voice cloning with Bark TTS with all the instructions and examples (www.reddit.com)

submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/[email protected]

0 comments fedilink

I've linked the original Reddit post, but it didn't work for me. I had to take a bunch of extra steps, so I've written a tutorial. The original instructions are reproduced here so you don't have to visit Reddit, and I'll refer to them. My revised tutorial with all the instructions will follow in the replies. Please post questions as a new post in this community; I've locked this thread so that the tutorial remains easily accessible.

Zyin 24 points 2 months ago*

Instructions on how to get this set up if you've never used Jupyter before, like me. I'm not an expert at this, so don't respond asking for technical help.

If you've never done stuff that needs Python before, you'll need to install Pip and Git. Google for the download links. If you have Automatic1111 installed already you already have Pip and Git.

Install the repo. It will be installed in the folder where you open the cmd window:

git clone https://github.com/serp-ai/bark-with-voice-clone

Open a new cmd window in the newly downloaded repo's folder (or cd into it) and run its installation stuff:

pip install .

Install Jupyter Notebook. It's basically Google Colab, but run locally:

pip install jupyterlab (this one may not be needed, I did it anyway)

pip install notebook

If you are on Windows, you'll need these to do audio code stuff with Python:

pip install soundfile

pip install ipywidgets

You need to have Torch 2 installed. You can do that with this command (will take a while to download/install):

pip3 install numpy --pre torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cu118

To check your current Torch version, open a new cmd window and type these in one at a time:

python

import torch

print(torch.__version__) #(mine says 2.1.0.dev20230421+cu118)
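The same check can be scripted. This is a small sketch, assuming Torch was installed with pip under the distribution name `torch`; the helper name `is_torch2` is made up for illustration.

```python
from importlib import metadata

def is_torch2(version: str) -> bool:
    """Return True if a version string such as '2.1.0.dev20230421+cu118' has major version >= 2."""
    major = version.split(".", 1)[0]
    return major.isdigit() and int(major) >= 2

try:
    installed = metadata.version("torch")  # reads package metadata without importing torch
    print(installed, "OK" if is_torch2(installed) else "too old, Torch 2 is needed")
except metadata.PackageNotFoundError:
    print("torch is not installed")
```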

Now everything is installed. Create a folder called "output" in the bark folder, which will be needed later to prevent a permissions error.

Run Jupyter Notebook while in the bark folder:

jupyter notebook

This will open a new browser tab with the Jupyter interface. Navigate to /notebooks/generate.ipynb

This is very similar to Google Colab, where you run blocks of code. Click on the first block of code and click Run. If the code block has a "[*]" next to it, then it is still processing, so just give it a minute to finish.

This will take a while and download a bunch of stuff.

If it manages to finish without errors, run blocks 2 and 3. In block 3, change the line to: filepath = "output/audio.wav" to prevent a permissions related error (remove the leading "/").
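Why the leading "/" matters: a path that starts with a slash is resolved from the root of the drive rather than from the bark folder, which is presumably where the permissions error comes from. A minimal sketch of that point and of creating the "output" folder, using only the standard library; `starts_at_root` and `ensure_output_dir` are made-up helper names, not part of the notebook.

```python
import tempfile
from pathlib import Path, PureWindowsPath

def starts_at_root(path: str) -> bool:
    """True if the path begins at the drive root (leading slash) instead of the current folder."""
    return PureWindowsPath(path).root != ""

def ensure_output_dir(base: Path) -> Path:
    """Create base/output if it is missing and return it."""
    out = base / "output"
    out.mkdir(parents=True, exist_ok=True)
    return out

print(starts_at_root("/output/audio.wav"))  # leading slash: resolved from the drive root
print(starts_at_root("output/audio.wav"))   # relative: resolved from the current folder

# Demonstrated in a temporary folder so nothing is written into a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    print(ensure_output_dir(Path(tmp)).name)
```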

You can get different voices by changing the voice_name variable in block 1. Voices are installed at: bark\assets\prompts

For reference, on my 3060 12GB it took 90 seconds to generate 13 seconds of audio. The voice models that come out of the box create a robotic-sounding voice, not even close to the quality of ElevenLabs. The voice that I created using /notebooks/clone_voice.ipynb with my own voice turned out terrible and was completely unusable. Maybe I did something wrong there; I'm not sure.
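To put the timing above in perspective, that is roughly seven seconds of compute per second of audio. A tiny sketch of the arithmetic, using only the two figures quoted; the function name is made up.

```python
def seconds_per_audio_second(generation_s: float, audio_s: float) -> float:
    """How many seconds of compute each second of generated audio costs."""
    if audio_s <= 0:
        raise ValueError("audio length must be positive")
    return generation_s / audio_s

ratio = seconds_per_audio_second(90, 13)
print(f"{ratio:.1f}x slower than real time")
```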

If you want to test the voice clone using your own voice, and you record a voice sample using Windows Voice Recorder, you can convert the .m4a file to .wav with ffmpeg (separate download):

ffmpeg -i "C:\Users\USER\Documents\Sound recordings\Recording.m4a" "C:\path\to\bark-with-voice-clone\ ___
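The same conversion can be scripted from Python. This is a sketch, assuming ffmpeg is on the PATH; the file names below are placeholders, since the output path in the command above is cut off, and both helper names are made up.

```python
import shutil
import subprocess
from pathlib import Path

def build_ffmpeg_cmd(src: Path, dst: Path) -> list[str]:
    """Build the ffmpeg call; the output format is inferred from the .wav extension."""
    return ["ffmpeg", "-i", str(src), str(dst)]

def convert_to_wav(src: Path, dst: Path) -> None:
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)

# Placeholder file names, not the paths from the original comment.
print(build_ffmpeg_cmd(Path("Recording.m4a"), Path("Recording.wav")))
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces, such as "Sound recordings".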

Funny title in c/[email protected]

[–] [email protected] 2 points 1 year ago

Some of these were absolute gold. I'm happy to see these coming back!

TIL A man was awarded $7 million in damages after he developed a lung condition from eating two bags of popcorn every day and inhaling the steam from the bag in c/[email protected]

[–] [email protected] 3 points 1 year ago

Goddammit why did you have to say that. Now I can't just gobble everyone's movie popcorn.

Which interface are you primarily using for Stable diffusion? in c/[email protected]

[–] [email protected] 1 points 1 year ago (1 children)

I would agree, but the rate of innovation in AI is so unpredictable that it could go either way.

1

The shadow (lemmy.world)

submitted 1 year ago by [email protected] to c/[email protected]

0 comments fedilink

I enter my son's room because I hear talking, meaning he isn't going to sleep.

When I go in, I find his cousin there, who won't stop laughing. I grab his arm and ask him why he is there, and I realise there is no way he could be there. I angrily yank his arm and demand he tells me how he is here, why he is here, but he just keeps laughing.

I hear ragged breathing from my son, like he is struggling to breathe or being choked, and I turn back to see his head and shoulders covered in a dark shadow. I let go of his cousin and rush towards my son, and the shadow leaps onto my face making the world completely dark.

I wake up breathing quickly.

1

Reddit:pixelnull: My tips for people new to Pygmalion to get better responses and hopefully this clears up a few things about Pygmalion generally. (www.reddit.com)

submitted 1 year ago by [email protected] to c/[email protected]

0 comments fedilink

Really good tips here!