hedgehog

joined 2 years ago
[–] [email protected] 4 points 15 hours ago

Is your goal to create things that can be published or used in a project, or to create audiobooks for yourself to listen to?

For voiceovers for text, I use Kokoro FastAPI, which has a web frontend. The frontend is only compatible with Chromium browsers on desktop or Android, which sucks since my daily drivers are Firefox and an iPhone (there are workarounds in the thread), but it supports voice mixing, speed changes, etc. It also has an issue where it keeps the models (about 3 GB) in memory; I keep the CPU version loaded normally and swap to the GPU version when I need it to be faster. If you want something similar for Bark, check out Bark-GUI.
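For reference, Kokoro FastAPI exposes an OpenAI-compatible speech endpoint, so you can also script voiceovers instead of using the frontend. This is a minimal sketch, not anything official: the port (8880), voice name, and default values here are assumptions based on my setup and may differ in yours.

```python
import json
import urllib.request

def build_speech_request(text, voice="af_bella", speed=1.0):
    """Build the JSON payload for an OpenAI-compatible /v1/audio/speech call."""
    return {"model": "kokoro", "input": text, "voice": voice, "speed": speed}

def synthesize(text, base_url="http://localhost:8880", **kwargs):
    """POST the payload to the TTS server and return the raw audio bytes."""
    req = urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(build_speech_request(text, **kwargs)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# Example (requires a running server):
# audio = synthesize("Chapter one.", speed=0.9)
# open("chapter1.mp3", "wb").write(audio)
```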

I’ve also dabbled a bit in some TTS features that have Comfy nodes, though at this point mostly just in terms of getting them set up. For my purposes thus far Kokoro has been fine (and I prefer the FastAPI project over the Comfy nodes for most of my uses), but I’ve found nodes for Kokoro, Dia, F5 TTS, Orpheus, and Zonos.

Autiobooks and audiblez both look promising. A few weeks ago, I used the Kokoro FastAPI web frontend to create an audiobook for an ebook I worked on that used entirely self-hosted AI generation for the outlining and prose. Audiblez, which I found out about two days later, looks like it would have simplified that process substantially. Still, I’d personally like something more like an audiobook studio, where I can more easily swap voices back and forth, add emotions, play with speed on a more granular level, etc. I’m thinking about building something like that myself at some point, but it’ll be a minute - hopefully someone else will beat me there.

I posted a comment here a few weeks back on a similar topic. I’ve since used OpenReader-WebUI and like it, though that’s not for producing audiobooks, but for a read-along experience. Reproducing the comment below in case it’s helpful for you:

If you want to generate audiobooks using your own / a hosted TTS server, check out one of these options:

  • OpenReader-WebUI - this has built-in read along capability and can be deployed as a PWA that can allow you to download the audiobooks to your phone so you can use them offline
  • p0n1/epub to audiobook
  • ebook2audiobook

If you don’t have a decent GPU, Kokoro is a great option, as it’s fast enough to run on CPU and still sounds very good. If you’re going to use Kokoro, Audiblez (posted by another commenter) looks like it makes that more of an all-in-one option.

If you want something you can use without building the audiobook up front, of the above options, only OpenReader-WebUI supports that. RealtimeTTS is a library that handles that, but I don’t know if there are any apps out there that already integrate it.

If you have the audiobook generation handled and just want to be able to follow along with text / switch between text and audio, check out https://storyteller-platform.gitlab.io/storyteller/
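Under the hood, all of these tools do roughly the same first step: split the book’s text into chunks small enough for the TTS engine, synthesize each one, and concatenate the audio. A rough sketch of that chunking step (the 500-character limit is an arbitrary illustration, not any particular tool’s setting):

```python
import re

def chunk_text(text, max_chars=500):
    """Split text into sentence-aligned chunks no longer than max_chars.

    A single sentence longer than max_chars is emitted as its own chunk
    rather than split mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk then gets fed to the TTS endpoint one at a time, which is also what makes granular per-chunk voice/speed control (the “audiobook studio” idea above) feasible.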
[–] [email protected] 21 points 15 hours ago

The witch turned the creep into a woman and the spell was complete by the time she flew away. Unfortunately, like many women, the creep was born with the body of a man (she’s AMAB). Maybe the witch could have changed her body, too, but that would have made things far too easy, given that the point of the curse was to teach her empathy.

[–] [email protected] 1 points 1 day ago

SublimeText seems to have it. I don’t personally use it, but it’s a pretty competent editor, though it’s not in the feature table on the Wikipedia page someone else shared.

Sublime 3 was limited to folding by indentation; I’m not sure if that’s true for Sublime 4 as well, but the Markdown plugin docs have a note on folding and mention you can fold by section and heading levels.

[–] [email protected] 13 points 3 days ago (1 children)

Your comment wasn’t in a meta discussion; it was on a post where they were venting about people complaining about them having a women’s only space. There was certainly no indication that the regular community rules didn’t apply, nor any invitation for men to comment.

Commenting that it’s hostile for them to have a women’s only space might be ironic, but couldn’t possibly be good faith, in that context. And if the same mod banned you from multiple communities, then either it was out of line and you could appeal it, or it was warranted due to the perceived likelihood of you causing problems in those other communities and the perceived low likelihood of you contributing anything of value to them.

Even now, you’re acting like the mod(s) banned you because of her / their emotions. You don’t see how that’s misogynistic?

It makes logical sense for bad actors to be preemptively banned. Emotions have nothing to do with it.

[–] [email protected] 2 points 5 days ago* (last edited 4 days ago)

Right now I have Ollama / Open-WebUI, Kokoro FastAPI, ComfyUI, Wan2GP, and FramePack Studio set up. I recently (as in yesterday) configured an API key middleware with Traefik and placed it in front of Ollama and Comfy, but nothing is using them yet.
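For anyone curious: Traefik doesn’t ship a dedicated API-key middleware, so one common way to get this is its ForwardAuth middleware pointed at a tiny service that checks a header. A minimal sketch of such a service, assuming the key arrives in an `X-Api-Key` header (the header name, env var, and port here are my illustrative choices, not anything Traefik mandates):

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Keys that are allowed through; in practice, load these from a secret store.
VALID_KEYS = set(os.environ.get("API_KEYS", "demo-key").split(","))

def check_key(headers):
    """Return the status ForwardAuth acts on: 200 lets the request
    through to the backend, 401 rejects it."""
    return 200 if headers.get("X-Api-Key") in VALID_KEYS else 401

class AuthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Traefik forwards the original request's headers to this endpoint.
        self.send_response(check_key(self.headers))
        self.end_headers()

# To run: start the service with
#   HTTPServer(("0.0.0.0", 9000), AuthHandler).serve_forever()
# and point a Traefik ForwardAuth middleware at http://this-host:9000/
```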

I’ll probably try out Devstral with one of the agentic coding frameworks, like Void or Anon Kode. I may also try out one of the FOSS writing studios (like Plot Bunni) and connect it to my own Ollama instance. I could use NovelCrafter, but paying a subscription fee to use my own server for the compute-intensive part feels silly to me.

I tried to use Open Notebook (basically a replacement for NotebookLM) with Ollama and Kokoro, using Kokoro FastAPI as my OpenAI endpoint, but it turns out it only supported, and required, text embeddings from OpenAI, so I couldn’t do it fully locally. At some point, if they don’t fix that, I’m planning to either add support myself or set up some routes with Traefik so that the ones Open Notebook uses point to the service I want to use.

ETA: n8n is one of the services I plan to set up next, and I’ll likely end up integrating both Ollama and Comfy workflows into it.

[–] [email protected] 1 points 6 days ago

You got the idea!

[–] [email protected] 1 points 6 days ago* (last edited 6 days ago) (2 children)

We’re in c/showerthoughts. “What if my grandma was a bike?” would fit right in.

[–] [email protected] 3 points 1 week ago

To be clear, I agree that the line you quoted is almost assuredly incorrect. If they changed it to "thousands of deepfake apps powered by open source technology" then I'd still be dubious, simply because it seems weird that there would be thousands of unique apps that all do the same thing, but that would at least be plausible. Most likely they misread something like https://techxplore.com/news/2025-05-downloadable-deepfake-image-generators.html, took "model variant" (which in this context generally means LoRA) to mean "app", and just jumped too hard on the "everything is an open source app" bandwagon.

I did some research - browsing https://github.com/topics/deepfakes (which lists 153 total repos, many of which are focused on deepfake detection), searching DDG, clicking through to related apps from GitHub repos, etc.

In terms of actual open source deepfake apps, let's say an "app" is, at minimum, a piece of software you can run locally on arbitrary consumer-targeted hardware (generally at least an Nvidia desktop GPU), counting it regardless of whether you have to write custom code to use it (so long as the code is included), use the CLI, hit an API, use a GUI app, a web browser, or a phone app. Considering only apps whose primary use case includes creating deepfakes by face-swapping videos, there are nonetheless several:

  • Roop
  • Roop Unleashed
  • Rope
  • Rope Live
  • VisoMaster
  • DeepFaceLab
  • DeepFaceLive
  • Reactor UI
  • inswapper
  • REFace
  • Refacer
  • Faceswap
  • deepfakes_faceswap
  • SimSwap

If you included forks of all those repos, then you'd definitely get into the thousands.

If you count video generation applications that can imitate people using, at minimum, Img2Img and one LoRA (or two LoRAs), then these would be included as well:

  • Wan2GP
  • HunyuanVideoGP
  • FramePack Studio
  • FramePack eichi

And if you count the tools that integrate those, then these probably all count:

  • ComfyUI
  • Invoke AI
  • SwarmUI
  • SDNext
  • Automatic1111 SD WebUI
  • Fooocus
  • SD WebUI Forge
  • MetaStable
  • EasyDiffusion
  • StabilityMatrix
  • MochiDiffusion

If the potential criminals use easier ready-made (commercial) web services instead of buying an RTX 5090, learning ComfyUI, dealing with the steep learning curve, etc., we’d know we have to primarily fight those apps and services, not necessarily the generative AI tools.

To answer that, someone would need to actually test out the deepfake apps and compare their outputs. I know that they get used for deepfakes because I've seen the outputs, but as far as I know, every single major platform - e.g., Kling, Veo, Runway, Sora - has safeguards in place to prevent nudity and sexual content. I'd be very surprised if they were being used en masse for this.

In terms of the SaaS apps used by people seeking to create nonconsensual, sexually explicit deepfakes... my guess is those are actually not really part of the figure that's being referenced in this article. It really seems like they're talking about doing video gen with LoRAs rather than doing face swaps.

[–] [email protected] 3 points 1 week ago (2 children)

Without searching for them myself to confirm, it’s plausible, especially if you take it to mean “apps leveraging open source AI technology.”

There are a ton of open source AI repos, many of which provide video-related capabilities. The number of truly open source AI models is very slim, but "open weight" AI models are commonly referred to as open source, and from the perspective of building your app, fine-tuning the model, or creating LoRAs for it, open weight is good enough.

Some LoRAs come with details on the training data set, so even if the base model is only open weight, the LoRA can still be open source.

Until recently, Civitai had LoRAs for famous people, e.g., Emma Watson, and apparently for regular people too. There was a post here last week, I think (or maybe in some other community), linking to a 404 Media article about those being taken down thanks to credit card processors drawing a line in the sand at deepfake imagery.

ComfyUI is a self-hostable AI platform (and there are also many hosts that offer it) that lets you build a workflow from multiple nodes, each of which generally integrates some open source AI tech that was released separately. For example, there are nodes that add the capabilities to perform:

  • image generation with Stable Diffusion, Flux, Hidream, etc
  • TTS with KokoroTTS, Piper, F5 TTS, etc
  • video generation with AnimateDiff, Cog, Wan2.1, Hunyuan, FramePack, FantasyTalking, Float
  • video modification, e.g., LatentSync, which takes a video and lipsyncs it to a provided audio file
  • image manipulation, e.g., ControlNet, img2img, inpainting, outpainting, or even specific tasks like “remove the background” or “change the face to this other face”

If you think of a deepfake as just a video of a recognizable person doing a thing, you can create a deepfake by:

  • taking an existing video and swapping the face in each frame
  • using a faceswap-video-specific approach, e.g., Roop
  • an image-to-video workflow, e.g., with Wan: “the person dances.” You can expand the options available with Wan by using LoRAs.
  • a text-to-video workflow, where you use a LoRA for that person
  • an image+audio-to-video workflow, e.g., with FantasyTalking/Float, creating a lipsync to an audio file you provide
  • a video+audio-to-video workflow with LatentSync, to make it look like they said something different, particularly using a TTS (like F5 TTS) that does voice cloning to generate the new audio

My suspicion is that most of the AI apps that are available online are just repackaging these open source technologies, but are not open source themselves. There are certainly some, of course, though the ones I know of are more generic and not deepfake specific (ComfyUI, SwarmUI, Invoke AI, Automatic1111, Forge, Fooocus, n8n, FramePack Studio, FramePack Eichi, Wan2GP, etc.).

This isn’t a licensing issue, as many open source projects are licensed with MIT or Apache licenses, which don’t require you to open source derivative products. Even if they used the GPL, it wouldn’t be required for a SaaS web app. Only the AGPL would protect against that, and even then, only the changes to the AGPL library would need to be shared; the front end app could still be proprietary.

The other issue could be them not knowing what “app” means. If you think of a LoRA as an app, then the sentence might be accurate. I don’t know for sure that there were thousands of LoRAs of real people that published their training data, but I wouldn’t be surprised if that were the case.

[–] [email protected] 23 points 1 week ago (4 children)

Have you tried just setting the resolution to 1920x1080, or are you literally trying to run AAA games at 4K on a card that was targeting 1080p when it was released four and a half years ago?

[–] [email protected] 13 points 2 weeks ago

It’s the new hyped up version of “no-code” or low-code solutions, but with AI so you have more flexibility to footgun.

[–] [email protected] 6 points 2 weeks ago

Not any lazier. Script kiddies didn’t write the code themselves, either.

 

This only applies when the homophone is spoken or part of an audible phrase, so written text is safe.

It doesn’t change reality, just how people interpret something said aloud. You could change “Bare hands” to be interpreted as “Bear hands,” for example, but the person wouldn’t suddenly grow bear hands.

You can only change the meaning of the homophones.

It’s not all or nothing. You can change how a phrase is interpreted for everyone, or:

  • You can affect only a specific instance of a phrase - including all recordings of it, if you want - but you need to hear that instance - or a recording of it - to do so. If you hear it live, you can affect everyone else’s interpretation as it’s spoken.
  • You can choose not to affect how it is perceived by people when they say it aloud, and only when they hear it.
  • You can affect only the perception of particular people for a given phrase, but you must either point at them (pictures work) or be able to refer to them with five or fewer words, at least one of which is a homophone. For example, “my aunt.” Note that if you do this, both interpretations of the homophone are affected, if relevant (e.g., “my ant”).
  • You can make it so there’s a random chance (in 5% intervals, from 5% to 95%) that a phrase is misinterpreted.
 

cross-posted from: https://lemmy.world/post/19716272

Meta fed its AI on almost everything you’ve posted publicly since 2007

 

The video teaser yesterday about this was already DMCAed by Nintendo, so I don’t think this video will be up long.
