this post was submitted on 06 Jun 2025
16 points (83.3% liked)


I stumbled upon this new use of the MP4 format. Interesting.

top 6 comments
[–] [email protected] 3 points 22 hours ago (1 children)

So they use the MP4 container like a zip file and this is revolutionary? I mean, it's a nice stunt, but the developer seems pretty serious about it. What am I misunderstanding?

[–] [email protected] 2 points 21 hours ago (1 children)

Kinda, but more like a DB (i.e. you can search it), so instead of spinning up a DB for your AI you could keep that store in this type of container …

[–] [email protected] 5 points 13 hours ago (1 children)

So like indexed text files? I don't understand what the benefit of encoding text into videos is.

[–] [email protected] 3 points 5 hours ago

tl;dr - By using this very strange file format, you can functionally have access to the vast power of a vector database, but with the local simplicity of sqlite.


If I'm understanding this correctly: if you wanted to do a simple search for exact text strings, and that was all that you needed, then yes, you should probably use something like an sqlite database to index and query from.
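
For the exact-string case, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `chunks` and the helpers `build_index` and `exact_search` are made up for illustration; nothing here comes from the tool in the post.

```python
import sqlite3


def build_index(chunks):
    """Store text chunks in an in-memory SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE chunks (id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO chunks (body) VALUES (?)", [(c,) for c in chunks]
    )
    conn.commit()
    return conn


def exact_search(conn, needle):
    """Return (id, body) rows whose body contains the exact substring."""
    # instr() does a case-sensitive substring match, unlike LIKE, which is
    # case-insensitive for ASCII by default.
    rows = conn.execute(
        "SELECT id, body FROM chunks WHERE instr(body, ?) > 0 ORDER BY id",
        (needle,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_index(["ffmpeg can do anything", "sqlite is simple"])
    print(exact_search(conn, "sqlite"))
```

This only finds literal matches: a search for "database" would not return a chunk that only says "sqlite".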

However, if you are working with very large data sets and you need a vector database (for contextual or semantic searches), well, that's a whole other tier of complexity. At that point, you would normally need a vector database server.
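
To make "semantic search" a bit more concrete, here is a toy sketch of what a vector lookup does: every chunk gets a vector, and a query returns the chunks whose vectors point in the most similar direction. The 3-number "embeddings" in `INDEX` are hand-written assumptions purely for illustration; a real setup would get them from an embedding model and would use far more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    scored = [(cosine(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical, hand-written vectors standing in for real embeddings.
INDEX = [
    ("how to back up a NAS", [0.9, 0.1, 0.0]),
    ("best drives for archival storage", [0.8, 0.3, 0.1]),
    ("sourdough starter tips", [0.0, 0.1, 0.9]),
]

if __name__ == "__main__":
    # A query vector that is "about storage" in this toy space.
    print(top_k([1.0, 0.2, 0.0], INDEX, k=2))
```

This brute-force scan compares the query against every vector, which is fine for small collections. Dedicated vector databases exist largely to avoid that full scan on huge ones.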

What this thing does, however, is format your data into what they call "video" (but realistically would probably look like static if you were to actually play it in VLC). Then...

... I think it's hooking into some similarities between vector databases and video processing, and then using the mature video processing technology to process the "video" at lightning-fast speeds. And you get all of that contextual power without relying on a cloud-based vector database server.
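
For the "data as video frames" part, here is a very rough sketch of the general idea as I understand it, and explicitly not the tool's actual format: chop each text chunk into fixed-size "frames" of raw bytes and keep a side index recording which frames belong to which chunk, so a single chunk can be read back without decoding everything. `FRAME_BYTES` and all function names are hypothetical.

```python
import struct

FRAME_BYTES = 64  # hypothetical frame size, e.g. an 8x8 grayscale image


def text_to_frames(text):
    """Pack text into fixed-size byte frames, prefixed with its byte length."""
    payload = text.encode("utf-8")
    data = struct.pack(">I", len(payload)) + payload
    data += b"\x00" * ((-len(data)) % FRAME_BYTES)  # pad the last frame
    return [data[i:i + FRAME_BYTES] for i in range(0, len(data), FRAME_BYTES)]


def frames_to_text(frames):
    """Inverse of text_to_frames: read the length header, drop the padding."""
    data = b"".join(frames)
    (length,) = struct.unpack(">I", data[:4])
    return data[4:4 + length].decode("utf-8")


def build_store(chunks):
    """Return (frames, index), where index[i] = (first_frame, frame_count)."""
    frames, index = [], []
    for chunk in chunks:
        chunk_frames = text_to_frames(chunk)
        index.append((len(frames), len(chunk_frames)))
        frames.extend(chunk_frames)
    return frames, index


def read_chunk(frames, index, i):
    """Decode only the frames that belong to chunk i."""
    start, count = index[i]
    return frames_to_text(frames[start:start + count])


if __name__ == "__main__":
    frames, index = build_store(["first chunk", "second chunk " * 10])
    print(index, read_chunk(frames, index, 1)[:12])
```

One caveat: this only round-trips if the bytes survive unchanged. A lossy video codec would alter pixel values, so anything that really stores data in MP4 needs an encoding that tolerates that, which this sketch does not attempt.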

(To be clear, I'm doing a lot of hand-waving over the "similarities between vector databases and video processing" here - perhaps somebody with a computer science degree, or an autistic savant, can explain why this works the way that it does.)

[–] gravitas_deficiency 10 points 1 day ago

Once again proving the adage that you can do literally fucking anything with ffmpeg

[–] [email protected] 4 points 1 day ago

Wow. That's impressive.