this post was submitted on 27 Nov 2023

Data Hoarder


We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time (tm) ). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

I want to store my social media posts, comments and conversations in a standard format. They should be stored in simple, human-readable formats. I should be able to browse these files without specialised software.

I've been using Markdown for my websites and my recipes. I can open them as plain text, or use any of the dozens of Markdown viewers out there. Scripts can also work with those files without much effort. I find it preferable to databases and XML files.

I was wondering if there are common human-readable formats for chat logs, social media posts and social media comments.

So far, the best I can come up with is Markdown for social media content, and IRC chat logs for conversations. Is there anything better out there?
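To make the IRC-log idea concrete, here is a minimal sketch of reading and writing one chat line per message. The `[YYYY-MM-DD HH:MM:SS] <nick> text` layout is an assumption modelled on what many IRC clients write; there is no single standard, and the names `ChatLine`, `parse_line` and `format_line` are made up for this example.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class ChatLine(NamedTuple):
    timestamp: datetime
    nick: str
    text: str


# Assumed layout: "[2023-11-27 14:03:12] <alice> hello there"
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] <([^>]+)> (.*)$"
)


def parse_line(line: str) -> Optional[ChatLine]:
    """Return a ChatLine, or None if the line does not match the assumed layout."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    timestamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return ChatLine(timestamp, match.group(2), match.group(3))


def format_line(msg: ChatLine) -> str:
    """Write a ChatLine back out in the same one-line layout."""
    return f"[{msg.timestamp:%Y-%m-%d %H:%M:%S}] <{msg.nick}> {msg.text}"
```

A file in this layout stays readable in any text editor and works with `grep`, while a script can still recover the timestamp and nickname when needed.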

[–] [email protected] 1 points 9 months ago

Markdown is a plain-text format that renders to HTML but is much easier to read and edit than HTML itself. It's widely known, used and supported. I run multiple websites that are Markdown files rendered into HTML templates. Most static site generators work that way.

PDF makes no sense here, as we're talking about storing a few lines of text plus a few lines of metadata. It would make the files difficult to read on small screens, and very hard for machines to parse.
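As a rough illustration of "a few lines of text plus a few lines of metadata", a post could be a Markdown file with a small `key: value` header between `---` lines, similar to the front matter many static site generators use. The sketch below is stdlib-only and deliberately simpler than real YAML; the field names and the functions `dump_post` and `load_post` are hypothetical.

```python
from typing import Dict, Tuple


def dump_post(meta: Dict[str, str], body: str) -> str:
    """Serialise metadata and a Markdown body into one plain-text document."""
    header = "\n".join(f"{key}: {value}" for key, value in meta.items())
    return f"---\n{header}\n---\n\n{body.strip()}\n"


def load_post(text: str) -> Tuple[Dict[str, str], str]:
    """Split a document into (metadata, body).

    Only flat "key: value" lines are understood; this is not a YAML parser.
    Text without a complete header is returned as body with empty metadata.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text.strip()
    meta: Dict[str, str] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[index + 1:]).strip()
            return meta, body
        # Split on the first colon only, so values may contain colons (URLs, times).
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # No closing delimiter found: treat the whole text as body.
    return {}, text.strip()
```

The resulting file opens in any text editor or Markdown viewer, and a script gets the metadata back with a few lines of code, which is hard to say of a PDF.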