this post was submitted on 27 Nov 2023

Data Hoarder

I want to store my social media posts, comments and conversations in a standard format. They should be stored in simple, human-readable formats. I should be able to browse these files without specialised software.

I've been using Markdown for my websites and my recipes. I can open them as plain text, or use any of the dozens of Markdown viewers out there. Scripts can also work with those files without much effort. I find it preferable to databases and XML files.

I was wondering if there are common human-readable formats for chat logs, social media posts and social media comments.

So far, the best I can come up with is Markdown for social media content, and IRC chat logs for conversations. Is there anything better out there?
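To illustrate why IRC-style logs are easy for scripts to handle, here is a minimal Python sketch. It assumes each line looks like `[YYYY-MM-DD HH:MM:SS] <nick> message`. That layout is an assumption, since IRC clients write logs in different ways, and the names `LINE_RE` and `parse_log` are made up for this example.

```python
import re
from datetime import datetime

# Assumed line layout: "[YYYY-MM-DD HH:MM:SS] <nick> message".
# IRC clients differ here, so adjust the pattern to match your own logs.
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] <(?P<nick>[^>]+)> (?P<text>.*)$")


def parse_log(text):
    """Return (timestamp, nick, message) tuples; lines that do not match are skipped."""
    messages = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
            messages.append((ts, match["nick"], match["text"]))
    return messages


sample = """\
[2023-11-27 14:03:12] <alice> Did you back up the photos?
[2023-11-27 14:03:40] <bob> Yes, twice.
"""

for ts, nick, message in parse_log(sample):
    print(ts.isoformat(), nick, message)
```

The log stays readable in any text editor, and the same file can be filtered by date or nickname with a few lines of code.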

top 5 comments
[–] [email protected] 1 points 11 months ago (2 children)

If you’re talking about long-term storage (that is, you want to be sure that in 5, 10 or 20 years' time you will still be able to open and view these files easily), then I would strongly recommend you choose one of these formats, which are widely used for archiving purposes:

https://docs.google.com/spreadsheets/d/1XjEjFBCGF3N1spNZc1y0DG8_Uyw18uG2j8V2bsQdYjk/edit#gid=893099148

[–] [email protected] 1 points 11 months ago

Markdown is plain text; it just adds some formatting syntax. It is on your spreadsheet on row 347. You can read about it here.

[–] [email protected] 1 points 11 months ago

Markdown is a plain text format that can render to HTML but is a lot easier to read and edit. It's widely known, used and supported. I run multiple websites that are Markdown files rendered into HTML templates. Most static site generators work that way.

PDF makes no sense here, as we're talking about storing a few lines of text plus a few lines of metadata. It would make the files difficult to read on small screens, and very hard for machines to read.

[–] [email protected] 1 points 11 months ago (1 children)

HTML, if you want to read it yourself manually

JSON, if you want a script to read it

An actual database, if you want to run a large analysis of social media trends

[–] [email protected] 1 points 11 months ago

Why HTML over Markdown? There is no semantic benefit here. In fact, Markdown has much clearer ways to define metadata like post date, user, community, URL, etc. (the front matter at the top of the document).
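As a rough illustration of the front matter idea, here is a small Python sketch that splits a Markdown file into metadata and body. The field names and values (`date`, `user`, `community`, `url`) are hypothetical, and the parser only understands flat `key: value` lines between two `---` delimiters, which is a small subset of what YAML front matter allows. A real setup would probably use a proper YAML library instead.

```python
def parse_post(text):
    """Split a Markdown file into (metadata dict, body string).

    Only handles flat "key: value" front matter between two "---" lines.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter at all
    meta = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[index + 1:]).strip()
            return meta, body
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat everything as body


# Hypothetical example post; none of these values come from a real site.
sample = """\
---
date: 2023-11-27
user: example_user
community: datahoarder
url: https://example.org/post/123
---

Is there a common human-readable format for chat logs?
"""

meta, body = parse_post(sample)
print(meta["user"], meta["date"])
print(body)
```

The file still reads fine as plain text, and the metadata block sits in one predictable place for scripts.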

JSON is sort of human-readable, and it's a decent alternative.
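For comparison, here is a minimal Python sketch of a post stored as JSON. The field names and values are hypothetical. Writing it with indentation and without ASCII escaping keeps it reasonably pleasant to read in a text editor.

```python
import json

# Hypothetical post; the field names are only an illustration.
post = {
    "date": "2023-11-27",
    "user": "example_user",
    "community": "datahoarder",
    "body": "Is there a common human-readable format for chat logs?",
}

# indent makes the file readable, ensure_ascii=False keeps non-ASCII text as-is,
# and sort_keys gives a stable key order so diffs between versions stay small.
text = json.dumps(post, indent=2, ensure_ascii=False, sort_keys=True)
print(text)

# Reading it back gives the same dictionary.
restored = json.loads(text)
```

The trade-off is that long, multi-line post bodies end up on one line with escaped newlines, which is where Markdown tends to read better.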