this post was submitted on 16 Aug 2023

submitted 1 year ago* (last edited 2 months ago) by Jakylla to c/[email protected]
 

Title text: The heartfelt tune it plays is CC licensed, and you can get it from my seed on JoinDiaspora.net whenever that project gets going.


Transcript

2003:

[Cueball approaches a bearded fellow.]

Cueball: Did you get my essay?
Bearded Fellow: Yeah, it was good! But it was a .doc; you should really use a more open-
Cueball: Give it a rest already. Maybe we just want to live our lives and use software that works, not get wrapped up in your stupid nerd turf wars.
Bearded Fellow: I just want people to care about the infrastructures we're building and who-
Cueball: No, you just want to feel smugly superior. You have no sense of perspective and are probably autistic.

2010:

Cueball: Oh my God! We handed control of our social world to Facebook and they're DOING EVIL STUFF!
Bearded Fellow: Do you see this?

[Inset, the bearded fellow rubs his index and middle fingers against his thumb.]

Bearded Fellow: It's the world's tiniest open-source violin.


[–] [email protected] 4 points 1 year ago (1 children)

Really? I must have had a particularly troublesome PDF. It was almost like running it through OCR, generating hundreds of weird typos and formatting errors when I tried to convert with calibre.

[–] [email protected] 3 points 1 year ago (1 children)

The OCR struggles with some PDFs for whatever reason: font, formatting, etc.

There are third-party PDF OCR websites/programs that work better. If I'm having issues, I run it through one of those first.

[–] [email protected] 2 points 1 year ago (1 children)

Any suggestions? Even the good ones had error rates that might not matter for a couple of pages, but when scaled to a 500-page book, even a 1% error rate results in an annoying level of typos.
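
To put a rough number on that, here is a back-of-the-envelope sketch. It treats the 1% as a per-character error rate and assumes about 1,800 characters per page; both are assumptions for illustration, not figures from any particular OCR tool.

```python
def expected_errors(pages: int, chars_per_page: int, error_rate: float) -> float:
    """Expected number of misrecognized characters across the whole book."""
    return pages * chars_per_page * error_rate


# 500 pages comes from the comment above; 1,800 characters per page is a guess.
PAGES = 500
CHARS_PER_PAGE = 1800   # assumed average, varies a lot by layout
ERROR_RATE = 0.01       # 1%, read here as a per-character rate

total = expected_errors(PAGES, CHARS_PER_PAGE, ERROR_RATE)
per_page = total / PAGES
print(f"{total:.0f} expected errors, about {per_page:.0f} per page")
```

Under those assumptions that works out to thousands of typos over the whole book, which is why a rate that looks fine on a two-page test becomes painful at book length.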

[–] [email protected] 1 points 1 year ago

I use gImageReader + Tesseract, but that probably doesn't meet your criteria. Unfortunately, OCR is very rarely perfect unless the input is perfectly clear and uses an "OCR-friendly" font and formatting. There are "AI-powered" OCR tools out there, but I can't speak to how well they work, and I don't know of any free ones.