Mildly Interesting

This is for strictly mildly interesting material. If it's too interesting, it doesn't belong. If it's not interesting, it doesn't belong.

This is obviously an objective criterion, so the mods are always right. Or maybe mildly right? Ahh... what do we know?

Just post some stuff and don't spam.

815

At the Internet Archive, this is how we digitize a book—one page at a time, by hand. (files.catbox.moe)

submitted 1 year ago by [email protected] to c/[email protected]

95 comments

[–] [email protected] 30 points 1 year ago (3 children)

Wow, that seems painfully slow/tedious. Why isn't it automated? I think I saw a robot do like 20 pages a second in a YouTube video some years ago.

[–] [email protected] 47 points 1 year ago (1 children)

Do you remember the results of those speed scans? Crooked pages, parts of the document cut off, blurry scans, etc.

It was a lazy method that resulted in a lot of junk data.

[–] [email protected] 21 points 1 year ago (1 children)

I think this is what I saw. Not quite 20 pages/s hahah and also a different method.

[–] [email protected] 15 points 1 year ago (1 children)

Google have digitised a lot of books using some more advanced tech, though they started out with something a little like this.

[–] [email protected] 4 points 1 year ago (1 children)

What happened to that in the end? I heard they wanted to digitize the world's books, and then it just petered out at some point and I heard nothing more about it. Did they continue, or was it spun off to the Internet Archive to do?

[–] [email protected] 3 points 1 year ago

My understanding is that the project led into Google Books. Google fought many legal cases and ultimately won, but their enthusiasm to scan more books seems to have waned. Google basically convinced judges that only letting people see a few pages fell under fair use, but that meant you didn't get a giant library, because you couldn't read the whole book.

There's an article about it here: https://www.edsurge.com/news/2017-08-10-what-happened-to-google-s-effort-to-scan-millions-of-university-library-books

Also see https://www.hathitrust.org/about/ which is mentioned in the article.

[–] [email protected] 8 points 1 year ago* (last edited 1 year ago)

That would be interesting to see!

This is probably the method that gives you the best quality (deskewing, lighting) without cutting the back of the book and feeding it into a scanner. (AFAIK)

I saw a book scanner similar to this one that used a vacuum to turn pages, but otherwise it worked on the same principle.