this post was submitted on 23 Oct 2023

Self-Hosted Main


Hello, I'm starting a new course and the materials are all view-only PDFs. For convenience I rely heavily on online services to convert images to text (even ChatGPT 4 does it). Does anybody know of some kind of self-hosted OCR converter to turn screenshots into text?

Thanks

top 9 comments
[–] [email protected] 1 points 11 months ago

tesseract-ocr? You can download it via apt or something similar.
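For anyone who wants to script this, here is a minimal sketch of calling the `tesseract` command-line tool from Python. It assumes tesseract is already installed and on your PATH (for example the `tesseract-ocr` package on Debian or Ubuntu); the helper names `build_tesseract_command` and `ocr_image` are made up for illustration and are not part of tesseract itself.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_tesseract_command(image_path, lang="eng"):
    """Return the argv list for OCR-ing one image to standard output."""
    # Passing "stdout" as the output base makes tesseract print the
    # recognized text instead of writing it to <outputbase>.txt.
    return ["tesseract", str(Path(image_path)), "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run tesseract on an image file and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical usage: python ocr_shot.py screenshot.png
    if len(sys.argv) > 1:
        print(ocr_image(sys.argv[1]))
```

The language code (`eng` here) has to match a language pack you have installed, so other languages need their own tesseract data package.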

[–] [email protected] 1 points 11 months ago (1 children)

paperless-ngx has built-in OCR, but I don't think it would fit your needs.

[–] [email protected] 1 points 11 months ago

I will check it out.

[–] [email protected] 1 points 11 months ago (1 children)

Windows 11 has this built in if you take a screenshot.

[–] [email protected] 1 points 11 months ago

Didn't know that. I use Flameshot for screenshots. I will take a look, thanks.

[–] [email protected] 1 points 11 months ago (1 children)

You could spin up paperless-ngx. Or use PDF24 Creator. Beware: when paperless consumes a file, it deletes the original from the consume folder.

I used paperless-ngx before and it works pretty well.

[–] [email protected] 1 points 11 months ago

I will check it out. I have Stirling-PDF and I see it also has OCR support.

[–] [email protected] 1 points 11 months ago

I'm not sure I understand you correctly. Do you want to apply OCR to PDFs or to screenshots?

For PDFs there's the excellent OCRmyPDF (`ocrmypdf`), which paperless-ngx uses under the hood.
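As a rough sketch of what running it on a single file could look like, the snippet below shells out to the `ocrmypdf` command from Python. It assumes `ocrmypdf` is installed and on your PATH; the helper names and the file names in the usage line are hypothetical. The `--sidecar` option asks OCRmyPDF to also write the recognized text to a plain text file, which is probably the part you want for copying out of course material.

```python
import subprocess
import sys


def build_ocrmypdf_command(src_pdf, dst_pdf, lang="eng", sidecar=None):
    """Return the argv list for adding a text layer to a scanned PDF."""
    cmd = ["ocrmypdf", "-l", lang]
    if sidecar is not None:
        # Also dump the recognized text into a separate .txt file.
        cmd += ["--sidecar", str(sidecar)]
    # Input and output PDF paths come last.
    cmd += [str(src_pdf), str(dst_pdf)]
    return cmd


def ocr_pdf(src_pdf, dst_pdf, lang="eng", sidecar=None):
    """Run ocrmypdf; raises CalledProcessError if it exits with an error."""
    subprocess.run(
        build_ocrmypdf_command(src_pdf, dst_pdf, lang, sidecar),
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical usage: python ocr_pdf.py lecture.pdf lecture_ocr.pdf lecture.txt
    if len(sys.argv) == 4:
        ocr_pdf(sys.argv[1], sys.argv[2], sidecar=sys.argv[3])
```

Note that by default OCRmyPDF refuses to process pages that already contain text, so this is mainly for PDFs that are really just scanned images.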

[–] [email protected] 1 points 11 months ago

Nextcloud AIO (all-in-one) comes with full-text search installed, which brings Tesseract to Nextcloud. You can let tesseract-ocr run over all your documents, and they will then be searchable with Elasticsearch.