No.59
Not sure if it belongs on this board, but I'm looking for some advice on scanning books.
Recently I ordered a book through the library that seemed to exist nowhere online or in a digital format. It was about 120 pages so I thought it might not be too difficult to scan the whole thing. It took about an hour with an overhead scanner, but I was able to get the book in a shoddy (but readable) pdf form. For added measure, I used Acrobat to crop some pages and sanitize the data, which took another half hour. So, what could I do differently? Do you have any experience with this process? What do you use?
>Scanners & Setups
I had access to an overhead scanner, which seemed like it would be a bit speedier than a flatbed scanner. The drawbacks: my fingers sometimes ended up in the images, the automatic page detection sometimes caused weird effects on the page, and the resolution was generally lower.
>Formatting
Are there alternatives to pdf/adobe software? Is there any way of automating the page cropping, the page rotating, etc? How do I make the scan look "clean"?
>DRM & Sharing
If I were to post my scans, what measures should I take? How do I find the people/places that might want this book? Do I just send it off to libgen?
>Anything else?
Is OCR worth it?
In the process of writing this post I found a guide online (pic related) as well as some helpful discussions on the libgen forums. I'll keep lain updated on my experiments.
No.60
I too wish to know. I want to scan some books I bought, and release the scans online.
As far as I know, the alternatives to PDF are PNG, JPG, and DjVu.
No.66
>>59
Did you just borrow an overhead scanner? Where can I look for one?
If it's not on libgen you should contribute, it's real easy, and I'm sure that for most people a shoddy version is better than none.
Also, can you share this guide you spoke of?
The technique in your attached image seems like it would make rotating/cropping etc. easier, since the pages and camera stay in fixed positions. Any edit done on one page can be replicated exactly for every other one.
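A rough sketch of what "replicate the same edit on every page" could look like, assuming the pages were saved as 8-bit binary PGM files and the camera never moved, so one crop box fits them all. The function names, folder layout, and box values are made up for illustration, and the parser skips header comments entirely:

```python
from pathlib import Path


def crop_pgm(data: bytes, box: tuple[int, int, int, int]) -> bytes:
    """Crop a binary (P5) PGM image to box = (left, top, right, bottom).

    Assumes 8-bit pixels (maxval 255) and no comment lines in the header.
    """
    magic, width, height, rest = data.split(maxsplit=3)
    if magic != b"P5" or rest.split(maxsplit=1)[0] != b"255":
        raise ValueError("expected an 8-bit binary PGM")
    w, h = int(width), int(height)
    # The pixel data is the last w*h bytes of the file.
    pixels = data[len(data) - w * h:]
    left, top, right, bottom = box
    # Slice the same column range out of every row inside the box.
    rows = [pixels[y * w + left : y * w + right] for y in range(top, bottom)]
    return b"P5\n%d %d\n255\n" % (right - left, bottom - top) + b"".join(rows)


def crop_all(src_dir: str, dst_dir: str, box: tuple[int, int, int, int]) -> None:
    """Apply one fixed crop box to every .pgm page in src_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for page in sorted(Path(src_dir).glob("*.pgm")):
        (dst / page.name).write_bytes(crop_pgm(page.read_bytes(), box))
```

You would work out the box once on a single page and then run something like `crop_all("raw", "cropped", (120, 80, 1400, 2000))` over the whole folder. For JPG or PNG pages the same idea applies, but you would need an imaging library instead of this hand-rolled parser.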
No.68
>>66
The scanner was free to use at the local university library. I imagine a hacker-space or printshop might potentially have one as well.
I was too ashamed to post an instructables link in my first post on μ but here it is:
http://www.instructables.com/id/Bargain-Price-Book-Scanner-From-A-Cardboard-Box/
No.72
>>68
That guide was awesome, I need to try that myself.
No.75
Is OCR not soykaf anymore? I really don't get the holdup on this relatively simple AI. I guess it's mostly proprietary as well -_-
No.631
For more book-scanning methods, with designs to suit different skill levels and budgets, look here [0]
[0] https://www.diybookscanner.org/
No.819
>Scanners
I use a basic-ass HP scanner. It's annoying but it works, and I don't need to stand there in the library for half an hour scanning.
>Formatting
Use DjVu. If you can only scan to PDF (which applies to most scanners), convert using pdf2djvu. PDF is not intended as a raster format, whereas DjVu is designed specifically for scanned documents. This is what libgen recommends as well.
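If you have a pile of scanned PDFs, the conversion can be scripted. A minimal sketch, assuming pdf2djvu is installed and on your PATH; the helper names and the 300 dpi value are just illustrative choices, and `-d` (resolution) and `-o` (output file) are the only options used:

```python
import shutil
import subprocess
from pathlib import Path


def pdf2djvu_command(pdf: str, dpi: int = 300) -> list[str]:
    """Build the pdf2djvu call that writes book.djvu next to book.pdf."""
    src = Path(pdf)
    return ["pdf2djvu", "-d", str(dpi), "-o", str(src.with_suffix(".djvu")), str(src)]


def convert(pdf: str, dpi: int = 300) -> None:
    """Run the conversion, failing early if pdf2djvu is not installed."""
    if shutil.which("pdf2djvu") is None:
        raise RuntimeError("pdf2djvu not found on PATH")
    subprocess.run(pdf2djvu_command(pdf, dpi), check=True)
```

Calling `convert("book.pdf")` should leave a `book.djvu` in the same folder. Check `pdf2djvu --help` on your own install before relying on any other options.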
>DRM & Sharing
libgen is a good place to start. Probably also seed it on libgen once it gets included in a torrent.
>is OCR worth it?
idk
No.820
> ocr
It's definitely worth it. You still need to read it through and spell/grammar check, but if you OCR first before you even begin reading, it's not as dry as read->ocr->read. GNU has an open source OCR if you don't care for freeware (; (https://www.gnu.org/software/ocrad/). It's relatively easy to make your own too, if you're into that sort of thing: (https://www.nist.gov/node/1298471/emnist-dataset)
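For what it's worth, here is a sketch of batch-running ocrad over a folder of scanned pages. It assumes ocrad is installed and the pages are already in a format it reads (PGM here); the helper names and the one-text-file-per-page layout are my own assumptions, and `-o` (output file) is the only option used:

```python
import subprocess
from pathlib import Path


def ocrad_command(page: str) -> list[str]:
    """Build the ocrad call that writes page.txt next to page.pgm."""
    src = Path(page)
    return ["ocrad", "-o", str(src.with_suffix(".txt")), str(src)]


def ocr_pages(directory: str) -> None:
    """OCR every .pgm page in a directory, one text file per page."""
    for page in sorted(Path(directory).glob("*.pgm")):
        subprocess.run(ocrad_command(str(page)), check=True)
```

You would still need to proofread the resulting text files by hand afterwards.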