Removing Shadow and Background for OCR

orcam
Posts: 4
Joined: 2013-02-02T09:13:24-07:00

Removing Shadow and Background for OCR

Post by orcam »

I am trying to prepare the following image for OCR: [image]
But I can't seem to get it right. Normally, without the background and the shadows on the side, the following parameters are enough to isolate the text: -normalize -despeckle -despeckle -type grayscale -sharpen 1 -contrast, which gives: [image]
But obviously this doesn't work for the first image.

Any ideas? Thanks for reading.
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: Removing Shadow and Background for OCR

Post by snibgo »

Get a scanner. Life will be so much easier.

If that isn't feasible (perhaps horrible images like this come from clients), use an interactive editor to square up the image, crop it and curve it.

Automated processing is possible for this particular image, of course. But a generic solution for dodgy photographs of till receipts on noisy worktops with bad lighting is a lot of effort.
snibgo's IM pages: im.snibgo.com
orcam
Posts: 4
Joined: 2013-02-02T09:13:24-07:00

Re: Removing Shadow and Background for OCR

Post by orcam »

Thanks for the input. However,
snibgo wrote:Get a scanner. Life will be so much easier.
The images are coming from a camera and as you said, it is not feasible to get a scanner.
snibgo wrote:If that isn't feasible (perhaps horrible images like this come from clients), use an interactive editor to square up the image, crop it and curve it.
There are many images. Cropping and curving each one individually would be a full-time job.
snibgo wrote:But a generic solution for dodgy photographs of till receipts on noisy worktops with bad lighting is a lot of effort.
Alas, that is what is needed. :(
snibgo wrote:Automated processing is possible for this particular image, of course.
Yes! Exactly what I want to know. Any suggestions?
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Removing Shadow and Background for OCR

Post by fmw42 »

If you are on Linux/Mac, or on Windows with Cygwin, try my script, textcleaner, at the link below. Or use -lat, which is the basis of my script.
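
For readers who have not met -lat: it is ImageMagick's local adaptive threshold, which compares each pixel with the average of its neighbourhood instead of with one global cut-off. The pure-Python sketch below illustrates only that idea. It is not textcleaner's code and not ImageMagick's implementation; the function name and the `radius` and `offset` parameters are assumptions made up for this example.

```python
def local_adaptive_threshold(gray, radius=1, offset=20):
    """Binarize a grayscale image given as a list of rows of 0-255 ints.

    A pixel becomes white (255) if it is brighter than the mean of the
    window around it minus `offset`; otherwise it becomes black (0).
    `radius` and `offset` are illustrative names, not ImageMagick options.
    """
    height, width = len(gray), len(gray[0])
    result = []
    for y in range(height):
        out_row = []
        for x in range(width):
            # Clamp the window to the image borders.
            y0, y1 = max(0, y - radius), min(height, y + radius + 1)
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            window = [gray[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            local_mean = sum(window) / len(window)
            out_row.append(255 if gray[y][x] > local_mean - offset else 0)
        result.append(out_row)
    return result


# Toy "page": the background fades from 230 (lit) to 50 (in shadow),
# with one dark ink pixel in the lit part and one in the shadowed part.
page = [
    [230, 200, 170, 140, 110, 80, 50],
    [230, 120, 170, 140, 110, 10, 50],
    [230, 200, 170, 140, 110, 80, 50],
]
cleaned = local_adaptive_threshold(page, radius=1, offset=20)
```

On this toy input the two ink pixels come out black and every background pixel comes out white, even though the shadowed background (50) is darker than the ink in the lit area (120), so no single global threshold could separate them. Real photographs need a much larger window than `radius=1`.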
orcam
Posts: 4
Joined: 2013-02-02T09:13:24-07:00

Re: Removing Shadow and Background for OCR

Post by orcam »

Thanks for joining this thread Fred! :D
I did try your script earlier today with [ -g -e normalize -f 30 -o 12 -s 2 ] and many other variations, which generally give something like this: [image]
I couldn't make the text less pixelated by changing the parameters, as can be seen in the zoomed image:
[image]
Any idea which parameter has to be adjusted, or what other operation can be done?
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Removing Shadow and Background for OCR

Post by fmw42 »

You are really limited by the size of the image and thus the resolution available in pixels. You would do better if the image size were much larger.


Try one of these (about the best I can get, depending upon your idea of less pixelated):

textcleaner -g -f 15 -o 15 -e normalize -t 10 4wtf21w.jpg show:

textcleaner -g -f 15 -o 15 -e normalize -t 10 -s 1 4wtf21w.jpg show:
anthony
Posts: 8883
Joined: 2004-05-31T19:27:03-07:00
Location: Brisbane, Australia

Re: Removing Shadow and Background for OCR

Post by anthony »

I would agree. The image size, while suitable for monitor displays, is not suitable for OCR.

Practically all cameras these days capture at a much higher resolution.

Also, I would try to 'square up' the image before trying to clean it up, perhaps by looking for the edge between the docket and the workbench to find the rotation.

Finally, just how much control do you have over the photographing?
Can the work area be controlled?
Can you provide good strong lighting (or flash)?
Can you control the camera being used (resolution)?
Can the camera be mounted perfectly overhead?
How about providing a fixed solid edge that the docket can be pushed up against, so it will be square with the camera?
Can the workbench contrast be controlled?
E.g. made dark, or some specific color (green felt), for easier auto docket rotation and/or removal.

The more you can control the environment, the easier it is to automate the OCR conversion, even without going to the expense of a dedicated high-res scanner.

The simple use of an edge, for example, means it is fast to position the docket and take the photo, allowing fast turnover of dockets through the system. You could perhaps even show a real-time indication of a successful OCR conversion on the computer as you process each docket (even down to item codes being checked against the stock database).
Anthony Thyssen -- Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/
orcam
Posts: 4
Joined: 2013-02-02T09:13:24-07:00

Re: Removing Shadow and Background for OCR

Post by orcam »

Sorry, I'm a bit late to reply; got busy. :(
fmw42 wrote:You are really limited by the size of the image and thus the resolution available in pixels. You would do better if the image size were much larger.
I was finally able to get a higher quality image here: http://i.minus.com/ibiHhso9LbxL1f.png
fmw42 wrote:try one of these ...
They give about the same quality I originally got. I think this is an OCR issue rather than a preprocessing one? Perhaps the OCR engine needs to be trained better.
anthony wrote:Also I would try to 'square up' the image more before trying to clean up the image. Perhaps looking for the edge of the docket and the workbench to find the rotation.
Yeah, I should probably add that, but for now I want to focus on straight images.
anthony wrote:Finally. Just how much control do you have on the photographing?
Not much. I suppose most of the images are of the same quality as the one above. I have access only to the raw image files; I can't do much about the work area, flash, camera mounting, etc.
anthony wrote:EG: made dark, or some specific color (green felt) for easier auto docket rotation, and or removal.
Is there any built-in function in ImageMagick to do this?
anthony wrote:The more you can control the environment, the easier it is to automate the OCR conversion, even without going to the expense of a dedicated high-res scanner.
I agree 1000%; scanned images would be much easier. Unfortunately, only the above type of camera images are available.
anthony wrote:The simple use of an edge, for example means it is fast to position the docket and take the photo, allowing fast turn over of dockets thru the system. Perhaps even a real-time indication of a successful OCR conversion on the computer as you process each docket

I heard this kind of system was developed by a Google Books developer using a hacked scanner.
anthony
Posts: 8883
Joined: 2004-05-31T19:27:03-07:00
Location: Brisbane, Australia

Re: Removing Shadow and Background for OCR

Post by anthony »

By 'edge' I just mean a raised bit of wood stuck to the work area. The docket is pushed up against that wooden edge and is thus immediately aligned with the camera before the image is taken. No time is spent by the user.

Such small changes (like green felt on the workbench) make the later processing that much easier.
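
To illustrate why a known background color helps, here is a minimal pure-Python sketch that treats strongly green pixels as workbench and returns the bounding box of everything else. It is an assumption-laden toy, not an ImageMagick feature: the function names and the `min_green` and `dominance` thresholds are invented for this example and would need tuning against real photos.

```python
def is_background(pixel, min_green=100, dominance=40):
    """True if an (r, g, b) pixel looks like green felt.

    Hypothetical rule: green is reasonably bright and clearly stronger
    than both red and blue.
    """
    r, g, b = pixel
    return g >= min_green and g - max(r, b) >= dominance


def docket_bounding_box(pixels):
    """Return (left, top, right, bottom), inclusive, of non-background pixels.

    `pixels` is a list of rows of (r, g, b) tuples. Returns None when
    every pixel is classified as background.
    """
    xs, ys = [], []
    for y, row in enumerate(pixels):
        for x, pixel in enumerate(row):
            if not is_background(pixel):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


FELT = (30, 160, 40)      # assumed felt color
PAPER = (240, 240, 235)   # assumed docket color
photo = [
    [FELT, FELT, FELT, FELT, FELT],
    [FELT, PAPER, PAPER, PAPER, FELT],
    [FELT, PAPER, PAPER, PAPER, FELT],
    [FELT, FELT, FELT, FELT, FELT],
]
box = docket_bounding_box(photo)
```

With a cluttered, noisy worktop no such simple color rule exists, which is why controlling the background pays off.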

But if you have little control, then the next step is to try auto rotation.

Have a look at Fred's (fmw42) scripts: the whiteboard script and the rotation scripts.
(See his link above.)

If the photo can be rotated square before OCR, then the OCR software should work a lot better.
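
As a rough illustration of finding the rotation from the docket edge, the pure-Python sketch below takes a boolean mask (True where the docket is), collects the leftmost docket pixel in each row, and fits a straight line through those points. This is only one possible approach under simplifying assumptions (a clean mask and a straight left edge); it is not how Fred's scripts work, and all names are hypothetical.

```python
import math


def left_edge_points(mask):
    """Return (x, y) of the first True pixel in each row that has one."""
    points = []
    for y, row in enumerate(mask):
        for x, is_docket in enumerate(row):
            if is_docket:
                points.append((x, y))
                break
    return points


def skew_angle_degrees(points):
    """Least-squares fit of x = slope * y + c; return atan(slope) in degrees.

    0 means the edge is vertical in the image. A positive value means the
    edge drifts to the right further down the image. Points must come
    from at least two different rows.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need edge points from at least two rows")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    if syy == 0:
        raise ValueError("edge points must span more than one row")
    return math.degrees(math.atan(sxy / syy))


# Toy mask: the docket's left edge moves one pixel right per row.
mask = [[x >= y for x in range(6)] for y in range(4)]
angle = skew_angle_degrees(left_edge_points(mask))
```

The estimated angle would then be undone with a rotation in the opposite direction. Which sign that corresponds to depends on the tool, so check it on a test image first.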