PDF to JPG : images cropped without reason

Post by madoxav »

Hello,
I'm new to ImageMagick (on Windows 7 64-bit).
I have an old PDF, scanned last year from a school project. It contains a lot of images.
I want to convert it to multiple JPG files (or any other image format).

It works "fine", but the output images are cropped: roughly 75 px are missing on the left and at the top.

The problem is, I didn't ask for any cropping!

Code: Select all

convert  "Prez - ArchiWeb.pdf"  web-%02d.jpg
It seems nobody on this forum has had this problem. Can someone give me some advice?

Thanks in advance.

Regards
Tehzaz

Post by glennrp »

Maybe there is some "page" data in the PDF that is interfering. Try this:

Code: Select all

convert  "Prez - ArchiWeb.pdf" +repage web-%02d.jpg

Post by Ivar Snaaijer »

I ran into a similar problem with ImageMagick 6.7.4-1 2011-12-18 Q8.

I am converting a multi-page PDF with different page sizes to TIFF. It seems that only the last page is used for the metrics, as I can see in the output of -verbose:

Code: Select all

[ghostscript library] Files/gs/bin/gswin32c.exe" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dEPSCrop -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -g638x441  "-sOutputFile=C:/Users/Ivar/AppData/Local/Temp/magick-zY-r0IXb-%08d" "-fC:/Users/Ivar/AppData/Local/Temp/magick-N2jyWDSy" "-fC:/Users/Ivar/AppData/Local/Temp/magick-E5rw2QQ-"
C:/Users/Ivar/AppData/Local/Temp/magick-zY-r0IXb-00000001 PNG 638x441 638x441+0+0 8-bit DirectClass 13.8KB 0.010u 0:00.009
C:/Users/Ivar/AppData/Local/Temp/magick-zY-r0IXb-00000002 PNG 638x441 638x441+0+0 8-bit DirectClass 2.79KB 0.010u 0:00.010
C:/Users/Ivar/AppData/Local/Temp/magick-zY-r0IXb-00000003 PNG 638x441 638x441+0+0 8-bit DirectClass 7.33KB 0.010u 0:00.009
4DF5EDB9D96D472CA48B440566877421.pdf[0] PDF 638x441 638x441+0+0 8-bit DirectClass 13.8KB 0.030u 0:00.110
4DF5EDB9D96D472CA48B440566877421.pdf[1] PDF 638x441 638x441+0+0 8-bit DirectClass 13.8KB 0.020u 0:00.090
4DF5EDB9D96D472CA48B440566877421.pdf[2] PDF 638x441 638x441+0+0 8-bit DirectClass 13.8KB 0.000u 0:00.060
4DF5EDB9D96D472CA48B440566877421.pdf=>tst.tif[0] PDF 638x441 638x441+0+0 8-bit Bilevel DirectClass 1.691MB 0.020u 0:00.290
When I use pdftk to burst the file and convert only one page at a time, I see this (page 1, result 595x859):

Code: Select all

[ghostscript library] Files/gs/bin/gswin32c.exe" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dEPSCrop -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -g595x859  "-sOutputFile=C:/Users/Ivar/AppData/Local/Temp/magick-abqV2EOQ-%08d" "-fC:/Users/Ivar/AppData/Local/Temp/magick-UK5sR5Mj" "-fC:/Users/Ivar/AppData/Local/Temp/magick-tNhbGQKk"
   **** Warning:  Generation number out of 0..65535 range, assuming 0.
C:/Users/Ivar/AppData/Local/Temp/magick-abqV2EOQ-00000001 PNG 595x859 595x859+0+0 8-bit DirectClass 23.3KB 0.020u 0:00.020
files1.pdf PDF 595x859 595x859+0+0 8-bit DirectClass 23.3KB 0.010u 0:00.019
files1.pdf=>tst.tif PDF 595x859 595x859+0+0 8-bit Bilevel DirectClass 1.024MB 0.020u 0:00.170
And this (page 3, result 638x441):

Code: Select all

[ghostscript library] Files/gs/bin/gswin32c.exe" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dEPSCrop -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -g638x441  "-sOutputFile=C:/Users/Ivar/AppData/Local/Temp/magick-mVE-36IH-%08d" "-fC:/Users/Ivar/AppData/Local/Temp/magick-vca8MHqM" "-fC:/Users/Ivar/AppData/Local/Temp/magick-3gocIYYF"
   **** Warning:  Generation number out of 0..65535 range, assuming 0.
C:/Users/Ivar/AppData/Local/Temp/magick-mVE-36IH-00000001 PNG 638x441 638x441+0+0 8-bit DirectClass 7.33KB 0.010u 0:00.010
files3.pdf PDF 638x441 638x441+0+0 8-bit DirectClass 7.33KB 0.000u 0:00.029
files3.pdf=>tst.tif PDF 638x441 638x441+0+0 8-bit Bilevel DirectClass 564KB 0.010u 0:00.100
I do not remember this being a problem, so I'll try an older version to see whether this 'space-saving feature' was added later.

Adding +repage before or after the PDF does not help.
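
For anyone comparing runs like these, the forced device size can be read straight off the logged Ghostscript command line. What follows is a minimal Python sketch, assuming the switch always has the -g<width>x<height> form seen in the logs above; the helper name forced_geometry is made up for illustration.

```python
import re

# Ghostscript's device-size switch as it appears in the -verbose logs,
# e.g. -g638x441. It must stand alone as a whitespace-delimited token.
G_SWITCH = re.compile(r"(?<!\S)-g(\d+)x(\d+)(?!\S)")


def forced_geometry(command_line):
    """Return (width, height) from a -gWxH switch, or None if absent."""
    match = G_SWITCH.search(command_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Shortened, illustrative command lines: one with the switch, one without.
with_g = '-dGridFitTT=2 "-sDEVICE=pngalpha" "-r72x72" -g638x441  "-sOutputFile=out-%08d"'
without_g = '-dGridFitTT=2 "-sDEVICE=pngalpha" "-r72x72"  "-sOutputFile=out-%08d"'

print(forced_geometry(with_g))     # (638, 441)
print(forced_geometry(without_g))  # None
```

A result of None would suggest that Ghostscript was left to size each page itself.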

Post by Ivar Snaaijer »

I tried ImageMagick 6.6.7-10 2011-02-22 Q8 and got the results I want (this is the verbose output):

Code: Select all

[ghostscript library] Files/gs/bin/gswin32c.exe" -q -dQUIET -dPARANOIDSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dEPSCrop -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72"  "-sOutputFile=C:/Users/Ivar/AppData/Local/Temp/magick-C8CpE8D5-%08d" "-fC:/Users/Ivar/AppData/Local/Temp/magick-vhu2gB_c" "-fC:/Users/Ivar/AppData/Local/Temp/magick-GFKvKY6u"
C:/Users/Ivar/AppData/Local/Temp/magick-C8CpE8D5-00000001 PNG 595x859 595x859+0+0 8-bit DirectClass 23.3KB 0.020u 0:00.020
C:/Users/Ivar/AppData/Local/Temp/magick-C8CpE8D5-00000002 PNG 597x859 597x859+0+0 8-bit DirectClass 7.75KB 0.020u 0:00.020
C:/Users/Ivar/AppData/Local/Temp/magick-C8CpE8D5-00000003 PNG 638x441 638x441+0+0 8-bit DirectClass 7.33KB 0.010u 0:00.010
4DF5EDB9D96D472CA48B440566877421.pdf[0] PDF 595x859 595x859+0+0 8-bit DirectClass 23.3KB 0.050u 0:00.109
4DF5EDB9D96D472CA48B440566877421.pdf[1] PDF 597x859 597x859+0+0 8-bit DirectClass 23.3KB 0.030u 0:00.080
4DF5EDB9D96D472CA48B440566877421.pdf[2] PDF 638x441 638x441+0+0 8-bit DirectClass 23.3KB 0.010u 0:00.070
4DF5EDB9D96D472CA48B440566877421.pdf=>tst.tif[0] PDF 595x859 595x859+0+0 8-bit Bilevel DirectClass 165KB 0.040u 0:00.141
Someone has changed delegates.xml, or the way the delegates are called: the -g option is added by ImageMagick but is not explicitly mentioned in delegates.xml.
It also looks like the comments in this file have not been updated. I hope to fix it using the standard printf naming (e.g. %5s for OutputFile).

Post by whugemann »

If the PDF merely wraps scans, it's better to use pdfimages to extract them again, which is lossless in this case. If you are lucky, every page is a single JPEG. (If you're not, the scans were cut into horizontal stripes before they were wrapped into the PDF.) So give pdfimages a try: http://www.imagemagick.org/Usage/windows/#auxiliary
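
A rough sketch of that extraction step, driven from Python. It assumes the pdfimages tool (from Xpdf or Poppler) is installed and on the PATH; the -j switch asks it to write embedded JPEG data out as .jpg files without re-encoding. The function names and the "web" output prefix are illustrative choices only.

```python
import subprocess


def pdfimages_command(pdf_path, output_prefix):
    """Build the argument list for extracting embedded images.

    Passing a list (rather than one string) keeps file names with
    spaces, such as "Prez - ArchiWeb.pdf", intact.
    """
    return ["pdfimages", "-j", pdf_path, output_prefix]


def extract_images(pdf_path, output_prefix):
    """Run pdfimages; raises CalledProcessError if the tool fails."""
    subprocess.run(pdfimages_command(pdf_path, output_prefix), check=True)


if __name__ == "__main__":
    # Only prints the command; call extract_images() to actually run it.
    print(pdfimages_command("Prez - ArchiWeb.pdf", "web"))
```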

You say that you are using IM to read PDFs under 64-bit Windows? I'm surprised that this works at all, because there were severe problems with ImageMagick calling Ghostscript under 64-bit Windows. Have these been solved?
Wolfgang Hugemann

Post by Ivar Snaaijer »

I found two problems, but both were simple configuration incompatibilities. Ghostscript is called gswin32c.exe or gswin64c.exe depending on the version. Either change the XML or copy the exe (create a copy of gswin32c.exe and call it gswin64c.exe, or vice versa).
The other problem was the -q switch, if I recall correctly. This is easily removed from the XML.

The reason I use ImageMagick for PDF is that it makes it easy to convert PDF, JPG, XPS, PNG, etc. with the same convert command line.
Also, only in this special case do I process images in the PDF; normally they are PDF versions of Word/PowerPoint documents.

N.B. On my Ubuntu box (ImageMagick 6.6.0-4 2011-06-15 Q16) it also works without any problem (the -g option is not set).

Could someone elaborate on the need for this -g setting? (Possibly thread 20078 or 20033.)
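
To see which of those executable names is actually available on a machine, something like the following can help. It is only a sketch: it searches the PATH with Python's shutil.which, which is not necessarily how ImageMagick itself locates Ghostscript, and the candidate list is an assumption based on the names mentioned in this thread plus gs for Unix-like systems.

```python
import shutil

# Console Ghostscript executables mentioned in this thread, plus the
# usual Unix name. The order (64-bit first) is an arbitrary choice.
CANDIDATES = ("gswin64c", "gswin32c", "gs")


def find_ghostscript(which=shutil.which):
    """Return (name, full_path) of the first candidate found, else None.

    `which` is injectable so the lookup can be exercised without
    Ghostscript being installed.
    """
    for name in CANDIDATES:
        path = which(name)
        if path:
            return name, path
    return None


if __name__ == "__main__":
    print(find_ghostscript())
```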

Post by Ivar Snaaijer »

There was a change in the code involving -g in the pdf.c file; delegates.xml has not changed in any significant way.

http://trac.imagemagick.org/changeset/6 ... ders/pdf.c

I'll PM 'cristy', as he made the change (assuming he is not yet aware of this thread).

Post by magick »

ImageMagick looks for the widest media box in the PDF and uses that as the page size. You can override that with the -page option (e.g. -page letter). To help further, we'll need to inspect your PDF; post a URL here so we can download it.
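
To illustrate what a single page size means for a PDF with mixed pages, here is a small Python model. It assumes that "widest" means the page with the largest width and that the resulting size is applied to every page from the same origin; it is not ImageMagick's actual code. The page sizes are the ones reported earlier in this thread.

```python
def widest_media_box(page_sizes):
    """Pick the (width, height) pair with the largest width."""
    return max(page_sizes, key=lambda size: size[0])


def effect_on_page(page, canvas):
    """Pixels lost or gained when `page` is rendered onto `canvas`."""
    page_w, page_h = page
    canvas_w, canvas_h = canvas
    return {
        "cropped_width": max(page_w - canvas_w, 0),
        "cropped_height": max(page_h - canvas_h, 0),
        "padded_width": max(canvas_w - page_w, 0),
        "padded_height": max(canvas_h - page_h, 0),
    }


# Per-page sizes at 72 dpi, taken from the verbose logs in this thread.
pages = [(595, 859), (597, 859), (638, 441)]
canvas = widest_media_box(pages)
print(canvas)  # (638, 441)
for page in pages:
    print(page, effect_on_page(page, canvas))
```

Under these assumptions the two portrait pages lose 418 rows and gain about 40 columns of padding, which is in line with the 638x441 output reported above for all three pages.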

Post by whugemann »

Ivar Snaaijer wrote: I found two problems, but both were simple configuration incompatibilities. Ghostscript is called gswin32c.exe or gswin64c.exe depending on the version. Either change the XML or copy the exe (create a copy of gswin32c.exe and call it gswin64c.exe, or vice versa).
The other problem was the -q switch, if I recall correctly. This is easily removed from the XML.

I have just installed the latest 64-bit Windows version of IM and found that it works with 64-bit Ghostscript 9.04. I didn't make any changes to the installation. Interestingly enough, the -verbose option claims to call gswin32c.exe, although only gswin64c is present on our server. (convert -list delegate | FIND "ps<=>pdf" tells me the same.) The program name seems to be translated automatically. I encountered no problems with the -q option, which is also present in my delegates.xml file.

BTW: The delegates.xml file always refers to @PSDelegate@ instead of any specific program. I wonder where this generic reference is resolved.
Wolfgang Hugemann

Post by Ivar Snaaijer »

@Wolfgang. I'm sorry, I meant that these problems were in the past. I work with different versions of ImageMagick, and the older ones gave some problems when mixed with Ghostscript 9.02 x64. This is not a problem any more.

@Magick. I cannot predict the size of the incoming PDF files. Some might be A4, some A6, but I would not put it past the users to send something like an A1. All mixed in one file...
This particular PDF (from the posts above) has two pages in A4 and one a little bigger than A5 (it is the envelope the other two pages came in). I have generated a PDF with similar data in it, as I am not allowed to send you the original.

I downloaded the Google logo, rotated a copy of it and then ran (on Ubuntu, ImageMagick 6.6.0-4, x64):

Code: Select all

convert -adjoin logo3w_r.png logo3w_r.png logo3w.png  onefile.pdf
This results in the following file:
http://home.snaaijer.nl/~ivar/onefile.pdf

When I convert this PDF with 6.6.x, all is fine. If I use 6.7.4-4, the first two pages are not only severely cropped, they also gain a lot of white space in the other direction (as the -g setting is forced on all pages).

Code: Select all

convert onefile.pdf onefile.tif
Side note: I just downloaded 6.7.4-4, and when I use -version it says 6.7.4-3.
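
Until the -g behaviour is sorted out, the page-at-a-time route mentioned earlier in this thread (burst with pdftk, then convert each page separately) could be scripted roughly like this. The Python sketch only builds the command lines; the page_%02d.pdf naming pattern, the function name, and passing the page count in by hand are all assumptions made for the example.

```python
def burst_and_convert_commands(pdf_path, page_count, out_ext="tif"):
    """Return the pdftk and convert argument lists for one PDF.

    pdftk's burst operation writes one PDF per page; each of those is
    then converted on its own, which earlier in this thread produced
    the expected per-page sizes.
    """
    pattern = "page_%02d.pdf"
    commands = [["pdftk", pdf_path, "burst", "output", pattern]]
    for number in range(1, page_count + 1):
        single = pattern % number
        target = single[: -len("pdf")] + out_ext
        commands.append(["convert", single, target])
    return commands


for command in burst_and_convert_commands("onefile.pdf", 3):
    print(" ".join(command))
```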

Post by Ivar Snaaijer »

I have converted the file from thread 20078 with 6.6.0-4 2011-06-15 Q16 (a) on Ubuntu x64 (latest version), and the colours are off, but reasonable.
I also converted the same file with 6.6.7-10 2011-02-22 Q8 (b) and 6.7.4-3 2011-12-24 Q8 (c).
On Ubuntu I have Ghostscript 9.04 (2011-08-05) x64 for (a); on Windows I use Ghostscript 9.02 (2011-03-30) x32 for (b) and (c).

The results of (a) and (b) are more or less the same: the first and last two pages are larger, about A3; the others are A4.
The result of (c) is a file almost twice the size. All pages are the size of the larger pages, as zutautas mentions in the second reply in the same thread.
This is either due to something odd in the PDF or a fluke in Ghostscript (I do not see it in Foxit PDF or the PDF reader in Ubuntu).

I cannot reproduce the problem that zutautas reports in thread 20078 (the first page looks fine). As I'm not getting the same problem and it seems ImageMagick does a fine job, could it be something in Ghostscript?

Post by Ivar Snaaijer »

I see there has been a change (http://trac.imagemagick.org/changeset/6 ... ders/pdf.c) made to the -g handling.

I do not yet understand the implications of this change, I'll report back when I have tested it. Thanks.

Post by Ivar Snaaijer »

I still have not gotten around to testing this, but in the changelog Cristy wrote:

2012-02-13 6.7.5-6 Cristy <quetzlzacatenango@image...>
Only set PDF & PS page size when explicitedly requested (e.g. -page).

That would mean the problem is solved. The OP might be able to confirm.

Post by jobjol »

I use the following command, which converts at high quality without any cropping:

Code: Select all

convert -density 300x300 [input-pdf-file] -colorspace RGB -quality 90 [output-jpg-file]
So what changed? Maybe the -density option fixes this problem?
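
On the density question: as far as the documentation describes it, -density sets the resolution at which the PDF is rasterised, so on its own it would change the pixel dimensions rather than the page box. A small Python sketch of that arithmetic, assuming PDF units of 1/72 inch and simple rounding (the exact rounding used by Ghostscript or ImageMagick may differ); the function name is illustrative.

```python
POINTS_PER_INCH = 72  # PDF user space defaults to 1/72 inch per unit


def pixels_at_density(width_pt, height_pt, dpi):
    """Approximate raster size of a page rendered at `dpi`."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


# The 595x859 page from earlier in the thread, at the default 72 dpi
# and at the 300 dpi used in the command above.
print(pixels_at_density(595, 859, 72))   # (595, 859)
print(pixels_at_density(595, 859, 300))  # (2479, 3579)
```

This only shows the scaling; it does not by itself explain why the cropping goes away.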

You can check the output quality of the command above @ http://pdfjpg.net