Handling very high page count MultiPage Tiffs

Magick.NET is an object-oriented C# interface to ImageMagick. Use this forum to discuss, make suggestions about, or report bugs concerning Magick.NET
trSteve
Posts: 4
Joined: 2019-02-08T15:04:25-07:00

Handling very high page count MultiPage Tiffs

Post by trSteve »

I need to process some very large multi-page TIFF documents (some in excess of 10k frames/images/pages). MagickImageCollection appears to populate the pixel cache for every frame when the collection is opened (using ~25MB per image). This quickly consumes vast quantities of disk space and in some cases fills the disk completely, causing crashes (this generally happens when multiple high page count documents are processed around the same time).

Is there a method in Magick.NET to get the total frame count from the file before MagickImageCollection opens it (I haven't been able to find one)? I would like to handle the pages in batches to minimize the disk impact, and I have the following code, which works quite well as long as I know the frame count up front. With a batch size of 25 it doesn't generate any cache files at all (which is ideal). If I go to 50, it generates 15 to 20 cache files, depending on the source material used.


var fileName = "testFile.tiff";
var batchSize = 25;
var frameCount = this.GetFrameCount(fileName);  //  How do I do this part?

for (var i = 0; i < frameCount; i += batchSize)
{
	var settings = new MagickReadSettings();
	settings.FrameCount = i + batchSize >= frameCount ? frameCount - i : batchSize;
	settings.FrameIndex = i;

	using (var imageCollection = new MagickImageCollection(fileName, settings))
	{
		foreach (var image in imageCollection)
		{
			// Do Stuff Here.
		}
	}
}

Also open to any other suggestions on how to address this.

Thanks!

Re: Handling very high page count MultiPage Tiffs

Post by trSteve »

Additional note: the reason I need the full frame count is that if you request, for example, frame 101 from a 100-frame image, MagickImageCollection disregards the read settings and begins loading all of the frames from the file into the cache (seems like a bug?).

This could be resolved if a zero length collection was returned if the requested frames do not exist (or a partial collection if only some of the frames exist). Throwing an error could also work, but doesn't seem useful for addressing this specific issue if you don't already know the full frame count.
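
Until the library behaves that way, a caller who already knows the frame count can guard against out-of-bounds requests before opening the collection. The following is a minimal sketch, not part of Magick.NET: `FrameRange` and `ClampCount` are hypothetical names introduced for illustration, and frame indexes are assumed to be zero-based as in the loop from the first post.

```csharp
using System;

public static class FrameRange
{
    // Hypothetical helper (not a Magick.NET API): returns how many frames can
    // actually be read starting at frameIndex, or 0 when the request is out of bounds.
    public static int ClampCount(int frameIndex, int requestedCount, int totalFrames)
    {
        if (frameIndex < 0 || requestedCount <= 0 || frameIndex >= totalFrames)
        {
            return 0;
        }

        // Shrink the final batch so it never runs past the last frame.
        return Math.Min(requestedCount, totalFrames - frameIndex);
    }
}
```

A result of 0 means the read can be skipped entirely, which avoids the out-of-bounds case described above.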

Re: Handling very high page count MultiPage Tiffs

Post by trSteve »

I'm now pulling frame count using the MS TiffBitmapDecoder - which has allowed me to move forward. I would still prefer to use Magick.Net to get this information without having to populate the pixel cache if at all possible.
dlemstra
Posts: 1570
Joined: 2013-05-04T15:28:54-07:00

Re: Handling very high page count MultiPage Tiffs

Post by dlemstra »

Can you share an image that demonstrates this issue? Is this an image with Photoshop layers or tiff pages? And have you tried using the Ping method of the MagickImageCollection to get the number of images?

It would also help if you could demonstrate the issue where you request a frame index that is too high and do not get an empty collection, or where you get frames that fall outside the range you requested. I cannot reproduce that issue with the latest version of Magick.NET.

Re: Handling very high page count MultiPage Tiffs

Post by trSteve »

Ping does exactly what I need - Thanks! Not sure how I missed that.
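
For anyone finding this later, here is a rough sketch of how Ping can supply the frame count for the batching loop from the first post. It assumes a Magick.NET release from around the time of this thread, where `MagickReadSettings.FrameIndex` and `FrameCount` are `int?`; newer releases may use unsigned types and need casts. The class and method names (`TiffBatchExample`, `GetFrameCount`, `ProcessInBatches`) are made up for the example.

```csharp
using System;
using ImageMagick;

public static class TiffBatchExample
{
    // Reads only image metadata via Ping and returns the number of frames.
    public static int GetFrameCount(string fileName)
    {
        using (var info = new MagickImageCollection())
        {
            info.Ping(fileName);
            return info.Count;
        }
    }

    // Opens the file one batch at a time so only batchSize frames are loaded at once.
    public static void ProcessInBatches(string fileName, int batchSize)
    {
        var frameCount = GetFrameCount(fileName);

        for (var i = 0; i < frameCount; i += batchSize)
        {
            var settings = new MagickReadSettings
            {
                FrameIndex = i,
                // The last batch may be smaller than batchSize.
                FrameCount = Math.Min(batchSize, frameCount - i),
            };

            using (var images = new MagickImageCollection(fileName, settings))
            {
                foreach (var image in images)
                {
                    // Do stuff here.
                }
            }
        }
    }
}
```

Because the loop bound comes from the pinged count, the index never goes out of bounds.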

Upon further investigation, this only affects calls where the index is out of bounds - (frame count is irrelevant).

Here's a sample project that demonstrates this:
https://github.com/swd120/ImageMagickTest

Cache files are generated when the index is out of bounds - however, there are no images loaded into the collection when this occurs. I would think that an error should be thrown in that scenario, or just an empty collection returned without any caches generated? Also, somewhat intermittently some cache files are left behind in this scenario (could possibly be due to contention from the file watcher getting the cache file counts - I haven't investigated that further).