get rid of duplicates in a directory

Senmeis
Posts: 1
Joined: 2013-07-31T20:53:10-07:00

Post by Senmeis »

Hello,

I have extracted many frames from a video file that contains a PPT presentation. Since many of the frames are identical, I want to get rid of all the duplicates with "compare". I know two images can be compared in this way:

compare -verbose graphic1.jpg graphic2.jpg /dev/null

But how can I compare all the images in the directory one after another and delete all the duplicates?

Thanks
Owen
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: get rid of duplicates in a directory

Post by fmw42 »

You will have to script a loop over each pair of images in your directory (one image vs. all the rest). Then use compare and put the result into a variable. Then test the variable against some threshold. If the difference is small enough, use rm to delete one of the files. Then repeat for the next image in the directory.

There is no IM-only solution, so this is really more of a scripting issue than one about IM.
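The pairwise loop described above might be sketched in POSIX shell as follows. This is only a dry run under stated assumptions: it deletes nothing and just prints the raw compare output for every pair, so you can look at the numbers before choosing a threshold. The function name list_pair_scores is made up for illustration, and all frames are assumed to have the same dimensions.

```shell
#!/bin/sh
# Dry-run sketch: print the RMSE difference for every pair of images.
# list_pair_scores is a hypothetical helper name, not part of ImageMagick.

list_pair_scores() {
    # Take the first file, compare it with all the remaining ones,
    # then drop it from the list and repeat.
    while [ "$#" -gt 1 ]; do
        a=$1
        shift
        for b in "$@"; do
            # compare writes the metric to stderr and exits non-zero
            # when the images differ, hence 2>&1 and "|| true".
            score=$(compare -metric RMSE "$a" "$b" NULL: 2>&1) || true
            printf '%s\t%s\t%s\n' "$a" "$b" "$score"
        done
    done
}
```

With the function defined, something like list_pair_scores frame_*.jpg prints one tab-separated line per pair. Note that this is quadratic in the number of frames, so it can be slow for a long video.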
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: get rid of duplicates in a directory

Post by snibgo »

As fmw42 says, IM can readily find a difference number, eg:

compare -metric RMSE frame_000045.tiff frame_000046.tiff NULL: 2>diff_000045_000046.txt

What you do with these numbers depends on what exactly you want to do.
snibgo's IM pages: im.snibgo.com
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: get rid of duplicates in a directory

Post by fmw42 »

snibgo wrote: As fmw42 says, IM can readily find a difference number, eg:

compare -metric RMSE frame_000045.tiff frame_000046.tiff NULL: 2>diff_000045_000046.txt

What you do with these numbers depends on what exactly you want to do.

If you want to put it into a variable then use

var=`compare -metric RMSE frame_000045.tiff frame_000046.tiff NULL: 2>&1`

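compare -metric RMSE returns two values: a raw value followed by a normalized value in parentheses. You can add a pipe to sed (or to tr and cut) to extract one of the two. I usually go with the second, which is in the range 0 to 1 and so does not depend on how IM was compiled.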

var=`compare -metric RMSE frame_000045.tiff frame_000046.tiff NULL: 2>&1 | tr -cs ".0-9" " " | cut -d\ -f2`

Then you can test $var against some fixed threshold value, decide whether or not to rm the file, and continue the loop.
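Putting the extraction and the threshold test together, a minimal sketch might look like the following. The threshold of 0.01 and the helper names rmse_norm, is_below and remove_duplicates are assumptions for illustration only. The shell cannot compare decimal numbers itself, so the test is handed to awk. Try it on a copy of the directory first, since it really deletes files.

```shell
#!/bin/sh
# Sketch only: threshold value and function names are assumptions.

threshold=0.01   # normalized RMSE (0 to 1); tune this for your frames

# Print the second (normalized) value that compare reports for two images.
rmse_norm() {
    compare -metric RMSE "$1" "$2" NULL: 2>&1 | tr -cs ".0-9" " " | cut -d' ' -f2
}

# Succeed only if $1 is a plain decimal number smaller than $2.
# An empty or malformed value never counts as "below".
is_below() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a ~ /^[0-9.]+$/ && a + 0 < b + 0) }'
}

# Compare each surviving file with every later file and delete the
# later one when the difference is under the threshold.
remove_duplicates() {
    while [ "$#" -gt 1 ]; do
        a=$1
        shift
        [ -e "$a" ] || continue          # already deleted as a duplicate
        for b in "$@"; do
            [ -e "$b" ] || continue
            var=$(rmse_norm "$a" "$b")
            if is_below "$var" "$threshold"; then
                rm -- "$b"
            fi
        done
    done
}
```

It could then be called as remove_duplicates frame_*.tiff. One known limitation of the tr and cut extraction: if compare prints a value in scientific notation, the exponent is cut off, so such a pair is kept rather than deleted.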
anthony
Posts: 8883
Joined: 2004-05-31T19:27:03-07:00
Location: Brisbane, Australia

Re: get rid of duplicates in a directory

Post by anthony »

If you can read the frames into memory, you can also use some of the GIF animation tests with -fuzz to see what has changed.
However, videos are notoriously noisy and of lower quality, though that may not be noticeable when actually playing.
Anthony Thyssen -- Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/