Page 1 of 2

Missing locks in blob - race condition

Posted: 2009-03-21T03:21:12-07:00
by Armani
my multithreaded application often aborts with this assertion
  • lt-test: magick/blob.c:3759: WriteBlob: Assertion `image->blob->type != UndefinedStream' failed.
my test application just loads a jpeg file an starts threads that are doing all the same processing:
zoom, modulate and despeckle the image, always with the same parameters.

after having started some threads and processing simultaneosly this assertion occurs. Sometime just when 10 threads are running, sometimes after started 80 threads.
so it looks to me as this could be a race condition.

I was able to generate an according output from helgrind just before the assertion occured:
  • ==13038== Possible data race during read of size 4 at 0x88cf694 by thread #5
    ==13038== at 0x5FF5EA7: WriteBlob (blob.c:3759)
    ==13038== by 0x5B6FF85: TerminateDestination (jpeg.c:1380)
    ==13038== by 0x5BBEA51: jpeg_finish_compress (in /usr/lib/libjpeg.so.62.0.0)
    ==13038== by 0x602582A: WriteImage (constitute.c:1168)
    ==13038== by 0x5FF9D5A: ImageToBlob (blob.c:1366)
    ==13038== by 0x5D2AFD8: Magick::Image::write(Magick::Blob*) (in /usr/lib/libMagick++.so.2.0.0)

    ==13038== Possible data race during read of size 4 at 0x88cf694 by thread #22
    ==13038== at 0x5FF80D1: CloseBlob (blob.c:478)
    ==13038== by 0x5FF9D31: ImageToBlob (blob.c:1363)
    ==13038== by 0x5D2AFD8: Magick::Image::write(Magick::Blob*) (in /usr/lib/libMagick++.so.2.0.0)

    ==13038== This conflicts with a previous write of size 4 by thread #8
    ==13038== at 0x5FF81B9: CloseBlob (blob.c:486)
    ==13038== by 0x5FF9D31: ImageToBlob (blob.c:1363)
    ==13038== by 0x5D2AFD8: Magick::Image::write(Magick::Blob*) (in /usr/lib/libMagick++.so.2.0.0)
Please let me know if I can do any tests in my environment.

Version: ImageMagick 6.5.0-3 2009-03-21 Q8

~artur

Re: Missing locks in blob - race condition

Posted: 2009-03-21T09:24:52-07:00
by magick
Can you post a small program we can download to reproduce the problem? We tried to generate the fault with our thread units tests but were unable to.

ImageToBlob() is single threaded meaning two threads cannot call ImageToBlob() with the same image. We of course could patch ImageMagick to first clone the image before its written so each thread has a unique blob structure (blobs are single-threaded) but first we need to confirm if this is the source of your fault by inspecting your source code.

Re: Missing locks in blob - race condition

Posted: 2009-03-21T10:24:52-07:00
by egb
magick wrote:Can you post a small program we can download to reproduce the problem? We tried to generate the fault with our thread units tests but were unable to.
Here's a test program: http://www.xebec.de/magick/threadtest.cpp

Compile with:
  • gcc -Wall -o test `Magick++-config --cppflags --cxxflags --ldflags --libs` threadtest.cpp
Have some "test.jpg" in the directory.

Run with:
  • valgrind --tool=helgrind ./test
magick wrote: ImageToBlob() is single threaded meaning two threads cannot call ImageToBlob() with the same image. We of course could patch ImageMagick to first clone the image before its written so each thread has a unique blob structure (blobs are single-threaded) but first we need to confirm if this is the source of your fault by inspecting your source code.
The threads in the example code share the same source image, but make a copy first before writing it into a blob. The Blob is thread-local, too.

with a bit of luck (doesn't happen every time), the test program bails out with:
  • magick/blob.c:3759: WriteBlob: Assertion `image->blob->type != UndefinedStream' failed.
And the helgrind output should show some references to WriteBlob/CloseBlob. (Hidden in a bunch of warnings about a race in GetPixelCacheMaximumThreads, which isn't really one.)

Maybe this helps you tracking down the problem Armani reported.

Re: Missing locks in blob - race condition

Posted: 2009-03-21T13:30:29-07:00
by magick
We have a patch for the problem you posted. Look for it in ImageMagick 6.5.0-4 available later this evening. Thanks.

Re: Missing locks in blob - race condition

Posted: 2009-03-21T13:40:58-07:00
by Armani
Great! Thanks for the fast patch

Re: Missing locks in blob - race condition

Posted: 2009-03-21T14:22:49-07:00
by magick
If you want to test now, change
  • clone_image->blob=ReferenceBlob(image->blob);
to
  • clone_image->blob=CloneBlobInfo((BlobInfo *) NULL);
in magick/image.c/CloneImage().

This also requires a couple of changes to a few coders but most will work properly with the above change. Let us know if your program continues to fail with the JPEG image format.

Re: Missing locks in blob - race condition

Posted: 2009-03-21T14:50:16-07:00
by Armani
Works fine!
Neither my test application nor Ernst's crashes anymore.
Thank you!

But I could see multiple times following output with helgrind
  • ==1780== Possible data race during read of size 4 at 0x6d60660 by thread #25
    ==1780== at 0x5EBB2F8: GetVirtualPixelsFromNexus (cache.c:1736)
    ==1780== by 0x5EBBD2A: GetVirtualPixelCache (cache.c:3351)
    ==1780== by 0x5EB80DC: GetVirtualPixels (cache.c:3465)
    ==1780== by 0x5AF6698: WriteJPEGImage (jpeg.c:2103)
    ==1780== by 0x5EDF80A: WriteImage (constitute.c:1168)
    ==1780== by 0x5EB3D5A: ImageToBlob (blob.c:1366)
    ==1780== by 0x5BE4FD8: Magick::Image::write(Magick::Blob*) (in /usr/lib/libMagick++.so.2.0.0)
    ==1780== This conflicts with a previous write of size 4 by thread #5
    ==1780== at 0x5EB970A: ClonePixelCacheNexus (cache.c:494)
    ==1780== by 0x5EBFE32: GetImagePixelCache (cache.c:2094)
    ==1780== by 0x5EC1FAF: QueueCacheViewAuthenticPixels (cache-view.c:836)
    ==1780== by 0x5F988C2: VerticalFilter (resize.c:2023)
    ==1780== by 0x5F9C85B: ResizeImage (resize.c:2223)
    ==1780== by 0x5F9CAC4: ZoomImage (resize.c:2985)
    ==1780== by 0x5BE70A3: Magick::Image::zoom(Magick::Geometry const&) (in /usr/lib/libMagick++.so.2.0.0)
But this led not to a crash in my test, so it's harmless maybe.

~artur

Re: Missing locks in blob - race condition

Posted: 2009-03-21T15:09:38-07:00
by Armani
I cannot reproduce the race condition with helgrund again, but wenn running Ernst's program again and again in an endlessloop I was able to generate an assertion:
  • Thread 1 (process 2537):
    #0 0xffffe425 in __kernel_vsyscall ()
    #1 0xf79ffdb1 in raise () from /lib/libc.so.6
    #2 0xf7a01668 in abort () from /lib/libc.so.6
    #3 0xf79f9065 in __assert_fail () from /lib/libc.so.6
    #4 0xf7d56b6a in ZoomImage (image=0x895f6c8, columns=326, rows=1, exception=0xf68242f4) at magick/resize.c:2980
    #5 0xf7efa0a4 in Magick::Image::zoom () from /usr/lib/libMagick++.so.2
    #6 0x08049472 in testthread ()
    #7 0xf7b2d1c2 in start_thread () from /lib/libpthread.so.0
    #8 0xf7a9fa5e in clone () from /lib/libc.so.6
All other 23 threads are looking the same:
Thread 2 (process 2542):
  • #0 0xffffe425 in __kernel_vsyscall ()
    #1 0xf7b34329 in __lll_lock_wait () from /lib/libpthread.so.0
    #2 0xf7b2fb3e in _L_lock_55 () from /lib/libpthread.so.0
    #3 0xf7b2f4fa in pthread_mutex_lock () from /lib/libpthread.so.0
    #4 0xf7d5cbca in LockSemaphoreInfo (semaphore_info=0xfffffe00) at magick/semaphore.c:351
    #5 0xf7d5cff4 in AcquireSemaphoreInfo (semaphore_info=0xf7db907c) at magick/semaphore.c:143
    #6 0xf7c997ee in WriteImage (image_info=0xea67b338, image=0xea688678) at magick/constitute.c:1122
    #7 0xf7c6dd5b in ImageToBlob (image_info=0xe9df5930, image=0xea688678, length=0xf4820328, exception=0xf4820308) at magick/blob.c:1366
    #8 0xf7ef7fd9 in Magick::Image::write () from /usr/lib/libMagick++.so.2
    #9 0x08049492 in testthread ()
    #10 0xf7b2d1c2 in start_thread () from /lib/libpthread.so.0
    #11 0xf7a9fa5e in clone () from /lib/libc.so.6
This could be the result of a race condition occured before

Re: Missing locks in blob - race condition

Posted: 2009-03-21T15:26:05-07:00
by magick
Let's see if the problem is related to zoom(). Use scale() instead. Can you produce the fault now?

Re: Missing locks in blob - race condition

Posted: 2009-03-21T16:02:29-07:00
by Armani
I could reproduce it even with scale:
magick/resize.c:2476: ScaleImage: Assertion `image->signature == 0xabacadabUL' failed.

unfortunately no core file was dumped

Re: Missing locks in blob - race condition

Posted: 2009-03-21T16:14:45-07:00
by Armani
Could reproduce it again, now with corefile ;-)
nearly the same as before:

This thread asserted, and it's image pointer doesn't look good
  • Thread 1 (process 6033):
    #0 0xffffe425 in __kernel_vsyscall ()
    #1 0xf7a11db1 in raise () from /lib/libc.so.6
    #2 0xf7a13668 in abort () from /lib/libc.so.6
    #3 0xf7a0b065 in __assert_fail () from /lib/libc.so.6
    #4 0xf7d66779 in ScaleImage (image=0x0, columns=272, rows=1, exception=0xf78382f4) at magick/resize.c:2475
    #5 0xf7f0cad4 in Magick::Image::scale () from /usr/lib/libMagick++.so.2
    #6 0x080493d2 in testthread ()
    #7 0xf7b3f1c2 in start_thread () from /lib/libpthread.so.0
    #8 0xf7ab1a5e in clone () from /lib/libc.so.6
most threads like this:
  • Thread 2 (process 6042):
    #0 0xffffe425 in __kernel_vsyscall ()
    #1 0xf7b46329 in __lll_lock_wait () from /lib/libpthread.so.0
    #2 0xf7b41b3e in _L_lock_55 () from /lib/libpthread.so.0
    #3 0xf7b414fa in pthread_mutex_lock () from /lib/libpthread.so.0
    #4 0xf7d6ebca in LockSemaphoreInfo (semaphore_info=0xfffffe00) at magick/semaphore.c:351
    #5 0xf7d6eff4 in AcquireSemaphoreInfo (semaphore_info=0xf7dcb07c) at magick/semaphore.c:143
    #6 0xf7cab7ee in WriteImage (image_info=0xea5eb3d0, image=0xea5450c0) at magick/constitute.c:1122
    #7 0xf7c7fd5b in ImageToBlob (image_info=0xea5d6340, image=0xea5450c0, length=0xf302f328, exception=0xf302f308) at magick/blob.c:1366
    #8 0xf7f09fd9 in Magick::Image::write () from /usr/lib/libMagick++.so.2
    #9 0x080493f2 in testthread ()
    #10 0xf7b3f1c2 in start_thread () from /lib/libpthread.so.0
    #11 0xf7ab1a5e in clone () from /lib/libc.so.6
One thread currently terminating
  • Thread 23 (process 6032):
    #0 0xffffe425 in __kernel_vsyscall ()
    #1 0xf7b406de in pthread_join () from /lib/libpthread.so.0
    #2 0x0804966f in main ()
On the last one in jpeg lib
  • Thread 24 (process 6046):
    #0 0xffffe425 in __kernel_vsyscall ()
    #1 0xf7aa2863 in read () from /lib/libc.so.6
    #2 0xf7a4ed6d in _IO_file_read () from /lib/libc.so.6
    #3 0xf7a4ffe5 in _IO_file_underflow () from /lib/libc.so.6
    #4 0xf7a5076b in _IO_default_uflow () from /lib/libc.so.6
    #5 0xf7a51b5d in __uflow () from /lib/libc.so.6
    #6 0xf7a44c96 in _IO_getline_info () from /lib/libc.so.6
    #7 0xf7a44bdf in _IO_getline () from /lib/libc.so.6
    #8 0xf7a4db51 in fgets_unlocked () from /lib/libc.so.6
    #9 0xf7ab0dd5 in ?? () from /lib/libc.so.6
    #10 0xf7a7ec1a in sysconf () from /lib/libc.so.6
    #11 0xf7bd0f02 in jpeg_mem_init () from /usr/lib/libjpeg.so.62
    #12 0x00000056 in ?? ()
    #13 0xf7a54da1 in ?? () from /lib/libc.so.6
    #14 0xea1f4cb0 in ?? ()
    #15 0xf7bd3ff4 in ?? () from /usr/lib/libjpeg.so.62
    #16 0xffffffff in ?? ()
    #17 0xf7bd3ff4 in ?? () from /usr/lib/libjpeg.so.62
    #18 0xf1028124 in ?? ()
    #19 0xf7bd02f4 in jinit_memory_mgr () from /usr/lib/libjpeg.so.62
    #20 0xea100048 in ?? ()
    #21 0xea100040 in ?? ()
    #22 0xea100010 in ?? ()
    #23 0x00000002 in ?? ()
    #24 0x08e9d710 in ?? ()
    #25 0xf7bd3ff4 in ?? () from /usr/lib/libjpeg.so.62
    #26 0xf1028124 in ?? ()
    #27 0xf1028084 in ?? ()
    #28 0xf7bb7b6e in jpeg_CreateCompress () from /usr/lib/libjpeg.so.62
    #29 0xf1027f10 in ?? ()
    #30 0x00000000 in ?? ()

Re: Missing locks in blob - race condition

Posted: 2009-03-21T16:34:55-07:00
by magick
If you spot a problem or have a suggestion let us know. Otherwise we will investigate futher sometime next week. Thanks.

Re: Missing locks in blob - race condition

Posted: 2009-03-22T12:38:34-07:00
by Armani
I have done few more tests today and realized, that locks are not working in Magick++.

In Magick++/lib/Magick++/Thread.h I could see a define HasPTHREADS that must be defined until pthread is used,
but this isn't defined anywhere.

My system has pthread library and is has been recognized by configure.

config.h contains:

Code: Select all

#define HAVE_PTHREAD 1
I'm wondering if HasPTHREADS is just a typo and HAVE_PTHREAD should have been used instead.

~artur

Re: Missing locks in blob - race condition

Posted: 2009-03-22T12:55:22-07:00
by magick
Good catch. We discovered this problem earlier today when using helgrind. Look for a patch in ImageMagick 6.5.0-5 later today. If you continue to have problems with ImageMagick 6.5.0-5 let us know. So far the thread unit tests and your threadtest program are running without complaint (> 1000 executions without fail).

In the mean-time download ImageMagick 6.5.0-4 and replace HasPTHREADS with MAGICKCORE_HAVE_PTHREAD in Magick++/lib/Thread.cpp and Magick++/lib/Magick++/Thread.h. Let us know if your program succeed.

Re: Missing locks in blob - race condition

Posted: 2009-03-22T13:43:28-07:00
by Armani
Thanks!

I have tested it and it looks good :-) I will try more runs and will let you know if I encounter any problem.

~artur