[BUG] Circular locking dependency - DRM/CMA/MM/hotplug/...

Marek Szyprowski m.szyprowski at samsung.com
Tue Feb 18 06:25:21 PST 2014


Hello,

On 2014-02-12 17:33, Russell King - ARM Linux wrote:

> On Wed, Feb 12, 2014 at 04:40:50PM +0100, Marek Szyprowski wrote:
>> Hello,
>>
>> On 2014-02-11 19:35, Russell King - ARM Linux wrote:
>>> The cubox-i4 just hit a new lockdep problem - not quite sure what to
>>> make of this - it looks like an interaction between quite a lot of
>>> locks - I suspect more than the lockdep code is reporting in its
>>> "Possible unsafe locking scenario" report.
>>>
>>> I'm hoping I've sent this to appropriate people... if anyone thinks
>>> this needs to go to someone else, please forward it.  Thanks.
>>
>> From the attached log it looks like an issue (AB-BA deadlock) between
>> device mutex (&dev->struct_mutex) and mm semaphore (&mm->mmap_sem).
>> Similar issue has been discussed quite a long time ago in v4l2
>> subsystem:

> I think there's more locks involved than just those two.
>
>> https://www.mail-archive.com/linux-media@vger.kernel.org/msg38599.html
>> http://www.spinics.net/lists/linux-media/msg40225.html
>>
>> Solving it probably requires some changes in DRM core. I see no direct
>> relation between this issue and CMA itself.
>
> I don't think so - the locking in DRM is pretty sane.  Let's take a look:
>
>>> the existing dependency chain (in reverse order) is:
>>>
>>> -> #5 (&dev->struct_mutex){+.+...}:
>>>        [] __lock_acquire+0x151c/0x1ca0
>>>        [] lock_acquire+0xa0/0x130
>>>        [] mutex_lock_nested+0x5c/0x3ac
>>>        [] drm_gem_mmap+0x40/0xdc
>>>        [] drm_gem_cma_mmap+0x14/0x2c
>>>        [] mmap_region+0x3ac/0x59c
>>>        [] do_mmap_pgoff+0x2c8/0x370
>>>        [] vm_mmap_pgoff+0x6c/0x9c
>>>        [] SyS_mmap_pgoff+0x54/0x98
>>>        [] ret_fast_syscall+0x0/0x48
>
> vm_mmap_pgoff() takes mm->mmap_sem before calling do_mmap_pgoff().  So,
> this results in the following locking order:
>
>   mm->mmap_sem
>   dev->struct_mutex
>
>>> -> #4 (&mm->mmap_sem){++++++}:
>>>        [] __lock_acquire+0x151c/0x1ca0
>>>        [] lock_acquire+0xa0/0x130
>>>        [] might_fault+0x6c/0x94
>>>        [] con_set_unimap+0x158/0x27c
>>>        [] vt_ioctl+0x1298/0x1388
>>>        [] tty_ioctl+0x168/0xbf4
>>>        [] do_vfs_ioctl+0x84/0x664
>>>        [] SyS_ioctl+0x44/0x64
>>>        [] ret_fast_syscall+0x0/0x48
>
> vt_ioctl() takes the console lock, so this results in:
>
>   console_lock
>   mm->mmap_sem
>
>>> -> #3 (console_lock){+.+.+.}:
>>>        [] __lock_acquire+0x151c/0x1ca0
>>>        [] lock_acquire+0xa0/0x130
>>>        [] console_lock+0x60/0x74
>>>        [] console_cpu_notify+0x28/0x34
>>>        [] notifier_call_chain+0x4c/0x8c
>>>        [] __raw_notifier_call_chain+0x1c/0x24
>>>        [] __cpu_notify+0x34/0x50
>>>        [] cpu_notify_nofail+0x18/0x24
>>>        [] _cpu_down+0x100/0x244
>>>        [] cpu_down+0x30/0x44
>>>        [] cpu_subsys_offline+0x14/0x18
>>>        [] device_offline+0x94/0xbc
>>>        [] online_store+0x4c/0x74
>>>        [] dev_attr_store+0x20/0x2c
>>>        [] sysfs_kf_write+0x54/0x58
>>>        [] kernfs_fop_write+0xc4/0x160
>>>        [] vfs_write+0xbc/0x184
>>>        [] SyS_write+0x48/0x70
>>>        [] ret_fast_syscall+0x0/0x48
>
> _cpu_down() takes cpu_hotplug.lock, so here we have:
>
>   cpu_hotplug.lock
>   console_lock
>
>>> -> #2 (cpu_hotplug.lock){+.+.+.}:
>>>        [] __lock_acquire+0x151c/0x1ca0
>>>        [] lock_acquire+0xa0/0x130
>>>        [] mutex_lock_nested+0x5c/0x3ac
>>>        [] get_online_cpus+0x3c/0x58
>>>        [] lru_add_drain_all+0x24/0x190
>>>        [] migrate_prep+0x10/0x18
>>>        [] alloc_contig_range+0xf4/0x30c
>>>        [] dma_alloc_from_contiguous+0x7c/0x130
>>>        [] __alloc_from_contiguous+0x38/0x12c
>>>        [] atomic_pool_init+0x74/0x128
>>>        [] do_one_initcall+0x3c/0x164
>>>        [] kernel_init_freeable+0x104/0x1d0
>>>        [] kernel_init+0x10/0xec
>>>        [] ret_from_fork+0x14/0x2c
>
> dma_alloc_from_contiguous() takes the cma_mutex, so here we end up with:
>
>   cma_mutex
>   cpu_hotplug.lock
>
>>> -> #1 (lock){+.+...}:
>>>        [] __lock_acquire+0x151c/0x1ca0
>>>        [] lock_acquire+0xa0/0x130
>>>        [] mutex_lock_nested+0x5c/0x3ac
>>>        [] lru_add_drain_all+0x1c/0x190
>>>        [] migrate_prep+0x10/0x18
>>>        [] alloc_contig_range+0xf4/0x30c
>>>        [] dma_alloc_from_contiguous+0x7c/0x130
>>>        [] __alloc_from_contiguous+0x38/0x12c
>>>        [] atomic_pool_init+0x74/0x128
>>>        [] do_one_initcall+0x3c/0x164
>>>        [] kernel_init_freeable+0x104/0x1d0
>>>        [] kernel_init+0x10/0xec
>>>        [] ret_from_fork+0x14/0x2c
>
> Ditto - here we have:
>
>   cma_mutex
>   lock
>
> where "lock" is nicely named... this is a lock inside lru_add_drain_all()
> and under this lock, we call get_online_cpus() and put_online_cpus().
> get_online_cpus() takes cpu_hotplug.lock, so here we also have:
>
>   cma_mutex
>   lock
>   cpu_hotplug.lock
>
>>> -> #0 (cma_mutex){+.+.+.}:
>>>        [] print_circular_bug+0x70/0x2f0
>>>        [] __lock_acquire+0x1580/0x1ca0
>>>        [] lock_acquire+0xa0/0x130
>>>        [] mutex_lock_nested+0x5c/0x3ac
>>>        [] dma_release_from_contiguous+0xb8/0xf8
>>>        [] __arm_dma_free.isra.11+0x194/0x218
>>>        [] arm_dma_free+0x1c/0x24
>>>        [] drm_gem_cma_free_object+0x68/0xb8
>>>        [] drm_gem_object_free+0x30/0x38
>>>        [] drm_gem_object_handle_unreference_unlocked+0x108/0x148
>>>        [] drm_gem_handle_delete+0xb0/0x10c
>>>        [] drm_gem_dumb_destroy+0x14/0x18
>>>        [] drm_mode_destroy_dumb_ioctl+0x34/0x40
>>>        [] drm_ioctl+0x3f4/0x498
>>>        [] do_vfs_ioctl+0x84/0x664
>>>        [] SyS_ioctl+0x44/0x64
>>>        [] ret_fast_syscall+0x0/0x48
>
> drm_gem_object_unreference_unlocked() takes dev->struct_mutex, so:
>
>   dev->struct_mutex
>   cma_mutex

> So, the full locking dependency tree is this:
>
>   CPU0               CPU1               CPU2               CPU3               CPU4
>   dev->struct_mutex (from #0)
>                      mm->mmap_sem (from #5)
>                      dev->struct_mutex
>                                         console_lock (from #4)
>                                         mm->mmap_sem
>                                                            cpu_hotplug.lock (from #3)
>                                                            console_lock
>                                                                               cma_mutex (from #2, but also from #1)
>                                                                               cpu_hotplug.lock
>   cma_mutex
>
> Which is pretty sick - and I don't think that blaming this solely on
> V4L2 nor DRM is particularly fair.  I believe the onus is on every
> author of one of those locks involved in that chain to re-analyse
> whether their locking is sane.
>
> For instance, what is cma_mutex protecting?  Is it protecting the CMA
> bitmap?

This lock protects the CMA bitmap and also serializes all CMA allocations. It is required by the memory management core to serialize all calls to alloc_contig_range() (otherwise page blocks' migrate types might get overwritten). I don't see any other obvious solution for serializing alloc_contig_range() calls.

> What if we did these changes:

>  struct page *dma_alloc_from_contiguous(struct device *dev, int count,
>                                         unsigned int align)
>  {
>  ...
>         mutex_lock(&cma_mutex);
>  ...
>         for (;;) {
>                 pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count,
>                                                     start, count, mask);
>                 if (pageno >= cma->count)
>                         break;
>
>                 pfn = cma->base_pfn + pageno;
> +               bitmap_set(cma->bitmap, pageno, count);
> +               mutex_unlock(&cma_mutex);
>                 ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
> +               mutex_lock(&cma_mutex);
>                 if (ret == 0) {
> -                       bitmap_set(cma->bitmap, pageno, count);
>                         page = pfn_to_page(pfn);
>                         break;
> -               } else if (ret != -EBUSY) {
> +               }
> +               bitmap_clear(cma->bitmap, pageno, count);
> +               if (ret != -EBUSY) {
>                         break;
>                 }
>  ...
>         mutex_unlock(&cma_mutex);
>         pr_debug("%s(): returned %p\n", __func__, page);
>         return page;
>  }

>  bool dma_release_from_contiguous(struct device *dev, struct page *pages,
>                                   int count)
>  {
>  ...
> +       free_contig_range(pfn, count);
>         mutex_lock(&cma_mutex);
>         bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
> -       free_contig_range(pfn, count);
>         mutex_unlock(&cma_mutex);
>  ...
>  }
>
> which avoids the dependency between cma_mutex and cpu_hotplug.lock ?

This will not work correctly if there are two concurrent calls to alloc_contig_range() which touch the same memory page blocks.

Best regards

Marek Szyprowski, PhD
Samsung R&D Institute Poland


