Hello igniters!
So I was working on a fix for https://issues.apache.org/jira/browse/IGNITE-9056. The reason for the test flakiness turned out to be that our ConcurrentLinkedHashMap (and its tautological cousin GridBoundedConcurrentLinkedHashMap) is broken :(

When you do clear(), its size counter is not updated. So sizex() will return the old size after clear, and if there's a maxCnt set, after several clear()s it will immediately evict entries right after they are inserted, maintaining the map size at 0.

This is scary since the indexing internals make intense use of ConcurrentLinkedHashMaps.

My suggestion for a fix is to avoid ever calling clear(), making it throw UnsupportedOperationException, and to recreate/replace the map instead of clear()ing it. Unless somebody is going to stand up and fix ConcurrentLinkedHashMap.clear() properly. Frankly speaking, I'm afraid of touching this code in any non-trivial way.

--
Ilya Kasnacheev
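To make the failure mode concrete, here is a minimal sketch (the bound of 100 and the fill loop are assumptions for illustration; sizex(), clear() and the eviction behavior are the ones described above):

    GridBoundedConcurrentLinkedHashMap<Integer, Integer> map =
        new GridBoundedConcurrentLinkedHashMap<>(100);   // assumed maxCnt-style bound

    for (int i = 0; i < 100; i++)
        map.put(i, i);                                   // fill to capacity

    map.clear();               // buckets are emptied, but the size counter keeps its old value
    int stale = map.sizex();   // still reports 100 instead of 0
    map.put(200, 200);         // stale counter >= maxCnt, so the fresh entry is evicted immediately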
Thanks for revealing this issue!
I don't understand why we should disallow calling clear().

One way it can be re-implemented is:
1. acquire write locks on all segments;
2. clear them;
3. reset the size to 0;
4. release the locks.

Another approach is to calculate inside ConcurrentLinkedHashMap.Segment.clear() how many entries you actually deleted and then call size.addAndGet(...).

In both cases you'll have to replace the LongAdder with an AtomicLong.

On Tue, Jul 24, 2018 at 4:03 PM, Ilya Kasnacheev <[hidden email]> wrote:
> So I was working on a fix for https://issues.apache.org/jira/browse/IGNITE-9056 [...]

--
Best regards,
Ilya
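A rough sketch of the second approach, assuming a JDK 7-style Segment that extends ReentrantLock and keeps a HashEntry[] bucket table (field names here are illustrative, not the actual ones):

    // Inside ConcurrentLinkedHashMap.Segment:
    void clear() {
        lock();
        try {
            int removed = 0;

            for (HashEntry<K, V> head : tbl)                       // walk every bucket chain
                for (HashEntry<K, V> e = head; e != null; e = e.next)
                    removed++;

            // ... null out the buckets, as the current implementation already does ...

            size.addAndGet(-removed);   // 'size' is the AtomicLong that would replace the LongAdder
        }
        finally {
            unlock();
        }
    }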
Hello, Ilya.
Maybe we need to proceed with ticket [1], "Get rid of org.jsr166.ConcurrentLinkedHashMap"? Especially if this class is broken and buggy.

[1] https://issues.apache.org/jira/browse/IGNITE-7516

On Tue, 24/07/2018 at 16:20 +0300, Ilya Lantukh wrote:
> Thanks for revealing this issue! I don't understand why we should disallow calling clear(). [...]
It seems that we currently use the CLHM primarily as a FIFO cache.
I see two ways around that.

First, we could implement such an LRU/FIFO cache on top of another, properly supported data structure from j.u.c. ConcurrentSkipListMap seems OK. I have a draft implementation of LruEvictionPolicy based on it that passes functional tests, but I haven’t benchmarked it yet.

Second, Guava has a Cache API with a lot of configuration options that we could use (the license is Apache, so it should be OK).

Stan

From: Nikolay Izhikov
Sent: 24 July 2018, 16:27
To: [hidden email]
Subject: Re: ConcurrentLinkedHashMap works incorrectly after clear()

> Maybe we need to proceed with ticket [1], "Get rid of org.jsr166.ConcurrentLinkedHashMap"? [...]
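For reference, a sketch of what the Guava option might look like (the key/value types and the bound are placeholders, not taken from the Ignite code):

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;

    class BoundedCacheSketch {
        // Roughly the role GridBoundedConcurrentLinkedHashMap plays today.
        final Cache<String, Object> cache = CacheBuilder.newBuilder()
            .maximumSize(128)          // evicts (approximately LRU) once the bound is hit
            .build();

        void reset() {
            cache.invalidateAll();     // unlike CLHM.clear(), size bookkeeping stays consistent
        }
    }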
Hello Ilya,
Can you add more info about why and how LT for this test case prints the log message twice?

From my point of view, making the clear() method throw UnsupportedOperationException is not the right fix for the flaky test issue. A brief search through CLHM led me to think that we simply forgot to decrement the LongAdder size while iterating over the HashEntry array. We increment and decrement this counter on put/remove operations, but why not in the clear() method? Am I right? So, replacing the LongAdder with an AtomicLong sounds good to me too, as suggested by Ilya Lantukh. But I may be mistaken here.

More generally, I think this is a good occasion to start thinking about how to get rid of CLHM. E.g. we can consider this implementation [1].

[1] https://github.com/ben-manes/concurrentlinkedhashmap

On Tue, 24 Jul 2018 at 16:45, Stanislav Lukyanov <[hidden email]> wrote:
> It seems that we currently use the CLHM primarily as a FIFO cache. [...]

--
Maxim Muzafarov
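For reference, the library in [1] is used roughly like this (the capacity value is a placeholder):

    import java.util.concurrent.ConcurrentMap;

    import com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap;

    class LruMapSketch {
        // A ConcurrentMap with a capacity bound and LRU eviction, maintained by the library.
        final ConcurrentMap<String, Object> map =
            new ConcurrentLinkedHashMap.Builder<String, Object>()
                .maximumWeightedCapacity(1024)   // plays the role of our maxCnt
                .build();
    }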
Hello!
LT stops throttling input as soon as it is unable to add any entries to the underlying map, since the map considers itself full with 0 entries.

I have tried to implement both suggestions, and they both have a big impact on the CLHM code. I am super uncomfortable with making non-trivial edits to it.

If the first approach is chosen, we need to iterate over the values instead of just clearing the table; and if the second approach is chosen, locking becomes non-trivial since we're grabbing segment locks outside of the segment code.

Changing the LongAdder to an AtomicLong also has potential implications which are not readily understood.

Note that entryQ is not cleared in clear() either, which can cause further problems. My suggestion is still to either disallow clear() or throw this class away in its entirety.

Regards,

--
Ilya Kasnacheev

2018-07-25 12:00 GMT+03:00 Maxim Muzafarov <[hidden email]>:
> Can you add more info about why and how LT for this test case prints the log message twice? [...]
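The disallow-clear() suggestion boils down to swapping the map reference rather than mutating the instance; a sketch, with hypothetical field and factory names:

    class ThrottleStateSketch {
        /** Bounded map that is replaced wholesale instead of being clear()ed. */
        private volatile GridBoundedConcurrentLinkedHashMap<String, Long> errors = newMap();

        void reset() {
            errors = newMap();   // the old instance, stale counter and all, goes to the GC
        }

        private static GridBoundedConcurrentLinkedHashMap<String, Long> newMap() {
            return new GridBoundedConcurrentLinkedHashMap<>(128);   // assumed bound
        }
    }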
Ilya,
As for me, the main cause of this problem is not in CLHM itself but that we are using it in GridLogThrottle as a container with a fixed size. Suppose at the current moment we have no alternative and should start thinking about further steps.

Can you create a clean reproducer for the CLHM issue described above?

As a workaround for GridLogThrottle we can change clear() like this:

    public static void clear() {
        errors.forEach((k, v) -> errors.remove(k));
    }

Will it help you to fix the test?

Thoughts?

On Wed, 25 Jul 2018 at 19:57, Ilya Kasnacheev <[hidden email]> wrote:
> LT stops throttling input as soon as it is unable to add any entries to the underlying map [...]

--
Maxim Muzafarov
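The workaround above should hold up because, as noted earlier in the thread, the per-entry remove() path does maintain the size counter; only the bulk clear() skips it. An equivalent loop without the lambda (key type assumed):

    // Each remove() decrements the size counter, unlike the bulk clear().
    for (Object key : errors.keySet())
        errors.remove(key);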
Hello!
Most of our code uses CLHM as a container with a fixed size. I can surely fix LogThrottle, but my main concern is the H2 indexing code, which uses the same CLHM with a cap.

Regards,

--
Ilya Kasnacheev

2018-07-27 16:38 GMT+03:00 Maxim Muzafarov <[hidden email]>:
> As for me, the main cause of this problem is not in CLHM itself but that we are using it in GridLogThrottle as a container with a fixed size. [...]
Is it possible to replace the current implementation with Google's [1] or some other? It's a bad idea to have our own CLHM and keep it outdated (this version is based on the CHM from Java 7!) and broken.

[1] https://mvnrepository.com/artifact/com.googlecode.concurrentlinkedhashmap/concurrentlinkedhashmap-lru

On Fri, 27 Jul 2018 at 16:40, Ilya Kasnacheev <[hidden email]> wrote:
> Most of our code uses CLHM as a container with a fixed size. [...]
Guys, I think we can definitely replace the current implementation of CLHM with a more stable one, but as a temporary solution I see nothing wrong with throwing an UnsupportedOperationException from the clear() method. Ilya has already provided a patch which replaces all clear() calls with creation of a new map; semantically it has the same behavior as a correct clear() method.

I suggest merging the provided PR, because IgniteH2Indexing is broken because of this bug, and then creating a ticket to replace/fix/do whatever feels right with the current CLHM.

Thoughts?

On Fri, 27 Jul 2018 at 16:43, Anton Vinogradov <[hidden email]> wrote:
> Is it possible to replace the current implementation with Google's [1] or some other? [...]
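The temporary guard could be as small as this (a sketch of the idea, not the actual PR):

    /** Fail fast: bulk clear corrupts the size counter, so forbid it outright. */
    @Override public void clear() {
        throw new UnsupportedOperationException(
            "clear() is not supported; replace the map instance instead.");
    }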
On Thu, Aug 9, 2018 at 8:16 AM, Alexey Goncharuk <[hidden email]> wrote:
> I suggest merging the provided PR, because IgniteH2Indexing is broken because of this bug, and then creating a ticket to replace/fix/do whatever feels right with the current CLHM. [...]

I agree. This sounds like a less risky approach.

D.
Stas,
The SkipList implementation offers O(log n) for get/put/contains operations, while CLHM offers O(1). So it is suitable for small data sets but will have a serious performance impact for big ones.

However, it seems it's time for the right choice: correctness or performance. The answer seems obvious )

On Tue, Jul 24, 2018 at 4:45 PM, Stanislav Lukyanov <[hidden email]> wrote:
> It seems that we currently use the CLHM primarily as a FIFO cache. [...]
Hello Igniters!
So we have merged the patch. If there are further steps to take with CLHM, feel free to contribute.

Regards,

--
Ilya Kasnacheev

2018-08-10 17:50 GMT+03:00 Andrey Gura <[hidden email]>:
> The SkipList implementation offers O(log n) for get/put/contains operations, while CLHM offers O(1). [...]