Steps to reproduce.
1. Start a node with a partitioned cache and load more than 1M indexed entries into the cache.
   In my case I used a data streamer.
   Wait until the data is loaded.
2. Start one more node. It will FAIL (!!!!) to join the topology.
   In VisualVM I see that the first node is consuming 25% of CPU, and on the sampler page I see that the first node spends CPU in the following methods:
   GridCacheMapEntry.deletedUnlocked()
   GridCacheMapEntry.checkExpired()

Other scenario.

1. Start a couple of nodes without load.
2. Start a node with load. In this case all nodes join the topology.
3. After the load is finished, VisualVM shows CPU consumption in:
   GridCacheMapEntry.deletedUnlocked()
   GridCacheMapEntry.checkExpired()

The more entries are loaded into the cache (2M, 3M), the more CPU is consumed.

--
Alexey Kuznetsov
GridGain Systems
www.gridgain.com
We narrowed down the root cause of this issue: it is caused by the discovery thread iterating over the whole cache when collecting cache metrics. We fixed the isEmpty method and this case looks OK now. Alexey will verify that the rest of the tests are good. I think we should revoke the vote since this is a critical issue.

2015-05-06 21:50 GMT-07:00 Alexey Kuznetsov <[hidden email]>:
> Steps to reproduce. [...]
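
To illustrate the root cause described in this reply (an emptiness check that walks the whole cache each time metrics are collected), here is a minimal, self-contained sketch. It is a toy model and not Ignite code: ToyCache, Entry, isEmptyByFullScan() and isEmptyByCounter() are hypothetical names, and the thread does not say how the real isEmpty fix was implemented. The only point is that a per-entry "deleted" check over every entry costs time proportional to the cache size, while a maintained counter answers the same question in constant time.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a cache map (hypothetical, single-threaded; not Ignite code). */
class ToyCache {
    /** Hypothetical entry: removed entries stay in the map as tombstones. */
    static final class Entry {
        final String val;
        boolean deleted;

        Entry(String val) {
            this.val = val;
        }
    }

    private final Map<Integer, Entry> map = new HashMap<>();

    /** Number of non-deleted entries, maintained on every put/remove. */
    private long liveCnt;

    /** Instrumentation only: how many entries the full scan has touched. */
    long visited;

    void put(int key, String val) {
        Entry prev = map.put(key, new Entry(val));

        // A new key, or a key that previously held a tombstone, adds a live entry.
        if (prev == null || prev.deleted)
            liveCnt++;
    }

    void remove(int key) {
        Entry e = map.get(key);

        if (e != null && !e.deleted) {
            e.deleted = true; // keep the tombstone in the map
            liveCnt--;
        }
    }

    /** Assumed problematic pattern: checks every entry, so the cost grows with cache size. */
    boolean isEmptyByFullScan() {
        long live = 0;

        for (Entry e : map.values()) {
            visited++;

            if (!e.deleted)
                live++;
        }

        return live == 0;
    }

    /** Constant-time alternative: reads the maintained counter. */
    boolean isEmptyByCounter() {
        return liveCnt == 0;
    }

    public static void main(String[] args) {
        ToyCache cache = new ToyCache();

        for (int i = 0; i < 100_000; i++)
            cache.put(i, "v" + i);

        boolean scan = cache.isEmptyByFullScan();
        boolean cnt = cache.isEmptyByCounter();

        System.out.println("fullScan=" + scan + ", counter=" + cnt + ", visited=" + cache.visited);
    }
}
```

If a check like isEmptyByFullScan() were called periodically from a thread that also has to process node-join messages, its per-call cost would scale with the number of loaded entries, which is consistent with the observation above that CPU consumption grows with 2M and 3M entries.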
On Thu, May 7, 2015 at 1:21 AM, Alexey Goncharuk <[hidden email]> wrote:

> We narrowed down the root cause of this issue, it is caused by discovery
> thread iterating over the whole cache when collecting cache metrics. We
> fixed isEmpty method and this case looks ok now. Alexey will verify the
> rest of the tests are good. I think we should revoke the vote since this is
> a critical issue.

I agree. I will cancel the vote. Let's fix this issue and resubmit the vote when we are ready.

Thanks to everyone for looking into this on such short notice!
In reply to this post by Alexey Goncharuk
Great and timely discovery! Let's re-vote once it's fixed.
Thanks!

On Wed, May 06, 2015 at 11:21PM, Alexey Goncharuk wrote:
> We narrowed down the root cause of this issue [...]