Folks,
Do we currently have any way to set a timeout for an atomic operation? I see neither a way to do this nor any related documentation.

In the code there are CacheAtomicUpdateTimeoutException and CacheAtomicUpdateTimeoutCheckedException, but I can't find a single place where either of them is created or thrown. It looks like we used to have this functionality, but it's not there anymore. Is this really the case, or did I miss something?

I think having a way to time out an atomic operation is very important. For example, two concurrent putAll operations with the same keys in different orders can hang the whole cluster forever, which is unacceptable. Is it possible to time out one of the operations (or both of them) in this case?

-Val
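Why key order matters can be shown with a toy model. The sketch below is not Ignite code: it assumes each key has its own monitor (Yakov's reply below describes synchronized sections on cache entries) and that putAll locks entries in the iteration order of the map it is given. If one caller locks "a" then "b" while another locks "b" then "a", each can end up holding one monitor while waiting for the other. Passing a sorted map such as TreeMap makes every caller lock in the same order, so no wait cycle can form. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Toy model, not Ignite code: every key has its own monitor, like a cache entry. */
class OrderedPutAllSketch {
    private final ConcurrentHashMap<String, Object> entryLocks = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Integer> data = new ConcurrentHashMap<>();

    private Object lockFor(String key) {
        return entryLocks.computeIfAbsent(key, k -> new Object());
    }

    /** Locks all entries of the batch in the map's iteration order, then writes them. */
    void putAll(Map<String, Integer> batch) {
        lockAll(new ArrayList<>(batch.keySet()), 0, batch);
    }

    private void lockAll(List<String> keys, int i, Map<String, Integer> batch) {
        if (i == keys.size()) {
            data.putAll(batch); // every entry monitor of the batch is held here
            return;
        }
        synchronized (lockFor(keys.get(i))) {
            lockAll(keys, i + 1, batch);
        }
    }

    Integer get(String key) {
        return data.get(key);
    }

    /**
     * Two writers update "a" and "b" concurrently. Both pass a TreeMap, so both lock
     * "a" before "b" and can never wait on each other in a cycle. Returns true if
     * both finished within the timeout.
     */
    static boolean runTwoWriters(OrderedPutAllSketch cache, int iterations, long timeoutSec) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // a stuck daemon thread would not keep the JVM alive
            return t;
        });
        Callable<Void> w1 = () -> {
            for (int i = 0; i < iterations; i++) cache.putAll(new TreeMap<>(Map.of("a", i, "b", i)));
            return null;
        };
        Callable<Void> w2 = () -> {
            for (int i = 0; i < iterations; i++) cache.putAll(new TreeMap<>(Map.of("b", -i, "a", -i)));
            return null;
        };
        try {
            for (Future<Void> f : pool.invokeAll(List.of(w1, w2), timeoutSec, TimeUnit.SECONDS)) {
                if (f.isCancelled()) return false; // timed out: a writer got stuck
                f.get(); // rethrows any failure from the writer
            }
            return true;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        OrderedPutAllSketch cache = new OrderedPutAllSketch();
        System.out.println("completed: " + runTwoWriters(cache, 10_000, 10));
        System.out.println("a=" + cache.get("a") + ", b=" + cache.get("b"));
    }
}
```

Because each putAll holds both monitors while writing, the final values of "a" and "b" always come from the same call. Swapping the TreeMaps for LinkedHashMaps built in opposite orders can make the two writers block each other forever, which is why the demo avoids running that variant.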
Any thoughts?
-Val

On Wed, Jul 19, 2017 at 4:21 PM, Valentin Kulichenko <[hidden email]> wrote:
> [quoted original message trimmed]
Val, it seems you spotted an issue. Please file a ticket. I would suggest removing the exceptions entirely: as I understand it, timeout logic for atomic operations would bring additional overhead, while most of the time atomic operations are instant. From the timeout perspective, what distinguishes an atomic operation from a transaction is that you cannot predict when the user releases a lock acquired inside a transaction, whereas an atomic operation should have a predictable timeout.

As for your example: currently it will lead to a Java-level deadlock on the synchronized sections for the cache entries (but once we move to pure thread-per-partition for atomic caches, this will no longer be an issue: https://issues.apache.org/jira/browse/IGNITE-4506). I would suggest we file a ticket to implement detection of Java-level deadlocks and allow the user to configure a policy that takes appropriate action on a deadlock wherever it happens: https://issues.apache.org/jira/browse/IGNITE-5811

Any other hang of an atomic operation seems to be caused by issues in Ignite's internal machinery: either a hung exchange or problems with message processing on some node (e.g. all threads are busy and/or deadlocked), which again should result in notifying the user and stopping the node (by default).

--Yakov
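A rough idea of why thread-per-partition removes the entry-lock deadlock: if every update for a given partition is applied by one dedicated thread, no thread ever holds one entry while waiting for another. The sketch below is a toy model under assumptions of my own (a fixed PARTS count, key.hashCode() for partitioning, one single-thread executor per partition); it is not the design in IGNITE-4506. A putAll is split into per-partition sub-batches, so two calls with opposite key orders simply queue behind each other.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Toy model of thread-per-partition updates; PARTS and the hashing are assumptions. */
class PartitionStripedSketch {
    static final int PARTS = 4;
    private final ExecutorService[] stripes = new ExecutorService[PARTS];
    private final List<Map<String, Integer>> partitions = new ArrayList<>();

    PartitionStripedSketch() {
        for (int p = 0; p < PARTS; p++) {
            stripes[p] = Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });
            partitions.add(new HashMap<>()); // only ever touched by stripe p's thread
        }
    }

    static int partition(String key) {
        return Math.floorMod(key.hashCode(), PARTS);
    }

    /** Splits the batch by partition and hands each sub-batch to that partition's thread. */
    CompletableFuture<Void> putAll(Map<String, Integer> batch) {
        Map<Integer, Map<String, Integer>> byPart = new HashMap<>();
        batch.forEach((k, v) -> byPart.computeIfAbsent(partition(k), p -> new HashMap<>()).put(k, v));
        List<CompletableFuture<Void>> parts = new ArrayList<>();
        byPart.forEach((p, sub) ->
            parts.add(CompletableFuture.runAsync(() -> partitions.get(p).putAll(sub), stripes[p])));
        return CompletableFuture.allOf(parts.toArray(new CompletableFuture[0]));
    }

    /** Reads on the owning stripe thread, so no locks are needed for visibility. */
    Integer get(String key) throws Exception {
        int p = partition(key);
        return CompletableFuture.supplyAsync(() -> partitions.get(p).get(key), stripes[p]).get();
    }

    public static void main(String[] args) throws Exception {
        PartitionStripedSketch cache = new PartitionStripedSketch();
        Map<String, Integer> ab = new LinkedHashMap<>();
        ab.put("a", 1);
        ab.put("b", 1);
        Map<String, Integer> ba = new LinkedHashMap<>();
        ba.put("b", 2);
        ba.put("a", 2);
        // Opposite key orders, yet no call ever holds one entry while waiting for another.
        CompletableFuture.allOf(cache.putAll(ab), cache.putAll(ba)).get(5, TimeUnit.SECONDS);
        System.out.println("a=" + cache.get("a") + ", b=" + cache.get("b"));
    }
}
```

The trade-off in this toy model is that a multi-key putAll is only atomic per partition: "a" and "b" land in different partitions here and are applied independently.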
Yakov,
Thanks for the response. I definitely like the idea of detecting Java-level deadlocks. As for hangs caused by Ignite internal problems, do we have a ticket for this as well? Do you have any idea how this should be implemented?

-Val

On Mon, Jul 24, 2017 at 3:55 AM, Yakov Zhdanov <[hidden email]> wrote:
> [quoted reply trimmed]
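For the Java-level deadlock detection proposed in IGNITE-5811, the JDK already exposes the core primitive: ThreadMXBean.findDeadlockedThreads() returns the IDs of threads stuck in a cycle on object monitors or java.util.concurrent ownable synchronizers, or null if there is none. The sketch below polls it and hands the result to a pluggable reaction. DeadlockPolicy, runDetection and the polling interval are hypothetical names and choices for illustration, not Ignite API; a real policy might log thread dumps, notify the user, or stop the node.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.function.Consumer;

/** Sketch of Java-level deadlock detection with a pluggable reaction (hypothetical API). */
class DeadlockDetectorSketch {
    /** Hypothetical policy hook: what to do once deadlocked threads are found. */
    interface DeadlockPolicy extends Consumer<ThreadInfo[]> {}

    /** Info on threads deadlocked on monitors or j.u.c. locks; empty if there are none. */
    static ThreadInfo[] detect() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock exists
        return ids == null ? new ThreadInfo[0] : mx.getThreadInfo(ids);
    }

    /** Polls detect() until a deadlock shows up or the timeout expires; returns the thread count. */
    static int runDetection(long timeoutMs, DeadlockPolicy policy) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            ThreadInfo[] infos = detect();
            if (infos.length > 0) {
                policy.accept(infos);
                return infos.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    /** Starts two daemon threads that take the same two monitors in opposite order. */
    static void provokeDeadlock() {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        startLocker("locker-1", a, b, bothHoldFirst);
        startLocker("locker-2", b, a, bothHoldFirst);
    }

    private static void startLocker(String name, Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await(); // make sure the other thread holds its first monitor too
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    System.out.println("unreachable: " + Thread.currentThread().getName());
                }
            }
        }, name);
        t.setDaemon(true); // lets the JVM exit despite the deadlock
        t.start();
    }

    public static void main(String[] args) throws InterruptedException {
        provokeDeadlock();
        int n = runDetection(5_000, infos -> {
            for (ThreadInfo ti : infos)
                System.out.println(ti.getThreadName() + " blocked on " + ti.getLockName()
                        + " held by " + ti.getLockOwnerName());
        });
        System.out.println("deadlocked threads: " + n);
    }
}
```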
Val, I think this should be something similar to deadlock detection, but with a different condition.

--Yakov
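One possible reading of "similar to deadlock detection, but a different condition": instead of looking for a lock cycle, track when internal operations (an exchange, a message handler) start and finish, and flag any that have been in flight longer than a threshold. The sketch below is hypothetical throughout (class name, begin/end/findHung, the 30-second threshold) and is not how IGNITE-5811 specifies it. findHung takes the current time as a parameter so the check is deterministic; in practice a ScheduledExecutorService could call it periodically and apply the configured policy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical watchdog: flags tracked operations that have been in flight too long. */
class HangWatchdogSketch {
    private static final class Op {
        final String name;
        final long startNanos;

        Op(String name, long startNanos) {
            this.name = name;
            this.startNanos = startNanos;
        }
    }

    private final AtomicLong ids = new AtomicLong();
    private final ConcurrentHashMap<Long, Op> inFlight = new ConcurrentHashMap<>();
    private final long thresholdNanos;

    HangWatchdogSketch(long threshold, TimeUnit unit) {
        this.thresholdNanos = unit.toNanos(threshold);
    }

    /** Call when an operation (e.g. an exchange or a message handler) starts. */
    long begin(String name) {
        long id = ids.incrementAndGet();
        inFlight.put(id, new Op(name, System.nanoTime()));
        return id;
    }

    /** Call when it finishes, normally or exceptionally. */
    void end(long id) {
        inFlight.remove(id);
    }

    /** Names of operations older than the threshold at time nowNanos, sorted. */
    List<String> findHung(long nowNanos) {
        List<String> hung = new ArrayList<>();
        for (Op op : inFlight.values())
            if (nowNanos - op.startNanos > thresholdNanos)
                hung.add(op.name);
        Collections.sort(hung);
        return hung;
    }

    public static void main(String[] args) {
        HangWatchdogSketch dog = new HangWatchdogSketch(30, TimeUnit.SECONDS);
        dog.begin("exchange");          // never ends: simulates a hung exchange
        long put = dog.begin("putAll");
        dog.end(put);                   // completes normally
        // Pretend the periodic check runs 31 seconds from now.
        long later = System.nanoTime() + TimeUnit.SECONDS.toNanos(31);
        System.out.println("hung now: " + dog.findHung(System.nanoTime()));
        System.out.println("hung later: " + dog.findHung(later));
    }
}
```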
Here is the newbie ticket for removing the exceptions: https://issues.apache.org/jira/browse/IGNITE-5823

--Yakov
Guys, I have edited https://issues.apache.org/jira/browse/IGNITE-5811 and extended it a bit. Comments are welcome!

--Yakov