Hello, Igniters.
I've run into an issue with the critical system worker failure handler. I just run `IgniteDataFrameSuite` and it terminates on a random test. My laptop doesn't have bleeding-edge hardware, so the tests can take a significant amount of time. It looks like our watchdog is too aggressive for a development environment.

Can you please help me? What should I do to configure or turn off the watchdog? Should we relax it a little, at least for test environments?

The error message is the following:

```
[2018-12-27 11:40:23,597][ERROR][exchange-worker-#5547%grid-2%][root] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteCheckedException: Node is stopping: grid-2]]
class org.apache.ignite.IgniteCheckedException: Node is stopping: grid-2
```
Hi Nikolay,
This is the issue I mentioned in the "Critical worker threads liveness checking drawbacks" topic. I was expecting the fix to be included in Ignite 2.7, but it was not. To work around the issue, you should set DataStorageConfiguration#setCheckpointReadLockTimeout to 0.

Should we somehow announce it on the user list or highlight it on readme.io?
Alexey
Is the fix for this issue already in master? I ran the tests on the current master.

> Should we somehow announce it on the user-list or highlight on readme.io?

I don't think our users will be happy to be stuck with this behavior in production.

Do I understand you correctly: if someone uses the 2.7 release and the Ignite process slows down for a few seconds for any reason (low-end hardware, a VM pause, other processes grabbing resources), then the Ignite node will be stopped?

> This is the issue I mentioned in "Critical worker threads liveness checking drawbacks" topic

Thanks for the link, I will check it out.
Nikolay,
Yes, the fix is already in master. It looks like I was wrong: in your case the failure handler is triggered by 'Node is stopping: grid-2'. Can you please share the full trace?
Folks,
What are the current timeouts? We need to know the probability of failures in a dev environment. This affects usability.

--
Denis
Guys,
There is no problem with blocked-thread monitoring here. Please look at the error message: "failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteCheckedException: Node is stopping: grid-2]]". Some critical worker was terminated unexpectedly, so the problem isn't related to any timeouts. It's a bug that should be investigated.
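For anyone triaging a similar log line, the check described above (compare the reported failure type with the handler's ignored types before blaming the watchdog timeouts) can be sketched in plain Java. This is only an illustration: `FailureLogLine` is a hypothetical helper, not part of Ignite, and the regular expressions assume the log format shown in this thread. Treating SYSTEM_WORKER_BLOCKED and SYSTEM_CRITICAL_OPERATION_TIMEOUT as the timeout-related types is also an assumption based on their names.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not an Ignite class): reads the failure type out of a failure-handler log line. */
final class FailureLogLine {
    // Assumes the format seen in this thread: "ignoredFailureTypes=UnmodifiableSet [A, B]".
    private static final Pattern IGNORED =
        Pattern.compile("ignoredFailureTypes=\\w*\\s*\\[([^\\]]*)\\]");

    // Assumes the format seen in this thread: "failureCtx=FailureContext [type=X, ...".
    private static final Pattern TYPE =
        Pattern.compile("failureCtx=FailureContext \\[type=(\\w+)");

    final String type;
    final Set<String> ignoredTypes;

    private FailureLogLine(String type, Set<String> ignoredTypes) {
        this.type = type;
        this.ignoredTypes = ignoredTypes;
    }

    static FailureLogLine parse(String line) {
        Matcher t = TYPE.matcher(line);
        if (!t.find())
            throw new IllegalArgumentException("No failure context found in line");

        Set<String> ignored = new LinkedHashSet<>();
        Matcher i = IGNORED.matcher(line);
        if (i.find()) {
            for (String name : i.group(1).split(",")) {
                String trimmed = name.trim();
                if (!trimmed.isEmpty())
                    ignored.add(trimmed);
            }
        }
        return new FailureLogLine(t.group(1), ignored);
    }

    /** True if the handler in the log line is configured to ignore this failure type. */
    boolean isIgnoredByHandler() {
        return ignoredTypes.contains(type);
    }

    /** True for the two types that, by their names, come from timeout-based checks (an assumption). */
    boolean isTimeoutRelated() {
        return type.equals("SYSTEM_WORKER_BLOCKED")
            || type.equals("SYSTEM_CRITICAL_OPERATION_TIMEOUT");
    }

    public static void main(String[] args) {
        String line = "Critical system error detected. Will be handled accordingly to configured handler "
            + "[hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler "
            + "[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], "
            + "failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, "
            + "err=class o.a.i.IgniteCheckedException: Node is stopping: grid-2]]";

        FailureLogLine parsed = FailureLogLine.parse(line);
        System.out.println("type=" + parsed.type
            + ", ignoredByHandler=" + parsed.isIgnoredByHandler()
            + ", timeoutRelated=" + parsed.isTimeoutRelated());
    }
}
```

For the line quoted in this thread, the parsed type is SYSTEM_WORKER_TERMINATION, which is neither in the ignored set nor one of the two timeout-related types. That matches the reading that the timeouts are not what stopped the node.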