Igniters,
I've spent a couple of days in debug mode and found that some of the configured ExecutorServices of an Ignite instance completely hide assertion errors without any logging and node killing. This may violate some internal guarantees and hide serious errors. Let's consider, for instance, GridDhtPartitionsExchangeFuture [1]. It has three places of submitting some Runnable on system executor service. If an assertion error (or even any uncached exception) occurs in the code block below it will be swallowed without logging, exchange future completion or node stoping. cctx.kernalContext().getSystemExecutorService().submit(new Runnable() { @Override public void run() { sendPartitions(newCrd); } }); I've checked that these executor services and most of them configured to catch only OutOfMemoryError. Should we consider catching AssertionErrors as well and treat them as CRITICAL_ERRORS for the Failure Handler? Should we log uncached errors on each of them? The most important list of executor services configured with OOM handler only: execSvc svcExecSvc sysExecSvc p2pExecSvc restExecSvc utilityCacheExecSvc affExecSvc qryExecSvc [1] https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java#L4848 |
Hello, Maxim.
I can confirm this itching issue. It also happens when some custom Security plugin throws an exception while processing a communication message. ``` UUID newSecSubjId = secSubjId != null ? secSubjId : nodeId; try (OperationSecurityContext s = ctx.security().withContext(newSecSubjId)) { lsnr.onMessage(nodeId, msg, plc); } finally { if (change) CUR_PLC.set(oldPlc); } ``` If an exception thrown from `withContext` it is hidden from the user. > 4 мая 2020 г., в 19:15, Maxim Muzafarov <[hidden email]> написал(а): > > Igniters, > > > I've spent a couple of days in debug mode and found that some of the > configured ExecutorServices of an Ignite instance completely hide > assertion errors without any logging and node killing. This may > violate some internal guarantees and hide serious errors. > > Let's consider, for instance, GridDhtPartitionsExchangeFuture [1]. It > has three places of submitting some Runnable on system executor > service. If an assertion error (or even any uncached exception) occurs > in the code block below it will be swallowed without logging, exchange > future completion or node stoping. > > cctx.kernalContext().getSystemExecutorService().submit(new Runnable() { > @Override public void run() { > sendPartitions(newCrd); > } > }); > > I've checked that these executor services and most of them configured > to catch only OutOfMemoryError. > > Should we consider catching AssertionErrors as well and treat them as > CRITICAL_ERRORS for the Failure Handler? > Should we log uncached errors on each of them? > > > The most important list of executor services configured with OOM handler only: > execSvc > svcExecSvc > sysExecSvc > p2pExecSvc > restExecSvc > utilityCacheExecSvc > affExecSvc > qryExecSvc > > [1] https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java#L4848 |
Folks,
I've created an issue. https://issues.apache.org/jira/browse/IGNITE-13055 On Wed, 6 May 2020 at 10:28, Nikolay Izhikov <[hidden email]> wrote: > > Hello, Maxim. > > I can confirm this itching issue. > It also happens when some custom Security plugin throws an exception while processing a communication message. > > ``` > UUID newSecSubjId = secSubjId != null ? secSubjId : nodeId; > > try (OperationSecurityContext s = ctx.security().withContext(newSecSubjId)) { > lsnr.onMessage(nodeId, msg, plc); > } > finally { > if (change) > CUR_PLC.set(oldPlc); > } > ``` > > If an exception thrown from `withContext` it is hidden from the user. > > > 4 мая 2020 г., в 19:15, Maxim Muzafarov <[hidden email]> написал(а): > > > > Igniters, > > > > > > I've spent a couple of days in debug mode and found that some of the > > configured ExecutorServices of an Ignite instance completely hide > > assertion errors without any logging and node killing. This may > > violate some internal guarantees and hide serious errors. > > > > Let's consider, for instance, GridDhtPartitionsExchangeFuture [1]. It > > has three places of submitting some Runnable on system executor > > service. If an assertion error (or even any uncached exception) occurs > > in the code block below it will be swallowed without logging, exchange > > future completion or node stoping. > > > > cctx.kernalContext().getSystemExecutorService().submit(new Runnable() { > > @Override public void run() { > > sendPartitions(newCrd); > > } > > }); > > > > I've checked that these executor services and most of them configured > > to catch only OutOfMemoryError. > > > > Should we consider catching AssertionErrors as well and treat them as > > CRITICAL_ERRORS for the Failure Handler? > > Should we log uncached errors on each of them? > > > > > > The most important list of executor services configured with OOM handler only: > > execSvc > > svcExecSvc > > sysExecSvc > > p2pExecSvc > > restExecSvc > > utilityCacheExecSvc > > affExecSvc > > qryExecSvc > > > > [1] https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java#L4848 > |
Hi Maxim,
I recently had similar troubles related to ExecutorService and exception handling. It was an unexpected behavior for me that exceptions of any kind* thrown from Runnable in a following code: executor.submit(() -> throw new RuntimeException()) were silently ignored. I seemed to me really unexpected for me at first. But later on I figured out the cause. There is a subtle detail which I forgot. ExecutorService.submit returns a Future and leaves a responsibility to handle exceptions of any kind to a caller of Future.get. And it was really subtle for me, because I did not use a returned Future at all. Actually I supposed ExecutorService.execute when used ExecutorService.submit. Is it related to the subject? Can ExecutorService.execute help here? * In my case there were exceptions caused by missing classes on a classpath. 2020-05-21 22:22 GMT+03:00, Maxim Muzafarov <[hidden email]>: > Folks, > > I've created an issue. > https://issues.apache.org/jira/browse/IGNITE-13055 > > On Wed, 6 May 2020 at 10:28, Nikolay Izhikov <[hidden email]> wrote: >> >> Hello, Maxim. >> >> I can confirm this itching issue. >> It also happens when some custom Security plugin throws an exception while >> processing a communication message. >> >> ``` >> UUID newSecSubjId = secSubjId != null ? secSubjId : nodeId; >> >> try (OperationSecurityContext s = >> ctx.security().withContext(newSecSubjId)) { >> lsnr.onMessage(nodeId, msg, plc); >> } >> finally { >> if (change) >> CUR_PLC.set(oldPlc); >> } >> ``` >> >> If an exception thrown from `withContext` it is hidden from the user. >> >> > 4 мая 2020 г., в 19:15, Maxim Muzafarov <[hidden email]> написал(а): >> > >> > Igniters, >> > >> > >> > I've spent a couple of days in debug mode and found that some of the >> > configured ExecutorServices of an Ignite instance completely hide >> > assertion errors without any logging and node killing. This may >> > violate some internal guarantees and hide serious errors. >> > >> > Let's consider, for instance, GridDhtPartitionsExchangeFuture [1]. It >> > has three places of submitting some Runnable on system executor >> > service. If an assertion error (or even any uncached exception) occurs >> > in the code block below it will be swallowed without logging, exchange >> > future completion or node stoping. >> > >> > cctx.kernalContext().getSystemExecutorService().submit(new Runnable() { >> > @Override public void run() { >> > sendPartitions(newCrd); >> > } >> > }); >> > >> > I've checked that these executor services and most of them configured >> > to catch only OutOfMemoryError. >> > >> > Should we consider catching AssertionErrors as well and treat them as >> > CRITICAL_ERRORS for the Failure Handler? >> > Should we log uncached errors on each of them? >> > >> > >> > The most important list of executor services configured with OOM handler >> > only: >> > execSvc >> > svcExecSvc >> > sysExecSvc >> > p2pExecSvc >> > restExecSvc >> > utilityCacheExecSvc >> > affExecSvc >> > qryExecSvc >> > >> > [1] >> > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java#L4848 >> > -- Best regards, Ivan Pavlukhin |
Ivan,
Not sure that I've got your point. Let me give some example: IgniteEx ignite = startGrid(); ignite.context().getSystemExecutorService().execute(() -> { assert false; }); ignite.context().getSystemExecutorService().submit(() -> { assert false; }); In both of these cases there are no exceptions logged and the node not stopped (I assume that failing assertion means a code contract is violated). Of course, I'll get an assertion error if fut.get() will be called but there a lot of cases where it's not. So, I think we must log errors here or fail the node. On Fri, 22 May 2020 at 10:51, Ivan Pavlukhin <[hidden email]> wrote: > > Hi Maxim, > > I recently had similar troubles related to ExecutorService and > exception handling. It was an unexpected behavior for me that > exceptions of any kind* thrown from Runnable in a following code: > executor.submit(() -> throw new RuntimeException()) > were silently ignored. > > I seemed to me really unexpected for me at first. But later on I > figured out the cause. There is a subtle detail which I forgot. > ExecutorService.submit returns a Future and leaves a responsibility to > handle exceptions of any kind to a caller of Future.get. And it was > really subtle for me, because I did not use a returned Future at all. > Actually I supposed ExecutorService.execute when used > ExecutorService.submit. > > Is it related to the subject? Can ExecutorService.execute help here? > > * In my case there were exceptions caused by missing classes on a classpath. > > 2020-05-21 22:22 GMT+03:00, Maxim Muzafarov <[hidden email]>: > > Folks, > > > > I've created an issue. > > https://issues.apache.org/jira/browse/IGNITE-13055 > > > > On Wed, 6 May 2020 at 10:28, Nikolay Izhikov <[hidden email]> wrote: > >> > >> Hello, Maxim. > >> > >> I can confirm this itching issue. > >> It also happens when some custom Security plugin throws an exception while > >> processing a communication message. > >> > >> ``` > >> UUID newSecSubjId = secSubjId != null ? secSubjId : nodeId; > >> > >> try (OperationSecurityContext s = > >> ctx.security().withContext(newSecSubjId)) { > >> lsnr.onMessage(nodeId, msg, plc); > >> } > >> finally { > >> if (change) > >> CUR_PLC.set(oldPlc); > >> } > >> ``` > >> > >> If an exception thrown from `withContext` it is hidden from the user. > >> > >> > 4 мая 2020 г., в 19:15, Maxim Muzafarov <[hidden email]> написал(а): > >> > > >> > Igniters, > >> > > >> > > >> > I've spent a couple of days in debug mode and found that some of the > >> > configured ExecutorServices of an Ignite instance completely hide > >> > assertion errors without any logging and node killing. This may > >> > violate some internal guarantees and hide serious errors. > >> > > >> > Let's consider, for instance, GridDhtPartitionsExchangeFuture [1]. It > >> > has three places of submitting some Runnable on system executor > >> > service. If an assertion error (or even any uncached exception) occurs > >> > in the code block below it will be swallowed without logging, exchange > >> > future completion or node stoping. > >> > > >> > cctx.kernalContext().getSystemExecutorService().submit(new Runnable() { > >> > @Override public void run() { > >> > sendPartitions(newCrd); > >> > } > >> > }); > >> > > >> > I've checked that these executor services and most of them configured > >> > to catch only OutOfMemoryError. > >> > > >> > Should we consider catching AssertionErrors as well and treat them as > >> > CRITICAL_ERRORS for the Failure Handler? > >> > Should we log uncached errors on each of them? > >> > > >> > > >> > The most important list of executor services configured with OOM handler > >> > only: > >> > execSvc > >> > svcExecSvc > >> > sysExecSvc > >> > p2pExecSvc > >> > restExecSvc > >> > utilityCacheExecSvc > >> > affExecSvc > >> > qryExecSvc > >> > > >> > [1] > >> > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java#L4848 > >> > > > > > -- > > Best regards, > Ivan Pavlukhin |
Maxim,
I looked into the code and checked how do we setup uncaught exception handlers for internal executors. Indeed it looks not good (OOME leads to failure handling, other exceptions are silently ignored). Logging sounds good. Have doubts about failure handling (e.g. terminating a node). I remember at least one related discussion [1]. And here are some statements which causes my doubts: 1. Not only assertion errors are a programming errors evidence, e.g. a simple NPE often signals the same. 2. Assertions might be disabled in production. 3. Sometimes it seems that assertions are often overused in the Ignite code. Intermittent concurrency-related assertion errors are not so uncommon. And it seems impossible to figure out how critical a particular error is. [1] http://apache-ignite-developers.2346864.n4.nabble.com/Exceptions-thrown-in-IndexingSpi-and-quot-fail-fast-quot-principle-td38904.html 2020-05-24 12:13 GMT+03:00, Maxim Muzafarov <[hidden email]>: > Ivan, > > Not sure that I've got your point. Let me give some example: > > IgniteEx ignite = startGrid(); > ignite.context().getSystemExecutorService().execute(() -> { assert false; > }); > ignite.context().getSystemExecutorService().submit(() -> { assert false; > }); > > In both of these cases there are no exceptions logged and the node not > stopped (I assume that failing assertion means a code contract is > violated). Of course, I'll get an assertion error if fut.get() will be > called but there a lot of cases where it's not. So, I think we must > log errors here or fail the node. > > On Fri, 22 May 2020 at 10:51, Ivan Pavlukhin <[hidden email]> wrote: >> >> Hi Maxim, >> >> I recently had similar troubles related to ExecutorService and >> exception handling. It was an unexpected behavior for me that >> exceptions of any kind* thrown from Runnable in a following code: >> executor.submit(() -> throw new RuntimeException()) >> were silently ignored. >> >> I seemed to me really unexpected for me at first. But later on I >> figured out the cause. There is a subtle detail which I forgot. >> ExecutorService.submit returns a Future and leaves a responsibility to >> handle exceptions of any kind to a caller of Future.get. And it was >> really subtle for me, because I did not use a returned Future at all. >> Actually I supposed ExecutorService.execute when used >> ExecutorService.submit. >> >> Is it related to the subject? Can ExecutorService.execute help here? >> >> * In my case there were exceptions caused by missing classes on a >> classpath. >> >> 2020-05-21 22:22 GMT+03:00, Maxim Muzafarov <[hidden email]>: >> > Folks, >> > >> > I've created an issue. >> > https://issues.apache.org/jira/browse/IGNITE-13055 >> > >> > On Wed, 6 May 2020 at 10:28, Nikolay Izhikov <[hidden email]> >> > wrote: >> >> >> >> Hello, Maxim. >> >> >> >> I can confirm this itching issue. >> >> It also happens when some custom Security plugin throws an exception >> >> while >> >> processing a communication message. >> >> >> >> ``` >> >> UUID newSecSubjId = secSubjId != null ? secSubjId : nodeId; >> >> >> >> try (OperationSecurityContext s = >> >> ctx.security().withContext(newSecSubjId)) { >> >> lsnr.onMessage(nodeId, msg, plc); >> >> } >> >> finally { >> >> if (change) >> >> CUR_PLC.set(oldPlc); >> >> } >> >> ``` >> >> >> >> If an exception thrown from `withContext` it is hidden from the user. >> >> >> >> > 4 мая 2020 г., в 19:15, Maxim Muzafarov <[hidden email]> >> >> > написал(а): >> >> > >> >> > Igniters, >> >> > >> >> > >> >> > I've spent a couple of days in debug mode and found that some of the >> >> > configured ExecutorServices of an Ignite instance completely hide >> >> > assertion errors without any logging and node killing. This may >> >> > violate some internal guarantees and hide serious errors. >> >> > >> >> > Let's consider, for instance, GridDhtPartitionsExchangeFuture [1]. >> >> > It >> >> > has three places of submitting some Runnable on system executor >> >> > service. If an assertion error (or even any uncached exception) >> >> > occurs >> >> > in the code block below it will be swallowed without logging, >> >> > exchange >> >> > future completion or node stoping. >> >> > >> >> > cctx.kernalContext().getSystemExecutorService().submit(new Runnable() >> >> > { >> >> > @Override public void run() { >> >> > sendPartitions(newCrd); >> >> > } >> >> > }); >> >> > >> >> > I've checked that these executor services and most of them >> >> > configured >> >> > to catch only OutOfMemoryError. >> >> > >> >> > Should we consider catching AssertionErrors as well and treat them >> >> > as >> >> > CRITICAL_ERRORS for the Failure Handler? >> >> > Should we log uncached errors on each of them? >> >> > >> >> > >> >> > The most important list of executor services configured with OOM >> >> > handler >> >> > only: >> >> > execSvc >> >> > svcExecSvc >> >> > sysExecSvc >> >> > p2pExecSvc >> >> > restExecSvc >> >> > utilityCacheExecSvc >> >> > affExecSvc >> >> > qryExecSvc >> >> > >> >> > [1] >> >> > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java#L4848 >> >> >> > >> >> >> -- >> >> Best regards, >> Ivan Pavlukhin > -- Best regards, Ivan Pavlukhin |
Ivan,
> 2. Assertions might be disabled in production. Assertions must be disabled in production :-) Here is another side of the same coin. A contributor who makes a new patch must be able to see failed tests immediately, he must not check all the logs with logged assertions. The assertion is an error (not to be confused with an exception), all the errors must fail the node. If we not enabling catching assertions for ExecutorServices this patch for all the cases, at least it must be done for all tests running the TeamCity environment. But I think we should enable fail node if an error occurs (assume assertions disabled in production). On Mon, 25 May 2020 at 12:36, Ivan Pavlukhin <[hidden email]> wrote: > > Maxim, > > I looked into the code and checked how do we setup uncaught exception > handlers for internal executors. Indeed it looks not good (OOME leads > to failure handling, other exceptions are silently ignored). Logging > sounds good. > > Have doubts about failure handling (e.g. terminating a node). I > remember at least one related discussion [1]. And here are some > statements which causes my doubts: > 1. Not only assertion errors are a programming errors evidence, e.g. a > simple NPE often signals the same. > 2. Assertions might be disabled in production. > 3. Sometimes it seems that assertions are often overused in the Ignite > code. Intermittent concurrency-related assertion errors are not so > uncommon. And it seems impossible to figure out how critical a > particular error is. > > [1] http://apache-ignite-developers.2346864.n4.nabble.com/Exceptions-thrown-in-IndexingSpi-and-quot-fail-fast-quot-principle-td38904.html > > 2020-05-24 12:13 GMT+03:00, Maxim Muzafarov <[hidden email]>: > > Ivan, > > > > Not sure that I've got your point. Let me give some example: > > > > IgniteEx ignite = startGrid(); > > ignite.context().getSystemExecutorService().execute(() -> { assert false; > > }); > > ignite.context().getSystemExecutorService().submit(() -> { assert false; > > }); > > > > In both of these cases there are no exceptions logged and the node not > > stopped (I assume that failing assertion means a code contract is > > violated). Of course, I'll get an assertion error if fut.get() will be > > called but there a lot of cases where it's not. So, I think we must > > log errors here or fail the node. > > > > On Fri, 22 May 2020 at 10:51, Ivan Pavlukhin <[hidden email]> wrote: > >> > >> Hi Maxim, > >> > >> I recently had similar troubles related to ExecutorService and > >> exception handling. It was an unexpected behavior for me that > >> exceptions of any kind* thrown from Runnable in a following code: > >> executor.submit(() -> throw new RuntimeException()) > >> were silently ignored. > >> > >> I seemed to me really unexpected for me at first. But later on I > >> figured out the cause. There is a subtle detail which I forgot. > >> ExecutorService.submit returns a Future and leaves a responsibility to > >> handle exceptions of any kind to a caller of Future.get. And it was > >> really subtle for me, because I did not use a returned Future at all. > >> Actually I supposed ExecutorService.execute when used > >> ExecutorService.submit. > >> > >> Is it related to the subject? Can ExecutorService.execute help here? > >> > >> * In my case there were exceptions caused by missing classes on a > >> classpath. > >> > >> 2020-05-21 22:22 GMT+03:00, Maxim Muzafarov <[hidden email]>: > >> > Folks, > >> > > >> > I've created an issue. > >> > https://issues.apache.org/jira/browse/IGNITE-13055 > >> > > >> > On Wed, 6 May 2020 at 10:28, Nikolay Izhikov <[hidden email]> > >> > wrote: > >> >> > >> >> Hello, Maxim. > >> >> > >> >> I can confirm this itching issue. > >> >> It also happens when some custom Security plugin throws an exception > >> >> while > >> >> processing a communication message. > >> >> > >> >> ``` > >> >> UUID newSecSubjId = secSubjId != null ? secSubjId : nodeId; > >> >> > >> >> try (OperationSecurityContext s = > >> >> ctx.security().withContext(newSecSubjId)) { > >> >> lsnr.onMessage(nodeId, msg, plc); > >> >> } > >> >> finally { > >> >> if (change) > >> >> CUR_PLC.set(oldPlc); > >> >> } > >> >> ``` > >> >> > >> >> If an exception thrown from `withContext` it is hidden from the user. > >> >> > >> >> > 4 мая 2020 г., в 19:15, Maxim Muzafarov <[hidden email]> > >> >> > написал(а): > >> >> > > >> >> > Igniters, > >> >> > > >> >> > > >> >> > I've spent a couple of days in debug mode and found that some of the > >> >> > configured ExecutorServices of an Ignite instance completely hide > >> >> > assertion errors without any logging and node killing. This may > >> >> > violate some internal guarantees and hide serious errors. > >> >> > > >> >> > Let's consider, for instance, GridDhtPartitionsExchangeFuture [1]. > >> >> > It > >> >> > has three places of submitting some Runnable on system executor > >> >> > service. If an assertion error (or even any uncached exception) > >> >> > occurs > >> >> > in the code block below it will be swallowed without logging, > >> >> > exchange > >> >> > future completion or node stoping. > >> >> > > >> >> > cctx.kernalContext().getSystemExecutorService().submit(new Runnable() > >> >> > { > >> >> > @Override public void run() { > >> >> > sendPartitions(newCrd); > >> >> > } > >> >> > }); > >> >> > > >> >> > I've checked that these executor services and most of them > >> >> > configured > >> >> > to catch only OutOfMemoryError. > >> >> > > >> >> > Should we consider catching AssertionErrors as well and treat them > >> >> > as > >> >> > CRITICAL_ERRORS for the Failure Handler? > >> >> > Should we log uncached errors on each of them? > >> >> > > >> >> > > >> >> > The most important list of executor services configured with OOM > >> >> > handler > >> >> > only: > >> >> > execSvc > >> >> > svcExecSvc > >> >> > sysExecSvc > >> >> > p2pExecSvc > >> >> > restExecSvc > >> >> > utilityCacheExecSvc > >> >> > affExecSvc > >> >> > qryExecSvc > >> >> > > >> >> > [1] > >> >> > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java#L4848 > >> >> > >> > > >> > >> > >> -- > >> > >> Best regards, > >> Ivan Pavlukhin > > > > > -- > > Best regards, > Ivan Pavlukhin |
Maxim,
> Assertions must be disabled in production :-) Since production deployments are out of our control we should think about ones with enabled assertions as well (and they do occur in practice). > A contributor who makes a new patch must be able to see failed tests immediately, he must not check all the logs with logged assertions. The assertion is an error (not to be confused with an exception), all the errors must fail the node. I am not sure how practical it is today. I might have missed some recent activities but I recall an effort to enable a node terminating failure handler in tests. AFAIR, it was not successful and had lead to troubles with TC build results interpretation. 2020-05-25 14:27 GMT+03:00, Maxim Muzafarov <[hidden email]>: > Ivan, > >> 2. Assertions might be disabled in production. > Assertions must be disabled in production :-) > > Here is another side of the same coin. > A contributor who makes a new patch must be able to see failed tests > immediately, he must not check all the logs with logged assertions. > The assertion is an error (not to be confused with an exception), all > the errors must fail the node. > > If we not enabling catching assertions for ExecutorServices this patch > for all the cases, at least it must be done for all tests running the > TeamCity environment. But I think we should enable fail node if an > error occurs (assume assertions disabled in production). > > On Mon, 25 May 2020 at 12:36, Ivan Pavlukhin <[hidden email]> wrote: >> >> Maxim, >> >> I looked into the code and checked how do we setup uncaught exception >> handlers for internal executors. Indeed it looks not good (OOME leads >> to failure handling, other exceptions are silently ignored). Logging >> sounds good. >> >> Have doubts about failure handling (e.g. terminating a node). I >> remember at least one related discussion [1]. And here are some >> statements which causes my doubts: >> 1. Not only assertion errors are a programming errors evidence, e.g. a >> simple NPE often signals the same. >> 2. Assertions might be disabled in production. >> 3. Sometimes it seems that assertions are often overused in the Ignite >> code. Intermittent concurrency-related assertion errors are not so >> uncommon. And it seems impossible to figure out how critical a >> particular error is. >> >> [1] >> http://apache-ignite-developers.2346864.n4.nabble.com/Exceptions-thrown-in-IndexingSpi-and-quot-fail-fast-quot-principle-td38904.html >> >> 2020-05-24 12:13 GMT+03:00, Maxim Muzafarov <[hidden email]>: >> > Ivan, >> > >> > Not sure that I've got your point. Let me give some example: >> > >> > IgniteEx ignite = startGrid(); >> > ignite.context().getSystemExecutorService().execute(() -> { assert >> > false; >> > }); >> > ignite.context().getSystemExecutorService().submit(() -> { assert >> > false; >> > }); >> > >> > In both of these cases there are no exceptions logged and the node not >> > stopped (I assume that failing assertion means a code contract is >> > violated). Of course, I'll get an assertion error if fut.get() will be >> > called but there a lot of cases where it's not. So, I think we must >> > log errors here or fail the node. >> > >> > On Fri, 22 May 2020 at 10:51, Ivan Pavlukhin <[hidden email]> >> > wrote: >> >> >> >> Hi Maxim, >> >> >> >> I recently had similar troubles related to ExecutorService and >> >> exception handling. It was an unexpected behavior for me that >> >> exceptions of any kind* thrown from Runnable in a following code: >> >> executor.submit(() -> throw new RuntimeException()) >> >> were silently ignored. >> >> >> >> I seemed to me really unexpected for me at first. But later on I >> >> figured out the cause. There is a subtle detail which I forgot. >> >> ExecutorService.submit returns a Future and leaves a responsibility to >> >> handle exceptions of any kind to a caller of Future.get. And it was >> >> really subtle for me, because I did not use a returned Future at all. >> >> Actually I supposed ExecutorService.execute when used >> >> ExecutorService.submit. >> >> >> >> Is it related to the subject? Can ExecutorService.execute help here? >> >> >> >> * In my case there were exceptions caused by missing classes on a >> >> classpath. >> >> >> >> 2020-05-21 22:22 GMT+03:00, Maxim Muzafarov <[hidden email]>: >> >> > Folks, >> >> > >> >> > I've created an issue. >> >> > https://issues.apache.org/jira/browse/IGNITE-13055 >> >> > >> >> > On Wed, 6 May 2020 at 10:28, Nikolay Izhikov <[hidden email]> >> >> > wrote: >> >> >> >> >> >> Hello, Maxim. >> >> >> >> >> >> I can confirm this itching issue. >> >> >> It also happens when some custom Security plugin throws an >> >> >> exception >> >> >> while >> >> >> processing a communication message. >> >> >> >> >> >> ``` >> >> >> UUID newSecSubjId = secSubjId != null ? secSubjId : nodeId; >> >> >> >> >> >> try (OperationSecurityContext s = >> >> >> ctx.security().withContext(newSecSubjId)) { >> >> >> lsnr.onMessage(nodeId, msg, plc); >> >> >> } >> >> >> finally { >> >> >> if (change) >> >> >> CUR_PLC.set(oldPlc); >> >> >> } >> >> >> ``` >> >> >> >> >> >> If an exception thrown from `withContext` it is hidden from the >> >> >> user. >> >> >> >> >> >> > 4 мая 2020 г., в 19:15, Maxim Muzafarov <[hidden email]> >> >> >> > написал(а): >> >> >> > >> >> >> > Igniters, >> >> >> > >> >> >> > >> >> >> > I've spent a couple of days in debug mode and found that some of >> >> >> > the >> >> >> > configured ExecutorServices of an Ignite instance completely hide >> >> >> > assertion errors without any logging and node killing. This may >> >> >> > violate some internal guarantees and hide serious errors. >> >> >> > >> >> >> > Let's consider, for instance, GridDhtPartitionsExchangeFuture >> >> >> > [1]. >> >> >> > It >> >> >> > has three places of submitting some Runnable on system executor >> >> >> > service. If an assertion error (or even any uncached exception) >> >> >> > occurs >> >> >> > in the code block below it will be swallowed without logging, >> >> >> > exchange >> >> >> > future completion or node stoping. >> >> >> > >> >> >> > cctx.kernalContext().getSystemExecutorService().submit(new >> >> >> > Runnable() >> >> >> > { >> >> >> > @Override public void run() { >> >> >> > sendPartitions(newCrd); >> >> >> > } >> >> >> > }); >> >> >> > >> >> >> > I've checked that these executor services and most of them >> >> >> > configured >> >> >> > to catch only OutOfMemoryError. >> >> >> > >> >> >> > Should we consider catching AssertionErrors as well and treat >> >> >> > them >> >> >> > as >> >> >> > CRITICAL_ERRORS for the Failure Handler? >> >> >> > Should we log uncached errors on each of them? >> >> >> > >> >> >> > >> >> >> > The most important list of executor services configured with OOM >> >> >> > handler >> >> >> > only: >> >> >> > execSvc >> >> >> > svcExecSvc >> >> >> > sysExecSvc >> >> >> > p2pExecSvc >> >> >> > restExecSvc >> >> >> > utilityCacheExecSvc >> >> >> > affExecSvc >> >> >> > qryExecSvc >> >> >> > >> >> >> > [1] >> >> >> > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java#L4848 >> >> >> >> >> > >> >> >> >> >> >> -- >> >> >> >> Best regards, >> >> Ivan Pavlukhin >> > >> >> >> -- >> >> Best regards, >> Ivan Pavlukhin > -- Best regards, Ivan Pavlukhin |
Free forum by Nabble | Edit this page |