ExecutorServices hide assertions without logging and node stop

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

ExecutorServices hide assertions without logging and node stop

Maxim Muzafarov
Igniters,


I've spent a couple of days in debug mode and found that some of the
configured ExecutorServices of an Ignite instance completely hide
assertion errors without any logging and node killing. This may
violate some internal guarantees and hide serious errors.

Let's consider, for instance, GridDhtPartitionsExchangeFuture [1]. It
has three places of submitting some Runnable on system executor
service. If an assertion error (or even any uncached exception) occurs
in the code block below it will be swallowed without logging, exchange
future completion or node stoping.

cctx.kernalContext().getSystemExecutorService().submit(new Runnable() {
    @Override public void run() {
        sendPartitions(newCrd);
    }
});

I've checked that these executor services and most of them configured
to catch only OutOfMemoryError.

Should we consider catching AssertionErrors as well and treat them as
CRITICAL_ERRORS for the Failure Handler?
Should we log uncached errors on each of them?


The most important list of executor services configured with OOM handler only:
execSvc
svcExecSvc
sysExecSvc
p2pExecSvc
restExecSvc
utilityCacheExecSvc
affExecSvc
qryExecSvc

[1] https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java#L4848
Reply | Threaded
Open this post in threaded view
|

Re: ExecutorServices hide assertions without logging and node stop

Nikolay Izhikov-2
Hello, Maxim.

I can confirm this itching issue.
It also happens when some custom Security plugin throws an exception while processing a communication message.

```
UUID newSecSubjId = secSubjId != null ? secSubjId : nodeId;

try (OperationSecurityContext s = ctx.security().withContext(newSecSubjId)) {
    lsnr.onMessage(nodeId, msg, plc);
}
finally {
    if (change)
        CUR_PLC.set(oldPlc);
}
```

If an exception thrown from `withContext` it is hidden from the user.

> 4 мая 2020 г., в 19:15, Maxim Muzafarov <[hidden email]> написал(а):
>
> Igniters,
>
>
> I've spent a couple of days in debug mode and found that some of the
> configured ExecutorServices of an Ignite instance completely hide
> assertion errors without any logging and node killing. This may
> violate some internal guarantees and hide serious errors.
>
> Let's consider, for instance, GridDhtPartitionsExchangeFuture [1]. It
> has three places of submitting some Runnable on system executor
> service. If an assertion error (or even any uncached exception) occurs
> in the code block below it will be swallowed without logging, exchange
> future completion or node stoping.
>
> cctx.kernalContext().getSystemExecutorService().submit(new Runnable() {
>    @Override public void run() {
>        sendPartitions(newCrd);
>    }
> });
>
> I've checked that these executor services and most of them configured
> to catch only OutOfMemoryError.
>
> Should we consider catching AssertionErrors as well and treat them as
> CRITICAL_ERRORS for the Failure Handler?
> Should we log uncached errors on each of them?
>
>
> The most important list of executor services configured with OOM handler only:
> execSvc
> svcExecSvc
> sysExecSvc
> p2pExecSvc
> restExecSvc
> utilityCacheExecSvc
> affExecSvc
> qryExecSvc
>
> [1] https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java#L4848

Reply | Threaded
Open this post in threaded view
|

Re: ExecutorServices hide assertions without logging and node stop

Maxim Muzafarov
Folks,

I've created an issue.
https://issues.apache.org/jira/browse/IGNITE-13055

On Wed, 6 May 2020 at 10:28, Nikolay Izhikov <[hidden email]> wrote:

>
> Hello, Maxim.
>
> I can confirm this itching issue.
> It also happens when some custom Security plugin throws an exception while processing a communication message.
>
> ```
> UUID newSecSubjId = secSubjId != null ? secSubjId : nodeId;
>
> try (OperationSecurityContext s = ctx.security().withContext(newSecSubjId)) {
>     lsnr.onMessage(nodeId, msg, plc);
> }
> finally {
>     if (change)
>         CUR_PLC.set(oldPlc);
> }
> ```
>
> If an exception thrown from `withContext` it is hidden from the user.
>
> > 4 мая 2020 г., в 19:15, Maxim Muzafarov <[hidden email]> написал(а):
> >
> > Igniters,
> >
> >
> > I've spent a couple of days in debug mode and found that some of the
> > configured ExecutorServices of an Ignite instance completely hide
> > assertion errors without any logging and node killing. This may
> > violate some internal guarantees and hide serious errors.
> >
> > Let's consider, for instance, GridDhtPartitionsExchangeFuture [1]. It
> > has three places of submitting some Runnable on system executor
> > service. If an assertion error (or even any uncached exception) occurs
> > in the code block below it will be swallowed without logging, exchange
> > future completion or node stoping.
> >
> > cctx.kernalContext().getSystemExecutorService().submit(new Runnable() {
> >    @Override public void run() {
> >        sendPartitions(newCrd);
> >    }
> > });
> >
> > I've checked that these executor services and most of them configured
> > to catch only OutOfMemoryError.
> >
> > Should we consider catching AssertionErrors as well and treat them as
> > CRITICAL_ERRORS for the Failure Handler?
> > Should we log uncached errors on each of them?
> >
> >
> > The most important list of executor services configured with OOM handler only:
> > execSvc
> > svcExecSvc
> > sysExecSvc
> > p2pExecSvc
> > restExecSvc
> > utilityCacheExecSvc
> > affExecSvc
> > qryExecSvc
> >
> > [1] https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java#L4848
>
Reply | Threaded
Open this post in threaded view
|

Re: ExecutorServices hide assertions without logging and node stop

Ivan Pavlukhin
Hi Maxim,

I recently had similar troubles related to ExecutorService and
exception handling. It was an unexpected behavior for me that
exceptions of any kind* thrown from Runnable in a following code:
executor.submit(() -> throw new RuntimeException())
were silently ignored.

I seemed to me really unexpected for me at first. But later on I
figured out the cause. There is a subtle detail which I forgot.
ExecutorService.submit returns a Future and leaves a responsibility to
handle exceptions of any kind to a caller of Future.get. And it was
really subtle for me, because I did not use a returned Future at all.
Actually I supposed ExecutorService.execute when used
ExecutorService.submit.

Is it related to the subject? Can ExecutorService.execute help here?

* In my case there were exceptions caused by missing classes on a classpath.

2020-05-21 22:22 GMT+03:00, Maxim Muzafarov <[hidden email]>:

> Folks,
>
> I've created an issue.
> https://issues.apache.org/jira/browse/IGNITE-13055
>
> On Wed, 6 May 2020 at 10:28, Nikolay Izhikov <[hidden email]> wrote:
>>
>> Hello, Maxim.
>>
>> I can confirm this itching issue.
>> It also happens when some custom Security plugin throws an exception while
>> processing a communication message.
>>
>> ```
>> UUID newSecSubjId = secSubjId != null ? secSubjId : nodeId;
>>
>> try (OperationSecurityContext s =
>> ctx.security().withContext(newSecSubjId)) {
>>     lsnr.onMessage(nodeId, msg, plc);
>> }
>> finally {
>>     if (change)
>>         CUR_PLC.set(oldPlc);
>> }
>> ```
>>
>> If an exception thrown from `withContext` it is hidden from the user.
>>
>> > 4 мая 2020 г., в 19:15, Maxim Muzafarov <[hidden email]> написал(а):
>> >
>> > Igniters,
>> >
>> >
>> > I've spent a couple of days in debug mode and found that some of the
>> > configured ExecutorServices of an Ignite instance completely hide
>> > assertion errors without any logging and node killing. This may
>> > violate some internal guarantees and hide serious errors.
>> >
>> > Let's consider, for instance, GridDhtPartitionsExchangeFuture [1]. It
>> > has three places of submitting some Runnable on system executor
>> > service. If an assertion error (or even any uncached exception) occurs
>> > in the code block below it will be swallowed without logging, exchange
>> > future completion or node stoping.
>> >
>> > cctx.kernalContext().getSystemExecutorService().submit(new Runnable() {
>> >    @Override public void run() {
>> >        sendPartitions(newCrd);
>> >    }
>> > });
>> >
>> > I've checked that these executor services and most of them configured
>> > to catch only OutOfMemoryError.
>> >
>> > Should we consider catching AssertionErrors as well and treat them as
>> > CRITICAL_ERRORS for the Failure Handler?
>> > Should we log uncached errors on each of them?
>> >
>> >
>> > The most important list of executor services configured with OOM handler
>> > only:
>> > execSvc
>> > svcExecSvc
>> > sysExecSvc
>> > p2pExecSvc
>> > restExecSvc
>> > utilityCacheExecSvc
>> > affExecSvc
>> > qryExecSvc
>> >
>> > [1]
>> > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java#L4848
>>
>


--

Best regards,
Ivan Pavlukhin
Reply | Threaded
Open this post in threaded view
|

Re: ExecutorServices hide assertions without logging and node stop

Maxim Muzafarov
Ivan,

Not sure that I've got your point. Let me give some example:

IgniteEx ignite = startGrid();
ignite.context().getSystemExecutorService().execute(() -> { assert false; });
ignite.context().getSystemExecutorService().submit(() -> { assert false; });

In both of these cases there are no exceptions logged and the node not
stopped (I assume that failing assertion means a code contract is
violated). Of course, I'll get an assertion error if fut.get() will be
called but there a lot of cases where it's not. So, I think we must
log errors here or fail the node.

On Fri, 22 May 2020 at 10:51, Ivan Pavlukhin <[hidden email]> wrote:

>
> Hi Maxim,
>
> I recently had similar troubles related to ExecutorService and
> exception handling. It was an unexpected behavior for me that
> exceptions of any kind* thrown from Runnable in a following code:
> executor.submit(() -> throw new RuntimeException())
> were silently ignored.
>
> I seemed to me really unexpected for me at first. But later on I
> figured out the cause. There is a subtle detail which I forgot.
> ExecutorService.submit returns a Future and leaves a responsibility to
> handle exceptions of any kind to a caller of Future.get. And it was
> really subtle for me, because I did not use a returned Future at all.
> Actually I supposed ExecutorService.execute when used
> ExecutorService.submit.
>
> Is it related to the subject? Can ExecutorService.execute help here?
>
> * In my case there were exceptions caused by missing classes on a classpath.
>
> 2020-05-21 22:22 GMT+03:00, Maxim Muzafarov <[hidden email]>:
> > Folks,
> >
> > I've created an issue.
> > https://issues.apache.org/jira/browse/IGNITE-13055
> >
> > On Wed, 6 May 2020 at 10:28, Nikolay Izhikov <[hidden email]> wrote:
> >>
> >> Hello, Maxim.
> >>
> >> I can confirm this itching issue.
> >> It also happens when some custom Security plugin throws an exception while
> >> processing a communication message.
> >>
> >> ```
> >> UUID newSecSubjId = secSubjId != null ? secSubjId : nodeId;
> >>
> >> try (OperationSecurityContext s =
> >> ctx.security().withContext(newSecSubjId)) {
> >>     lsnr.onMessage(nodeId, msg, plc);
> >> }
> >> finally {
> >>     if (change)
> >>         CUR_PLC.set(oldPlc);
> >> }
> >> ```
> >>
> >> If an exception thrown from `withContext` it is hidden from the user.
> >>
> >> > 4 мая 2020 г., в 19:15, Maxim Muzafarov <[hidden email]> написал(а):
> >> >
> >> > Igniters,
> >> >
> >> >
> >> > I've spent a couple of days in debug mode and found that some of the
> >> > configured ExecutorServices of an Ignite instance completely hide
> >> > assertion errors without any logging and node killing. This may
> >> > violate some internal guarantees and hide serious errors.
> >> >
> >> > Let's consider, for instance, GridDhtPartitionsExchangeFuture [1]. It
> >> > has three places of submitting some Runnable on system executor
> >> > service. If an assertion error (or even any uncached exception) occurs
> >> > in the code block below it will be swallowed without logging, exchange
> >> > future completion or node stoping.
> >> >
> >> > cctx.kernalContext().getSystemExecutorService().submit(new Runnable() {
> >> >    @Override public void run() {
> >> >        sendPartitions(newCrd);
> >> >    }
> >> > });
> >> >
> >> > I've checked that these executor services and most of them configured
> >> > to catch only OutOfMemoryError.
> >> >
> >> > Should we consider catching AssertionErrors as well and treat them as
> >> > CRITICAL_ERRORS for the Failure Handler?
> >> > Should we log uncached errors on each of them?
> >> >
> >> >
> >> > The most important list of executor services configured with OOM handler
> >> > only:
> >> > execSvc
> >> > svcExecSvc
> >> > sysExecSvc
> >> > p2pExecSvc
> >> > restExecSvc
> >> > utilityCacheExecSvc
> >> > affExecSvc
> >> > qryExecSvc
> >> >
> >> > [1]
> >> > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java#L4848
> >>
> >
>
>
> --
>
> Best regards,
> Ivan Pavlukhin
Reply | Threaded
Open this post in threaded view
|

Re: ExecutorServices hide assertions without logging and node stop

Ivan Pavlukhin
Maxim,

I looked into the code and checked how do we setup uncaught exception
handlers for internal executors. Indeed it looks not good (OOME leads
to failure handling, other exceptions are silently ignored). Logging
sounds good.

Have doubts about failure handling (e.g. terminating a node). I
remember at least one related discussion [1]. And here are some
statements which causes my doubts:
1. Not only assertion errors are a programming errors evidence, e.g. a
simple NPE often signals the same.
2. Assertions might be disabled in production.
3. Sometimes it seems that assertions are often overused in the Ignite
code. Intermittent concurrency-related assertion errors are not so
uncommon. And it seems impossible to figure out how critical a
particular error is.

[1] http://apache-ignite-developers.2346864.n4.nabble.com/Exceptions-thrown-in-IndexingSpi-and-quot-fail-fast-quot-principle-td38904.html

2020-05-24 12:13 GMT+03:00, Maxim Muzafarov <[hidden email]>:

> Ivan,
>
> Not sure that I've got your point. Let me give some example:
>
> IgniteEx ignite = startGrid();
> ignite.context().getSystemExecutorService().execute(() -> { assert false;
> });
> ignite.context().getSystemExecutorService().submit(() -> { assert false;
> });
>
> In both of these cases there are no exceptions logged and the node not
> stopped (I assume that failing assertion means a code contract is
> violated). Of course, I'll get an assertion error if fut.get() will be
> called but there a lot of cases where it's not. So, I think we must
> log errors here or fail the node.
>
> On Fri, 22 May 2020 at 10:51, Ivan Pavlukhin <[hidden email]> wrote:
>>
>> Hi Maxim,
>>
>> I recently had similar troubles related to ExecutorService and
>> exception handling. It was an unexpected behavior for me that
>> exceptions of any kind* thrown from Runnable in a following code:
>> executor.submit(() -> throw new RuntimeException())
>> were silently ignored.
>>
>> I seemed to me really unexpected for me at first. But later on I
>> figured out the cause. There is a subtle detail which I forgot.
>> ExecutorService.submit returns a Future and leaves a responsibility to
>> handle exceptions of any kind to a caller of Future.get. And it was
>> really subtle for me, because I did not use a returned Future at all.
>> Actually I supposed ExecutorService.execute when used
>> ExecutorService.submit.
>>
>> Is it related to the subject? Can ExecutorService.execute help here?
>>
>> * In my case there were exceptions caused by missing classes on a
>> classpath.
>>
>> 2020-05-21 22:22 GMT+03:00, Maxim Muzafarov <[hidden email]>:
>> > Folks,
>> >
>> > I've created an issue.
>> > https://issues.apache.org/jira/browse/IGNITE-13055
>> >
>> > On Wed, 6 May 2020 at 10:28, Nikolay Izhikov <[hidden email]>
>> > wrote:
>> >>
>> >> Hello, Maxim.
>> >>
>> >> I can confirm this itching issue.
>> >> It also happens when some custom Security plugin throws an exception
>> >> while
>> >> processing a communication message.
>> >>
>> >> ```
>> >> UUID newSecSubjId = secSubjId != null ? secSubjId : nodeId;
>> >>
>> >> try (OperationSecurityContext s =
>> >> ctx.security().withContext(newSecSubjId)) {
>> >>     lsnr.onMessage(nodeId, msg, plc);
>> >> }
>> >> finally {
>> >>     if (change)
>> >>         CUR_PLC.set(oldPlc);
>> >> }
>> >> ```
>> >>
>> >> If an exception thrown from `withContext` it is hidden from the user.
>> >>
>> >> > 4 мая 2020 г., в 19:15, Maxim Muzafarov <[hidden email]>
>> >> > написал(а):
>> >> >
>> >> > Igniters,
>> >> >
>> >> >
>> >> > I've spent a couple of days in debug mode and found that some of the
>> >> > configured ExecutorServices of an Ignite instance completely hide
>> >> > assertion errors without any logging and node killing. This may
>> >> > violate some internal guarantees and hide serious errors.
>> >> >
>> >> > Let's consider, for instance, GridDhtPartitionsExchangeFuture [1].
>> >> > It
>> >> > has three places of submitting some Runnable on system executor
>> >> > service. If an assertion error (or even any uncached exception)
>> >> > occurs
>> >> > in the code block below it will be swallowed without logging,
>> >> > exchange
>> >> > future completion or node stoping.
>> >> >
>> >> > cctx.kernalContext().getSystemExecutorService().submit(new Runnable()
>> >> > {
>> >> >    @Override public void run() {
>> >> >        sendPartitions(newCrd);
>> >> >    }
>> >> > });
>> >> >
>> >> > I've checked that these executor services and most of them
>> >> > configured
>> >> > to catch only OutOfMemoryError.
>> >> >
>> >> > Should we consider catching AssertionErrors as well and treat them
>> >> > as
>> >> > CRITICAL_ERRORS for the Failure Handler?
>> >> > Should we log uncached errors on each of them?
>> >> >
>> >> >
>> >> > The most important list of executor services configured with OOM
>> >> > handler
>> >> > only:
>> >> > execSvc
>> >> > svcExecSvc
>> >> > sysExecSvc
>> >> > p2pExecSvc
>> >> > restExecSvc
>> >> > utilityCacheExecSvc
>> >> > affExecSvc
>> >> > qryExecSvc
>> >> >
>> >> > [1]
>> >> > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java#L4848
>> >>
>> >
>>
>>
>> --
>>
>> Best regards,
>> Ivan Pavlukhin
>


--

Best regards,
Ivan Pavlukhin
Reply | Threaded
Open this post in threaded view
|

Re: ExecutorServices hide assertions without logging and node stop

Maxim Muzafarov
Ivan,

> 2. Assertions might be disabled in production.
Assertions must be disabled in production :-)

Here is another side of the same coin.
A contributor who makes a new patch must be able to see failed tests
immediately, he must not check all the logs with logged assertions.
The assertion is an error (not to be confused with an exception), all
the errors must fail the node.

If we not enabling catching assertions for ExecutorServices this patch
for all the cases, at least it must be done for all tests running the
TeamCity environment. But I think we should enable fail node if an
error occurs (assume assertions disabled in production).

On Mon, 25 May 2020 at 12:36, Ivan Pavlukhin <[hidden email]> wrote:

>
> Maxim,
>
> I looked into the code and checked how do we setup uncaught exception
> handlers for internal executors. Indeed it looks not good (OOME leads
> to failure handling, other exceptions are silently ignored). Logging
> sounds good.
>
> Have doubts about failure handling (e.g. terminating a node). I
> remember at least one related discussion [1]. And here are some
> statements which causes my doubts:
> 1. Not only assertion errors are a programming errors evidence, e.g. a
> simple NPE often signals the same.
> 2. Assertions might be disabled in production.
> 3. Sometimes it seems that assertions are often overused in the Ignite
> code. Intermittent concurrency-related assertion errors are not so
> uncommon. And it seems impossible to figure out how critical a
> particular error is.
>
> [1] http://apache-ignite-developers.2346864.n4.nabble.com/Exceptions-thrown-in-IndexingSpi-and-quot-fail-fast-quot-principle-td38904.html
>
> 2020-05-24 12:13 GMT+03:00, Maxim Muzafarov <[hidden email]>:
> > Ivan,
> >
> > Not sure that I've got your point. Let me give some example:
> >
> > IgniteEx ignite = startGrid();
> > ignite.context().getSystemExecutorService().execute(() -> { assert false;
> > });
> > ignite.context().getSystemExecutorService().submit(() -> { assert false;
> > });
> >
> > In both of these cases there are no exceptions logged and the node not
> > stopped (I assume that failing assertion means a code contract is
> > violated). Of course, I'll get an assertion error if fut.get() will be
> > called but there a lot of cases where it's not. So, I think we must
> > log errors here or fail the node.
> >
> > On Fri, 22 May 2020 at 10:51, Ivan Pavlukhin <[hidden email]> wrote:
> >>
> >> Hi Maxim,
> >>
> >> I recently had similar troubles related to ExecutorService and
> >> exception handling. It was an unexpected behavior for me that
> >> exceptions of any kind* thrown from Runnable in a following code:
> >> executor.submit(() -> throw new RuntimeException())
> >> were silently ignored.
> >>
> >> I seemed to me really unexpected for me at first. But later on I
> >> figured out the cause. There is a subtle detail which I forgot.
> >> ExecutorService.submit returns a Future and leaves a responsibility to
> >> handle exceptions of any kind to a caller of Future.get. And it was
> >> really subtle for me, because I did not use a returned Future at all.
> >> Actually I supposed ExecutorService.execute when used
> >> ExecutorService.submit.
> >>
> >> Is it related to the subject? Can ExecutorService.execute help here?
> >>
> >> * In my case there were exceptions caused by missing classes on a
> >> classpath.
> >>
> >> 2020-05-21 22:22 GMT+03:00, Maxim Muzafarov <[hidden email]>:
> >> > Folks,
> >> >
> >> > I've created an issue.
> >> > https://issues.apache.org/jira/browse/IGNITE-13055
> >> >
> >> > On Wed, 6 May 2020 at 10:28, Nikolay Izhikov <[hidden email]>
> >> > wrote:
> >> >>
> >> >> Hello, Maxim.
> >> >>
> >> >> I can confirm this itching issue.
> >> >> It also happens when some custom Security plugin throws an exception
> >> >> while
> >> >> processing a communication message.
> >> >>
> >> >> ```
> >> >> UUID newSecSubjId = secSubjId != null ? secSubjId : nodeId;
> >> >>
> >> >> try (OperationSecurityContext s =
> >> >> ctx.security().withContext(newSecSubjId)) {
> >> >>     lsnr.onMessage(nodeId, msg, plc);
> >> >> }
> >> >> finally {
> >> >>     if (change)
> >> >>         CUR_PLC.set(oldPlc);
> >> >> }
> >> >> ```
> >> >>
> >> >> If an exception thrown from `withContext` it is hidden from the user.
> >> >>
> >> >> > 4 мая 2020 г., в 19:15, Maxim Muzafarov <[hidden email]>
> >> >> > написал(а):
> >> >> >
> >> >> > Igniters,
> >> >> >
> >> >> >
> >> >> > I've spent a couple of days in debug mode and found that some of the
> >> >> > configured ExecutorServices of an Ignite instance completely hide
> >> >> > assertion errors without any logging and node killing. This may
> >> >> > violate some internal guarantees and hide serious errors.
> >> >> >
> >> >> > Let's consider, for instance, GridDhtPartitionsExchangeFuture [1].
> >> >> > It
> >> >> > has three places of submitting some Runnable on system executor
> >> >> > service. If an assertion error (or even any uncached exception)
> >> >> > occurs
> >> >> > in the code block below it will be swallowed without logging,
> >> >> > exchange
> >> >> > future completion or node stoping.
> >> >> >
> >> >> > cctx.kernalContext().getSystemExecutorService().submit(new Runnable()
> >> >> > {
> >> >> >    @Override public void run() {
> >> >> >        sendPartitions(newCrd);
> >> >> >    }
> >> >> > });
> >> >> >
> >> >> > I've checked that these executor services and most of them
> >> >> > configured
> >> >> > to catch only OutOfMemoryError.
> >> >> >
> >> >> > Should we consider catching AssertionErrors as well and treat them
> >> >> > as
> >> >> > CRITICAL_ERRORS for the Failure Handler?
> >> >> > Should we log uncached errors on each of them?
> >> >> >
> >> >> >
> >> >> > The most important list of executor services configured with OOM
> >> >> > handler
> >> >> > only:
> >> >> > execSvc
> >> >> > svcExecSvc
> >> >> > sysExecSvc
> >> >> > p2pExecSvc
> >> >> > restExecSvc
> >> >> > utilityCacheExecSvc
> >> >> > affExecSvc
> >> >> > qryExecSvc
> >> >> >
> >> >> > [1]
> >> >> > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java#L4848
> >> >>
> >> >
> >>
> >>
> >> --
> >>
> >> Best regards,
> >> Ivan Pavlukhin
> >
>
>
> --
>
> Best regards,
> Ivan Pavlukhin
Reply | Threaded
Open this post in threaded view
|

Re: ExecutorServices hide assertions without logging and node stop

Ivan Pavlukhin
Maxim,

> Assertions must be disabled in production :-)

Since production deployments are out of our control we should think
about ones with enabled assertions as well (and they do occur in
practice).

> A contributor who makes a new patch must be able to see failed tests immediately, he must not check all the logs with logged assertions. The assertion is an error (not to be confused with an exception), all the errors must fail the node.

I am not sure how practical it is today. I might have missed some
recent activities but I recall an effort to enable a node terminating
failure handler in tests. AFAIR, it was not successful and had lead to
troubles with TC build results interpretation.

2020-05-25 14:27 GMT+03:00, Maxim Muzafarov <[hidden email]>:

> Ivan,
>
>> 2. Assertions might be disabled in production.
> Assertions must be disabled in production :-)
>
> Here is another side of the same coin.
> A contributor who makes a new patch must be able to see failed tests
> immediately, he must not check all the logs with logged assertions.
> The assertion is an error (not to be confused with an exception), all
> the errors must fail the node.
>
> If we not enabling catching assertions for ExecutorServices this patch
> for all the cases, at least it must be done for all tests running the
> TeamCity environment. But I think we should enable fail node if an
> error occurs (assume assertions disabled in production).
>
> On Mon, 25 May 2020 at 12:36, Ivan Pavlukhin <[hidden email]> wrote:
>>
>> Maxim,
>>
>> I looked into the code and checked how do we setup uncaught exception
>> handlers for internal executors. Indeed it looks not good (OOME leads
>> to failure handling, other exceptions are silently ignored). Logging
>> sounds good.
>>
>> Have doubts about failure handling (e.g. terminating a node). I
>> remember at least one related discussion [1]. And here are some
>> statements which causes my doubts:
>> 1. Not only assertion errors are a programming errors evidence, e.g. a
>> simple NPE often signals the same.
>> 2. Assertions might be disabled in production.
>> 3. Sometimes it seems that assertions are often overused in the Ignite
>> code. Intermittent concurrency-related assertion errors are not so
>> uncommon. And it seems impossible to figure out how critical a
>> particular error is.
>>
>> [1]
>> http://apache-ignite-developers.2346864.n4.nabble.com/Exceptions-thrown-in-IndexingSpi-and-quot-fail-fast-quot-principle-td38904.html
>>
>> 2020-05-24 12:13 GMT+03:00, Maxim Muzafarov <[hidden email]>:
>> > Ivan,
>> >
>> > Not sure that I've got your point. Let me give some example:
>> >
>> > IgniteEx ignite = startGrid();
>> > ignite.context().getSystemExecutorService().execute(() -> { assert
>> > false;
>> > });
>> > ignite.context().getSystemExecutorService().submit(() -> { assert
>> > false;
>> > });
>> >
>> > In both of these cases there are no exceptions logged and the node not
>> > stopped (I assume that failing assertion means a code contract is
>> > violated). Of course, I'll get an assertion error if fut.get() will be
>> > called but there a lot of cases where it's not. So, I think we must
>> > log errors here or fail the node.
>> >
>> > On Fri, 22 May 2020 at 10:51, Ivan Pavlukhin <[hidden email]>
>> > wrote:
>> >>
>> >> Hi Maxim,
>> >>
>> >> I recently had similar troubles related to ExecutorService and
>> >> exception handling. It was an unexpected behavior for me that
>> >> exceptions of any kind* thrown from Runnable in a following code:
>> >> executor.submit(() -> throw new RuntimeException())
>> >> were silently ignored.
>> >>
>> >> I seemed to me really unexpected for me at first. But later on I
>> >> figured out the cause. There is a subtle detail which I forgot.
>> >> ExecutorService.submit returns a Future and leaves a responsibility to
>> >> handle exceptions of any kind to a caller of Future.get. And it was
>> >> really subtle for me, because I did not use a returned Future at all.
>> >> Actually I supposed ExecutorService.execute when used
>> >> ExecutorService.submit.
>> >>
>> >> Is it related to the subject? Can ExecutorService.execute help here?
>> >>
>> >> * In my case there were exceptions caused by missing classes on a
>> >> classpath.
>> >>
>> >> 2020-05-21 22:22 GMT+03:00, Maxim Muzafarov <[hidden email]>:
>> >> > Folks,
>> >> >
>> >> > I've created an issue.
>> >> > https://issues.apache.org/jira/browse/IGNITE-13055
>> >> >
>> >> > On Wed, 6 May 2020 at 10:28, Nikolay Izhikov <[hidden email]>
>> >> > wrote:
>> >> >>
>> >> >> Hello, Maxim.
>> >> >>
>> >> >> I can confirm this itching issue.
>> >> >> It also happens when some custom Security plugin throws an
>> >> >> exception
>> >> >> while
>> >> >> processing a communication message.
>> >> >>
>> >> >> ```
>> >> >> UUID newSecSubjId = secSubjId != null ? secSubjId : nodeId;
>> >> >>
>> >> >> try (OperationSecurityContext s =
>> >> >> ctx.security().withContext(newSecSubjId)) {
>> >> >>     lsnr.onMessage(nodeId, msg, plc);
>> >> >> }
>> >> >> finally {
>> >> >>     if (change)
>> >> >>         CUR_PLC.set(oldPlc);
>> >> >> }
>> >> >> ```
>> >> >>
>> >> >> If an exception thrown from `withContext` it is hidden from the
>> >> >> user.
>> >> >>
>> >> >> > 4 мая 2020 г., в 19:15, Maxim Muzafarov <[hidden email]>
>> >> >> > написал(а):
>> >> >> >
>> >> >> > Igniters,
>> >> >> >
>> >> >> >
>> >> >> > I've spent a couple of days in debug mode and found that some of
>> >> >> > the
>> >> >> > configured ExecutorServices of an Ignite instance completely hide
>> >> >> > assertion errors without any logging and node killing. This may
>> >> >> > violate some internal guarantees and hide serious errors.
>> >> >> >
>> >> >> > Let's consider, for instance, GridDhtPartitionsExchangeFuture
>> >> >> > [1].
>> >> >> > It
>> >> >> > has three places of submitting some Runnable on system executor
>> >> >> > service. If an assertion error (or even any uncached exception)
>> >> >> > occurs
>> >> >> > in the code block below it will be swallowed without logging,
>> >> >> > exchange
>> >> >> > future completion or node stoping.
>> >> >> >
>> >> >> > cctx.kernalContext().getSystemExecutorService().submit(new
>> >> >> > Runnable()
>> >> >> > {
>> >> >> >    @Override public void run() {
>> >> >> >        sendPartitions(newCrd);
>> >> >> >    }
>> >> >> > });
>> >> >> >
>> >> >> > I've checked that these executor services and most of them
>> >> >> > configured
>> >> >> > to catch only OutOfMemoryError.
>> >> >> >
>> >> >> > Should we consider catching AssertionErrors as well and treat
>> >> >> > them
>> >> >> > as
>> >> >> > CRITICAL_ERRORS for the Failure Handler?
>> >> >> > Should we log uncached errors on each of them?
>> >> >> >
>> >> >> >
>> >> >> > The most important list of executor services configured with OOM
>> >> >> > handler
>> >> >> > only:
>> >> >> > execSvc
>> >> >> > svcExecSvc
>> >> >> > sysExecSvc
>> >> >> > p2pExecSvc
>> >> >> > restExecSvc
>> >> >> > utilityCacheExecSvc
>> >> >> > affExecSvc
>> >> >> > qryExecSvc
>> >> >> >
>> >> >> > [1]
>> >> >> > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java#L4848
>> >> >>
>> >> >
>> >>
>> >>
>> >> --
>> >>
>> >> Best regards,
>> >> Ivan Pavlukhin
>> >
>>
>>
>> --
>>
>> Best regards,
>> Ivan Pavlukhin
>


--

Best regards,
Ivan Pavlukhin