Atomic cache inconsistent state


Atomic cache inconsistent state

Denis Garus
Hello Igniters!

 

I have found some confusing behavior of the atomic partitioned cache with
the `PRIMARY_SYNC` write synchronization mode.

The node that owns the primary partition sends a message to the remote nodes
holding backup partitions via `GridDhtAtomicAbstractUpdateFuture#sendDhtRequests`.

If an error occurs during sending, it is, in fact, ignored, see [1]:

```
try {
    ....

    cctx.io().send(req.nodeId(), req, cctx.ioPolicy());

    ....
}
catch (ClusterTopologyCheckedException ignored) {
    ....

    registerResponse(req.nodeId());
}
catch (IgniteCheckedException ignored) {
    ....

    registerResponse(req.nodeId());
}
```

This behavior results in the primary partition and the backup partitions
holding different values for a given key.

 

A reproducer is available at [2].
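
For readers without the PR at hand, below is a rough sketch of the shape such a
reproducer can take. It is illustrative only: the class names, the failing SPI, and
the key-selection loop are my assumptions, not the code from [2], and the internal
request classes may differ between Ignite versions.

```
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteException;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.internal.managers.communication.GridIoMessage;
import org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicAbstractUpdateRequest;
import org.apache.ignite.lang.IgniteInClosure;
import org.apache.ignite.plugin.extensions.communication.Message;
import org.apache.ignite.spi.IgniteSpiException;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

public class AtomicPrimarySyncInconsistencySketch {
    /** Communication SPI that simulates a send failure for DHT atomic update requests to backups. */
    static class FailingCommunicationSpi extends TcpCommunicationSpi {
        @Override public void sendMessage(ClusterNode node, Message msg,
            IgniteInClosure<IgniteException> ackC) throws IgniteSpiException {
            if (msg instanceof GridIoMessage
                && ((GridIoMessage)msg).message() instanceof GridDhtAtomicAbstractUpdateRequest)
                throw new IgniteSpiException("Simulated failure sending DHT update request to backup");

            super.sendMessage(node, msg, ackC);
        }
    }

    public static void main(String[] args) {
        IgniteConfiguration cfg1 = new IgniteConfiguration()
            .setIgniteInstanceName("primary-node")
            .setCommunicationSpi(new FailingCommunicationSpi());

        IgniteConfiguration cfg2 = new IgniteConfiguration().setIgniteInstanceName("backup-node");

        try (Ignite primary = Ignition.start(cfg1); Ignite backup = Ignition.start(cfg2)) {
            CacheConfiguration<Integer, Integer> ccfg = new CacheConfiguration<Integer, Integer>("atomic")
                .setCacheMode(CacheMode.PARTITIONED)
                .setAtomicityMode(CacheAtomicityMode.ATOMIC)
                .setBackups(1)
                .setWriteSynchronizationMode(CacheWriteSynchronizationMode.PRIMARY_SYNC);

            IgniteCache<Integer, Integer> cache = primary.getOrCreateCache(ccfg);

            // Pick a key whose primary copy lives on the node with the failing SPI.
            int key = 0;
            while (!primary.affinity("atomic").isPrimary(primary.cluster().localNode(), key))
                key++;

            // With PRIMARY_SYNC the put() returns as soon as the primary copy is updated.
            // The DHT request to the backup fails, the exception is swallowed in
            // sendDhtRequests(), and the backup copy is never brought up to date.
            cache.put(key, 42);
        }
    }
}
```

Note that the put() itself completes without any error, which is exactly why the
divergence is easy to miss.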

 

Should we consider this behavior valid?

 

[1].
https://github.com/dgarus/ignite/blob/d473b507f04e2ec843c1da1066d8908e882396d7/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/atomic/GridDhtAtomicAbstractUpdateFuture.java#L473

[2].
https://github.com/apache/ignite/pull/4126/files#diff-5e5bfb73bd917d85f56a05552b1d014aR26


Re: Atomic cache inconsistent state

Denis Garus
Fix formatting

Hello Igniters!

I have found some confusing behavior of the atomic partitioned cache with
the `PRIMARY_SYNC` write synchronization mode.
The node that owns the primary partition sends a message to the remote nodes
holding backup partitions via `GridDhtAtomicAbstractUpdateFuture#sendDhtRequests`.
If an error occurs during sending, it is, in fact, ignored, see [1]:
```
try {
    ....

    cctx.io().send(req.nodeId(), req, cctx.ioPolicy());

    ....
}
catch (ClusterTopologyCheckedException ignored) {
    ....

    registerResponse(req.nodeId());
}
catch (IgniteCheckedException ignored) {
    ....

    registerResponse(req.nodeId());
}

```
This behavior results in the primary partition and the backup partitions
holding different values for a given key.

A reproducer is available at [2].

Should we consider this behavior valid?

[1].
https://github.com/dgarus/ignite/blob/d473b507f04e2ec843c1da1066d8908e882396d7/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/atomic/GridDhtAtomicAbstractUpdateFuture.java#L473
[2].
https://github.com/apache/ignite/pull/4126/files#diff-5e5bfb73bd917d85f56a05552b1d014aR26


Re: Atomic cache inconsistent state

Dmitriy Govorukhin
Denis,

It seems you are right; this is a problem.
I guess in this case the primary node should send a CachePartialUpdateException
to the near node.
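
To illustrate what that would give callers, here is a small sketch of how the near
side could then react to the failure. It is hypothetical: it assumes the failed
update is surfaced to the caller as a CachePartialUpdateException with its
failedKeys() accessor, which is what the suggestion implies rather than what
happens today.

```
import java.util.Collection;

import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CachePartialUpdateException;

public class PartialUpdateHandlingSketch {
    /** Retries keys whose update did not reach all owners, assuming the primary
     *  reports the failed DHT send back to the near node. */
    static void putWithRetry(IgniteCache<Integer, Integer> cache, int key, int val) {
        try {
            cache.put(key, val);
        }
        catch (CachePartialUpdateException e) {
            // Keys for which the update was not fully propagated.
            Collection<?> failed = e.failedKeys();

            System.err.println("Update partially failed for keys: " + failed + ", retrying...");

            for (Object k : failed)
                cache.put((Integer)k, val);
        }
    }
}
```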


Re: Atomic cache inconsistent state

Andrew Mashenkov
Dmitry,

There are other cases that can result in an inconsistent state of an atomic cache
with 2 or more backups.

1. For PRIMARY_SYNC: the primary sends requests to all backups and responds to the
near node... and then one of the backup updates fails.
Will the primary retry the update operation? I doubt it.

2. For all sync modes: the primary sends the request to the 1st backup and fails to
send it to the 2nd backup... and then the near node suddenly dies.
No one will retry, as the near node has gone.
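
Whichever scenario triggers it, the resulting divergence is easy to observe through
the public peek API. A minimal sketch (class and method names are made up for
illustration) that dumps the locally stored copies of a key on every node started in
the current JVM:

```
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CachePeekMode;

public class LocalCopiesDumpSketch {
    /** Prints the primary and backup copies of a key as they are stored locally on each node. */
    static void dumpCopies(String cacheName, int key) {
        for (Ignite node : Ignition.allGrids()) {
            IgniteCache<Integer, Integer> cache = node.cache(cacheName);

            // localPeek() reads only the local partition copy, without any network trip,
            // so a primary/backup mismatch shows up directly in the output.
            Integer primaryCopy = cache.localPeek(key, CachePeekMode.PRIMARY);
            Integer backupCopy = cache.localPeek(key, CachePeekMode.BACKUP);

            System.out.println(node.name() + ": primary=" + primaryCopy + ", backup=" + backupCopy);
        }
    }
}
```

After one of the scenarios above, the output would show the new value in the primary
copy on one node while the corresponding backup copy on another node still holds the
old value (or nothing at all).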

--
Best regards,
Andrey V. Mashenkov

Re: Atomic cache inconsistent state

Dmitriy Pavlov
Denis, Alexey, please share your vision.

Sincerely,
Dmitriy Pavlov
