Igniters,
I have recently discovered [1] that Ignite can arrive in a state when an optimistic serializable transaction can never be successfully committed from a backup node [2]. In short, the root cause of this issue is that there are configurations that allow a key to be stored on primary and backup nodes with different versions. This is a fundamental design choice that made a while ago, however, I am not sure if this is a right way to go. When primary and backup versions differ and read load balancing is enabled, the read version will always mismatch with primary version and optimistic serializable transaction will always fail. Here I wanted to discuss both short-term mitigation plan for the issue [2] as well as a longer-term changes to replication protocol. As a short-term solution for [2] I suggest to force reads from a primary node inside optimistic serializable transactions. The question is whether to enforce this behavior only if the cache has a 3-rd party persistence storage or this behavior should be always enforced. Note that the version mismatch may appear even without a 3-rd party persistence storage when an expiry policy is used. However, in this case, the version mismatch is time-bound to the TTL cleanup lag. Personally, I would go with always enforcing primary-node reads inside an optimistic serializable transaction. As a long-term solution which would eliminate the possibility of versions desync on primary and backup nodes, I would suggest to revisit the read-through and TTL expiry semantics. It looks like quite a lot of users are actually struggling with the current implementation of read-through because a miss does not load the value to all partition nodes [3]. As for TTL, I remember it clearing up entries locally was a big issue for a proper MVCC rebalance implementation (we ended up prohibiting TTL for MVCC caches). I think it may be better to make read-through and entry expiry a partition-wide operation with the underlying cache guarantees. For read-through it is justified because a partition-wide operation penalty is comparable with the cache store load anyway (otherwise, a 3rd party storage makes little sense). For entries expiration it should not make any difference because it happens in background anyways. Any thoughts on the subject are very much appreciated. --AG [1] http://apache-ignite-developers.2346864.n4.nabble.com/Fwd-NodeOrder-in-GridCacheVersion-td46108.html [2] https://issues.apache.org/jira/browse/IGNITE-12739 [3] http://apache-ignite-developers.2346864.n4.nabble.com/Re-Read-through-not-working-as-expected-in-case-of-Replicated-cache-td46083.html |
Alexey,
>> In short, the root cause of this issue is that there are configurations >> that allow a key to be stored on primary and backup nodes with different >> versions. Faced with the same problem during ReadRepair development. >> I suggest to force reads from a primary >> node inside optimistic serializable transactions. It looks like a proper fix (read-from-backup= ... && !read-through). >> I would suggest to revisit the >> read-through and TTL expiry semantics. Do we really need these features? - we have great full-featured consistent persistence, what's the point to use limited-featured inconsistent persistence via the external database? Can we get rid of this feature at 3.0? - Expiry policy is expensive (slowdown the cluster) and does not guarantee the in-time removal, and always may be replaced by proper design (state machine, query, eviction, in-memory cluster restart, etc). On Thu, Mar 5, 2020 at 12:33 PM Alexey Goncharuk <[hidden email]> wrote: > Igniters, > > I have recently discovered [1] that Ignite can arrive in a state when an > optimistic serializable transaction can never be successfully committed > from a backup node [2]. > > In short, the root cause of this issue is that there are configurations > that allow a key to be stored on primary and backup nodes with different > versions. This is a fundamental design choice that made a while ago, > however, I am not sure if this is a right way to go. When primary and > backup versions differ and read load balancing is enabled, the read version > will always mismatch with primary version and optimistic serializable > transaction will always fail. > > Here I wanted to discuss both short-term mitigation plan for the issue [2] > as well as a longer-term changes to replication protocol. > > As a short-term solution for [2] I suggest to force reads from a primary > node inside optimistic serializable transactions. The question is whether > to enforce this behavior only if the cache has a 3-rd party persistence > storage or this behavior should be always enforced. Note that the version > mismatch may appear even without a 3-rd party persistence storage when an > expiry policy is used. However, in this case, the version mismatch is > time-bound to the TTL cleanup lag. Personally, I would go with always > enforcing primary-node reads inside an optimistic serializable transaction. > > As a long-term solution which would eliminate the possibility of versions > desync on primary and backup nodes, I would suggest to revisit the > read-through and TTL expiry semantics. It looks like quite a lot of users > are actually struggling with the current implementation of read-through > because a miss does not load the value to all partition nodes [3]. As for > TTL, I remember it clearing up entries locally was a big issue for a proper > MVCC rebalance implementation (we ended up prohibiting TTL for MVCC > caches). > I think it may be better to make read-through and entry expiry a > partition-wide operation with the underlying cache guarantees. For > read-through it is justified because a partition-wide operation penalty is > comparable with the cache store load anyway (otherwise, a 3rd party storage > makes little sense). For entries expiration it should not make any > difference because it happens in background anyways. > > Any thoughts on the subject are very much appreciated. > > --AG > > [1] > > http://apache-ignite-developers.2346864.n4.nabble.com/Fwd-NodeOrder-in-GridCacheVersion-td46108.html > [2] https://issues.apache.org/jira/browse/IGNITE-12739 > [3] > > http://apache-ignite-developers.2346864.n4.nabble.com/Re-Read-through-not-working-as-expected-in-case-of-Replicated-cache-td46083.html > |
Anton,
> >> In short, the root cause of this issue is that there are configurations > >> that allow a key to be stored on primary and backup nodes with different > >> versions. > Faced with the same problem during ReadRepair development. > > >> I suggest to force reads from a primary > >> node inside optimistic serializable transactions. > It looks like a proper fix (read-from-backup= ... && !read-through). > > >> I would suggest to revisit the > >> read-through and TTL expiry semantics. > Do we really need these features? > - we have great full-featured consistent persistence, what's the point to > use limited-featured inconsistent persistence via the external database? > Can we get rid of this feature at 3.0? > - Expiry policy is expensive (slowdown the cluster) and does not guarantee > the in-time removal, and always may be replaced by proper design (state > machine, query, eviction, in-memory cluster restart, etc). > Caching a 3rd-party persistence is one of the most widely used Ignite use-cases, I am sure we cannot drop this. Perhaps, it makes sense to separate the caching scenario in an explicit configuration and cache mode. Probably, even separate cache and database cases. As for expiry policy - I agree that a user can always implement it on application level, but a user can always implement transactions as well. If we already have a feature and we can fix it properly, why not keep it? |
Alex, thanks for monitoring various discussion threads and sharing these
problems with the rest of the dev community. >> As a short-term solution for [2] I suggest to force reads from a primary > node inside optimistic serializable transactions. Totally agree on this. Anyway, consistency and predictable behavior matter most. Also, it shouldn't affect performance anyhow dramatically. >> I think it may be better to make read-through and entry expiry a > partition-wide operation with the underlying cache guarantees. That's a pain in the neck! As you properly mentioned, an in-memory data grid sitting on top of an external database is still our dominating use case. So, a partition-wide operation assumes that if a record is read from a CacheStore than its value will be replicated to all the primary and backup copies, right? - Denis On Fri, Mar 6, 2020 at 12:58 AM Alexey Goncharuk <[hidden email]> wrote: > Anton, > > > > >> In short, the root cause of this issue is that there are > configurations > > >> that allow a key to be stored on primary and backup nodes with > different > > >> versions. > > Faced with the same problem during ReadRepair development. > > > > >> I suggest to force reads from a primary > > >> node inside optimistic serializable transactions. > > It looks like a proper fix (read-from-backup= ... && !read-through). > > > > >> I would suggest to revisit the > > >> read-through and TTL expiry semantics. > > Do we really need these features? > > - we have great full-featured consistent persistence, what's the point to > > use limited-featured inconsistent persistence via the external database? > > Can we get rid of this feature at 3.0? > > - Expiry policy is expensive (slowdown the cluster) and does not > guarantee > > the in-time removal, and always may be replaced by proper design (state > > machine, query, eviction, in-memory cluster restart, etc). > > > > Caching a 3rd-party persistence is one of the most widely used Ignite > use-cases, I am sure we cannot drop this. Perhaps, it makes sense to > separate the caching scenario in an explicit configuration and cache mode. > Probably, even separate cache and database cases. > > As for expiry policy - I agree that a user can always implement it on > application level, but a user can always implement transactions as well. If > we already have a feature and we can fix it properly, why not keep it? > |
>
> > Alex, thanks for monitoring various discussion threads and sharing these > problems with the rest of the dev community. > > >> As a short-term solution for [2] I suggest to force reads from a primary > > node inside optimistic serializable transactions. > > > Totally agree on this. Anyway, consistency and predictable behavior matter > most. Also, it shouldn't affect performance anyhow dramatically. > > >> I think it may be better to make read-through and entry expiry a > > partition-wide operation with the underlying cache guarantees. > > > That's a pain in the neck! As you properly mentioned, an in-memory data > grid sitting on top of an external database is still our dominating use > case. So, a partition-wide operation assumes that if a record is read from > a CacheStore than its value will be replicated to all the primary and > backup copies, right? > Right. As I noted before, I assume the CacheStore load is an expensive operation itself, so this change should not add significant degradation to the cache miss. |
Free forum by Nabble | Edit this page |