Hi Ilya, I didn't get what you are trying to say.
The problem I am facing is that my transaction is failing with TransactionOptimisticException. I do not have a reproducer project, and this does not happen frequently.

The transaction fails during the prepare phase. I had to open a debug port on all grid nodes and debug remotely to investigate. What I observed is that the transaction fails because the check in GridCacheMapEntry.checkSerializableReadVersion fails: the nodeOrder in the serialized read version (GridCacheVersion) differs from the nodeOrder in the GridCacheVersion on the respective node. This method returns false on 2 out of 4 nodes, and this is happening for a Replicated cache.

This is why I asked what nodeOrder in GridCacheVersion is and why it matters when checking read entries in a transaction context. I tried to debug nodeOrder in the Ignite code but could not understand it.

Inside the transaction I am reading and modifying both Replicated and Partitioned caches. What I observed is that this fails for the Replicated cache. As a workaround, I have moved the code that reads the Replicated cache out of the transaction block. Is it allowed to read and modify both Replicated and Partitioned caches in one transaction?

The complete exception can be found here <https://gist.github.com/61979329224e23dbaef2f63976a87a14.git>.

Thanks,
Prasad

On Thu, Feb 27, 2020 at 1:00 AM Ilya Kasnacheev <[hidden email]> wrote:

> Hello!
>
> I don't think this is userlist discussion, this logging is not aimed at
> end-user and you are not supposed to act on it.
>
> Do you have any context for us, such as reproducer project or complete
> logs?
>
> Regards,
> --
> Ilya Kasnacheev
>
> Wed, Feb 26, 2020 at 19:13, Prasad Bhalerao <[hidden email]>:
>
>> Can someone please advise?
>>
>> On Wed 26 Feb, 2020, 12:23 AM Prasad Bhalerao <[hidden email]> wrote:
>>
>>> Hi,
>>>
>>>> Ignite Version: 2.6
>>>> No of nodes: 4
>>>>
>>>> I am getting the following exception while committing a transaction.
>>>>
>>>> I am only reading the value from this cache inside the transaction,
>>>> and I am sure that the cache and the "cache entry" read are not being
>>>> modified outside this transaction on any other node.
>>>>
>>>> So I debugged the code and found out that it fails in the following
>>>> code on 2 nodes out of 4 nodes.
>>>>
>>>> GridDhtTxPrepareFuture#checkReadConflict -
>>>> GridCacheEntryEx#checkSerializableReadVersion
>>>>
>>>> The GridCacheVersion values failing the equals check are given below
>>>> for 2 different caches. I can see that it is failing because of a
>>>> change in the nodeOrder of the cache.
>>>>
>>>> 1) Can someone please explain the significance of the nodeOrder w.r.t.
>>>> Grid and cache? When does it change?
>>>> 2) How to solve this problem?
>>>>
>>>> Cache : Addons (Node 2)
>>>> serReadVer of entry read inside Transaction: GridCacheVersion
>>>> [topVer=194120123, order=4, nodeOrder=2]
>>>> version on node3: GridCacheVersion [topVer=194120123, order=4,
>>>> nodeOrder=1]
>>>>
>>>> Cache : Subscription (Node 3)
>>>> serReadVer of entry read inside Transaction: GridCacheVersion
>>>> [topVer=194120123, order=1, nodeOrder=2]
>>>> version on node2: GridCacheVersion [topVer=194120123, order=1,
>>>> nodeOrder=10]
>>>>
>>>> *EXCEPTION:*
>>>>
>>>> Caused by:
>>>> org.apache.ignite.internal.transactions.IgniteTxOptimisticCheckedException:
>>>> Failed to prepare transaction, read/write conflict
>>>>
>>>> Thanks,
>>>> Prasad
Prasad,
Since optimistic transactions do not acquire key locks until the prepare phase, it is possible that the key value is concurrently changed before the prepare commences. An optimistic exception is thrown in exactly this case and tells the user that they should retry the transaction.

Consider the following example:
Thread 1: Start tx 1, Read (key1) -> val1
Thread 2: Start tx 2, Read (key1) -> val1

Thread 1: Write (key1, val2)
Thread 1: Commit

Thread 2: Write (key1, val3)
Thread 2: Commit *Optimistic exception is thrown here since the current value of key1 is not val1 anymore*

When optimistic transactions are used, the user is expected to have retry logic. Alternatively, a pessimistic repeatable_read transaction can be used (remember that in pessimistic mode locks are acquired on first key access and released only on transaction commit).

Hope this helps,
--AG
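The interleaving in the example can be replayed with a toy versioned store. This is only a sketch of the general optimistic-check-and-retry idea, not the Ignite API: `OptimisticStore`, `OptimisticConflict`, `Versioned` and `OptimisticRetryDemo` are hypothetical names invented for illustration, and a single counter stands in for the entry version.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy versioned key-value store. Hypothetical; NOT the Ignite API. */
class OptimisticStore {
    /** Thrown when a value read by a transaction changed before its commit. */
    static class OptimisticConflict extends RuntimeException {
        OptimisticConflict(String msg) { super(msg); }
    }

    /** A value together with the version it was written with. */
    static final class Versioned {
        final String val;
        final long ver;
        Versioned(String val, long ver) { this.val = val; this.ver = ver; }
    }

    private final Map<String, Versioned> data = new HashMap<>();
    private long nextVer = 1;

    synchronized void put(String key, String val) {
        data.put(key, new Versioned(val, nextVer++));
    }

    synchronized Versioned get(String key) { return data.get(key); }

    /** Applies the write only if the key still has the version the transaction read. */
    synchronized void commit(String key, long readVer, String newVal) {
        Versioned cur = data.get(key);
        if (cur == null || cur.ver != readVer)
            throw new OptimisticConflict("read/write conflict on " + key);
        data.put(key, new Versioned(newVal, nextVer++));
    }
}

class OptimisticRetryDemo {
    /** Replays the two-thread example; returns how many attempts tx 2 needed. */
    static int run(OptimisticStore store) {
        store.put("key1", "val1");

        OptimisticStore.Versioned read1 = store.get("key1"); // tx 1 reads val1
        OptimisticStore.Versioned read2 = store.get("key1"); // tx 2 reads val1

        store.commit("key1", read1.ver, "val2"); // tx 1 commits first

        int attempts = 0;
        while (true) {
            attempts++;
            try {
                store.commit("key1", read2.ver, "val3"); // tx 2 tries to commit
                return attempts;
            } catch (OptimisticStore.OptimisticConflict e) {
                read2 = store.get("key1"); // retry: re-read to pick up the new version
            }
        }
    }

    public static void main(String[] args) {
        OptimisticStore store = new OptimisticStore();
        System.out.println("attempts=" + run(store) + ", final=" + store.get("key1").val);
    }
}
```

In this model the first commit of tx 2 is rejected and the retry succeeds, because the retry re-reads the key and so validates against the current version. A retry only helps when the re-read returns the version that the commit check will compare against.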
Hi Alexey,
The key value is not being changed concurrently; I am sure about it. The cache for which I am getting the exception holds bootstrap-style data that changes very rarely. I have added retry logic to my code, and it failed every time with the same error.

Every time it fails in GridDhtTxPrepareFuture.checkReadConflict -> GridCacheEntryEx.checkSerializableReadVersion, and I think it fails because of the change in the value of nodeOrder. This is what I observed while debugging the method. This happens intermittently.

I got the following values while inspecting the GridCacheVersion object on different nodes.

Cache : Addons (Node 2)
serReadVer of entry read inside Transaction: GridCacheVersion [topVer=194120123, order=4, nodeOrder=2]
version on node3: GridCacheVersion [topVer=194120123, order=4, nodeOrder=1]

Cache : Subscription (Node 3)
serReadVer of entry read inside Transaction: GridCacheVersion [topVer=194120123, order=1, nodeOrder=2]
version on node2: GridCacheVersion [topVer=194120123, order=1, nodeOrder=10]

Can you please answer the following questions?
1) What is the significance of the nodeOrder w.r.t. Grid and cache?
2) When does it change?
3) How is it important w.r.t. transactions?
4) Inside the transaction I am reading and modifying both Replicated and Partitioned caches. What I observed is that this fails for the Replicated cache. As a workaround, I have moved the code that reads the Replicated cache out of the transaction block. Is it allowed to read and modify both Replicated and Partitioned caches in one transaction?

Thanks,
Prasad
Hi Prasad.
nodeOrder is a local counter used for update ordering. The version is incremented when a lock is acquired for an enlisted tx entry.

Are you sure there is no concurrent transaction on this replicated cache that runs for a significant time before it commits?
How long did you retry after getting the optimistic exception?
Do you have a stable topology (no concurrently failing nodes) when this happens?
Do you have on-heap cache enabled?

--
Best regards,
Alexei Scherbakov
Prasad,
> Can you please answer following questions?
> 1) The significance of the nodeOrder w.r.t Grid and cache?

Node order is a unique integer assigned to a node when the node joins the grid. The node order is included in GridCacheVersion to disambiguate versions generated on different nodes that happen to have the same local version order.

> 2) When does it change?

Node order does not change during the node lifetime. If two versions have different node orders, it means that the versions were generated on different nodes.

> 3) How it is important w.r.t. transaction?

GridCacheVersion is used to detect concurrent read-write conflicts, as I described in the previous message, as well as for data rebalancing.

> 4) Inside transaction I am reading and modifying Replicated as well as
> Partitioned cache. What I observed is this fails for Replicated cache. As
> workaround, I have moved the code which reads Replicated cache out of
> transaction block. Is it allowed to read and modify both replicated and
> Partitioned cache i.e. use both Replicated and Partitioned?

Yes, it is perfectly fine to update replicated and partitioned caches inside one transaction.

From the debug output that you provided we can infer that the versions of both entries, one in each cache, changed before the transaction prepare phase. I would back up Alexei here:
* How do you ensure that there are no concurrent updates on the keys?
* How many retry attempts did you run?
* Are your caches in-memory with an eviction policy?
* Do you have TTL enabled for either of the caches?
* Do you have 3rd-party persistence with read-through and write-through enabled for either of the caches?
* Can you check if the issue reproduces if you set the -DIGNITE_READ_LOAD_BALANCING=false system property?

--AG
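To make the role of the node order concrete, here is a simplified stand-in for a version triple, using the Addons values reported earlier in this thread. `SimpleVersion` and `VersionCompareDemo` are hypothetical names, not Ignite's GridCacheVersion, and the sketch assumes that equality compares all three fields.

```java
import java.util.Objects;

/** Simplified stand-in for a cache version. Hypothetical; NOT Ignite's GridCacheVersion. */
final class SimpleVersion {
    final int topVer;    // topology version
    final long order;    // local version order
    final int nodeOrder; // order of the node that generated the version

    SimpleVersion(int topVer, long order, int nodeOrder) {
        this.topVer = topVer;
        this.order = order;
        this.nodeOrder = nodeOrder;
    }

    /** Assumption of this sketch: two versions are equal only if all three fields match. */
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimpleVersion)) return false;
        SimpleVersion v = (SimpleVersion) o;
        return topVer == v.topVer && order == v.order && nodeOrder == v.nodeOrder;
    }

    @Override public int hashCode() { return Objects.hash(topVer, order, nodeOrder); }

    @Override public String toString() {
        return "[topVer=" + topVer + ", order=" + order + ", nodeOrder=" + nodeOrder + "]";
    }
}

class VersionCompareDemo {
    /** True when the versions agree on everything except the generating node. */
    static boolean differOnlyByNode(SimpleVersion a, SimpleVersion b) {
        return a.topVer == b.topVer && a.order == b.order && a.nodeOrder != b.nodeOrder;
    }

    public static void main(String[] args) {
        // Values reported for the Addons cache in this thread.
        SimpleVersion serReadVer = new SimpleVersion(194120123, 4, 2);
        SimpleVersion current = new SimpleVersion(194120123, 4, 1);

        System.out.println(serReadVer + " equals " + current + ": " + serReadVer.equals(current));
        System.out.println("differ only by node: " + differOnlyByNode(serReadVer, current));
    }
}
```

Under that assumption, the two Addons versions are unequal even though topVer and order match, which is consistent with the entry having been versioned independently on two different nodes.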
Hi,
* How do you ensure that there are no concurrent updates on the keys?
[Prasad]: The cache for which it is failing is a kind of bootstrap cache that changes very rarely. I made sure that I was the only one working on this system while debugging the issue. The cache for which it is failing is a REPLICATED cache. Read-through is enabled and write-through is disabled. Whenever I get an update message for these caches from a different system, I update the entry in my cache using the following steps.
1. First remove the entry from the cache using the cache.remove() method.
2. Read the entry from the cache using the cache.get() method, which reads the data from the Oracle DB using the read-through approach.

* How many retry attempts did you run?
[Prasad]: I retried the transaction 8-10 times.

* Are your caches in-memory with eviction policy?
[Prasad]: Yes, the caches are in-memory but without an eviction policy. I am using Oracle DB as third-party persistence. Ignite native persistence is disabled.

* Do you have TTL enabled for either of the caches?
[Prasad]: No, I have not set TTL.

* Do you have a 3rd-party persistence and read-through and write-through enabled for either of the caches?
[Prasad]: Yes, I have 3rd-party persistence. I have read-through caches, but for all read-through caches write-through is disabled. Write-through is disabled for some caches because I am not the owner of these tables. I also have write-through caches, but for all such caches read-through is disabled. At this moment I do not have any cache where both read-through and write-through are enabled. I reload all my caches using cache loaders.

* Can you check if the issue reproduces if you set -DIGNITE_READ_LOAD_BALANCING=false system property?
[Prasad]: Sure, I will try to reproduce this using this parameter. But the problem is that this happens intermittently.

As per the following code, serReadVer is the GridCacheVersion from the transaction coordinator node, which is compared with the GridCacheVersion on the other nodes. As per your explanation, nodeOrder is a unique number assigned to each node that joins the grid, so each node in the cluster will have a different nodeOrder. If this is the case, then "serReadVer.equals(ver)" will always return false. Please correct me if I am wrong. I am just trying to understand the code. This will help me to identify the issue.

    public boolean checkSerializableReadVersion(GridCacheVersion serReadVer)
        throws GridCacheEntryRemovedException {
        lockEntry();

        try {
            checkObsolete();

            if (!serReadVer.equals(ver)) {
                boolean empty = isStartVersion() || deletedUnlocked();

                if (serReadVer.equals(IgniteTxEntry.SER_READ_EMPTY_ENTRY_VER))
                    return empty;
                else if (serReadVer.equals(IgniteTxEntry.SER_READ_NOT_EMPTY_VER))
                    return !empty;

                return false;
            }

            return true;
        }
        finally {
            unlockEntry();
        }
    }

Thanks,
Prasad
Prasad,
The current version in the entry is checked against the version which was read from the very same entry, so in the absence of concurrent updates the version will be the same. From your description, I think there might be a concurrent read for the key that you clear, which loads the value on the primary node with a different version. Then the read happening inside the transaction always reads the value from a backup, which leads to a permanent version mismatch; I reproduced this scenario locally. Setting -DIGNITE_READ_LOAD_BALANCING=false fixes the issue at the price of disabling read load balancing. I will create a ticket for the issue shortly; perhaps someone from the community will pick it up.
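The scenario described above can be illustrated with a deliberately simplified model. It is a sketch under stated assumptions, not Ignite internals: `ReadBalancingModel`, `Node` and the `addon-1` key are hypothetical, each node stamps a read-through load with its own locally generated version, transactional reads with load balancing enabled are assumed to always hit the backup, and the prepare check is assumed to compare against the primary's entry.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the primary/backup version mismatch. Hypothetical; NOT Ignite internals. */
class ReadBalancingModel {
    static final class Node {
        final int nodeOrder;
        private long localOrder;
        private final Map<String, String> versions = new HashMap<>();

        Node(int nodeOrder) { this.nodeOrder = nodeOrder; }

        void remove(String key) { versions.remove(key); }

        /** Read-through: a missing entry is loaded and stamped with a locally generated version. */
        String readVersion(String key) {
            return versions.computeIfAbsent(
                key, k -> "order=" + (++localOrder) + ", nodeOrder=" + nodeOrder);
        }
    }

    /** One transaction attempt: read from readFrom, then validate against the primary. */
    static boolean attempt(Node primary, Node readFrom, String key) {
        String serReadVer = readFrom.readVersion(key);
        return serReadVer.equals(primary.readVersion(key));
    }

    /** Returns how many of the given number of attempts pass the prepare check. */
    static int successes(boolean readLoadBalancing, int retries) {
        Node primary = new Node(1);
        Node backup = new Node(2);
        String key = "addon-1";

        // Update path from the thread: remove the entry, then let reads reload it.
        primary.remove(key);
        backup.remove(key);
        primary.readVersion(key); // a concurrent read loads the value on the primary
        backup.readVersion(key);  // another read loads it on the backup

        // Assumption: with balancing on, the transactional read is served by the backup.
        Node readFrom = readLoadBalancing ? backup : primary;

        int ok = 0;
        for (int i = 0; i < retries; i++)
            if (attempt(primary, readFrom, key))
                ok++;
        return ok;
    }

    public static void main(String[] args) {
        System.out.println("balancing on:  " + successes(true, 10) + " of 10 attempts pass");
        System.out.println("balancing off: " + successes(false, 10) + " of 10 attempts pass");
    }
}
```

In this model retries never help while reads are served by the backup, because each retry re-reads the same mismatching version, whereas reading from the primary always passes. That matches the reported behaviour of 8-10 failed retries and the effect of the system property, but the model says nothing about how Ignite actually chooses the node to read from.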