Denis Chudov created IGNITE-12760:
-------------------------------------

             Summary: Prevent AssertionError on message unmarshalling, when classLoaderId contains id of node that already left
                 Key: IGNITE-12760
                 URL: https://issues.apache.org/jira/browse/IGNITE-12760
             Project: Ignite
          Issue Type: Bug
            Reporter: Denis Chudov
            Assignee: Denis Chudov


The following assertion error triggers the failure handler and crashes the node. It can possibly crash the whole cluster.
{code:java}
2020-02-18 14:34:09.775[ERROR][query-#146129%DPL_GRID%DplGridNodeName%][o.a.i.i.p.cache.GridCacheIoManager] Failed to process message [senderId=727757ed-4ad4-4779-bda9-081525725cce, msg=GridCacheQueryRequest [id=178, cacheName=com.sbt.tokenization.data.entity.KEKEntity_DPL_union-module, type=SCAN, fields=false, clause=null, clsName=null, keyValFilter=null, rdc=null, trans=null, pageSize=1024, incBackups=false, cancel=false, incMeta=false, all=false, keepBinary=true, subjId=727757ed-4ad4-4779-bda9-081525725cce, taskHash=0, part=-1, topVer=AffinityTopologyVersion [topVer=97, minorTopVer=0], sendTimestamp=-1, receiveTimestamp=-1, super=GridCacheIdMessage [cacheId=-1129073400, super=GridCacheMessage [msgId=179, depInfo=GridDeploymentInfoBean [clsLdrId=c32670e3071-d30ee64b-0833-45d4-abbe-fb6282669caa, depMode=SHARED, userVer=0, locDepOwner=false, participants=null], lastAffChangedTopVer=AffinityTopologyVersion [topVer=8, minorTopVer=6], err=null, skipPrepare=false]]]]
java.lang.AssertionError: null
    at org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager$CachedDeploymentInfo.<init>(GridCacheDeploymentManager.java:918)
    at org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager$CachedDeploymentInfo.<init>(GridCacheDeploymentManager.java:889)
    at org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager.p2pContext(GridCacheDeploymentManager.java:422)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.unmarshall(GridCacheIoManager.java:1576)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:584)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:386)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:312)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:102)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:301)
    at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1565)
    at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1189)
    at org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:130)
    at org.apache.ignite.internal.managers.communication.GridIoManager$8.run(GridIoManager.java:1092)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
There is no reliable reproducer yet, but it seems we should guard against this situation in general, as follows:

1) Check the correctness of the message before it is sent, inside GridCacheDeploymentManager#prepare. If the corresponding class loader is available on the local node, we can try to fix the message by replacing the wrong class loader with the local one.

2) Log suspicious deployments that we receive from GridDeploymentManager#deploy; we may have obsolete deployments in caches.

3) Possibly remove this assertion. We should have this class on the sender node and use it as the class loader id; if we don't, we will receive an exception on finishUnmarshall (Failed to peer load class) and can try to handle the situation with GridCacheIoManager#processFailedMessage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
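
As a rough illustration of proposal 1 above: in the logged message, senderId is 727757ed-4ad4-4779-bda9-081525725cce, while the UUID part of clsLdrId is d30ee64b-0833-45d4-abbe-fb6282669caa and participants=null. The sketch below assumes the failing check is roughly "the sender owns the class loader, or participants are listed"; that is an assumption, not a quote of the Ignite source. All types and names here (DeploymentCheck, LoaderId, DepInfo, prepare, isConsistent) are hypothetical stand-ins, not Ignite classes.

```java
import java.util.Set;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch of a pre-send check; none of these types are Ignite classes. */
final class DeploymentCheck {
    /** Class loader id: UUID of the node that owns the loader plus a local counter. */
    public static final class LoaderId {
        public final UUID nodeId;
        public final long localId;

        public LoaderId(UUID nodeId, long localId) {
            this.nodeId = nodeId;
            this.localId = localId;
        }
    }

    /** Deployment info attached to an outgoing message; participants may be null. */
    public static final class DepInfo {
        public final LoaderId ldrId;
        public final Map<UUID, LoaderId> participants;

        public DepInfo(LoaderId ldrId, Map<UUID, LoaderId> participants) {
            this.ldrId = ldrId;
            this.participants = participants;
        }
    }

    /** Assumed receiver-side condition: the sender owns the loader, or participants are listed. */
    public static boolean isConsistent(UUID senderId, DepInfo info) {
        return senderId.equals(info.ldrId.nodeId) || info.participants != null;
    }

    /**
     * Validates deployment info before sending. If the loader's owner has left
     * or the info is inconsistent, and a local loader is available, the local
     * loader is substituted; otherwise the info is returned unchanged.
     */
    public static DepInfo prepare(UUID locNodeId, DepInfo info, Set<UUID> aliveNodes, LoaderId locLdr) {
        boolean ownerAlive = aliveNodes.contains(info.ldrId.nodeId);

        if (ownerAlive && isConsistent(locNodeId, info))
            return info;

        System.err.println("Suspicious class loader id, owner node: " + info.ldrId.nodeId);

        // Replace the wrong loader with the local one when we have it.
        return locLdr != null ? new DepInfo(locLdr, null) : info;
    }
}
```

With the ids from the log, a message carrying the stale loader id and participants=null would be rewritten to carry the local loader id, so the assumed receiver-side condition holds. When no local loader is available the message goes out unchanged, and the receiver would still need proposal 3 or similar handling.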