Server Node comes down with : (err) Failed to notify listener: GridDhtTxPrepareFuture Error

VeenaMithare
Hi Team,

We have a 3-node server cluster.

A 4th node joins as a client with a continuous query on Table A
(Transaction_mode = transactional).

Now, if I bring the client down and issue an update to Table A within the
failureDetectionTimeout (30000), I get the following error, and this error
brings the server down:

"(err) Failed to notify listener: GridDhtTxPrepareFuture Error"
===================================
Basically, the server tries to update the record in Table A and then tries
to notify the client, since it had registered a continuous query for Table A.
But since the client node has been brought down, the remote filter factory
lambda is undeployed. Hence the server is no longer able to complete the
transaction.

This also brings the server down.

How can I resolve this issue?
=======================================
Please find the complete stack trace for this error :

[12:14:12] (err) Failed to notify listener: GridDhtTxPrepareFuture
[futId=0a69e79c071-93faf34d-a776-4166-9f3b-4b5a0f54b8f9, err=null,
replied=1, mapped=1, req=GridNearTxPrepareRequest
[futId=4250e79c071-51438f4f-c061-45f7-b34e-57c90f2055e9, miniId=1,
topVer=AffinityTopologyVersion [topVer=7, minorTopVer=0],
subjId=da486d0b-36a1-43d4-b05b-47d126fd880e, taskNameHash=0,
flags=[implicitSingle], super=GridDistributedTxPrepareRequest [threadId=382,
concurrency=OPTIMISTIC, isolation=READ_COMMITTED, writeVer=GridCacheVersion
[topVer=195408427, order=1583928843624, nodeOrder=1], timeout=1000,
reads=null, writes=[IgniteTxEntry [key=ABCKEY [idHash=1413504800,
hash=-1419375634, VALUETYPE=somevaluetype, NAME=TEST4375234],
cacheId=-1512899836, txKey=IgniteTxKey [key=ABCKEY [idHash=1413504800,
hash=-1419375634, VALUETYPE=somevaluetype, NAME=TEST4375234],
cacheId=-1512899836], val=[op=CREATE, val=ABC [idHash=108633195,
hash=-965148880, ACTIVE=true, MODIFICATIONDATE=2020-02-03 18:29:03.501,
VALUETYPE=null, SCHEMAREF=null, VALUE=DEV, MACHINENAME=null, COMMENT=null,
NAME=null, APPLICATIONNAME=null, SCHEMANAME=null, KEYNAME=ENVIRONMENT,
USERNAME=null, INTERNALVERSION=null, MODIFICATIONTYPE=null]],
prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null],
entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null,
explicitVer=null, dhtVer=null,
filters=[o.a.i.i.processors.cache.CacheEntrySerializablePredicate@388c822f],
filtersPassed=false, filtersSet=false, entry=GridDhtCacheEntry [rdrs=[],
part=136, super=GridDistributedCacheEntry [super=GridCacheMapEntry
[key=ABCKEY [idHash=1413504800, hash=-1419375634, VALUETYPE=somevaluetype,
NAME=TEST4375234], val=null, ver=GridCacheVersion [topVer=195408427,
order=1583928843625, nodeOrder=4], hash=-1419375634,
extras=GridCacheObsoleteEntryExtras [obsoleteVer=GridCacheVersion
[topVer=2147483647, order=0, nodeOrder=0]], flags=2]]], prepared=1,
locked=false, nodeId=null, locMapped=false, expiryPlc=null,
transferExpiryPlc=false, flags=2, partUpdateCntr=0, serReadVer=null,
xidVer=null]], dhtVers=null, txSize=0, plc=2,
txState=IgniteTxImplicitSingleStateImpl [init=true, recovery=false],
flags=onePhase|last, super=GridDistributedBaseMessage [ver=GridCacheVersion
[topVer=195408427, order=1583928843624, nodeOrder=1], committedVers=null,
rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0]]]],
trackable=true, nearMiniId=1, last=true, retVal=false, ret=GridCacheReturn
[v=null, cacheObj=null, success=true, invokeRes=false, loc=false,
cacheId=0], lockKeys=[], forceKeysFut=null, locksReady=true, invoke=false,
timeoutObj=PrepareTimeoutObject [timeout=1000], xid=GridCacheVersion
[topVer=195408427, order=1583928843625, nodeOrder=4],
innerFuts=[[node=da486d0b-36a1-43d4-b05b-47d126fd880e, loc=false,
done=true]], super=GridCompoundFuture
[rdc=o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture$1@73415bf,
initFlag=1, lsnrCalls=1, done=true, cancelled=false, err=null,
futs=[true]]]java.lang.NoClassDefFoundError:
com/companyname/abc/configstore/helper/ContinuousQueryHelper
        at
com.companyname.abc.configstore.helper.ContinuousQueryHelper$ConfigStoreTableRemoteFilterFactory$1.evaluate(ContinuousQueryHelper.java:293)
        at
org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.filter(CacheContinuousQueryHandler.java:833)
        at
org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$2.onEntryUpdated(CacheContinuousQueryHandler.java:422)
        at
org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:426)
        at
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerSet(GridCacheMapEntry.java:1584)
        at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userCommit(IgniteTxLocalAdapter.java:741)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocalAdapter.localFinish(GridDhtTxLocalAdapter.java:796)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.localFinish(GridDhtTxLocal.java:584)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.finishTx(GridDhtTxLocal.java:463)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.commitDhtLocalAsync(GridDhtTxLocal.java:516)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.commitAsync(GridDhtTxLocal.java:525)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onDone(GridDhtTxPrepareFuture.java:758)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onDone(GridDhtTxPrepareFuture.java:110)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:453)
        at
org.apache.ignite.internal.util.future.GridCompoundFuture.checkComplete(GridCompoundFuture.java:285)
        at
org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:144)
        at
org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:45)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:385)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:349)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:337)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:497)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:476)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:453)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture$MiniFuture.onResult(GridDhtTxPrepareFuture.java:1948)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onResult(GridDhtTxPrepareFuture.java:572)
        at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processDhtTxPrepareResponse(IgniteTxHandler.java:798)
        at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$500(IgniteTxHandler.java:119)
        at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$6.apply(IgniteTxHandler.java:229)
        at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$6.apply(IgniteTxHandler.java:227)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
        at
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
        at
org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:505)
        at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: Failed to peer load class
[class=com.companyname.abc.configstore.helper.ContinuousQueryHelper,
nodeClsLdrs={fb2b9513-a763-488a-86b8-39d80e18427f=35f0489c071-fb2b9513-a763-488a-86b8-39d80e18427f},
parentClsLoader=sun.misc.Launcher$AppClassLoader@73d16e93]
        at
org.apache.ignite.internal.managers.deployment.GridDeploymentClassLoader.sendClassRequest(GridDeploymentClassLoader.java:661)
        at
org.apache.ignite.internal.managers.deployment.GridDeploymentClassLoader.findClass(GridDeploymentClassLoader.java:508)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at
org.apache.ignite.internal.managers.deployment.GridDeploymentClassLoader.loadClass(GridDeploymentClassLoader.java:440)
        ... 42 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to send
message (node may have left the grid or TCP connection cannot be established
due to firewall issues) [node=TcpDiscoveryNode
[id=fb2b9513-a763-488a-86b8-39d80e18427f, addrs=[0:0:0:0:0:0:0:1, x.x.x.100,
127.0.0.1], sockAddrs=[machinename.companyname.LOCAL/x.x.x.100:0,
/0:0:0:0:0:0:0:1:0, /127.0.0.1:0], discPort=0, order=7, intOrder=5,
lastExchangeTime=1583928842125, loc=false, ver=2.7.6#20190911-sha1:21f7ca41,
isClient=true], topic=TOPIC_CLASSLOAD, msg=GridDeploymentRequest
[rsrcName=com/companyname/abc/configstore/helper/ContinuousQueryHelper.class,
ldrId=35f0489c071-fb2b9513-a763-488a-86b8-39d80e18427f, isUndeploy=false,
nodeIds=null], policy=1]
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1667)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1731)
        at
org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.sendResourceRequest(GridDeploymentCommunication.java:454)
        at
org.apache.ignite.internal.managers.deployment.GridDeploymentClassLoader.sendClassRequest(GridDeploymentClassLoader.java:601)
        ... 45 more
Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send
message to remote node: TcpDiscoveryNode
[id=fb2b9513-a763-488a-86b8-39d80e18427f, addrs=[0:0:0:0:0:0:0:1, x.x.x.100,
127.0.0.1], sockAddrs=[machinename.companyname.LOCAL/x.x.x.100:0,
/0:0:0:0:0:0:0:1:0, /127.0.0.1:0], discPort=0, order=7, intOrder=5,
lastExchangeTime=1583928842125, loc=false, ver=2.7.6#20190911-sha1:21f7ca41,
isClient=true]
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2747)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2672)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1656)
        ... 48 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect
to node (is node still alive?). Make sure that each ComputeTask and cache
Transaction has a timeout set in order to prevent parties from waiting
forever in case of network issues
[nodeId=fb2b9513-a763-488a-86b8-39d80e18427f, addrs=[/127.0.0.1:47102,
/0:0:0:0:0:0:0:1:47102, machinename.companyname.LOCAL/x.x.x.100:47102]]
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3459)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2987)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2870)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.access$6000(TcpCommunicationSpi.java:271)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:4489)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:4294)
        at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$5.body(TcpCommunicationSpi.java:2237)
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
        Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to
connect to address [addr=/127.0.0.1:47102, err=Connection refused: no
further information]
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3462)
                ... 8 more
        Caused by: java.net.ConnectException: Connection refused: no further
information
                at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
                at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
                at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3299)
                ... 8 more
        Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to
connect to address [addr=/0:0:0:0:0:0:0:1:47102, err=Connection refused: no
further information]
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3462)
                ... 8 more
        Caused by: java.net.ConnectException: Connection refused: no further
information
                at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
                at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
                at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3299)
                ... 8 more
        Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to
connect to address [addr=machinename.companyname.LOCAL/x.x.x.100:47102,
err=Connection refused: no further information]
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3462)
                ... 8 more
        Caused by: java.net.ConnectException: Connection refused: no further
information
                at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
                at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
                at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)
                at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3299)



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

Re: Server Node comes down with : (err) Failed to notify listener: GridDhtTxPrepareFuture Error

VeenaMithare
Hi,

Did anyone get a chance to look at this?
Summary of the issue I am facing:

We have a 3-node server cluster.

A 4th node joins as a client with a continuous query on Table A
(Transaction_mode = transactional).

Now, if I bring the client down and issue an update to Table A within the
failureDetectionTimeout (30000), I get the following error, and this error
brings the server down:

"(err) Failed to notify listener: GridDhtTxPrepareFuture Error"
===================================
Basically, the server tries to update the record in Table A and then tries
to notify the client, since it had registered a continuous query for Table A.
But since the client node has been brought down, the remote filter factory
lambda is undeployed. Hence the server is no longer able to complete the
transaction.

This also brings the server down.

regards,
Veena.




Re: Server Node comes down with : (err) Failed to notify listener: GridDhtTxPrepareFuture Error

Ilya Kasnacheev
Hello!

But why did the node go down?

Was it due to the failure handler? An unhandled exception in a critical
thread? Anything else? Did you find any workaround?

I have not heard of Continuous Query issues such as the one you are
describing, and I suggest filing an IGNITE ticket with details.

Regards,
--
Ilya Kasnacheev


On Thu, Mar 12, 2020 at 16:39, VeenaMithare <[hidden email]> wrote:

> Hi,
>
> Did anyone get a chance to look at this?
> Summary of the issue I am facing:
> [...]

Re: Server Node comes down with : (err) Failed to notify listener: GridDhtTxPrepareFuture Error

Denis Mekhanikov
Thanks for the report!

The issue here is that the remote filter for a continuous query is loaded
using peer class loading, and other classes that the remote filter depends
on can be loaded lazily while it runs.
Loading each dependency class involves going to the node that the
originating class was loaded from and asking that node to send the missing
class over the network.
The problems begin when this node is no longer in the cluster but the
continuous query hasn't been undeployed yet.
A server sends a class request to a node that is unavailable but hasn't
been removed from the topology yet, because the failure detection timeout
hasn't elapsed.
This leads to the NoClassDefFoundError that you observe in the logs.

The biggest issue here is that this exception triggers the failure handler,
which brings the whole node down.
I would expect only the one request to fail, not the whole node.

As a temporary solution, you can stop relying on peer class loading for
continuous queries and add the remote filter code to the classpath
of the server nodes.
This way no lazy class loading is performed over the network, since
all the classes are available locally.
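
One way to confirm that the workaround above is in place is to check, on each server node, that the filter classes resolve from the node's own classpath. The following is only a minimal sketch using plain JDK calls, not part of Ignite: the class `LocalClasspathCheck` and the method `missingClasses` are hypothetical names, and the class name checked in `main` is simply the one that appears in the stack trace earlier in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which class names a given class loader cannot resolve.
class LocalClasspathCheck {

    /** Returns the subset of classNames that the loader cannot find or link. */
    public static List<String> missingClasses(ClassLoader loader, String... classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: only locate the class, do not run its static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers NoClassDefFoundError for a missing dependency.
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: this runs with the same classpath as the server node.
        // Replace the name below with your own remote filter (factory) classes.
        List<String> missing = missingClasses(
            LocalClasspathCheck.class.getClassLoader(),
            "com.companyname.abc.configstore.helper.ContinuousQueryHelper");

        if (!missing.isEmpty())
            System.err.println("Not on the local classpath: " + missing);
    }
}
```

If the check reports a filter class or one of its outer/helper classes as missing, the server would presumably still need peer class loading to run the filter, so the situation described above could still occur.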

Denis

On Fri, Mar 13, 2020 at 20:39, VeenaMithare <[hidden email]> wrote:

> Raised this Jira:
> https://issues.apache.org/jira/browse/IGNITE-12784
>
> Observed in 2.7.6. Unable to easily test in 2.8.0 because of other issues,
> one of them being:
>
> http://apache-ignite-users.70518.x6.nabble.com/2-8-0-JDBC-Thin-Client-Unable-to-load-the-tables-via-DBeaver-td31681.html
> Please note this happens

Re: Server Node comes down with : (err) Failed to notify listener: GridDhtTxPrepareFuture Error

VeenaMithare
> As a temporary solution you can stop relying on peer class loading for
> continuous queries and provide the code of remote filters to the classpath
> of server nodes.

Yes, I was thinking of a solution along similar lines. Thank you for the
reply.



