Question about rebuilding caches, timeouts etc.

Ognen Duzlevski
Hello all,

I tried posting to the user list but I am still not seeing the email after
an hour.

I have a 5 node cluster where I "lost" a node temporarily (Amazon reported
a hardware error so my Ops guys shut down the instance and brought a new
one back up).

I ran the same ignite.sh configuration on the new node, expecting it to
join the cluster - however, I am seeing the following in the logs (see
below). In addition, I cannot access the caches anymore from my code -
connecting to a cache via getOrCreateCache() just hangs and eventually
times out. The cluster still has 4 members so I am not quite sure what is
going on. To add to this - I can run cache -scan on the caches from visor and
all the information is still there; however, it is inaccessible from my code
(with client mode on or off, doesn't matter). I am baffled.

Thanks!
Ognen

[23:10:38,923][WARNING][grid-timeout-worker-#33%null%][GridDhtPartitionsExchangeFuture]
Retrying preload partition exchange due to timeout [done=false,
dummy=false, exchId=GridDhtPartitionExchangeId
[topVer=AffinityTopologyVersion [topVer=1848, minorTopVer=0],
nodeId=9e648fd3, evt=NODE_JOINED], rcvdIds=[], rmtIds=[e5b581b3, bd33def3,
f7cc4da6, 8eda3172, efef2202], remaining=[e5b581b3, bd33def3, f7cc4da6,
8eda3172, efef2202], init=true, initFut=true, ready=true, replied=false,
added=true, oldest=e5b581b3, oldestOrder=1, evtLatch=0, locNodeOrder=1848,
locNodeId=9e648fd3-7c84-4261-93e0-916275a0a9ae]
[23:10:53,699][WARNING][main][GridCachePartitionExchangeManager] Failed to
wait for initial partition map exchange. Possible reasons are:
  ^-- Transactions in deadlock.
  ^-- Long running transactions (ignore if this is the case).
  ^-- Unreleased explicit locks.
[23:10:53,926][WARNING][grid-timeout-worker-#33%null%][GridDhtPartitionsExchangeFuture]
Retrying preload partition exchange due to timeout [done=false,
dummy=false, exchId=GridDhtPartitionExchangeId
[topVer=AffinityTopologyVersion [topVer=1848, minorTopVer=0],
nodeId=9e648fd3, evt=NODE_JOINED], rcvdIds=[], rmtIds=[e5b581b3, bd33def3,
f7cc4da6, 8eda3172, efef2202], remaining=[e5b581b3, bd33def3, f7cc4da6,
8eda3172, efef2202], init=true, initFut=true, ready=true, replied=false,
added=true, oldest=e5b581b3, oldestOrder=1, evtLatch=0, locNodeOrder=1848,
locNodeId=9e648fd3-7c84-4261-93e0-916275a0a9ae]
[23:11:08,929][WARNING][grid-timeout-worker-#33%null%][GridDhtPartitionsExchangeFuture]
Retrying preload partition exchange due to timeout [done=false,
dummy=false, exchId=GridDhtPartitionExchangeId
[topVer=AffinityTopologyVersion [topVer=1848, minorTopVer=0],
nodeId=9e648fd3, evt=NODE_JOINED], rcvdIds=[], rmtIds=[e5b581b3, bd33def3,
f7cc4da6, 8eda3172, efef2202], remaining=[e5b581b3, bd33def3, f7cc4da6,
8eda3172, efef2202], init=true, initFut=true, ready=true, replied=false,
added=true, oldest=e5b581b3, oldestOrder=1, evtLatch=0, locNodeOrder=1848,
locNodeId=9e648fd3-7c84-4261-93e0-916275a0a9ae]
[....]
[repeated many, many times]
[....]
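
A rough sketch for reading the repeated warning above, not a diagnosis. It assumes, from the field names only (nothing in this thread confirms it), that rcvdIds lists the remote nodes that have already answered the exchange and remaining lists the ones still awaited. The helper names id_list and summarize are made up for illustration; the sample is the first log entry with the mail line wraps rejoined.

```python
import re

# First warning from the log above, with the hard line wraps rejoined.
SAMPLE = (
    "[23:10:38,923][WARNING][grid-timeout-worker-#33%null%]"
    "[GridDhtPartitionsExchangeFuture] "
    "Retrying preload partition exchange due to timeout [done=false, "
    "dummy=false, exchId=GridDhtPartitionExchangeId "
    "[topVer=AffinityTopologyVersion [topVer=1848, minorTopVer=0], "
    "nodeId=9e648fd3, evt=NODE_JOINED], rcvdIds=[], rmtIds=[e5b581b3, bd33def3, "
    "f7cc4da6, 8eda3172, efef2202], remaining=[e5b581b3, bd33def3, f7cc4da6, "
    "8eda3172, efef2202], init=true, initFut=true, ready=true, replied=false, "
    "added=true, oldest=e5b581b3, oldestOrder=1, evtLatch=0, locNodeOrder=1848, "
    "locNodeId=9e648fd3-7c84-4261-93e0-916275a0a9ae]"
)


def id_list(entry, field):
    """Return the IDs in a `field=[a, b, c]` list, or [] if empty or absent."""
    match = re.search(r"\b" + re.escape(field) + r"=\[([^\]]*)\]", entry)
    if match is None:
        return []
    return [item.strip() for item in match.group(1).split(",") if item.strip()]


def summarize(entry):
    """Collect the topology version and the three node ID lists from one entry."""
    top_ver = re.search(r"topVer=(\d+)", entry)
    return {
        "top_ver": int(top_ver.group(1)) if top_ver else None,
        "received": id_list(entry, "rcvdIds"),    # assumed: nodes that replied
        "remote": id_list(entry, "rmtIds"),       # assumed: all remote nodes
        "remaining": id_list(entry, "remaining"),  # assumed: nodes still awaited
    }


if __name__ == "__main__":
    info = summarize(SAMPLE)
    print("topology version:", info["top_ver"])
    print("replied:", len(info["received"]), "of", len(info["remote"]))
    print("still waiting on:", ", ".join(info["remaining"]))
```

Under that reading, the three entries shown are identical apart from the timestamp: no remote node has replied, and the same five IDs stay in remaining on every retry.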

Re: Question about rebuilding caches, timeouts etc.

dsetrakyan
Ognen,

I see your post on the user list:
http://apache-ignite-users.70518.x6.nabble.com/Node-failing-with-weird-errors-td301.html

I also got an email. Are you sure it didn't end up in your spam folder?

D.

On Tue, May 12, 2015 at 12:21 AM, Ognen Duzlevski <[hidden email]> wrote:

> Hello all,
>
> I tried posting to the user list but I am still not seeing the email after
> an hour.

Re: Question about rebuilding caches, timeouts etc.

Ognen Duzlevski
Dmitriy, nope - it is not in the spam and I still have not seen it :)

On Tue, May 12, 2015 at 2:52 AM, Dmitriy Setrakyan <[hidden email]> wrote:

> Ognen,
>
> I see your post on the user list:
>
> http://apache-ignite-users.70518.x6.nabble.com/Node-failing-with-weird-errors-td301.html
>
> I also got an email. Are you sure it didn't end up in your spam folder?
>
> D.

Re: Question about rebuilding caches, timeouts etc.

dsetrakyan
I know some people probably subscribed to the user list recently. Is there
anyone else in the community having problems with the list working or not
receiving emails?

I have just subscribed again from my other personal account and got through
the ezmlm subscription process OK.

Ognen, can you please subscribe again and see if it starts working for you?

D.

On Tue, May 12, 2015 at 8:06 AM, Ognen Duzlevski <[hidden email]> wrote:

> Dmitriy, nope - it is not in the spam and I still have not seen it :)

Re: Question about rebuilding caches, timeouts etc.

yzhdanov
I believe I have subscribed successfully as well. At least I have received
confirmations.

--Yakov

2015-05-12 16:26 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:

> I know some people probably subscribed to the user list recently. Is there
> anyone else in the community having problems with the list working or not
> receiving emails?
>
> I have again just subscribed from my other personal account and got through
> the ezmlm subscription process OK.
>
> Ognen, Can you please subscribe again and see if it starts working for you?
>
> D.

Re: Question about rebuilding caches, timeouts etc.

Valentin Kulichenko
User list works for me.

Ognen, I received your message as well.

--
Val

On Tue, May 12, 2015 at 10:31 AM, Yakov Zhdanov <[hidden email]> wrote:

> I believe I have subscribed successfully as well. At least I have received
> confirmations.
>
> --Yakov

Re: Question about rebuilding caches, timeouts etc.

Ognen Duzlevski
I started receiving them today too - thanks :)



On Tue, May 12, 2015 at 1:02 PM, Valentin Kulichenko <[hidden email]> wrote:

> User list works for me.
>
> Ognen, I received your message as well.
>
> --
> Val