Partitioned cache and node failures

Partitioned cache and node failures

Ognen Duzlevski
In a partitioned cache (or set of partitioned caches) - does a single node
failure mean all of the cache(s) become unavailable?

I am seeing a situation where I cannot access any of the caches (using
getOrCreateCache) - all my code just "hangs".

The interesting thing is that visor can see all the caches and their
contents.

What is so special about visor?

I would appreciate it if someone could answer any of these questions (I can
provide more info), as I am evaluating Ignite for use in a data
science/analytics setup :-)

Thanks!
Ognen

Re: Partitioned cache and node failures

yzhdanov
Can you please file a ticket and share your sample application with us?

If that is not possible, please attach verbose logs and thread dumps from
all the nodes, taken after the issue is reproduced.

Thanks!

--Yakov
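
[Editor's sketch] For readers unsure how to collect the thread dumps requested above: running `jstack <pid>` against each node's JVM is the usual route. The Java sketch below shows an in-process alternative using only the standard `java.lang.management` API. The class and method names (`ThreadDumpSketch`, `dumpAllThreads`) are made up for illustration and are not part of Ignite; treat this as a minimal sketch, not as the procedure the Ignite team prescribes.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSketch {
    /** Returns a plain-text dump of every live thread in this JVM with its full stack. */
    public static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // Ask for lock details only where the JVM supports reporting them.
        ThreadInfo[] infos = threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),
                threads.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append("\" state=")
               .append(info.getThreadState()).append('\n');
            // ThreadInfo.toString() truncates deep stacks, so print each frame explicitly.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

A dump taken while the application is hung should show which call the blocked thread is waiting in, which is the information being asked for here.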

2015-05-12 15:30 GMT+03:00 Ognen Duzlevski <[hidden email]>:

> In a partitioned cache (or set of partitioned caches) - does a single node
> failure mean all of the cache(s) become unavailable?
>
> I am seeing a situation where I cannot access any of the caches (using
> getOrCreateCache) - all my code just "hangs".
>
> The interesting thing is that visor can see all the caches and their
> contents.
>
> What is so special about visor?
>
> I would appreciate it if someone could answer any of these questions (I can
> provide more info), as I am evaluating Ignite for use in a data
> science/analytics setup :-)
>
> Thanks!
> Ognen
>

Re: Partitioned cache and node failures

dsetrakyan
Ognen,

It sounds to me like this is the same issue you had recently with the cloud
node crashing due to hardware failure. If that is the case, then it sounds
like a firewall issue to me. Are you sure there is no firewall set up
between the nodes, and that they are all deployed in the same availability
zone?

D.

On Tue, May 12, 2015 at 1:33 PM, Yakov Zhdanov <[hidden email]> wrote:

> Can you please file a ticket and share your sample application with us?
>
> If that is not possible, please attach verbose logs and thread dumps from
> all the nodes, taken after the issue is reproduced.
>
> Thanks!
>
> --Yakov

Re: Partitioned cache and node failures

Ognen Duzlevski
Dmitriy,

It is not a firewall issue. However, the hardware crash probably has
something to do with it.

Along those lines: should a crash of one node (out of 5) hosting a few
partitioned caches be expected to affect the availability of all the caches?
The strange thing is that visor was able to show them all, but acquiring them
through a Scala app using getOrCreateCache() just hung. I ended up "rigging"
visor with the ability to dump cache -scan results to a file. That let me
salvage all my data, after which I restarted the cluster.

Certainly pretty clumsy ;)

Ognen
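
[Editor's sketch] The hang described above blocks the calling thread indefinitely. One generic way to keep an application responsive in that situation is to run the blocking call on a worker thread and give up after a deadline. The Java sketch below uses only `java.util.concurrent`; `TimeoutGuardSketch` and `callWithTimeout` are hypothetical names, and the sleeping lambda merely stands in for a call such as `getOrCreateCache()`. It does not address whatever caused the hang in this thread; it only turns an indefinite wait into a `TimeoutException`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutGuardSketch {
    /**
     * Runs the call on a daemon worker thread and waits at most timeoutMillis.
     * Throws TimeoutException instead of blocking the caller forever.
     */
    public static <T> T callWithTimeout(Callable<T> call, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timeout-guard");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Best effort: this interrupts the worker, but the call may ignore the interrupt.
            future.cancel(true);
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            // Stand-in for a blocking call; a real application would invoke its cache lookup here.
            callWithTimeout(() -> { Thread.sleep(60_000); return "cache"; }, 200);
        } catch (TimeoutException e) {
            System.out.println("Call timed out; collect thread dumps before restarting.");
        }
    }
}
```

Because the underlying call may not respond to interruption, the worker thread can remain stuck in the background; the guard only frees the caller so it can log, take thread dumps, or fail cleanly.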

On Tue, May 12, 2015 at 1:28 PM, Dmitriy Setrakyan <[hidden email]>
wrote:

> Ognen,
>
> It sounds to me like this is the same issue you had recently with the cloud
> node crashing due to hardware failure. If that is the case, then it sounds
> like a firewall issue to me. Are you sure there is no firewall set up
> between the nodes, and that they are all deployed in the same availability
> zone?
>
> D.

Re: Partitioned cache and node failures

Yakov Zhdanov-2
Ognen, can you share your test via a Jira issue?

It would be very helpful if you could collect logs and thread dumps from all
the nodes in the topology and attach them all to the Jira issue.

Thanks!

--
Yakov Zhdanov, Director R&D
*GridGain Systems*
www.gridgain.com

2015-05-12 22:33 GMT+03:00 Ognen Duzlevski <[hidden email]>:

> Dmitriy,
>
> It is not a firewall issue. However, the hardware crash probably has
> something to do with it.
>
> Along those lines: should a crash of one node (out of 5) hosting a few
> partitioned caches be expected to affect the availability of all the caches?
> The strange thing is that visor was able to show them all, but acquiring them
> through a Scala app using getOrCreateCache() just hung. I ended up "rigging"
> visor with the ability to dump cache -scan results to a file. That let me
> salvage all my data, after which I restarted the cluster.
>
> Certainly pretty clumsy ;)
>
> Ognen

Re: Partitioned cache and node failures

Ognen Duzlevski
Yakov, yes, no problem. I will do that today.

On Thu, May 14, 2015 at 6:15 AM, Yakov Zhdanov <[hidden email]>
wrote:

> Ognen, can you share your test via a Jira issue?
>
> It would be very helpful if you could collect logs and thread dumps from all
> the nodes in the topology and attach them all to the Jira issue.
>
> Thanks!
>
> --
> Yakov Zhdanov, Director R&D
> *GridGain Systems*
> www.gridgain.com