Apache Ignite Developers - Legacy Mail Archive

Partitioned cache and node failures

Ognen Duzlevski

Partitioned cache and node failures

In a partitioned cache (or set of partitioned caches) - does a single node
failure mean all of the cache(s) become unavailable?

I am seeing a situation where I cannot access any of the caches (using
getOrCreateCache) - all my code just "hangs".

The interesting thing is that visor can see all the caches and their
contents.

What is so special about visor?

I would appreciate it if someone could answer any of these (I can
provide more info), as I am evaluating Ignite for use in a data
science/analytics setup :-)

Thanks!
Ognen

yzhdanov

Re: Partitioned cache and node failures

Can you please file a ticket and share your sample application with us?

If that is not possible, please attach verbose logs and thread dumps from
all the nodes, taken after the issue has been reproduced.

Thanks!

--Yakov
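
For readers collecting the thread dumps requested above, here is a minimal
sketch of capturing one from inside a JVM using only the standard
java.lang.management API. The class and method names (ThreadDumpSketch,
dumpThreads) are hypothetical and not part of Ignite. In practice, running
the JDK's jstack tool against each node's process ID is the more usual
route; this sketch only shows what such a dump contains.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not an Ignite API: formats a dump of all live threads.
final class ThreadDumpSketch {

    static String dumpThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Only ask for lock details the running JVM is able to report.
        ThreadInfo[] infos = bean.dumpAllThreads(
                bean.isObjectMonitorUsageSupported(),
                bean.isSynchronizerUsageSupported());

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append('"')
               .append(" id=").append(info.getThreadId())
               .append(" state=").append(info.getThreadState())
               .append('\n');
            if (info.getLockName() != null) {
                // The lock or condition this thread is blocked or waiting on.
                out.append("    waiting on ").append(info.getLockName()).append('\n');
            }
            // Print every frame; ThreadInfo.toString() shortens the stack.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpThreads());
    }
}
```

A dump taken while the application is stuck shows which thread is blocked
and on what, which is the information a ticket needs.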

2015-05-12 15:30 GMT+03:00 Ognen Duzlevski <[hidden email]>:


dsetrakyan

Re: Partitioned cache and node failures

Ognen,

This sounds like the same issue you had recently with the cloud node
crashing due to a hardware failure. If so, it looks like a firewall issue
to me. Are you sure there is no firewall set up between the nodes, and
that they are all deployed in the same availability zone?

D.

On Tue, May 12, 2015 at 1:33 PM, Yakov Zhdanov <[hidden email]> wrote:


Ognen Duzlevski

Re: Partitioned cache and node failures

Dmitriy,

It is not a firewall issue. However, the hardware crash probably has
something to do with it.

On that note: can a crash of one node (out of 5) hosting a few partitioned
caches be expected to affect the availability of all the caches? The
strange thing is that Visor was able to show them all, but acquiring them
through a Scala app using getOrCreateCache() just hung. I ended up
"rigging" Visor with the ability to dump cache -scan results to a file, so
I was able to salvage all my data, and then I restarted the cluster.

Certainly pretty clumsy ;)

Ognen
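
A hang like the one described above can at least be turned into a visible
failure by running the blocking call on a separate thread with a deadline.
The sketch below uses only java.util.concurrent. The class and method names
(CacheAccessTimeout, callWithTimeout) are hypothetical, and the lambda in
main is a stand-in for a call such as getOrCreateCache(), which is not
invoked here. This does not fix the underlying problem; it only keeps the
caller from blocking indefinitely.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not an Ignite API: bounds how long a call may block.
final class CacheAccessTimeout {

    static <T> T callWithTimeout(Callable<T> call, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        // Daemon thread, so a call that never returns cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "cache-access");
            t.setDaemon(true);
            return t;
        });
        Future<T> future = executor.submit(call);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Interrupts the worker; a call that ignores interrupts stays stuck.
            future.cancel(true);
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            // Stand-in for the real cache lookup: simply blocks for a minute.
            callWithTimeout(() -> {
                Thread.sleep(60_000);
                return "cache";
            }, 500, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.err.println("Cache access timed out; take thread dumps now.");
        }
    }
}
```

The timeout is a good moment to capture logs and thread dumps from every
node, since the cluster is still in the stuck state at that point.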

On Tue, May 12, 2015 at 1:28 PM, Dmitriy Setrakyan <[hidden email]>
wrote:


Yakov Zhdanov-2

Re: Partitioned cache and node failures

Ognen, can you share your test via a Jira issue?

It would be very helpful if you could take logs and thread dumps from all
the nodes in the topology and attach them all to the Jira issue.

Thanks!

--
Yakov Zhdanov, Director R&D
*GridGain Systems*
www.gridgain.com

2015-05-12 22:33 GMT+03:00 Ognen Duzlevski <[hidden email]>:


Ognen Duzlevski

Re: Partitioned cache and node failures

Yakov, yes - no problem, will do that today.

On Thu, May 14, 2015 at 6:15 AM, Yakov Zhdanov <[hidden email]>
wrote:
