Apache Ignite Developers - Legacy Mail Archive

Stability issues with 1.1.0

Ognen Duzlevski

Stability issues with 1.1.0

Has anyone else experienced stability issues with 1.1.0?

I had a 1.0.0 cluster of 5 machines running on EC2 (VPC) for a few months
with hardly any glitches. I upgraded our caches to 1.1.0 three days ago and
have already had to restart the whole cluster twice: random nodes drop out
(4 out of 5, twice already). The logs show almost nothing, just the periodic
checks, and then a message that a node is unreachable. These are the same
EC2 machines, and they have all been up for 150+ days.

Thanks!
Ognen

dsetrakyan

Re: Stability issues with 1.1.0

Ognen,

Can I ask you to upgrade EC2 and client instances to Ignite 1.2? I think we
should investigate the issue, but I want to investigate on the latest
released code base.

Also, can you start the Ignite instances with the "-v" flag or the
"-DIGNITE_QUIET=true" system property? That way you will get more output in
your log, which may help us investigate the issue.

D.

On Fri, Jul 10, 2015 at 8:15 AM, Ognen Duzlevski <[hidden email]>
wrote:

> Has anyone else experienced stability issues with 1.1.0?
>
> I had 1.0.0 cluster of 5 machines running on EC2 (VPC) for a few months
> with hardly any glitches. I upgraded our caches to 1.1.0 3 days ago and I
> had to restart the whole cluster twice already - I am experiencing loss of
> random nodes (4 out of 5 twice already). The logs are empty as in not much
> going except periodic checks and then you see a message that a node is
> unreachable. Same machines on EC2, they have all been up for 150+ days.
>
> Thanks!
> Ognen
>

Ognen Duzlevski

Re: Stability issues with 1.1.0

Dmitriy, I will do both and report back.
Ognen

On Fri, Jul 10, 2015 at 2:28 PM, Dmitriy Setrakyan <[hidden email]>
wrote:

> Ognen,
>
> Can I ask you to upgrade EC2 and client instances to Ignite 1.2? I think we
> should investigate the issue, but I want to investigate on the latest
> released code base.
>
> Also, can you start Ignite instances with "-v" flag or
> "-DIGNITE_QUIET=true" system property? This way you will get more output in
> your log, which may help us investigate the issue.
>
> D.
>
> On Fri, Jul 10, 2015 at 8:15 AM, Ognen Duzlevski <[hidden email]>
> wrote:
>
> > Has anyone else experienced stability issues with 1.1.0?
> >
> > I had 1.0.0 cluster of 5 machines running on EC2 (VPC) for a few months
> > with hardly any glitches. I upgraded our caches to 1.1.0 3 days ago and I
> > had to restart the whole cluster twice already - I am experiencing loss of
> > random nodes (4 out of 5 twice already). The logs are empty as in not much
> > going except periodic checks and then you see a message that a node is
> > unreachable. Same machines on EC2, they have all been up for 150+ days.
> >
> > Thanks!
> > Ognen
> >
>

dsetrakyan

Re: Stability issues with 1.1.0

On Fri, Jul 10, 2015 at 1:11 PM, Ognen Duzlevski <[hidden email]>
wrote:

> Dmitriy, I will do both and report back.
> Ognen
>
> On Fri, Jul 10, 2015 at 2:28 PM, Dmitriy Setrakyan <[hidden email]>
> wrote:
>
> > Ognen,
> >
> > Can I ask you to upgrade EC2 and client instances to Ignite 1.2? I think we
> > should investigate the issue, but I want to investigate on the latest
> > released code base.
> >
> > Also, can you start Ignite instances with "-v" flag or
> > "-DIGNITE_QUIET=true" system property?

Should be -DIGNITE_QUIET=false, sorry.
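
For anyone following along, the two logging options can be sketched as below, using the corrected value. This is a minimal sketch, not a tested recipe for this cluster: the IGNITE_HOME variable, its default path, and the ARGS/VERBOSE_ARGS/PROPERTY_ARGS names are assumptions for illustration, and passing the property through the launcher's "-J" prefix is assumed to work as it does in the stock ignite.sh script.

```shell
#!/bin/sh
# Sketch only. IGNITE_HOME and its default path are assumptions,
# not something stated in this thread.
IGNITE_HOME="${IGNITE_HOME:-/opt/ignite}"
IGNITE_SH="$IGNITE_HOME/bin/ignite.sh"

# Option 1: the verbose flag.
VERBOSE_ARGS="-v"

# Option 2: the system property with the corrected value (false, not true),
# handed to the JVM through the launcher's -J prefix (assumed behaviour).
PROPERTY_ARGS="-J-DIGNITE_QUIET=false"

# The two options are alternatives; pick one.
ARGS="$VERBOSE_ARGS"

if [ -x "$IGNITE_SH" ]; then
    # Start the node with full (non-quiet) log output.
    "$IGNITE_SH" "$ARGS"
else
    # No Ignite install found: just show what would be run.
    echo "would run: $IGNITE_SH $ARGS"
fi
```

To use the system property instead of the flag, set ARGS="$PROPERTY_ARGS" before the launch step.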

> > This way you will get more output in
> > your log, which may help us investigate the issue.
> >
> > D.
> >
> > On Fri, Jul 10, 2015 at 8:15 AM, Ognen Duzlevski <[hidden email]>
> > wrote:
> >
> > > Has anyone else experienced stability issues with 1.1.0?
> > >
> > > I had 1.0.0 cluster of 5 machines running on EC2 (VPC) for a few months
> > > with hardly any glitches. I upgraded our caches to 1.1.0 3 days ago and I
> > > had to restart the whole cluster twice already - I am experiencing loss of
> > > random nodes (4 out of 5 twice already). The logs are empty as in not much
> > > going except periodic checks and then you see a message that a node is
> > > unreachable. Same machines on EC2, they have all been up for 150+ days.
> > >
> > > Thanks!
> > > Ognen
> > >
> >
>