Memory leak in ignite-cassandra module while using RoundRobinPolicy LoadBalancingPolicy

kotamrajuyashasvi
Hi

We are working on an Ignite project with Cassandra as persistent storage.
During our tests we hit the continuous Cassandra session refresh issue:
https://issues.apache.org/jira/browse/IGNITE-8354

While observing that issue we also ran into an OutOfMemoryError. Although
the issue above has since been fixed, we went through the source code to
find the root cause of the OOM and found one potential cause.

In org.apache.ignite.cache.store.cassandra.session.CassandraSessionImpl,
when refresh() is invoked to handle exceptions, a new Cluster is built with
the same LoadBalancingPolicy object. We are using RoundRobinPolicy, so the
same RoundRobinPolicy instance is reused every time a Cluster is built in
refresh(). RoundRobinPolicy holds a CopyOnWriteArrayList<Host> liveHosts.
Whenever init(Cluster cluster, Collection<Host> hosts) is called on
RoundRobinPolicy, it calls liveHosts.addAll(hosts), appending the whole
Host collection to liveHosts.
So each time a Cluster is built during refresh(), the same Hosts are added
again to liveHosts of the reused RoundRobinPolicy. The list grows without
bound over many refresh() calls and eventually causes the OOM. The heap
dump taken after the OOM also showed a huge number of objects in liveHosts
of the RoundRobinPolicy object.
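
The accumulation described above can be reproduced in isolation. The
following is only a minimal model, not the driver's code: AccumulatingPolicy
and LeakDemo are hypothetical names, and hosts are plain strings instead of
Host objects. It shows how reusing one policy whose init() only appends
makes the list grow with every simulated refresh().

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for a policy whose init() appends and never resets.
final class AccumulatingPolicy {
    private final CopyOnWriteArrayList<String> liveHosts = new CopyOnWriteArrayList<>();

    // Same shape as the behaviour described in the post: addAll on every init.
    void init(Collection<String> hosts) {
        liveHosts.addAll(hosts);
    }

    int liveHostCount() {
        return liveHosts.size();
    }
}

final class LeakDemo {
    // Simulates `refreshes` cluster rebuilds that all share one policy object.
    static int liveHostsAfter(int refreshes, List<String> hosts) {
        AccumulatingPolicy shared = new AccumulatingPolicy();
        for (int i = 0; i < refreshes; i++) {
            shared.init(hosts);
        }
        return shared.liveHostCount();
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        System.out.println(liveHostsAfter(1, hosts));    // 3
        System.out.println(liveHostsAfter(1000, hosts)); // 3000
    }
}
```

With three hosts the list holds 3 entries after one init() and 3000 after a
thousand, which matches the unbounded growth seen in the heap dump.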

IGNITE-8354 fixed the OOM by preventing unnecessary refresh() calls, but it
does not fix the underlying memory leak caused by RoundRobinPolicy. In the
long run there can still be many Cassandra refreshes for genuine reasons,
and we would again end up with many Hosts in liveHosts of the
RoundRobinPolicy object.
Some possible solutions would be:
1. Use a new LoadBalancingPolicy object when building a new Cluster during
refresh().
2. Somehow clear the objects in liveHosts during refresh().
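
A rough sketch of option 1, under the assumption that the session can be
handed a factory instead of a ready-made policy instance. StatefulPolicy and
RefreshingSession are hypothetical stand-ins, not Ignite or driver classes;
the point is only that a policy created per refresh never sees a second
init().

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

// Hypothetical stand-in for a stateful policy that appends on init().
final class StatefulPolicy {
    private final CopyOnWriteArrayList<String> liveHosts = new CopyOnWriteArrayList<>();

    void init(Collection<String> hosts) {
        liveHosts.addAll(hosts);
    }

    int liveHostCount() {
        return liveHosts.size();
    }
}

// Hypothetical stand-in for the session: it keeps a factory, not an instance.
final class RefreshingSession {
    private final Supplier<StatefulPolicy> policyFactory;
    private final List<String> hosts;
    private StatefulPolicy current;

    RefreshingSession(Supplier<StatefulPolicy> policyFactory, List<String> hosts) {
        this.policyFactory = policyFactory;
        this.hosts = hosts;
    }

    // Every refresh builds a brand-new policy; the previous one becomes garbage.
    void refresh() {
        StatefulPolicy fresh = policyFactory.get();
        fresh.init(hosts);
        current = fresh;
    }

    int liveHostCount() {
        return current == null ? 0 : current.liveHostCount();
    }

    public static void main(String[] args) {
        RefreshingSession session = new RefreshingSession(
            StatefulPolicy::new, Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3"));
        for (int i = 0; i < 1000; i++) {
            session.refresh();
        }
        System.out.println(session.liveHostCount()); // 3
    }
}
```

The trade-off is that the policy can no longer be configured as a single
shared object; whatever configures it would have to supply a way to create
new instances.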

There is also a workaround: use DCAwareRoundRobinPolicy, since it adds
hosts per datacenter and only if they are absent. But we have a single
datacenter, and it's not recommended to use DCAwareRoundRobinPolicy with a
single datacenter.
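
To illustrate why "add only if absent" avoids the growth, here is a small
model of per-datacenter bookkeeping. PerDcHosts is a hypothetical class
written for this example and is not the driver's implementation; it relies
only on CopyOnWriteArrayList.addIfAbsent from the JDK.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical model of "add per datacenter, only if absent".
final class PerDcHosts {
    private final ConcurrentMap<String, CopyOnWriteArrayList<String>> perDc =
        new ConcurrentHashMap<>();

    // Returns true only when the host was not yet registered for that datacenter.
    boolean add(String dc, String host) {
        return perDc.computeIfAbsent(dc, k -> new CopyOnWriteArrayList<>())
                    .addIfAbsent(host);
    }

    int count(String dc) {
        CopyOnWriteArrayList<String> hosts = perDc.get(dc);
        return hosts == null ? 0 : hosts.size();
    }

    public static void main(String[] args) {
        PerDcHosts hosts = new PerDcHosts();
        for (int i = 0; i < 1000; i++) {
            hosts.add("dc1", "10.0.0.1"); // repeated registration is a no-op
        }
        System.out.println(hosts.count("dc1")); // 1
    }
}
```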

I would like to request that someone from ignite-cassandra module
development look into this issue.



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

Re: Memory leak in ignite-cassandra module

irudyak
Hi Kotamrajuyashasvi,

Could you please create a ticket for this in Ignite JIRA? That's the
standard process to make improvements/fixes to Ignite.

Thanks,
Igor Rudyak

On Mon, Jun 11, 2018 at 11:36 PM, kotamrajuyashasvi <
[hidden email]> wrote:

> [...]

Re: Memory leak in ignite-cassandra module

dmagda
Igor,

Do you have any clues/ideas on how to fix it? Is the provided information
enough for you?

--
Denis

On Mon, Jun 11, 2018 at 11:45 PM Igor Rudyak <[hidden email]> wrote:

> [...]

Re: Memory leak in ignite-cassandra module

irudyak
Denis,

I don't have any ideas right now. First I need to create a test to
reproduce this case. Then I'll have some ideas :-)

Igor

On Tue, Jun 12, 2018 at 11:26 AM, Denis Magda <[hidden email]> wrote:

> [...]

Re: Memory leak in ignite-cassandra module

irudyak
It would also be good to know which version of the Cassandra driver was in
use when you ran into the OOM exception.

Igor

On Tue, Jun 12, 2018 at 11:39 AM, Igor Rudyak <[hidden email]> wrote:

> [...]

Re: Memory leak in ignite-cassandra module

irudyak
In reply to this post by kotamrajuyashasvi
Hi,

That's actually a bug in the Cassandra driver's *RoundRobinPolicy*
implementation. Whenever *init(Cluster cluster, Collection<Host> hosts)* is
called on *RoundRobinPolicy*, it should reset the object to its initial
state. The current implementation instead accumulates the old state
together with the new one.

The problem could be fixed by implementing a custom RoundRobinPolicy and
overriding its *init* method.
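
A minimal sketch of what such a resetting *init* could look like, written
against plain strings so that it runs without the driver. ResettingRoundRobin
is a hypothetical class, not the driver's RoundRobinPolicy; a real
implementation would have to satisfy the driver's LoadBalancingPolicy
contract, and whether the stock class can simply be subclassed depends on the
driver version.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin policy whose init() replaces state instead of appending.
final class ResettingRoundRobin {
    private volatile CopyOnWriteArrayList<String> liveHosts = new CopyOnWriteArrayList<>();
    private final AtomicInteger index = new AtomicInteger();

    // Swap in a fresh list so a second init() cannot accumulate old entries.
    void init(Collection<String> hosts) {
        liveHosts = new CopyOnWriteArrayList<>(hosts);
        index.set(0);
    }

    int liveHostCount() {
        return liveHosts.size();
    }

    // Returns hosts in rotation; works on a snapshot so concurrent changes are safe.
    String next() {
        String[] snapshot = liveHosts.toArray(new String[0]);
        if (snapshot.length == 0) {
            throw new IllegalStateException("no live hosts");
        }
        return snapshot[Math.floorMod(index.getAndIncrement(), snapshot.length)];
    }

    public static void main(String[] args) {
        ResettingRoundRobin policy = new ResettingRoundRobin();
        for (int i = 0; i < 1000; i++) {
            policy.init(Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3"));
        }
        System.out.println(policy.liveHostCount()); // 3
        System.out.println(policy.next());          // 10.0.0.1
    }
}
```

Replacing the list reference, rather than calling clear() followed by
addAll(), keeps readers from ever observing an empty list in the middle of a
re-initialisation.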

Igor
