Rebalancing speedup (IGNITE-1093)

Anton Vinogradov
Hello,
  I think rebalancing can be sped up by handling supply messages in
parallel.
  The main idea is to use per-partition topics.

  When DemandPool starts, we can create a topic for every partition and
handle each partition's supply messages in a separate listener. In this case
only one thread will request the lock for the partition it has to relocate.

  We can do the same in SupplyPool and generate supply messages in
parallel.

  The number of simultaneous messages should be limited to prevent overloading.
The current idea is to request N partitions at the start and to request the
next partition each time one of the previous ones has been relocated.
  Per-partition listeners can be created this way:
// Register one ordered handler per partition topic.
for (int p = 0; p <= maxP; p++) {
    cctx.io().addOrderedHandler(topic(p), new CI2<UUID, GridDhtPartitionSupplyMessage>() {
        @Override public void apply(UUID nodeId, GridDhtPartitionSupplyMessage msg) {
            handleSupplyMessage(new SupplyMessage(nodeId, msg));
        }
    });
}

Thoughts?
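
A rough sketch of the throttling idea (request N partitions at the start, then one more each time a partition is relocated) might look like this. It is only an illustration: PartitionDemandWindow and its methods are hypothetical names, not Ignite APIs, and the actual sending of demand messages is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical helper: keeps at most {@code window} partitions in flight. */
final class PartitionDemandWindow {
    /** Partitions that have not been requested yet. */
    private final Queue<Integer> pending = new ArrayDeque<>();

    /** Maximum number of partitions requested at the same time (the "N" above). */
    private final int window;

    /** Partitions requested but not yet fully relocated. */
    private int inFlight;

    PartitionDemandWindow(int partitions, int window) {
        if (window <= 0)
            throw new IllegalArgumentException("window must be positive");

        this.window = window;

        for (int p = 0; p < partitions; p++)
            pending.add(p);
    }

    /** Returns the partitions to request now, filling all free window slots. */
    synchronized List<Integer> nextRequests() {
        List<Integer> res = new ArrayList<>();

        while (inFlight < window && !pending.isEmpty()) {
            res.add(pending.poll());
            inFlight++;
        }

        return res;
    }

    /** Called when one partition is fully relocated; frees its slot and refills. */
    synchronized List<Integer> onPartitionDone() {
        if (inFlight == 0)
            throw new IllegalStateException("no partition in flight");

        inFlight--;

        return nextRequests();
    }

    /** True once every partition has been requested and relocated. */
    synchronized boolean finished() {
        return inFlight == 0 && pending.isEmpty();
    }
}
```

With 5 partitions and a window of 2, nextRequests() first returns partitions 0 and 1, and each onPartitionDone() call then releases exactly one more partition until none are pending.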

Re: Rebalancing speedup (IGNITE-1093)

dsetrakyan
Anton,

Very good point. Valentin is already using per-partition topics in his work
on Continuous Queries:
https://issues.apache.org/jira/browse/IGNITE-104

Perhaps it makes sense for the two of you to sync up.

Also, as a side note, whenever you start a dev list discussion about a
ticket, it always helps to add a link to the dev list thread in Jira
(I have already done this for IGNITE-1093).

D.

In reply to Anton Vinogradov's message of Thu, Jul 30, 2015 at 11:31 AM.

Re: Rebalancing speedup (IGNITE-1093)

Anton Vinogradov
The parallel Demand pool seems to be ready.
I am now working on the parallel Supply pool.

The general idea is to use systemPool to handle demand requests. The number of
parallel workers will be limited indirectly by the Demand pool's parallelism
limits.
Overload can still happen in this case, so I will write load tests.

The next idea is to iterate over the whole partition and, when
s.messageSize() >= cctx.config().getRebalanceBatchSize()
send the supply batch, store the current iterator, swapLsnr, etc., and return
from the supply thread.
When the same node makes its next request, supplyPool simply continues
iterating and returns the next (or last) batch for the requested partition.

Speed can also be increased if SupplyPool prepares N batches on the first
demand request and generates an additional one on each new request.
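
The "store the iterator and resume on the next request" idea could be sketched roughly as follows. This is an illustration under simplifying assumptions, not the actual SupplyPool code: ResumableSupplier and Batch are hypothetical names, entries are plain strings, and the batch limit is an entry count standing in for the s.messageSize() >= getRebalanceBatchSize() check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical supplier that resumes partition iteration between demand requests. */
final class ResumableSupplier {
    /** One supply batch; {@code last} is true when the partition is exhausted. */
    static final class Batch {
        final List<String> entries;
        final boolean last;

        Batch(List<String> entries, boolean last) {
            this.entries = entries;
            this.last = last;
        }
    }

    /** Partition id -> entries (stand-in for the real partition storage). */
    private final Map<Integer, List<String>> partitions;

    /** Maximum entries per batch (stand-in for the byte-based batch size). */
    private final int batchSize;

    /** Iterators saved between requests, keyed by demanding node and partition. */
    private final Map<String, Iterator<String>> saved = new HashMap<>();

    ResumableSupplier(Map<Integer, List<String>> partitions, int batchSize) {
        this.partitions = partitions;
        this.batchSize = batchSize;
    }

    /** Handles one demand request and returns the next batch for the partition. */
    synchronized Batch onDemand(String nodeId, int part) {
        String key = nodeId + ':' + part;

        // Resume from the stored iterator, or start from the beginning.
        Iterator<String> it = saved.remove(key);

        if (it == null)
            it = partitions.getOrDefault(part, Collections.<String>emptyList()).iterator();

        List<String> entries = new ArrayList<>();

        while (it.hasNext() && entries.size() < batchSize)
            entries.add(it.next());

        boolean last = !it.hasNext();

        // Keep the iterator only if there is more data to supply later.
        if (!last)
            saved.put(key, it);

        return new Batch(entries, last);
    }
}
```

Each call returns from the supply thread after one batch, so a worker is never tied up for a whole partition. Preparing N batches ahead of time would be a further step on top of this and is not shown.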

In reply to Dmitriy Setrakyan's message of Thu, Jul 30, 2015 at 9:41 PM.

Re: Rebalancing speedup (IGNITE-1093)

dsetrakyan
Anton,

Good points!

Can you please copy the design points made in this discussion to the Jira
ticket?

D.

In reply to Anton Vinogradov's message of Wed, Aug 5, 2015 at 2:57 AM.

Re: Rebalancing speedup (IGNITE-1093)

Alexey Kuznetsov-2
Anton,

I see that you are working on rebalancing. Could you, as part of your changes,
also introduce a boolean flag that indicates whether rebalancing is in
progress?
It would also be very useful to add more details to the node log when
rebalancing starts, such as how many partitions will be moved, on which
node, etc.
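
The flag and the start-of-rebalancing log line requested here could be as simple as the following sketch. RebalanceState and its methods are hypothetical names, not existing Ignite APIs, and the log line format is made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical tracker exposing a "rebalance in progress" flag. */
final class RebalanceState {
    /** Number of partitions still to be moved; zero means no rebalancing. */
    private final AtomicInteger remaining = new AtomicInteger();

    /** Marks rebalancing as started and returns the line to write to the node log. */
    String start(int partitions, String supplierNodeId) {
        remaining.set(partitions);

        return "Rebalancing started [partitions=" + partitions + ", supplier=" + supplierNodeId + ']';
    }

    /** Records that one partition has been moved; never drops below zero. */
    void onPartitionDone() {
        remaining.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    /** The requested boolean flag: true while partitions are still being moved. */
    boolean inProgress() {
        return remaining.get() > 0;
    }
}
```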


--
Alexey Kuznetsov
GridGain Systems
www.gridgain.com

Re: Rebalancing speedup (IGNITE-1093)

Anton Vinogradov
Alexey,

Could you please create a Jira issue for these requests and link it?

In reply to Alexey Kuznetsov's message of Tue, Aug 11, 2015 at 5:43 AM.

Re: Rebalancing speedup (IGNITE-1093)

Alexey Kuznetsov-2
Anton, I created issues IGNITE-1230 and IGNITE-1231 and linked them to
https://issues.apache.org/jira/browse/IGNITE-1093.

In reply to Anton Vinogradov's message of Tue, Aug 11, 2015 at 3:31 PM.

--
Alexey Kuznetsov
GridGain Systems
www.gridgain.com

Re: Rebalancing speedup (IGNITE-1093)

Anton Vinogradov
Hello,
I've finished work on IGNITE-1093 and merged the changes to master and 1.5.
The results are listed in the issue:
https://issues.apache.org/jira/browse/IGNITE-1093

The main results are:
- Rebalancing after a node joins is now about twice as fast (with default
settings) at the same grid throughput.
- There is no more GC hell at the end of rebalancing.

In reply to Alexey Kuznetsov's message of Tue, Aug 11, 2015 at 11:42 AM.

Re: Rebalancing speedup (IGNITE-1093)

dsetrakyan
Thanks Anton! It is always exhilarating to see performance improvements :)

In reply to Anton Vinogradov's message of Mon, Nov 9, 2015 at 7:30 AM.