Guys,
Currently I'm looking into the problem of how to deliver computation to data in the most efficient way.

Basically, I need to iterate over all cache partitions on all grid nodes, compute some function on each key-value pair, and return an aggregated result to the caller.

This is a job for the map-reduce API.

But it seems there is no way to easily manage automatic routing and failover of compute jobs to the data nodes containing specific partitions.

I found an interesting paragraph in the javadoc for the @AffinityKeyMapped annotation:

Collocating Computations And Data
It is also possible to route computations to the nodes where the data is cached. This concept is otherwise known as Collocation Of Computations And Data ....

which makes strong sense to me.

But in fact this is not working. Instead, we only have automatic routing (and failover) for special cases: affinityCall and affinityRun with an explicit partition. And it seems I can no longer use task sessions with them after the recent changes in the Compute API (removal of withAsync support).

I think this is not OK, and we should allow jobs to be automatically routed to data if they have an annotation attached specifying the partition and cache names, same as for affinityCall/Run. Probably we should introduce a special task type for such workflows, something like AffinityComputeTask, without an explicit mapping phase, for convenient usage.

I'm willing to create a JIRA ticket for this.

Thoughts?

--
Best regards,
Alexei Scherbakov
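To make the workflow concrete, here is a minimal sketch of the per-partition computation described above, assuming a hypothetical IgniteCache<Integer, Long> named "myCache" and a simple sum as the "function on each key-value pair": a local ScanQuery restricted to a single partition folds that partition's locally stored entries into a partial result.

import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;

/** Folds the locally held entries of one partition into a partial sum. */
static long scanPartition(Ignite ignite, String cacheName, int partId) {
    IgniteCache<Integer, Long> cache = ignite.cache(cacheName);
    long partial = 0;

    // Local scan restricted to a single partition: only entries stored on
    // this node for 'partId' are visited.
    try (QueryCursor<Cache.Entry<Integer, Long>> cur =
             cache.query(new ScanQuery<Integer, Long>().setPartition(partId).setLocal(true))) {
        for (Cache.Entry<Integer, Long> e : cur)
            partial += e.getValue(); // the per-entry function; summing is just an example
    }

    return partial;
}

The open question in this thread is how each such per-partition call gets routed to the right node, with the partition pinned there, without writing the routing and failover logic by hand.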
Alexey,
Have you taken a look at the Affinity API in Ignite? It seems to have all the functionality you need to map partitions to nodes. You can take that information and use it to route your computations.

https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/cache/affinity/Affinity.html

D.
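For example, a rough sketch of that manual routing, assuming a cache named "myCache": ask the Affinity API which node currently owns each partition and send a closure to it.

import org.apache.ignite.Ignite;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;

/** Manually routes one closure per partition to its current primary node. */
static void routeManually(Ignite ignite) {
    // Cache name "myCache" is an assumption for this sketch.
    Affinity<Object> aff = ignite.affinity("myCache");

    for (int p = 0; p < aff.partitions(); p++) {
        ClusterNode primary = aff.mapPartitionToNode(p);
        final int part = p;

        // Send a closure to the node that currently owns this partition.
        ignite.compute(ignite.cluster().forNode(primary)).run(
            () -> System.out.println("Would scan partition " + part + " locally here"));
    }
}

Note that this routes by the topology snapshot taken at call time; nothing reserves the partition or re-routes the closure if the partition migrates, which is the gap raised in the next reply.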
Dmitriy,
I know I could, but it requires too much work and messing with various APIs to get routing and failover right.

What about automatic partition reservation for consistency? How could I achieve that using only the Affinity API?

I think this should be available out of the box, as it is for affinityRun/Call.

Moreover, I could say the same about affinityRun/Call: one can use the Affinity API to determine the node to send a closure to. Why bother with the additional methods?

And the javadoc for AffinityKeyMapped definitely should be fixed, because it refers to nonexistent things.

--
Best regards,
Alexei Scherbakov
Alexei,
I think AffinityKeyMapped is supposed to work for ComputeJobs and any closures executed via IgniteCompute, but it seems there are no tests for this and this functionality is broken now. I think it should be fixed.

Thanks,
Semyon
In reply to this post by Alexei Scherbakov
In this case, you should be using this API on IgniteCompute:
affinityRun(Collection<String> cacheNames, int partId, IgniteRunnable job)

This will ensure that the partition is not migrated while the computation is in progress. Why is this method insufficient?

As for IgniteCompute deprecating the "withAsync" API: async behavior is now supported through direct async API invocations, so support for asynchronous execution is still there.

D.
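Assuming the async variants that replaced withAsync (e.g. affinityCallAsync with the same cacheNames/partId parameters) are available, a per-partition fan-out built on this method might look like the sketch below; the affinityCall counterpart is used because each job returns a partial result, and it reuses the scanPartition helper sketched earlier. Cache name handling is an assumption.

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.lang.IgniteFuture;

/** Fans out one partition-pinned call per partition and aggregates the partial results. */
static long sumAllPartitions(Ignite ignite, String cacheName) {
    Collection<String> caches = Collections.singletonList(cacheName);
    int parts = ignite.affinity(cacheName).partitions();

    List<IgniteFuture<Long>> futs = new ArrayList<>(parts);

    for (int p = 0; p < parts; p++) {
        final int part = p;

        // Routed to the primary node for (cacheName, part); the partition is
        // reserved so it is not migrated while the job is in progress.
        futs.add(ignite.compute().affinityCallAsync(caches, part,
            () -> scanPartition(Ignition.localIgnite(), cacheName, part)));
    }

    // Aggregate the partial results on the caller.
    long total = 0;

    for (IgniteFuture<Long> f : futs)
        total += f.get();

    return total;
}

Each closure here is an independent call, though, which is the limitation pointed out below: there is no shared task session or reduce phase tying them together.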
In reply to this post by Semyon Boikov
Semyon,
This is exactly what I want. But currently I don't understand how affinity would be calculated without a reference to a cache (or caches). I think something must be added to annotate such things.

I'll create a ticket for this improvement.

--
Best regards,
Alexei Scherbakov
In reply to this post by dsetrakyan
affinityRun will only work for a single closure.
But I need the same functionality for all jobs in my compute task, with support for task sessions and the other benefits of the map-reduce API.

--
Best regards,
Alexei Scherbakov
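For comparison, a rough sketch of what the hand-rolled map-reduce alternative looks like today, with the explicit mapping phase the proposal wants to avoid: a ComputeTaskAdapter that maps one job per partition to the current primary node, taking the cache name as the task argument. It keeps the map-reduce benefits (task session, reduce step), but unlike affinityRun/Call it gives no partition reservation and no affinity-aware failover. The cache and summing logic are assumptions.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.ignite.Ignite;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.compute.ComputeJob;
import org.apache.ignite.compute.ComputeJobAdapter;
import org.apache.ignite.compute.ComputeJobResult;
import org.apache.ignite.compute.ComputeTaskAdapter;
import org.apache.ignite.resources.IgniteInstanceResource;

public class PartitionSumTask extends ComputeTaskAdapter<String, Long> {
    @IgniteInstanceResource
    private Ignite ignite;

    /** Explicit map phase: one job per partition, sent to the current primary node. */
    @Override public Map<? extends ComputeJob, ClusterNode> map(List<ClusterNode> subgrid, String cacheName) {
        Affinity<Object> aff = ignite.affinity(cacheName);
        Map<ComputeJob, ClusterNode> jobs = new HashMap<>();

        for (int p = 0; p < aff.partitions(); p++)
            jobs.put(new PartitionJob(cacheName, p), aff.mapPartitionToNode(p));

        return jobs;
    }

    /** Reduce phase: combine the partial sums returned by the jobs. */
    @Override public Long reduce(List<ComputeJobResult> results) {
        long total = 0;

        for (ComputeJobResult res : results)
            total += res.<Long>getData();

        return total;
    }

    /** Per-partition job; the body would scan the partition's local entries. */
    private static class PartitionJob extends ComputeJobAdapter {
        private final String cacheName;
        private final int part;

        PartitionJob(String cacheName, int part) {
            this.cacheName = cacheName;
            this.part = part;
        }

        @Override public Object execute() {
            // ... local scan of 'part' in 'cacheName', e.g. as sketched earlier ...
            return 0L;
        }
    }
}

It would be launched as ignite.compute().execute(new PartitionSumTask(), "myCache").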
I think the following should be done:
- The AffinityKeyMapped annotation should be fixed if it's broken (obviously).
- There should be a way to specify the cache name in addition to the key (another annotation?).
- The semantics of a job or closure executed with this annotation should be the same as for affinityRun/Call, with all the guarantees for reserved partitions, etc.

Alexey, this would cover all the cases you described, right?

-Val
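A purely hypothetical sketch of what the proposed usage could look like: a closure carrying both the affinity key and the cache name, so that IgniteCompute could route it and reserve the partition automatically, the way affinityCall/Run do today. The cache-name annotation does not exist yet, and the routing/reservation semantics sketched here are exactly what this thread proposes to implement.

import org.apache.ignite.cache.affinity.AffinityKeyMapped;
import org.apache.ignite.lang.IgniteCallable;

class PartitionAggregateClosure implements IgniteCallable<Long> {
    /** Key whose partition would determine the target node (existing annotation). */
    @AffinityKeyMapped
    private final Object affinityKey;

    /** Cache name; the new annotation proposed above would mark this field. */
    private final String cacheName;

    PartitionAggregateClosure(Object affinityKey, String cacheName) {
        this.affinityKey = affinityKey;
        this.cacheName = cacheName;
    }

    @Override public Long call() {
        // Under the proposed semantics this body would run on the node owning the
        // partition of 'affinityKey' in 'cacheName', with that partition reserved.
        return 0L;
    }
}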