Guys,
Currently I'm looking into the problem of how to deliver computation to data in the most efficient way.

Basically, I need to iterate over all cache partitions on all grid nodes, compute some function on each key-value pair, and return an aggregated result to the caller.

This is a job for the map-reduce API.

But it seems there is no way to easily manage automatic routing and failover of compute jobs to the data nodes containing specific partitions.

I found an interesting paragraph in the javadoc for the @AffinityKeyMapped annotation:

Collocating Computations And Data
It is also possible to route computations to the nodes where the data is cached. This concept is otherwise known as Collocation Of Computations And Data ....

which makes strong sense to me.

But in fact this is not working. Instead, we only have automatic routing (and failover) for special cases: affinityCall and affinityRun with an explicit partition. And it seems I can no longer use task sessions with them after the recent changes in the Compute API (removal of withAsync support).

I think this is not OK, and we should allow jobs to be automatically routed to data if they have an annotation attached specifying the partition and cache names, same as for affinityCall/Run. Probably we should introduce a special task type for such workflows, something like AffinityComputeTask, without an explicit mapping phase, for convenient usage.

I'm willing to create a JIRA ticket for this.

Thoughts?

--
Best regards,
Alexei Scherbakov
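To make the workflow concrete, here is a minimal sketch of the per-partition computation described above, assuming a hypothetical IgniteCache<Integer, Long> named "myCache" and a simple sum as the "function on each key-value pair": a local ScanQuery restricted to a single partition folds that partition's locally stored entries into a partial result.

import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;

/** Folds the locally held entries of one partition into a partial sum. */
static long scanPartition(Ignite ignite, String cacheName, int partId) {
    IgniteCache<Integer, Long> cache = ignite.cache(cacheName);
    long partial = 0;

    // Local scan restricted to a single partition: only entries stored on
    // this node for 'partId' are visited.
    try (QueryCursor<Cache.Entry<Integer, Long>> cur =
             cache.query(new ScanQuery<Integer, Long>().setPartition(partId).setLocal(true))) {
        for (Cache.Entry<Integer, Long> e : cur)
            partial += e.getValue(); // the per-entry function; summing is just an example
    }

    return partial;
}

The open question in this thread is how each such per-partition call gets routed to the right node, with the partition pinned there, without writing the routing and failover logic by hand.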
Alexey,
Have you taken a look at the Affinity API in Ignite? It seems to have all the functionality you need to map partitions to nodes. You can take that information and use it to route your computations.

https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/cache/affinity/Affinity.html

D.
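For example, a rough sketch of that manual routing, assuming a cache named "myCache": ask the Affinity API which node currently owns each partition and send a closure to it.

import org.apache.ignite.Ignite;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;

/** Manually routes one closure per partition to its current primary node. */
static void routeManually(Ignite ignite) {
    // Cache name "myCache" is an assumption for this sketch.
    Affinity<Object> aff = ignite.affinity("myCache");

    for (int p = 0; p < aff.partitions(); p++) {
        ClusterNode primary = aff.mapPartitionToNode(p);
        final int part = p;

        // Send a closure to the node that currently owns this partition.
        ignite.compute(ignite.cluster().forNode(primary)).run(
            () -> System.out.println("Would scan partition " + part + " locally here"));
    }
}

Note that this routes by the topology snapshot taken at call time; nothing reserves the partition or re-routes the closure if the partition migrates, which is the gap raised in the next reply.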
Dmitriy,
I know I could, but it requires too much work and messing with various APIs to get routing and failover right.

What about automatic partition reservation for consistency? How could I achieve that using only the Affinity API?

I think this should be available out of the box, as it is for affinityRun/Call.

Moreover, I could say the same about affinityRun/Call: one can use the Affinity API to determine the node to send a closure to. Why bother with the additional methods?

And the javadoc for AffinityKeyMapped definitely should be fixed, because it refers to nonexistent things.

--
Best regards,
Alexei Scherbakov
Alexei,
I think AffinityKeyMapped is supposed to work for ComputeJobs and any closures executed via IgniteCompute, but it seems there are no tests for this and this functionality is broken now. I think it should be fixed.

Thanks,
Semyon
In reply to this post by Alexei Scherbakov
In this case, you should be using this API on IgniteCompute:
affinityRun(Collection<String> cacheNames, int partId, IgniteRunnable job)

This will ensure that the partition is not migrated while the computation is in progress. Why is this method insufficient?

As for IgniteCompute deprecating the "withAsync" API: async behavior is now supported through direct async API invocations, so support for asynchronous execution is still there.

D.
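Assuming the async variants that replaced withAsync (e.g. affinityCallAsync with the same cacheNames/partId parameters) are available, a per-partition fan-out built on this method might look like the sketch below; the affinityCall counterpart is used because each job returns a partial result, and it reuses the scanPartition helper sketched earlier. Cache name handling is an assumption.

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.lang.IgniteFuture;

/** Fans out one partition-pinned call per partition and aggregates the partial results. */
static long sumAllPartitions(Ignite ignite, String cacheName) {
    Collection<String> caches = Collections.singletonList(cacheName);
    int parts = ignite.affinity(cacheName).partitions();

    List<IgniteFuture<Long>> futs = new ArrayList<>(parts);

    for (int p = 0; p < parts; p++) {
        final int part = p;

        // Routed to the primary node for (cacheName, part); the partition is
        // reserved so it is not migrated while the job is in progress.
        futs.add(ignite.compute().affinityCallAsync(caches, part,
            () -> scanPartition(Ignition.localIgnite(), cacheName, part)));
    }

    // Aggregate the partial results on the caller.
    long total = 0;

    for (IgniteFuture<Long> f : futs)
        total += f.get();

    return total;
}

Each closure here is an independent call, though, which is the limitation pointed out below: there is no shared task session or reduce phase tying them together.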
In reply to this post by Semyon Boikov
Semyon,
This is exactly what I want. But currently I don't understand how affinity would be calculated without a reference to a cache (or caches). I think something must be added to annotate such things.

I'll create a ticket for this improvement.

--
Best regards,
Alexei Scherbakov
In reply to this post by dsetrakyan
affinityRun will only work for a single closure.
But I need the same functionality for all jobs in my compute task, with support for task sessions and the other benefits of the map-reduce API.

--
Best regards,
Alexei Scherbakov
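For comparison, a rough sketch of what the hand-rolled map-reduce alternative looks like today, with the explicit mapping phase the proposal wants to avoid: a ComputeTaskAdapter that maps one job per partition to the current primary node, taking the cache name as the task argument. It keeps the map-reduce benefits (task session, reduce step), but unlike affinityRun/Call it gives no partition reservation and no affinity-aware failover. The cache and summing logic are assumptions.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.ignite.Ignite;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.compute.ComputeJob;
import org.apache.ignite.compute.ComputeJobAdapter;
import org.apache.ignite.compute.ComputeJobResult;
import org.apache.ignite.compute.ComputeTaskAdapter;
import org.apache.ignite.resources.IgniteInstanceResource;

public class PartitionSumTask extends ComputeTaskAdapter<String, Long> {
    @IgniteInstanceResource
    private Ignite ignite;

    /** Explicit map phase: one job per partition, sent to the current primary node. */
    @Override public Map<? extends ComputeJob, ClusterNode> map(List<ClusterNode> subgrid, String cacheName) {
        Affinity<Object> aff = ignite.affinity(cacheName);
        Map<ComputeJob, ClusterNode> jobs = new HashMap<>();

        for (int p = 0; p < aff.partitions(); p++)
            jobs.put(new PartitionJob(cacheName, p), aff.mapPartitionToNode(p));

        return jobs;
    }

    /** Reduce phase: combine the partial sums returned by the jobs. */
    @Override public Long reduce(List<ComputeJobResult> results) {
        long total = 0;

        for (ComputeJobResult res : results)
            total += res.<Long>getData();

        return total;
    }

    /** Per-partition job; the body would scan the partition's local entries. */
    private static class PartitionJob extends ComputeJobAdapter {
        private final String cacheName;
        private final int part;

        PartitionJob(String cacheName, int part) {
            this.cacheName = cacheName;
            this.part = part;
        }

        @Override public Object execute() {
            // ... local scan of 'part' in 'cacheName', e.g. as sketched earlier ...
            return 0L;
        }
    }
}

It would be launched as ignite.compute().execute(new PartitionSumTask(), "myCache").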
I think the following should be done:
- The AffinityKeyMapped annotation should be fixed if it's broken (obviously).
- There should be a way to specify the cache name in addition to the key (another annotation?).
- The semantics of a job or closure executed with this annotation should be the same as for affinityRun/Call, with all the guarantees for reserved partitions, etc.

Alexey, this would cover all the cases you described, right?

-Val
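A purely hypothetical sketch of what the proposed usage could look like: a closure carrying both the affinity key and the cache name, so that IgniteCompute could route it and reserve the partition automatically, the way affinityCall/Run do today. The cache-name annotation does not exist yet, and the routing/reservation semantics sketched here are exactly what this thread proposes to implement.

import org.apache.ignite.cache.affinity.AffinityKeyMapped;
import org.apache.ignite.lang.IgniteCallable;

class PartitionAggregateClosure implements IgniteCallable<Long> {
    /** Key whose partition would determine the target node (existing annotation). */
    @AffinityKeyMapped
    private final Object affinityKey;

    /** Cache name; the new annotation proposed above would mark this field. */
    private final String cacheName;

    PartitionAggregateClosure(Object affinityKey, String cacheName) {
        this.affinityKey = affinityKey;
        this.cacheName = cacheName;
    }

    @Override public Long call() {
        // Under the proposed semantics this body would run on the node owning the
        // partition of 'affinityKey' in 'cacheName', with that partition reserved.
        return 0L;
    }
}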