Non-collocated distributed SQL Joins across caches over separate cluster groups


christos
Igniters,

Is it correct to assume the following:

   - We have an Ignite cluster comprised of 2 cluster groups A & B that have different caches deployed.
   - We use an Ignite client to obtain API access to the whole cluster and execute a join query that joins data across the 2 caches.

My understanding is that this is not possible, correct?

Reading this article [1 <https://dzone.com/articles/how-apache-ignite-helped-a-large-bank-process-geog-1>] it seems that such cross-cluster-group behaviour is supported with the transactions API and also advised.

Any thoughts why the SQL API would not allow this and requires caches to be located on all nodes when the JOIN query is executed?

Cheers,
Christos

Re: Non-collocated distributed SQL Joins across caches over separate cluster groups

Sergi
Hi,

Moreover, distributed joins can be executed only between caches with the
same affinity (same partitions on the same nodes).

Keep in mind that a distributed join is already a "last resort" and you
should prefer collocated joins as much as possible if you want to achieve
good performance. A distributed join between different cluster groups will
make things even worse.
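
To illustrate the "same partitions on the same nodes" requirement, here is a minimal sketch in plain Java. It is a toy model, not Ignite code: the class name, the partition count, and the `localJoinCount` helper are all hypothetical. It assumes every key exists in both caches and that each node joins only the rows it holds locally, then counts how many matching pairs are actually found.

```java
/** Toy model of a node-local (collocated) join; hypothetical, not Ignite code. */
class CollocatedJoinSketch {
    static final int PARTS = 4; // hypothetical partition count

    /** Maps a key to a partition, as an affinity function would. */
    static int partition(int key) {
        return Math.floorMod(key, PARTS);
    }

    /**
     * Counts join matches when every node joins only its local rows.
     * assignA[p] / assignB[p] is the node owning partition p of cache A / B.
     */
    static int localJoinCount(int[] keys, int[] assignA, int[] assignB) {
        int matches = 0;
        for (int key : keys) {
            int p = partition(key);
            // A node sees both rows for this key only if it owns
            // partition p of both caches.
            if (assignA[p] == assignB[p]) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int[] keys = {0, 1, 2, 3, 4, 5, 6, 7};
        int[] same = {0, 1, 0, 1};    // partition -> node, used by both caches
        int[] shifted = {0, 1, 1, 0}; // partitions 2 and 3 live on other nodes

        System.out.println(localJoinCount(keys, same, same));    // 8
        System.out.println(localJoinCount(keys, same, shifted)); // 4
    }
}
```

With identical assignments all 8 pairs are found. Once two partitions of the second cache sit on different nodes, half of the pairs are silently missed unless rows are shipped between nodes, which is the extra cost a distributed join pays.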

Sergi

2017-04-05 12:02 GMT+03:00 Christos Erotocritou <[hidden email]>:

> Igniters,
>
> Is it correct to assume the following:
>
>    - We have an Ignite cluster comprised of 2 cluster groups A & B that
>    have different caches deployed.
>    - We use an Ignite client to obtain API access to the whole cluster
>    and execute a join query that joins data across the 2 caches
>
> My understanding is that this is *not possible*, correct?
>
> Reading this article [1
> <https://dzone.com/articles/how-apache-ignite-helped-a-large-bank-process-geog-1>]
> it seems that such cross-cluster-group behaviour is supported with the
> transactions API and also advised.
>
> Any thoughts why the SQL API would not allow this and requires caches to
> be located on all nodes when the JOIN query is executed?
>
> Cheers,
> Christos
>

Re: Non-collocated distributed SQL Joins across caches over separate cluster groups

Andrew Mashenkov
Sergi,

Does this mean that a "broken" FairAffinityFunction can lead to wrong SQL
query results?
As we know, FairAffinityFunction gives no guarantee that the same partitions of
different caches belong to the same nodes; in some cases they can end up on
different nodes.

On Wed, Apr 5, 2017 at 1:47 PM, Sergi Vladykin <[hidden email]>
wrote:

> Hi,
>
> Moreover, distributed joins can be executed only between caches with the
> same affinity (same partitions on the same nodes).
>
> Keep in mind that a distributed join is already a "last resort" and you
> should prefer collocated joins as much as possible if you want to achieve
> good performance. A distributed join between different cluster groups will
> make things even worse.
>
> Sergi

--
Best regards,
Andrey V. Mashenkov

Re: Non-collocated distributed SQL Joins across caches over separate cluster groups

christos
I suggest we continue the conversation on the user list. My bad for pinging the email to both channels.

Re: Non-collocated distributed SQL Joins across caches over separate cluster groups

Sergi
Andrey,

I did not know that FairAffinity can lead to this inconsistent behavior. AG,
can you please comment on this?

Christos,

Because it would complicate the execution pipeline (and may thereby slow down
even collocated execution), and in the case of different cluster groups the
data will never be collocated.

Sergi

2017-04-05 15:22 GMT+03:00 christos <[hidden email]>:

> I suggest we continue the conversation on the user list. My bad for pinging
> the email to both channels.
>
> --
> View this message in context: http://apache-ignite-developers.2346864.n4.nabble.com/Non-collocated-distributed-SQL-Joins-across-caches-over-separate-cluster-groups-tp16136p16163.html
> Sent from the Apache Ignite Developers mailing list archive at Nabble.com.
>

Re: Non-collocated distributed SQL Joins across caches over separate cluster groups

Alexey Goncharuk
Yes, this can happen if caches were created on different versions of
topology, because FairAffinityFunction is stateful and requires previous
affinity assignment state.

Generally, this can be fixed by introducing cache groups that use the same
affinity and use this shared state across all caches.
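
As a rough illustration of why a stateful affinity function can place the same partition of two caches on different nodes, here is a minimal sketch in plain Java. It is NOT the real FairAffinityFunction algorithm: the "sticky" balancing rule, the class name, and the `assign` helper are assumptions made up for this example. The only property it shares with the description above is that the result depends on the previous assignment.

```java
import java.util.Arrays;

/** Toy stateful partition balancer; NOT the real FairAffinityFunction algorithm. */
class StatefulAffinitySketch {
    /**
     * Assigns partitions to nodes 0..nodes-1. A partition stays on its previous
     * owner while that node is under the per-node cap; the remaining partitions
     * go to the least-loaded node. Pass prev == null for a newly created cache.
     */
    static int[] assign(int[] prev, int nodes, int parts) {
        int[] owner = new int[parts];
        int[] load = new int[nodes];
        int cap = (parts + nodes - 1) / nodes; // ceil(parts / nodes)
        Arrays.fill(owner, -1);

        // Step 1: reuse the previous assignment where possible (the "state").
        if (prev != null) {
            for (int p = 0; p < parts; p++) {
                int n = prev[p];
                if (n < nodes && load[n] < cap) {
                    owner[p] = n;
                    load[n]++;
                }
            }
        }
        // Step 2: place every still-unassigned partition on the least-loaded node.
        for (int p = 0; p < parts; p++) {
            if (owner[p] != -1) continue;
            int best = 0;
            for (int n = 1; n < nodes; n++) {
                if (load[n] < load[best]) best = n;
            }
            owner[p] = best;
            load[best]++;
        }
        return owner;
    }

    public static void main(String[] args) {
        int parts = 6;
        // Cache A is created on a 2-node topology, then a third node joins.
        int[] cacheA = assign(null, 2, parts);
        cacheA = assign(cacheA, 3, parts);
        // Cache B is created later, directly on the 3-node topology.
        int[] cacheB = assign(null, 3, parts);

        System.out.println(Arrays.toString(cacheA)); // [0, 1, 0, 1, 2, 2]
        System.out.println(Arrays.toString(cacheB)); // [0, 1, 2, 0, 1, 2]
    }
}
```

Both caches end up balanced over the same three nodes, yet partitions 2, 3 and 4 have different owners because cache A carries history that cache B does not. Sharing one assignment state across a group of caches, as suggested above, would remove that divergence in this toy model.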

2017-04-06 12:37 GMT+03:00 Sergi Vladykin <[hidden email]>:

> Andrey,
>
> I did not know that FairAffinity can lead to this inconsistent behavior. AG,
> can you please comment on this?
>
> Christos,
>
> Because it would complicate the execution pipeline (and may thereby slow down
> even collocated execution), and in the case of different cluster groups the
> data will never be collocated.
>
> Sergi

Re: Non-collocated distributed SQL Joins across caches over separate cluster groups

dsetrakyan
On Thu, Apr 6, 2017 at 2:52 AM, Alexey Goncharuk <[hidden email]> wrote:

> Yes, this can happen if caches were created on different versions of
> topology, because FairAffinityFunction is stateful and requires previous
> affinity assignment state.
>

In this case we have to add validation for dynamically started caches and
throw an exception if FairAffinityFunction is used. Do we do this?

> Generally, this can be fixed by introducing cache groups that use the same
> affinity and use this shared state across all caches.
>

Can you please explain what you mean by this?