Apache Ignite Developers - Legacy Mail Archive

Calcite based SQL query engine. Local queries

Classic

List

Threaded

18 messages Options

gvvinblade

Calcite based SQL query engine. Local queries

Hi Igniters,

Working on new generation of Ignite SQL I faced a question: «Do we need local queries at all and, if so, what semantic they should have?».

Current planing flow consists of next steps:

1) Parsing SQL to AST
2) Validating AST (against Schema)
3) Optimizing (Building execution graph)
4) Splitting (into query fragments which executes on target nodes)
5) Mapping (query fragments to nodes/partitions)

At last step we check that all Fragment sources (a table or result) have the same distribution (in other words all sources have to be co-located)

Planner and Splitter guarantee that all caches in a Fragment are co-located, an Exchange is produced otherwise. But if we force local execution we cannot produce Exchanges, that means we may face two non-co-located caches inside a single query fragment (result of local query planning is a single query fragment). So, we cannot pass the check.

Should we throw an exception or omit the check for local query planning or prohibit local queries at all?

Your thoughts?

Regards,
Igor

Roman Kondakov

Re: Calcite based SQL query engine. Local queries

Hi Igor!

IMO we need to maintain the backward compatibility between old and new
query engines as much as possible. And therefore we shouldn't change the
behavior of local queries.

So, for local queries Calcite's planner shouldn't consider the
distribution trait at all.

--
Kind Regards
Roman Kondakov

On 01.11.2019 17:07, Seliverstov Igor wrote:

> Hi Igniters,
>
> Working on new generation of Ignite SQL I faced a question: «Do we need local queries at all and, if so, what semantic they should have?».
>
> Current planing flow consists of next steps:
>
> 1) Parsing SQL to AST
> 2) Validating AST (against Schema)
> 3) Optimizing (Building execution graph)
> 4) Splitting (into query fragments which executes on target nodes)
> 5) Mapping (query fragments to nodes/partitions)
>
> At last step we check that all Fragment sources (a table or result) have the same distribution (in other words all sources have to be co-located)
>
> Planner and Splitter guarantee that all caches in a Fragment are co-located, an Exchange is produced otherwise. But if we force local execution we cannot produce Exchanges, that means we may face two non-co-located caches inside a single query fragment (result of local query planning is a single query fragment). So, we cannot pass the check.
>
> Should we throw an exception or omit the check for local query planning or prohibit local queries at all?
>
> Your thoughts?
>
> Regards,
> Igor

dmagda

Re: Calcite based SQL query engine. Local queries

Hi Igor,

Local queries feature is broadly used together with affinity-based compute
tasks:
https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods

The use case is as follows. The user knows that all required data needed
for computation is collocated, and SQL is used as an advanced API for data
retrieval from the computation code. The affinity task ensures that
partitions won't be discarded from the node(s) if the topology changes
during the task execution and, thus, it's safe to run SQL locally skipping
distributed phases.

The combination of affinity compute tasks with local SQL is a real and
valuable use case, and this is what we need to support with Calcite. Do you
see any challenges?

-
Denis

On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov <[hidden email]>
wrote:

> Hi Igor!
>
> IMO we need to maintain the backward compatibility between old and new
> query engines as much as possible. And therefore we shouldn't change the
> behavior of local queries.
>
> So, for local queries Calcite's planner shouldn't consider the
> distribution trait at all.
>
>
> --
> Kind Regards
> Roman Kondakov
>
> On 01.11.2019 17:07, Seliverstov Igor wrote:
> > Hi Igniters,
> >
> > Working on new generation of Ignite SQL I faced a question: «Do we need
> local queries at all and, if so, what semantic they should have?».
> >
> > Current planing flow consists of next steps:
> >
> > 1) Parsing SQL to AST
> > 2) Validating AST (against Schema)
> > 3) Optimizing (Building execution graph)
> > 4) Splitting (into query fragments which executes on target nodes)
> > 5) Mapping (query fragments to nodes/partitions)
> >
> > At last step we check that all Fragment sources (a table or result) have
> the same distribution (in other words all sources have to be co-located)
> >
> > Planner and Splitter guarantee that all caches in a Fragment are
> co-located, an Exchange is produced otherwise. But if we force local
> execution we cannot produce Exchanges, that means we may face two
> non-co-located caches inside a single query fragment (result of local query
> planning is a single query fragment). So, we cannot pass the check.
> >
> > Should we throw an exception or omit the check for local query planning
> or prohibit local queries at all?
> >
> > Your thoughts?
> >
> > Regards,
> > Igor
>

Ivan Pavlukhin

Re: Calcite based SQL query engine. Local queries

Denis,

Would be nice to see real use-cases of affinity call + local SQL
combination. Generally, new engine will be able to infer collocation
resulting in the same collocated execution automatically.

пт, 1 нояб. 2019 г. в 19:11, Denis Magda <[hidden email]>:

>
> Hi Igor,
>
> Local queries feature is broadly used together with affinity-based compute
> tasks:
> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
>
> The use case is as follows. The user knows that all required data needed
> for computation is collocated, and SQL is used as an advanced API for data
> retrieval from the computation code. The affinity task ensures that
> partitions won't be discarded from the node(s) if the topology changes
> during the task execution and, thus, it's safe to run SQL locally skipping
> distributed phases.
>
> The combination of affinity compute tasks with local SQL is a real and
> valuable use case, and this is what we need to support with Calcite. Do you
> see any challenges?
>
> -
> Denis
>
>
> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov <[hidden email]>
> wrote:
>
> > Hi Igor!
> >
> > IMO we need to maintain the backward compatibility between old and new
> > query engines as much as possible. And therefore we shouldn't change the
> > behavior of local queries.
> >
> > So, for local queries Calcite's planner shouldn't consider the
> > distribution trait at all.
> >
> >
> > --
> > Kind Regards
> > Roman Kondakov
> >
> > On 01.11.2019 17:07, Seliverstov Igor wrote:
> > > Hi Igniters,
> > >
> > > Working on new generation of Ignite SQL I faced a question: «Do we need
> > local queries at all and, if so, what semantic they should have?».
> > >
> > > Current planing flow consists of next steps:
> > >
> > > 1) Parsing SQL to AST
> > > 2) Validating AST (against Schema)
> > > 3) Optimizing (Building execution graph)
> > > 4) Splitting (into query fragments which executes on target nodes)
> > > 5) Mapping (query fragments to nodes/partitions)
> > >
> > > At last step we check that all Fragment sources (a table or result) have
> > the same distribution (in other words all sources have to be co-located)
> > >
> > > Planner and Splitter guarantee that all caches in a Fragment are
> > co-located, an Exchange is produced otherwise. But if we force local
> > execution we cannot produce Exchanges, that means we may face two
> > non-co-located caches inside a single query fragment (result of local query
> > planning is a single query fragment). So, we cannot pass the check.
> > >
> > > Should we throw an exception or omit the check for local query planning
> > or prohibit local queries at all?
> > >
> > > Your thoughts?
> > >
> > > Regards,
> > > Igor
> >

--
Best regards,
Ivan Pavlukhin

dmagda

Re: Calcite based SQL query engine. Local queries

Ivan,

I was involved in a couple of such use cases personally, so, that's not my
imagination ;) Even more, as far as I remember, the primary reason why we
improved our affinityRuns ensuring no partition is purged from a node until
a task is completed is because many users were running local SQL from
compute tasks and needed a guarantee that SQL will always return a correct
result set.

-
Denis

On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <[hidden email]> wrote:

> Denis,
>
> Would be nice to see real use-cases of affinity call + local SQL
> combination. Generally, new engine will be able to infer collocation
> resulting in the same collocated execution automatically.
>
> пт, 1 нояб. 2019 г. в 19:11, Denis Magda <[hidden email]>:
> >
> > Hi Igor,
> >
> > Local queries feature is broadly used together with affinity-based
> compute
> > tasks:
> >
> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
> >
> > The use case is as follows. The user knows that all required data needed
> > for computation is collocated, and SQL is used as an advanced API for
> data
> > retrieval from the computation code. The affinity task ensures that
> > partitions won't be discarded from the node(s) if the topology changes
> > during the task execution and, thus, it's safe to run SQL locally
> skipping
> > distributed phases.
> >
> > The combination of affinity compute tasks with local SQL is a real and
> > valuable use case, and this is what we need to support with Calcite. Do
> you
> > see any challenges?
> >
> > -
> > Denis
> >
> >
> > On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov <[hidden email]
> >
> > wrote:
> >
> > > Hi Igor!
> > >
> > > IMO we need to maintain the backward compatibility between old and new
> > > query engines as much as possible. And therefore we shouldn't change
> the
> > > behavior of local queries.
> > >
> > > So, for local queries Calcite's planner shouldn't consider the
> > > distribution trait at all.
> > >
> > >
> > > --
> > > Kind Regards
> > > Roman Kondakov
> > >
> > > On 01.11.2019 17:07, Seliverstov Igor wrote:
> > > > Hi Igniters,
> > > >
> > > > Working on new generation of Ignite SQL I faced a question: «Do we
> need
> > > local queries at all and, if so, what semantic they should have?».
> > > >
> > > > Current planing flow consists of next steps:
> > > >
> > > > 1) Parsing SQL to AST
> > > > 2) Validating AST (against Schema)
> > > > 3) Optimizing (Building execution graph)
> > > > 4) Splitting (into query fragments which executes on target nodes)
> > > > 5) Mapping (query fragments to nodes/partitions)
> > > >
> > > > At last step we check that all Fragment sources (a table or result)
> have
> > > the same distribution (in other words all sources have to be
> co-located)
> > > >
> > > > Planner and Splitter guarantee that all caches in a Fragment are
> > > co-located, an Exchange is produced otherwise. But if we force local
> > > execution we cannot produce Exchanges, that means we may face two
> > > non-co-located caches inside a single query fragment (result of local
> query
> > > planning is a single query fragment). So, we cannot pass the check.
> > > >
> > > > Should we throw an exception or omit the check for local query
> planning
> > > or prohibit local queries at all?
> > > >
> > > > Your thoughts?
> > > >
> > > > Regards,
> > > > Igor
> > >
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>

Ivan Pavlukhin

Re: Calcite based SQL query engine. Local queries

Denis,

I am mostly concerned about gathering use cases. It would be great to
critically assess such cases to identify why it cannot be solved by
using distributed SQL. Also it sounds similar to some kind of "hints",
but very limited and with all hints drawbacks (impossibility to use
full strength of CBO). We can provide better "hints" support with new
engine as well.

пт, 1 нояб. 2019 г. в 20:14, Denis Magda <[hidden email]>:

>
> Ivan,
>
> I was involved in a couple of such use cases personally, so, that's not my
> imagination ;) Even more, as far as I remember, the primary reason why we
> improved our affinityRuns ensuring no partition is purged from a node until
> a task is completed is because many users were running local SQL from
> compute tasks and needed a guarantee that SQL will always return a correct
> result set.
>
> -
> Denis
>
>
> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <[hidden email]> wrote:
>
> > Denis,
> >
> > Would be nice to see real use-cases of affinity call + local SQL
> > combination. Generally, new engine will be able to infer collocation
> > resulting in the same collocated execution automatically.
> >
> > пт, 1 нояб. 2019 г. в 19:11, Denis Magda <[hidden email]>:
> > >
> > > Hi Igor,
> > >
> > > Local queries feature is broadly used together with affinity-based
> > compute
> > > tasks:
> > >
> > https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
> > >
> > > The use case is as follows. The user knows that all required data needed
> > > for computation is collocated, and SQL is used as an advanced API for
> > data
> > > retrieval from the computation code. The affinity task ensures that
> > > partitions won't be discarded from the node(s) if the topology changes
> > > during the task execution and, thus, it's safe to run SQL locally
> > skipping
> > > distributed phases.
> > >
> > > The combination of affinity compute tasks with local SQL is a real and
> > > valuable use case, and this is what we need to support with Calcite. Do
> > you
> > > see any challenges?
> > >
> > > -
> > > Denis
> > >
> > >
> > > On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov <[hidden email]
> > >
> > > wrote:
> > >
> > > > Hi Igor!
> > > >
> > > > IMO we need to maintain the backward compatibility between old and new
> > > > query engines as much as possible. And therefore we shouldn't change
> > the
> > > > behavior of local queries.
> > > >
> > > > So, for local queries Calcite's planner shouldn't consider the
> > > > distribution trait at all.
> > > >
> > > >
> > > > --
> > > > Kind Regards
> > > > Roman Kondakov
> > > >
> > > > On 01.11.2019 17:07, Seliverstov Igor wrote:
> > > > > Hi Igniters,
> > > > >
> > > > > Working on new generation of Ignite SQL I faced a question: «Do we
> > need
> > > > local queries at all and, if so, what semantic they should have?».
> > > > >
> > > > > Current planing flow consists of next steps:
> > > > >
> > > > > 1) Parsing SQL to AST
> > > > > 2) Validating AST (against Schema)
> > > > > 3) Optimizing (Building execution graph)
> > > > > 4) Splitting (into query fragments which executes on target nodes)
> > > > > 5) Mapping (query fragments to nodes/partitions)
> > > > >
> > > > > At last step we check that all Fragment sources (a table or result)
> > have
> > > > the same distribution (in other words all sources have to be
> > co-located)
> > > > >
> > > > > Planner and Splitter guarantee that all caches in a Fragment are
> > > > co-located, an Exchange is produced otherwise. But if we force local
> > > > execution we cannot produce Exchanges, that means we may face two
> > > > non-co-located caches inside a single query fragment (result of local
> > query
> > > > planning is a single query fragment). So, we cannot pass the check.
> > > > >
> > > > > Should we throw an exception or omit the check for local query
> > planning
> > > > or prohibit local queries at all?
> > > > >
> > > > > Your thoughts?
> > > > >
> > > > > Regards,
> > > > > Igor
> > > >
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
> >

--
Best regards,
Ivan Pavlukhin

sdarlington

Re: Calcite based SQL query engine. Local queries

A common use case is where you want to work on many rows of data across the grid. You’d broadcast a closure, running the same code on every node with just the local data. SQL doesn’t work in isolation — it’s often used as a filter for future computations.

Regards,
Stephen

> On 1 Nov 2019, at 17:53, Ivan Pavlukhin <[hidden email]> wrote:
>
> Denis,
>
> I am mostly concerned about gathering use cases. It would be great to
> critically assess such cases to identify why it cannot be solved by
> using distributed SQL. Also it sounds similar to some kind of "hints",
> but very limited and with all hints drawbacks (impossibility to use
> full strength of CBO). We can provide better "hints" support with new
> engine as well.
>
> пт, 1 нояб. 2019 г. в 20:14, Denis Magda <[hidden email]>:
>>
>> Ivan,
>>
>> I was involved in a couple of such use cases personally, so, that's not my
>> imagination ;) Even more, as far as I remember, the primary reason why we
>> improved our affinityRuns ensuring no partition is purged from a node until
>> a task is completed is because many users were running local SQL from
>> compute tasks and needed a guarantee that SQL will always return a correct
>> result set.
>>
>> -
>> Denis
>>
>>
>> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <[hidden email]> wrote:
>>
>>> Denis,
>>>
>>> Would be nice to see real use-cases of affinity call + local SQL
>>> combination. Generally, new engine will be able to infer collocation
>>> resulting in the same collocated execution automatically.
>>>
>>> пт, 1 нояб. 2019 г. в 19:11, Denis Magda <[hidden email]>:
>>>>
>>>> Hi Igor,
>>>>
>>>> Local queries feature is broadly used together with affinity-based
>>> compute
>>>> tasks:
>>>>
>>> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
>>>>
>>>> The use case is as follows. The user knows that all required data needed
>>>> for computation is collocated, and SQL is used as an advanced API for
>>> data
>>>> retrieval from the computation code. The affinity task ensures that
>>>> partitions won't be discarded from the node(s) if the topology changes
>>>> during the task execution and, thus, it's safe to run SQL locally
>>> skipping
>>>> distributed phases.
>>>>
>>>> The combination of affinity compute tasks with local SQL is a real and
>>>> valuable use case, and this is what we need to support with Calcite. Do
>>> you
>>>> see any challenges?
>>>>
>>>> -
>>>> Denis
>>>>
>>>>
>>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov <[hidden email]
>>>>
>>>> wrote:
>>>>
>>>>> Hi Igor!
>>>>>
>>>>> IMO we need to maintain the backward compatibility between old and new
>>>>> query engines as much as possible. And therefore we shouldn't change
>>> the
>>>>> behavior of local queries.
>>>>>
>>>>> So, for local queries Calcite's planner shouldn't consider the
>>>>> distribution trait at all.
>>>>>
>>>>>
>>>>> --
>>>>> Kind Regards
>>>>> Roman Kondakov
>>>>>
>>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
>>>>>> Hi Igniters,
>>>>>>
>>>>>> Working on new generation of Ignite SQL I faced a question: «Do we
>>> need
>>>>> local queries at all and, if so, what semantic they should have?».
>>>>>>
>>>>>> Current planing flow consists of next steps:
>>>>>>
>>>>>> 1) Parsing SQL to AST
>>>>>> 2) Validating AST (against Schema)
>>>>>> 3) Optimizing (Building execution graph)
>>>>>> 4) Splitting (into query fragments which executes on target nodes)
>>>>>> 5) Mapping (query fragments to nodes/partitions)
>>>>>>
>>>>>> At last step we check that all Fragment sources (a table or result)
>>> have
>>>>> the same distribution (in other words all sources have to be
>>> co-located)
>>>>>>
>>>>>> Planner and Splitter guarantee that all caches in a Fragment are
>>>>> co-located, an Exchange is produced otherwise. But if we force local
>>>>> execution we cannot produce Exchanges, that means we may face two
>>>>> non-co-located caches inside a single query fragment (result of local
>>> query
>>>>> planning is a single query fragment). So, we cannot pass the check.
>>>>>>
>>>>>> Should we throw an exception or omit the check for local query
>>> planning
>>>>> or prohibit local queries at all?
>>>>>>
>>>>>> Your thoughts?
>>>>>>
>>>>>> Regards,
>>>>>> Igor
>>>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Ivan Pavlukhin
>>>
>
>
>
> --
> Best regards,
> Ivan Pavlukhin

Alexey Goncharuk

Re: Calcite based SQL query engine. Local queries

Denis, Stephen,

Running a local query in a broadcast closure won't work on changing
topology. We specifically added an affinityCall method to the compute API
in order to pin a partition to prevent its moving and eviction throughout
the task execution. Therefore, the query inside an affinityCall is always
executed against some partitions (otherwise the query may give incorrect
results when topology is changed).

I support Igor's question and think that the 'local' flag for the query
should be deprecated and eventually removed. A 'local' query can always be
expressed as a query agains a set of partitions. If those partitions are
located on the same node - good, we get fast and correct results. If not -
we may either raise an exception and ask user to remap the query, or
fallback to a distributed query execution.

Given that the Calcite prototype is in its early stages, it's likely its
first version will be available in 3.x, and it's a good chance to get rid
of wrong API pieces.

--AG

пн, 4 нояб. 2019 г. в 14:02, Stephen Darlington <
[hidden email]>:

> A common use case is where you want to work on many rows of data across
> the grid. You’d broadcast a closure, running the same code on every node
> with just the local data. SQL doesn’t work in isolation — it’s often used
> as a filter for future computations.
>
> Regards,
> Stephen
>
> > On 1 Nov 2019, at 17:53, Ivan Pavlukhin <[hidden email]> wrote:
> >
> > Denis,
> >
> > I am mostly concerned about gathering use cases. It would be great to
> > critically assess such cases to identify why it cannot be solved by
> > using distributed SQL. Also it sounds similar to some kind of "hints",
> > but very limited and with all hints drawbacks (impossibility to use
> > full strength of CBO). We can provide better "hints" support with new
> > engine as well.
> >
> > пт, 1 нояб. 2019 г. в 20:14, Denis Magda <[hidden email]>:
> >>
> >> Ivan,
> >>
> >> I was involved in a couple of such use cases personally, so, that's not
> my
> >> imagination ;) Even more, as far as I remember, the primary reason why
> we
> >> improved our affinityRuns ensuring no partition is purged from a node
> until
> >> a task is completed is because many users were running local SQL from
> >> compute tasks and needed a guarantee that SQL will always return a
> correct
> >> result set.
> >>
> >> -
> >> Denis
> >>
> >>
> >> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <[hidden email]>
> wrote:
> >>
> >>> Denis,
> >>>
> >>> Would be nice to see real use-cases of affinity call + local SQL
> >>> combination. Generally, new engine will be able to infer collocation
> >>> resulting in the same collocated execution automatically.
> >>>
> >>> пт, 1 нояб. 2019 г. в 19:11, Denis Magda <[hidden email]>:
> >>>>
> >>>> Hi Igor,
> >>>>
> >>>> Local queries feature is broadly used together with affinity-based
> >>> compute
> >>>> tasks:
> >>>>
> >>>
> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
> >>>>
> >>>> The use case is as follows. The user knows that all required data
> needed
> >>>> for computation is collocated, and SQL is used as an advanced API for
> >>> data
> >>>> retrieval from the computation code. The affinity task ensures that
> >>>> partitions won't be discarded from the node(s) if the topology changes
> >>>> during the task execution and, thus, it's safe to run SQL locally
> >>> skipping
> >>>> distributed phases.
> >>>>
> >>>> The combination of affinity compute tasks with local SQL is a real and
> >>>> valuable use case, and this is what we need to support with Calcite.
> Do
> >>> you
> >>>> see any challenges?
> >>>>
> >>>> -
> >>>> Denis
> >>>>
> >>>>
> >>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov
> <[hidden email]
> >>>>
> >>>> wrote:
> >>>>
> >>>>> Hi Igor!
> >>>>>
> >>>>> IMO we need to maintain the backward compatibility between old and
> new
> >>>>> query engines as much as possible. And therefore we shouldn't change
> >>> the
> >>>>> behavior of local queries.
> >>>>>
> >>>>> So, for local queries Calcite's planner shouldn't consider the
> >>>>> distribution trait at all.
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Kind Regards
> >>>>> Roman Kondakov
> >>>>>
> >>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
> >>>>>> Hi Igniters,
> >>>>>>
> >>>>>> Working on new generation of Ignite SQL I faced a question: «Do we
> >>> need
> >>>>> local queries at all and, if so, what semantic they should have?».
> >>>>>>
> >>>>>> Current planing flow consists of next steps:
> >>>>>>
> >>>>>> 1) Parsing SQL to AST
> >>>>>> 2) Validating AST (against Schema)
> >>>>>> 3) Optimizing (Building execution graph)
> >>>>>> 4) Splitting (into query fragments which executes on target nodes)
> >>>>>> 5) Mapping (query fragments to nodes/partitions)
> >>>>>>
> >>>>>> At last step we check that all Fragment sources (a table or result)
> >>> have
> >>>>> the same distribution (in other words all sources have to be
> >>> co-located)
> >>>>>>
> >>>>>> Planner and Splitter guarantee that all caches in a Fragment are
> >>>>> co-located, an Exchange is produced otherwise. But if we force local
> >>>>> execution we cannot produce Exchanges, that means we may face two
> >>>>> non-co-located caches inside a single query fragment (result of local
> >>> query
> >>>>> planning is a single query fragment). So, we cannot pass the check.
> >>>>>>
> >>>>>> Should we throw an exception or omit the check for local query
> >>> planning
> >>>>> or prohibit local queries at all?
> >>>>>>
> >>>>>> Your thoughts?
> >>>>>>
> >>>>>> Regards,
> >>>>>> Igor
> >>>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>> Ivan Pavlukhin
> >>>
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
>
>
>

Andrew Mashenkov

Re: Calcite based SQL query engine. Local queries

+1 to Alexey's concerns.

Local SQL query mode is error prone, as a query executes over non-predicted
set of partitions.
Using local mode with deep SQL execution model understanding will lead to
inconsistent result.

Just imagine if we add a note to documentation that "in case of local SQL
user results can depends on topology (partition distribution)".
This definitely not looks like the thing we'd like to provide to end-user.

On Thu, Nov 7, 2019 at 11:31 AM Alexey Goncharuk <[hidden email]>
wrote:

> Denis, Stephen,
>
> Running a local query in a broadcast closure won't work on changing
> topology. We specifically added an affinityCall method to the compute API
> in order to pin a partition to prevent its moving and eviction throughout
> the task execution. Therefore, the query inside an affinityCall is always
> executed against some partitions (otherwise the query may give incorrect
> results when topology is changed).
>
> I support Igor's question and think that the 'local' flag for the query
> should be deprecated and eventually removed. A 'local' query can always be
> expressed as a query agains a set of partitions. If those partitions are
> located on the same node - good, we get fast and correct results. If not -
> we may either raise an exception and ask user to remap the query, or
> fallback to a distributed query execution.
>
> Given that the Calcite prototype is in its early stages, it's likely its
> first version will be available in 3.x, and it's a good chance to get rid
> of wrong API pieces.
>
> --AG
>
> пн, 4 нояб. 2019 г. в 14:02, Stephen Darlington <
> [hidden email]>:
>
> > A common use case is where you want to work on many rows of data across
> > the grid. You’d broadcast a closure, running the same code on every node
> > with just the local data. SQL doesn’t work in isolation — it’s often used
> > as a filter for future computations.
> >
> > Regards,
> > Stephen
> >
> > > On 1 Nov 2019, at 17:53, Ivan Pavlukhin <[hidden email]> wrote:
> > >
> > > Denis,
> > >
> > > I am mostly concerned about gathering use cases. It would be great to
> > > critically assess such cases to identify why it cannot be solved by
> > > using distributed SQL. Also it sounds similar to some kind of "hints",
> > > but very limited and with all hints drawbacks (impossibility to use
> > > full strength of CBO). We can provide better "hints" support with new
> > > engine as well.
> > >
> > > пт, 1 нояб. 2019 г. в 20:14, Denis Magda <[hidden email]>:
> > >>
> > >> Ivan,
> > >>
> > >> I was involved in a couple of such use cases personally, so, that's
> not
> > my
> > >> imagination ;) Even more, as far as I remember, the primary reason why
> > we
> > >> improved our affinityRuns ensuring no partition is purged from a node
> > until
> > >> a task is completed is because many users were running local SQL from
> > >> compute tasks and needed a guarantee that SQL will always return a
> > correct
> > >> result set.
> > >>
> > >> -
> > >> Denis
> > >>
> > >>
> > >> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <[hidden email]>
> > wrote:
> > >>
> > >>> Denis,
> > >>>
> > >>> Would be nice to see real use-cases of affinity call + local SQL
> > >>> combination. Generally, new engine will be able to infer collocation
> > >>> resulting in the same collocated execution automatically.
> > >>>
> > >>> пт, 1 нояб. 2019 г. в 19:11, Denis Magda <[hidden email]>:
> > >>>>
> > >>>> Hi Igor,
> > >>>>
> > >>>> Local queries feature is broadly used together with affinity-based
> > >>> compute
> > >>>> tasks:
> > >>>>
> > >>>
> >
> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
> > >>>>
> > >>>> The use case is as follows. The user knows that all required data
> > needed
> > >>>> for computation is collocated, and SQL is used as an advanced API
> for
> > >>> data
> > >>>> retrieval from the computation code. The affinity task ensures that
> > >>>> partitions won't be discarded from the node(s) if the topology
> changes
> > >>>> during the task execution and, thus, it's safe to run SQL locally
> > >>> skipping
> > >>>> distributed phases.
> > >>>>
> > >>>> The combination of affinity compute tasks with local SQL is a real
> and
> > >>>> valuable use case, and this is what we need to support with Calcite.
> > Do
> > >>> you
> > >>>> see any challenges?
> > >>>>
> > >>>> -
> > >>>> Denis
> > >>>>
> > >>>>
> > >>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov
> > <[hidden email]
> > >>>>
> > >>>> wrote:
> > >>>>
> > >>>>> Hi Igor!
> > >>>>>
> > >>>>> IMO we need to maintain the backward compatibility between old and
> > new
> > >>>>> query engines as much as possible. And therefore we shouldn't
> change
> > >>> the
> > >>>>> behavior of local queries.
> > >>>>>
> > >>>>> So, for local queries Calcite's planner shouldn't consider the
> > >>>>> distribution trait at all.
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Kind Regards
> > >>>>> Roman Kondakov
> > >>>>>
> > >>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
> > >>>>>> Hi Igniters,
> > >>>>>>
> > >>>>>> Working on new generation of Ignite SQL I faced a question: «Do we
> > >>> need
> > >>>>> local queries at all and, if so, what semantic they should have?».
> > >>>>>>
> > >>>>>> Current planing flow consists of next steps:
> > >>>>>>
> > >>>>>> 1) Parsing SQL to AST
> > >>>>>> 2) Validating AST (against Schema)
> > >>>>>> 3) Optimizing (Building execution graph)
> > >>>>>> 4) Splitting (into query fragments which executes on target nodes)
> > >>>>>> 5) Mapping (query fragments to nodes/partitions)
> > >>>>>>
> > >>>>>> At last step we check that all Fragment sources (a table or
> result)
> > >>> have
> > >>>>> the same distribution (in other words all sources have to be
> > >>> co-located)
> > >>>>>>
> > >>>>>> Planner and Splitter guarantee that all caches in a Fragment are
> > >>>>> co-located, an Exchange is produced otherwise. But if we force
> local
> > >>>>> execution we cannot produce Exchanges, that means we may face two
> > >>>>> non-co-located caches inside a single query fragment (result of
> local
> > >>> query
> > >>>>> planning is a single query fragment). So, we cannot pass the check.
> > >>>>>>
> > >>>>>> Should we throw an exception or omit the check for local query
> > >>> planning
> > >>>>> or prohibit local queries at all?
> > >>>>>>
> > >>>>>> Your thoughts?
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>> Igor
> > >>>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Best regards,
> > >>> Ivan Pavlukhin
> > >>>
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Ivan Pavlukhin
> >
> >
> >
>

--
Best regards,
Andrey V. Mashenkov

sdarlington

Re: Calcite based SQL query engine. Local queries

In reply to this post by Alexey Goncharuk

I made a (bad) assumption that this would also affect queries against partitions. If “setLocal()” goes away but “setPartitions()” remains I’m happy.

What I would say is that the “broadcast / local” method is one I see fairly often. Do we need to do a better job educating people of the “correct” way?

Regards,
Stephen

> On 7 Nov 2019, at 08:30, Alexey Goncharuk <[hidden email]> wrote:
>
> Denis, Stephen,
>
> Running a local query in a broadcast closure won't work on changing
> topology. We specifically added an affinityCall method to the compute API
> in order to pin a partition to prevent its moving and eviction throughout
> the task execution. Therefore, the query inside an affinityCall is always
> executed against some partitions (otherwise the query may give incorrect
> results when topology is changed).
>
> I support Igor's question and think that the 'local' flag for the query
> should be deprecated and eventually removed. A 'local' query can always be
> expressed as a query agains a set of partitions. If those partitions are
> located on the same node - good, we get fast and correct results. If not -
> we may either raise an exception and ask user to remap the query, or
> fallback to a distributed query execution.
>
> Given that the Calcite prototype is in its early stages, it's likely its
> first version will be available in 3.x, and it's a good chance to get rid
> of wrong API pieces.
>
> --AG
>
> пн, 4 нояб. 2019 г. в 14:02, Stephen Darlington <
> [hidden email]>:
>
>> A common use case is where you want to work on many rows of data across
>> the grid. You’d broadcast a closure, running the same code on every node
>> with just the local data. SQL doesn’t work in isolation — it’s often used
>> as a filter for future computations.
>>
>> Regards,
>> Stephen
>>
>>> On 1 Nov 2019, at 17:53, Ivan Pavlukhin <[hidden email]> wrote:
>>>
>>> Denis,
>>>
>>> I am mostly concerned about gathering use cases. It would be great to
>>> critically assess such cases to identify why it cannot be solved by
>>> using distributed SQL. Also it sounds similar to some kind of "hints",
>>> but very limited and with all hints drawbacks (impossibility to use
>>> full strength of CBO). We can provide better "hints" support with new
>>> engine as well.
>>>
>>> пт, 1 нояб. 2019 г. в 20:14, Denis Magda <[hidden email]>:
>>>>
>>>> Ivan,
>>>>
>>>> I was involved in a couple of such use cases personally, so, that's not
>> my
>>>> imagination ;) Even more, as far as I remember, the primary reason why
>> we
>>>> improved our affinityRuns ensuring no partition is purged from a node
>> until
>>>> a task is completed is because many users were running local SQL from
>>>> compute tasks and needed a guarantee that SQL will always return a
>> correct
>>>> result set.
>>>>
>>>> -
>>>> Denis
>>>>
>>>>
>>>> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <[hidden email]>
>> wrote:
>>>>
>>>>> Denis,
>>>>>
>>>>> Would be nice to see real use-cases of affinity call + local SQL
>>>>> combination. Generally, new engine will be able to infer collocation
>>>>> resulting in the same collocated execution automatically.
>>>>>
>>>>> пт, 1 нояб. 2019 г. в 19:11, Denis Magda <[hidden email]>:
>>>>>>
>>>>>> Hi Igor,
>>>>>>
>>>>>> Local queries feature is broadly used together with affinity-based
>>>>> compute
>>>>>> tasks:
>>>>>>
>>>>>
>> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
>>>>>>
>>>>>> The use case is as follows. The user knows that all required data
>> needed
>>>>>> for computation is collocated, and SQL is used as an advanced API for
>>>>> data
>>>>>> retrieval from the computation code. The affinity task ensures that
>>>>>> partitions won't be discarded from the node(s) if the topology changes
>>>>>> during the task execution and, thus, it's safe to run SQL locally
>>>>> skipping
>>>>>> distributed phases.
>>>>>>
>>>>>> The combination of affinity compute tasks with local SQL is a real and
>>>>>> valuable use case, and this is what we need to support with Calcite.
>> Do
>>>>> you
>>>>>> see any challenges?
>>>>>>
>>>>>> -
>>>>>> Denis
>>>>>>
>>>>>>
>>>>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov
>> <[hidden email]
>>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Igor!
>>>>>>>
>>>>>>> IMO we need to maintain the backward compatibility between old and
>> new
>>>>>>> query engines as much as possible. And therefore we shouldn't change
>>>>> the
>>>>>>> behavior of local queries.
>>>>>>>
>>>>>>> So, for local queries Calcite's planner shouldn't consider the
>>>>>>> distribution trait at all.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Kind Regards
>>>>>>> Roman Kondakov
>>>>>>>
>>>>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
>>>>>>>> Hi Igniters,
>>>>>>>>
>>>>>>>> Working on new generation of Ignite SQL I faced a question: «Do we
>>>>> need
>>>>>>> local queries at all and, if so, what semantic they should have?».
>>>>>>>>
>>>>>>>> Current planing flow consists of next steps:
>>>>>>>>
>>>>>>>> 1) Parsing SQL to AST
>>>>>>>> 2) Validating AST (against Schema)
>>>>>>>> 3) Optimizing (Building execution graph)
>>>>>>>> 4) Splitting (into query fragments which executes on target nodes)
>>>>>>>> 5) Mapping (query fragments to nodes/partitions)
>>>>>>>>
>>>>>>>> At last step we check that all Fragment sources (a table or result)
>>>>> have
>>>>>>> the same distribution (in other words all sources have to be
>>>>> co-located)
>>>>>>>>
>>>>>>>> Planner and Splitter guarantee that all caches in a Fragment are
>>>>>>> co-located, an Exchange is produced otherwise. But if we force local
>>>>>>> execution we cannot produce Exchanges, that means we may face two
>>>>>>> non-co-located caches inside a single query fragment (result of local
>>>>> query
>>>>>>> planning is a single query fragment). So, we cannot pass the check.
>>>>>>>>
>>>>>>>> Should we throw an exception or omit the check for local query
>>>>> planning
>>>>>>> or prohibit local queries at all?
>>>>>>>>
>>>>>>>> Your thoughts?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Igor
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Ivan Pavlukhin
>>>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Ivan Pavlukhin
>>
>>
>>

Ivan Pavlukhin

Re: Calcite based SQL query engine. Local queries

Stephen,

In my understanding we need to do a better job to realize use-cases of
Compute + LocalSQL ourselves.

Ideally smart optimizer should do the best job of query deployment.

чт, 7 нояб. 2019 г. в 13:04, Stephen Darlington
<[hidden email]>:

>
> I made a (bad) assumption that this would also affect queries against partitions. If “setLocal()” goes away but “setPartitions()” remains I’m happy.
>
> What I would say is that the “broadcast / local” method is one I see fairly often. Do we need to do a better job educating people of the “correct” way?
>
> Regards,
> Stephen
>
> > On 7 Nov 2019, at 08:30, Alexey Goncharuk <[hidden email]> wrote:
> >
> > Denis, Stephen,
> >
> > Running a local query in a broadcast closure won't work on changing
> > topology. We specifically added an affinityCall method to the compute API
> > in order to pin a partition to prevent its moving and eviction throughout
> > the task execution. Therefore, the query inside an affinityCall is always
> > executed against some partitions (otherwise the query may give incorrect
> > results when topology is changed).
> >
> > I support Igor's question and think that the 'local' flag for the query
> > should be deprecated and eventually removed. A 'local' query can always be
> > expressed as a query agains a set of partitions. If those partitions are
> > located on the same node - good, we get fast and correct results. If not -
> > we may either raise an exception and ask user to remap the query, or
> > fallback to a distributed query execution.
> >
> > Given that the Calcite prototype is in its early stages, it's likely its
> > first version will be available in 3.x, and it's a good chance to get rid
> > of wrong API pieces.
> >
> > --AG
> >
> > пн, 4 нояб. 2019 г. в 14:02, Stephen Darlington <
> > [hidden email]>:
> >
> >> A common use case is where you want to work on many rows of data across
> >> the grid. You’d broadcast a closure, running the same code on every node
> >> with just the local data. SQL doesn’t work in isolation — it’s often used
> >> as a filter for future computations.
> >>
> >> Regards,
> >> Stephen
> >>
> >>> On 1 Nov 2019, at 17:53, Ivan Pavlukhin <[hidden email]> wrote:
> >>>
> >>> Denis,
> >>>
> >>> I am mostly concerned about gathering use cases. It would be great to
> >>> critically assess such cases to identify why it cannot be solved by
> >>> using distributed SQL. Also it sounds similar to some kind of "hints",
> >>> but very limited and with all hints drawbacks (impossibility to use
> >>> full strength of CBO). We can provide better "hints" support with new
> >>> engine as well.
> >>>
> >>> пт, 1 нояб. 2019 г. в 20:14, Denis Magda <[hidden email]>:
> >>>>
> >>>> Ivan,
> >>>>
> >>>> I was involved in a couple of such use cases personally, so, that's not
> >> my
> >>>> imagination ;) Even more, as far as I remember, the primary reason why
> >> we
> >>>> improved our affinityRuns ensuring no partition is purged from a node
> >> until
> >>>> a task is completed is because many users were running local SQL from
> >>>> compute tasks and needed a guarantee that SQL will always return a
> >> correct
> >>>> result set.
> >>>>
> >>>> -
> >>>> Denis
> >>>>
> >>>>
> >>>> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <[hidden email]>
> >> wrote:
> >>>>
> >>>>> Denis,
> >>>>>
> >>>>> Would be nice to see real use-cases of affinity call + local SQL
> >>>>> combination. Generally, new engine will be able to infer collocation
> >>>>> resulting in the same collocated execution automatically.
> >>>>>
> >>>>> пт, 1 нояб. 2019 г. в 19:11, Denis Magda <[hidden email]>:
> >>>>>>
> >>>>>> Hi Igor,
> >>>>>>
> >>>>>> Local queries feature is broadly used together with affinity-based
> >>>>> compute
> >>>>>> tasks:
> >>>>>>
> >>>>>
> >> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
> >>>>>>
> >>>>>> The use case is as follows. The user knows that all required data
> >> needed
> >>>>>> for computation is collocated, and SQL is used as an advanced API for
> >>>>> data
> >>>>>> retrieval from the computation code. The affinity task ensures that
> >>>>>> partitions won't be discarded from the node(s) if the topology changes
> >>>>>> during the task execution and, thus, it's safe to run SQL locally
> >>>>> skipping
> >>>>>> distributed phases.
> >>>>>>
> >>>>>> The combination of affinity compute tasks with local SQL is a real and
> >>>>>> valuable use case, and this is what we need to support with Calcite.
> >> Do
> >>>>> you
> >>>>>> see any challenges?
> >>>>>>
> >>>>>> -
> >>>>>> Denis
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov
> >> <[hidden email]
> >>>>>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi Igor!
> >>>>>>>
> >>>>>>> IMO we need to maintain the backward compatibility between old and
> >> new
> >>>>>>> query engines as much as possible. And therefore we shouldn't change
> >>>>> the
> >>>>>>> behavior of local queries.
> >>>>>>>
> >>>>>>> So, for local queries Calcite's planner shouldn't consider the
> >>>>>>> distribution trait at all.
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Kind Regards
> >>>>>>> Roman Kondakov
> >>>>>>>
> >>>>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
> >>>>>>>> Hi Igniters,
> >>>>>>>>
> >>>>>>>> Working on new generation of Ignite SQL I faced a question: «Do we
> >>>>> need
> >>>>>>> local queries at all and, if so, what semantic they should have?».
> >>>>>>>>
> >>>>>>>> Current planing flow consists of next steps:
> >>>>>>>>
> >>>>>>>> 1) Parsing SQL to AST
> >>>>>>>> 2) Validating AST (against Schema)
> >>>>>>>> 3) Optimizing (Building execution graph)
> >>>>>>>> 4) Splitting (into query fragments which executes on target nodes)
> >>>>>>>> 5) Mapping (query fragments to nodes/partitions)
> >>>>>>>>
> >>>>>>>> At last step we check that all Fragment sources (a table or result)
> >>>>> have
> >>>>>>> the same distribution (in other words all sources have to be
> >>>>> co-located)
> >>>>>>>>
> >>>>>>>> Planner and Splitter guarantee that all caches in a Fragment are
> >>>>>>> co-located, an Exchange is produced otherwise. But if we force local
> >>>>>>> execution we cannot produce Exchanges, that means we may face two
> >>>>>>> non-co-located caches inside a single query fragment (result of local
> >>>>> query
> >>>>>>> planning is a single query fragment). So, we cannot pass the check.
> >>>>>>>>
> >>>>>>>> Should we throw an exception or omit the check for local query
> >>>>> planning
> >>>>>>> or prohibit local queries at all?
> >>>>>>>>
> >>>>>>>> Your thoughts?
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Igor
> >>>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Best regards,
> >>>>> Ivan Pavlukhin
> >>>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>> Ivan Pavlukhin
> >>
> >>
> >>
>
>

--
Best regards,
Ivan Pavlukhin

dmagda

Re: Calcite based SQL query engine. Local queries

Folks,

See our compute tasks as an advanced version of stored procedures that let
the users code the logic of various complexity with Java, .NET or C++ (and
not with PL/SQL). The logic can use a combination of APIs (key-value, SQL,
etc.) to access data both locally and remotely while being executed on
server nodes. The logic can make N key-value requests or run M SQL queries.

We kept supporting local SQL queries exactly for such scenarios (for our
version of stored procedures) to ensure the distributed map-reduce phase is
canceled if all the data is local. And affinityCalls were improved one day
to pin the partitions.

If the new engine is smart enough to understand that all the partitions are
available locally during the affinityRun execution then it's totally fine
to remove the 'local' flag. Otherwise, we need to instruct the engine
manually that a distributed phase is redundant via 'local' flag or by other
means.

Does it make things clearer?

-
Denis

On Thu, Nov 7, 2019 at 3:53 AM Ivan Pavlukhin <[hidden email]> wrote:

> Stephen,
>
> In my understanding we need to do a better job to realize use-cases of
> Compute + LocalSQL ourselves.
>
> Ideally smart optimizer should do the best job of query deployment.
>
> чт, 7 нояб. 2019 г. в 13:04, Stephen Darlington
> <[hidden email]>:
> >
> > I made a (bad) assumption that this would also affect queries against
> partitions. If “setLocal()” goes away but “setPartitions()” remains I’m
> happy.
> >
> > What I would say is that the “broadcast / local” method is one I see
> fairly often. Do we need to do a better job educating people of the
> “correct” way?
> >
> > Regards,
> > Stephen
> >
> > > On 7 Nov 2019, at 08:30, Alexey Goncharuk <[hidden email]>
> wrote:
> > >
> > > Denis, Stephen,
> > >
> > > Running a local query in a broadcast closure won't work on changing
> > > topology. We specifically added an affinityCall method to the compute
> API
> > > in order to pin a partition to prevent its moving and eviction
> throughout
> > > the task execution. Therefore, the query inside an affinityCall is
> always
> > > executed against some partitions (otherwise the query may give
> incorrect
> > > results when topology is changed).
> > >
> > > I support Igor's question and think that the 'local' flag for the query
> > > should be deprecated and eventually removed. A 'local' query can
> always be
> > > expressed as a query agains a set of partitions. If those partitions
> are
> > > located on the same node - good, we get fast and correct results. If
> not -
> > > we may either raise an exception and ask user to remap the query, or
> > > fallback to a distributed query execution.
> > >
> > > Given that the Calcite prototype is in its early stages, it's likely
> its
> > > first version will be available in 3.x, and it's a good chance to get
> rid
> > > of wrong API pieces.
> > >
> > > --AG
> > >
> > > пн, 4 нояб. 2019 г. в 14:02, Stephen Darlington <
> > > [hidden email]>:
> > >
> > >> A common use case is where you want to work on many rows of data
> across
> > >> the grid. You’d broadcast a closure, running the same code on every
> node
> > >> with just the local data. SQL doesn’t work in isolation — it’s often
> used
> > >> as a filter for future computations.
> > >>
> > >> Regards,
> > >> Stephen
> > >>
> > >>> On 1 Nov 2019, at 17:53, Ivan Pavlukhin <[hidden email]> wrote:
> > >>>
> > >>> Denis,
> > >>>
> > >>> I am mostly concerned about gathering use cases. It would be great to
> > >>> critically assess such cases to identify why it cannot be solved by
> > >>> using distributed SQL. Also it sounds similar to some kind of
> "hints",
> > >>> but very limited and with all hints drawbacks (impossibility to use
> > >>> full strength of CBO). We can provide better "hints" support with new
> > >>> engine as well.
> > >>>
> > >>> пт, 1 нояб. 2019 г. в 20:14, Denis Magda <[hidden email]>:
> > >>>>
> > >>>> Ivan,
> > >>>>
> > >>>> I was involved in a couple of such use cases personally, so, that's
> not
> > >> my
> > >>>> imagination ;) Even more, as far as I remember, the primary reason
> why
> > >> we
> > >>>> improved our affinityRuns ensuring no partition is purged from a
> node
> > >> until
> > >>>> a task is completed is because many users were running local SQL
> from
> > >>>> compute tasks and needed a guarantee that SQL will always return a
> > >> correct
> > >>>> result set.
> > >>>>
> > >>>> -
> > >>>> Denis
> > >>>>
> > >>>>
> > >>>> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <[hidden email]
> >
> > >> wrote:
> > >>>>
> > >>>>> Denis,
> > >>>>>
> > >>>>> Would be nice to see real use-cases of affinity call + local SQL
> > >>>>> combination. Generally, new engine will be able to infer
> collocation
> > >>>>> resulting in the same collocated execution automatically.
> > >>>>>
> > >>>>> пт, 1 нояб. 2019 г. в 19:11, Denis Magda <[hidden email]>:
> > >>>>>>
> > >>>>>> Hi Igor,
> > >>>>>>
> > >>>>>> Local queries feature is broadly used together with affinity-based
> > >>>>> compute
> > >>>>>> tasks:
> > >>>>>>
> > >>>>>
> > >>
> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
> > >>>>>>
> > >>>>>> The use case is as follows. The user knows that all required data
> > >> needed
> > >>>>>> for computation is collocated, and SQL is used as an advanced API
> for
> > >>>>> data
> > >>>>>> retrieval from the computation code. The affinity task ensures
> that
> > >>>>>> partitions won't be discarded from the node(s) if the topology
> changes
> > >>>>>> during the task execution and, thus, it's safe to run SQL locally
> > >>>>> skipping
> > >>>>>> distributed phases.
> > >>>>>>
> > >>>>>> The combination of affinity compute tasks with local SQL is a
> real and
> > >>>>>> valuable use case, and this is what we need to support with
> Calcite.
> > >> Do
> > >>>>> you
> > >>>>>> see any challenges?
> > >>>>>>
> > >>>>>> -
> > >>>>>> Denis
> > >>>>>>
> > >>>>>>
> > >>>>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov
> > >> <[hidden email]
> > >>>>>>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Hi Igor!
> > >>>>>>>
> > >>>>>>> IMO we need to maintain the backward compatibility between old
> and
> > >> new
> > >>>>>>> query engines as much as possible. And therefore we shouldn't
> change
> > >>>>> the
> > >>>>>>> behavior of local queries.
> > >>>>>>>
> > >>>>>>> So, for local queries Calcite's planner shouldn't consider the
> > >>>>>>> distribution trait at all.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Kind Regards
> > >>>>>>> Roman Kondakov
> > >>>>>>>
> > >>>>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
> > >>>>>>>> Hi Igniters,
> > >>>>>>>>
> > >>>>>>>> Working on new generation of Ignite SQL I faced a question: «Do
> we
> > >>>>> need
> > >>>>>>> local queries at all and, if so, what semantic they should
> have?».
> > >>>>>>>>
> > >>>>>>>> Current planing flow consists of next steps:
> > >>>>>>>>
> > >>>>>>>> 1) Parsing SQL to AST
> > >>>>>>>> 2) Validating AST (against Schema)
> > >>>>>>>> 3) Optimizing (Building execution graph)
> > >>>>>>>> 4) Splitting (into query fragments which executes on target
> nodes)
> > >>>>>>>> 5) Mapping (query fragments to nodes/partitions)
> > >>>>>>>>
> > >>>>>>>> At last step we check that all Fragment sources (a table or
> result)
> > >>>>> have
> > >>>>>>> the same distribution (in other words all sources have to be
> > >>>>> co-located)
> > >>>>>>>>
> > >>>>>>>> Planner and Splitter guarantee that all caches in a Fragment are
> > >>>>>>> co-located, an Exchange is produced otherwise. But if we force
> local
> > >>>>>>> execution we cannot produce Exchanges, that means we may face two
> > >>>>>>> non-co-located caches inside a single query fragment (result of
> local
> > >>>>> query
> > >>>>>>> planning is a single query fragment). So, we cannot pass the
> check.
> > >>>>>>>>
> > >>>>>>>> Should we throw an exception or omit the check for local query
> > >>>>> planning
> > >>>>>>> or prohibit local queries at all?
> > >>>>>>>>
> > >>>>>>>> Your thoughts?
> > >>>>>>>>
> > >>>>>>>> Regards,
> > >>>>>>>> Igor
> > >>>>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Best regards,
> > >>>>> Ivan Pavlukhin
> > >>>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Best regards,
> > >>> Ivan Pavlukhin
> > >>
> > >>
> > >>
> >
> >
>
>
> --
> Best regards,
> Ivan Pavlukhin
>

Ivan Pavlukhin

Re: Calcite based SQL query engine. Local queries

Denis,

To make things really clearer we need to provide some concrete example
of Compute + LocalSQL and reason about it to figure out whether
"smart" SQL engine can deliver the same (or better) results or not.

пт, 8 нояб. 2019 г. в 01:48, Denis Magda <[hidden email]>:

>
> Folks,
>
> See our compute tasks as an advanced version of stored procedures that let
> the users code the logic of various complexity with Java, .NET or C++ (and
> not with PL/SQL). The logic can use a combination of APIs (key-value, SQL,
> etc.) to access data both locally and remotely while being executed on
> server nodes. The logic can make N key-value requests or run M SQL queries.
>
> We kept supporting local SQL queries exactly for such scenarios (for our
> version of stored procedures) to ensure the distributed map-reduce phase is
> canceled if all the data is local. And affinityCalls were improved one day
> to pin the partitions.
>
> If the new engine is smart enough to understand that all the partitions are
> available locally during the affinityRun execution then it's totally fine
> to remove the 'local' flag. Otherwise, we need to instruct the engine
> manually that a distributed phase is redundant via 'local' flag or by other
> means.
>
> Does it make things clearer?
>
>
> -
> Denis
>
>
> On Thu, Nov 7, 2019 at 3:53 AM Ivan Pavlukhin <[hidden email]> wrote:
>
> > Stephen,
> >
> > In my understanding we need to do a better job to realize use-cases of
> > Compute + LocalSQL ourselves.
> >
> > Ideally smart optimizer should do the best job of query deployment.
> >
> > чт, 7 нояб. 2019 г. в 13:04, Stephen Darlington
> > <[hidden email]>:
> > >
> > > I made a (bad) assumption that this would also affect queries against
> > partitions. If “setLocal()” goes away but “setPartitions()” remains I’m
> > happy.
> > >
> > > What I would say is that the “broadcast / local” method is one I see
> > fairly often. Do we need to do a better job educating people of the
> > “correct” way?
> > >
> > > Regards,
> > > Stephen
> > >
> > > > On 7 Nov 2019, at 08:30, Alexey Goncharuk <[hidden email]>
> > wrote:
> > > >
> > > > Denis, Stephen,
> > > >
> > > > Running a local query in a broadcast closure won't work on changing
> > > > topology. We specifically added an affinityCall method to the compute
> > API
> > > > in order to pin a partition to prevent its moving and eviction
> > throughout
> > > > the task execution. Therefore, the query inside an affinityCall is
> > always
> > > > executed against some partitions (otherwise the query may give
> > incorrect
> > > > results when topology is changed).
> > > >
> > > > I support Igor's question and think that the 'local' flag for the query
> > > > should be deprecated and eventually removed. A 'local' query can
> > always be
> > > > expressed as a query agains a set of partitions. If those partitions
> > are
> > > > located on the same node - good, we get fast and correct results. If
> > not -
> > > > we may either raise an exception and ask user to remap the query, or
> > > > fallback to a distributed query execution.
> > > >
> > > > Given that the Calcite prototype is in its early stages, it's likely
> > its
> > > > first version will be available in 3.x, and it's a good chance to get
> > rid
> > > > of wrong API pieces.
> > > >
> > > > --AG
> > > >
> > > > пн, 4 нояб. 2019 г. в 14:02, Stephen Darlington <
> > > > [hidden email]>:
> > > >
> > > >> A common use case is where you want to work on many rows of data
> > across
> > > >> the grid. You’d broadcast a closure, running the same code on every
> > node
> > > >> with just the local data. SQL doesn’t work in isolation — it’s often
> > used
> > > >> as a filter for future computations.
> > > >>
> > > >> Regards,
> > > >> Stephen
> > > >>
> > > >>> On 1 Nov 2019, at 17:53, Ivan Pavlukhin <[hidden email]> wrote:
> > > >>>
> > > >>> Denis,
> > > >>>
> > > >>> I am mostly concerned about gathering use cases. It would be great to
> > > >>> critically assess such cases to identify why it cannot be solved by
> > > >>> using distributed SQL. Also it sounds similar to some kind of
> > "hints",
> > > >>> but very limited and with all hints drawbacks (impossibility to use
> > > >>> full strength of CBO). We can provide better "hints" support with new
> > > >>> engine as well.
> > > >>>
> > > >>> пт, 1 нояб. 2019 г. в 20:14, Denis Magda <[hidden email]>:
> > > >>>>
> > > >>>> Ivan,
> > > >>>>
> > > >>>> I was involved in a couple of such use cases personally, so, that's
> > not
> > > >> my
> > > >>>> imagination ;) Even more, as far as I remember, the primary reason
> > why
> > > >> we
> > > >>>> improved our affinityRuns ensuring no partition is purged from a
> > node
> > > >> until
> > > >>>> a task is completed is because many users were running local SQL
> > from
> > > >>>> compute tasks and needed a guarantee that SQL will always return a
> > > >> correct
> > > >>>> result set.
> > > >>>>
> > > >>>> -
> > > >>>> Denis
> > > >>>>
> > > >>>>
> > > >>>> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <[hidden email]
> > >
> > > >> wrote:
> > > >>>>
> > > >>>>> Denis,
> > > >>>>>
> > > >>>>> Would be nice to see real use-cases of affinity call + local SQL
> > > >>>>> combination. Generally, new engine will be able to infer
> > collocation
> > > >>>>> resulting in the same collocated execution automatically.
> > > >>>>>
> > > >>>>> пт, 1 нояб. 2019 г. в 19:11, Denis Magda <[hidden email]>:
> > > >>>>>>
> > > >>>>>> Hi Igor,
> > > >>>>>>
> > > >>>>>> Local queries feature is broadly used together with affinity-based
> > > >>>>> compute
> > > >>>>>> tasks:
> > > >>>>>>
> > > >>>>>
> > > >>
> > https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
> > > >>>>>>
> > > >>>>>> The use case is as follows. The user knows that all required data
> > > >> needed
> > > >>>>>> for computation is collocated, and SQL is used as an advanced API
> > for
> > > >>>>> data
> > > >>>>>> retrieval from the computation code. The affinity task ensures
> > that
> > > >>>>>> partitions won't be discarded from the node(s) if the topology
> > changes
> > > >>>>>> during the task execution and, thus, it's safe to run SQL locally
> > > >>>>> skipping
> > > >>>>>> distributed phases.
> > > >>>>>>
> > > >>>>>> The combination of affinity compute tasks with local SQL is a
> > real and
> > > >>>>>> valuable use case, and this is what we need to support with
> > Calcite.
> > > >> Do
> > > >>>>> you
> > > >>>>>> see any challenges?
> > > >>>>>>
> > > >>>>>> -
> > > >>>>>> Denis
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov
> > > >> <[hidden email]
> > > >>>>>>
> > > >>>>>> wrote:
> > > >>>>>>
> > > >>>>>>> Hi Igor!
> > > >>>>>>>
> > > >>>>>>> IMO we need to maintain the backward compatibility between old
> > and
> > > >> new
> > > >>>>>>> query engines as much as possible. And therefore we shouldn't
> > change
> > > >>>>> the
> > > >>>>>>> behavior of local queries.
> > > >>>>>>>
> > > >>>>>>> So, for local queries Calcite's planner shouldn't consider the
> > > >>>>>>> distribution trait at all.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> --
> > > >>>>>>> Kind Regards
> > > >>>>>>> Roman Kondakov
> > > >>>>>>>
> > > >>>>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
> > > >>>>>>>> Hi Igniters,
> > > >>>>>>>>
> > > >>>>>>>> Working on new generation of Ignite SQL I faced a question: «Do
> > we
> > > >>>>> need
> > > >>>>>>> local queries at all and, if so, what semantic they should
> > have?».
> > > >>>>>>>>
> > > >>>>>>>> Current planing flow consists of next steps:
> > > >>>>>>>>
> > > >>>>>>>> 1) Parsing SQL to AST
> > > >>>>>>>> 2) Validating AST (against Schema)
> > > >>>>>>>> 3) Optimizing (Building execution graph)
> > > >>>>>>>> 4) Splitting (into query fragments which executes on target
> > nodes)
> > > >>>>>>>> 5) Mapping (query fragments to nodes/partitions)
> > > >>>>>>>>
> > > >>>>>>>> At last step we check that all Fragment sources (a table or
> > result)
> > > >>>>> have
> > > >>>>>>> the same distribution (in other words all sources have to be
> > > >>>>> co-located)
> > > >>>>>>>>
> > > >>>>>>>> Planner and Splitter guarantee that all caches in a Fragment are
> > > >>>>>>> co-located, an Exchange is produced otherwise. But if we force
> > local
> > > >>>>>>> execution we cannot produce Exchanges, that means we may face two
> > > >>>>>>> non-co-located caches inside a single query fragment (result of
> > local
> > > >>>>> query
> > > >>>>>>> planning is a single query fragment). So, we cannot pass the
> > check.
> > > >>>>>>>>
> > > >>>>>>>> Should we throw an exception or omit the check for local query
> > > >>>>> planning
> > > >>>>>>> or prohibit local queries at all?
> > > >>>>>>>>
> > > >>>>>>>> Your thoughts?
> > > >>>>>>>>
> > > >>>>>>>> Regards,
> > > >>>>>>>> Igor
> > > >>>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> --
> > > >>>>> Best regards,
> > > >>>>> Ivan Pavlukhin
> > > >>>>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Best regards,
> > > >>> Ivan Pavlukhin
> > > >>
> > > >>
> > > >>
> > >
> > >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
> >

--
Best regards,
Ivan Pavlukhin

Dmitry Pavlov

Re: Calcite based SQL query engine. Local queries

Hi Ivan, Igniters, imagine you need to scan all entities in the cluster.

Ideally, you don't want to de-serialize all of entries, so you can use
withKeepBinary(). e.g. you need a couple of fields and get some cumulative
metric on this data. You can send compute to all cluster nodes and run
there SQL scan queries with local mode is on. In that manner you can
implement Map-Reduce.

It may be there is another way of doing that, so I encourage to share it. I
could update workshops/training I preparing in background.

Sincerely,
Dmitriy Pavlov

пт, 8 нояб. 2019 г. в 08:57, Ivan Pavlukhin <[hidden email]>:

> Denis,
>
> To make things really clearer we need to provide some concrete example
> of Compute + LocalSQL and reason about it to figure out whether
> "smart" SQL engine can deliver the same (or better) results or not.
>
> пт, 8 нояб. 2019 г. в 01:48, Denis Magda <[hidden email]>:
> >
> > Folks,
> >
> > See our compute tasks as an advanced version of stored procedures that
> let
> > the users code the logic of various complexity with Java, .NET or C++
> (and
> > not with PL/SQL). The logic can use a combination of APIs (key-value,
> SQL,
> > etc.) to access data both locally and remotely while being executed on
> > server nodes. The logic can make N key-value requests or run M SQL
> queries.
> >
> > We kept supporting local SQL queries exactly for such scenarios (for our
> > version of stored procedures) to ensure the distributed map-reduce phase
> is
> > canceled if all the data is local. And affinityCalls were improved one
> day
> > to pin the partitions.
> >
> > If the new engine is smart enough to understand that all the partitions
> are
> > available locally during the affinityRun execution then it's totally fine
> > to remove the 'local' flag. Otherwise, we need to instruct the engine
> > manually that a distributed phase is redundant via 'local' flag or by
> other
> > means.
> >
> > Does it make things clearer?
> >
> >
> > -
> > Denis
> >
> >
> > On Thu, Nov 7, 2019 at 3:53 AM Ivan Pavlukhin <[hidden email]>
> wrote:
> >
> > > Stephen,
> > >
> > > In my understanding we need to do a better job to realize use-cases of
> > > Compute + LocalSQL ourselves.
> > >
> > > Ideally smart optimizer should do the best job of query deployment.
> > >
> > > чт, 7 нояб. 2019 г. в 13:04, Stephen Darlington
> > > <[hidden email]>:
> > > >
> > > > I made a (bad) assumption that this would also affect queries against
> > > partitions. If “setLocal()” goes away but “setPartitions()” remains I’m
> > > happy.
> > > >
> > > > What I would say is that the “broadcast / local” method is one I see
> > > fairly often. Do we need to do a better job educating people of the
> > > “correct” way?
> > > >
> > > > Regards,
> > > > Stephen
> > > >
> > > > > On 7 Nov 2019, at 08:30, Alexey Goncharuk <
> [hidden email]>
> > > wrote:
> > > > >
> > > > > Denis, Stephen,
> > > > >
> > > > > Running a local query in a broadcast closure won't work on changing
> > > > > topology. We specifically added an affinityCall method to the
> compute
> > > API
> > > > > in order to pin a partition to prevent its moving and eviction
> > > throughout
> > > > > the task execution. Therefore, the query inside an affinityCall is
> > > always
> > > > > executed against some partitions (otherwise the query may give
> > > incorrect
> > > > > results when topology is changed).
> > > > >
> > > > > I support Igor's question and think that the 'local' flag for the
> query
> > > > > should be deprecated and eventually removed. A 'local' query can
> > > always be
> > > > > expressed as a query agains a set of partitions. If those
> partitions
> > > are
> > > > > located on the same node - good, we get fast and correct results.
> If
> > > not -
> > > > > we may either raise an exception and ask user to remap the query,
> or
> > > > > fallback to a distributed query execution.
> > > > >
> > > > > Given that the Calcite prototype is in its early stages, it's
> likely
> > > its
> > > > > first version will be available in 3.x, and it's a good chance to
> get
> > > rid
> > > > > of wrong API pieces.
> > > > >
> > > > > --AG
> > > > >
> > > > > пн, 4 нояб. 2019 г. в 14:02, Stephen Darlington <
> > > > > [hidden email]>:
> > > > >
> > > > >> A common use case is where you want to work on many rows of data
> > > across
> > > > >> the grid. You’d broadcast a closure, running the same code on
> every
> > > node
> > > > >> with just the local data. SQL doesn’t work in isolation — it’s
> often
> > > used
> > > > >> as a filter for future computations.
> > > > >>
> > > > >> Regards,
> > > > >> Stephen
> > > > >>
> > > > >>> On 1 Nov 2019, at 17:53, Ivan Pavlukhin <[hidden email]>
> wrote:
> > > > >>>
> > > > >>> Denis,
> > > > >>>
> > > > >>> I am mostly concerned about gathering use cases. It would be
> great to
> > > > >>> critically assess such cases to identify why it cannot be solved
> by
> > > > >>> using distributed SQL. Also it sounds similar to some kind of
> > > "hints",
> > > > >>> but very limited and with all hints drawbacks (impossibility to
> use
> > > > >>> full strength of CBO). We can provide better "hints" support
> with new
> > > > >>> engine as well.
> > > > >>>
> > > > >>> пт, 1 нояб. 2019 г. в 20:14, Denis Magda <[hidden email]>:
> > > > >>>>
> > > > >>>> Ivan,
> > > > >>>>
> > > > >>>> I was involved in a couple of such use cases personally, so,
> that's
> > > not
> > > > >> my
> > > > >>>> imagination ;) Even more, as far as I remember, the primary
> reason
> > > why
> > > > >> we
> > > > >>>> improved our affinityRuns ensuring no partition is purged from a
> > > node
> > > > >> until
> > > > >>>> a task is completed is because many users were running local SQL
> > > from
> > > > >>>> compute tasks and needed a guarantee that SQL will always
> return a
> > > > >> correct
> > > > >>>> result set.
> > > > >>>>
> > > > >>>> -
> > > > >>>> Denis
> > > > >>>>
> > > > >>>>
> > > > >>>> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <
> [hidden email]
> > > >
> > > > >> wrote:
> > > > >>>>
> > > > >>>>> Denis,
> > > > >>>>>
> > > > >>>>> Would be nice to see real use-cases of affinity call + local
> SQL
> > > > >>>>> combination. Generally, new engine will be able to infer
> > > collocation
> > > > >>>>> resulting in the same collocated execution automatically.
> > > > >>>>>
> > > > >>>>> пт, 1 нояб. 2019 г. в 19:11, Denis Magda <[hidden email]>:
> > > > >>>>>>
> > > > >>>>>> Hi Igor,
> > > > >>>>>>
> > > > >>>>>> Local queries feature is broadly used together with
> affinity-based
> > > > >>>>> compute
> > > > >>>>>> tasks:
> > > > >>>>>>
> > > > >>>>>
> > > > >>
> > >
> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
> > > > >>>>>>
> > > > >>>>>> The use case is as follows. The user knows that all required
> data
> > > > >> needed
> > > > >>>>>> for computation is collocated, and SQL is used as an advanced
> API
> > > for
> > > > >>>>> data
> > > > >>>>>> retrieval from the computation code. The affinity task ensures
> > > that
> > > > >>>>>> partitions won't be discarded from the node(s) if the topology
> > > changes
> > > > >>>>>> during the task execution and, thus, it's safe to run SQL
> locally
> > > > >>>>> skipping
> > > > >>>>>> distributed phases.
> > > > >>>>>>
> > > > >>>>>> The combination of affinity compute tasks with local SQL is a
> > > real and
> > > > >>>>>> valuable use case, and this is what we need to support with
> > > Calcite.
> > > > >> Do
> > > > >>>>> you
> > > > >>>>>> see any challenges?
> > > > >>>>>>
> > > > >>>>>> -
> > > > >>>>>> Denis
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov
> > > > >> <[hidden email]
> > > > >>>>>>
> > > > >>>>>> wrote:
> > > > >>>>>>
> > > > >>>>>>> Hi Igor!
> > > > >>>>>>>
> > > > >>>>>>> IMO we need to maintain the backward compatibility between
> old
> > > and
> > > > >> new
> > > > >>>>>>> query engines as much as possible. And therefore we shouldn't
> > > change
> > > > >>>>> the
> > > > >>>>>>> behavior of local queries.
> > > > >>>>>>>
> > > > >>>>>>> So, for local queries Calcite's planner shouldn't consider
> the
> > > > >>>>>>> distribution trait at all.
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> --
> > > > >>>>>>> Kind Regards
> > > > >>>>>>> Roman Kondakov
> > > > >>>>>>>
> > > > >>>>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
> > > > >>>>>>>> Hi Igniters,
> > > > >>>>>>>>
> > > > >>>>>>>> Working on new generation of Ignite SQL I faced a question:
> «Do
> > > we
> > > > >>>>> need
> > > > >>>>>>> local queries at all and, if so, what semantic they should
> > > have?».
> > > > >>>>>>>>
> > > > >>>>>>>> Current planing flow consists of next steps:
> > > > >>>>>>>>
> > > > >>>>>>>> 1) Parsing SQL to AST
> > > > >>>>>>>> 2) Validating AST (against Schema)
> > > > >>>>>>>> 3) Optimizing (Building execution graph)
> > > > >>>>>>>> 4) Splitting (into query fragments which executes on target
> > > nodes)
> > > > >>>>>>>> 5) Mapping (query fragments to nodes/partitions)
> > > > >>>>>>>>
> > > > >>>>>>>> At last step we check that all Fragment sources (a table or
> > > result)
> > > > >>>>> have
> > > > >>>>>>> the same distribution (in other words all sources have to be
> > > > >>>>> co-located)
> > > > >>>>>>>>
> > > > >>>>>>>> Planner and Splitter guarantee that all caches in a
> Fragment are
> > > > >>>>>>> co-located, an Exchange is produced otherwise. But if we
> force
> > > local
> > > > >>>>>>> execution we cannot produce Exchanges, that means we may
> face two
> > > > >>>>>>> non-co-located caches inside a single query fragment (result
> of
> > > local
> > > > >>>>> query
> > > > >>>>>>> planning is a single query fragment). So, we cannot pass the
> > > check.
> > > > >>>>>>>>
> > > > >>>>>>>> Should we throw an exception or omit the check for local
> query
> > > > >>>>> planning
> > > > >>>>>>> or prohibit local queries at all?
> > > > >>>>>>>>
> > > > >>>>>>>> Your thoughts?
> > > > >>>>>>>>
> > > > >>>>>>>> Regards,
> > > > >>>>>>>> Igor
> > > > >>>>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> --
> > > > >>>>> Best regards,
> > > > >>>>> Ivan Pavlukhin
> > > > >>>>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> --
> > > > >>> Best regards,
> > > > >>> Ivan Pavlukhin
> > > > >>
> > > > >>
> > > > >>
> > > >
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Ivan Pavlukhin
> > >
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>

Ivan Pavlukhin

Re: Calcite based SQL query engine. Local queries

Dmitriy,

First, what kind of cumulative metric can it be? A lot of cumulative
metrics can be compared using SQL. MIN, MAX, AVG are simple ones. For
more complex ones I can think about user-define aggregate functions
(UDAF). We do not have them in Ignite so far, but can introduce them.

Second, naive approaches of such ComputeScan can lead to incorrect
results as partitions might not be properly pinned and duplicate
entries might appear.

пт, 8 нояб. 2019 г. в 15:27, Dmitriy Pavlov <[hidden email]>:

>
> Hi Ivan, Igniters, imagine you need to scan all entities in the cluster.
>
> Ideally, you don't want to de-serialize all of entries, so you can use
> withKeepBinary(). e.g. you need a couple of fields and get some cumulative
> metric on this data. You can send compute to all cluster nodes and run
> there SQL scan queries with local mode is on. In that manner you can
> implement Map-Reduce.
>
> It may be there is another way of doing that, so I encourage to share it. I
> could update workshops/training I preparing in background.
>
> Sincerely,
> Dmitriy Pavlov
>
> пт, 8 нояб. 2019 г. в 08:57, Ivan Pavlukhin <[hidden email]>:
>
> > Denis,
> >
> > To make things really clearer we need to provide some concrete example
> > of Compute + LocalSQL and reason about it to figure out whether
> > "smart" SQL engine can deliver the same (or better) results or not.
> >
> > пт, 8 нояб. 2019 г. в 01:48, Denis Magda <[hidden email]>:
> > >
> > > Folks,
> > >
> > > See our compute tasks as an advanced version of stored procedures that
> > let
> > > the users code the logic of various complexity with Java, .NET or C++
> > (and
> > > not with PL/SQL). The logic can use a combination of APIs (key-value,
> > SQL,
> > > etc.) to access data both locally and remotely while being executed on
> > > server nodes. The logic can make N key-value requests or run M SQL
> > queries.
> > >
> > > We kept supporting local SQL queries exactly for such scenarios (for our
> > > version of stored procedures) to ensure the distributed map-reduce phase
> > is
> > > canceled if all the data is local. And affinityCalls were improved one
> > day
> > > to pin the partitions.
> > >
> > > If the new engine is smart enough to understand that all the partitions
> > are
> > > available locally during the affinityRun execution then it's totally fine
> > > to remove the 'local' flag. Otherwise, we need to instruct the engine
> > > manually that a distributed phase is redundant via 'local' flag or by
> > other
> > > means.
> > >
> > > Does it make things clearer?
> > >
> > >
> > > -
> > > Denis
> > >
> > >
> > > On Thu, Nov 7, 2019 at 3:53 AM Ivan Pavlukhin <[hidden email]>
> > wrote:
> > >
> > > > Stephen,
> > > >
> > > > In my understanding we need to do a better job to realize use-cases of
> > > > Compute + LocalSQL ourselves.
> > > >
> > > > Ideally smart optimizer should do the best job of query deployment.
> > > >
> > > > чт, 7 нояб. 2019 г. в 13:04, Stephen Darlington
> > > > <[hidden email]>:
> > > > >
> > > > > I made a (bad) assumption that this would also affect queries against
> > > > partitions. If “setLocal()” goes away but “setPartitions()” remains I’m
> > > > happy.
> > > > >
> > > > > What I would say is that the “broadcast / local” method is one I see
> > > > fairly often. Do we need to do a better job educating people of the
> > > > “correct” way?
> > > > >
> > > > > Regards,
> > > > > Stephen
> > > > >
> > > > > > On 7 Nov 2019, at 08:30, Alexey Goncharuk <
> > [hidden email]>
> > > > wrote:
> > > > > >
> > > > > > Denis, Stephen,
> > > > > >
> > > > > > Running a local query in a broadcast closure won't work on changing
> > > > > > topology. We specifically added an affinityCall method to the
> > compute
> > > > API
> > > > > > in order to pin a partition to prevent its moving and eviction
> > > > throughout
> > > > > > the task execution. Therefore, the query inside an affinityCall is
> > > > always
> > > > > > executed against some partitions (otherwise the query may give
> > > > incorrect
> > > > > > results when topology is changed).
> > > > > >
> > > > > > I support Igor's question and think that the 'local' flag for the
> > query
> > > > > > should be deprecated and eventually removed. A 'local' query can
> > > > always be
> > > > > > expressed as a query agains a set of partitions. If those
> > partitions
> > > > are
> > > > > > located on the same node - good, we get fast and correct results.
> > If
> > > > not -
> > > > > > we may either raise an exception and ask user to remap the query,
> > or
> > > > > > fallback to a distributed query execution.
> > > > > >
> > > > > > Given that the Calcite prototype is in its early stages, it's
> > likely
> > > > its
> > > > > > first version will be available in 3.x, and it's a good chance to
> > get
> > > > rid
> > > > > > of wrong API pieces.
> > > > > >
> > > > > > --AG
> > > > > >
> > > > > > пн, 4 нояб. 2019 г. в 14:02, Stephen Darlington <
> > > > > > [hidden email]>:
> > > > > >
> > > > > >> A common use case is where you want to work on many rows of data
> > > > across
> > > > > >> the grid. You’d broadcast a closure, running the same code on
> > every
> > > > node
> > > > > >> with just the local data. SQL doesn’t work in isolation — it’s
> > often
> > > > used
> > > > > >> as a filter for future computations.
> > > > > >>
> > > > > >> Regards,
> > > > > >> Stephen
> > > > > >>
> > > > > >>> On 1 Nov 2019, at 17:53, Ivan Pavlukhin <[hidden email]>
> > wrote:
> > > > > >>>
> > > > > >>> Denis,
> > > > > >>>
> > > > > >>> I am mostly concerned about gathering use cases. It would be
> > great to
> > > > > >>> critically assess such cases to identify why it cannot be solved
> > by
> > > > > >>> using distributed SQL. Also it sounds similar to some kind of
> > > > "hints",
> > > > > >>> but very limited and with all hints drawbacks (impossibility to
> > use
> > > > > >>> full strength of CBO). We can provide better "hints" support
> > with new
> > > > > >>> engine as well.
> > > > > >>>
> > > > > >>> пт, 1 нояб. 2019 г. в 20:14, Denis Magda <[hidden email]>:
> > > > > >>>>
> > > > > >>>> Ivan,
> > > > > >>>>
> > > > > >>>> I was involved in a couple of such use cases personally, so,
> > that's
> > > > not
> > > > > >> my
> > > > > >>>> imagination ;) Even more, as far as I remember, the primary
> > reason
> > > > why
> > > > > >> we
> > > > > >>>> improved our affinityRuns ensuring no partition is purged from a
> > > > node
> > > > > >> until
> > > > > >>>> a task is completed is because many users were running local SQL
> > > > from
> > > > > >>>> compute tasks and needed a guarantee that SQL will always
> > return a
> > > > > >> correct
> > > > > >>>> result set.
> > > > > >>>>
> > > > > >>>> -
> > > > > >>>> Denis
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <
> > [hidden email]
> > > > >
> > > > > >> wrote:
> > > > > >>>>
> > > > > >>>>> Denis,
> > > > > >>>>>
> > > > > >>>>> Would be nice to see real use-cases of affinity call + local
> > SQL
> > > > > >>>>> combination. Generally, new engine will be able to infer
> > > > collocation
> > > > > >>>>> resulting in the same collocated execution automatically.
> > > > > >>>>>
> > > > > >>>>> пт, 1 нояб. 2019 г. в 19:11, Denis Magda <[hidden email]>:
> > > > > >>>>>>
> > > > > >>>>>> Hi Igor,
> > > > > >>>>>>
> > > > > >>>>>> Local queries feature is broadly used together with
> > affinity-based
> > > > > >>>>> compute
> > > > > >>>>>> tasks:
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>
> > > >
> > https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
> > > > > >>>>>>
> > > > > >>>>>> The use case is as follows. The user knows that all required
> > data
> > > > > >> needed
> > > > > >>>>>> for computation is collocated, and SQL is used as an advanced
> > API
> > > > for
> > > > > >>>>> data
> > > > > >>>>>> retrieval from the computation code. The affinity task ensures
> > > > that
> > > > > >>>>>> partitions won't be discarded from the node(s) if the topology
> > > > changes
> > > > > >>>>>> during the task execution and, thus, it's safe to run SQL
> > locally
> > > > > >>>>> skipping
> > > > > >>>>>> distributed phases.
> > > > > >>>>>>
> > > > > >>>>>> The combination of affinity compute tasks with local SQL is a
> > > > real and
> > > > > >>>>>> valuable use case, and this is what we need to support with
> > > > Calcite.
> > > > > >> Do
> > > > > >>>>> you
> > > > > >>>>>> see any challenges?
> > > > > >>>>>>
> > > > > >>>>>> -
> > > > > >>>>>> Denis
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov
> > > > > >> <[hidden email]
> > > > > >>>>>>
> > > > > >>>>>> wrote:
> > > > > >>>>>>
> > > > > >>>>>>> Hi Igor!
> > > > > >>>>>>>
> > > > > >>>>>>> IMO we need to maintain the backward compatibility between
> > old
> > > > and
> > > > > >> new
> > > > > >>>>>>> query engines as much as possible. And therefore we shouldn't
> > > > change
> > > > > >>>>> the
> > > > > >>>>>>> behavior of local queries.
> > > > > >>>>>>>
> > > > > >>>>>>> So, for local queries Calcite's planner shouldn't consider
> > the
> > > > > >>>>>>> distribution trait at all.
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> --
> > > > > >>>>>>> Kind Regards
> > > > > >>>>>>> Roman Kondakov
> > > > > >>>>>>>
> > > > > >>>>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
> > > > > >>>>>>>> Hi Igniters,
> > > > > >>>>>>>>
> > > > > >>>>>>>> Working on new generation of Ignite SQL I faced a question:
> > «Do
> > > > we
> > > > > >>>>> need
> > > > > >>>>>>> local queries at all and, if so, what semantic they should
> > > > have?».
> > > > > >>>>>>>>
> > > > > >>>>>>>> Current planing flow consists of next steps:
> > > > > >>>>>>>>
> > > > > >>>>>>>> 1) Parsing SQL to AST
> > > > > >>>>>>>> 2) Validating AST (against Schema)
> > > > > >>>>>>>> 3) Optimizing (Building execution graph)
> > > > > >>>>>>>> 4) Splitting (into query fragments which executes on target
> > > > nodes)
> > > > > >>>>>>>> 5) Mapping (query fragments to nodes/partitions)
> > > > > >>>>>>>>
> > > > > >>>>>>>> At last step we check that all Fragment sources (a table or
> > > > result)
> > > > > >>>>> have
> > > > > >>>>>>> the same distribution (in other words all sources have to be
> > > > > >>>>> co-located)
> > > > > >>>>>>>>
> > > > > >>>>>>>> Planner and Splitter guarantee that all caches in a
> > Fragment are
> > > > > >>>>>>> co-located, an Exchange is produced otherwise. But if we
> > force
> > > > local
> > > > > >>>>>>> execution we cannot produce Exchanges, that means we may
> > face two
> > > > > >>>>>>> non-co-located caches inside a single query fragment (result
> > of
> > > > local
> > > > > >>>>> query
> > > > > >>>>>>> planning is a single query fragment). So, we cannot pass the
> > > > check.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Should we throw an exception or omit the check for local
> > query
> > > > > >>>>> planning
> > > > > >>>>>>> or prohibit local queries at all?
> > > > > >>>>>>>>
> > > > > >>>>>>>> Your thoughts?
> > > > > >>>>>>>>
> > > > > >>>>>>>> Regards,
> > > > > >>>>>>>> Igor
> > > > > >>>>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> --
> > > > > >>>>> Best regards,
> > > > > >>>>> Ivan Pavlukhin
> > > > > >>>>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>> --
> > > > > >>> Best regards,
> > > > > >>> Ivan Pavlukhin
> > > > > >>
> > > > > >>
> > > > > >>
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Ivan Pavlukhin
> > > >
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
> >

--
Best regards,
Ivan Pavlukhin

Dmitry Pavlov

Re: Calcite based SQL query engine. Local queries

Yes, I understand that it is straightforward and, may be, naive approach.
Which is why I'm asking how to do map-reduce on cache C data in Ignite with
proper partition pinning.

About Predefined/Implemented aggregate - I'm not sure I agree that we can
predict everything. It is real perk of Ignite that you can send any of your
code (which, BTW, can be developed in lifetime of the system) to your data.

So I propose map and reduce phase should allow user code to be executed. If
I know any other better approach, I would somehow document it (e.g. add to
some next training/workshop).

Sincerely,
Dmitriy Pavlov

пт, 8 нояб. 2019 г. в 15:45, Ivan Pavlukhin <[hidden email]>:

> Dmitriy,
>
> First, what kind of cumulative metric can it be? A lot of cumulative
> metrics can be compared using SQL. MIN, MAX, AVG are simple ones. For
> more complex ones I can think about user-define aggregate functions
> (UDAF). We do not have them in Ignite so far, but can introduce them.
>
> Second, naive approaches of such ComputeScan can lead to incorrect
> results as partitions might not be properly pinned and duplicate
> entries might appear.
>
> пт, 8 нояб. 2019 г. в 15:27, Dmitriy Pavlov <[hidden email]>:
> >
> > Hi Ivan, Igniters, imagine you need to scan all entities in the cluster.
> >
> > Ideally, you don't want to de-serialize all of entries, so you can use
> > withKeepBinary(). e.g. you need a couple of fields and get some
> cumulative
> > metric on this data. You can send compute to all cluster nodes and run
> > there SQL scan queries with local mode is on. In that manner you can
> > implement Map-Reduce.
> >
> > It may be there is another way of doing that, so I encourage to share
> it. I
> > could update workshops/training I preparing in background.
> >
> > Sincerely,
> > Dmitriy Pavlov
> >
> > пт, 8 нояб. 2019 г. в 08:57, Ivan Pavlukhin <[hidden email]>:
> >
> > > Denis,
> > >
> > > To make things really clearer we need to provide some concrete example
> > > of Compute + LocalSQL and reason about it to figure out whether
> > > "smart" SQL engine can deliver the same (or better) results or not.
> > >
> > > пт, 8 нояб. 2019 г. в 01:48, Denis Magda <[hidden email]>:
> > > >
> > > > Folks,
> > > >
> > > > See our compute tasks as an advanced version of stored procedures
> that
> > > let
> > > > the users code the logic of various complexity with Java, .NET or C++
> > > (and
> > > > not with PL/SQL). The logic can use a combination of APIs (key-value,
> > > SQL,
> > > > etc.) to access data both locally and remotely while being executed
> on
> > > > server nodes. The logic can make N key-value requests or run M SQL
> > > queries.
> > > >
> > > > We kept supporting local SQL queries exactly for such scenarios (for
> our
> > > > version of stored procedures) to ensure the distributed map-reduce
> phase
> > > is
> > > > canceled if all the data is local. And affinityCalls were improved
> one
> > > day
> > > > to pin the partitions.
> > > >
> > > > If the new engine is smart enough to understand that all the
> partitions
> > > are
> > > > available locally during the affinityRun execution then it's totally
> fine
> > > > to remove the 'local' flag. Otherwise, we need to instruct the engine
> > > > manually that a distributed phase is redundant via 'local' flag or by
> > > other
> > > > means.
> > > >
> > > > Does it make things clearer?
> > > >
> > > >
> > > > -
> > > > Denis
> > > >
> > > >
> > > > On Thu, Nov 7, 2019 at 3:53 AM Ivan Pavlukhin <[hidden email]>
> > > wrote:
> > > >
> > > > > Stephen,
> > > > >
> > > > > In my understanding we need to do a better job to realize
> use-cases of
> > > > > Compute + LocalSQL ourselves.
> > > > >
> > > > > Ideally smart optimizer should do the best job of query deployment.
> > > > >
> > > > > чт, 7 нояб. 2019 г. в 13:04, Stephen Darlington
> > > > > <[hidden email]>:
> > > > > >
> > > > > > I made a (bad) assumption that this would also affect queries
> against
> > > > > partitions. If “setLocal()” goes away but “setPartitions()”
> remains I’m
> > > > > happy.
> > > > > >
> > > > > > What I would say is that the “broadcast / local” method is one I
> see
> > > > > fairly often. Do we need to do a better job educating people of the
> > > > > “correct” way?
> > > > > >
> > > > > > Regards,
> > > > > > Stephen
> > > > > >
> > > > > > > On 7 Nov 2019, at 08:30, Alexey Goncharuk <
> > > [hidden email]>
> > > > > wrote:
> > > > > > >
> > > > > > > Denis, Stephen,
> > > > > > >
> > > > > > > Running a local query in a broadcast closure won't work on
> changing
> > > > > > > topology. We specifically added an affinityCall method to the
> > > compute
> > > > > API
> > > > > > > in order to pin a partition to prevent its moving and eviction
> > > > > throughout
> > > > > > > the task execution. Therefore, the query inside an
> affinityCall is
> > > > > always
> > > > > > > executed against some partitions (otherwise the query may give
> > > > > incorrect
> > > > > > > results when topology is changed).
> > > > > > >
> > > > > > > I support Igor's question and think that the 'local' flag for
> the
> > > query
> > > > > > > should be deprecated and eventually removed. A 'local' query
> can
> > > > > always be
> > > > > > > expressed as a query agains a set of partitions. If those
> > > partitions
> > > > > are
> > > > > > > located on the same node - good, we get fast and correct
> results.
> > > If
> > > > > not -
> > > > > > > we may either raise an exception and ask user to remap the
> query,
> > > or
> > > > > > > fallback to a distributed query execution.
> > > > > > >
> > > > > > > Given that the Calcite prototype is in its early stages, it's
> > > likely
> > > > > its
> > > > > > > first version will be available in 3.x, and it's a good chance
> to
> > > get
> > > > > rid
> > > > > > > of wrong API pieces.
> > > > > > >
> > > > > > > --AG
> > > > > > >
> > > > > > > пн, 4 нояб. 2019 г. в 14:02, Stephen Darlington <
> > > > > > > [hidden email]>:
> > > > > > >
> > > > > > >> A common use case is where you want to work on many rows of
> data
> > > > > across
> > > > > > >> the grid. You’d broadcast a closure, running the same code on
> > > every
> > > > > node
> > > > > > >> with just the local data. SQL doesn’t work in isolation — it’s
> > > often
> > > > > used
> > > > > > >> as a filter for future computations.
> > > > > > >>
> > > > > > >> Regards,
> > > > > > >> Stephen
> > > > > > >>
> > > > > > >>> On 1 Nov 2019, at 17:53, Ivan Pavlukhin <[hidden email]
> >
> > > wrote:
> > > > > > >>>
> > > > > > >>> Denis,
> > > > > > >>>
> > > > > > >>> I am mostly concerned about gathering use cases. It would be
> > > great to
> > > > > > >>> critically assess such cases to identify why it cannot be
> solved
> > > by
> > > > > > >>> using distributed SQL. Also it sounds similar to some kind of
> > > > > "hints",
> > > > > > >>> but very limited and with all hints drawbacks (impossibility
> to
> > > use
> > > > > > >>> full strength of CBO). We can provide better "hints" support
> > > with new
> > > > > > >>> engine as well.
> > > > > > >>>
> > > > > > >>> пт, 1 нояб. 2019 г. в 20:14, Denis Magda <[hidden email]
> >:
> > > > > > >>>>
> > > > > > >>>> Ivan,
> > > > > > >>>>
> > > > > > >>>> I was involved in a couple of such use cases personally, so,
> > > that's
> > > > > not
> > > > > > >> my
> > > > > > >>>> imagination ;) Even more, as far as I remember, the primary
> > > reason
> > > > > why
> > > > > > >> we
> > > > > > >>>> improved our affinityRuns ensuring no partition is purged
> from a
> > > > > node
> > > > > > >> until
> > > > > > >>>> a task is completed is because many users were running
> local SQL
> > > > > from
> > > > > > >>>> compute tasks and needed a guarantee that SQL will always
> > > return a
> > > > > > >> correct
> > > > > > >>>> result set.
> > > > > > >>>>
> > > > > > >>>> -
> > > > > > >>>> Denis
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <
> > > [hidden email]
> > > > > >
> > > > > > >> wrote:
> > > > > > >>>>
> > > > > > >>>>> Denis,
> > > > > > >>>>>
> > > > > > >>>>> Would be nice to see real use-cases of affinity call +
> local
> > > SQL
> > > > > > >>>>> combination. Generally, new engine will be able to infer
> > > > > collocation
> > > > > > >>>>> resulting in the same collocated execution automatically.
> > > > > > >>>>>
> > > > > > >>>>> пт, 1 нояб. 2019 г. в 19:11, Denis Magda <
> [hidden email]>:
> > > > > > >>>>>>
> > > > > > >>>>>> Hi Igor,
> > > > > > >>>>>>
> > > > > > >>>>>> Local queries feature is broadly used together with
> > > affinity-based
> > > > > > >>>>> compute
> > > > > > >>>>>> tasks:
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>
> > > > >
> > >
> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
> > > > > > >>>>>>
> > > > > > >>>>>> The use case is as follows. The user knows that all
> required
> > > data
> > > > > > >> needed
> > > > > > >>>>>> for computation is collocated, and SQL is used as an
> advanced
> > > API
> > > > > for
> > > > > > >>>>> data
> > > > > > >>>>>> retrieval from the computation code. The affinity task
> ensures
> > > > > that
> > > > > > >>>>>> partitions won't be discarded from the node(s) if the
> topology
> > > > > changes
> > > > > > >>>>>> during the task execution and, thus, it's safe to run SQL
> > > locally
> > > > > > >>>>> skipping
> > > > > > >>>>>> distributed phases.
> > > > > > >>>>>>
> > > > > > >>>>>> The combination of affinity compute tasks with local SQL
> is a
> > > > > real and
> > > > > > >>>>>> valuable use case, and this is what we need to support
> with
> > > > > Calcite.
> > > > > > >> Do
> > > > > > >>>>> you
> > > > > > >>>>>> see any challenges?
> > > > > > >>>>>>
> > > > > > >>>>>> -
> > > > > > >>>>>> Denis
> > > > > > >>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov
> > > > > > >> <[hidden email]
> > > > > > >>>>>>
> > > > > > >>>>>> wrote:
> > > > > > >>>>>>
> > > > > > >>>>>>> Hi Igor!
> > > > > > >>>>>>>
> > > > > > >>>>>>> IMO we need to maintain the backward compatibility
> between
> > > old
> > > > > and
> > > > > > >> new
> > > > > > >>>>>>> query engines as much as possible. And therefore we
> shouldn't
> > > > > change
> > > > > > >>>>> the
> > > > > > >>>>>>> behavior of local queries.
> > > > > > >>>>>>>
> > > > > > >>>>>>> So, for local queries Calcite's planner shouldn't
> consider
> > > the
> > > > > > >>>>>>> distribution trait at all.
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>> --
> > > > > > >>>>>>> Kind Regards
> > > > > > >>>>>>> Roman Kondakov
> > > > > > >>>>>>>
> > > > > > >>>>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
> > > > > > >>>>>>>> Hi Igniters,
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Working on new generation of Ignite SQL I faced a
> question:
> > > «Do
> > > > > we
> > > > > > >>>>> need
> > > > > > >>>>>>> local queries at all and, if so, what semantic they
> should
> > > > > have?».
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Current planing flow consists of next steps:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 1) Parsing SQL to AST
> > > > > > >>>>>>>> 2) Validating AST (against Schema)
> > > > > > >>>>>>>> 3) Optimizing (Building execution graph)
> > > > > > >>>>>>>> 4) Splitting (into query fragments which executes on
> target
> > > > > nodes)
> > > > > > >>>>>>>> 5) Mapping (query fragments to nodes/partitions)
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> At last step we check that all Fragment sources (a
> table or
> > > > > result)
> > > > > > >>>>> have
> > > > > > >>>>>>> the same distribution (in other words all sources have
> to be
> > > > > > >>>>> co-located)
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Planner and Splitter guarantee that all caches in a
> > > Fragment are
> > > > > > >>>>>>> co-located, an Exchange is produced otherwise. But if we
> > > force
> > > > > local
> > > > > > >>>>>>> execution we cannot produce Exchanges, that means we may
> > > face two
> > > > > > >>>>>>> non-co-located caches inside a single query fragment
> (result
> > > of
> > > > > local
> > > > > > >>>>> query
> > > > > > >>>>>>> planning is a single query fragment). So, we cannot pass
> the
> > > > > check.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Should we throw an exception or omit the check for local
> > > query
> > > > > > >>>>> planning
> > > > > > >>>>>>> or prohibit local queries at all?
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Your thoughts?
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Regards,
> > > > > > >>>>>>>> Igor
> > > > > > >>>>>>>
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>> --
> > > > > > >>>>> Best regards,
> > > > > > >>>>> Ivan Pavlukhin
> > > > > > >>>>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> --
> > > > > > >>> Best regards,
> > > > > > >>> Ivan Pavlukhin
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Ivan Pavlukhin
> > > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Ivan Pavlukhin
> > >
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>

dmagda

Re: Calcite based SQL query engine. Local queries

In reply to this post by Ivan Pavlukhin

Take the amount of cashback calculation or payments authorization as
examples of compute tasks with local SQL. In the first case, all
transactions are collocated per account and a bank needs to calculate the
cashback monthly by broadcasting the task that executes special logic
across all accounts and SQL is used by the logic to access the data with
various filters. The same is done for an individual account with an
affinity call. In the second case, a man swipes a card at a shop register,
systems sends a compute task to the node that collocates a lot of data per
the man account and begins calculating hundreds or thousands of variables
retrieving data with both key-value and SQL.

Also, take drugs discovery and other pharmaceutical examples. Those are
compute-heavy and the users from that space were sharing the stories how
compute, scan, sql and key-value apis are used together with compute.

At all, each industry has compute-heavy use cases that need to retrieve
local data with local SQL, there are real Ignite users who do this in prod.
Again, we also need to think about our compute as of advanced stored and
complex procedures that can retrieve local/collocated data not only with
key-value and scans but with SQL as well that supports conditions, joins,
etc.

Denis

On Thursday, November 7, 2019, Ivan Pavlukhin <[hidden email]> wrote:

> Denis,
>
> To make things really clearer we need to provide some concrete example
> of Compute + LocalSQL and reason about it to figure out whether
> "smart" SQL engine can deliver the same (or better) results or not.
>
> пт, 8 нояб. 2019 г. в 01:48, Denis Magda <[hidden email]>:
> >
> > Folks,
> >
> > See our compute tasks as an advanced version of stored procedures that
> let
> > the users code the logic of various complexity with Java, .NET or C++
> (and
> > not with PL/SQL). The logic can use a combination of APIs (key-value,
> SQL,
> > etc.) to access data both locally and remotely while being executed on
> > server nodes. The logic can make N key-value requests or run M SQL
> queries.
> >
> > We kept supporting local SQL queries exactly for such scenarios (for our
> > version of stored procedures) to ensure the distributed map-reduce phase
> is
> > canceled if all the data is local. And affinityCalls were improved one
> day
> > to pin the partitions.
> >
> > If the new engine is smart enough to understand that all the partitions
> are
> > available locally during the affinityRun execution then it's totally fine
> > to remove the 'local' flag. Otherwise, we need to instruct the engine
> > manually that a distributed phase is redundant via 'local' flag or by
> other
> > means.
> >
> > Does it make things clearer?
> >
> >
> > -
> > Denis
> >
> >
> > On Thu, Nov 7, 2019 at 3:53 AM Ivan Pavlukhin <[hidden email]>
> wrote:
> >
> > > Stephen,
> > >
> > > In my understanding we need to do a better job to realize use-cases of
> > > Compute + LocalSQL ourselves.
> > >
> > > Ideally smart optimizer should do the best job of query deployment.
> > >
> > > чт, 7 нояб. 2019 г. в 13:04, Stephen Darlington
> > > <[hidden email]>:
> > > >
> > > > I made a (bad) assumption that this would also affect queries against
> > > partitions. If “setLocal()” goes away but “setPartitions()” remains I’m
> > > happy.
> > > >
> > > > What I would say is that the “broadcast / local” method is one I see
> > > fairly often. Do we need to do a better job educating people of the
> > > “correct” way?
> > > >
> > > > Regards,
> > > > Stephen
> > > >
> > > > > On 7 Nov 2019, at 08:30, Alexey Goncharuk <
> [hidden email]>
> > > wrote:
> > > > >
> > > > > Denis, Stephen,
> > > > >
> > > > > Running a local query in a broadcast closure won't work on changing
> > > > > topology. We specifically added an affinityCall method to the
> compute
> > > API
> > > > > in order to pin a partition to prevent its moving and eviction
> > > throughout
> > > > > the task execution. Therefore, the query inside an affinityCall is
> > > always
> > > > > executed against some partitions (otherwise the query may give
> > > incorrect
> > > > > results when topology is changed).
> > > > >
> > > > > I support Igor's question and think that the 'local' flag for the
> query
> > > > > should be deprecated and eventually removed. A 'local' query can
> > > always be
> > > > > expressed as a query agains a set of partitions. If those
> partitions
> > > are
> > > > > located on the same node - good, we get fast and correct results.
> If
> > > not -
> > > > > we may either raise an exception and ask user to remap the query,
> or
> > > > > fallback to a distributed query execution.
> > > > >
> > > > > Given that the Calcite prototype is in its early stages, it's
> likely
> > > its
> > > > > first version will be available in 3.x, and it's a good chance to
> get
> > > rid
> > > > > of wrong API pieces.
> > > > >
> > > > > --AG
> > > > >
> > > > > пн, 4 нояб. 2019 г. в 14:02, Stephen Darlington <
> > > > > [hidden email]>:
> > > > >
> > > > >> A common use case is where you want to work on many rows of data
> > > across
> > > > >> the grid. You’d broadcast a closure, running the same code on
> every
> > > node
> > > > >> with just the local data. SQL doesn’t work in isolation — it’s
> often
> > > used
> > > > >> as a filter for future computations.
> > > > >>
> > > > >> Regards,
> > > > >> Stephen
> > > > >>
> > > > >>> On 1 Nov 2019, at 17:53, Ivan Pavlukhin <[hidden email]>
> wrote:
> > > > >>>
> > > > >>> Denis,
> > > > >>>
> > > > >>> I am mostly concerned about gathering use cases. It would be
> great to
> > > > >>> critically assess such cases to identify why it cannot be solved
> by
> > > > >>> using distributed SQL. Also it sounds similar to some kind of
> > > "hints",
> > > > >>> but very limited and with all hints drawbacks (impossibility to
> use
> > > > >>> full strength of CBO). We can provide better "hints" support
> with new
> > > > >>> engine as well.
> > > > >>>
> > > > >>> пт, 1 нояб. 2019 г. в 20:14, Denis Magda <[hidden email]>:
> > > > >>>>
> > > > >>>> Ivan,
> > > > >>>>
> > > > >>>> I was involved in a couple of such use cases personally, so,
> that's
> > > not
> > > > >> my
> > > > >>>> imagination ;) Even more, as far as I remember, the primary
> reason
> > > why
> > > > >> we
> > > > >>>> improved our affinityRuns ensuring no partition is purged from a
> > > node
> > > > >> until
> > > > >>>> a task is completed is because many users were running local SQL
> > > from
> > > > >>>> compute tasks and needed a guarantee that SQL will always
> return a
> > > > >> correct
> > > > >>>> result set.
> > > > >>>>
> > > > >>>> -
> > > > >>>> Denis
> > > > >>>>
> > > > >>>>
> > > > >>>> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <
> [hidden email]
> > > >
> > > > >> wrote:
> > > > >>>>
> > > > >>>>> Denis,
> > > > >>>>>
> > > > >>>>> Would be nice to see real use-cases of affinity call + local
> SQL
> > > > >>>>> combination. Generally, new engine will be able to infer
> > > collocation
> > > > >>>>> resulting in the same collocated execution automatically.
> > > > >>>>>
> > > > >>>>> пт, 1 нояб. 2019 г. в 19:11, Denis Magda <[hidden email]>:
> > > > >>>>>>
> > > > >>>>>> Hi Igor,
> > > > >>>>>>
> > > > >>>>>> Local queries feature is broadly used together with
> affinity-based
> > > > >>>>> compute
> > > > >>>>>> tasks:
> > > > >>>>>>
> > > > >>>>>
> > > > >>
> > > https://apacheignite.readme.io/docs/collocate-compute-and-
> data#section-affinity-call-and-run-methods
> > > > >>>>>>
> > > > >>>>>> The use case is as follows. The user knows that all required
> data
> > > > >> needed
> > > > >>>>>> for computation is collocated, and SQL is used as an advanced
> API
> > > for
> > > > >>>>> data
> > > > >>>>>> retrieval from the computation code. The affinity task ensures
> > > that
> > > > >>>>>> partitions won't be discarded from the node(s) if the topology
> > > changes
> > > > >>>>>> during the task execution and, thus, it's safe to run SQL
> locally
> > > > >>>>> skipping
> > > > >>>>>> distributed phases.
> > > > >>>>>>
> > > > >>>>>> The combination of affinity compute tasks with local SQL is a
> > > real and
> > > > >>>>>> valuable use case, and this is what we need to support with
> > > Calcite.
> > > > >> Do
> > > > >>>>> you
> > > > >>>>>> see any challenges?
> > > > >>>>>>
> > > > >>>>>> -
> > > > >>>>>> Denis
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov
> > > > >> <[hidden email]
> > > > >>>>>>
> > > > >>>>>> wrote:
> > > > >>>>>>
> > > > >>>>>>> Hi Igor!
> > > > >>>>>>>
> > > > >>>>>>> IMO we need to maintain the backward compatibility between
> old
> > > and
> > > > >> new
> > > > >>>>>>> query engines as much as possible. And therefore we shouldn't
> > > change
> > > > >>>>> the
> > > > >>>>>>> behavior of local queries.
> > > > >>>>>>>
> > > > >>>>>>> So, for local queries Calcite's planner shouldn't consider
> the
> > > > >>>>>>> distribution trait at all.
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> --
> > > > >>>>>>> Kind Regards
> > > > >>>>>>> Roman Kondakov
> > > > >>>>>>>
> > > > >>>>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
> > > > >>>>>>>> Hi Igniters,
> > > > >>>>>>>>
> > > > >>>>>>>> Working on new generation of Ignite SQL I faced a question:
> «Do
> > > we
> > > > >>>>> need
> > > > >>>>>>> local queries at all and, if so, what semantic they should
> > > have?».
> > > > >>>>>>>>
> > > > >>>>>>>> Current planing flow consists of next steps:
> > > > >>>>>>>>
> > > > >>>>>>>> 1) Parsing SQL to AST
> > > > >>>>>>>> 2) Validating AST (against Schema)
> > > > >>>>>>>> 3) Optimizing (Building execution graph)
> > > > >>>>>>>> 4) Splitting (into query fragments which executes on target
> > > nodes)
> > > > >>>>>>>> 5) Mapping (query fragments to nodes/partitions)
> > > > >>>>>>>>
> > > > >>>>>>>> At last step we check that all Fragment sources (a table or
> > > result)
> > > > >>>>> have
> > > > >>>>>>> the same distribution (in other words all sources have to be
> > > > >>>>> co-located)
> > > > >>>>>>>>
> > > > >>>>>>>> Planner and Splitter guarantee that all caches in a
> Fragment are
> > > > >>>>>>> co-located, an Exchange is produced otherwise. But if we
> force
> > > local
> > > > >>>>>>> execution we cannot produce Exchanges, that means we may
> face two
> > > > >>>>>>> non-co-located caches inside a single query fragment (result
> of
> > > local
> > > > >>>>> query
> > > > >>>>>>> planning is a single query fragment). So, we cannot pass the
> > > check.
> > > > >>>>>>>>
> > > > >>>>>>>> Should we throw an exception or omit the check for local
> query
> > > > >>>>> planning
> > > > >>>>>>> or prohibit local queries at all?
> > > > >>>>>>>>
> > > > >>>>>>>> Your thoughts?
> > > > >>>>>>>>
> > > > >>>>>>>> Regards,
> > > > >>>>>>>> Igor
> > > > >>>>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> --
> > > > >>>>> Best regards,
> > > > >>>>> Ivan Pavlukhin
> > > > >>>>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> --
> > > > >>> Best regards,
> > > > >>> Ivan Pavlukhin
> > > > >>
> > > > >>
> > > > >>
> > > >
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Ivan Pavlukhin
> > >
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>

--
-
Denis

Ivan Pavlukhin

Re: Calcite based SQL query engine. Local queries

Dmitriy,

Would be great if you can describe your use-case in more details,
might be sharing a code it the best option here.

Denis,

Yep, the idea of mixing up Compute, SQL, KV APIs in a super weapon
sounds as a killer feature. But I have a great deal of doubt that it
is not over-complex to use such tool properly in practice. Partition
reservation is not obvious with Ignite compute, KV API can be
transactional but SQL not and so on. Too many pitfalls.

пт, 8 нояб. 2019 г. в 16:50, Denis Magda <[hidden email]>:

>
> Take the amount of cashback calculation or payments authorization as
> examples of compute tasks with local SQL. In the first case, all
> transactions are collocated per account and a bank needs to calculate the
> cashback monthly by broadcasting the task that executes special logic
> across all accounts and SQL is used by the logic to access the data with
> various filters. The same is done for an individual account with an
> affinity call. In the second case, a man swipes a card at a shop register,
> systems sends a compute task to the node that collocates a lot of data per
> the man account and begins calculating hundreds or thousands of variables
> retrieving data with both key-value and SQL.
>
> Also, take drugs discovery and other pharmaceutical examples. Those are
> compute-heavy and the users from that space were sharing the stories how
> compute, scan, sql and key-value apis are used together with compute.
>
> At all, each industry has compute-heavy use cases that need to retrieve
> local data with local SQL, there are real Ignite users who do this in prod.
> Again, we also need to think about our compute as of advanced stored and
> complex procedures that can retrieve local/collocated data not only with
> key-value and scans but with SQL as well that supports conditions, joins,
> etc.
>
> Denis
>
> On Thursday, November 7, 2019, Ivan Pavlukhin <[hidden email]> wrote:
>
> > Denis,
> >
> > To make things really clearer we need to provide some concrete example
> > of Compute + LocalSQL and reason about it to figure out whether
> > "smart" SQL engine can deliver the same (or better) results or not.
> >
> > пт, 8 нояб. 2019 г. в 01:48, Denis Magda <[hidden email]>:
> > >
> > > Folks,
> > >
> > > See our compute tasks as an advanced version of stored procedures that
> > let
> > > the users code the logic of various complexity with Java, .NET or C++
> > (and
> > > not with PL/SQL). The logic can use a combination of APIs (key-value,
> > SQL,
> > > etc.) to access data both locally and remotely while being executed on
> > > server nodes. The logic can make N key-value requests or run M SQL
> > queries.
> > >
> > > We kept supporting local SQL queries exactly for such scenarios (for our
> > > version of stored procedures) to ensure the distributed map-reduce phase
> > is
> > > canceled if all the data is local. And affinityCalls were improved one
> > day
> > > to pin the partitions.
> > >
> > > If the new engine is smart enough to understand that all the partitions
> > are
> > > available locally during the affinityRun execution then it's totally fine
> > > to remove the 'local' flag. Otherwise, we need to instruct the engine
> > > manually that a distributed phase is redundant via 'local' flag or by
> > other
> > > means.
> > >
> > > Does it make things clearer?
> > >
> > >
> > > -
> > > Denis
> > >
> > >
> > > On Thu, Nov 7, 2019 at 3:53 AM Ivan Pavlukhin <[hidden email]>
> > wrote:
> > >
> > > > Stephen,
> > > >
> > > > In my understanding we need to do a better job to realize use-cases of
> > > > Compute + LocalSQL ourselves.
> > > >
> > > > Ideally smart optimizer should do the best job of query deployment.
> > > >
> > > > чт, 7 нояб. 2019 г. в 13:04, Stephen Darlington
> > > > <[hidden email]>:
> > > > >
> > > > > I made a (bad) assumption that this would also affect queries against
> > > > partitions. If “setLocal()” goes away but “setPartitions()” remains I’m
> > > > happy.
> > > > >
> > > > > What I would say is that the “broadcast / local” method is one I see
> > > > fairly often. Do we need to do a better job educating people of the
> > > > “correct” way?
> > > > >
> > > > > Regards,
> > > > > Stephen
> > > > >
> > > > > > On 7 Nov 2019, at 08:30, Alexey Goncharuk <
> > [hidden email]>
> > > > wrote:
> > > > > >
> > > > > > Denis, Stephen,
> > > > > >
> > > > > > Running a local query in a broadcast closure won't work on changing
> > > > > > topology. We specifically added an affinityCall method to the
> > compute
> > > > API
> > > > > > in order to pin a partition to prevent its moving and eviction
> > > > throughout
> > > > > > the task execution. Therefore, the query inside an affinityCall is
> > > > always
> > > > > > executed against some partitions (otherwise the query may give
> > > > incorrect
> > > > > > results when topology is changed).
> > > > > >
> > > > > > I support Igor's question and think that the 'local' flag for the
> > query
> > > > > > should be deprecated and eventually removed. A 'local' query can
> > > > always be
> > > > > > expressed as a query agains a set of partitions. If those
> > partitions
> > > > are
> > > > > > located on the same node - good, we get fast and correct results.
> > If
> > > > not -
> > > > > > we may either raise an exception and ask user to remap the query,
> > or
> > > > > > fallback to a distributed query execution.
> > > > > >
> > > > > > Given that the Calcite prototype is in its early stages, it's
> > likely
> > > > its
> > > > > > first version will be available in 3.x, and it's a good chance to
> > get
> > > > rid
> > > > > > of wrong API pieces.
> > > > > >
> > > > > > --AG
> > > > > >
> > > > > > пн, 4 нояб. 2019 г. в 14:02, Stephen Darlington <
> > > > > > [hidden email]>:
> > > > > >
> > > > > >> A common use case is where you want to work on many rows of data
> > > > across
> > > > > >> the grid. You’d broadcast a closure, running the same code on
> > every
> > > > node
> > > > > >> with just the local data. SQL doesn’t work in isolation — it’s
> > often
> > > > used
> > > > > >> as a filter for future computations.
> > > > > >>
> > > > > >> Regards,
> > > > > >> Stephen
> > > > > >>
> > > > > >>> On 1 Nov 2019, at 17:53, Ivan Pavlukhin <[hidden email]>
> > wrote:
> > > > > >>>
> > > > > >>> Denis,
> > > > > >>>
> > > > > >>> I am mostly concerned about gathering use cases. It would be
> > great to
> > > > > >>> critically assess such cases to identify why it cannot be solved
> > by
> > > > > >>> using distributed SQL. Also it sounds similar to some kind of
> > > > "hints",
> > > > > >>> but very limited and with all hints drawbacks (impossibility to
> > use
> > > > > >>> full strength of CBO). We can provide better "hints" support
> > with new
> > > > > >>> engine as well.
> > > > > >>>
> > > > > >>> пт, 1 нояб. 2019 г. в 20:14, Denis Magda <[hidden email]>:
> > > > > >>>>
> > > > > >>>> Ivan,
> > > > > >>>>
> > > > > >>>> I was involved in a couple of such use cases personally, so,
> > that's
> > > > not
> > > > > >> my
> > > > > >>>> imagination ;) Even more, as far as I remember, the primary
> > reason
> > > > why
> > > > > >> we
> > > > > >>>> improved our affinityRuns ensuring no partition is purged from a
> > > > node
> > > > > >> until
> > > > > >>>> a task is completed is because many users were running local SQL
> > > > from
> > > > > >>>> compute tasks and needed a guarantee that SQL will always
> > return a
> > > > > >> correct
> > > > > >>>> result set.
> > > > > >>>>
> > > > > >>>> -
> > > > > >>>> Denis
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <
> > [hidden email]
> > > > >
> > > > > >> wrote:
> > > > > >>>>
> > > > > >>>>> Denis,
> > > > > >>>>>
> > > > > >>>>> Would be nice to see real use-cases of affinity call + local
> > SQL
> > > > > >>>>> combination. Generally, new engine will be able to infer
> > > > collocation
> > > > > >>>>> resulting in the same collocated execution automatically.
> > > > > >>>>>
> > > > > >>>>> пт, 1 нояб. 2019 г. в 19:11, Denis Magda <[hidden email]>:
> > > > > >>>>>>
> > > > > >>>>>> Hi Igor,
> > > > > >>>>>>
> > > > > >>>>>> Local queries feature is broadly used together with
> > affinity-based
> > > > > >>>>> compute
> > > > > >>>>>> tasks:
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>
> > > > https://apacheignite.readme.io/docs/collocate-compute-and-
> > data#section-affinity-call-and-run-methods
> > > > > >>>>>>
> > > > > >>>>>> The use case is as follows. The user knows that all required
> > data
> > > > > >> needed
> > > > > >>>>>> for computation is collocated, and SQL is used as an advanced
> > API
> > > > for
> > > > > >>>>> data
> > > > > >>>>>> retrieval from the computation code. The affinity task ensures
> > > > that
> > > > > >>>>>> partitions won't be discarded from the node(s) if the topology
> > > > changes
> > > > > >>>>>> during the task execution and, thus, it's safe to run SQL
> > locally
> > > > > >>>>> skipping
> > > > > >>>>>> distributed phases.
> > > > > >>>>>>
> > > > > >>>>>> The combination of affinity compute tasks with local SQL is a
> > > > real and
> > > > > >>>>>> valuable use case, and this is what we need to support with
> > > > Calcite.
> > > > > >> Do
> > > > > >>>>> you
> > > > > >>>>>> see any challenges?
> > > > > >>>>>>
> > > > > >>>>>> -
> > > > > >>>>>> Denis
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov
> > > > > >> <[hidden email]
> > > > > >>>>>>
> > > > > >>>>>> wrote:
> > > > > >>>>>>
> > > > > >>>>>>> Hi Igor!
> > > > > >>>>>>>
> > > > > >>>>>>> IMO we need to maintain the backward compatibility between
> > old
> > > > and
> > > > > >> new
> > > > > >>>>>>> query engines as much as possible. And therefore we shouldn't
> > > > change
> > > > > >>>>> the
> > > > > >>>>>>> behavior of local queries.
> > > > > >>>>>>>
> > > > > >>>>>>> So, for local queries Calcite's planner shouldn't consider
> > the
> > > > > >>>>>>> distribution trait at all.
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> --
> > > > > >>>>>>> Kind Regards
> > > > > >>>>>>> Roman Kondakov
> > > > > >>>>>>>
> > > > > >>>>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
> > > > > >>>>>>>> Hi Igniters,
> > > > > >>>>>>>>
> > > > > >>>>>>>> Working on new generation of Ignite SQL I faced a question:
> > «Do
> > > > we
> > > > > >>>>> need
> > > > > >>>>>>> local queries at all and, if so, what semantic they should
> > > > have?».
> > > > > >>>>>>>>
> > > > > >>>>>>>> Current planing flow consists of next steps:
> > > > > >>>>>>>>
> > > > > >>>>>>>> 1) Parsing SQL to AST
> > > > > >>>>>>>> 2) Validating AST (against Schema)
> > > > > >>>>>>>> 3) Optimizing (Building execution graph)
> > > > > >>>>>>>> 4) Splitting (into query fragments which executes on target
> > > > nodes)
> > > > > >>>>>>>> 5) Mapping (query fragments to nodes/partitions)
> > > > > >>>>>>>>
> > > > > >>>>>>>> At last step we check that all Fragment sources (a table or
> > > > result)
> > > > > >>>>> have
> > > > > >>>>>>> the same distribution (in other words all sources have to be
> > > > > >>>>> co-located)
> > > > > >>>>>>>>
> > > > > >>>>>>>> Planner and Splitter guarantee that all caches in a
> > Fragment are
> > > > > >>>>>>> co-located, an Exchange is produced otherwise. But if we
> > force
> > > > local
> > > > > >>>>>>> execution we cannot produce Exchanges, that means we may
> > face two
> > > > > >>>>>>> non-co-located caches inside a single query fragment (result
> > of
> > > > local
> > > > > >>>>> query
> > > > > >>>>>>> planning is a single query fragment). So, we cannot pass the
> > > > check.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Should we throw an exception or omit the check for local
> > query
> > > > > >>>>> planning
> > > > > >>>>>>> or prohibit local queries at all?
> > > > > >>>>>>>>
> > > > > >>>>>>>> Your thoughts?
> > > > > >>>>>>>>
> > > > > >>>>>>>> Regards,
> > > > > >>>>>>>> Igor
> > > > > >>>>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> --
> > > > > >>>>> Best regards,
> > > > > >>>>> Ivan Pavlukhin
> > > > > >>>>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>> --
> > > > > >>> Best regards,
> > > > > >>> Ivan Pavlukhin
> > > > > >>
> > > > > >>
> > > > > >>
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Ivan Pavlukhin
> > > >
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
> >
>
>
> --
> -
> Denis

--
Best regards,
Ivan Pavlukhin