Apache Ignite Developers - Legacy Mail Archive

New SQL execution engine

Classic

List

Threaded

41 messages Options

123

Igor Seliverstov

New SQL execution engine

Hi Igniters!

As you might know currently we have many open issues relating to current H2 based engine and its execution flow.

Some of them are critical (like impossibility to execute particular queries), some of them are majors (like impossibility to execute particular queries without pre-preparation your data to have a collocation) and many minors.

Most of the issues cannot be solved without whole engine redesign.

So, here the proposal: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084

I'll appreciate if you share your thoughts on top of that.

Regards,
Igor

Nikolay Izhikov-2

Re: New SQL execution engine

Hello, Igor.

Thanks for starting this discussion.

I think we should take a step back in it and answer the following questions:

1. What the exact issues with the H2 integration?
Can you send a tickets links?
Can we label all H2 integration issues in JIRA? I propose to use "h2" label.

2. What are the requirements for the new SQL engine?
We should write it down and discuss.

3. What options do we have?
Are there any alternatives to Calcite on the market?
We did the wrong choice that looked obvious one time.
So we should carefully avoid it at this time.

4. What is improvements of Ignite we want to make with the new engine?

В Пт, 27/09/2019 в 08:44 +0000, Igor Seliverstov пишет:

> Hi Igniters!
>
> As you might know currently we have many open issues relating to current H2 based engine and its execution flow.
>
> Some of them are critical (like impossibility to execute particular queries), some of them are majors (like impossibility to execute particular queries without pre-preparation your data to have a collocation) and many minors.
>
> Most of the issues cannot be solved without whole engine redesign.
>
> So, here the proposal: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
>
> I'll appreciate if you share your thoughts on top of that.
>
> Regards,
> Igor

signature.asc (499 bytes) Download Attachment

Roman Kondakov

Re: New SQL execution engine

In reply to this post by Igor Seliverstov

Hi Igor!

In my opinion using Apache Calcite for distributed SQL query
optimization and planning is much more promising approach than using H2.
H2 is not suitable for distributed query execution and also it has very
limited abilities for query optimization. While Apache Calcite is the
open source implementation of Cascade/Volcano query optimization
framework [1,2] (other implementations: MS SQL Server, Greenplum). The
main advantage of this framework is it's extensibility - we can change
the optimizer behavior by simply adding or removing optimization rules
to it. Calcite has a cost based optimizer as well as heuristic one which
can be useful in some situations.

The main challenges I see here:

1. Implementing the distributed query planning for Apache Calcite (it is
was primarily developed for the single-node query optimization). We can
reuse the solution of Apache Drill [3] guys here.

2. We need to implement a new distributed query execution engine. Apache
Calcite is a query planning framework, but not the execution one,
besides it has some abilities for executing queries in the single-node
case.

3. Secondary indexes are not supported by Calcite, so we need to
overcome this problem somehow. AFAIK Apache Phoenix [4] guys implemented
support of the secondary indexes as a sorted materialized views.

4. Apache Calcite is a cost-based optimizer - so we need to create our
own cost model and gather statistics to be able to choose the most
effective query execution plans.

5. What about deprecating our current query API which has a number of
drawbacks like using shortcuts `List<?>' as a query result or multiple
redundant flags in `SqlFieldsQuery` (collocated, lazy, etc) which are
useless for the new query execution engine?

[1]
https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Cascades-graefe.pdf
[2]
https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
[3] https://drill.apache.org/
[4] https://phoenix.apache.org/
--
Kind Regards
Roman Kondakov

On 27.09.2019 11:44, Igor Seliverstov wrote:

Andrew Mashenkov

Re: New SQL execution engine

In reply to this post by Nikolay Izhikov-2

Hi Nikolay,

Let me add my 5- cent here.

Ignite SQL layer has some issues that can't be fix with changes in Ignite
only, and we are blocked with H2.
To resolve these issues we can:
1. Donate some changes to H2 and wait for it's next release. But there are
more cons than pros and I think we can't rely on H2 project anymore.
- There is no guarantee our changes will be approved by H2 community.
- We definitely won't to depend on H2 product lifecycle.
- New H2 features (like parallel multi-statement query processing in latest
release) force Ignite for significant changes\refactoring in Ignite SQL
layer with no visible benefits.
Every next release it becomes harder to upgrade H2 dependency.
- Latest H2 versions causes questions about their stability.

Hot issues are
- Large intermediate results inside H2 internals can cause OOM for some
kind of queries. Ignite can't handle this anyhow for now without reworking
H2 code.
- HashJoins
- Ignite can't start multi-step queries, but 2-step (map-reduce) only.
- It is not possible to apply optimizations on query plan as no logical
plan actually doen't exists. H2 execution plan is hard-wired with H2
internals and can't be easily transformed.
Implementing a new good planner over H2 looks like a huge task.

2. Fork H2.
We already done this in GridGain (you can found H2 module in GridGain
community edition) as fastest way to unblock work on SQL improvements.
But this way doesn't look like a good one for Ignite, regarding our
experience.
- H2 code can't be included into Ignite at all.
H2 license are MIT and EPL. From one side they can't be changed to Apache
Licence. From other side Apache Foundation don't want to host any code
licensed with other than Apache License.
GridGain is ok with this, but Apache Foundation won't.

- We can made separate H2 fork project with it's own lifecycle with full
control over it and publish it in Maven Central.
This doen't seem like a big deal. But will causes additional difficulties
in development, test and release processes of Ignite.
This way seems bring much pain for every contributor.

3. Replace H2 with smth else.
E.g. with Apache Calcite.
- Calcite is a framework and it is designed very flexible and extendable.
- Every it's part can be replaced with our own implementation.
- Apache License is out of the box =)

So, summary:
1-st way of pain we have now and it slows down Ignite SQL layer developing.
2-nd looks few better, but seems bring Ignite to nowhere in prospect.
3-rd is a risky, but promissory way.

On Fri, Sep 27, 2019 at 12:16 PM Nikolay Izhikov <[hidden email]>
wrote:

> Hello, Igor.
>
> Thanks for starting this discussion.
>
> I think we should take a step back in it and answer the following
> questions:
>
> 1. What the exact issues with the H2 integration?
> Can you send a tickets links?
> Can we label all H2 integration issues in JIRA? I propose to use "h2"
> label.
>
> 2. What are the requirements for the new SQL engine?
> We should write it down and discuss.
>
> 3. What options do we have?
> Are there any alternatives to Calcite on the market?
> We did the wrong choice that looked obvious one time.
> So we should carefully avoid it at this time.
>
> 4. What is improvements of Ignite we want to make with the new engine?
>
>
> В Пт, 27/09/2019 в 08:44 +0000, Igor Seliverstov пишет:
> > Hi Igniters!
> >
> > As you might know currently we have many open issues relating to current
> H2 based engine and its execution flow.
> >
> > Some of them are critical (like impossibility to execute particular
> queries), some of them are majors (like impossibility to execute particular
> queries without pre-preparation your data to have a collocation) and many
> minors.
> >
> > Most of the issues cannot be solved without whole engine redesign.
> >
> > So, here the proposal:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> >
> > I'll appreciate if you share your thoughts on top of that.
> >
> > Regards,
> > Igor
>

--
Best regards,
Andrey V. Mashenkov

Roman Kondakov

Re: New SQL execution engine

In reply to this post by Nikolay Izhikov-2

Hello Nikolay.

You've asked very good questions. I'll try to answer.

> 1. What the exact issues with the H2 integration?
> Can you send a tickets links?
> Can we label all H2 integration issues in JIRA? I propose to use "h2" label.
Current SQL engine is confined in the single-pass map-reduce algorithm.
This make impossible to execute complex queries which can not be
expressed with a single map-reduce pass like subqueries with aggregates
[1]. Another problem is that H2 optimizer is very primitive and not
able to perform many useful optimizations [2].

Also Apache Calcite is commonly used in popular Apache projects like
Hive, Drill, Flink and others [3]. So it's mature and well battle tested
framework, while H2 is a toy database which is hardly ever used in the
real production systems.

> 2. What are the requirements for the new SQL engine?
> We should write it down and discuss.
The main requirement is to fix the problems listed above. The new SQL
engine should be able to *effectively* execute SQL queries of the
*arbitrary complexity*. For example the new engine will be able to
perform distributed joins in a multiple ways [4], when current engine
can do it only in two ways: collocated and distributed (the latter is
usually not very efficient and needed to set manually).

> 3. What options do we have?
> Are there any alternatives to Calcite on the market?
> We did the wrong choice that looked obvious one time.
> So we should carefully avoid it at this time.
I know the only one open source implementation of the efficient query
optimization strategy - and this is Apache Calcite. The alternative way
is to write our own query optimizer from scratch which is not a trivial
task at all.

> 4. What is improvements of Ignite we want to make with the new engine?
Ignite will be able to execute complex queries using optimal strategy. I
think this is a quite good improvement.

[1] https://issues.apache.org/jira/browse/IGNITE-11448
[2] https://issues.apache.org/jira/browse/IGNITE-6085
[3] https://calcite.apache.org/docs/powered_by.html
[4] https://www.memsql.com/blog/scaling-distributed-joins/
--
Kind Regards
Roman Kondakov

On 27.09.2019 12:20, Nikolay Izhikov wrote:

> Hello, Igor.
>
> Thanks for starting this discussion.
>
> I think we should take a step back in it and answer the following questions:
>
> 1. What the exact issues with the H2 integration?
> Can you send a tickets links?
> Can we label all H2 integration issues in JIRA? I propose to use "h2" label.
>
> 2. What are the requirements for the new SQL engine?
> We should write it down and discuss.
>
> 3. What options do we have?
> Are there any alternatives to Calcite on the market?
> We did the wrong choice that looked obvious one time.
> So we should carefully avoid it at this time.
>
> 4. What is improvements of Ignite we want to make with the new engine?
>
>
> В Пт, 27/09/2019 в 08:44 +0000, Igor Seliverstov пишет:
>> Hi Igniters!
>>
>> As you might know currently we have many open issues relating to current H2 based engine and its execution flow.
>>
>> Some of them are critical (like impossibility to execute particular queries), some of them are majors (like impossibility to execute particular queries without pre-preparation your data to have a collocation) and many minors.
>>
>> Most of the issues cannot be solved without whole engine redesign.
>>
>> So, here the proposal: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
>>
>> I'll appreciate if you share your thoughts on top of that.
>>
>> Regards,
>> Igor

Nikolay Izhikov-2

Re: New SQL execution engine

In reply to this post by Andrew Mashenkov

Hello, Andrey.

> Ignite SQL layer has some issues that can't be fix with changes in Ignite
> only, and we are blocked with H2.

What are these issues?
Can you make it specific and send a tickets for this issues?

> 3. Replace H2 with smth else.

Actually, I support this decision in general.
But, to make a right choise for H2 replacement we should carefully discuss such huge replacement.

So far, I can't see any written down(in IEP) requirements for SQL engine.
Let's do it and discuss them.

В Пт, 27/09/2019 в 13:39 +0300, Andrey Mashenkov пишет:

> Hi Nikolay,
>
> Let me add my 5- cent here.
>
> Ignite SQL layer has some issues that can't be fix with changes in Ignite
> only, and we are blocked with H2.
> To resolve these issues we can:
> 1. Donate some changes to H2 and wait for it's next release. But there are
> more cons than pros and I think we can't rely on H2 project anymore.
> - There is no guarantee our changes will be approved by H2 community.
> - We definitely won't to depend on H2 product lifecycle.
> - New H2 features (like parallel multi-statement query processing in latest
> release) force Ignite for significant changes\refactoring in Ignite SQL
> layer with no visible benefits.
> Every next release it becomes harder to upgrade H2 dependency.
> - Latest H2 versions causes questions about their stability.
>
> Hot issues are
> - Large intermediate results inside H2 internals can cause OOM for some
> kind of queries. Ignite can't handle this anyhow for now without reworking
> H2 code.
> - HashJoins
> - Ignite can't start multi-step queries, but 2-step (map-reduce) only.
> - It is not possible to apply optimizations on query plan as no logical
> plan actually doen't exists. H2 execution plan is hard-wired with H2
> internals and can't be easily transformed.
> Implementing a new good planner over H2 looks like a huge task.
>
> 2. Fork H2.
> We already done this in GridGain (you can found H2 module in GridGain
> community edition) as fastest way to unblock work on SQL improvements.
> But this way doesn't look like a good one for Ignite, regarding our
> experience.
> - H2 code can't be included into Ignite at all.
> H2 license are MIT and EPL. From one side they can't be changed to Apache
> Licence. From other side Apache Foundation don't want to host any code
> licensed with other than Apache License.
> GridGain is ok with this, but Apache Foundation won't.
>
> - We can made separate H2 fork project with it's own lifecycle with full
> control over it and publish it in Maven Central.
> This doen't seem like a big deal. But will causes additional difficulties
> in development, test and release processes of Ignite.
> This way seems bring much pain for every contributor.
>
> 3. Replace H2 with smth else.
> E.g. with Apache Calcite.
> - Calcite is a framework and it is designed very flexible and extendable.
> - Every it's part can be replaced with our own implementation.
> - Apache License is out of the box =)
>
> So, summary:
> 1-st way of pain we have now and it slows down Ignite SQL layer developing.
> 2-nd looks few better, but seems bring Ignite to nowhere in prospect.
> 3-rd is a risky, but promissory way.
>
>
> On Fri, Sep 27, 2019 at 12:16 PM Nikolay Izhikov <[hidden email]>
> wrote:
>
> > Hello, Igor.
> >
> > Thanks for starting this discussion.
> >
> > I think we should take a step back in it and answer the following
> > questions:
> >
> > 1. What the exact issues with the H2 integration?
> > Can you send a tickets links?
> > Can we label all H2 integration issues in JIRA? I propose to use "h2"
> > label.
> >
> > 2. What are the requirements for the new SQL engine?
> > We should write it down and discuss.
> >
> > 3. What options do we have?
> > Are there any alternatives to Calcite on the market?
> > We did the wrong choice that looked obvious one time.
> > So we should carefully avoid it at this time.
> >
> > 4. What is improvements of Ignite we want to make with the new engine?
> >
> >
> > В Пт, 27/09/2019 в 08:44 +0000, Igor Seliverstov пишет:
> > > Hi Igniters!
> > >
> > > As you might know currently we have many open issues relating to current
> >
> > H2 based engine and its execution flow.
> > >
> > > Some of them are critical (like impossibility to execute particular
> >
> > queries), some of them are majors (like impossibility to execute particular
> > queries without pre-preparation your data to have a collocation) and many
> > minors.
> > >
> > > Most of the issues cannot be solved without whole engine redesign.
> > >
> > > So, here the proposal:
> >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> > >
> > > I'll appreciate if you share your thoughts on top of that.
> > >
> > > Regards,
> > > Igor
>
>

signature.asc (499 bytes) Download Attachment

Nikolay Izhikov-2

Re: New SQL execution engine

In reply to this post by Roman Kondakov

Hello, Roman.

> Also Apache Calcite is commonly used in popular Apache projects

I don't think it's a good point.
H2 is also commonly used.
But, it doesn't conform to Ignite requirements.

Can you, please, write down issues and engine requirements to the IEP?
So we can discuss each point separately.

В Пт, 27/09/2019 в 13:56 +0300, Roman Kondakov пишет:

> Hello Nikolay.
>
> You've asked very good questions. I'll try to answer.
>
> > 1. What the exact issues with the H2 integration?
> > Can you send a tickets links?
> > Can we label all H2 integration issues in JIRA? I propose to use "h2" label.
>
> Current SQL engine is confined in the single-pass map-reduce algorithm.
> This make impossible to execute complex queries which can not be
> expressed with a single map-reduce pass like subqueries with aggregates
> [1]. Another problem is that H2 optimizer is very primitive and not
> able to perform many useful optimizations [2].
>
> Also Apache Calcite is commonly used in popular Apache projects like
> Hive, Drill, Flink and others [3]. So it's mature and well battle tested
> framework, while H2 is a toy database which is hardly ever used in the
> real production systems.
>
> > 2. What are the requirements for the new SQL engine?
> > We should write it down and discuss.
>
> The main requirement is to fix the problems listed above. The new SQL
> engine should be able to *effectively* execute SQL queries of the
> *arbitrary complexity*. For example the new engine will be able to
> perform distributed joins in a multiple ways [4], when current engine
> can do it only in two ways: collocated and distributed (the latter is
> usually not very efficient and needed to set manually).
>
> > 3. What options do we have?
> > Are there any alternatives to Calcite on the market?
> > We did the wrong choice that looked obvious one time.
> > So we should carefully avoid it at this time.
>
> I know the only one open source implementation of the efficient query
> optimization strategy - and this is Apache Calcite. The alternative way
> is to write our own query optimizer from scratch which is not a trivial
> task at all.
>
>
> > 4. What is improvements of Ignite we want to make with the new engine?
>
> Ignite will be able to execute complex queries using optimal strategy. I
> think this is a quite good improvement.
>
>
> [1] https://issues.apache.org/jira/browse/IGNITE-11448
> [2] https://issues.apache.org/jira/browse/IGNITE-6085
> [3] https://calcite.apache.org/docs/powered_by.html
> [4] https://www.memsql.com/blog/scaling-distributed-joins/

signature.asc (499 bytes) Download Attachment

Roman Kondakov

Re: New SQL execution engine

Hello Nikolay,

please see IEP--37 [1]. Issues are there.

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084

--
Kind Regards
Roman Kondakov

On 27.09.2019 14:20, Nikolay Izhikov wrote:

> Hello, Roman.
>
>> Also Apache Calcite is commonly used in popular Apache projects
> I don't think it's a good point.
> H2 is also commonly used.
> But, it doesn't conform to Ignite requirements.
>
> Can you, please, write down issues and engine requirements to the IEP?
> So we can discuss each point separately.
>
>
> В Пт, 27/09/2019 в 13:56 +0300, Roman Kondakov пишет:
>> Hello Nikolay.
>>
>> You've asked very good questions. I'll try to answer.
>>
>>> 1. What the exact issues with the H2 integration?
>>> Can you send a tickets links?
>>> Can we label all H2 integration issues in JIRA? I propose to use "h2" label.
>> Current SQL engine is confined in the single-pass map-reduce algorithm.
>> This make impossible to execute complex queries which can not be
>> expressed with a single map-reduce pass like subqueries with aggregates
>> [1]. Another problem is that H2 optimizer is very primitive and not
>> able to perform many useful optimizations [2].
>>
>> Also Apache Calcite is commonly used in popular Apache projects like
>> Hive, Drill, Flink and others [3]. So it's mature and well battle tested
>> framework, while H2 is a toy database which is hardly ever used in the
>> real production systems.
>>
>>> 2. What are the requirements for the new SQL engine?
>>> We should write it down and discuss.
>> The main requirement is to fix the problems listed above. The new SQL
>> engine should be able to *effectively* execute SQL queries of the
>> *arbitrary complexity*. For example the new engine will be able to
>> perform distributed joins in a multiple ways [4], when current engine
>> can do it only in two ways: collocated and distributed (the latter is
>> usually not very efficient and needed to set manually).
>>
>>> 3. What options do we have?
>>> Are there any alternatives to Calcite on the market?
>>> We did the wrong choice that looked obvious one time.
>>> So we should carefully avoid it at this time.
>> I know the only one open source implementation of the efficient query
>> optimization strategy - and this is Apache Calcite. The alternative way
>> is to write our own query optimizer from scratch which is not a trivial
>> task at all.
>>
>>
>>> 4. What is improvements of Ignite we want to make with the new engine?
>> Ignite will be able to execute complex queries using optimal strategy. I
>> think this is a quite good improvement.
>>
>>
>> [1] https://issues.apache.org/jira/browse/IGNITE-11448
>> [2] https://issues.apache.org/jira/browse/IGNITE-6085
>> [3] https://calcite.apache.org/docs/powered_by.html
>> [4] https://www.memsql.com/blog/scaling-distributed-joins/

Nikolay Izhikov-2

Re: New SQL execution engine

Hello, Roman.

All I see is links to two tickets:

IGNITE-11448 - Open
IGNITE-6085 - Closed

Other issues described poorly and have not ticket links.
We can't discuss such a huge change as an execution engine replacement with descrition like:

"No data co-location control, i.e. arbitrary data can be returned silently" or
"Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."

I think we need some reproducer that shows issue.
Tech details also should be added.

Let's make these descriptions more specific.
Let's discuss how we want to fix them with the new engine.

В Пт, 27/09/2019 в 15:10 +0300, Roman Kondakov пишет:
> Hello Nikolay,
>
> please see IEP--37 [1]. Issues are there.
>
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
>
>

signature.asc (499 bytes) Download Attachment

Maxim Muzafarov

Re: New SQL execution engine

Folks,

I agree with Nikolay, the idea of replacing the H2 engine with the
most suitable one is reasonable. But since such change is major we
should have a strong argumentation on it even for members with are
working outside the SQL-team.

I think it is really necessary to have:

1. The list of issues related to the current engine (H2) which from
different points of view and for different developers must seem
unsolvable. For example, `... the H2 execution plan is hard-wired with
H2 internals and can't be easily transformed` seems doesn't have a
strong technical argumentation.
After this step, we should have a clear understanding that the engine
change is required.

2. Why only the Apache Calcite? It seems to me we should have a table
with a comparison of different engines with the pros and cons of each
other. A brief search shows me that we may have a few options here.
After this step, we should have a clear understanding of why we choose
this dependency prior to another.

3. We should also have a migration decomposition and step by step
actions to do. I haven't found such a decomposition on IEP-37 page. Do
we have one? What the implementation phases will be? What components
will be changed? What a new API would be and would it be? What
problems we are expecting e.g performance degradation on prototype
implementation? `Risks and Assumptions` topic doesn't seem to be a
good described.
After this step, we should have a clear and obvious a new feature
implementation plan.

Let's have a strong technical discussion.

On Fri, 27 Sep 2019 at 15:17, Nikolay Izhikov <[hidden email]> wrote:

>
> Hello, Roman.
>
> All I see is links to two tickets:
>
> IGNITE-11448 - Open
> IGNITE-6085 - Closed
>
> Other issues described poorly and have not ticket links.
> We can't discuss such a huge change as an execution engine replacement with descrition like:
>
> "No data co-location control, i.e. arbitrary data can be returned silently" or
> "Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."
>
> I think we need some reproducer that shows issue.
> Tech details also should be added.
>
> Let's make these descriptions more specific.
> Let's discuss how we want to fix them with the new engine.
>
>
> В Пт, 27/09/2019 в 15:10 +0300, Roman Kondakov пишет:
> > Hello Nikolay,
> >
> > please see IEP--37 [1]. Issues are there.
> >
> >
> > [1]
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> >
> >

Alexey Goncharuk

Re: New SQL execution engine

Nikolay, Maxim,

Asking to provide a list of issues with the current H2 is pointless because
it has a fundamental architectural flow, not just a bunch of bugs:

Currently, the query execution is limited to a two-phase map-reduce task
(with an optional remote cursor when 'distributed joins' flag is enabled)
and only a limited subset of queries can be executed. You can easily see
that if you try to draw how three non-collocated caches should be joined on
an arbitrary condition.

H2 cannot solve this problem because H2 is a local database and is not
designed to execute distributed queries, let alone the fact that it is not
designed to be embedded to other projects as an execution engine. Because
of this, H2 upgrade is a huge pain which leads to issues up to broken
compilation. This is exactly the reason why the ticket with index use for
IN() expression [1] has only been fixed in 2.7, one can see the amount of
changes needed for a simple version upgrade.

Now, as for alternatives for Apache Calcite - I personally spent quite a
large amount of time looking for alternatives but did not find any even
remotely matching the abilities and flexibility of Calcite, but did not
find any. As folks noted before, Calcite is specifically designed to have
flexible optimization rules and support distributed query execution, which
is already proved by real-life projects. If you have any other framework in
mind that should be considered - please let the community know, I believe
it will be a more productive discussion than now.

As for the IEP content - I agree, we should have a more detailed
description of steps and technical information there, but I believe this
will be improved further.

--AG

[1] https://issues.apache.org/jira/browse/IGNITE-4150

пт, 27 сент. 2019 г. в 15:33, Maxim Muzafarov <[hidden email]>:

> Folks,
>
> I agree with Nikolay, the idea of replacing the H2 engine with the
> most suitable one is reasonable. But since such change is major we
> should have a strong argumentation on it even for members with are
> working outside the SQL-team.
>
> I think it is really necessary to have:
>
> 1. The list of issues related to the current engine (H2) which from
> different points of view and for different developers must seem
> unsolvable. For example, `... the H2 execution plan is hard-wired with
> H2 internals and can't be easily transformed` seems doesn't have a
> strong technical argumentation.
> After this step, we should have a clear understanding that the engine
> change is required.
>
> 2. Why only the Apache Calcite? It seems to me we should have a table
> with a comparison of different engines with the pros and cons of each
> other. A brief search shows me that we may have a few options here.
> After this step, we should have a clear understanding of why we choose
> this dependency prior to another.
>
> 3. We should also have a migration decomposition and step by step
> actions to do. I haven't found such a decomposition on IEP-37 page. Do
> we have one? What the implementation phases will be? What components
> will be changed? What a new API would be and would it be? What
> problems we are expecting e.g performance degradation on prototype
> implementation? `Risks and Assumptions` topic doesn't seem to be a
> good described.
> After this step, we should have a clear and obvious a new feature
> implementation plan.
>
> Let's have a strong technical discussion.
>
> On Fri, 27 Sep 2019 at 15:17, Nikolay Izhikov <[hidden email]> wrote:
> >
> > Hello, Roman.
> >
> > All I see is links to two tickets:
> >
> > IGNITE-11448 - Open
> > IGNITE-6085 - Closed
> >
> > Other issues described poorly and have not ticket links.
> > We can't discuss such a huge change as an execution engine replacement
> with descrition like:
> >
> > "No data co-location control, i.e. arbitrary data can be returned
> silently" or
> > "Low control on how query executes internally, as a result we have
> limited possibility to implement improvements/fixes."
> >
> > I think we need some reproducer that shows issue.
> > Tech details also should be added.
> >
> > Let's make these descriptions more specific.
> > Let's discuss how we want to fix them with the new engine.
> >
> >
> > В Пт, 27/09/2019 в 15:10 +0300, Roman Kondakov пишет:
> > > Hello Nikolay,
> > >
> > > please see IEP--37 [1]. Issues are there.
> > >
> > >
> > > [1]
> > >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> > >
> > >
>

Nikolay Izhikov-2

Re: New SQL execution engine

Hello, Alexey.

Thanks for the details.

> Now, as for alternatives for Apache Calcite

I want to discuss our *requirements* for the new engine first.
Can we do it?
The main reason to do it - We should avoid wrong technical decision.
We made one with H2 and we shouldn't do it again.

> As for the IEP content - I agree, we should have a more detailed
> description of steps and technical information there, but I believe this
> will be improved further.

Thanks!
Looking forward for IEP details.

В Пт, 27/09/2019 в 16:04 +0300, Alexey Goncharuk пишет:

> Nikolay, Maxim,
>
> Asking to provide a list of issues with the current H2 is pointless because
> it has a fundamental architectural flow, not just a bunch of bugs:
>
> Currently, the query execution is limited to a two-phase map-reduce task
> (with an optional remote cursor when 'distributed joins' flag is enabled)
> and only a limited subset of queries can be executed. You can easily see
> that if you try to draw how three non-collocated caches should be joined on
> an arbitrary condition.
>
> H2 cannot solve this problem because H2 is a local database and is not
> designed to execute distributed queries, let alone the fact that it is not
> designed to be embedded to other projects as an execution engine. Because
> of this, H2 upgrade is a huge pain which leads to issues up to broken
> compilation. This is exactly the reason why the ticket with index use for
> IN() expression [1] has only been fixed in 2.7, one can see the amount of
> changes needed for a simple version upgrade.
>
> Now, as for alternatives for Apache Calcite - I personally spent quite a
> large amount of time looking for alternatives but did not find any even
> remotely matching the abilities and flexibility of Calcite, but did not
> find any. As folks noted before, Calcite is specifically designed to have
> flexible optimization rules and support distributed query execution, which
> is already proved by real-life projects. If you have any other framework in
> mind that should be considered - please let the community know, I believe
> it will be a more productive discussion than now.
>
> As for the IEP content - I agree, we should have a more detailed
> description of steps and technical information there, but I believe this
> will be improved further.
>
> --AG
>
> [1] https://issues.apache.org/jira/browse/IGNITE-4150
>
>
>
> пт, 27 сент. 2019 г. в 15:33, Maxim Muzafarov <[hidden email]>:
>
> > Folks,
> >
> > I agree with Nikolay, the idea of replacing the H2 engine with the
> > most suitable one is reasonable. But since such change is major we
> > should have a strong argumentation on it even for members with are
> > working outside the SQL-team.
> >
> > I think it is really necessary to have:
> >
> > 1. The list of issues related to the current engine (H2) which from
> > different points of view and for different developers must seem
> > unsolvable. For example, `... the H2 execution plan is hard-wired with
> > H2 internals and can't be easily transformed` seems doesn't have a
> > strong technical argumentation.
> > After this step, we should have a clear understanding that the engine
> > change is required.
> >
> > 2. Why only the Apache Calcite? It seems to me we should have a table
> > with a comparison of different engines with the pros and cons of each
> > other. A brief search shows me that we may have a few options here.
> > After this step, we should have a clear understanding of why we choose
> > this dependency prior to another.
> >
> > 3. We should also have a migration decomposition and step by step
> > actions to do. I haven't found such a decomposition on IEP-37 page. Do
> > we have one? What the implementation phases will be? What components
> > will be changed? What a new API would be and would it be? What
> > problems we are expecting e.g performance degradation on prototype
> > implementation? `Risks and Assumptions` topic doesn't seem to be a
> > good described.
> > After this step, we should have a clear and obvious a new feature
> > implementation plan.
> >
> > Let's have a strong technical discussion.
> >
> > On Fri, 27 Sep 2019 at 15:17, Nikolay Izhikov <[hidden email]> wrote:
> > >
> > > Hello, Roman.
> > >
> > > All I see is links to two tickets:
> > >
> > > IGNITE-11448 - Open
> > > IGNITE-6085 - Closed
> > >
> > > Other issues described poorly and have not ticket links.
> > > We can't discuss such a huge change as an execution engine replacement
> >
> > with descrition like:
> > >
> > > "No data co-location control, i.e. arbitrary data can be returned
> >
> > silently" or
> > > "Low control on how query executes internally, as a result we have
> >
> > limited possibility to implement improvements/fixes."
> > >
> > > I think we need some reproducer that shows issue.
> > > Tech details also should be added.
> > >
> > > Let's make these descriptions more specific.
> > > Let's discuss how we want to fix them with the new engine.
> > >
> > >
> > > В Пт, 27/09/2019 в 15:10 +0300, Roman Kondakov пишет:
> > > > Hello Nikolay,
> > > >
> > > > please see IEP--37 [1]. Issues are there.
> > > >
> > > >
> > > > [1]
> > > >
> >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> > > >
> > > >

signature.asc (499 bytes) Download Attachment

Roman Kondakov

Re: New SQL execution engine

In reply to this post by Maxim Muzafarov

Maxim, Nikolay,

I've listed two issues which show the ideological flaws of the current
engine.

1. IGNITE-11448 - Open. This ticket describes the impossibility of
executing queries which can not be fit in the hardcoded one pass
map-reduce paradigm.

2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second
major problem with the current engine: H2 query optimizer is very
primitive and can not perform many useful optimizations.

These two points (single map-reduce execution and inflexible optimizer)
are the main problems with the current engine. It means that our engine
is currently suitable for execution only a very limited subset of the
typical SQL queries. For example it can not even run most of the TPC-H
benchmark queries because they don't fit to the simple map-reduce paradigm.

> All I see is links to two tickets:
How many tickets would satisfy you? I named two. And it looks like it is
not enough from your point of view. Ok, so how many is enough? The set
of problems caused by listed above tickets is infinite, therefore I can
not create a ticket for each of them.
> Tech details also should be added.

Tech details are in the tickets.

> We can't discuss such a huge change as an execution engine replacement with descrition like:
> "No data co-location control, i.e. arbitrary data can be returned silently" or
> "Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."
Why not? Don't you understand these problems? Or you don't think this is
a problem?

> Let's make these descriptions more specific.
What do you mean by "more specific"? What is the criteria of the
specific description?

Nikolay, Maxim, I understand that our arguments may not be as obvious
for you as it obvious for SQL team. So, please arrange your questions in
a more constructive way.

Thank you!
--
Kind Regards
Roman Kondakov

On 27.09.2019 15:32, Maxim Muzafarov wrote:

> Folks,
>
> I agree with Nikolay, the idea of replacing the H2 engine with the
> most suitable one is reasonable. But since such change is major we
> should have a strong argumentation on it even for members with are
> working outside the SQL-team.
>
> I think it is really necessary to have:
>
> 1. The list of issues related to the current engine (H2) which from
> different points of view and for different developers must seem
> unsolvable. For example, `... the H2 execution plan is hard-wired with
> H2 internals and can't be easily transformed` seems doesn't have a
> strong technical argumentation.
> After this step, we should have a clear understanding that the engine
> change is required.
>
> 2. Why only the Apache Calcite? It seems to me we should have a table
> with a comparison of different engines with the pros and cons of each
> other. A brief search shows me that we may have a few options here.
> After this step, we should have a clear understanding of why we choose
> this dependency prior to another.
>
> 3. We should also have a migration decomposition and step by step
> actions to do. I haven't found such a decomposition on IEP-37 page. Do
> we have one? What the implementation phases will be? What components
> will be changed? What a new API would be and would it be? What
> problems we are expecting e.g performance degradation on prototype
> implementation? `Risks and Assumptions` topic doesn't seem to be a
> good described.
> After this step, we should have a clear and obvious a new feature
> implementation plan.
>
> Let's have a strong technical discussion.
>
> On Fri, 27 Sep 2019 at 15:17, Nikolay Izhikov <[hidden email]> wrote:
>> Hello, Roman.
>>
>> All I see is links to two tickets:
>>
>> IGNITE-11448 - Open
>> IGNITE-6085 - Closed
>>
>> Other issues described poorly and have not ticket links.
>> We can't discuss such a huge change as an execution engine replacement with descrition like:
>>
>> "No data co-location control, i.e. arbitrary data can be returned silently" or
>> "Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."
>>
>> I think we need some reproducer that shows issue.
>> Tech details also should be added.
>>
>> Let's make these descriptions more specific.
>> Let's discuss how we want to fix them with the new engine.
>>
>>
>> В Пт, 27/09/2019 в 15:10 +0300, Roman Kondakov пишет:
>>> Hello Nikolay,
>>>
>>> please see IEP--37 [1]. Issues are there.
>>>
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
>>>
>>>

Nikolay Izhikov-2

Re: New SQL execution engine

Roman.

> Nikolay, Maxim, I understand that our arguments may not be as obvious
> for you as it obvious for SQL team. So, please arrange your questions in
> a more constructive way.

What is SQL team?
I only know Ignite community :)

Please, share you knowledge in IEP.
I want to join to the process of engine *selection*.
It should start with the requirements to such engine.
Can you write it in IEP, please?

My point is very simple:

1. We made the wrong decision with H2
2. We should make a well-thought decision about the new engine.

> How many tickets would satisfy you?

You write about "issueS" with the H2.
All I see is one open ticket.
IEP doesn't provide enough information.
So it's not about the number of tickets, it's about

> These two points (single map-reduce execution and inflexible optimizer)
> are the main problems with the current engine.

We may come to the point when Calcite(or any other engine) brings us third and other "main problems".
This is how it happens with H2.

Let's start from what we want to get with the engine and move forward from this base.
What do you think?

В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:

> Maxim, Nikolay,
>
> I've listed two issues which show the ideological flaws of the current
> engine.
>
> 1. IGNITE-11448 - Open. This ticket describes the impossibility of
> executing queries which can not be fit in the hardcoded one pass
> map-reduce paradigm.
>
> 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second
> major problem with the current engine: H2 query optimizer is very
> primitive and can not perform many useful optimizations.
>
> These two points (single map-reduce execution and inflexible optimizer)
> are the main problems with the current engine. It means that our engine
> is currently suitable for execution only a very limited subset of the
> typical SQL queries. For example it can not even run most of the TPC-H
> benchmark queries because they don't fit to the simple map-reduce paradigm.
>
> > All I see is links to two tickets:
>
> How many tickets would satisfy you? I named two. And it looks like it is
> not enough from your point of view. Ok, so how many is enough? The set
> of problems caused by listed above tickets is infinite, therefore I can
> not create a ticket for each of them.
> > Tech details also should be added.
>
> Tech details are in the tickets.
>
> > We can't discuss such a huge change as an execution engine replacement with descrition like:
> > "No data co-location control, i.e. arbitrary data can be returned silently" or
> > "Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."
>
> Why not? Don't you understand these problems? Or you don't think this is
> a problem?
>
> > Let's make these descriptions more specific.
>
> What do you mean by "more specific"? What is the criteria of the
> specific description?
>
>
>
> Nikolay, Maxim, I understand that our arguments may not be as obvious
> for you as it obvious for SQL team. So, please arrange your questions in
> a more constructive way.
>
> Thank you!

signature.asc (499 bytes) Download Attachment

gvvinblade

Re: New SQL execution engine

Nikolay,

The main issue - there is no *selection*.

There is a field of knowledge - relational algebra, which describes how to transform relational expressions saving their semantics, and a couple of implementations (Calcite is only one written in Java).

There are only two alternatives:

1) Implementing white papers from scratch
2) Adopting Calcite to our needs.

The second way was chosen by several other projects, there is experience, there is a list of known issues (like using indexes) so, almost everything is already done for us.

Implementing a planner is a big deal, I think anybody understands it there. That's why our proposal to reuse others experience is obvious.

If you have an alternative - you're welcome, I'll gratefully listen to you.

The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.

Regards,
Igor

> 27 сент. 2019 г., в 16:37, Nikolay Izhikov <[hidden email]> написал(а):
>
> Roman.
>
>> Nikolay, Maxim, I understand that our arguments may not be as obvious
>> for you as it obvious for SQL team. So, please arrange your questions in
>> a more constructive way.
>
> What is SQL team?
> I only know Ignite community :)
>
> Please, share you knowledge in IEP.
> I want to join to the process of engine *selection*.
> It should start with the requirements to such engine.
> Can you write it in IEP, please?
>
> My point is very simple:
>
> 1. We made the wrong decision with H2
> 2. We should make a well-thought decision about the new engine.
>
>> How many tickets would satisfy you?
>
> You write about "issueS" with the H2.
> All I see is one open ticket.
> IEP doesn't provide enough information.
> So it's not about the number of tickets, it's about
>
>> These two points (single map-reduce execution and inflexible optimizer)
>> are the main problems with the current engine.
>
> We may come to the point when Calcite(or any other engine) brings us third and other "main problems".
> This is how it happens with H2.
>
> Let's start from what we want to get with the engine and move forward from this base.
> What do you think?
>
>
>
> В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
>> Maxim, Nikolay,
>>
>> I've listed two issues which show the ideological flaws of the current
>> engine.
>>
>> 1. IGNITE-11448 - Open. This ticket describes the impossibility of
>> executing queries which can not be fit in the hardcoded one pass
>> map-reduce paradigm.
>>
>> 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second
>> major problem with the current engine: H2 query optimizer is very
>> primitive and can not perform many useful optimizations.
>>
>> These two points (single map-reduce execution and inflexible optimizer)
>> are the main problems with the current engine. It means that our engine
>> is currently suitable for execution only a very limited subset of the
>> typical SQL queries. For example it can not even run most of the TPC-H
>> benchmark queries because they don't fit to the simple map-reduce paradigm.
>>
>>> All I see is links to two tickets:
>>
>> How many tickets would satisfy you? I named two. And it looks like it is
>> not enough from your point of view. Ok, so how many is enough? The set
>> of problems caused by listed above tickets is infinite, therefore I can
>> not create a ticket for each of them.
>>> Tech details also should be added.
>>
>> Tech details are in the tickets.
>>
>>> We can't discuss such a huge change as an execution engine replacement with descrition like:
>>> "No data co-location control, i.e. arbitrary data can be returned silently" or
>>> "Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."
>>
>> Why not? Don't you understand these problems? Or you don't think this is
>> a problem?
>>
>>> Let's make these descriptions more specific.
>>
>> What do you mean by "more specific"? What is the criteria of the
>> specific description?
>>
>>
>>
>> Nikolay, Maxim, I understand that our arguments may not be as obvious
>> for you as it obvious for SQL team. So, please arrange your questions in
>> a more constructive way.
>>
>> Thank you!

Andrew Mashenkov

Re: New SQL execution engine

In reply to this post by Nikolay Izhikov-2

> Roman.
>
> > Nikolay, Maxim, I understand that our arguments may not be as obvious
> > for you as it obvious for SQL team. So, please arrange your questions in
> > a more constructive way.
>
> What is SQL team?
> I only know Ignite community :)
>
> Please, share you knowledge in IEP.
> I want to join to the process of engine *selection*.
> It should start with the requirements to such engine.
> Can you write it in IEP, please?
>
> My point is very simple:
>
> 1. We made the wrong decision with H2
> 2. We should make a well-thought decision about the new engine.
>
> > How many tickets would satisfy you?
>
> You write about "issueS" with the H2.
> All I see is one open ticket.
> IEP doesn't provide enough information.
> So it's not about the number of tickets, it's about
>
> > These two points (single map-reduce execution and inflexible optimizer)
> > are the main problems with the current engine.
>
> We may come to the point when Calcite(or any other engine) brings us third
> and other "main problems".
> This is how it happens with H2.
>
> Let's start from what we want to get with the engine and move forward from
> this base.
> What do you think?
>
>
>
> В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
> > Maxim, Nikolay,
> >
> > I've listed two issues which show the ideological flaws of the current
> > engine.
> >
> > 1. IGNITE-11448 - Open. This ticket describes the impossibility of
> > executing queries which can not be fit in the hardcoded one pass
> > map-reduce paradigm.
> >
> > 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second
> > major problem with the current engine: H2 query optimizer is very
> > primitive and can not perform many useful optimizations.
> >
> > These two points (single map-reduce execution and inflexible optimizer)
> > are the main problems with the current engine. It means that our engine
> > is currently suitable for execution only a very limited subset of the
> > typical SQL queries. For example it can not even run most of the TPC-H
> > benchmark queries because they don't fit to the simple map-reduce
> paradigm.
> >
> > > All I see is links to two tickets:
> >
> > How many tickets would satisfy you? I named two. And it looks like it is
> > not enough from your point of view. Ok, so how many is enough? The set
> > of problems caused by listed above tickets is infinite, therefore I can
> > not create a ticket for each of them.
> > > Tech details also should be added.
> >
> > Tech details are in the tickets.
> >
> > > We can't discuss such a huge change as an execution engine replacement
> with descrition like:
> > > "No data co-location control, i.e. arbitrary data can be returned
> silently" or
> > > "Low control on how query executes internally, as a result we have
> limited possibility to implement improvements/fixes."
> >
> > Why not? Don't you understand these problems? Or you don't think this is
> > a problem?
> >
> > > Let's make these descriptions more specific.
> >
> > What do you mean by "more specific"? What is the criteria of the
> > specific description?
> >
> >
> >
> > Nikolay, Maxim, I understand that our arguments may not be as obvious
> > for you as it obvious for SQL team. So, please arrange your questions in
> > a more constructive way.
> >
> > Thank you!
>

--
Best regards,
Andrey V. Mashenkov

Nikolay Izhikov-2

Re: New SQL execution engine

Thanks, Andrey!

Will take a loo, shortly.

В Пт, 27/09/2019 в 17:19 +0300, Andrey Mashenkov пишет:

> Issues can't be resolved without changes in H2.
> Hope, this will be enough.
>
> https://issues.apache.org/jira/browse/IGNITE-10598
> https://issues.apache.org/jira/browse/IGNITE-11473
> https://issues.apache.org/jira/browse/IGNITE-11444
> https://issues.apache.org/jira/browse/IGNITE-5289
> https://issues.apache.org/jira/browse/IGNITE-10855
> https://issues.apache.org/jira/browse/IGNITE-11341
> https://issues.apache.org/jira/browse/IGNITE-7526
> https://issues.apache.org/jira/browse/IGNITE-9480
> https://issues.apache.org/jira/browse/IGNITE-9616
> https://issues.apache.org/jira/browse/IGNITE-11891
> https://issues.apache.org/jira/browse/IGNITE-6202
> https://issues.apache.org/jira/browse/IGNITE-11448
> https://issues.apache.org/jira/browse/IGNITE-3911
>
>
> On Fri, Sep 27, 2019 at 4:34 PM Nikolay Izhikov <[hidden email]> wrote:
>
> > Roman.
> >
> > > Nikolay, Maxim, I understand that our arguments may not be as obvious
> > > for you as it obvious for SQL team. So, please arrange your questions in
> > > a more constructive way.
> >
> > What is SQL team?
> > I only know Ignite community :)
> >
> > Please, share you knowledge in IEP.
> > I want to join to the process of engine *selection*.
> > It should start with the requirements to such engine.
> > Can you write it in IEP, please?
> >
> > My point is very simple:
> >
> > 1. We made the wrong decision with H2
> > 2. We should make a well-thought decision about the new engine.
> >
> > > How many tickets would satisfy you?
> >
> > You write about "issueS" with the H2.
> > All I see is one open ticket.
> > IEP doesn't provide enough information.
> > So it's not about the number of tickets, it's about
> >
> > > These two points (single map-reduce execution and inflexible optimizer)
> > > are the main problems with the current engine.
> >
> > We may come to the point when Calcite(or any other engine) brings us third
> > and other "main problems".
> > This is how it happens with H2.
> >
> > Let's start from what we want to get with the engine and move forward from
> > this base.
> > What do you think?
> >
> >
> >
> > В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
> > > Maxim, Nikolay,
> > >
> > > I've listed two issues which show the ideological flaws of the current
> > > engine.
> > >
> > > 1. IGNITE-11448 - Open. This ticket describes the impossibility of
> > > executing queries which can not be fit in the hardcoded one pass
> > > map-reduce paradigm.
> > >
> > > 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second
> > > major problem with the current engine: H2 query optimizer is very
> > > primitive and can not perform many useful optimizations.
> > >
> > > These two points (single map-reduce execution and inflexible optimizer)
> > > are the main problems with the current engine. It means that our engine
> > > is currently suitable for execution only a very limited subset of the
> > > typical SQL queries. For example it can not even run most of the TPC-H
> > > benchmark queries because they don't fit to the simple map-reduce
> >
> > paradigm.
> > >
> > > > All I see is links to two tickets:
> > >
> > > How many tickets would satisfy you? I named two. And it looks like it is
> > > not enough from your point of view. Ok, so how many is enough? The set
> > > of problems caused by listed above tickets is infinite, therefore I can
> > > not create a ticket for each of them.
> > > > Tech details also should be added.
> > >
> > > Tech details are in the tickets.
> > >
> > > > We can't discuss such a huge change as an execution engine replacement
> >
> > with descrition like:
> > > > "No data co-location control, i.e. arbitrary data can be returned
> >
> > silently" or
> > > > "Low control on how query executes internally, as a result we have
> >
> > limited possibility to implement improvements/fixes."
> > >
> > > Why not? Don't you understand these problems? Or you don't think this is
> > > a problem?
> > >
> > > > Let's make these descriptions more specific.
> > >
> > > What do you mean by "more specific"? What is the criteria of the
> > > specific description?
> > >
> > >
> > >
> > > Nikolay, Maxim, I understand that our arguments may not be as obvious
> > > for you as it obvious for SQL team. So, please arrange your questions in
> > > a more constructive way.
> > >
> > > Thank you!
>
>

signature.asc (499 bytes) Download Attachment

Nikolay Izhikov-2

Re: New SQL execution engine

In reply to this post by gvvinblade

Igor.

> The main issue - there is no *selection*.

1. I don't remember community decision about this.

2. We should avoid to make such long-term decision so quickly.
We done this kind of decision with H2 and come to the point when we should review it.

> 1) Implementing white papers from scratch
> 2) Adopting Calcite to our needs.

The third option don't fix issues we have with H2.
The fourth option I know is using spark-catalyst.

What is wrong with writing engine from scratch?

I ask you to start with engine requirements.
Can we, please, discuss it?

> If you have an alternative - you're welcome, I'll gratefully listen to you.

We have alternative for now - H2 based engine.

> The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.

When we make a decision about engine we can discuss roadmap for replacement.
One more time - replacement of SQL engine to some more customizable make sense for me.
But, this kind of decisions need carefull discussion.

В Пт, 27/09/2019 в 17:08 +0300, Seliverstov Igor пишет:

> Nikolay,
>
> The main issue - there is no *selection*.
>
> There is a field of knowledge - relational algebra, which describes how to transform relational expressions saving their semantics, and a couple of implementations (Calcite is only one written in Java).
>
> There are only two alternatives:
>
> 1) Implementing white papers from scratch
> 2) Adopting Calcite to our needs.
>
> The second way was chosen by several other projects, there is experience, there is a list of known issues (like using indexes) so, almost everything is already done for us.
>
> Implementing a planner is a big deal, I think anybody understands it there. That's why our proposal to reuse others experience is obvious.
>
> If you have an alternative - you're welcome, I'll gratefully listen to you.
>
> The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.
>
> Regards,
> Igor
>
> > 27 сент. 2019 г., в 16:37, Nikolay Izhikov <[hidden email]> написал(а):
> >
> > Roman.
> >
> > > Nikolay, Maxim, I understand that our arguments may not be as obvious
> > > for you as it obvious for SQL team. So, please arrange your questions in
> > > a more constructive way.
> >
> > What is SQL team?
> > I only know Ignite community :)
> >
> > Please, share you knowledge in IEP.
> > I want to join to the process of engine *selection*.
> > It should start with the requirements to such engine.
> > Can you write it in IEP, please?
> >
> > My point is very simple:
> >
> > 1. We made the wrong decision with H2
> > 2. We should make a well-thought decision about the new engine.
> >
> > > How many tickets would satisfy you?
> >
> > You write about "issueS" with the H2.
> > All I see is one open ticket.
> > IEP doesn't provide enough information.
> > So it's not about the number of tickets, it's about
> >
> > > These two points (single map-reduce execution and inflexible optimizer)
> > > are the main problems with the current engine.
> >
> > We may come to the point when Calcite(or any other engine) brings us third and other "main problems".
> > This is how it happens with H2.
> >
> > Let's start from what we want to get with the engine and move forward from this base.
> > What do you think?
> >
> >
> >
> > В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
> > > Maxim, Nikolay,
> > >
> > > I've listed two issues which show the ideological flaws of the current
> > > engine.
> > >
> > > 1. IGNITE-11448 - Open. This ticket describes the impossibility of
> > > executing queries which can not be fit in the hardcoded one pass
> > > map-reduce paradigm.
> > >
> > > 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second
> > > major problem with the current engine: H2 query optimizer is very
> > > primitive and can not perform many useful optimizations.
> > >
> > > These two points (single map-reduce execution and inflexible optimizer)
> > > are the main problems with the current engine. It means that our engine
> > > is currently suitable for execution only a very limited subset of the
> > > typical SQL queries. For example it can not even run most of the TPC-H
> > > benchmark queries because they don't fit to the simple map-reduce paradigm.
> > >
> > > > All I see is links to two tickets:
> > >
> > > How many tickets would satisfy you? I named two. And it looks like it is
> > > not enough from your point of view. Ok, so how many is enough? The set
> > > of problems caused by listed above tickets is infinite, therefore I can
> > > not create a ticket for each of them.
> > > > Tech details also should be added.
> > >
> > > Tech details are in the tickets.
> > >
> > > > We can't discuss such a huge change as an execution engine replacement with descrition like:
> > > > "No data co-location control, i.e. arbitrary data can be returned silently" or
> > > > "Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."
> > >
> > > Why not? Don't you understand these problems? Or you don't think this is
> > > a problem?
> > >
> > > > Let's make these descriptions more specific.
> > >
> > > What do you mean by "more specific"? What is the criteria of the
> > > specific description?
> > >
> > >
> > >
> > > Nikolay, Maxim, I understand that our arguments may not be as obvious
> > > for you as it obvious for SQL team. So, please arrange your questions in
> > > a more constructive way.
> > >
> > > Thank you!
>
>

signature.asc (499 bytes) Download Attachment

Andrew Mashenkov

Re: New SQL execution engine

In reply to this post by gvvinblade

Nikolay, Igor.

Implementing from scratch is an option, of course.
If we decide to go this way then we definitely won't to spend long nights
to invent "yet another SQL parser" with all the stuff related to query
rewrite rules (e.g. IN -> JOIN) or type casting \ validation \ conversion.

We thought about step-by-step H2 replacing.
1. We've tried to make POC with parser replacement to generated one from
SQL grammar with ASM,
but this approach looks slow, AFAIR. Gridgainers, anybody, have smth on
this?

2. Then we need a planner with all the rules.
Of course we will need to write rules optimized for "Distributed" execution
in anyway, but I doubt anybody want to write common-rules that already has
Calcite.
We can copy-paste, but what for?

3. Then we have to implement execution pipeline.
Possibly, we can adopt new query plans for H2 execution, but then we will
still have same pain with resolving H2 internal issues (e.g. OOM).
H2 approach is outdated, it doesn't fit Ignite needs as distributes system.

With Calcite we can concentrate on 2 and (mostly) 3 points and reuse
their architectural abstracts, otherwise we should reinvent those abstracts
through long discussions on dev-list.

I agree, we should make IEP clear to everyone in community who want to be
involved in IEP implementation at first.
Both approaches ("from scratch" and "with Calcite") are risky, so

Can we try to make an additional engine "beta"-implementation and allow
users fallback to old engine until a new one will be decided to become
mature enough.

On Fri, Sep 27, 2019 at 5:08 PM Seliverstov Igor <[hidden email]>
wrote:

> Nikolay,
>
> The main issue - there is no *selection*.
>
> There is a field of knowledge - relational algebra, which describes how to
> transform relational expressions saving their semantics, and a couple of
> implementations (Calcite is only one written in Java).
>
> There are only two alternatives:
>
> 1) Implementing white papers from scratch
> 2) Adopting Calcite to our needs.
>
> The second way was chosen by several other projects, there is experience,
> there is a list of known issues (like using indexes) so, almost everything
> is already done for us.
>
> Implementing a planner is a big deal, I think anybody understands it
> there. That's why our proposal to reuse others experience is obvious.
>
> If you have an alternative - you're welcome, I'll gratefully listen to you.
>
> The main question isn't "WHAT" but "HOW" - that's the discussion topic
> from my point of view.
>
> Regards,
> Igor
>
> > 27 сент. 2019 г., в 16:37, Nikolay Izhikov <[hidden email]>
> написал(а):
> >
> > Roman.
> >
> >> Nikolay, Maxim, I understand that our arguments may not be as obvious
> >> for you as it obvious for SQL team. So, please arrange your questions
> in
> >> a more constructive way.
> >
> > What is SQL team?
> > I only know Ignite community :)
> >
> > Please, share you knowledge in IEP.
> > I want to join to the process of engine *selection*.
> > It should start with the requirements to such engine.
> > Can you write it in IEP, please?
> >
> > My point is very simple:
> >
> > 1. We made the wrong decision with H2
> > 2. We should make a well-thought decision about the new engine.
> >
> >> How many tickets would satisfy you?
> >
> > You write about "issueS" with the H2.
> > All I see is one open ticket.
> > IEP doesn't provide enough information.
> > So it's not about the number of tickets, it's about
> >
> >> These two points (single map-reduce execution and inflexible optimizer)
> >> are the main problems with the current engine.
> >
> > We may come to the point when Calcite(or any other engine) brings us
> third and other "main problems".
> > This is how it happens with H2.
> >
> > Let's start from what we want to get with the engine and move forward
> from this base.
> > What do you think?
> >
> >
> >
> > В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
> >> Maxim, Nikolay,
> >>
> >> I've listed two issues which show the ideological flaws of the current
> >> engine.
> >>
> >> 1. IGNITE-11448 - Open. This ticket describes the impossibility of
> >> executing queries which can not be fit in the hardcoded one pass
> >> map-reduce paradigm.
> >>
> >> 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second
> >> major problem with the current engine: H2 query optimizer is very
> >> primitive and can not perform many useful optimizations.
> >>
> >> These two points (single map-reduce execution and inflexible optimizer)
> >> are the main problems with the current engine. It means that our engine
> >> is currently suitable for execution only a very limited subset of the
> >> typical SQL queries. For example it can not even run most of the TPC-H
> >> benchmark queries because they don't fit to the simple map-reduce
> paradigm.
> >>
> >>> All I see is links to two tickets:
> >>
> >> How many tickets would satisfy you? I named two. And it looks like it
> is
> >> not enough from your point of view. Ok, so how many is enough? The set
> >> of problems caused by listed above tickets is infinite, therefore I can
> >> not create a ticket for each of them.
> >>> Tech details also should be added.
> >>
> >> Tech details are in the tickets.
> >>
> >>> We can't discuss such a huge change as an execution engine replacement
> with descrition like:
> >>> "No data co-location control, i.e. arbitrary data can be returned
> silently" or
> >>> "Low control on how query executes internally, as a result we have
> limited possibility to implement improvements/fixes."
> >>
> >> Why not? Don't you understand these problems? Or you don't think this
> is
> >> a problem?
> >>
> >>> Let's make these descriptions more specific.
> >>
> >> What do you mean by "more specific"? What is the criteria of the
> >> specific description?
> >>
> >>
> >>
> >> Nikolay, Maxim, I understand that our arguments may not be as obvious
> >> for you as it obvious for SQL team. So, please arrange your questions
> in
> >> a more constructive way.
> >>
> >> Thank you!
>
>

--
Best regards,
Andrey V. Mashenkov

gvvinblade

Re: New SQL execution engine

In reply to this post by Nikolay Izhikov-2

Nikolay,

At last we have better questions.

There is no decision, here we should decide.

Doing nothing isn’t a decision, it’s just doing nothing

Spark Catalyst is a good example, but under the hood it has absolutely the same idea, but adopted to Spark. Calcite is the same, but general. That’s why it’s better start point.

Implementing an engine from scratch is really cool, but looks like inventing a bicycle, don’t think it makes sense. At least I against this option.

I added requirements to IEP (as you asked), you may see it’s in DRAFT state and will be complemented by details.

We have some thoughts on how to make smooth replacement, but at first we should decide what to replace and what with.

At now Calcite based engine is placed in different module, we checked it can build execution graph for both local and distributed cases, it has good expandability.
We talked to Calcite community to identify possible future issues and everything points to the fact it’s the best option.
It’s possible to develop it as an experimental extension at first (not a replacement) until we make sure that it works as expected. This way there are no risks for anybody who uses Ignite on production environment.

Regards,
Igor

> 27 сент. 2019 г., в 17:25, Nikolay Izhikov <[hidden email]> написал(а):
>
> Igor.
>
>> The main issue - there is no *selection*.
>
> 1. I don't remember community decision about this.
>
> 2. We should avoid to make such long-term decision so quickly.
> We done this kind of decision with H2 and come to the point when we should review it.
>
>> 1) Implementing white papers from scratch
>> 2) Adopting Calcite to our needs.
>
> The third option don't fix issues we have with H2.
> The fourth option I know is using spark-catalyst.
>
> What is wrong with writing engine from scratch?
>
> I ask you to start with engine requirements.
> Can we, please, discuss it?
>
>> If you have an alternative - you're welcome, I'll gratefully listen to you.
>
> We have alternative for now - H2 based engine.
>
>> The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.
>
> When we make a decision about engine we can discuss roadmap for replacement.
> One more time - replacement of SQL engine to some more customizable make sense for me.
> But, this kind of decisions need carefull discussion.
>
> В Пт, 27/09/2019 в 17:08 +0300, Seliverstov Igor пишет:
>> Nikolay,
>>
>> The main issue - there is no *selection*.
>>
>> There is a field of knowledge - relational algebra, which describes how to transform relational expressions saving their semantics, and a couple of implementations (Calcite is only one written in Java).
>>
>> There are only two alternatives:
>>
>> 1) Implementing white papers from scratch
>> 2) Adopting Calcite to our needs.
>>
>> The second way was chosen by several other projects, there is experience, there is a list of known issues (like using indexes) so, almost everything is already done for us.
>>
>> Implementing a planner is a big deal, I think anybody understands it there. That's why our proposal to reuse others experience is obvious.
>>
>> If you have an alternative - you're welcome, I'll gratefully listen to you.
>>
>> The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.
>>
>> Regards,
>> Igor
>>
>>> 27 сент. 2019 г., в 16:37, Nikolay Izhikov <[hidden email]> написал(а):
>>>
>>> Roman.
>>>
>>>> Nikolay, Maxim, I understand that our arguments may not be as obvious
>>>> for you as it obvious for SQL team. So, please arrange your questions in
>>>> a more constructive way.
>>>
>>> What is SQL team?
>>> I only know Ignite community :)
>>>
>>> Please, share you knowledge in IEP.
>>> I want to join to the process of engine *selection*.
>>> It should start with the requirements to such engine.
>>> Can you write it in IEP, please?
>>>
>>> My point is very simple:
>>>
>>> 1. We made the wrong decision with H2
>>> 2. We should make a well-thought decision about the new engine.
>>>
>>>> How many tickets would satisfy you?
>>>
>>> You write about "issueS" with the H2.
>>> All I see is one open ticket.
>>> IEP doesn't provide enough information.
>>> So it's not about the number of tickets, it's about
>>>
>>>> These two points (single map-reduce execution and inflexible optimizer)
>>>> are the main problems with the current engine.
>>>
>>> We may come to the point when Calcite(or any other engine) brings us third and other "main problems".
>>> This is how it happens with H2.
>>>
>>> Let's start from what we want to get with the engine and move forward from this base.
>>> What do you think?
>>>
>>>
>>>
>>> В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
>>>> Maxim, Nikolay,
>>>>
>>>> I've listed two issues which show the ideological flaws of the current
>>>> engine.
>>>>
>>>> 1. IGNITE-11448 - Open. This ticket describes the impossibility of
>>>> executing queries which can not be fit in the hardcoded one pass
>>>> map-reduce paradigm.
>>>>
>>>> 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second
>>>> major problem with the current engine: H2 query optimizer is very
>>>> primitive and can not perform many useful optimizations.
>>>>
>>>> These two points (single map-reduce execution and inflexible optimizer)
>>>> are the main problems with the current engine. It means that our engine
>>>> is currently suitable for execution only a very limited subset of the
>>>> typical SQL queries. For example it can not even run most of the TPC-H
>>>> benchmark queries because they don't fit to the simple map-reduce paradigm.
>>>>
>>>>> All I see is links to two tickets:
>>>>
>>>> How many tickets would satisfy you? I named two. And it looks like it is
>>>> not enough from your point of view. Ok, so how many is enough? The set
>>>> of problems caused by listed above tickets is infinite, therefore I can
>>>> not create a ticket for each of them.
>>>>> Tech details also should be added.
>>>>
>>>> Tech details are in the tickets.
>>>>
>>>>> We can't discuss such a huge change as an execution engine replacement with descrition like:
>>>>> "No data co-location control, i.e. arbitrary data can be returned silently" or
>>>>> "Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."
>>>>
>>>> Why not? Don't you understand these problems? Or you don't think this is
>>>> a problem?
>>>>
>>>>> Let's make these descriptions more specific.
>>>>
>>>> What do you mean by "more specific"? What is the criteria of the
>>>> specific description?
>>>>
>>>>
>>>>
>>>> Nikolay, Maxim, I understand that our arguments may not be as obvious
>>>> for you as it obvious for SQL team. So, please arrange your questions in
>>>> a more constructive way.
>>>>
>>>> Thank you!
>>
>>

123