Adding support for Ignite secondary indexes to Apache Calcite planner

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Adding support for Ignite secondary indexes to Apache Calcite planner

Roman Kondakov
Hi all!

As you may know there is an activity on integration of Apache Calcite
query optimizer into Ignite codebase is being carried out [1],[2].

One of a bunch of problems in this integration is the absence of
out-of-the-box support for secondary indexes in Apache Calcite. After
some research I came to conclusion that this problem has a couple of
workarounds. Let's name them
1. Phoenix-style approach - representing secondary indexes as
materialized views which are natively supported by Calcite engine [3]
2. Drill-style approach - pushing filters into the table scans and
choose appropriate index for lookups when possible [4]

Both these approaches have advantages and disadvantages:

Phoenix style pros:
- natural way of adding indexes as an alternative source of rows: index
can be considered as a kind of sorted materialized view.
- possibility of using index sortedness for stream aggregates,
deduplication (DISTINCT operator), merge joins, etc.
- ability to support other types of indexes (i.e. functional indexes).

Phoenix style cons:
- polluting optimizer's search space extra table scans hence increasing
the planning time.

Drill style pros:
- easier to implement (although it's questionable).
- search space is not inflated.

Drill style cons:
- missed opportunity to exploit sortedness.

There is a good discussion about using both approaches can be found in [5].

I made a small sketch [6] in order to demonstrate the applicability of
the Phoenix approach to Ignite. Key design concepts are:
1. On creating indexes are registered as tables in Calcite schema. This
step is needed for internal Calcite's routines.
2. On planner initialization we register these indexes as materialized
views in Calcite's optimizer using VolcanoPlanner#addMaterialization method.
3. Right before the query execution Calcite selects all materialized
views (indexes) which can be potentially used in query.
4. During the query optimization indexes are registered by planner as
usual TableScans and hence can be chosen by optimizer if they have lower
cost.

This sketch shows the ability to exploit index sortedness only. So the
future work in this direction should be focused on using indexes for
fast index lookups. At first glance FilterableTable and
FilterTableScanRule are good points to start. We can push Filter into
the TableScan and then use FilterableTable for fast index lookups
avoiding reading the whole index on TableScan step and then filtering
its output on the Filter step.

What do you think?



[1]
http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
[2]
https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
[3] https://issues.apache.org/jira/browse/PHOENIX-2047
[4] https://issues.apache.org/jira/browse/DRILL-6381
[5] https://issues.apache.org/jira/browse/DRILL-3929
[6] https://github.com/apache/ignite/pull/7115


--
Kind Regards
Roman Kondakov

Reply | Threaded
Open this post in threaded view
|

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Zhenya Stanilovsky

Roman just as fast remark, Phoenix builds their approach on already existing monolith HBase architecture, most cases it`s just a stub for someone who wants use secondary indexes with a base with no native support of it. Don`t think it`s good idea here.
   

>
>
>------- Forwarded message -------
>From: "Roman Kondakov" < [hidden email] >
>To:  [hidden email]
>Cc:
>Subject: Adding support for Ignite secondary indexes to Apache Calcite
>planner
>Date: Tue, 10 Dec 2019 15:55:52 +0300
>
>Hi all!
>
>As you may know there is an activity on integration of Apache Calcite
>query optimizer into Ignite codebase is being carried out [1],[2].
>
>One of a bunch of problems in this integration is the absence of
>out-of-the-box support for secondary indexes in Apache Calcite. After
>some research I came to conclusion that this problem has a couple of
>workarounds. Let's name them
>1. Phoenix-style approach - representing secondary indexes as
>materialized views which are natively supported by Calcite engine [3]
>2. Drill-style approach - pushing filters into the table scans and
>choose appropriate index for lookups when possible [4]
>
>Both these approaches have advantages and disadvantages:
>
>Phoenix style pros:
>- natural way of adding indexes as an alternative source of rows: index
>can be considered as a kind of sorted materialized view.
>- possibility of using index sortedness for stream aggregates,
>deduplication (DISTINCT operator), merge joins, etc.
>- ability to support other types of indexes (i.e. functional indexes).
>
>Phoenix style cons:
>- polluting optimizer's search space extra table scans hence increasing
>the planning time.
>
>Drill style pros:
>- easier to implement (although it's questionable).
>- search space is not inflated.
>
>Drill style cons:
>- missed opportunity to exploit sortedness.
>
>There is a good discussion about using both approaches can be found in [5].
>
>I made a small sketch [6] in order to demonstrate the applicability of
>the Phoenix approach to Ignite. Key design concepts are:
>1. On creating indexes are registered as tables in Calcite schema. This
>step is needed for internal Calcite's routines.
>2. On planner initialization we register these indexes as materialized
>views in Calcite's optimizer using VolcanoPlanner#addMaterialization
>method.
>3. Right before the query execution Calcite selects all materialized
>views (indexes) which can be potentially used in query.
>4. During the query optimization indexes are registered by planner as
>usual TableScans and hence can be chosen by optimizer if they have lower
>cost.
>
>This sketch shows the ability to exploit index sortedness only. So the
>future work in this direction should be focused on using indexes for
>fast index lookups. At first glance FilterableTable and
>FilterTableScanRule are good points to start. We can push Filter into
>the TableScan and then use FilterableTable for fast index lookups
>avoiding reading the whole index on TableScan step and then filtering
>its output on the Filter step.
>
>What do you think?
>
>
>
>[1]
>http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
>[2]
>https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
>[3]  https://issues.apache.org/jira/browse/PHOENIX-2047
>[4]  https://issues.apache.org/jira/browse/DRILL-6381
>[5]  https://issues.apache.org/jira/browse/DRILL-3929
>[6]  https://github.com/apache/ignite/pull/7115 
 
 
 
 
Reply | Threaded
Open this post in threaded view
|

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Vladimir Ozerov-2
Hi Roman,

Why do you think that Drill-style will not let you exploit collation?
Collation should be propagated from the index scan in the same way as in
other sorted operators, such as merge join or streaming aggregate. Provided
that you use converter-hack (or any alternative solution to trigger parent
re-analysis).
In other words, propagation of collation from Drill-style indexes should be
no different from other sorted operators.

Regards,
Vladimir.

вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky <[hidden email]
>:

>
> Roman just as fast remark, Phoenix builds their approach on
> already existing monolith HBase architecture, most cases it`s just a stub
> for someone who wants use secondary indexes with a base with no
> native support of it. Don`t think it`s good idea here.
>
> >
> >
> >------- Forwarded message -------
> >From: "Roman Kondakov" < [hidden email] >
> >To:  [hidden email]
> >Cc:
> >Subject: Adding support for Ignite secondary indexes to Apache Calcite
> >planner
> >Date: Tue, 10 Dec 2019 15:55:52 +0300
> >
> >Hi all!
> >
> >As you may know there is an activity on integration of Apache Calcite
> >query optimizer into Ignite codebase is being carried out [1],[2].
> >
> >One of a bunch of problems in this integration is the absence of
> >out-of-the-box support for secondary indexes in Apache Calcite. After
> >some research I came to conclusion that this problem has a couple of
> >workarounds. Let's name them
> >1. Phoenix-style approach - representing secondary indexes as
> >materialized views which are natively supported by Calcite engine [3]
> >2. Drill-style approach - pushing filters into the table scans and
> >choose appropriate index for lookups when possible [4]
> >
> >Both these approaches have advantages and disadvantages:
> >
> >Phoenix style pros:
> >- natural way of adding indexes as an alternative source of rows: index
> >can be considered as a kind of sorted materialized view.
> >- possibility of using index sortedness for stream aggregates,
> >deduplication (DISTINCT operator), merge joins, etc.
> >- ability to support other types of indexes (i.e. functional indexes).
> >
> >Phoenix style cons:
> >- polluting optimizer's search space extra table scans hence increasing
> >the planning time.
> >
> >Drill style pros:
> >- easier to implement (although it's questionable).
> >- search space is not inflated.
> >
> >Drill style cons:
> >- missed opportunity to exploit sortedness.
> >
> >There is a good discussion about using both approaches can be found in
> [5].
> >
> >I made a small sketch [6] in order to demonstrate the applicability of
> >the Phoenix approach to Ignite. Key design concepts are:
> >1. On creating indexes are registered as tables in Calcite schema. This
> >step is needed for internal Calcite's routines.
> >2. On planner initialization we register these indexes as materialized
> >views in Calcite's optimizer using VolcanoPlanner#addMaterialization
> >method.
> >3. Right before the query execution Calcite selects all materialized
> >views (indexes) which can be potentially used in query.
> >4. During the query optimization indexes are registered by planner as
> >usual TableScans and hence can be chosen by optimizer if they have lower
> >cost.
> >
> >This sketch shows the ability to exploit index sortedness only. So the
> >future work in this direction should be focused on using indexes for
> >fast index lookups. At first glance FilterableTable and
> >FilterTableScanRule are good points to start. We can push Filter into
> >the TableScan and then use FilterableTable for fast index lookups
> >avoiding reading the whole index on TableScan step and then filtering
> >its output on the Filter step.
> >
> >What do you think?
> >
> >
> >
> >[1]
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
> >[2]
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
> >[3]  https://issues.apache.org/jira/browse/PHOENIX-2047
> >[4]  https://issues.apache.org/jira/browse/DRILL-6381
> >[5]  https://issues.apache.org/jira/browse/DRILL-3929
> >[6]  https://github.com/apache/ignite/pull/7115
>
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Alexey Zinoviev
I'd like Drill approach, worked and debugged with something similar, it's
more easy to support


Buuut, you have an implemented prototype (it votes for Phoenix in my mind)

вт, 10 дек. 2019 г. в 17:19, Vladimir Ozerov <[hidden email]>:

> Hi Roman,
>
> Why do you think that Drill-style will not let you exploit collation?
> Collation should be propagated from the index scan in the same way as in
> other sorted operators, such as merge join or streaming aggregate. Provided
> that you use converter-hack (or any alternative solution to trigger parent
> re-analysis).
> In other words, propagation of collation from Drill-style indexes should be
> no different from other sorted operators.
>
> Regards,
> Vladimir.
>
> вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky <[hidden email]
> >:
>
> >
> > Roman just as fast remark, Phoenix builds their approach on
> > already existing monolith HBase architecture, most cases it`s just a stub
> > for someone who wants use secondary indexes with a base with no
> > native support of it. Don`t think it`s good idea here.
> >
> > >
> > >
> > >------- Forwarded message -------
> > >From: "Roman Kondakov" < [hidden email] >
> > >To:  [hidden email]
> > >Cc:
> > >Subject: Adding support for Ignite secondary indexes to Apache Calcite
> > >planner
> > >Date: Tue, 10 Dec 2019 15:55:52 +0300
> > >
> > >Hi all!
> > >
> > >As you may know there is an activity on integration of Apache Calcite
> > >query optimizer into Ignite codebase is being carried out [1],[2].
> > >
> > >One of a bunch of problems in this integration is the absence of
> > >out-of-the-box support for secondary indexes in Apache Calcite. After
> > >some research I came to conclusion that this problem has a couple of
> > >workarounds. Let's name them
> > >1. Phoenix-style approach - representing secondary indexes as
> > >materialized views which are natively supported by Calcite engine [3]
> > >2. Drill-style approach - pushing filters into the table scans and
> > >choose appropriate index for lookups when possible [4]
> > >
> > >Both these approaches have advantages and disadvantages:
> > >
> > >Phoenix style pros:
> > >- natural way of adding indexes as an alternative source of rows: index
> > >can be considered as a kind of sorted materialized view.
> > >- possibility of using index sortedness for stream aggregates,
> > >deduplication (DISTINCT operator), merge joins, etc.
> > >- ability to support other types of indexes (i.e. functional indexes).
> > >
> > >Phoenix style cons:
> > >- polluting optimizer's search space extra table scans hence increasing
> > >the planning time.
> > >
> > >Drill style pros:
> > >- easier to implement (although it's questionable).
> > >- search space is not inflated.
> > >
> > >Drill style cons:
> > >- missed opportunity to exploit sortedness.
> > >
> > >There is a good discussion about using both approaches can be found in
> > [5].
> > >
> > >I made a small sketch [6] in order to demonstrate the applicability of
> > >the Phoenix approach to Ignite. Key design concepts are:
> > >1. On creating indexes are registered as tables in Calcite schema. This
> > >step is needed for internal Calcite's routines.
> > >2. On planner initialization we register these indexes as materialized
> > >views in Calcite's optimizer using VolcanoPlanner#addMaterialization
> > >method.
> > >3. Right before the query execution Calcite selects all materialized
> > >views (indexes) which can be potentially used in query.
> > >4. During the query optimization indexes are registered by planner as
> > >usual TableScans and hence can be chosen by optimizer if they have lower
> > >cost.
> > >
> > >This sketch shows the ability to exploit index sortedness only. So the
> > >future work in this direction should be focused on using indexes for
> > >fast index lookups. At first glance FilterableTable and
> > >FilterTableScanRule are good points to start. We can push Filter into
> > >the TableScan and then use FilterableTable for fast index lookups
> > >avoiding reading the whole index on TableScan step and then filtering
> > >its output on the Filter step.
> > >
> > >What do you think?
> > >
> > >
> > >
> > >[1]
> > >
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
> > >[2]
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
> > >[3]  https://issues.apache.org/jira/browse/PHOENIX-2047
> > >[4]  https://issues.apache.org/jira/browse/DRILL-6381
> > >[5]  https://issues.apache.org/jira/browse/DRILL-3929
> > >[6]  https://github.com/apache/ignite/pull/7115
> >
> >
> >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Roman Kondakov
In reply to this post by Zhenya Stanilovsky
Zhenya,

there is nothing in common in implementation of Ignite indexes and
Phoenix indexes. I just borrowed the idea how Phoenix supplies the index
metadata (index name, columns, sorting, etc.) to Calcite optimizer. It's
not about index implementation, it's about metadata handling.


--
Kind Regards
Roman Kondakov


On 10.12.2019 16:40, Zhenya Stanilovsky wrote:

>
> Roman just as fast remark, Phoenix builds their approach on already existing monolith HBase architecture, most cases it`s just a stub for someone who wants use secondary indexes with a base with no native support of it. Don`t think it`s good idea here.
>    
>>
>>
>> ------- Forwarded message -------
>> From: "Roman Kondakov" < [hidden email] >
>> To:  [hidden email]
>> Cc:
>> Subject: Adding support for Ignite secondary indexes to Apache Calcite
>> planner
>> Date: Tue, 10 Dec 2019 15:55:52 +0300
>>
>> Hi all!
>>
>> As you may know there is an activity on integration of Apache Calcite
>> query optimizer into Ignite codebase is being carried out [1],[2].
>>
>> One of a bunch of problems in this integration is the absence of
>> out-of-the-box support for secondary indexes in Apache Calcite. After
>> some research I came to conclusion that this problem has a couple of
>> workarounds. Let's name them
>> 1. Phoenix-style approach - representing secondary indexes as
>> materialized views which are natively supported by Calcite engine [3]
>> 2. Drill-style approach - pushing filters into the table scans and
>> choose appropriate index for lookups when possible [4]
>>
>> Both these approaches have advantages and disadvantages:
>>
>> Phoenix style pros:
>> - natural way of adding indexes as an alternative source of rows: index
>> can be considered as a kind of sorted materialized view.
>> - possibility of using index sortedness for stream aggregates,
>> deduplication (DISTINCT operator), merge joins, etc.
>> - ability to support other types of indexes (i.e. functional indexes).
>>
>> Phoenix style cons:
>> - polluting optimizer's search space extra table scans hence increasing
>> the planning time.
>>
>> Drill style pros:
>> - easier to implement (although it's questionable).
>> - search space is not inflated.
>>
>> Drill style cons:
>> - missed opportunity to exploit sortedness.
>>
>> There is a good discussion about using both approaches can be found in [5].
>>
>> I made a small sketch [6] in order to demonstrate the applicability of
>> the Phoenix approach to Ignite. Key design concepts are:
>> 1. On creating indexes are registered as tables in Calcite schema. This
>> step is needed for internal Calcite's routines.
>> 2. On planner initialization we register these indexes as materialized
>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization
>> method.
>> 3. Right before the query execution Calcite selects all materialized
>> views (indexes) which can be potentially used in query.
>> 4. During the query optimization indexes are registered by planner as
>> usual TableScans and hence can be chosen by optimizer if they have lower
>> cost.
>>
>> This sketch shows the ability to exploit index sortedness only. So the
>> future work in this direction should be focused on using indexes for
>> fast index lookups. At first glance FilterableTable and
>> FilterTableScanRule are good points to start. We can push Filter into
>> the TableScan and then use FilterableTable for fast index lookups
>> avoiding reading the whole index on TableScan step and then filtering
>> its output on the Filter step.
>>
>> What do you think?
>>
>>
>>
>> [1]
>> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
>> [2]
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
>> [3]  https://issues.apache.org/jira/browse/PHOENIX-2047
>> [4]  https://issues.apache.org/jira/browse/DRILL-6381
>> [5]  https://issues.apache.org/jira/browse/DRILL-3929
>> [6]  https://github.com/apache/ignite/pull/7115 
>  
>  
>  
>  
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Roman Kondakov
In reply to this post by Roman Kondakov
Hi Vladimir,

from what I understand, Drill does not exploit collation of indexes. To
be precise it does not exploit index collation in "natural" way where,
say, we a have sorted TableScan and hence we do not create a new Sort.
Instead of it Drill always create a Sort operator, but if TableScan can
be replaced with an IndexScan, this Sort operator is removed by the
dedicated rule.

Lets consider initial an operator tree:

Project
  Sort
    TableScan

after applying rule DbScanToIndexScanPrule this tree will be converted to:

Project
  Sort
    IndexScan

and finally, after applying DbScanSortRemovalRule we have:

Project
  IndexScan

while for Phoenix approach we would have two equivalent subsets in our
planner:

Project
  Sort
    TableScan

and

Project
  IndexScan

and most likely the last plan  will be chosen as the best one.

--
Kind Regards
Roman Kondakov


On 10.12.2019 17:19, Vladimir Ozerov wrote:

> Hi Roman,
>
> Why do you think that Drill-style will not let you exploit collation?
> Collation should be propagated from the index scan in the same way as in
> other sorted operators, such as merge join or streaming aggregate. Provided
> that you use converter-hack (or any alternative solution to trigger parent
> re-analysis).
> In other words, propagation of collation from Drill-style indexes should be
> no different from other sorted operators.
>
> Regards,
> Vladimir.
>
> вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky <[hidden email]
>> :
>
>>
>> Roman just as fast remark, Phoenix builds their approach on
>> already existing monolith HBase architecture, most cases it`s just a stub
>> for someone who wants use secondary indexes with a base with no
>> native support of it. Don`t think it`s good idea here.
>>
>>>
>>>
>>> ------- Forwarded message -------
>>> From: "Roman Kondakov" < [hidden email] >
>>> To:  [hidden email]
>>> Cc:
>>> Subject: Adding support for Ignite secondary indexes to Apache Calcite
>>> planner
>>> Date: Tue, 10 Dec 2019 15:55:52 +0300
>>>
>>> Hi all!
>>>
>>> As you may know there is an activity on integration of Apache Calcite
>>> query optimizer into Ignite codebase is being carried out [1],[2].
>>>
>>> One of a bunch of problems in this integration is the absence of
>>> out-of-the-box support for secondary indexes in Apache Calcite. After
>>> some research I came to conclusion that this problem has a couple of
>>> workarounds. Let's name them
>>> 1. Phoenix-style approach - representing secondary indexes as
>>> materialized views which are natively supported by Calcite engine [3]
>>> 2. Drill-style approach - pushing filters into the table scans and
>>> choose appropriate index for lookups when possible [4]
>>>
>>> Both these approaches have advantages and disadvantages:
>>>
>>> Phoenix style pros:
>>> - natural way of adding indexes as an alternative source of rows: index
>>> can be considered as a kind of sorted materialized view.
>>> - possibility of using index sortedness for stream aggregates,
>>> deduplication (DISTINCT operator), merge joins, etc.
>>> - ability to support other types of indexes (i.e. functional indexes).
>>>
>>> Phoenix style cons:
>>> - polluting optimizer's search space extra table scans hence increasing
>>> the planning time.
>>>
>>> Drill style pros:
>>> - easier to implement (although it's questionable).
>>> - search space is not inflated.
>>>
>>> Drill style cons:
>>> - missed opportunity to exploit sortedness.
>>>
>>> There is a good discussion about using both approaches can be found in
>> [5].
>>>
>>> I made a small sketch [6] in order to demonstrate the applicability of
>>> the Phoenix approach to Ignite. Key design concepts are:
>>> 1. On creating indexes are registered as tables in Calcite schema. This
>>> step is needed for internal Calcite's routines.
>>> 2. On planner initialization we register these indexes as materialized
>>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization
>>> method.
>>> 3. Right before the query execution Calcite selects all materialized
>>> views (indexes) which can be potentially used in query.
>>> 4. During the query optimization indexes are registered by planner as
>>> usual TableScans and hence can be chosen by optimizer if they have lower
>>> cost.
>>>
>>> This sketch shows the ability to exploit index sortedness only. So the
>>> future work in this direction should be focused on using indexes for
>>> fast index lookups. At first glance FilterableTable and
>>> FilterTableScanRule are good points to start. We can push Filter into
>>> the TableScan and then use FilterableTable for fast index lookups
>>> avoiding reading the whole index on TableScan step and then filtering
>>> its output on the Filter step.
>>>
>>> What do you think?
>>>
>>>
>>>
>>> [1]
>>>
>> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
>>> [2]
>>>
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
>>> [3]  https://issues.apache.org/jira/browse/PHOENIX-2047
>>> [4]  https://issues.apache.org/jira/browse/DRILL-6381
>>> [5]  https://issues.apache.org/jira/browse/DRILL-3929
>>> [6]  https://github.com/apache/ignite/pull/7115
>>
>>
>>
>>
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Roman Kondakov
In reply to this post by Alexey Zinoviev
Alexey,

from my point of view Drill's approach looks like somewhat a hack:
sortedness and index lookups added to a removed from the query plan by
the special rules (which look very messy and complicated). Compare it to
the Phoneix approach where index is added to optimizer as a sorted view
of a table.


--
Kind Regards
Roman Kondakov


On 10.12.2019 17:44, Alexey Zinoviev wrote:

> I'd like Drill approach, worked and debugged with something similar, it's
> more easy to support
>
>
> Buuut, you have an implemented prototype (it votes for Phoenix in my mind)
>
> вт, 10 дек. 2019 г. в 17:19, Vladimir Ozerov <[hidden email]>:
>
>> Hi Roman,
>>
>> Why do you think that Drill-style will not let you exploit collation?
>> Collation should be propagated from the index scan in the same way as in
>> other sorted operators, such as merge join or streaming aggregate. Provided
>> that you use converter-hack (or any alternative solution to trigger parent
>> re-analysis).
>> In other words, propagation of collation from Drill-style indexes should be
>> no different from other sorted operators.
>>
>> Regards,
>> Vladimir.
>>
>> вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky <[hidden email]
>>> :
>>
>>>
>>> Roman just as fast remark, Phoenix builds their approach on
>>> already existing monolith HBase architecture, most cases it`s just a stub
>>> for someone who wants use secondary indexes with a base with no
>>> native support of it. Don`t think it`s good idea here.
>>>
>>>>
>>>>
>>>> ------- Forwarded message -------
>>>> From: "Roman Kondakov" < [hidden email] >
>>>> To:  [hidden email]
>>>> Cc:
>>>> Subject: Adding support for Ignite secondary indexes to Apache Calcite
>>>> planner
>>>> Date: Tue, 10 Dec 2019 15:55:52 +0300
>>>>
>>>> Hi all!
>>>>
>>>> As you may know there is an activity on integration of Apache Calcite
>>>> query optimizer into Ignite codebase is being carried out [1],[2].
>>>>
>>>> One of a bunch of problems in this integration is the absence of
>>>> out-of-the-box support for secondary indexes in Apache Calcite. After
>>>> some research I came to conclusion that this problem has a couple of
>>>> workarounds. Let's name them
>>>> 1. Phoenix-style approach - representing secondary indexes as
>>>> materialized views which are natively supported by Calcite engine [3]
>>>> 2. Drill-style approach - pushing filters into the table scans and
>>>> choose appropriate index for lookups when possible [4]
>>>>
>>>> Both these approaches have advantages and disadvantages:
>>>>
>>>> Phoenix style pros:
>>>> - natural way of adding indexes as an alternative source of rows: index
>>>> can be considered as a kind of sorted materialized view.
>>>> - possibility of using index sortedness for stream aggregates,
>>>> deduplication (DISTINCT operator), merge joins, etc.
>>>> - ability to support other types of indexes (i.e. functional indexes).
>>>>
>>>> Phoenix style cons:
>>>> - polluting optimizer's search space extra table scans hence increasing
>>>> the planning time.
>>>>
>>>> Drill style pros:
>>>> - easier to implement (although it's questionable).
>>>> - search space is not inflated.
>>>>
>>>> Drill style cons:
>>>> - missed opportunity to exploit sortedness.
>>>>
>>>> There is a good discussion about using both approaches can be found in
>>> [5].
>>>>
>>>> I made a small sketch [6] in order to demonstrate the applicability of
>>>> the Phoenix approach to Ignite. Key design concepts are:
>>>> 1. On creating indexes are registered as tables in Calcite schema. This
>>>> step is needed for internal Calcite's routines.
>>>> 2. On planner initialization we register these indexes as materialized
>>>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization
>>>> method.
>>>> 3. Right before the query execution Calcite selects all materialized
>>>> views (indexes) which can be potentially used in query.
>>>> 4. During the query optimization indexes are registered by planner as
>>>> usual TableScans and hence can be chosen by optimizer if they have lower
>>>> cost.
>>>>
>>>> This sketch shows the ability to exploit index sortedness only. So the
>>>> future work in this direction should be focused on using indexes for
>>>> fast index lookups. At first glance FilterableTable and
>>>> FilterTableScanRule are good points to start. We can push Filter into
>>>> the TableScan and then use FilterableTable for fast index lookups
>>>> avoiding reading the whole index on TableScan step and then filtering
>>>> its output on the Filter step.
>>>>
>>>> What do you think?
>>>>
>>>>
>>>>
>>>> [1]
>>>>
>>>
>> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
>>>> [2]
>>>>
>>>
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
>>>> [3]  https://issues.apache.org/jira/browse/PHOENIX-2047
>>>> [4]  https://issues.apache.org/jira/browse/DRILL-6381
>>>> [5]  https://issues.apache.org/jira/browse/DRILL-3929
>>>> [6]  https://github.com/apache/ignite/pull/7115
>>>
>>>
>>>
>>>
>>
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Vladimir Ozerov-2
In reply to this post by Roman Kondakov
Roman,

What is the advantage of Phoenix approach then? BTW, it looks like Phoenix
integration with Calcite never made it to production, did it?

вт, 10 дек. 2019 г. в 19:50, Roman Kondakov <[hidden email]>:

> Hi Vladimir,
>
> from what I understand, Drill does not exploit collation of indexes. To
> be precise it does not exploit index collation in "natural" way where,
> say, we a have sorted TableScan and hence we do not create a new Sort.
> Instead of it Drill always create a Sort operator, but if TableScan can
> be replaced with an IndexScan, this Sort operator is removed by the
> dedicated rule.
>
> Lets consider initial an operator tree:
>
> Project
>   Sort
>     TableScan
>
> after applying rule DbScanToIndexScanPrule this tree will be converted to:
>
> Project
>   Sort
>     IndexScan
>
> and finally, after applying DbScanSortRemovalRule we have:
>
> Project
>   IndexScan
>
> while for Phoenix approach we would have two equivalent subsets in our
> planner:
>
> Project
>   Sort
>     TableScan
>
> and
>
> Project
>   IndexScan
>
> and most likely the last plan  will be chosen as the best one.
>
> --
> Kind Regards
> Roman Kondakov
>
>
> On 10.12.2019 17:19, Vladimir Ozerov wrote:
> > Hi Roman,
> >
> > Why do you think that Drill-style will not let you exploit collation?
> > Collation should be propagated from the index scan in the same way as in
> > other sorted operators, such as merge join or streaming aggregate.
> Provided
> > that you use converter-hack (or any alternative solution to trigger
> parent
> > re-analysis).
> > In other words, propagation of collation from Drill-style indexes should
> be
> > no different from other sorted operators.
> >
> > Regards,
> > Vladimir.
> >
> > вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky
> <[hidden email]
> >> :
> >
> >>
> >> Roman just as fast remark, Phoenix builds their approach on
> >> already existing monolith HBase architecture, most cases it`s just a
> stub
> >> for someone who wants use secondary indexes with a base with no
> >> native support of it. Don`t think it`s good idea here.
> >>
> >>>
> >>>
> >>> ------- Forwarded message -------
> >>> From: "Roman Kondakov" < [hidden email] >
> >>> To:  [hidden email]
> >>> Cc:
> >>> Subject: Adding support for Ignite secondary indexes to Apache Calcite
> >>> planner
> >>> Date: Tue, 10 Dec 2019 15:55:52 +0300
> >>>
> >>> Hi all!
> >>>
> >>> As you may know there is an activity on integration of Apache Calcite
> >>> query optimizer into Ignite codebase is being carried out [1],[2].
> >>>
> >>> One of a bunch of problems in this integration is the absence of
> >>> out-of-the-box support for secondary indexes in Apache Calcite. After
> >>> some research I came to conclusion that this problem has a couple of
> >>> workarounds. Let's name them
> >>> 1. Phoenix-style approach - representing secondary indexes as
> >>> materialized views which are natively supported by Calcite engine [3]
> >>> 2. Drill-style approach - pushing filters into the table scans and
> >>> choose appropriate index for lookups when possible [4]
> >>>
> >>> Both these approaches have advantages and disadvantages:
> >>>
> >>> Phoenix style pros:
> >>> - natural way of adding indexes as an alternative source of rows: index
> >>> can be considered as a kind of sorted materialized view.
> >>> - possibility of using index sortedness for stream aggregates,
> >>> deduplication (DISTINCT operator), merge joins, etc.
> >>> - ability to support other types of indexes (i.e. functional indexes).
> >>>
> >>> Phoenix style cons:
> >>> - polluting optimizer's search space extra table scans hence increasing
> >>> the planning time.
> >>>
> >>> Drill style pros:
> >>> - easier to implement (although it's questionable).
> >>> - search space is not inflated.
> >>>
> >>> Drill style cons:
> >>> - missed opportunity to exploit sortedness.
> >>>
> >>> There is a good discussion about using both approaches can be found in
> >> [5].
> >>>
> >>> I made a small sketch [6] in order to demonstrate the applicability of
> >>> the Phoenix approach to Ignite. Key design concepts are:
> >>> 1. On creating indexes are registered as tables in Calcite schema. This
> >>> step is needed for internal Calcite's routines.
> >>> 2. On planner initialization we register these indexes as materialized
> >>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization
> >>> method.
> >>> 3. Right before the query execution Calcite selects all materialized
> >>> views (indexes) which can be potentially used in query.
> >>> 4. During the query optimization indexes are registered by planner as
> >>> usual TableScans and hence can be chosen by optimizer if they have
> lower
> >>> cost.
> >>>
> >>> This sketch shows the ability to exploit index sortedness only. So the
> >>> future work in this direction should be focused on using indexes for
> >>> fast index lookups. At first glance FilterableTable and
> >>> FilterTableScanRule are good points to start. We can push Filter into
> >>> the TableScan and then use FilterableTable for fast index lookups
> >>> avoiding reading the whole index on TableScan step and then filtering
> >>> its output on the Filter step.
> >>>
> >>> What do you think?
> >>>
> >>>
> >>>
> >>> [1]
> >>>
> >>
> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
> >>> [2]
> >>>
> >>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
> >>> [3]  https://issues.apache.org/jira/browse/PHOENIX-2047
> >>> [4]  https://issues.apache.org/jira/browse/DRILL-6381
> >>> [5]  https://issues.apache.org/jira/browse/DRILL-3929
> >>> [6]  https://github.com/apache/ignite/pull/7115
> >>
> >>
> >>
> >>
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Ivan Pavlukhin
Vladimir,

You are right Phoenix integration with Calcite stalled halfway. See
[1] to get some reasons.

[1] https://lists.apache.org/thread.html/0152a97bfebb85c74f10e26e94ab9cd416dec374abba7dc2e1af9d61%40%3Cdev.phoenix.apache.org%3E

ср, 11 дек. 2019 г. в 17:11, Vladimir Ozerov <[hidden email]>:

>
> Roman,
>
> What is the advantage of Phoenix approach then? BTW, it looks like Phoenix
> integration with Calcite never made it to production, did it?
>
> вт, 10 дек. 2019 г. в 19:50, Roman Kondakov <[hidden email]>:
>
> > Hi Vladimir,
> >
> > from what I understand, Drill does not exploit collation of indexes. To
> > be precise it does not exploit index collation in "natural" way where,
> > say, we a have sorted TableScan and hence we do not create a new Sort.
> > Instead of it Drill always create a Sort operator, but if TableScan can
> > be replaced with an IndexScan, this Sort operator is removed by the
> > dedicated rule.
> >
> > Lets consider initial an operator tree:
> >
> > Project
> >   Sort
> >     TableScan
> >
> > after applying rule DbScanToIndexScanPrule this tree will be converted to:
> >
> > Project
> >   Sort
> >     IndexScan
> >
> > and finally, after applying DbScanSortRemovalRule we have:
> >
> > Project
> >   IndexScan
> >
> > while for Phoenix approach we would have two equivalent subsets in our
> > planner:
> >
> > Project
> >   Sort
> >     TableScan
> >
> > and
> >
> > Project
> >   IndexScan
> >
> > and most likely the last plan  will be chosen as the best one.
> >
> > --
> > Kind Regards
> > Roman Kondakov
> >
> >
> > On 10.12.2019 17:19, Vladimir Ozerov wrote:
> > > Hi Roman,
> > >
> > > Why do you think that Drill-style will not let you exploit collation?
> > > Collation should be propagated from the index scan in the same way as in
> > > other sorted operators, such as merge join or streaming aggregate.
> > Provided
> > > that you use converter-hack (or any alternative solution to trigger
> > parent
> > > re-analysis).
> > > In other words, propagation of collation from Drill-style indexes should
> > be
> > > no different from other sorted operators.
> > >
> > > Regards,
> > > Vladimir.
> > >
> > > вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky
> > <[hidden email]
> > >> :
> > >
> > >>
> > >> Roman just as fast remark, Phoenix builds their approach on
> > >> already existing monolith HBase architecture, most cases it`s just a
> > stub
> > >> for someone who wants use secondary indexes with a base with no
> > >> native support of it. Don`t think it`s good idea here.
> > >>
> > >>>
> > >>>
> > >>> ------- Forwarded message -------
> > >>> From: "Roman Kondakov" < [hidden email] >
> > >>> To:  [hidden email]
> > >>> Cc:
> > >>> Subject: Adding support for Ignite secondary indexes to Apache Calcite
> > >>> planner
> > >>> Date: Tue, 10 Dec 2019 15:55:52 +0300
> > >>>
> > >>> Hi all!
> > >>>
> > >>> As you may know there is an activity on integration of Apache Calcite
> > >>> query optimizer into Ignite codebase is being carried out [1],[2].
> > >>>
> > >>> One of a bunch of problems in this integration is the absence of
> > >>> out-of-the-box support for secondary indexes in Apache Calcite. After
> > >>> some research I came to conclusion that this problem has a couple of
> > >>> workarounds. Let's name them
> > >>> 1. Phoenix-style approach - representing secondary indexes as
> > >>> materialized views which are natively supported by Calcite engine [3]
> > >>> 2. Drill-style approach - pushing filters into the table scans and
> > >>> choose appropriate index for lookups when possible [4]
> > >>>
> > >>> Both these approaches have advantages and disadvantages:
> > >>>
> > >>> Phoenix style pros:
> > >>> - natural way of adding indexes as an alternative source of rows: index
> > >>> can be considered as a kind of sorted materialized view.
> > >>> - possibility of using index sortedness for stream aggregates,
> > >>> deduplication (DISTINCT operator), merge joins, etc.
> > >>> - ability to support other types of indexes (i.e. functional indexes).
> > >>>
> > >>> Phoenix style cons:
> > >>> - polluting optimizer's search space extra table scans hence increasing
> > >>> the planning time.
> > >>>
> > >>> Drill style pros:
> > >>> - easier to implement (although it's questionable).
> > >>> - search space is not inflated.
> > >>>
> > >>> Drill style cons:
> > >>> - missed opportunity to exploit sortedness.
> > >>>
> > >>> There is a good discussion about using both approaches can be found in
> > >> [5].
> > >>>
> > >>> I made a small sketch [6] in order to demonstrate the applicability of
> > >>> the Phoenix approach to Ignite. Key design concepts are:
> > >>> 1. On creating indexes are registered as tables in Calcite schema. This
> > >>> step is needed for internal Calcite's routines.
> > >>> 2. On planner initialization we register these indexes as materialized
> > >>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization
> > >>> method.
> > >>> 3. Right before the query execution Calcite selects all materialized
> > >>> views (indexes) which can be potentially used in query.
> > >>> 4. During the query optimization indexes are registered by planner as
> > >>> usual TableScans and hence can be chosen by optimizer if they have
> > lower
> > >>> cost.
> > >>>
> > >>> This sketch shows the ability to exploit index sortedness only. So the
> > >>> future work in this direction should be focused on using indexes for
> > >>> fast index lookups. At first glance FilterableTable and
> > >>> FilterTableScanRule are good points to start. We can push Filter into
> > >>> the TableScan and then use FilterableTable for fast index lookups
> > >>> avoiding reading the whole index on TableScan step and then filtering
> > >>> its output on the Filter step.
> > >>>
> > >>> What do you think?
> > >>>
> > >>>
> > >>>
> > >>> [1]
> > >>>
> > >>
> > http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
> > >>> [2]
> > >>>
> > >>
> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
> > >>> [3]  https://issues.apache.org/jira/browse/PHOENIX-2047
> > >>> [4]  https://issues.apache.org/jira/browse/DRILL-6381
> > >>> [5]  https://issues.apache.org/jira/browse/DRILL-3929
> > >>> [6]  https://github.com/apache/ignite/pull/7115
> > >>
> > >>
> > >>
> > >>
> > >
> >



--
Best regards,
Ivan Pavlukhin
Reply | Threaded
Open this post in threaded view
|

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Roman Kondakov
In reply to this post by Vladimir Ozerov-2
Vladimir,

the main advantage of the Phoenix approach I can see is the using of
Calcite's native materializations API. Calcite has advanced support for
materializations [1] and lattices [2]. Since secondary indexes can be
considered as materialized views (it's just a sorted representation of
the same table) we can seamlessly use views to simulate indexes behavior
for Calcite planner.


[1] https://calcite.apache.org/docs/materialized_views.html
[2] https://calcite.apache.org/docs/lattice.html

--
Kind Regards
Roman Kondakov


On 11.12.2019 17:11, Vladimir Ozerov wrote:

> Roman,
>
> What is the advantage of Phoenix approach then? BTW, it looks like Phoenix
> integration with Calcite never made it to production, did it?
>
> вт, 10 дек. 2019 г. в 19:50, Roman Kondakov <[hidden email]>:
>
>> Hi Vladimir,
>>
>> from what I understand, Drill does not exploit collation of indexes. To
>> be precise it does not exploit index collation in "natural" way where,
>> say, we a have sorted TableScan and hence we do not create a new Sort.
>> Instead of it Drill always create a Sort operator, but if TableScan can
>> be replaced with an IndexScan, this Sort operator is removed by the
>> dedicated rule.
>>
>> Lets consider initial an operator tree:
>>
>> Project
>>   Sort
>>     TableScan
>>
>> after applying rule DbScanToIndexScanPrule this tree will be converted to:
>>
>> Project
>>   Sort
>>     IndexScan
>>
>> and finally, after applying DbScanSortRemovalRule we have:
>>
>> Project
>>   IndexScan
>>
>> while for Phoenix approach we would have two equivalent subsets in our
>> planner:
>>
>> Project
>>   Sort
>>     TableScan
>>
>> and
>>
>> Project
>>   IndexScan
>>
>> and most likely the last plan  will be chosen as the best one.
>>
>> --
>> Kind Regards
>> Roman Kondakov
>>
>>
>> On 10.12.2019 17:19, Vladimir Ozerov wrote:
>>> Hi Roman,
>>>
>>> Why do you think that Drill-style will not let you exploit collation?
>>> Collation should be propagated from the index scan in the same way as in
>>> other sorted operators, such as merge join or streaming aggregate.
>> Provided
>>> that you use converter-hack (or any alternative solution to trigger
>> parent
>>> re-analysis).
>>> In other words, propagation of collation from Drill-style indexes should
>> be
>>> no different from other sorted operators.
>>>
>>> Regards,
>>> Vladimir.
>>>
>>> вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky
>> <[hidden email]
>>>> :
>>>
>>>>
>>>> Roman just as fast remark, Phoenix builds their approach on
>>>> already existing monolith HBase architecture, most cases it`s just a
>> stub
>>>> for someone who wants use secondary indexes with a base with no
>>>> native support of it. Don`t think it`s good idea here.
>>>>
>>>>>
>>>>>
>>>>> ------- Forwarded message -------
>>>>> From: "Roman Kondakov" < [hidden email] >
>>>>> To:  [hidden email]
>>>>> Cc:
>>>>> Subject: Adding support for Ignite secondary indexes to Apache Calcite
>>>>> planner
>>>>> Date: Tue, 10 Dec 2019 15:55:52 +0300
>>>>>
>>>>> Hi all!
>>>>>
>>>>> As you may know there is an activity on integration of Apache Calcite
>>>>> query optimizer into Ignite codebase is being carried out [1],[2].
>>>>>
>>>>> One of a bunch of problems in this integration is the absence of
>>>>> out-of-the-box support for secondary indexes in Apache Calcite. After
>>>>> some research I came to conclusion that this problem has a couple of
>>>>> workarounds. Let's name them
>>>>> 1. Phoenix-style approach - representing secondary indexes as
>>>>> materialized views which are natively supported by Calcite engine [3]
>>>>> 2. Drill-style approach - pushing filters into the table scans and
>>>>> choose appropriate index for lookups when possible [4]
>>>>>
>>>>> Both these approaches have advantages and disadvantages:
>>>>>
>>>>> Phoenix style pros:
>>>>> - natural way of adding indexes as an alternative source of rows: index
>>>>> can be considered as a kind of sorted materialized view.
>>>>> - possibility of using index sortedness for stream aggregates,
>>>>> deduplication (DISTINCT operator), merge joins, etc.
>>>>> - ability to support other types of indexes (i.e. functional indexes).
>>>>>
>>>>> Phoenix style cons:
>>>>> - polluting optimizer's search space extra table scans hence increasing
>>>>> the planning time.
>>>>>
>>>>> Drill style pros:
>>>>> - easier to implement (although it's questionable).
>>>>> - search space is not inflated.
>>>>>
>>>>> Drill style cons:
>>>>> - missed opportunity to exploit sortedness.
>>>>>
>>>>> There is a good discussion about using both approaches can be found in
>>>> [5].
>>>>>
>>>>> I made a small sketch [6] in order to demonstrate the applicability of
>>>>> the Phoenix approach to Ignite. Key design concepts are:
>>>>> 1. On creating indexes are registered as tables in Calcite schema. This
>>>>> step is needed for internal Calcite's routines.
>>>>> 2. On planner initialization we register these indexes as materialized
>>>>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization
>>>>> method.
>>>>> 3. Right before the query execution Calcite selects all materialized
>>>>> views (indexes) which can be potentially used in query.
>>>>> 4. During the query optimization indexes are registered by planner as
>>>>> usual TableScans and hence can be chosen by optimizer if they have
>> lower
>>>>> cost.
>>>>>
>>>>> This sketch shows the ability to exploit index sortedness only. So the
>>>>> future work in this direction should be focused on using indexes for
>>>>> fast index lookups. At first glance FilterableTable and
>>>>> FilterTableScanRule are good points to start. We can push Filter into
>>>>> the TableScan and then use FilterableTable for fast index lookups
>>>>> avoiding reading the whole index on TableScan step and then filtering
>>>>> its output on the Filter step.
>>>>>
>>>>> What do you think?
>>>>>
>>>>>
>>>>>
>>>>> [1]
>>>>>
>>>>
>> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
>>>>> [2]
>>>>>
>>>>
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
>>>>> [3]  https://issues.apache.org/jira/browse/PHOENIX-2047
>>>>> [4]  https://issues.apache.org/jira/browse/DRILL-6381
>>>>> [5]  https://issues.apache.org/jira/browse/DRILL-3929
>>>>> [6]  https://github.com/apache/ignite/pull/7115
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Vladimir Ozerov-2
Roman,

What I am trying to understand is what advantage of materialization API you
see over the normal optimization process? Does it save optimization time,
or reduce memory footprint, or maybe provide better plans? I am asking
because I do not see how expressing indexes as materializations fit
classical optimization process. We discussed Sort <- Scan optimization.
Let's consider another example:

LogicalSort[a ASC]
  LogicalJoin

Initially, you do not know the implementation of the join, and hence do not
know it's collation. Then you may execute physical join rules, which
produce, say, PhysicalMergeJoin[a ASC]. If you execute sort implementation
rule afterwards, you may easily eliminate the sort, or make it simpler
(e.g. remove local sorting phase), depending on the distribution. In other
words, proper implementation of sorting optimization assumes that you have
a kind of SortRemoveRule anyway, irrespectively of whether you use
materializations or not, because sorting may be injected on top of any
operator. With this in mind, the use of materializations doesn't make the
planner simpler. Neither it improves the outcome of the whole optimization
process.

What is left is either lower CPU or RAM usage? Is this the case?

ср, 11 дек. 2019 г. в 18:37, Roman Kondakov <[hidden email]>:

> Vladimir,
>
> the main advantage of the Phoenix approach I can see is the using of
> Calcite's native materializations API. Calcite has advanced support for
> materializations [1] and lattices [2]. Since secondary indexes can be
> considered as materialized views (it's just a sorted representation of
> the same table) we can seamlessly use views to simulate indexes behavior
> for Calcite planner.
>
>
> [1] https://calcite.apache.org/docs/materialized_views.html
> [2] https://calcite.apache.org/docs/lattice.html
>
> --
> Kind Regards
> Roman Kondakov
>
>
> On 11.12.2019 17:11, Vladimir Ozerov wrote:
> > Roman,
> >
> > What is the advantage of Phoenix approach then? BTW, it looks like
> Phoenix
> > integration with Calcite never made it to production, did it?
> >
> > вт, 10 дек. 2019 г. в 19:50, Roman Kondakov <[hidden email]
> >:
> >
> >> Hi Vladimir,
> >>
> >> from what I understand, Drill does not exploit collation of indexes. To
> >> be precise it does not exploit index collation in "natural" way where,
> >> say, we a have sorted TableScan and hence we do not create a new Sort.
> >> Instead of it Drill always create a Sort operator, but if TableScan can
> >> be replaced with an IndexScan, this Sort operator is removed by the
> >> dedicated rule.
> >>
> >> Lets consider initial an operator tree:
> >>
> >> Project
> >>   Sort
> >>     TableScan
> >>
> >> after applying rule DbScanToIndexScanPrule this tree will be converted
> to:
> >>
> >> Project
> >>   Sort
> >>     IndexScan
> >>
> >> and finally, after applying DbScanSortRemovalRule we have:
> >>
> >> Project
> >>   IndexScan
> >>
> >> while for Phoenix approach we would have two equivalent subsets in our
> >> planner:
> >>
> >> Project
> >>   Sort
> >>     TableScan
> >>
> >> and
> >>
> >> Project
> >>   IndexScan
> >>
> >> and most likely the last plan  will be chosen as the best one.
> >>
> >> --
> >> Kind Regards
> >> Roman Kondakov
> >>
> >>
> >> On 10.12.2019 17:19, Vladimir Ozerov wrote:
> >>> Hi Roman,
> >>>
> >>> Why do you think that Drill-style will not let you exploit collation?
> >>> Collation should be propagated from the index scan in the same way as
> in
> >>> other sorted operators, such as merge join or streaming aggregate.
> >> Provided
> >>> that you use converter-hack (or any alternative solution to trigger
> >> parent
> >>> re-analysis).
> >>> In other words, propagation of collation from Drill-style indexes
> should
> >> be
> >>> no different from other sorted operators.
> >>>
> >>> Regards,
> >>> Vladimir.
> >>>
> >>> вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky
> >> <[hidden email]
> >>>> :
> >>>
> >>>>
> >>>> Roman just as fast remark, Phoenix builds their approach on
> >>>> already existing monolith HBase architecture, most cases it`s just a
> >> stub
> >>>> for someone who wants use secondary indexes with a base with no
> >>>> native support of it. Don`t think it`s good idea here.
> >>>>
> >>>>>
> >>>>>
> >>>>> ------- Forwarded message -------
> >>>>> From: "Roman Kondakov" < [hidden email] >
> >>>>> To:  [hidden email]
> >>>>> Cc:
> >>>>> Subject: Adding support for Ignite secondary indexes to Apache
> Calcite
> >>>>> planner
> >>>>> Date: Tue, 10 Dec 2019 15:55:52 +0300
> >>>>>
> >>>>> Hi all!
> >>>>>
> >>>>> As you may know there is an activity on integration of Apache Calcite
> >>>>> query optimizer into Ignite codebase is being carried out [1],[2].
> >>>>>
> >>>>> One of a bunch of problems in this integration is the absence of
> >>>>> out-of-the-box support for secondary indexes in Apache Calcite. After
> >>>>> some research I came to conclusion that this problem has a couple of
> >>>>> workarounds. Let's name them
> >>>>> 1. Phoenix-style approach - representing secondary indexes as
> >>>>> materialized views which are natively supported by Calcite engine [3]
> >>>>> 2. Drill-style approach - pushing filters into the table scans and
> >>>>> choose appropriate index for lookups when possible [4]
> >>>>>
> >>>>> Both these approaches have advantages and disadvantages:
> >>>>>
> >>>>> Phoenix style pros:
> >>>>> - natural way of adding indexes as an alternative source of rows:
> index
> >>>>> can be considered as a kind of sorted materialized view.
> >>>>> - possibility of using index sortedness for stream aggregates,
> >>>>> deduplication (DISTINCT operator), merge joins, etc.
> >>>>> - ability to support other types of indexes (i.e. functional
> indexes).
> >>>>>
> >>>>> Phoenix style cons:
> >>>>> - polluting optimizer's search space extra table scans hence
> increasing
> >>>>> the planning time.
> >>>>>
> >>>>> Drill style pros:
> >>>>> - easier to implement (although it's questionable).
> >>>>> - search space is not inflated.
> >>>>>
> >>>>> Drill style cons:
> >>>>> - missed opportunity to exploit sortedness.
> >>>>>
> >>>>> There is a good discussion about using both approaches can be found
> in
> >>>> [5].
> >>>>>
> >>>>> I made a small sketch [6] in order to demonstrate the applicability
> of
> >>>>> the Phoenix approach to Ignite. Key design concepts are:
> >>>>> 1. On creating indexes are registered as tables in Calcite schema.
> This
> >>>>> step is needed for internal Calcite's routines.
> >>>>> 2. On planner initialization we register these indexes as
> materialized
> >>>>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization
> >>>>> method.
> >>>>> 3. Right before the query execution Calcite selects all materialized
> >>>>> views (indexes) which can be potentially used in query.
> >>>>> 4. During the query optimization indexes are registered by planner as
> >>>>> usual TableScans and hence can be chosen by optimizer if they have
> >> lower
> >>>>> cost.
> >>>>>
> >>>>> This sketch shows the ability to exploit index sortedness only. So
> the
> >>>>> future work in this direction should be focused on using indexes for
> >>>>> fast index lookups. At first glance FilterableTable and
> >>>>> FilterTableScanRule are good points to start. We can push Filter into
> >>>>> the TableScan and then use FilterableTable for fast index lookups
> >>>>> avoiding reading the whole index on TableScan step and then filtering
> >>>>> its output on the Filter step.
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>>>
> >>>>>
> >>>>> [1]
> >>>>>
> >>>>
> >>
> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
> >>>>> [2]
> >>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
> >>>>> [3]  https://issues.apache.org/jira/browse/PHOENIX-2047
> >>>>> [4]  https://issues.apache.org/jira/browse/DRILL-6381
> >>>>> [5]  https://issues.apache.org/jira/browse/DRILL-3929
> >>>>> [6]  https://github.com/apache/ignite/pull/7115
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

gvvinblade
Colleagues,

As far as I understand, materialization acts like a special rule, that matches some subtree pattern (a leaf part of a query plan) to a star table, which may have better cost than the subtree, it replaces. Saying that, in general, there is no difference between approaches - they do the same almost in the same way but using different API.

My opinion is it’s better to do the deal using rules - it makes overall approach consistent.

Regards,
Igor

> 12 дек. 2019 г., в 10:03, Vladimir Ozerov <[hidden email]> написал(а):
>
> Roman,
>
> What I am trying to understand is what advantage of materialization API you
> see over the normal optimization process? Does it save optimization time,
> or reduce memory footprint, or maybe provide better plans? I am asking
> because I do not see how expressing indexes as materializations fit
> classical optimization process. We discussed Sort <- Scan optimization.
> Let's consider another example:
>
> LogicalSort[a ASC]
>  LogicalJoin
>
> Initially, you do not know the implementation of the join, and hence do not
> know it's collation. Then you may execute physical join rules, which
> produce, say, PhysicalMergeJoin[a ASC]. If you execute sort implementation
> rule afterwards, you may easily eliminate the sort, or make it simpler
> (e.g. remove local sorting phase), depending on the distribution. In other
> words, proper implementation of sorting optimization assumes that you have
> a kind of SortRemoveRule anyway, irrespectively of whether you use
> materializations or not, because sorting may be injected on top of any
> operator. With this in mind, the use of materializations doesn't make the
> planner simpler. Neither it improves the outcome of the whole optimization
> process.
>
> What is left is either lower CPU or RAM usage? Is this the case?
>
> ср, 11 дек. 2019 г. в 18:37, Roman Kondakov <[hidden email]>:
>
>> Vladimir,
>>
>> the main advantage of the Phoenix approach I can see is the using of
>> Calcite's native materializations API. Calcite has advanced support for
>> materializations [1] and lattices [2]. Since secondary indexes can be
>> considered as materialized views (it's just a sorted representation of
>> the same table) we can seamlessly use views to simulate indexes behavior
>> for Calcite planner.
>>
>>
>> [1] https://calcite.apache.org/docs/materialized_views.html
>> [2] https://calcite.apache.org/docs/lattice.html
>>
>> --
>> Kind Regards
>> Roman Kondakov
>>
>>
>> On 11.12.2019 17:11, Vladimir Ozerov wrote:
>>> Roman,
>>>
>>> What is the advantage of Phoenix approach then? BTW, it looks like
>> Phoenix
>>> integration with Calcite never made it to production, did it?
>>>
>>> вт, 10 дек. 2019 г. в 19:50, Roman Kondakov <[hidden email]
>>> :
>>>
>>>> Hi Vladimir,
>>>>
>>>> from what I understand, Drill does not exploit collation of indexes. To
>>>> be precise it does not exploit index collation in "natural" way where,
>>>> say, we a have sorted TableScan and hence we do not create a new Sort.
>>>> Instead of it Drill always create a Sort operator, but if TableScan can
>>>> be replaced with an IndexScan, this Sort operator is removed by the
>>>> dedicated rule.
>>>>
>>>> Lets consider initial an operator tree:
>>>>
>>>> Project
>>>>  Sort
>>>>    TableScan
>>>>
>>>> after applying rule DbScanToIndexScanPrule this tree will be converted
>> to:
>>>>
>>>> Project
>>>>  Sort
>>>>    IndexScan
>>>>
>>>> and finally, after applying DbScanSortRemovalRule we have:
>>>>
>>>> Project
>>>>  IndexScan
>>>>
>>>> while for Phoenix approach we would have two equivalent subsets in our
>>>> planner:
>>>>
>>>> Project
>>>>  Sort
>>>>    TableScan
>>>>
>>>> and
>>>>
>>>> Project
>>>>  IndexScan
>>>>
>>>> and most likely the last plan  will be chosen as the best one.
>>>>
>>>> --
>>>> Kind Regards
>>>> Roman Kondakov
>>>>
>>>>
>>>> On 10.12.2019 17:19, Vladimir Ozerov wrote:
>>>>> Hi Roman,
>>>>>
>>>>> Why do you think that Drill-style will not let you exploit collation?
>>>>> Collation should be propagated from the index scan in the same way as
>> in
>>>>> other sorted operators, such as merge join or streaming aggregate.
>>>> Provided
>>>>> that you use converter-hack (or any alternative solution to trigger
>>>> parent
>>>>> re-analysis).
>>>>> In other words, propagation of collation from Drill-style indexes
>> should
>>>> be
>>>>> no different from other sorted operators.
>>>>>
>>>>> Regards,
>>>>> Vladimir.
>>>>>
>>>>> вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky
>>>> <[hidden email]
>>>>>> :
>>>>>
>>>>>>
>>>>>> Roman just as fast remark, Phoenix builds their approach on
>>>>>> already existing monolith HBase architecture, most cases it`s just a
>>>> stub
>>>>>> for someone who wants use secondary indexes with a base with no
>>>>>> native support of it. Don`t think it`s good idea here.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ------- Forwarded message -------
>>>>>>> From: "Roman Kondakov" < [hidden email] >
>>>>>>> To:  [hidden email]
>>>>>>> Cc:
>>>>>>> Subject: Adding support for Ignite secondary indexes to Apache
>> Calcite
>>>>>>> planner
>>>>>>> Date: Tue, 10 Dec 2019 15:55:52 +0300
>>>>>>>
>>>>>>> Hi all!
>>>>>>>
>>>>>>> As you may know there is an activity on integration of Apache Calcite
>>>>>>> query optimizer into Ignite codebase is being carried out [1],[2].
>>>>>>>
>>>>>>> One of a bunch of problems in this integration is the absence of
>>>>>>> out-of-the-box support for secondary indexes in Apache Calcite. After
>>>>>>> some research I came to conclusion that this problem has a couple of
>>>>>>> workarounds. Let's name them
>>>>>>> 1. Phoenix-style approach - representing secondary indexes as
>>>>>>> materialized views which are natively supported by Calcite engine [3]
>>>>>>> 2. Drill-style approach - pushing filters into the table scans and
>>>>>>> choose appropriate index for lookups when possible [4]
>>>>>>>
>>>>>>> Both these approaches have advantages and disadvantages:
>>>>>>>
>>>>>>> Phoenix style pros:
>>>>>>> - natural way of adding indexes as an alternative source of rows:
>> index
>>>>>>> can be considered as a kind of sorted materialized view.
>>>>>>> - possibility of using index sortedness for stream aggregates,
>>>>>>> deduplication (DISTINCT operator), merge joins, etc.
>>>>>>> - ability to support other types of indexes (i.e. functional
>> indexes).
>>>>>>>
>>>>>>> Phoenix style cons:
>>>>>>> - polluting optimizer's search space extra table scans hence
>> increasing
>>>>>>> the planning time.
>>>>>>>
>>>>>>> Drill style pros:
>>>>>>> - easier to implement (although it's questionable).
>>>>>>> - search space is not inflated.
>>>>>>>
>>>>>>> Drill style cons:
>>>>>>> - missed opportunity to exploit sortedness.
>>>>>>>
>>>>>>> There is a good discussion about using both approaches can be found
>> in
>>>>>> [5].
>>>>>>>
>>>>>>> I made a small sketch [6] in order to demonstrate the applicability
>> of
>>>>>>> the Phoenix approach to Ignite. Key design concepts are:
>>>>>>> 1. On creating indexes are registered as tables in Calcite schema.
>> This
>>>>>>> step is needed for internal Calcite's routines.
>>>>>>> 2. On planner initialization we register these indexes as
>> materialized
>>>>>>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization
>>>>>>> method.
>>>>>>> 3. Right before the query execution Calcite selects all materialized
>>>>>>> views (indexes) which can be potentially used in query.
>>>>>>> 4. During the query optimization indexes are registered by planner as
>>>>>>> usual TableScans and hence can be chosen by optimizer if they have
>>>> lower
>>>>>>> cost.
>>>>>>>
>>>>>>> This sketch shows the ability to exploit index sortedness only. So
>> the
>>>>>>> future work in this direction should be focused on using indexes for
>>>>>>> fast index lookups. At first glance FilterableTable and
>>>>>>> FilterTableScanRule are good points to start. We can push Filter into
>>>>>>> the TableScan and then use FilterableTable for fast index lookups
>>>>>>> avoiding reading the whole index on TableScan step and then filtering
>>>>>>> its output on the Filter step.
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> [1]
>>>>>>>
>>>>>>
>>>>
>> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
>>>>>>> [2]
>>>>>>>
>>>>>>
>>>>
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
>>>>>>> [3]  https://issues.apache.org/jira/browse/PHOENIX-2047
>>>>>>> [4]  https://issues.apache.org/jira/browse/DRILL-6381
>>>>>>> [5]  https://issues.apache.org/jira/browse/DRILL-3929
>>>>>>> [6]  https://github.com/apache/ignite/pull/7115
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>

Reply | Threaded
Open this post in threaded view
|

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Roman Kondakov
Colleagues,

sorry for late reply.

@Igor, I wouldn't say that materialized views work like rules. There is
no pattern matching there (at least for trivial cases like sorted
indexes, for more complex views pattern matching will have place, see
VolcanoPlanner#registerMaterializations). Also it couldn't be
retriggered several times. The better analogy here is an extra call of
VolcanoPlanner#register for index scans and registering them in the same
sets as usual scans. See an example below.

@Vladimir, I'm not sure that phoenix approach will significantly reduce
optimization time, but it looks like materializations might save some
efforts. Lets consider an example with Merge join.

With drill-like approach initially we have:

LogicalJoin(emps.depId=deps.id)
  LogicalScan(emps)
  LogicalScan(deps)

then we apply converters to convert this tree into the physical
representation. Assume that both tables are sorted by 'id' column and
there is an index on 'emps.depId' column. Also they are collocated on
'emps.depId=deps.id' columns. You apply MergeJoinRule and demand it's
inputs are sorted on 'emps.depId' and 'deps.id' together. After applying
this rule, converting scans to the physical nodes and expanding
AbstractConverters you'll end up with

MergeJoin(emps.depId=deps.id)
  Sort(emps.depId)
    PhysicalScan(emps)
  PhysicalScan(deps)

then you apply ScanToIndexScanRule

MergeJoin(emps.depId=deps.id)
  Sort(emps.depId)
    PhysicalIndexScan(emps.depId)
  PhysicalScan(deps)

and finally after removing redundant sort by SortRemoveRule you'll get

MergeJoin(emps.depId=deps.id)
  PhysicalIndexScan(emps.depId)
  PhysicalScan(deps)

Now, lets take a look to the same optimization process with the
phoenix-like approach. Initially we have the same query tree:

LogicalJoin(emps.depId=deps.id)
  LogicalScan(emps)
  LogicalScan(deps)

But then, just before planning, we register materializations (see the
beginning of the VolcanoPlanner#findBestExp method). And query tree now
looks like

LogicalJoin(emps.depId=deps.id)
  Set 0: [LogicalScan(emps), LogicalIndexScan(collation=emps.depId)]
  LogicalScan(deps)

Note that we have two scans with different collations for 'emps' table
in the Set0. And this happened before the actual planning process. After
converting scans to the physical nodes we'll have:

LogicalJoin(emps.depId=deps.id)
  Set 0: [PhysicalScan(emps), PhysicalIndexScan(collation=emps.depId)]
  PhysicalScan(deps)

and after applying MergeJoinRule and demanding that 'deps' should be
sorted by 'id' column and 'emps' should be sorted by 'depId' column, we
will end up with a tree without Sort operator (unlike in drill case),
because we have already had a properly sorted subset for 'emps' scan.
The tree will look like this:

MergeJoin(emps.depId=deps.id)
  PhysicalIndexScan(collation=emps.depId)
  PhysicalScan(deps)

So, we get to the same point without creating and removing redundant
sort, because we have all possible index scans registered before the
planning is actually started and we can demand sortedness of table scans
directly without applying IndexRules and Abstract converters.

--
Kind Regards
Roman Kondakov


On 13.12.2019 12:38, Seliverstov Igor wrote:

> Colleagues,
>
> As far as I understand, materialization acts like a special rule, that matches some subtree pattern (a leaf part of a query plan) to a star table, which may have better cost than the subtree, it replaces. Saying that, in general, there is no difference between approaches - they do the same almost in the same way but using different API.
>
> My opinion is it’s better to do the deal using rules - it makes overall approach consistent.
>
> Regards,
> Igor
>
>> 12 дек. 2019 г., в 10:03, Vladimir Ozerov <[hidden email]> написал(а):
>>
>> Roman,
>>
>> What I am trying to understand is what advantage of materialization API you
>> see over the normal optimization process? Does it save optimization time,
>> or reduce memory footprint, or maybe provide better plans? I am asking
>> because I do not see how expressing indexes as materializations fit
>> classical optimization process. We discussed Sort <- Scan optimization.
>> Let's consider another example:
>>
>> LogicalSort[a ASC]
>>  LogicalJoin
>>
>> Initially, you do not know the implementation of the join, and hence do not
>> know it's collation. Then you may execute physical join rules, which
>> produce, say, PhysicalMergeJoin[a ASC]. If you execute sort implementation
>> rule afterwards, you may easily eliminate the sort, or make it simpler
>> (e.g. remove local sorting phase), depending on the distribution. In other
>> words, proper implementation of sorting optimization assumes that you have
>> a kind of SortRemoveRule anyway, irrespectively of whether you use
>> materializations or not, because sorting may be injected on top of any
>> operator. With this in mind, the use of materializations doesn't make the
>> planner simpler. Neither it improves the outcome of the whole optimization
>> process.
>>
>> What is left is either lower CPU or RAM usage? Is this the case?
>>
>> ср, 11 дек. 2019 г. в 18:37, Roman Kondakov <[hidden email]>:
>>
>>> Vladimir,
>>>
>>> the main advantage of the Phoenix approach I can see is the using of
>>> Calcite's native materializations API. Calcite has advanced support for
>>> materializations [1] and lattices [2]. Since secondary indexes can be
>>> considered as materialized views (it's just a sorted representation of
>>> the same table) we can seamlessly use views to simulate indexes behavior
>>> for Calcite planner.
>>>
>>>
>>> [1] https://calcite.apache.org/docs/materialized_views.html
>>> [2] https://calcite.apache.org/docs/lattice.html
>>>
>>> --
>>> Kind Regards
>>> Roman Kondakov
>>>
>>>
>>> On 11.12.2019 17:11, Vladimir Ozerov wrote:
>>>> Roman,
>>>>
>>>> What is the advantage of Phoenix approach then? BTW, it looks like
>>> Phoenix
>>>> integration with Calcite never made it to production, did it?
>>>>
>>>> вт, 10 дек. 2019 г. в 19:50, Roman Kondakov <[hidden email]
>>>> :
>>>>
>>>>> Hi Vladimir,
>>>>>
>>>>> from what I understand, Drill does not exploit collation of indexes. To
>>>>> be precise it does not exploit index collation in "natural" way where,
>>>>> say, we a have sorted TableScan and hence we do not create a new Sort.
>>>>> Instead of it Drill always create a Sort operator, but if TableScan can
>>>>> be replaced with an IndexScan, this Sort operator is removed by the
>>>>> dedicated rule.
>>>>>
>>>>> Lets consider initial an operator tree:
>>>>>
>>>>> Project
>>>>>  Sort
>>>>>    TableScan
>>>>>
>>>>> after applying rule DbScanToIndexScanPrule this tree will be converted
>>> to:
>>>>>
>>>>> Project
>>>>>  Sort
>>>>>    IndexScan
>>>>>
>>>>> and finally, after applying DbScanSortRemovalRule we have:
>>>>>
>>>>> Project
>>>>>  IndexScan
>>>>>
>>>>> while for Phoenix approach we would have two equivalent subsets in our
>>>>> planner:
>>>>>
>>>>> Project
>>>>>  Sort
>>>>>    TableScan
>>>>>
>>>>> and
>>>>>
>>>>> Project
>>>>>  IndexScan
>>>>>
>>>>> and most likely the last plan  will be chosen as the best one.
>>>>>
>>>>> --
>>>>> Kind Regards
>>>>> Roman Kondakov
>>>>>
>>>>>
>>>>> On 10.12.2019 17:19, Vladimir Ozerov wrote:
>>>>>> Hi Roman,
>>>>>>
>>>>>> Why do you think that Drill-style will not let you exploit collation?
>>>>>> Collation should be propagated from the index scan in the same way as
>>> in
>>>>>> other sorted operators, such as merge join or streaming aggregate.
>>>>> Provided
>>>>>> that you use converter-hack (or any alternative solution to trigger
>>>>> parent
>>>>>> re-analysis).
>>>>>> In other words, propagation of collation from Drill-style indexes
>>> should
>>>>> be
>>>>>> no different from other sorted operators.
>>>>>>
>>>>>> Regards,
>>>>>> Vladimir.
>>>>>>
>>>>>> вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky
>>>>> <[hidden email]
>>>>>>> :
>>>>>>
>>>>>>>
>>>>>>> Roman just as fast remark, Phoenix builds their approach on
>>>>>>> already existing monolith HBase architecture, most cases it`s just a
>>>>> stub
>>>>>>> for someone who wants use secondary indexes with a base with no
>>>>>>> native support of it. Don`t think it`s good idea here.
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ------- Forwarded message -------
>>>>>>>> From: "Roman Kondakov" < [hidden email] >
>>>>>>>> To:  [hidden email]
>>>>>>>> Cc:
>>>>>>>> Subject: Adding support for Ignite secondary indexes to Apache
>>> Calcite
>>>>>>>> planner
>>>>>>>> Date: Tue, 10 Dec 2019 15:55:52 +0300
>>>>>>>>
>>>>>>>> Hi all!
>>>>>>>>
>>>>>>>> As you may know there is an activity on integration of Apache Calcite
>>>>>>>> query optimizer into Ignite codebase is being carried out [1],[2].
>>>>>>>>
>>>>>>>> One of a bunch of problems in this integration is the absence of
>>>>>>>> out-of-the-box support for secondary indexes in Apache Calcite. After
>>>>>>>> some research I came to conclusion that this problem has a couple of
>>>>>>>> workarounds. Let's name them
>>>>>>>> 1. Phoenix-style approach - representing secondary indexes as
>>>>>>>> materialized views which are natively supported by Calcite engine [3]
>>>>>>>> 2. Drill-style approach - pushing filters into the table scans and
>>>>>>>> choose appropriate index for lookups when possible [4]
>>>>>>>>
>>>>>>>> Both these approaches have advantages and disadvantages:
>>>>>>>>
>>>>>>>> Phoenix style pros:
>>>>>>>> - natural way of adding indexes as an alternative source of rows:
>>> index
>>>>>>>> can be considered as a kind of sorted materialized view.
>>>>>>>> - possibility of using index sortedness for stream aggregates,
>>>>>>>> deduplication (DISTINCT operator), merge joins, etc.
>>>>>>>> - ability to support other types of indexes (i.e. functional
>>> indexes).
>>>>>>>>
>>>>>>>> Phoenix style cons:
>>>>>>>> - polluting optimizer's search space extra table scans hence
>>> increasing
>>>>>>>> the planning time.
>>>>>>>>
>>>>>>>> Drill style pros:
>>>>>>>> - easier to implement (although it's questionable).
>>>>>>>> - search space is not inflated.
>>>>>>>>
>>>>>>>> Drill style cons:
>>>>>>>> - missed opportunity to exploit sortedness.
>>>>>>>>
>>>>>>>> There is a good discussion about using both approaches can be found
>>> in
>>>>>>> [5].
>>>>>>>>
>>>>>>>> I made a small sketch [6] in order to demonstrate the applicability
>>> of
>>>>>>>> the Phoenix approach to Ignite. Key design concepts are:
>>>>>>>> 1. On creating indexes are registered as tables in Calcite schema.
>>> This
>>>>>>>> step is needed for internal Calcite's routines.
>>>>>>>> 2. On planner initialization we register these indexes as
>>> materialized
>>>>>>>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization
>>>>>>>> method.
>>>>>>>> 3. Right before the query execution Calcite selects all materialized
>>>>>>>> views (indexes) which can be potentially used in query.
>>>>>>>> 4. During the query optimization indexes are registered by planner as
>>>>>>>> usual TableScans and hence can be chosen by optimizer if they have
>>>>> lower
>>>>>>>> cost.
>>>>>>>>
>>>>>>>> This sketch shows the ability to exploit index sortedness only. So
>>> the
>>>>>>>> future work in this direction should be focused on using indexes for
>>>>>>>> fast index lookups. At first glance FilterableTable and
>>>>>>>> FilterTableScanRule are good points to start. We can push Filter into
>>>>>>>> the TableScan and then use FilterableTable for fast index lookups
>>>>>>>> avoiding reading the whole index on TableScan step and then filtering
>>>>>>>> its output on the Filter step.
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> [1]
>>>>>>>>
>>>>>>>
>>>>>
>>> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
>>>>>>>> [2]
>>>>>>>>
>>>>>>>
>>>>>
>>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
>>>>>>>> [3]  https://issues.apache.org/jira/browse/PHOENIX-2047
>>>>>>>> [4]  https://issues.apache.org/jira/browse/DRILL-6381
>>>>>>>> [5]  https://issues.apache.org/jira/browse/DRILL-3929
>>>>>>>> [6]  https://github.com/apache/ignite/pull/7115
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>