Aggregation functions

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Aggregation functions

Vinokurov Pavel
Hi Igniters!

I often meet the following use case.
Id
Company_id
Name
Birthday(dd.mm.yyyy)
1 1 John 01.01.2000
2 1 Mike 01.01.2010
3 1 Nick 01.01.2015

Having table Person, it requires to select min and max birth-dates and name
of the youngest and oldest person for each company.

The current possible solution is  write the query using join between the
same table. Such query has poor performance and looks quite clumsy. Also it
requires to handle same birth dates:
*Ignite Query(simplified)*
SELECT
  MIN_MAX.company_id,
  p1.name as oldest_name,
  MIN_MAX.min_date,
  p2.name as youngest,
  MIN_MAX.max_date
FROM
(SELECT
     company_id,
     min(birthday) as min_date,
     max(birthday) as max_date
group by company_id) MIN_MAX
INNER JOIN Person p1 on p1.birthday=MIN_MAX.MIN_DATE and
p1.company_id=MIN_MAX.company_id
INNER JOIN Person p2 on p2.birthday=MIN_MAX.MAX_DATE and
p2.company_id=MIN_MAX.company_id

Given performance of this query, it's make sense to re-implement this
usecase using pure java code.

But in H2 it's possible to execute the following query:
SELECT
     company_id,
     first_value(name) over( ORDER BY birthday) as oldest_name,
     min(birthday)
     last_value(name) over( ORDER BY birthday) as youngest_name,
     max(birthday)
group by company_id

Ignite doesn't provide any window or inside grouping functions excepting
GROUP_CONCAT, so we could make the similar query.
SELECT
     company_id,
     PARSE_STRING_AND_GET_FIRST_STRING(GROUP_CONCAT( name order by birthday
SEPARATOR ',')) as oldest_name
     min(birthday)
     PARSE_STRING_AND_GET_LAST_STRING(GROUP_CONCAT( name order by birthday
SEPARATOR ',')) as youngest_name
     max(birthday)
group by company_id

These last 2 queries are much faster(10-100x) than the first one.

Thus I want to clarify a few questions:

   1. Does GROUP_CONCAT[2] function really work and make aggregation
   inside  group( in collocated case)?
   2. Are queries 2 and 3 equivalent?
   3. Is there any options to implement first_value[1], last_value without
   custom partitioning. IMHO first_value is the simplified version of
   GROUP_CONCAT. Am I right?


[1] http://www.h2database.com/html/functions.html#first_value
<http://ggsystems.atlassian.net/wiki/pages/createpage.action?spaceKey=GG&title=1&linkCreation=true&fromPageId=1296597032>

[2] https://apacheignite-sql.readme.io/docs/group_concat


Thanks,

Pavel



--

Regards

Pavel Vinokurov
Reply | Threaded
Open this post in threaded view
|

Re: Aggregation functions

Юрий
Hi Pavel,

1. Yes GROUP_CONCAT function works and as for collocated and for non
collocated case. There are following tests:

org.apache.ignite.internal.processors.query.IgniteSqlGroupConcatNotCollocatedTest

org.apache.ignite.internal.processors.query.IgniteSqlGroupConcatCollocatedTest

2. Seems yes, they should be equivalent.
3. I think yes, it could be implemented as some aggregate function. May be
any one of Igniters want to implement it?


вт, 27 авг. 2019 г. в 15:30, Pavel Vinokurov <[hidden email]>:

> Hi Igniters!
>
> I often meet the following use case.
> Id
> Company_id
> Name
> Birthday(dd.mm.yyyy)
> 1 1 John 01.01.2000
> 2 1 Mike 01.01.2010
> 3 1 Nick 01.01.2015
>
> Having table Person, it requires to select min and max birth-dates and name
> of the youngest and oldest person for each company.
>
> The current possible solution is  write the query using join between the
> same table. Such query has poor performance and looks quite clumsy. Also it
> requires to handle same birth dates:
> *Ignite Query(simplified)*
> SELECT
>   MIN_MAX.company_id,
>   p1.name as oldest_name,
>   MIN_MAX.min_date,
>   p2.name as youngest,
>   MIN_MAX.max_date
> FROM
> (SELECT
>      company_id,
>      min(birthday) as min_date,
>      max(birthday) as max_date
> group by company_id) MIN_MAX
> INNER JOIN Person p1 on p1.birthday=MIN_MAX.MIN_DATE and
> p1.company_id=MIN_MAX.company_id
> INNER JOIN Person p2 on p2.birthday=MIN_MAX.MAX_DATE and
> p2.company_id=MIN_MAX.company_id
>
> Given performance of this query, it's make sense to re-implement this
> usecase using pure java code.
>
> But in H2 it's possible to execute the following query:
> SELECT
>      company_id,
>      first_value(name) over( ORDER BY birthday) as oldest_name,
>      min(birthday)
>      last_value(name) over( ORDER BY birthday) as youngest_name,
>      max(birthday)
> group by company_id
>
> Ignite doesn't provide any window or inside grouping functions excepting
> GROUP_CONCAT, so we could make the similar query.
> SELECT
>      company_id,
>      PARSE_STRING_AND_GET_FIRST_STRING(GROUP_CONCAT( name order by birthday
> SEPARATOR ',')) as oldest_name
>      min(birthday)
>      PARSE_STRING_AND_GET_LAST_STRING(GROUP_CONCAT( name order by birthday
> SEPARATOR ',')) as youngest_name
>      max(birthday)
> group by company_id
>
> These last 2 queries are much faster(10-100x) than the first one.
>
> Thus I want to clarify a few questions:
>
>    1. Does GROUP_CONCAT[2] function really work and make aggregation
>    inside  group( in collocated case)?
>    2. Are queries 2 and 3 equivalent?
>    3. Is there any options to implement first_value[1], last_value without
>    custom partitioning. IMHO first_value is the simplified version of
>    GROUP_CONCAT. Am I right?
>
>
> [1] http://www.h2database.com/html/functions.html#first_value
> <
> http://ggsystems.atlassian.net/wiki/pages/createpage.action?spaceKey=GG&title=1&linkCreation=true&fromPageId=1296597032
> >
>
> [2] https://apacheignite-sql.readme.io/docs/group_concat
>
>
> Thanks,
>
> Pavel
>
>
>
> --
>
> Regards
>
> Pavel Vinokurov
>


--
Живи с улыбкой! :D