Collecting query metrics and statistics

Collecting query metrics and statistics

Vladimir Ozerov
Igniters,

At the moment Ignite lacks administrative and management capabilities in
various places. One such demanded place is SQL queries. I want to start a
discussion around adding better query metrics to Ignite.

I propose to split the task into two parts: infrastructure and UX.

1) We start with adding a unique UUID to all query requests, to be able to
glue all the pieces together. Then we log the following things (approximately):
- Query ID
- Original query
- Map query
- Mapper execution time
- Mapper rows
- Mapper output bytes (to reducer)
- Mapper input bytes (for distributed joins)
- Reduce query
- Reduce execution time
- Reduce rows
- Reduce input bytes (sum of all mapper output bytes)
- Possibly some info on distributed joins
- Advanced things in the future (memory page accesses, disk accesses, etc.)

All these stats are saved to local structures. These structures are
accessible through some API and typically not exchanged between nodes until
requested.
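
For illustration, the local structure could be as simple as a per-node map
keyed by the query ID. A rough sketch (class and field names below are made
up, not a proposed API):

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-query record kept locally on each node. */
class QueryStatsRecord {
    final UUID queryId;             // Glues map and reduce pieces together.
    final String originalQuery;     // Query text as submitted by the user.
    volatile String mapQuery;       // Rewritten map-side query.
    volatile long mapTimeMs;        // Mapper execution time.
    volatile long mapRows;          // Rows produced by the mapper.
    volatile long mapOutputBytes;   // Bytes sent to the reducer.
    volatile long mapInputBytes;    // Bytes received (distributed joins).
    volatile String reduceQuery;    // Reduce-side query.
    volatile long reduceTimeMs;     // Reducer execution time.
    volatile long reduceRows;       // Rows produced by the reducer.
    volatile long reduceInputBytes; // Sum of all mapper output bytes.

    QueryStatsRecord(UUID queryId, String originalQuery) {
        this.queryId = queryId;
        this.originalQuery = originalQuery;
    }
}

/** Local registry; nothing is sent to other nodes until requested. */
class LocalQueryStatsRegistry {
    private final Map<UUID, QueryStatsRecord> stats = new ConcurrentHashMap<>();

    QueryStatsRecord onQueryStarted(UUID qryId, String originalQuery) {
        return stats.computeIfAbsent(qryId, id -> new QueryStatsRecord(id, originalQuery));
    }

    QueryStatsRecord get(UUID qryId) {
        return stats.get(qryId);
    }
}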

2) UX
The most important part - we will make these stats *accessible through SQL*!

SELECT * FROM map_query_stats
|node|query_id|query              |time_ms|input_bytes|
-------------------------------------------------------
|CLI1|*UUID1* |SELECT AVG(f)      |50     |100        |

SELECT * FROM reduce_query_stats
|node|query_id|query                  |time_ms|output_bytes|disk_reads|
-----------------------------------------------------------------------
|SRV1|*UUID1* |SELECT SUM(f), COUNT(f)|00     |20          |35        |

Then we do some UNIONS/JOINS from client/console/driver/whatever, and:
SELECT ... FROM map_query_stats WHERE query_id = *UUID1*
UNION
SELECT ... FROM reduce_query_stats WHERE query_id = *UUID1*

|total_time|total_disk_reads|reduce_node|reduce_time|reduce_disk_reads|
-----------------------------------------------------------------------
|100       |180             |           |           |                 |
-----------------------------------------------------------------------
|          |                |SRV1       |10         |35               |
-----------------------------------------------------------------------
|          |                |SRV2       |90         |130              |
-----------------------------------------------------------------------
|          |                |SRV3       |20         |25               |
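
From code the same picture can be assembled through JDBC. A minimal sketch,
assuming the map_query_stats/reduce_query_stats views above exist (the URL,
column set and query ID literal are only illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class QueryStatsClient {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1")) {
            String qryId = "UUID1"; // ID of the query of interest.

            String sql =
                "SELECT node, query_id, time_ms, input_bytes " +
                "FROM map_query_stats WHERE query_id = ? " +
                "UNION ALL " +
                "SELECT node, query_id, time_ms, output_bytes " +
                "FROM reduce_query_stats WHERE query_id = ?";

            try (PreparedStatement stmt = conn.prepareStatement(sql)) {
                stmt.setString(1, qryId);
                stmt.setString(2, qryId);

                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next())
                        System.out.printf("%s %s %d ms %d bytes%n",
                            rs.getString(1), rs.getString(2), rs.getLong(3), rs.getLong(4));
                }
            }
        }
    }
}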

Makes sense?

Vladimir.

Re: Collecting query metrics and statistics

Andrew Mashenkov
Vladimir,

Looks like a "must have" feature.
Most databases I have seen make such info available via system tables in a
dedicated schema.

We have to choose and reserve a schema name for this, e.g.
"information_schema".

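For instance, with a reserved schema the stats views become discoverable
through standard JDBC metadata. A rough sketch, assuming the schema name
above (the connection URL is also just an example):

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ListStatsViews {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1")) {
            DatabaseMetaData md = conn.getMetaData();

            // List everything exposed under the reserved statistics schema.
            try (ResultSet rs = md.getTables(null, "INFORMATION_SCHEMA", "%", null)) {
                while (rs.next())
                    System.out.println(rs.getString("TABLE_SCHEM") + "." + rs.getString("TABLE_NAME"));
            }
        }
    }
}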


--
Best regards,
Andrey V. Mashenkov

Re: Collecting query metrics and statistics

Vladimir Ozerov
In reply to this post by Vladimir Ozerov
NB: I mixed up "mapper" and "reducer" in a few places in the second part of
the message.

Re: Collecting query metrics and statistics

yzhdanov
Vladimir, I like the idea very much. We should also provide the stats via
MBeans.
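
For example, a rough sketch of such a bean (the interface, attributes and
object name are all made up; a real implementation would follow Ignite's
MBean naming conventions and be fed from the local stats):

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical MXBean exposing aggregated per-node query stats. */
public interface QueryStatsMXBean {
    long getMapQueriesExecuted();
    long getMapTimeTotalMs();
    long getReduceQueriesExecuted();
    long getReduceTimeTotalMs();
}

class QueryStatsMBeanImpl implements QueryStatsMXBean {
    // Stub values; a real implementation would read the local stats registry.
    @Override public long getMapQueriesExecuted() { return 0; }
    @Override public long getMapTimeTotalMs() { return 0; }
    @Override public long getReduceQueriesExecuted() { return 0; }
    @Override public long getReduceTimeTotalMs() { return 0; }

    static void register() throws Exception {
        MBeanServer srv = ManagementFactory.getPlatformMBeanServer();
        srv.registerMBean(new QueryStatsMBeanImpl(),
            new ObjectName("org.example:group=SQL,name=QueryStats"));
    }
}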

--Yakov