Ignite logs adoption for enterprise grade monitoring tools

dmagda
Igniters,

As a preface, Alexey Kukushkin laid out an insightful and thorough explanation of what’s wrong with Ignite logs from a DevOps perspective, how the community can easily close the gaps, and how our efforts will pay off if we take his advice into consideration:
http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-not-friendly-for-Monitoring-td20802.html

In short, Ignite log events (errors, warnings and non-severe messages) are not assigned unique identifiers.
Why does a mature project like Ignite need them?

First, to have a human-friendly glossary of error messages and warnings (see the MySQL [1] and MongoDB [2] examples) that simplifies troubleshooting and debugging on the dev side. Actually, we planned to do this back in 2016! [3]

Second, it turns out that popular DevOps monitoring tools such as DynaTrace [4] and Nagios [5] can easily analyze the IDs of log events and help automate their processing or trigger notifications. For instance, if the “node left” log message were labeled with an ID, DynaTrace could detect that event and, by looking at overall memory usage (JMX), decide what to do next: just send an email to an admin or add a new node to the cluster.
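
To make the monitoring scenario concrete, here is a minimal Java sketch of how a tool-side rule might pick an event ID out of a log line. The `[IGN-<number>]` tag format, the ID 1001 for "node left", the sample log line, and the class name are all assumptions made for illustration; Ignite does not emit such IDs today.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: extracts a numeric event ID from a tagged log line. */
class LogEventIdParser {
    // Assumed tag format "[IGN-<digits>]"; this is not an existing Ignite convention.
    // At most 9 digits, so the value always fits into an int.
    private static final Pattern EVENT_ID = Pattern.compile("\\[IGN-(\\d{1,9})\\]");

    /** Returns the first event ID found in the line, or empty if the line carries no tag. */
    static Optional<Integer> parseEventId(String logLine) {
        Matcher m = EVENT_ID.matcher(logLine);
        return m.find() ? Optional.of(Integer.parseInt(m.group(1))) : Optional.empty();
    }

    public static void main(String[] args) {
        // Invented log line; 1001 is a made-up ID standing in for "node left".
        String line = "[12:00:01] [IGN-1001] Node left topology: node-42";
        parseEventId(line).ifPresent(id -> System.out.println("Event ID: " + id));
    }
}
```

A monitoring rule would then match on the number alone, so rewording the human-readable message would not break the rule.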

My proposal is to start putting the glossary together, making Ignite ready for enterprise-grade monitoring systems and DevOps!

As a first step, let’s define Ignite’s subsystems and distribute ID ranges among them:
- networking (discovery, communication) - 1000 - 3000
- memory and persistence - 4000 - 6000
- key-value, caching - 7000 - 9000
- SQL - 10000 - 11000
- etc.
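
The proposed ranges can be sketched as a small Java lookup table. This is only an illustration: the enum and method names are hypothetical, the bounds are copied from the list above and treated as inclusive (an assumption), and IDs that fall between ranges are reported as unassigned.

```java
import java.util.Optional;

/** Hypothetical sketch of the proposed ID ranges; bounds are taken from the list above. */
enum EventSubsystem {
    NETWORKING(1000, 3000),             // discovery, communication
    MEMORY_AND_PERSISTENCE(4000, 6000),
    KEY_VALUE(7000, 9000),              // key-value, caching
    SQL(10000, 11000);

    private final int first;
    private final int last;

    EventSubsystem(int first, int last) {
        this.first = first;
        this.last = last;
    }

    /** True if the ID lies within this subsystem's range (both bounds assumed inclusive). */
    boolean contains(int id) {
        return id >= first && id <= last;
    }

    /** Finds the subsystem that owns the ID, or empty if the ID is in no listed range. */
    static Optional<EventSubsystem> forId(int id) {
        for (EventSubsystem s : values()) {
            if (s.contains(id)) {
                return Optional.of(s);
            }
        }
        return Optional.empty();
    }
}

class EventSubsystemDemo {
    public static void main(String[] args) {
        System.out.println(EventSubsystem.forId(1001)); // falls in the networking range
        System.out.println(EventSubsystem.forId(3500)); // lies between two listed ranges
    }
}
```

Keeping the ranges in one place like this would make overlaps and gaps easy to spot while the glossary is being assembled.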

Is everyone OK with this format and the overall endeavor?

[1] https://dev.mysql.com/doc/refman/5.5/en/error-messages-server.html
[2] https://github.com/mongodb/mongo/blob/master/src/mongo/base/error_codes.err
[3] https://issues.apache.org/jira/browse/IGNITE-3690
[4] https://www.dynatrace.com/capabilities/log-analytics/
[5] https://www.nagios.com/solutions/log-monitoring/

Re: Ignite logs adoption for enterprise grade monitoring tools

Alexey Kuznetsov
Denis,

I think it will be very useful.

P.S. A minor note: I think the ranges should be _000 - _999?



On Wed, Jan 10, 2018 at 7:49 AM, Denis Magda <[hidden email]> wrote:

> As a first step, let’s define subsystems of Ignite spreading out IDs
> ranges among them:
> - networking (discovery, communication) - 1000 - 3000
> - memory and persistence - 4000 - 6000
> - key-value, caching - 7000 - 9000
> - SQL - 10000 - 11000
> - etc.

--
Alexey Kuznetsov

Re: Ignite logs adoption for enterprise grade monitoring tools

Vladimir Ozerov
+1. I think we need much bigger ranges for subsystems. E.g.,
10K reserved for SQL would be enough. Not because there are so many real
errors, but because we would need subgroups.
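
One way to read the subgroup idea is to carve a 10K block into fixed-size slices. The Java sketch below is a hypothetical illustration only: the subgroup size of 1,000 and all names are assumptions, and the block start of 10,000 simply reuses the lower bound of the SQL range from the original proposal.

```java
/** Hypothetical sketch: splitting one subsystem's 10K block into numbered subgroups. */
class SubgroupLayout {
    static final int BLOCK_SIZE = 10_000;   // block size suggested for SQL in the message above
    static final int SUBGROUP_SIZE = 1_000; // assumed subgroup size, not taken from the thread

    /** Returns the index (0..9) of the subgroup that the ID falls into within its block. */
    static int subgroupOf(int blockStart, int id) {
        if (id < blockStart || id >= blockStart + BLOCK_SIZE) {
            throw new IllegalArgumentException(
                "ID " + id + " is outside the block starting at " + blockStart);
        }
        return (id - blockStart) / SUBGROUP_SIZE;
    }

    public static void main(String[] args) {
        int sqlBlockStart = 10_000; // lower bound of the SQL range in the original proposal
        System.out.println(subgroupOf(sqlBlockStart, 12_345)); // prints 2
    }
}
```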

On Wed, Jan 10, 2018 at 6:49, Alexey Kuznetsov <[hidden email]> wrote:

> Denis,
>
> I think it will be very useful.
>
> P.S. Minor note, I think ranges should be _000 - _999?

Re: Ignite logs adoption for enterprise grade monitoring tools

Sergey Kosarev-2
In reply to this post by dmagda
Hi,
Why should the event ID be a number?
Maybe it would be better to split the main subsystems by prefixes?
Something like:
Networking: IGN-NET
Persistence: IGN-PERS
etc.
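
For comparison with a purely numeric scheme, here is a minimal Java sketch of what a prefixed identifier could look like once a number is attached. The `IGN-NET-17` layout (prefix, hyphen, number), the class, and its methods are assumptions for illustration; the suggestion above only names the prefixes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: an event ID made of a subsystem prefix plus a number. */
class PrefixedEventId {
    // Assumed layout "<IGN-XXX>-<digits>", e.g. "IGN-NET-17"; not an agreed format.
    private static final Pattern FORMAT = Pattern.compile("(IGN-[A-Z]+)-(\\d{1,9})");

    final String prefix;
    final int number;

    PrefixedEventId(String prefix, int number) {
        this.prefix = prefix;
        this.number = number;
    }

    /** Parses text such as "IGN-NET-17"; rejects anything that does not match the layout. */
    static PrefixedEventId parse(String text) {
        Matcher m = FORMAT.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a prefixed event ID: " + text);
        }
        return new PrefixedEventId(m.group(1), Integer.parseInt(m.group(2)));
    }

    @Override
    public String toString() {
        return prefix + "-" + number;
    }

    public static void main(String[] args) {
        PrefixedEventId id = parse("IGN-NET-17");
        System.out.println(id.prefix + " / " + id.number);
    }
}
```

The subsystem is readable at a glance in this form, at the cost of handling two fields instead of one integer.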

Sergey Kosarev

> On 10 Jan 2018, at 03:49, Denis Magda <[hidden email]> wrote:
> As a first step, let’s define subsystems of Ignite spreading out IDs ranges among them:
> - networking (discovery, communication) - 1000 - 3000
> - memory and persistence - 4000 - 6000
> - key-value, caching - 7000 - 9000
> - SQL - 10000 - 11000
> - etc.


Re: Ignite logs adoption for enterprise grade monitoring tools

Vladimir Ozerov
A single number is easier to manage than prefix + number.

On Wed, Jan 10, 2018 at 10:13, Sergey Kosarev <[hidden email]> wrote:

> Hi,
> Why Event ID should be a number?
> Maybe better to  split main subsystems by prefixes?
> something like
> Networking: IGN-NET
> Persistence: IGN-PERS
> etc.
>
> Sergey Kosarev
>
> > On 10 Jan 2018, at 03:49, Denis Magda <[hidden email]> wrote:
> >
> > Igniters,
> >
> > As a preface, Alexey Kukushkin laid out an insightful and profound
> explanation on what’s wrong with Ignite logs from a DevOps perspective, how
> the community can easily tackle the gaps and how our efforts will be payed
> off if we take his advice in consideration:
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-not-friendly-for-Monitoring-td20802.html
> >
> > In short, Ignite log events (errors, warnings and non-severe messages)
> are not assigned unique identifiers.
> > Why a mature project like Ignite needs it?
> >
> > First, to have a human-friendly glossary of error messages or warnings
> (see MySQL [1] and MongoDB [2] examples) that simplify troubleshooting and
> debugging on the dev side. Actually we planned to do it back in 2016! [3]
> >
> > Second, turns out to be that popular DevOps monitoring tools such as
> DynaTrace [4] and Nagios [5] can easily analyze IDs of log events and help
> automate their processing or trigger notifications. For instance, if “node
> left” log message was labeled with an ID then DynaTrace could detect that
> event and by looking at overall memory usage (JMX) decide what to do next -
> just send an email to an admin or add a new node to the cluster.
> >
> > My proposal is to start putting the glossary together making Ignite
> ready for enterprise grade monitoring systems and DevOps!
> >
> > As a first step, let’s define subsystems of Ignite spreading out IDs
> ranges among them:
> > - networking (discovery, communication) - 1000 - 3000
> > - memory and persistence - 4000 - 6000
> > - key-value, caching - 7000 - 9000
> > - SQL - 10000 - 11000
> > - etc.
> >
> > Is everyone with this format and overall endeavor?
> >
> > [1] https://dev.mysql.com/doc/refman/5.5/en/error-messages-server.html
> > [2]
> https://github.com/mongodb/mongo/blob/master/src/mongo/base/error_codes.err
> > [3] https://issues.apache.org/jira/browse/IGNITE-3690
> > [4] https://www.dynatrace.com/capabilities/log-analytics/
> > [5] https://www.nagios.com/solutions/log-monitoring/
>
>