usage analytics


nivanov
Igniters,
I would like to kick off the discussion on the idea of collecting Ignite
usage statistics. The basic idea behind this is to better understand
general and anonymous Ignite usage information to better calibrate
community efforts in developing new features, improving existing ones,
delivering better documentation - and in every other way to make our
project a better software solution.

Although such instrumentation is standard practice in commercially
developed software, for an ASF project this could be a sensitive issue.
Therefore I would like to initiate a full community discussion on how best
to implement such a practice for the benefit of the project while ensuring the
privacy protection of Ignite users.

To ignite (pun intended) the discussion I'll outline below some of the
basic thoughts that I have on this subject. They are here only to give an
idea of what such instrumentation may potentially look like so that we can
discuss the merits of this idea in a tangible context.

Overview
-------------
Upon start and every hour thereafter each Ignite node will collect, encrypt
and send usage statistics over HTTPS to the ASF-hosted server. That server
will accept such HTTPS packets, decrypt them and store them in a
time-series DB. A web interface will be provided to view the usage
information.
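
A minimal sketch of the "upon start and every hour thereafter" schedule, in Java. The class name UsageReporter and the sendStats callback are hypothetical and not part of this proposal; collecting, encrypting and posting the packet over HTTPS are left to the caller.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs a reporting task at start and then at a fixed period. */
final class UsageReporter {
    /**
     * Starts the periodic task. The first run happens immediately (initial delay 0),
     * later runs follow every {@code period} units.
     */
    static ScheduledExecutorService start(Runnable sendStats, long period, TimeUnit unit) {
        // Daemon thread so that reporting never keeps the JVM alive on shutdown.
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "usage-stats-reporter");
            t.setDaemon(true);
            return t;
        });

        exec.scheduleAtFixedRate(() -> {
            try {
                sendStats.run();
            }
            catch (RuntimeException e) {
                // Swallowed on purpose: an exception escaping the task would
                // cancel all subsequent executions of a fixed-rate schedule.
            }
        }, 0, period, unit);

        return exec;
    }
}
```

An hourly schedule would then be started with UsageReporter.start(task, 1, TimeUnit.HOURS) and stopped with shutdownNow() on the returned executor.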

Opt-In or Opt-out
-------------------------
Opt-out. The Ignite website will offer simple instructions (a system property) on
how to disable this instrumentation.
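
A minimal sketch of what an opt-out check might look like in Java. The property name IGNITE_USAGE_STATS_DISABLED and the class name are assumptions made for illustration only; no actual name has been proposed.

```java
import java.util.Properties;

/** Hypothetical opt-out switch: reporting is on unless explicitly disabled. */
final class UsageStatsSwitch {
    /** Assumed property name, for illustration only. */
    static final String DISABLE_PROP = "IGNITE_USAGE_STATS_DISABLED";

    /** Returns true unless the property is set to "true" (case-insensitive). */
    static boolean reportingEnabled(Properties props) {
        return !Boolean.parseBoolean(props.getProperty(DISABLE_PROP, "false"));
    }

    public static void main(String[] args) {
        // Run with -DIGNITE_USAGE_STATS_DISABLED=true to switch reporting off.
        System.out.println("reporting enabled: " + reportingEnabled(System.getProperties()));
    }
}
```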

Code, Infra, Access
---------------------------
Ignite instrumentation will be part of the Ignite code base. The collection
server will be a separate module in the Ignite code base (released
separately from Ignite). The collection server will be hosted by ASF Infra.

Usage statistics will be publicly accessible by anyone in the community.

Private, Personal Data
------------------------------
No private or personal data will ever be transferred. No emails, usernames,
company names, grid names, etc.

Data Retention
--------------------
All data will be retained for 1 year and deleted permanently thereafter.
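
As a rough illustration of the retention rule on the collection server, assuming "1 year" is approximated as 365 days and that each stored record carries a receive timestamp (both are assumptions, as is the class name):

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical retention check for stored usage records. */
final class RetentionPolicy {
    /** Assumption: one year is treated as 365 days. */
    static final Duration RETENTION = Duration.ofDays(365);

    /** True if a record received at {@code receivedAt} is older than the retention window. */
    static boolean expired(Instant receivedAt, Instant now) {
        return receivedAt.isBefore(now.minus(RETENTION));
    }
}
```

A periodic cleanup job on the server could delete every record for which expired(...) returns true.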

Usage Data
----------------
The following data will be collected in each packet sent to the collection
server:
- GRID_SIZE (to align our testing environment with the most common
cluster sizes)
- IP_ADDR (for general geo-tracking as well as to know what documentation
language should be a priority)
- SES_ID (to track continuous uptime vs. restarts)
- USERNAME_TYPE (privileged username vs. standard, to track production vs.
dev/testing usage; note - this is not an actual username)
- OS_NAME
- OS_VER
- OS_ARCH
- JAVA_VER
- JAVA_VENDOR
- COMP_SQL (whether or not this feature was used)
- COMP_COMPUTE (whether or not this feature was used)
- COMP_DATAGRID (whether or not this feature was used)
- COMP_STREAMING (whether or not this feature was used)
- COMP_IGFS (whether or not this feature was used)
- COMP_SERVICE (whether or not this feature was used)
- COMP_PERSISTENCE (whether or not this feature was used)
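
To make the list above more tangible, here is a sketch of how such a packet could be assembled in Java from standard JVM system properties. The class name, the method signature and the way component usage is passed in are assumptions; IP_ADDR and USERNAME_TYPE are left out of the sketch (the server would see the sender's address on the connection, and privilege detection is platform-specific).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical builder for one usage packet, using the field names from the list above. */
final class UsagePacket {
    /** Component flags from the proposal, without the COMP_ prefix. */
    static final String[] COMPONENTS = {
        "SQL", "COMPUTE", "DATAGRID", "STREAMING", "IGFS", "SERVICE", "PERSISTENCE"
    };

    /**
     * @param gridSize       number of nodes in the cluster
     * @param sessionId      random per-start identifier (e.g. a UUID string), not tied to a user
     * @param usedComponents names from COMPONENTS that were used in this session
     */
    static Map<String, String> build(int gridSize, String sessionId, Set<String> usedComponents) {
        Map<String, String> m = new LinkedHashMap<>();

        m.put("GRID_SIZE", Integer.toString(gridSize));
        m.put("SES_ID", sessionId);

        // Standard JVM system properties.
        m.put("OS_NAME", System.getProperty("os.name"));
        m.put("OS_VER", System.getProperty("os.version"));
        m.put("OS_ARCH", System.getProperty("os.arch"));
        m.put("JAVA_VER", System.getProperty("java.version"));
        m.put("JAVA_VENDOR", System.getProperty("java.vendor"));

        // One boolean flag per component: used or not.
        for (String c : COMPONENTS)
            m.put("COMP_" + c, Boolean.toString(usedComponents.contains(c)));

        return m;
    }
}
```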

Please let's discuss this idea. Everyone's comments and suggestions are
*extremely* welcome.

Thanks,
Nikita Ivanov.

Re: usage analytics

Roman Shtykh
Nikita,

While this will help improve Ignite, it will prevent its adoption by many projects -- sending and retaining IP addresses, OS versions, etc. raises tons of questions when considering whether to use Ignite, even if one can opt out.
-- Roman



Re: usage analytics

Nikita Ivanov-2
Roman,
Thanks for the feedback. What are those questions specifically? Are IP
addresses and OS versions what is causing them?

Thanks!

--
Nikita Ivanov
Founder & CTO
GridGain Systems


Re: usage analytics

ignite_dev2017
In reply to this post by Roman Shtykh
With such statistics collected by Ignite, we won't ever accept Ignite in our environment.

However, the ability to turn stats collection on and off would be helpful if the feature is accepted for implementation.

Take Care,
Rishi


Re: usage analytics

Roman Shtykh
In reply to this post by Nikita Ivanov-2
Nikita,
Sending any information (OS version, IP addresses, etc.) that could be used to compromise our services, and storing it somewhere the company cannot secure, would be unacceptable.
Turning it off might be OK (possibly through the cluster settings, not via a globally accessible site), but the mere risk that some information could leak outside (for any reason, starting with human error) is scary.
-- Roman




Re: usage analytics

Nikita Ivanov-2
The idea so far is to have a single system property in configuration that
turns this off completely. I envision that this will be prominently
featured on the Ignite website so that everyone who would like to disable it
can do so in seconds.

Thoughts?

--
Nikita Ivanov
Founder & CTO
GridGain Systems


Re: usage analytics

Konstantin Boudnik-2
Actually, that should be OFF by default. It sounds like this would reduce the amount
of data collected, but it would address the concerns of companies like
Roman's. I know for sure that a few of my clients would sue my ass out of
existence if I gave them a platform collecting their data-center info.

Let's have it, set it off by default, and document an easy way to turn it on.
Then start making rounds asking our user base to share _some_ of the stats
with the community, so we can track the growth of the install base, etc.
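
For contrast with the opt-out approach, a sketch of the opt-in variant in Java: nothing is reported unless the user explicitly enables it. The property name IGNITE_USAGE_STATS_ENABLED and the class name are hypothetical.

```java
import java.util.Properties;

/** Hypothetical opt-in switch: reporting is off unless explicitly enabled. */
final class UsageStatsOptIn {
    /** Assumed property name, for illustration only. */
    static final String ENABLE_PROP = "IGNITE_USAGE_STATS_ENABLED";

    /** Returns true only if the property is set to "true"; a missing property means off. */
    static boolean reportingEnabled(Properties props) {
        // Boolean.parseBoolean(null) is false, so an unset property keeps reporting off.
        return Boolean.parseBoolean(props.getProperty(ENABLE_PROP));
    }
}
```

The only difference from an opt-out check is the default when the property is absent, which is exactly the point under debate.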

Cos


Re: usage analytics

Nikita Ivanov-2
Cos,
Based on my experience, having it off by default negates the entire
purpose... We need a statistically meaningful data set to make any inferences
from it. Moreover, if we are going to ask folks to turn it on, it will
significantly skew the resulting data set anyway and not show the full picture. I
think "on" by default is the better option if we are to collect usage stats
to begin with.

Also, I want to reiterate to avoid misunderstanding: there is no
proposal, nor will there be a technical way, to attribute collected data back
to a certain company. That's not what this is all about. We should only be
interested in aggregated stats (community size, geo information, language
information, component usage).
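
A sketch of the kind of aggregation meant here, in Java: packets are reduced to counts per value of a single field, so the output carries no per-sender detail. The class name and the map-based packet representation are assumptions for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical aggregator: turns raw packets into per-value counts for one field. */
final class UsageAggregator {
    /** Counts how many packets carry each distinct value of {@code field}; packets without it are skipped. */
    static Map<String, Long> countBy(String field, List<Map<String, String>> packets) {
        return packets.stream()
            .filter(p -> p.get(field) != null)
            .collect(Collectors.groupingBy(p -> p.get(field), TreeMap::new, Collectors.counting()));
    }
}
```

For example, countBy("OS_NAME", packets) would yield only totals per operating system name.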

Thoughts?

--
Nikita Ivanov
Founder & CTO
GridGain Systems

On Fri, Jul 7, 2017 at 8:17 PM, Konstantin Boudnik <[hidden email]> wrote:

> Actually, that should be OFF by default. It sounds like this reduce the
> amount
> of the data collected, but this would address the concerns of companies
> like
> Roman's. I know for sure that a few of my clients would sue my ass out of
> existence if I gave them the platform collecting their data-centers info.
>
> Let's have it, set if off by default and document and easy way to turn it
> off.
> Then start making rounds asking our user base to share _some_ of the stats
> with the community, so we can track the growth of the install base, etc.
>
> Cos

Re: usage analytics

Konstantin Boudnik-2
On Sat, Jul 08, 2017 at 11:04AM, Nikita Ivanov wrote:
> Cos,
> Based on my experience, having it off by default negates the entire
> purpose... We need a statistically meaningful data set to make any inferences
> from it. Moreover, if we are going to ask folks to turn it on, it will
> significantly skew the resulting data set anyway and not show the full picture. I
> think "on" by default is the better option if we are to collect usage stats
> to begin with.

Yes, sure. But having this "on" by default is likely to expose us to another
shit-storm down the road. An interesting dilemma to have indeed. In my
experience, whenever I install something like a browser or an operating
system, it asks whether I want to make that piece of software better
by sending back some anonymized stats. Basically, I am given a way to
explicitly opt out if I wish.

Turning the feature "on" by default is like saying: "We'll be collecting
some stats, but if you don't want that, you can go here and there and disable the
collection. Oh, and by the way, you need to go and figure out the exact steps
to disable it."
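
To make the default-off alternative concrete, here is a minimal sketch of an opt-in switch driven by a single system property. The class name and the property name `IGNITE_USAGE_STATS_ENABLED` are hypothetical, invented for illustration; nothing in this thread defines them and they are not existing Ignite settings.

```java
/** Hypothetical sketch of a default-off (opt-in) switch for usage reporting. */
public final class UsageStatsSwitch {
    /** Assumed property name for illustration; not an existing Ignite setting. */
    public static final String PROP = "IGNITE_USAGE_STATS_ENABLED";

    private UsageStatsSwitch() {}

    /**
     * Returns true only when the system property is explicitly set to "true"
     * (case-insensitive). An unset property therefore means "off".
     */
    public static boolean enabled() {
        return Boolean.getBoolean(PROP);
    }

    public static void main(String[] args) {
        // Run with -DIGNITE_USAGE_STATS_ENABLED=true to opt in.
        System.out.println("usage stats enabled: " + enabled());
    }
}
```

With this shape, a user who does nothing sends nothing, and opting in is one JVM flag. An opt-out design would invert the check so that an unset property means "on".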

> Also, I want to reiterate this to avoid misunderstanding: there is no
> proposal, nor will there be a technical way, to attribute collected data back
> to a certain company. That's not what this is all about. We should only be
> interested in aggregated stats (community size, geo information, language
> information, component usage).

Yes, I think it is clear, but it never hurts to reiterate.

Cos




Re: usage analytics

nivanov
Igniters,
Just a quick update. I haven't gotten a response from ASF Legal on this
thread and I frankly don't know how to proceed here. What's the process to
arrive at a decision here?

Thanks!
--
Nikita Ivanov



Re: usage analytics

dsetrakyan
I would try to ping legal again and see if they respond. If not, I think we will need to come up with a simpler approach that does not require legal approval.

D.


Re: usage analytics

Alexey Goncharuk
Folks,

I want to bump up this discussion and slightly change the format suggested
by Nikita. I don't think it is correct to gather any information related to
the user environment. However, can we collect just the fact of some of the
Ignite APIs/subsystems being used with no user information whatsoever?
Having started thinking about Ignite 3.0, I realized that we lack even some
very basic knowledge on the impact of changing one or another feature or
API.

To my knowledge, the Ignite website already uses Google Analytics, which is
available to the community. The Google Analytics platform already has
tooling to track app screen hits in a completely anonymous way, so we can
use this tool to track Ignite component usage (once per node startup),
sending solely the component name and a unique environment hash - no IP
addresses, no JDK/OS/other information. The information will be available
in the same toolkit we are already using to analyze the website and
optimize our docs.
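
A minimal sketch of what such a once-per-startup component report could look like, written in Java because Ignite is a Java project. Everything in it is an assumption made for illustration: the class name ComponentUsageEvent, the IGNITE_USAGE_STATS_ENABLED system property, the key=value payload layout, and the idea of deriving the environment hash from a locally generated random identifier. It uses no real Ignite or Google Analytics API and sends nothing over the network; it only shows that the payload can be built from a component name and an opaque hash, with no IP address, JDK, or OS details. Whether reporting should be on or off by default is left open in this thread; the sketch defaults to off only to keep the example inert.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

/** Hypothetical sketch, not part of the Ignite code base. */
final class ComponentUsageEvent {
    /** Hypothetical system property name; reporting is off unless it is "true". */
    public static final String ENABLED_PROP = "IGNITE_USAGE_STATS_ENABLED";

    /** Returns true only if the (assumed) system property is set to "true". */
    public static boolean enabled() {
        return Boolean.getBoolean(ENABLED_PROP);
    }

    /** Hashes an opaque local seed with SHA-256 and returns it as lowercase hex. */
    public static String environmentHash(String seed) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(seed.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest)
                sb.append(String.format("%02x", b));
            return sb.toString();
        }
        catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Builds the payload: component name and environment hash, nothing else. */
    public static String payload(String component, String envHash) {
        return "component=" + component + "&env=" + envHash;
    }

    public static void main(String[] args) {
        if (!enabled()) {
            System.out.println("Usage reporting disabled");
            return;
        }

        // Assumption: a random identifier stands in for the environment.
        // A real design would have to decide how (and whether) to persist it.
        String envHash = environmentHash(UUID.randomUUID().toString());

        // The payload is only printed here; no network call is made.
        System.out.println(payload("COMP_SQL", envHash));
    }
}
```

Running it with -DIGNITE_USAGE_STATS_ENABLED=true prints a single payload line; without the property it only reports that usage reporting is disabled.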

WDYT?

On Wed, Jul 19, 2017 at 01:15, <[hidden email]> wrote:

> I would try to ping legal again and see if they respond. If not, I think
> we will need to come up with a simpler approach, that does not require
> legal approval.
>
> D.
>
> On Jul 18, 2017, at 2:23 PM, Nikita Ivanov <[hidden email]> wrote:
> >Igniters,
> >Just a quick update. I haven't gotten a response from ASF Legal on this
> >thread and I frankly don't know how to proceed here. What's the process
> >to arrive at a decision point here?
> >
> >Thanks!
> >--
> >Nikita Ivanov
> >
> >
> >On Mon, Jul 10, 2017 at 3:11 PM, Konstantin Boudnik <[hidden email]> wrote:
> >
> >> On Sat, Jul 08, 2017 at 11:04AM, Nikita Ivanov wrote:
> >> > Cos,
> >> > Based on my experience having it off by default negates the entire
> >> > purpose... We need statistically meaningful data set to make any inferences
> >> > from it. Moreover, if we are going to ask folks to turn it on it will
> >> > significantly skew the resulting data set anyways and show full picture. I
> >> > think "on" by default is the better option if we are to collect usage stats
> >> > to begin with.
> >>
> >> yes, sure. But having this "on" by default is likely to expose us to another
> >> shit-storm down the road. An interesting dilemma to have indeed. In my
> >> experience, whenever I install something like a browser or an operating
> >> system, it would ask if I want to make the particular piece of software better
> >> by sending back some anonymized stats. Basically, I am given a way to
> >> explicitly opt-out if I wish.
> >>
> >> By turning the feature "on" by default is like saying: "we'll be collecting
> >> some stats, but if you don't want to you can go here and there and disable the
> >> collection. Oh, and by the way - you need to go and figure out the exact steps
> >> to disable it."
> >>
> >> > Also, I want to re-iterate it again to avoid misunderstanding: there is no
> >> > proposal nor will there be a technical way to attribute collected data back
> >> > to a certain company. That's not what this is all about. We should only be
> >> > interested in aggregated stats (community size, geo information, language
> >> > information, components usage).
> >>
> >> Yes, I think it is clear, but never hurts to re-iterate.
> >>
> >> Cos
> >>
> >> > Thoughts?
> >> >
> >> > --
> >> > Nikita Ivanov
> >> > Founder & CTO
> >> > GridGain Systems
> >> >
> >> > On Fri, Jul 7, 2017 at 8:17 PM, Konstantin Boudnik <[hidden email]> wrote:
> >> >
> >> > > Actually, that should be OFF by default. It sounds like this reduces the
> >> > > amount of the data collected, but this would address the concerns of
> >> > > companies like Roman's. I know for sure that a few of my clients would sue
> >> > > my ass out of existence if I gave them the platform collecting their
> >> > > data-centers info.
> >> > >
> >> > > Let's have it, set it off by default and document an easy way to turn it
> >> > > off.
> >> > > Then start making rounds asking our user base to share _some_ of the stats
> >> > > with the community, so we can track the growth of the install base, etc.
> >> > >
> >> > > Cos
> >> > >
> >> > > On Thu, Jul 06, 2017 at 08:20AM, Nikita Ivanov wrote:
> >> > > > The idea so far is to have a single system property in configuration that
> >> > > > turns this off completely. I envision that this will be prominently
> >> > > > featured on Ignite website so that everyone who would like to disable it -
> >> > > > can do it in seconds.
> >> > > >
> >> > > > Thoughts?
> >> > > >
> >> > > > --
> >> > > > Nikita Ivanov
> >> > > > Founder & CTO
> >> > > > GridGain Systems
> >> > > >
> >> > > > On Wed, Jul 5, 2017 at 9:27 PM, Roman Shtykh <[hidden email]> wrote:
> >> > > >
> >> > > > > Nikita,
> >> > > > >
> >> > > > > Sending and storing (somewhere the company cannot securely handle) any
> >> > > > > information (OS version, IP addresses, etc.) that can be used to compromise
> >> > > > > the services would be unacceptable.
> >> > > > > Turning it off might be ok (possibly through the cluster settings, not via
> >> > > > > globally-accessible site), but the thing that there's a risk some
> >> > > > > information can leak outside (for any reason, starting from a human
> >> > > > > mistake) is scary.
> >> > > > >
> >> > > > > -- Roman
> >> > > > >
> >> > > > > On Thursday, July 6, 2017 12:38 PM, Nikita Ivanov <[hidden email]> wrote:
> >> > > > >
> >> > > > > Roman,
> >> > > > > Thanks for the feedback. What are those questions specifically? Are IP
> >> > > > > addresses and OS what is causing it?
> >> > > > >
> >> > > > > Thanks!
> >> > > > >
> >> > > > > --
> >> > > > > Nikita Ivanov
> >> > > > > Founder & CTO
> >> > > > > GridGain Systems
> >> > > > >
> >> > > > > On Wed, Jul 5, 2017 at 6:15 PM, Roman Shtykh <[hidden email]> wrote:
> >> > > > >
> >> > > > > Nikita,
> >> > > > >
> >> > > > > While this will help improve Ignite, it will prevent its adoption by many
> >> > > > > projects -- sending and retaining IP addresses, OS versions, etc. raises
> >> > > > > tons of questions when considering to use Ignite. Even if it can be opted
> >> > > > > out.
> >> > > > > -- Roman
> >> > > > >

Re: usage analytics

Valentin Kulichenko
Makes sense to me. I would love to know which components/APIs are used more
than others. Obviously, we should make sure everything is anonymous and we
don't collect any private user data, but I believe this is already
guaranteed by Google Analytics.

-Val

On Tue, Nov 3, 2020 at 3:59 AM Alexey Goncharuk <[hidden email]>
wrote:

> Folks,
>
> I want to bump up this discussion and slightly change the format suggested
> by Nikita. I don't think it is correct to gather any information related to
> the user environment. However, can we collect just the fact of some of the
> Ignite APIs/subsystems being used with no user information whatsoever?
> Having started thinking about Ignite 3.0, I realized that we lack even some
> very basic knowledge on the impact of changing one or another feature or
> API.
>
> To my knowledge, the Ignite website already uses Google Analytics, which is
> available to the community. The Google Analytics platform already has
> tooling to track app screen hits in a completely anonymous way, so we can
> use this tool to track Ignite component usage (once per node startup),
> sending solely the component name and a unique environment hash - no IP
> addresses, no JDK/OS/other information. The information will be available
> in the same toolkit we are already using to analyze the website and
> optimize our docs.
>
> WDYT?