GA Grid: Request to contribute GA library to Apache Ignite

techbysample
Igniters,

GA Grid (Beta) is an in-memory Genetic Algorithm (GA) component for Apache Ignite, used
to solve complex problems by simulating biological evolution.
GAs are a form of Machine Learning (ML) well suited to finding an optimal
solution among possibly thousands (or more) of candidate solutions for a given
domain.

GA Grid was developed by NetMillennium, Inc. outside of the Apache Ignite
platform as a proof of concept to determine feasibility in the GA space.
In GA Grid, all genetic operations (fitness calculation, crossover, and
mutation) are modeled as ComputeTasks so that they run in a distributed fashion.
These ComputeTasks also leverage Apache Ignite's affinity colocation to route
ComputeJobs to the respective nodes where the Chromosomes are stored in cache.
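
For illustration, the three genetic operations named above can be sketched in plain Java. This is a minimal single-JVM sketch, not GA Grid's code: it uses no Ignite APIs (no ComputeTask, no affinity routing), and the class name GeneticOpsSketch, the int-array chromosome, and the toy "count the ones" fitness function are all assumptions made for the example.

```java
import java.util.Random;

/** Single-JVM sketch of the three genetic operations; no Ignite APIs are used. */
final class GeneticOpsSketch {
    /** Fitness: number of genes equal to 1 (a toy "count the ones" problem). */
    static int fitness(int[] chromosome) {
        int score = 0;
        for (int gene : chromosome)
            score += gene;
        return score;
    }

    /** Single-point crossover: genes before {@code point} come from a, the rest from b. */
    static int[] crossover(int[] a, int[] b, int point) {
        int[] child = new int[a.length];
        for (int i = 0; i < a.length; i++)
            child[i] = i < point ? a[i] : b[i];
        return child;
    }

    /** Mutation: flips each gene independently with probability {@code rate}. */
    static int[] mutate(int[] chromosome, double rate, Random rnd) {
        int[] copy = chromosome.clone();
        for (int i = 0; i < copy.length; i++)
            if (rnd.nextDouble() < rate)
                copy[i] = 1 - copy[i];
        return copy;
    }

    public static void main(String[] args) {
        int[] a = {1, 1, 1, 0, 0, 0};
        int[] b = {0, 0, 0, 1, 1, 1};
        int[] child = crossover(a, b, 3);
        System.out.println("child fitness = " + fitness(child));
        System.out.println("mutated fitness = " + fitness(mutate(child, 0.5, new Random(42))));
    }
}
```

In a distributed setting, each of these methods would be the body of a job executed on the node that holds the chromosome, which is the part the sketch deliberately leaves out.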

After its initial release, Denis Magda inquired about the possibility of
donating GA Grid to Apache Ignite. In our discussions, Denis said he believed
GA Grid would be well suited as an extension to Apache Ignite's ML library.
NetMillennium, Inc. has now agreed to begin the process of donating
GA Grid to Apache Ignite.

With its latest release, GA Grid enhances knowledge discovery by providing
custom SQL functions to 'pivot' genetic optimization results. This enables
improved visualizations inside Apache Zeppelin.

To learn more about GA Grid please visit:

https://github.com/techbysample/gagrid

Check out my recent post on how GA Grid for Ignite integrates with Zeppelin:

https://www.linkedin.com/post/edit/apache-ignite-visualize-ga-grid-solutions-deep-turik-campbell

Please advise.

Best Regards,
Turik Campbell
NetMillennium, Inc.



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

Re: GA Grid: Request to contribute GA library to Apache Ignite

dsetrakyan
Hi Turik,

Is my understanding correct that GA Grid is a vertical component for
genetic algorithms? So far Ignite has been a horizontal product without any
vertical functionality. I personally would like to keep it this way (other
community members should chime in here).

I personally think it is great that GA Grid gets a lot of mentions on the Apache
Ignite website and has been a valuable integration for the Ignite community.

If you would like to join Apache, have you considered a separate Apache
project? This may help you build a community around your work and benefit
from the established Apache processes during incubation.

D.


(In reply to techbysample's message of Wed, Nov 1, 2017, 5:26 PM.)

Re: GA Grid: Request to contribute GA library to Apache Ignite

techbysample
Dmitriy,

Hello. Unfortunately, I am not sure that I fully understand your comments:  

"..Is my understanding correct that GA Grid is a vertical component for
genetic algorithms?  So far Ignite has been a horizontal product without any
vertical functionality. I personally would like to keep it this way.."

Would you please clarify?

I simply view GA Grid as a software component that implements a distributed
Genetic Algorithm (GA).
GA Grid relies on Apache Ignite's major features: advanced clustering,
compute grid, data grid, etc.

Here is a diagram of how GA Grid relates to other components within Ignite:

<http://apache-ignite-developers.2346864.n4.nabble.com/file/t375/GAIgniteComps.png>

Based on my earlier discussion with Denis M., I assumed GA Grid could be added
to the collection of ML algorithms within ML Grid, since GAs are a type of
'Machine Learning' algorithm.

If it is determined that GA Grid would not fit into Apache Ignite
architecturally, I would consider proposing it as a separate Apache project.

Denis, would you please add your feedback as well?

Please advise.

Best,
Turik





Re: GA Grid: Request to contribute GA library to Apache Ignite

dmagda
Hi guys,

Yes, in my opinion the genetic algorithms developed by Turik fit our ML component perfectly as a separate package. Look, *ML* is a building block on top of Ignite's distributed storage and compute grid. The same is true of GA Grid. So, why don’t we merge GA into ML?

Nikita, Yuri and the rest of the ML folks, please chime in.


Denis


(In reply to techbysample's message of Nov 2, 2017, 5:02 PM.)


Re: GA Grid: Request to contribute GA library to Apache Ignite

Yuriy Babak
Hi all,

Let me add some comments about GA Grid. I like it, but it does not
currently fit our API.

I'm not sure that we could merge GA Grid as is into the ML module, but we see
two possibilities.

The first is to add GA Grid as a separate module, like the ML module.

The second is to adapt the genetic algorithm as a trainer for ML models (such as
regressions, clusterers, and neural nets).

We could also combine both approaches: add GA Grid as a separate module and
implement trainers based on it.

Regards,
Yury




Re: GA Grid: Request to contribute GA library to Apache Ignite

dmagda
Yury,

There is no rush. So if the community agrees to accept GA as a part of ML, then I would go for the second suggested approach.

Turik, what do you think about the second approach?


Denis

(In reply to Yury Babak's message of Nov 3, 2017, 8:54 AM.)


Re: GA Grid: Request to contribute GA library to Apache Ignite

techbysample
Denis/Yury,

I am in favor of the second approach. Also, I envision GA Grid implemented
as a separate package within the ML module.

If possible, it seems the initial priority would be merging GA Grid into
the ML module such that it operates independently. Next, we can
discuss/prioritize how other ML algorithms could best utilize GA Grid (i.e.,
implement trainers based on GA Grid, as you mentioned).
I am not very familiar with 'trainer' concepts in ML, but I don't see that as an
issue.


Please advise.

Best,
Turik




Re: GA Grid: Request to contribute GA library to Apache Ignite

Yuriy Babak
Turik,

Basically, we have two main concepts: the model and the trainer. Each machine
learning method produces a model that predicts a result based on a training
dataset. The model is just a function, and training it means minimizing a loss
function, that is, the difference between the model's predictions and the
actual values.

The trainer is the mechanism that minimizes the loss function. Usually
gradient descent or one of its variations, such as SGD, is used for this purpose.
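
To make the model/loss/trainer relationship concrete, here is a minimal sketch of gradient descent minimizing a mean-squared-error loss for a one-variable line y = w*x + b. It is an illustration only and does not use the Ignite ML API; the class name GradientDescentSketch, the learning rate, and the step count are assumptions chosen for the example.

```java
/** Sketch: a model is a function, and training minimizes a loss over its parameters. */
final class GradientDescentSketch {
    /** Mean squared error of the line y = w*x + b on the data set. */
    static double loss(double w, double b, double[] xs, double[] ys) {
        double sum = 0;
        for (int i = 0; i < xs.length; i++) {
            double err = w * xs[i] + b - ys[i];
            sum += err * err;
        }
        return sum / xs.length;
    }

    /** Plain batch gradient descent starting from w = b = 0; returns {w, b}. */
    static double[] train(double[] xs, double[] ys, double learningRate, int steps) {
        double w = 0, b = 0;
        for (int s = 0; s < steps; s++) {
            double gradW = 0, gradB = 0;
            for (int i = 0; i < xs.length; i++) {
                double err = w * xs[i] + b - ys[i];
                gradW += 2 * err * xs[i] / xs.length; // d(loss)/dw
                gradB += 2 * err / xs.length;         // d(loss)/db
            }
            w -= learningRate * gradW;
            b -= learningRate * gradB;
        }
        return new double[] {w, b};
    }

    public static void main(String[] args) {
        double[] xs = {0, 1, 2, 3};
        double[] ys = {1, 3, 5, 7}; // generated from y = 2x + 1
        double[] p = train(xs, ys, 0.05, 5000);
        System.out.printf("w=%.3f b=%.3f loss=%.6f%n", p[0], p[1], loss(p[0], p[1], xs, ys));
    }
}
```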

Here we could use a genetic algorithm to minimize the loss function.
Using a genetic algorithm, we could find optimal values for neuron weights in
a neural network, cluster centers, regression coefficients, etc.

For both concepts we have an API: org.apache.ignite.ml.Model and
org.apache.ignite.ml.Trainer. So if we want to use a genetic algorithm for
model training, we should implement a specific trainer for each ML algorithm,
such as linear regression, k-means, decision trees, and others.

For example, let's take a look at linear regression. Currently we have OLS
(Ordinary Least Squares) multiple linear regression. For this regression we
will have OLSRegressionModel and at least two possible trainers: an analytical
trainer (the solution of a matrix equation) and gradient
descent (a numerical solution). We could also implement a GA trainer that
uses GA Grid.
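
As a rough illustration of the GA trainer idea, the sketch below fits a line y = w*x + b by evolving chromosomes of the form {w, b} and using the mean squared error as the (inverse) fitness. It is not GA Grid code and does not use the Ignite ML API: the class name GaLineFitSketch, the population size, the "keep the best half" selection, the uniform crossover, and the Gaussian mutation step of 0.1 are all assumptions picked to keep the example short.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

/** Hypothetical sketch: fit y = w*x + b by evolving chromosomes {w, b}. */
final class GaLineFitSketch {
    /** Loss of one chromosome: mean squared error of the encoded line on the data. */
    static double loss(double[] chromosome, double[] xs, double[] ys) {
        double sum = 0;
        for (int i = 0; i < xs.length; i++) {
            double err = chromosome[0] * xs[i] + chromosome[1] - ys[i];
            sum += err * err;
        }
        return sum / xs.length;
    }

    /** Evolves a population (popSize >= 2) and returns the chromosome with the lowest loss. */
    static double[] evolve(double[] xs, double[] ys, int popSize, int generations, long seed) {
        Random rnd = new Random(seed);
        double[][] pop = new double[popSize][];
        for (int i = 0; i < popSize; i++)
            pop[i] = new double[] {rnd.nextDouble() * 10 - 5, rnd.nextDouble() * 10 - 5};

        Comparator<double[]> byLoss = Comparator.comparingDouble(c -> loss(c, xs, ys));
        int survivors = popSize / 2;

        for (int g = 0; g < generations; g++) {
            Arrays.sort(pop, byLoss); // lowest loss first; the best half survives unchanged
            for (int i = survivors; i < popSize; i++) {
                double[] p1 = pop[rnd.nextInt(survivors)];
                double[] p2 = pop[rnd.nextInt(survivors)];
                double[] child = new double[2];
                for (int k = 0; k < 2; k++) {
                    child[k] = rnd.nextBoolean() ? p1[k] : p2[k]; // uniform crossover
                    child[k] += rnd.nextGaussian() * 0.1;         // Gaussian mutation
                }
                pop[i] = child;
            }
        }
        Arrays.sort(pop, byLoss);
        return pop[0];
    }

    public static void main(String[] args) {
        double[] xs = {0, 1, 2, 3};
        double[] ys = {1, 3, 5, 7}; // generated from y = 2x + 1
        double[] best = evolve(xs, ys, 40, 300, 42L);
        System.out.printf("w=%.2f b=%.2f loss=%.4f%n", best[0], best[1], loss(best, xs, ys));
    }
}
```

Because the best half of each generation is carried over unchanged, the lowest loss in the population never gets worse from one generation to the next; unlike gradient descent, the search needs no derivative of the loss.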

NB: this API is currently under development, and right now linear regression
does not use the model and trainer API. We will refactor this algorithm in the
near future.

Regards,
Yury




Re: GA Grid: Request to contribute GA library to Apache Ignite

dmagda
Yuri, Turik,

Considering the concepts the ML library is built around, it should be straightforward to adapt the genetic algorithms to it.

To be more specific, Genes and Chromosomes, which are the central building blocks of GA, turn out to be basic ML Models. All the standard genetic operations/algorithms, such as mutation, crossover and fitness calculation, correspond to the ML Trainer.

If my understanding is correct, then we should add GA as a package to the ML lib and implement all the basic Model and Trainer interfaces.
 
Sounds reasonable?


Denis

(In reply to Yury Babak's message of Nov 7, 2017, 6:07 AM.)


Re: GA Grid: Request to contribute GA library to Apache Ignite

Yuriy Babak
Denis,

Let me clarify.

Firstly, a gene here is a single model coefficient (a neuron weight, etc.), and a
chromosome is the whole model representation.

Secondly, GA should be an implementation of the Trainer API for each ML algorithm,
such as regression, clustering, NNs, etc.

And last but not least, a genetic algorithm does not fit the Model API, so
it shouldn't implement it. Generally, genetic algorithms don't produce
predictive models.

Regards,
Yury




Re: GA Grid: Request to contribute GA library to Apache Ignite

dmagda
Yury,

Please see inline

> On Nov 7, 2017, at 12:11 PM, Yury Babak <[hidden email]> wrote:
>
> Denis,
>
> Let me clarify.
>
> Firstly, a gene here is a single model coefficient (a neuron weight, etc.), and a
> chromosome is the whole model representation.
>

Sounds good. Actually, a chromosome can be seen as a model.

> Secondly, GA should be an implementation of the Trainer API for each ML algorithm,
> such as regression, clustering, NNs, etc.
>

It’s optional, right? Initially there should be a way to run the standard operations over chromosomes that are specific to GA only. Those operations/algorithms are crossover, fitness scoring, and mutation. Do these three operations fit the Trainer API? For now, we’re putting aside extended support for regression, clustering, etc.

> And last but not least, a genetic algorithm does not fit the Model API, so
> it shouldn't implement it. Generally, genetic algorithms don't produce
> predictive models.
>

I’m a bit confused here. Earlier we said that a chromosome is a model in ML terms, and that it’s all about providing concrete trainer implementations.


Denis


Re: GA Grid: Request to contribute GA library to Apache Ignite

techbysample
Denis/Yury,

Upon review of your previous comments, please respond to my feedback:

1. I believe GA Grid can be implemented in a separate package within the ML
library and operate independently of other algorithms for use cases where
only GA is required.

2. I am still not totally clear on the Trainer and Model relationship
in the ML API.

     a. Am I correct that the org.apache.ignite.ml.Trainer and
        org.apache.ignite.ml.Model APIs are not yet available because they are
        under development? Please advise when they become available.

     b. Based on Yury's comments:

    "...For both concepts we have API: org.apache.ignite.ml.Model and
    org.apache.ignite.ml.Trainer. So if we want to use genetic algorithm for
    model training we should implement specific trainer for each ML algorithms
    like lin regression, kmean, decision tree and others.

    For example let's take a look on lin regression. Currently we have OLS
    (Ordinary Least Squares) multiple linear regression. For this regression we
    will have OLSRegressionModel and at least two possible trainers: analytical
    trainer (a solution of matrix equation, analytical solution) and gradient
    descent (numerical solution). And also we could implement GA trainer which
    will use GA Grid... "

    Does an org.apache.ignite.ml.Trainer generate an org.apache.ignite.ml.Model?

Please advise and clarify accordingly.

Best,
Turik

 


 






Re: GA Grid: Request to contribute GA library to Apache Ignite

techbysample
Denis/Yury,

I updated my previous post slightly.
For clarity, please respond to this post and disregard the previous one.

Upon review of your previous comments, please respond to my feedback:

1. I believe GA Grid can be implemented in a separate package within the ML
library and operate independently of other algorithms for use cases where
only GA is required.

2. I am still not totally clear on the Trainer and Model relationship
in the ML API.

     a. Since org.apache.ignite.ml.Trainer and org.apache.ignite.ml.Model
        are not yet available, when are they planned to be available?

     b. Based on Yury's comments:

    "...For both concepts we have API: org.apache.ignite.ml.Model and
    org.apache.ignite.ml.Trainer. So if we want to use genetic algorithm for
    model training we should implement specific trainer for each ML algorithms
    like lin regression, kmean, decision tree and others.

    For example let's take a look on lin regression. Currently we have OLS
    (Ordinary Least Squares) multiple linear regression. For this regression we
    will have OLSRegressionModel and at least two possible trainers: analytical
    trainer (a solution of matrix equation, analytical solution) and gradient
    descent (numerical solution). And also we could implement GA trainer which
    will use GA Grid... "

    Does an org.apache.ignite.ml.Trainer generate an org.apache.ignite.ml.Model?
    This part is still not clear.

Please advise and clarify accordingly.


Best,
Turik




Re: GA Grid: Request to contribute GA library to Apache Ignite

Yuriy Babak
Turik,

1) Yes, that's correct.

2.a) The Model API is available; the Trainer API is in a
PR (https://github.com/apache/ignite/pull/2936), which should be merged today
or tomorrow.

2.b) Yes, Trainer generates Model. Here is the Trainer interface:

public interface Trainer<M extends Model, T> {
   public M train(T data);
}
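
To illustrate how a concrete trainer could plug into an interface of this shape, here is a minimal sketch. It declares local stand-in Model and Trainer interfaces so that it compiles without Ignite; the real org.apache.ignite.ml types may differ (for example in generics and method names), and LineModel, AnalyticalLineTrainer, and TrainerSketchDemo are hypothetical names invented for the example. The trainer uses the closed-form least-squares solution for a single feature.

```java
/** Demo entry point; declared first so the file can be run directly as a single source file. */
final class TrainerSketchDemo {
    public static void main(String[] args) {
        double[][] data = {{0, 1}, {1, 3}, {2, 5}, {3, 7}}; // points on y = 2x + 1
        LineModel model = new AnalyticalLineTrainer().train(data);
        System.out.println("prediction for x=4: " + model.predict(4.0));
    }
}

/** Local stand-in for a model: a function from input to prediction. Not the Ignite type. */
interface Model<T, V> {
    V predict(T val);
}

/** Local stand-in mirroring the trainer shape quoted above: data in, model out. */
interface Trainer<M extends Model<?, ?>, T> {
    M train(T data);
}

/** Hypothetical model: the line y = slope * x + intercept. */
final class LineModel implements Model<Double, Double> {
    final double slope;
    final double intercept;

    LineModel(double slope, double intercept) {
        this.slope = slope;
        this.intercept = intercept;
    }

    @Override public Double predict(Double x) {
        return slope * x + intercept;
    }
}

/** Hypothetical analytical trainer; each row of {@code data} is one {x, y} point. */
final class AnalyticalLineTrainer implements Trainer<LineModel, double[][]> {
    @Override public LineModel train(double[][] data) {
        int n = data.length;
        double meanX = 0, meanY = 0;
        for (double[] p : data) {
            meanX += p[0] / n;
            meanY += p[1] / n;
        }
        double cov = 0, var = 0;
        for (double[] p : data) {
            cov += (p[0] - meanX) * (p[1] - meanY);
            var += (p[0] - meanX) * (p[0] - meanX);
        }
        double slope = cov / var; // assumes at least two distinct x values
        return new LineModel(slope, meanY - slope * meanX);
    }
}
```

A GA-based trainer would implement the same train method but search for the slope and intercept by evolving candidate solutions instead of solving for them directly.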

Regards,
Yury




Re: GA Grid: Request to contribute GA library to Apache Ignite

techbysample
Yury,

Thanks for the feedback.

I reviewed the Trainer API at:
https://github.com/apache/ignite/blob/db7697b17cf6eb94754edb2b5e200655a3610dc1/modules/ml/src/main/java/org/apache/ignite/ml/Trainer.java
and I also recommend the approach of implementing new "GA trainers"
that use GA Grid.


Denis/Yury,

Please advise on the next steps based on the most recent posts.

Best,
Turik






Re: GA Grid: Request to contribute GA library to Apache Ignite

Yuriy Babak
Turik,

From my point of view, our first step is to add GA Grid as is to the package
org.apache.ignite.ml.genetic in the ML module.

That shouldn't be a problem, but first we should check that GA Grid conforms
to our code style.

So please prepare a pull request with GA Grid.

Also, if nobody objects, I will create a ticket in our JIRA for this first step.

We also have a few formal steps; I hope Denis can help with them.

Regards,
Yury




Re: GA Grid: Request to contribute GA library to Apache Ignite

dmagda
In addition to that, we need to push GA grid through the IP clearance process:
http://incubator.apache.org/ip-clearance/ip-clearance-template.html

This is what the IP clearance form looked like, and how the process went, when GridGain donated Ignite persistence:
http://incubator.apache.org/ip-clearance/persistent-distributed-store-ignite.html

I’ll help with the formalities.


Denis

(In reply to Yury Babak's message of Nov 13, 2017, 10:03 AM.)


Re: GA Grid: Request to contribute GA library to Apache Ignite

techbysample
Denis,

OK. It's not clear to me what stage of this process we are in.

Do I need to fill out the IP clearance form here?
http://incubator.apache.org/ip-clearance/ip-clearance-template.html
If so, I will simply model it on what was done for the Ignite Persistence Store.

Also, what about the software grant form? When will it be provided?

Should I follow the steps/guidelines here for the pull request?
https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute#HowtoContribute-
(Section: 1. Create GitHub pull-request)

Please advise on the general order of steps in this whole process.

Regards,
Turik






Re: GA Grid: Request to contribute GA library to Apache Ignite

Yuriy Babak
Turik,

To make a pull request, you should perform the following steps:

1) Create a JIRA account if you don't have one
(https://issues.apache.org/jira).
1.1) Write to the dev list and ask for contributor permissions.
2) Assign yourself the ticket that I've created for you
(https://issues.apache.org/jira/browse/IGNITE-6899).
3) Clone Apache Ignite and create a new branch from master; for example, the
branch name could be ignite-6899.
4) Add the whole of GA Grid to the package org.apache.ignite.ml.genetic.
5) Add some tests for GA Grid.
6) Add some examples to the examples module (your current tests are good for
this; just move them to the examples module).
7) Create a pull request from your branch to our master.

After those steps we will perform a code review, and after that we can merge
the PR.

Regards,
Yury




Re: GA Grid: Request to contribute GA library to Apache Ignite

Yuriy Babak
Also, please check the coding guidelines:
https://cwiki.apache.org/confluence/display/IGNITE/Coding+Guidelines

The code in the PR should conform to these guidelines.

Regards,
Yury


