Igniters,
GA Grid (Beta) is an in-memory Genetic Algorithm (GA) component for Apache Ignite, used to solve complex problems by simulating biological evolution. GAs are a form of Machine Learning (ML) and are well suited to finding an optimal solution among possibly thousands (or more) of candidate solutions for a given domain.

GA Grid was developed by NetMillennium, Inc. outside of the Apache Ignite platform as a proof of concept to determine feasibility in the GA space. In GA Grid, all genetic operations (Fitness Calculation, Crossover, and Mutation) are modeled as a ComputeTask so that they run in a distributed fashion. These ComputeTasks also leverage Apache Ignite's Affinity Colocation to route ComputeJobs to the nodes where the Chromosomes are stored in cache.

After its initial release, Denis Magda inquired about the possibility of donating GA Grid to Apache Ignite. In our discussions, Denis said he believed GA Grid would be well suited as an extension to Apache Ignite's ML library. NetMillennium, Inc. has now agreed to begin the process of donating GA Grid to Apache Ignite.

With its latest release, GA Grid enhances knowledge discovery by providing custom SQL functions to 'pivot' genetic optimization results. This enables improved visualizations inside Apache Zeppelin.

To learn more about GA Grid, please visit:

https://github.com/techbysample/gagrid

Check out my recent post on how GA Grid for Ignite integrates with Zeppelin:

https://www.linkedin.com/post/edit/apache-ignite-visualize-ga-grid-solutions-deep-turik-campbell

Please advise.

Best Regards,
Turik Campbell
NetMillennium, Inc.

--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
Hi Turik,
Is my understanding correct that GA Grid is a vertical component for genetic algorithms? So far Ignite has been a horizontal product without any vertical functionality. I personally would like to keep it this way (other community members should chime in here).

I think it is great that GA Grid gets a lot of mention on the Apache Ignite website, and it has been a valuable integration for the Ignite community. If you would like to join Apache, have you considered a separate Apache project? This may help you build a community around your work and benefit from the established Apache processes during incubation.

D.
Dmitriy,
Hello. Unfortunately, I am not sure that I fully understand your comments:

"..Is my understanding correct that GA Grid is a vertical component for genetic algorithms? So far Ignite has been a horizontal product without any vertical functionality. I personally would like to keep it this way.."

Would you please clarify?

I simply view GA Grid as a software component that implements a distributed Genetic Algorithm (GA). GA Grid relies on Apache Ignite's major features: advanced clustering, compute grid, data grid, etc.

Here is a diagram of how GA Grid relates to other components within Ignite:

<http://apache-ignite-developers.2346864.n4.nabble.com/file/t375/GAIgniteComps.png>

Based on my earlier discussion with Denis M., I assumed GA Grid could be added to the collection of ML algorithms within ML Grid, since GAs are a type of "Machine Learning" algorithm.

If it is determined that GA Grid would not fit into Apache Ignite architecturally, I would consider proposing it as a separate Apache project.

Denis, would you please add your feedback as well?

Please advise.

Best,
Turik
Hi guys,
Yes, in my opinion the genetic algorithms developed by Turik fit perfectly into our ML component as a separate package. Look, *ML* is a building block on top of Ignite's distributed storage and compute grid. The same is true of GA Grid. So, why don't we merge GA into ML?

Nikita, Yuri and the rest of the ML folks, please chime in.

- Denis
Hi all,
Please let me add some comments about GA Grid. Actually I like it, but currently it doesn't fit our API.

I'm not sure that we could merge GA Grid as is into the ML module, but we see two possibilities.

The first is to add GA Grid as a separate module, like the ML module.

The second is to adapt this genetic algorithm as a trainer for ML models (like regressions, clusterers, neural nets).

We could also use both approaches: add GA Grid as a separate module and implement trainers based on GA Grid.

Regards,
Yury
Yury,
There is no rush. So if the community agrees to accept GA as a part of ML, then I would go for the second suggested approach.

Turik, what do you think about the second approach?

- Denis
Denis/Yury,
I am in favor of the second approach. Also, I envision GA Grid implemented as a separate package within the ML module.

If possible, it seems the initial priority would be merging GA Grid into the ML module such that it operates independently. Next, we can discuss and prioritize how other ML algorithms could best utilize GA Grid (i.e., implement trainers based on GA Grid, as you mentioned).

I am not very familiar with 'trainer' concepts in ML, but I don't see that as an issue.

Please advise.

Best,
Turik
Turik,
Basically we have two main concepts: the model and the trainer. Each machine learning method generates a model which can predict some result based on a learning dataset. The model is just a function, and model training is the minimization of a loss function, i.e. the difference between the model's predictions and the actual values.

The model trainer is the mechanism that minimizes the loss function. Usually gradient descent, or one of its variations such as SGD, is used for this purpose.

Here we could use a genetic algorithm to minimize the loss function instead. Using a genetic algorithm we could find optimal values for neuron weights in a neural network, cluster centers, regression coefficients, etc.

We have an API for both concepts: org.apache.ignite.ml.Model and org.apache.ignite.ml.Trainer. So if we want to use a genetic algorithm for model training, we should implement a specific trainer for each ML algorithm, such as linear regression, k-means, decision trees and others.

For example, let's take a look at linear regression. Currently we have OLS (Ordinary Least Squares) multiple linear regression. For this regression we will have OLSRegressionModel and at least two possible trainers: an analytical trainer (the solution of a matrix equation) and a gradient descent trainer (a numerical solution). We could also implement a GA trainer which would use GA Grid.

NB: this API is currently under development, and right now linear regression doesn't use the model and trainer API. We will refactor this algorithm in the near future.

Regards,
Yury
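To make the "training is minimization of a loss function" point concrete, here is a minimal, self-contained Java sketch. It is not Ignite or GA Grid code: the class name LossSketch and its methods are hypothetical, and mean squared error is assumed as the loss. Any trainer (analytical, gradient descent, or GA-based) would be searching for the coefficients that make this number as small as possible.

```java
import java.util.function.DoubleUnaryOperator;

// Illustrative only: these names are not part of org.apache.ignite.ml.
class LossSketch {
    /** Mean squared error between the model's predictions and the actual values. */
    static double meanSquaredError(DoubleUnaryOperator model, double[] xs, double[] ys) {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double err = model.applyAsDouble(xs[i]) - ys[i];
            sum += err * err;
        }
        return sum / xs.length;
    }

    /** A linear model y = slope * x + intercept, expressed as a plain function. */
    static DoubleUnaryOperator linear(double slope, double intercept) {
        return x -> slope * x + intercept;
    }

    public static void main(String[] args) {
        double[] xs = {0, 1, 2, 3};
        double[] ys = {1, 3, 5, 7}; // generated by y = 2x + 1

        // Two candidate models; a trainer prefers the one with the lower loss.
        System.out.println("loss(2x + 1) = " + meanSquaredError(linear(2.0, 1.0), xs, ys));
        System.out.println("loss(x)      = " + meanSquaredError(linear(1.0, 0.0), xs, ys));
    }
}
```

In a GA setting the same value would serve as the fitness score, with lower loss meaning a fitter candidate.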
Yuri, Turik,
Considering the concepts that ML is built around, it should be straightforward to adapt the genetic algorithms to it.

To be more specific, Genes and Chromosomes, which are the central building blocks of GA, turn out to be basic ML Models. All the standard genetic operations/algorithms, such as mutation, crossover and fitness calculation, correspond to the ML Trainer.

If my understanding is correct, then we should add GA as a package to the ML lib and implement all the basic Model and Trainer interfaces. Sounds reasonable?

- Denis
Denis,
Let me clarify.

Firstly, here a gene is a single model coefficient (a neuron weight, etc.), and a chromosome is the whole model representation.

Secondly, GA should be an implementation of the Trainer API for each ML algorithm, such as regression, clusterization, NNs, etc.

And last but not least, the genetic algorithm does not fit the Model API, so it shouldn't implement it. Generally, a genetic algorithm doesn't produce any predictive model.

Regards,
Yury
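The mapping described here (gene = one coefficient, chromosome = the full set of coefficients) can be sketched in a few lines of Java. This is an illustration under assumptions, not GA Grid's actual Gene or Chromosome classes: plain double arrays stand in for chromosomes, and the layout {intercept, slope} is chosen only for the example.

```java
import java.util.Arrays;

// Illustrative only: plain arrays stand in for GA Grid's Gene and Chromosome types.
class ChromosomeSketch {
    /** Single-point crossover: genes [0, point) come from parent a, the rest from parent b. */
    static double[] crossover(double[] a, double[] b, int point) {
        double[] child = Arrays.copyOf(b, b.length);
        System.arraycopy(a, 0, child, 0, point);
        return child;
    }

    /** Mutation: return a copy with one gene (coefficient) shifted by delta. */
    static double[] mutate(double[] chromosome, int geneIdx, double delta) {
        double[] copy = chromosome.clone();
        copy[geneIdx] += delta;
        return copy;
    }

    /** Decode a chromosome laid out as {intercept, slope} into a prediction for x. */
    static double predict(double[] chromosome, double x) {
        return chromosome[0] + chromosome[1] * x;
    }

    public static void main(String[] args) {
        double[] parentA = {1.0, 0.5};
        double[] parentB = {4.0, 2.0};

        double[] child = crossover(parentA, parentB, 1); // intercept from A, slope from B
        double[] mutated = mutate(child, 1, 0.25);       // nudge the slope gene

        System.out.println("child   = " + Arrays.toString(child));
        System.out.println("mutated = " + Arrays.toString(mutated));
        System.out.println("child predicts f(3) = " + predict(child, 3.0));
    }
}
```

The genetic operators only rearrange or perturb coefficients; the chromosome becomes a predictive model only once it is decoded, which is consistent with treating the GA as a trainer rather than as a model.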
Yury,
Please see inline.

> On Nov 7, 2017, at 12:11 PM, Yury Babak <[hidden email]> wrote:
>
> Firstly, here a gene is a single model coefficient (a neuron weight, etc.),
> and a chromosome is the whole model representation.

Sounds good. Actually, a chromosome can be seen as a model.

> Secondly, GA should be an implementation of the Trainer API for each ML
> algorithm, such as regression, clusterization, NNs, etc.

It's optional, right? Initially there should be a way to run the standard operations over chromosomes that are specific to GA only. Those operations/algorithms are crossover, fitness score and mutation. Do these three operations fit the Trainer API? We're putting aside extended support for regression, clusterization, etc.

> And last but not least, the genetic algorithm does not fit the Model API, so
> it shouldn't implement it. Generally, a genetic algorithm doesn't produce any
> predictive model.

I'm a bit confused here. Earlier we said that a chromosome is a model in terms of ML, and that it is all about providing concrete trainer implementations.

- Denis
Denis/Yury,
Upon review of your previous comments, please respond to my feedback:

1. I believe GA Grid can be implemented in a separate package within the ML library and operate independently of other algorithms, for use cases where only GA is required.

2. I am still not totally clear on the Trainer and Model relationship in the ML API.

a. Am I correct that the org.apache.ignite.ml.Trainer and org.apache.ignite.ml.Model API is not yet available because it is under development? Please advise when it is available.

b. Based on Yury's comments:

"...For both concepts we have API: org.apache.ignite.ml.Model and org.apache.ignite.ml.Trainer. So if we want to use a genetic algorithm for model training we should implement a specific trainer for each ML algorithm, like linear regression, k-means, decision trees and others.

For example, let's take a look at linear regression. Currently we have OLS (Ordinary Least Squares) multiple linear regression. For this regression we will have OLSRegressionModel and at least two possible trainers: an analytical trainer (a solution of a matrix equation, analytical solution) and gradient descent (numerical solution). And also we could implement a GA trainer which will use GA Grid..."

Does an org.apache.ignite.ml.Trainer generate org.apache.ignite.ml.Model instances?

Please advise and clarify accordingly.

Best,
Turik
Denis/Yury,
I updated my previous post slightly. For clarity, please respond to this post and disregard the previous one.

Upon review of your previous comments, please respond to my feedback:

1. I believe GA Grid can be implemented in a separate package within the ML library and operate independently of other algorithms, for use cases where only GA is required.

2. I am still not totally clear on the Trainer and Model relationship in the ML API.

a. Since org.apache.ignite.ml.Trainer and org.apache.ignite.ml.Model are not yet available, when are they planned to be available? Please advise.

b. Based on Yury's comments:

"...For both concepts we have API: org.apache.ignite.ml.Model and org.apache.ignite.ml.Trainer. So if we want to use a genetic algorithm for model training we should implement a specific trainer for each ML algorithm, like linear regression, k-means, decision trees and others.

For example, let's take a look at linear regression. Currently we have OLS (Ordinary Least Squares) multiple linear regression. For this regression we will have OLSRegressionModel and at least two possible trainers: an analytical trainer (a solution of a matrix equation, analytical solution) and gradient descent (numerical solution). And also we could implement a GA trainer which will use GA Grid..."

Does an org.apache.ignite.ml.Trainer generate org.apache.ignite.ml.Model instances? This part is still not clear.

Please advise and clarify accordingly.

Best,
Turik
Turik,
1) Yes, that's correct.

2.a) The Model API is available. The Trainer API is in a PR (https://github.com/apache/ignite/pull/2936) which should be merged today or tomorrow.

2.b) Yes, a Trainer generates a Model. Here is the Trainer interface:

    public interface Trainer<M extends Model, T> {
        public M train(T data);
    }

Regards,
Yury
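As a rough illustration of how a GA could sit behind a Trainer of this shape, here is a self-contained Java sketch. Everything in it is hypothetical: the nested Model and Trainer interfaces are local stand-ins so the snippet compiles on its own (the real org.apache.ignite.ml.Model may look different), and GaTrainer, LinearModel, the population size and the mutation scale are invented for the example. It does not use GA Grid or any Ignite API, and it runs on a single JVM rather than on a cluster.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch; none of these types are GA Grid or Ignite classes.
class GaTrainerSketch {
    /** Stand-in for org.apache.ignite.ml.Model (assumed shape). */
    interface Model<T, V> { V predict(T input); }

    /** Same shape as the Trainer interface quoted in the message above. */
    interface Trainer<M extends Model<?, ?>, T> { M train(T data); }

    /** y = intercept + slope * x; the two coefficients are the chromosome's genes. */
    static final class LinearModel implements Model<Double, Double> {
        final double intercept, slope;
        LinearModel(double intercept, double slope) { this.intercept = intercept; this.slope = slope; }
        @Override public Double predict(Double x) { return intercept + slope * x; }
    }

    /** Trains on rows of {x, y}; mean squared error plays the role of the fitness score. */
    static final class GaTrainer implements Trainer<LinearModel, double[][]> {
        private final int populationSize, generations;
        private final long seed;

        GaTrainer(int populationSize, int generations, long seed) {
            this.populationSize = populationSize;
            this.generations = generations;
            this.seed = seed;
        }

        static double loss(double[] genes, double[][] data) {
            double sum = 0.0;
            for (double[] row : data) {
                double err = genes[0] + genes[1] * row[0] - row[1];
                sum += err * err;
            }
            return sum / data.length;
        }

        private static void sortByLoss(double[][] pop, double[][] data) {
            Arrays.sort(pop, (a, b) -> Double.compare(loss(a, data), loss(b, data)));
        }

        @Override public LinearModel train(double[][] data) {
            Random rnd = new Random(seed);
            double[][] pop = new double[populationSize][];
            for (int i = 0; i < populationSize; i++)
                pop[i] = new double[] {rnd.nextDouble() * 10 - 5, rnd.nextDouble() * 10 - 5};

            for (int g = 0; g < generations; g++) {
                sortByLoss(pop, data); // best chromosome first
                double[][] next = new double[populationSize][];
                next[0] = pop[0]; // elitism: the best chromosome always survives unchanged
                for (int i = 1; i < populationSize; i++) {
                    // Parents are drawn from the better half of the population.
                    double[] a = pop[rnd.nextInt(populationSize / 2)];
                    double[] b = pop[rnd.nextInt(populationSize / 2)];
                    // Crossover: intercept gene from one parent, slope gene from the other.
                    double[] child = {a[0], b[1]};
                    // Mutation: nudge one randomly chosen gene by Gaussian noise.
                    child[rnd.nextInt(2)] += rnd.nextGaussian() * 0.1;
                    next[i] = child;
                }
                pop = next;
            }
            sortByLoss(pop, data);
            return new LinearModel(pop[0][0], pop[0][1]);
        }
    }

    public static void main(String[] args) {
        double[][] data = {{0, 1}, {1, 3}, {2, 5}, {3, 7}}; // samples of y = 2x + 1
        LinearModel model = new GaTrainer(40, 200, 42L).train(data);
        System.out.println("intercept = " + model.intercept + ", slope = " + model.slope);
        System.out.println("prediction for x = 4: " + model.predict(4.0));
    }
}
```

Because the best chromosome is copied unchanged into every new generation, the loss of the returned model can never be worse than the best loss in the initial population. How close it gets to the exact coefficients depends on the population size, the number of generations and the mutation scale, all of which are arbitrary choices here.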
Yury,
Thanks for the feedback. I reviewed the Trainer API at:

https://github.com/apache/ignite/blob/db7697b17cf6eb94754edb2b5e200655a3610dc1/modules/ml/src/main/java/org/apache/ignite/ml/Trainer.java

I also recommend the approach of implementing new "GA trainers" that use GA Grid.

Denis/Yury,

Please advise on next steps based on the most recent posts.

Best,
Turik
Turik,
From my point of view, our first step is to add GA Grid as is into the package org.apache.ignite.ml.genetic in the ML module.

It shouldn't be a problem, but before this we should check that GA Grid conforms to our code style.

So please prepare a pull request with GA Grid.

Also, if nobody objects, I will create a ticket in our JIRA for this first step.

We also have a few formal steps; I hope Denis can help with them.

Regards,
Yury
In addition to that, we need to push GA Grid through the IP clearance process:

http://incubator.apache.org/ip-clearance/ip-clearance-template.html

This is what the IP clearance form looked like, and how the process went, when GridGain was donating Ignite persistence:

http://incubator.apache.org/ip-clearance/persistent-distributed-store-ignite.html

I'll help with the formalities.

- Denis
Denis,
OK. It's not clear what stage of this process we are in.

Do I need to fill out the IP clearance form here?

http://incubator.apache.org/ip-clearance/ip-clearance-template.html

If so, I will simply model it on what was done for the Ignite Persistent Store.

Also, what about the software grant form? When will it be provided?

Should I follow the steps/guidelines here for the pull request?

https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute#HowtoContribute-
(Section: 1. Create GitHub pull-request)

Please advise on the general order of steps in this whole process.

Regards,
Turik
Turik,
To make the pull request, you should perform the following steps:

1) Create a JIRA account, in case you don't have one (https://issues.apache.org/jira).
1.1) Write to the dev list and ask for contributor permissions.
2) Assign yourself the ticket which I've created for you (https://issues.apache.org/jira/browse/IGNITE-6899).
3) Clone Apache Ignite and create a new branch from master; for example, the branch name could be ignite-6899.
4) Add the whole of GA Grid to the package org.apache.ignite.ml.genetic.
5) Add some tests for GA Grid.
6) Add some examples to the examples module (your current tests are good for this; just move them to the examples module).
7) Create a pull request from your branch to our master.

After those steps we will perform a code review, and after that we can merge the PR.

Regards,
Yury
Also, please check the coding guidelines:

https://cwiki.apache.org/confluence/display/IGNITE/Coding+Guidelines

Code in the PR should conform to these guidelines.

Regards,
Yury