Guys,
Since the first version of the Ignite ML module was merged into Ignite 2.0, we want to discuss our next steps.

Currently we are thinking about 3 big areas to explore:

1) Regression and clustering algorithms.
2) Deep Learning/Neural Networks stuff.
3) DSL/scripting support.

Suggestions/thoughts about these topics (or something else which you think we have missed) are welcome here as well as in IGNITE-5029.

Some details about the above topics:

* A first draft of ordinary least squares linear regression is in progress (by Artem, IGNITE-5012).
* Deep learning/other NN stuff: currently Artem is investigating existing frameworks like TensorFlow/Encog/etc. to find out if we can integrate with them somehow, or at least define the scope/ideas for the API of the DL/NN functionality we need.
* We are also thinking about using Java 8 Nashorn as a script engine and about the possibility of building an R-like DSL (mostly by me).

Thanks,
Yury Babak. |
Sounds like a good plan to tackle for 1.0 GA release sometime in the future.
-- Nikita Ivanov |
Hi,
excellent job so far! What about classification algorithms? If everyone agrees, let's start by adding logistic regression to the list: https://issues.apache.org/jira/browse/IGNITE-5059

Best regards,
Vladisav |
Sure. It would be great!
|
I think this

> 3) DSL/scripting support.

makes a lot of sense. In all honesty, over a number of experiences with different approaches in this area, I have found that Groovy has the most convenient and flexible way of building DSLs. E.g. Apache Camel uses it for its own DSL, as do many other projects, including Apache Bigtop. As a quick overview I suggest looking at http://docs.groovy-lang.org/docs/latest/html/documentation/core-domain-specific-languages.html

And it seems like a relatively simple exercise to add the support to Ignite, considering its Java foundation (and, thank you Dao, not Scala!). Seriously.

Cos

--
Take care,
Konstantin (Cos) Boudnik
2CAC 8312 4870 D885 8616 6115 220F 6980 1F27 E622

Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the views of any company the author might be affiliated with at the moment of writing. |
First of all thanks for this advice.
And a DSL/scripting update: these are actually two separate features. The first is to provide scripting from some web UI. I think we could use the web console as the UI part and JSR 223 for the scripting itself, with Nashorn for JS and Jython for Python. The second feature is the DSL. I've created IGNITE-5065 for these features.

Thanks,
Yury. |
It is preferable to avoid hard bindings to specific scripting engines. If a user wants to plug in Groovy, we should allow it. As for the DSL, I believe it is a waste of time. A few years ago it was a somewhat popular idea to create DSLs for everything, but no one actually wants to learn new quirky languages, so it never worked out. All in all, the best choice is to have a good API that can be conveniently used from any scripting language, plus the ability to plug in any scripting engine the user likes.

Sergi |
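The "no hard bindings" point above can be sketched with plain JSR 223: the engine is looked up by name at run time, so Nashorn, Groovy, Jython or anything else works as long as its JSR 223 implementation is on the classpath. This is only a minimal illustration; `PluggableScripting`, `findEngine` and `eval` are hypothetical names and not part of Ignite, and the default engine name `"nashorn"` is just an assumption taken from the thread (Nashorn ships with Java 8 but is no longer bundled with recent JDKs).

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

/** Hypothetical helper (not an Ignite class): resolves a JSR 223 engine by name at run time. */
final class PluggableScripting {
    private PluggableScripting() {}

    /** Returns the engine registered under the given name, or empty if none is available. */
    static Optional<ScriptEngine> findEngine(String name) {
        // getEngineByName returns null when no matching engine is found.
        return Optional.ofNullable(new ScriptEngineManager().getEngineByName(name));
    }

    /** Evaluates a script with whichever engine the caller asked for. */
    static Object eval(String engineName, String script) throws ScriptException {
        ScriptEngine engine = findEngine(engineName)
            .orElseThrow(() -> new IllegalArgumentException("No JSR 223 engine named: " + engineName));
        return engine.eval(script);
    }

    public static void main(String[] args) throws ScriptException {
        // The engine name is configuration, not code: pass "groovy", "python", etc.
        String name = args.length > 0 ? args[0] : "nashorn";
        if (findEngine(name).isPresent())
            System.out.println(eval(name, "1 + 2"));
        else
            System.out.println("Engine '" + name + "' is not available on this JVM");
    }
}
```

With this shape, supporting another language is a matter of adding its engine jar and passing a different name; nothing in the calling code changes.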
Yury,
Thanks for driving this. From my side I would suggest looking at Spark MLlib and borrowing the most frequently used algorithms from there. You've already mentioned regression and clustering algorithms; however, it's reasonable to support classification and decision trees as well. http://spark.apache.org/docs/latest/ml-guide.html

Next, according to my observations, the Ignite ML lib has to support Ruby and Python if we wish the lib to be used by researchers and scientists.

Finally, we have to find a better way to integrate the Java 8 based Ignite ML with the rest of the platform. Presently, it's a pain for the Ignite build and release processes to treat Ignite ML differently. I propose that we come up with a solution by the time of Apache Ignite 2.1.

-- Denis |
Guys,
First of all, great job on the contribution! I took a brief look at the source of the ML lib and have a comment regarding the SparseDistributedMatrixStorage. It looks like it will create a new cache for every new matrix. This sounds a little bit excessive to me, because (at least in my understanding) new matrix creation will be a typical operation for many of the ML algorithms, so it should be a fast operation. Cache creation, on the other hand, requires a discovery ring message and then a full exchange cycle, which is not too fast. Moreover, when a new cache is being created, all cache operations on other caches are also blocked (this is because we have cross-cache transactions and the need to properly synchronize updates and rebalancing).

I would suggest creating a cache per some more long-living entity and modifying the storage so that multiple matrices are stored in the same cache. Ignite data structures (Set, Queue, etc.) are implemented in exactly the same way. In this case, you will have very fast matrix creation which will not block other operations.

--AG |
Alexey,
Thanks for this advice. We will refactor this matrix. I think that one dedicated cache for ML sparse distributed matrices will be a good solution. In that case we need an implementation of giveNextCacheKeySet() logic for each new matrix instead of new cache creation. |
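The shared-cache idea discussed above can be sketched as follows: every cell is stored under a compound key that includes a matrix ID, so creating a matrix only means allocating a new ID. This is a single-JVM illustration under stated assumptions, not the actual refactoring: a `ConcurrentHashMap` stands in for the one dedicated cache, an `AtomicLong` stands in for whatever cluster-wide counter would be used, and `SharedMatrixStore`, `CellKey` and all method names are hypothetical.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: many sparse matrices stored in one shared key-value store. */
final class SharedMatrixStore {
    /** Compound key: which matrix, and which cell of that matrix. */
    static final class CellKey {
        final long matrixId;
        final int row;
        final int col;

        CellKey(long matrixId, int row, int col) {
            this.matrixId = matrixId;
            this.row = row;
            this.col = col;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof CellKey)) return false;
            CellKey k = (CellKey) o;
            return matrixId == k.matrixId && row == k.row && col == k.col;
        }

        @Override public int hashCode() {
            return Objects.hash(matrixId, row, col);
        }
    }

    /** Stand-in for the single dedicated cache (assumption: a local map instead of a real cache). */
    private final Map<CellKey, Double> cells = new ConcurrentHashMap<>();

    /** Stand-in for a cluster-wide ID generator. */
    private final AtomicLong nextMatrixId = new AtomicLong();

    /** "Creating" a matrix is just allocating an ID; no new cache is started. */
    long newMatrix() {
        return nextMatrixId.getAndIncrement();
    }

    /** Stores a cell; zeros are not stored because the matrix is sparse. */
    void set(long matrixId, int row, int col, double val) {
        CellKey key = new CellKey(matrixId, row, col);
        if (val == 0.0) cells.remove(key);
        else cells.put(key, val);
    }

    /** Reads a cell; absent cells are treated as zero. */
    double get(long matrixId, int row, int col) {
        return cells.getOrDefault(new CellKey(matrixId, row, col), 0.0);
    }

    /** Removes all cells of one matrix; other matrices in the store are untouched. */
    void destroy(long matrixId) {
        cells.keySet().removeIf(k -> k.matrixId == matrixId);
    }
}
```

The trade-off is that destroying a matrix becomes a removal of its keys rather than a cache drop, while creation no longer triggers any cluster-wide coordination.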
Guys,
Here's an update with some details. We expect several new features/bugfixes:

* Ordinary least squares (OLS) linear regression, IGNITE-5012.
* OLS examples, IGNITE-5112.
* Logistic regression, IGNITE-5059.
* K-means clustering, IGNITE-5113.
* Refactoring SparseDistributedMatrix, IGNITE-5109.
* Parallel matrix multiply/plus, IGNITE-5114.
* Scripting support, IGNITE-5065.

We are also looking at the following topics:

* Further NN/DL investigation.
* Downpour SGD.
* Clustering visualization (maybe we don't need it).

Please feel free to submit any other suggestions.

Thanks,
Yury Babak. |
In the initial implementation of foldMap/map for sparse matrices we considered only nonzero elements. This is a problem because it conflicts with the behaviour of the other matrix implementations. For example, if we run map (+1) over a sparse identity matrix, we get not a matrix with 2s on the diagonal and all other elements equal to 1, but a matrix with 2s on the diagonal and 0s in the other positions. Also, if we fold a sparse identity matrix with min (by fold(f) I mean foldMap(f, identity)), we get 1 instead of 0. I've started fixing this behaviour in IGNITE-5102. The basic idea for fixing map is to map the default element along with the nondefault elements (the default element will no longer necessarily be zero). For fold, the idea is to fold over the nondefault elements first and then fold in the default element n times (n is the count of default elements). Strictly speaking, this approach requires the accumulator function to be commutative, but that does not seem to conflict with the javadoc for foldMap (it does not specify any folding order explicitly, so no order is guaranteed, and therefore the accumulator has to be commutative for the operation to be well defined). Any objections to this?
|
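The map/fold fix described above can be sketched on a toy sparse matrix that stores only the elements differing from a default value. This is an illustration of the idea only, not the IGNITE-5102 code: `SparseMatrixSketch` and its methods are hypothetical, and a plain `HashMap` stands in for the real storage.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical toy sparse matrix: only elements that differ from the default are stored. */
final class SparseMatrixSketch {
    private final int rows;
    private final int cols;
    private final double dflt;                              // value of every element that is not stored
    private final Map<Long, Double> cells = new HashMap<>(); // linear index -> nondefault value

    SparseMatrixSketch(int rows, int cols, double dflt) {
        this.rows = rows;
        this.cols = cols;
        this.dflt = dflt;
    }

    static SparseMatrixSketch identity(int n) {
        SparseMatrixSketch m = new SparseMatrixSketch(n, n, 0.0);
        for (int i = 0; i < n; i++) m.set(i, i, 1.0);
        return m;
    }

    private long idx(int r, int c) {
        return (long) r * cols + c;
    }

    double get(int r, int c) {
        return cells.getOrDefault(idx(r, c), dflt);
    }

    void set(int r, int c, double v) {
        if (v == dflt) cells.remove(idx(r, c));
        else cells.put(idx(r, c), v);
    }

    /** Maps the default element along with the stored ones, so the new default may be nonzero. */
    SparseMatrixSketch map(DoubleUnaryOperator f) {
        SparseMatrixSketch res = new SparseMatrixSketch(rows, cols, f.applyAsDouble(dflt));
        cells.forEach((k, v) -> res.set((int) (k / cols), (int) (k % cols), f.applyAsDouble(v)));
        return res;
    }

    /** Folds the stored elements first, then folds in the default element once per unstored cell. */
    double fold(DoubleBinaryOperator acc, double zero) {
        double res = zero;
        for (double v : cells.values()) res = acc.applyAsDouble(res, v);
        long n = (long) rows * cols - cells.size(); // count of default elements
        for (long i = 0; i < n; i++) res = acc.applyAsDouble(res, dflt);
        return res;
    }

    public static void main(String[] args) {
        SparseMatrixSketch plusOne = identity(3).map(x -> x + 1);
        System.out.println(plusOne.get(0, 0)); // 2.0 on the diagonal
        System.out.println(plusOne.get(0, 1)); // 1.0 elsewhere, not 0.0
        System.out.println(identity(3).fold(Math::min, Double.POSITIVE_INFINITY)); // 0.0, not 1.0
    }
}
```

Because the elements are visited in storage order rather than row-major order, the result is only well defined for accumulators that do not depend on order, which matches the commutativity requirement raised in the message.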
Update about DNNs: there is the DL4J framework, which currently has a module for integration with Spark. Maybe we can write a module for integration with Apache Ignite and get its rich functionality rather easily. I am still investigating it, but it looks promising. If someone has thoughts on it, please share.
Regards, Artem. |
Update:
Here's the last batch of ML related tickets for the next few months:

* Add Stream API support to Ignite ML matrices, IGNITE-5216.
* Gradient descent, IGNITE-5217.
* Decision trees, IGNITE-5218.
* Generalization of cost function for Linear Regression, IGNITE-5219.
* Investigate possibility of integrating with dl4j, IGNITE-5221.

Regards,
Yury Babak. |
Looks good, thanks for the update!
Yuri, what do you think can be done in the 2.1 time frame? This release should be rolled out around mid-June.

-- Denis |
Hi,
From my point of view it's IGNITE-5113 (k-means), IGNITE-5109 (already done), IGNITE-5114 (optimized distributed matrix arithmetic) and IGNITE-5102 (fix for fold/map behaviour) for sure, and possibly IGNITE-5059 (logistic regression) if we have enough time before code freeze.

Regards,
Yury |