Ignite ML, next steps (IGNITE-5029)

Yuriy Babak
Guys,

Since the first version of the Ignite ML module was merged into Ignite 2.0, we want to discuss our next steps.

Currently we are thinking about 3 big areas to explore:

1) Regression and clustering algorithms.
2) Deep Learning/Neural Networks stuff.
3) DSL/scripting support.

Suggestions/thoughts about these topics (or something else which you think we have missed) are welcome here as well as in IGNITE-5029.

Some details about the above topics:

* A first draft of ordinary least squares linear regression is in progress (by Artem, IGNITE-5012).
* Deep learning/other NN stuff: Artem is currently investigating existing frameworks like TensorFlow/Encog/etc. to find out if we can integrate with them somehow, or at least define the scope/ideas for the API of the DL/NN functionality we need.
* We are also thinking about using Java 8 Nashorn as the script engine and about the possibility of building an R-like DSL (mostly by me).

Thanks,
Yury Babak.


Re: Ignite ML, next steps (IGNITE-5029)

nivanov
Sounds like a good plan to tackle for a 1.0 GA release sometime in the future.

--
Nikita Ivanov


On Fri, Apr 21, 2017 at 9:43 AM, Yury Babak <[hidden email]> wrote:

> Guys,
>
> Since the first version of the Ignite ML module was merged into Ignite 2.0,
> we want to discuss our next steps.
>
> Currently we are thinking about 3 big areas to explore:
>
> 1) Regression and clustering algorithms.
> 2) Deep Learning/Neural Networks stuff.
> 3) DSL/scripting support.
>
> Suggestions/thoughts about these topics (or something else which you think
> we have missed) are welcome here as well as in IGNITE-5029.
>
> Some details about the above topics:
>
> * A first draft of ordinary least squares linear regression is in progress
> (by Artem, IGNITE-5012).
> * Deep learning/other NN stuff: Artem is currently investigating existing
> frameworks like TensorFlow/Encog/etc. to find out if we can integrate with
> them somehow, or at least define the scope/ideas for the API of the DL/NN
> functionality we need.
> * We are also thinking about using Java 8 Nashorn as the script engine and
> about the possibility of building an R-like DSL (mostly by me).
>
> Thanks,
> Yury Babak.

Re: Ignite ML, next steps (IGNITE-5029)

Vladisav Jelisavcic
Hi,

excellent job so far!

What about classification algorithms?
If everyone agrees, let's start by adding logistic regression to the list:
https://issues.apache.org/jira/browse/IGNITE-5059


Best regards,
Vladisav


On Fri, Apr 21, 2017 at 10:00 PM, Nikita Ivanov <[hidden email]> wrote:

> Sounds like a good plan to tackle for a 1.0 GA release sometime in the
> future.
>
> --
> Nikita Ivanov

Re: Ignite ML, next steps (IGNITE-5029)

ArtemM
Sure. It would be great!

Re: Ignite ML, next steps (IGNITE-5029)

Konstantin Boudnik-2
In reply to this post by Yuriy Babak
I think this
> 3) DSL/scripting support.
makes a lot of sense. In all honesty, over a number of experiences
with different approaches in the area I found that Groovy has the most
convenient and flexible way of building DSLs. E.g. Apache Camel uses
it for their own DSL as well as many other projects, including Apache
Bigtop.

As a quick overview, I suggest looking at
  http://docs.groovy-lang.org/docs/latest/html/documentation/core-domain-specific-languages.html

And it seems like a relatively simple exercise to add the support to
Ignite, considering its Java foundation (and, thank you Dao, not
Scala!). Seriously.

Cos

--
  Take care,
Konstantin (Cos) Boudnik
2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622

Disclaimer: Opinions expressed in this email are those of the author,
and do not necessarily represent the views of any company the author
might be affiliated with at the moment of writing.



Re: Ignite ML, next steps (IGNITE-5029)

Yuriy Babak
First of all thanks for this advice.

And a DSL/scripting update:

Actually these are two separate features. The first is to provide scripting from some web UI. I think we could use the web console as the UI part and JSR 223 for the scripting itself, with Nashorn for JS and Jython for Python.

The second feature is the DSL.

For those features I've created IGNITE-5065.

Thanks,
Yury.

Re: Ignite ML, next steps (IGNITE-5029)

Sergi
It is preferable to avoid hard bindings to specific scripting engines. If
a user wants to plug in Groovy, we should allow it.

As for the DSL, I believe it is a waste of time. A few years ago it was a
somewhat popular idea to create DSLs for everything, but no one actually
wants to learn new quirky languages, so it never worked out.

All in all, the best choice is to have a good API that can be conveniently
used from any scripting language, plus the ability to plug in any scripting
engine the user likes.
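
A minimal sketch of what "no hard binding" could look like with JSR 223 (javax.script): the engine is looked up by name at runtime, so Nashorn, Groovy or Jython work as long as the corresponding engine jar is on the classpath. The class and method names (PluggableScripting, findEngine, eval) are made up for illustration and are not part of Ignite.

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

/** Hypothetical helper: resolves a script engine by name instead of hard-coding one. */
class PluggableScripting {
    /** Discovers every JSR 223 engine visible on the classpath. */
    private static final ScriptEngineManager MANAGER = new ScriptEngineManager();

    /** Returns the engine registered under the given name, if any is installed. */
    static Optional<ScriptEngine> findEngine(String name) {
        return Optional.ofNullable(MANAGER.getEngineByName(name));
    }

    /** Evaluates a script with whichever engine the user asked for. */
    static Object eval(String engineName, String script) throws ScriptException {
        ScriptEngine engine = findEngine(engineName).orElseThrow(
            () -> new IllegalArgumentException("No script engine named: " + engineName));

        return engine.eval(script);
    }

    public static void main(String[] args) {
        // Lists the engines that happen to be installed; the result depends on the JDK and classpath.
        for (ScriptEngineFactory f : MANAGER.getEngineFactories())
            System.out.println(f.getEngineName() + " -> " + f.getNames());
    }
}
```

Which engines are actually available depends on the JDK version and on what the user adds to the classpath, which is the point: the API only sees the engine name.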

Sergi

2017-04-25 20:31 GMT+03:00 Yury Babak <[hidden email]>:

> First of all thanks for this advice.
>
> And a DSL/scripting update:
>
> Actually these are two separate features. The first is to provide scripting
> from some web UI. I think we could use the web console as the UI part and
> JSR 223 for the scripting itself, with Nashorn for JS and Jython for Python.
>
> The second feature is the DSL.
>
> For those features I've created IGNITE-5065.
>
> Thanks,
> Yury.

Re: Ignite ML, next steps (IGNITE-5029)

dmagda
In reply to this post by Yuriy Babak
Yury,

Thanks for driving this. From my side, I would suggest looking at Spark MLlib and borrowing the most frequently used algorithms from there. You've already mentioned regression and clustering algorithms; however, it's reasonable to support classification and decision trees as well.
http://spark.apache.org/docs/latest/ml-guide.html

Next, according to my observations, Ignite ML Lib has to support Ruby and Python if we wish the lib to be used by researchers and scientists.

Finally, we have to find a better way to integrate the Java 8 based Ignite ML with the rest of the platform. Presently, it's a pain for the Ignite build and release processes to treat Ignite ML differently. I propose coming up with a solution by the time of Apache Ignite 2.1.


Denis


Re: Ignite ML, next steps (IGNITE-5029)

Alexey Goncharuk
Guys,

First of all, great job on the contribution!

I took a brief look at the source of the ML lib and have a comment regarding
the SparseDistributedMatrixStorage. It looks like it will create a new
cache for every new matrix. This sounds a little bit excessive to me,
because (at least in my understanding) new matrix creation will be a
typical operation for many of the ML algorithms, so this should be a fast
operation. Cache creation, on the other hand, requires a discovery ring
message and then a full exchange cycle, which is not too fast. Moreover, when
a new cache is being created, all cache operations on other caches are also
blocked (this is because we have cross-cache transactions and the need to
properly synchronize updates and rebalancing).

I would suggest creating a cache per some longer-lived entity and
modifying the storage so that multiple matrices are stored in the same cache.
Ignite data structures (Set, Queue, etc.) are implemented in exactly the same
way. In this case, you will have very fast matrix creation, which will not
block other operations.
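
To make the suggestion concrete, here is a rough sketch of the "many matrices in one cache" idea. It is only an illustration: a ConcurrentHashMap stands in for the single long-lived Ignite cache, and MatrixEntryKey and SharedMatrixStore are hypothetical names, not existing Ignite classes.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical compound key: the matrix id namespaces the (row, col) coordinates. */
final class MatrixEntryKey {
    final long matrixId;
    final int row;
    final int col;

    MatrixEntryKey(long matrixId, int row, int col) {
        this.matrixId = matrixId;
        this.row = row;
        this.col = col;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof MatrixEntryKey))
            return false;

        MatrixEntryKey k = (MatrixEntryKey) o;

        return matrixId == k.matrixId && row == k.row && col == k.col;
    }

    @Override public int hashCode() {
        return Objects.hash(matrixId, row, col);
    }
}

/** Stand-in for one shared cache; in Ignite this map would be a single distributed cache. */
class SharedMatrixStore {
    private final Map<MatrixEntryKey, Double> cache = new ConcurrentHashMap<>();

    /** Stores a value; zeros are not stored, which keeps the storage sparse. */
    void set(long matrixId, int row, int col, double val) {
        MatrixEntryKey key = new MatrixEntryKey(matrixId, row, col);

        if (val == 0.0)
            cache.remove(key);
        else
            cache.put(key, val);
    }

    /** Reads a value; missing entries are treated as zero. */
    double get(long matrixId, int row, int col) {
        return cache.getOrDefault(new MatrixEntryKey(matrixId, row, col), 0.0);
    }

    public static void main(String[] args) {
        SharedMatrixStore store = new SharedMatrixStore();

        // Two different matrices share the store without touching each other's entries.
        store.set(1L, 0, 0, 5.0);
        store.set(2L, 0, 0, 7.0);

        System.out.println(store.get(1L, 0, 0) + " " + store.get(2L, 0, 0));
    }
}
```

With this layout, creating a new matrix only means picking a new matrix id; no cache has to be started.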

--AG

2017-04-26 7:35 GMT+03:00 Denis Magda <[hidden email]>:

> Yury,
>
> Thanks for driving this. From my side, I would suggest looking at Spark
> MLlib and borrowing the most frequently used algorithms from there. You've
> already mentioned regression and clustering algorithms; however, it's
> reasonable to support classification and decision trees as well.
> http://spark.apache.org/docs/latest/ml-guide.html
>
> Next, according to my observations, Ignite ML Lib has to support Ruby and
> Python if we wish the lib to be used by researchers and scientists.
>
> Finally, we have to find a better way to integrate the Java 8 based
> Ignite ML with the rest of the platform. Presently, it's a pain for the
> Ignite build and release processes to treat Ignite ML differently. I
> propose coming up with a solution by the time of Apache Ignite 2.1.
>
> —
> Denis

Re: Ignite ML, next steps (IGNITE-5029)

Yuriy Babak
Alexey,

Thanks for this advice. We will refactor this matrix. I think that one dedicated cache for ML sparse distributed matrices will be a good solution.

In that case we need to implement giveNextCacheKeySet() logic for each new matrix instead of creating a new cache.
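
Just to illustrate the direction (this is a guess at the shape, not the actual giveNextCacheKeySet() implementation): each new matrix could reserve a unique id that prefixes all of its keys in the shared cache. MatrixIdAllocator is a hypothetical name, and the local AtomicLong is only a stand-in; on a real cluster the counter would have to be cluster-wide, for example something like Ignite's atomic sequence, which is an assumption on my part.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical allocator: a new matrix reserves an id instead of starting a new cache. */
class MatrixIdAllocator {
    /** Local counter used as a stand-in for a cluster-wide sequence. */
    private final AtomicLong next = new AtomicLong();

    /** Returns an id that no earlier call on this allocator has returned. */
    long nextMatrixId() {
        return next.getAndIncrement();
    }

    public static void main(String[] args) {
        MatrixIdAllocator ids = new MatrixIdAllocator();

        long a = ids.nextMatrixId();
        long b = ids.nextMatrixId();

        // The two matrices get distinct key namespaces in the shared cache.
        System.out.println(a + " " + b);
    }
}
```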

Re: Ignite ML, next steps (IGNITE-5029)

Yuriy Babak
Update:

I've created a ticket for this: IGNITE-5109.

Regards,
Yury Babak.

Re: Ignite ML, next steps (IGNITE-5029)

Yuriy Babak
In reply to this post by Yuriy Babak
Guys,

Here's an update with some details.

We expect several new features/bugfixes:

* Ordinary least squares (OLS) linear regression, IGNITE-5012.
* OLS examples, IGNITE-5112.
* Logistic regression, IGNITE-5059.
* K-means clustering, IGNITE-5113.
* Refactoring SparseDistributedMatrix, IGNITE-5109.
* Parallel matrix multiply/plus, IGNITE-5114.
* Scripting support, IGNITE-5065.

We are also looking at the following topics:

* Further NN/DL investigation.
* Downpour SGD.
* Clustering visualization (maybe we don't need it).

Please feel free to submit any other suggestions.

Thanks,
Yury Babak.

Re: Ignite ML, next steps (IGNITE-5029)

ArtemM
In reply to this post by Yuriy Babak
In the initial implementation of foldMap/map for sparse matrices we considered only nonzero elements. This is a problem because it conflicts with the behaviour of the other matrix implementations. For example, if we run map (+1) over a sparse identity matrix, we get a matrix with 2s on the diagonal and 0s everywhere else, rather than one with 2s on the diagonal and 1s everywhere else. Likewise, if we fold a sparse identity matrix with min (by fold(f) I mean foldMap(f, identity)), we get 1 instead of 0.

I've started fixing this behaviour in IGNITE-5102. The basic idea for map is to map the default element along with the non-default elements (the default element will no longer necessarily be zero). For fold, the idea is to fold over the non-default elements first and then fold in the default element n times, where n is the number of default elements.

This approach requires the accumulator function to be commutative, but that does not seem to conflict with the javadoc for foldMap: it does not specify any folding order explicitly, so no order is guaranteed, and for the operation to be well defined the accumulator has to be commutative anyway. Any objections?
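
The idea can be sketched on a toy local matrix. This is not the IGNITE-5102 code: SparseSketch and its methods are hypothetical, the storage is a plain HashMap, and fold here takes an explicit start value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sparse matrix that stores only entries differing from a default element. */
class SparseSketch {
    final int rows;
    final int cols;
    final double dflt;
    final Map<Long, Double> nonDefault = new HashMap<>();

    SparseSketch(int rows, int cols, double dflt) {
        this.rows = rows;
        this.cols = cols;
        this.dflt = dflt;
    }

    /** Builds an n x n identity matrix with 0 as the default element. */
    static SparseSketch identity(int n) {
        SparseSketch m = new SparseSketch(n, n, 0.0);

        for (int i = 0; i < n; i++)
            m.set(i, i, 1.0);

        return m;
    }

    void set(int r, int c, double v) {
        long k = (long) r * cols + c;

        if (v == dflt)
            nonDefault.remove(k);
        else
            nonDefault.put(k, v);
    }

    double get(int r, int c) {
        return nonDefault.getOrDefault((long) r * cols + c, dflt);
    }

    /** map: applies f to the default element as well as to every stored element. */
    SparseSketch map(DoubleUnaryOperator f) {
        SparseSketch res = new SparseSketch(rows, cols, f.applyAsDouble(dflt));

        nonDefault.forEach((k, v) ->
            res.set((int) (k / cols), (int) (k % cols), f.applyAsDouble(v)));

        return res;
    }

    /** fold: folds the stored elements, then folds in the default once per default position. */
    double fold(DoubleBinaryOperator acc, double start) {
        double res = start;

        for (double v : nonDefault.values())
            res = acc.applyAsDouble(res, v);

        long defaults = (long) rows * cols - nonDefault.size();

        for (long i = 0; i < defaults; i++)
            res = acc.applyAsDouble(res, dflt);

        return res;
    }

    public static void main(String[] args) {
        SparseSketch id = identity(3);
        SparseSketch plusOne = id.map(x -> x + 1);

        // Diagonal entry and off-diagonal entry after map(+1).
        System.out.println(plusOne.get(0, 0) + " " + plusOne.get(0, 1));

        // Minimum over all elements, including the implicit zeros.
        System.out.println(id.fold(Math::min, Double.POSITIVE_INFINITY));
    }
}
```

Note that folding in the default element one position at a time costs as many accumulator calls as there are default positions; the sketch keeps it that way only because it mirrors the description above.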

Re: Ignite ML, next steps (IGNITE-5029)

ArtemM
In reply to this post by Yuriy Babak
Update about DNNs: there is the DL4J framework, which currently has a module for integration with Spark. Maybe we can write a module for integration with Apache Ignite and get its rich functionality rather easily. Still investigating it, but it looks promising. If someone has thoughts on it, please share.

Regards, Artem.

Re: Ignite ML, next steps (IGNITE-5029)

Yuriy Babak
In reply to this post by Yuriy Babak
Update:

Here's the last batch of ML-related tickets for the next few months:

* Add Stream API support to Ignite ML matrices, IGNITE-5216.
* Gradient descent, IGNITE-5217.
* Decision trees, IGNITE-5218.
* Generalization of cost function for Linear Regression, IGNITE-5219.
* Investigate possibility of integrating with dl4j, IGNITE-5221.

Regards,
Yury Babak.

Re: Ignite ML, next steps (IGNITE-5029)

dmagda
Looks good, thanks for the update!

Yuri, what do you think can be done in the 2.1 time frame? This release should be rolled out around mid-June.


Denis

> On May 15, 2017, at 5:19 AM, Yury Babak <[hidden email]> wrote:
>
> Update:
>
> Here's the last batch of ML-related tickets for the next few months:
>
> * Add Stream API support to Ignite ML matrices, IGNITE-5216
> <https://issues.apache.org/jira/browse/IGNITE-5216>.
> * Gradient descent, IGNITE-5217
> <https://issues.apache.org/jira/browse/IGNITE-5217>.
> * Decision trees, IGNITE-5218
> <https://issues.apache.org/jira/browse/IGNITE-5218>.
> * Generalization of cost function for Linear Regression, IGNITE-5219
> <https://issues.apache.org/jira/browse/IGNITE-5219>.
> * Investigate possibility of integrating with dl4j, IGNITE-5221
> <https://issues.apache.org/jira/browse/IGNITE-5221>.
>
> Regards,
> Yury Babak.

Re: Ignite ML, next steps (IGNITE-5029)

Yuriy Babak
Hi,

From my point of view, it's IGNITE-5113 (k-means), IGNITE-5109 (already done), IGNITE-5114 (optimized distributed matrix arithmetic) and IGNITE-5102 (fix for fold/map behaviour) for sure, and possibly IGNITE-5059 (logistic regression) if we have enough time before the code freeze.

Regards,
Yury