[ML] Machine Learning Pipeline Improvement

Alexey Zinoviev
Hi Igniters,

I propose adding a sequential pipeline of machine learning operations,
including all preprocessing stages, similar to the Pipeline object in the
Python library scikit-learn (see
http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html
for details). I am ready to implement it myself.

It can be combined with the current Cross-Validator and Evaluator objects.

A possible solution would sequentially apply a list of transforms and a
final estimator.

Alexey
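
To illustrate the "list of transforms and a final estimator" idea, here is a
minimal plain-Java sketch. It is not Ignite API: the class
SequentialPipelineSketch and the helpers scaleBy, shiftBy and thresholdOnSum
are hypothetical names made up for this example, and the toy estimator is
only a stand-in for a real model.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: a pipeline is an ordered list of transforms plus a final estimator. */
final class SequentialPipelineSketch {
    /** Applies each transform in order, then passes the result to the final estimator. */
    static double predict(List<UnaryOperator<double[]>> transforms,
                          Function<double[], Double> estimator,
                          double[] features) {
        double[] current = features.clone();
        for (UnaryOperator<double[]> transform : transforms)
            current = transform.apply(current);
        return estimator.apply(current);
    }

    /** Toy transform (assumption): multiplies every feature by a constant factor. */
    static UnaryOperator<double[]> scaleBy(double factor) {
        return v -> {
            double[] out = new double[v.length];
            for (int i = 0; i < v.length; i++)
                out[i] = v[i] * factor;
            return out;
        };
    }

    /** Toy transform (assumption): adds a constant offset to every feature. */
    static UnaryOperator<double[]> shiftBy(double offset) {
        return v -> {
            double[] out = new double[v.length];
            for (int i = 0; i < v.length; i++)
                out[i] = v[i] + offset;
            return out;
        };
    }

    /** Toy estimator (assumption): returns 1.0 if the feature sum exceeds the threshold, else 0.0. */
    static Function<double[], Double> thresholdOnSum(double threshold) {
        return v -> {
            double sum = 0.0;
            for (double x : v)
                sum += x;
            return sum > threshold ? 1.0 : 0.0;
        };
    }

    public static void main(String[] args) {
        // {2, 4} -> scaled to {1, 2} -> shifted to {2, 3} -> sum 5 > 4 -> label 1.0
        double label = predict(
            List.of(scaleBy(0.5), shiftBy(1.0)),
            thresholdOnSum(4.0),
            new double[] {2.0, 4.0});
        System.out.println(label); // prints 1.0
    }
}
```

The order of the list is the order of execution, which is what makes the
sequence of preprocessing steps explicit to the reader.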

Re: [ML] Machine Learning Pipeline Improvement

dmagda
Hi Alexey,

I can't call myself an ML expert, but I've heard that our ML component is
missing some essential data preprocessing APIs.

Are these pipelines part of our plan to bring the preprocessing APIs to
Ignite?

--
Denis

On Thu, Jul 19, 2018 at 5:29 AM Alexey Zinoviev <[hidden email]>
wrote:


Re: [ML] Machine Learning Pipeline Improvement

Alexey Zinoviev
Yes, it makes the preprocessing easy to read and understand.

In the API it will look like this:

Model mdl = Pipeline.of(reading, featureExtracting, labelExtracting, normalizing, encoding, scaling, logisticRegression);

where the arguments of .of(...) give the sequence of ML stages.





Re: [ML] Machine Learning Pipeline Improvement

Yuriy Babak
Alexey,

I like this idea; it should improve the usability of our ML module.

Regards,
Yury



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

Re: [ML] Machine Learning Pipeline Improvement

Alexey Zinoviev
Could you please create a ticket for this task?

2018-07-20 16:47 GMT+06:00 Yury Babak <[hidden email]>:


Re: [ML] Machine Learning Pipeline Improvement

Yuriy Babak

Re: [ML] Machine Learning Pipeline Improvement

Alexey Zinoviev
The prototype of the API will look like this:

PipelineMdl mdl = new Pipeline<Integer, Object[]>()
                       .addFeatureExtractor(featureExtractor)
                       .addLabelExtractor(lbExtractor)
                       .addStage(new EncoderTrainer<Integer, Object[]>()
                           .withEncoderType(EncoderType.STRING_ENCODER)
                           .withEncodedFeature(1)
                           .withEncodedFeature(6))
                       .addStage(new ImputerTrainer<Integer, Object[]>())
                       .addStage(new MinMaxScalerTrainer<Integer, Object[]>())
                       .addStage(new NormalizationTrainer<Integer, Object[]>()
                           .withP(1))
                       .addFinalStage(new DecisionTreeClassificationTrainer(5, 0))
                       .fit(ignite, dataCache);
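
The fit semantics behind a builder like the one above can be sketched in
plain Java without Ignite: each preprocessing stage is fitted on the output
of the previous stage, and the final trainer only sees fully preprocessed
rows. Everything below (FitPipelineSketch, StageTrainer, FinalTrainer,
minMaxScaler, nearestMeanClassifier) is a hypothetical stand-in written for
this illustration, not the prototype's actual classes, and it works on
in-memory lists rather than an Ignite cache.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of how a pipeline could fit its stages one after another. */
final class FitPipelineSketch {
    /** A preprocessing stage: learns statistics from rows and returns a row transform. */
    interface StageTrainer {
        UnaryOperator<double[]> fit(List<double[]> rows);
    }

    /** The final stage: learns a model from preprocessed rows and their labels. */
    interface FinalTrainer {
        Function<double[], Double> fit(List<double[]> rows, List<Double> labels);
    }

    /** Min-max scaling: maps each feature to [0, 1] using the min and max seen during fit. */
    static StageTrainer minMaxScaler() {
        return rows -> {
            int n = rows.get(0).length;
            double[] min = new double[n];
            double[] max = new double[n];
            Arrays.fill(min, Double.POSITIVE_INFINITY);
            Arrays.fill(max, Double.NEGATIVE_INFINITY);
            for (double[] r : rows)
                for (int i = 0; i < n; i++) {
                    min[i] = Math.min(min[i], r[i]);
                    max[i] = Math.max(max[i], r[i]);
                }
            return v -> {
                double[] out = new double[n];
                for (int i = 0; i < n; i++) {
                    double range = max[i] - min[i];
                    out[i] = range == 0.0 ? 0.0 : (v[i] - min[i]) / range;
                }
                return out;
            };
        };
    }

    static double total(double[] v) {
        double sum = 0.0;
        for (double x : v)
            sum += x;
        return sum;
    }

    /** Toy binary classifier (assumes both classes occur): picks the class whose mean feature sum is closer. */
    static FinalTrainer nearestMeanClassifier() {
        return (rows, labels) -> {
            double[] sum = new double[2];
            int[] cnt = new int[2];
            for (int i = 0; i < rows.size(); i++) {
                int cls = labels.get(i) > 0.5 ? 1 : 0;
                sum[cls] += total(rows.get(i));
                cnt[cls]++;
            }
            double mean0 = sum[0] / cnt[0];
            double mean1 = sum[1] / cnt[1];
            return v -> Math.abs(total(v) - mean1) < Math.abs(total(v) - mean0) ? 1.0 : 0.0;
        };
    }

    /** Fits every stage on the output of the previous one, then fits the final trainer. */
    static Function<double[], Double> fit(List<StageTrainer> stages, FinalTrainer last,
                                          List<double[]> rows, List<Double> labels) {
        List<UnaryOperator<double[]>> fitted = new ArrayList<>();
        List<double[]> current = rows;
        for (StageTrainer stage : stages) {
            UnaryOperator<double[]> transform = stage.fit(current);
            List<double[]> next = new ArrayList<>();
            for (double[] r : current)
                next.add(transform.apply(r));
            fitted.add(transform);
            current = next;
        }
        Function<double[], Double> mdl = last.fit(current, labels);
        // The resulting model replays the fitted transforms before predicting.
        return v -> {
            double[] x = v;
            for (UnaryOperator<double[]> transform : fitted)
                x = transform.apply(x);
            return mdl.apply(x);
        };
    }

    public static void main(String[] args) {
        List<double[]> rows = List.of(
            new double[] {0, 10}, new double[] {1, 20},
            new double[] {9, 90}, new double[] {10, 100});
        List<Double> labels = List.of(0.0, 0.0, 1.0, 1.0);
        Function<double[], Double> mdl =
            fit(List.of(minMaxScaler()), nearestMeanClassifier(), rows, labels);
        System.out.println(mdl.apply(new double[] {8, 80})); // prints 1.0
        System.out.println(mdl.apply(new double[] {2, 15})); // prints 0.0
    }
}
```

Keeping the fitted transforms inside the returned model means new rows get
the same preprocessing at prediction time that the training rows received.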

Also, I've created a separate ticket to update the ParamGrid/CrossValidation
API so that hyperparameters can be tuned not only in final trainers but also
in intermediate preprocessing stages.

https://issues.apache.org/jira/browse/IGNITE-9497
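
One way to picture tuning across stages is a parameter grid whose keys name
both the stage and the parameter, expanded into every combination for
cross-validation to try. The sketch below is only an illustration under
that assumption: ParamGridSketch, expand, and the keys "normalization.p" and
"tree.maxDepth" are hypothetical and do not reflect the actual
ParamGrid/CrossValidation API or the design in the ticket.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a parameter grid spanning preprocessing stages and the final trainer. */
final class ParamGridSketch {
    /** Expands candidate values per parameter into every combination (Cartesian product). */
    static List<Map<String, Double>> expand(Map<String, List<Double>> grid) {
        List<Map<String, Double>> combos = new ArrayList<>();
        combos.add(new LinkedHashMap<>()); // start with one empty combination
        for (Map.Entry<String, List<Double>> entry : grid.entrySet()) {
            List<Map<String, Double>> next = new ArrayList<>();
            for (Map<String, Double> partial : combos)
                for (Double value : entry.getValue()) {
                    Map<String, Double> extended = new LinkedHashMap<>(partial);
                    extended.put(entry.getKey(), value);
                    next.add(extended);
                }
            combos = next;
        }
        return combos;
    }

    public static void main(String[] args) {
        // Keys are made-up "stage.parameter" labels (assumption).
        Map<String, List<Double>> grid = new LinkedHashMap<>();
        grid.put("normalization.p", List.of(1.0, 2.0)); // preprocessing stage parameter
        grid.put("tree.maxDepth", List.of(3.0, 5.0));   // final trainer parameter
        List<Map<String, Double>> combos = expand(grid);
        System.out.println(combos.size()); // prints 4
        for (Map<String, Double> combo : combos)
            System.out.println(combo);
    }
}
```

Each resulting map would correspond to one full pipeline configuration to
fit and score.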

I suggest adding this feature in 2.8 because it doesn't change the current
API of the algorithms and has no serialization issues.




Re: [ML] Machine Learning Pipeline Improvement

Manu
Hi all!

Would it be viable to integrate Apache Arrow to improve ML computation using
GPUs?
Outside the scope of this thread, would it also be viable to integrate
Apache Arrow to improve indexing computation using GPUs?

Regards

https://rapids.ai
https://arrow.apache.org




Re: [ML] Machine Learning Pipeline Improvement

Alexey Zinoviev
Dear Manu,
it could be a great idea!

Could you please provide any examples of Apache Arrow integration that
speeds up ML computation in other ML frameworks? It would be very helpful!

Sincerely yours
      Alexey Zinovyev


