[ML][DISCUSSION] The future of Vectorizer

[ML][DISCUSSION] The future of Vectorizer

Alexey Zinoviev
Hi, Igniters

The new functionality for building Vectors was merged into Apache Ignite in
the following commit:
<https://github.com/apache/ignite/commit/a0a15d62a250defb0db9ec72153ee287830f6a15>

This functionality brings a new approach to building vectors to Ignite ML.
But in my opinion we shouldn't constrain ourselves to a narrow understanding
of a Vector as an analogue of a double[] array.

I suggest extending the Vector and Vectorizer API to support Strings and
other types (such as Blobs, Images, etc.) as vector elements.

This brings the following advantages:
* gives us a chance to unify the hierarchy of Preprocessing Trainers and Model
Trainers
* gives us a chance to implement ML algorithms that work not only with doubles
* unifies our Vectorizers as the first step in our Pipelines
* drops a lot of unused generics
* imposes one simple requirement on end users: convert their data to Vectors
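
To make the idea more concrete, here is a rough sketch of a vector whose
elements are arbitrary Serializable values. It is only an illustration under
assumptions: MixedVector, Vectorizer and encodeCategory are hypothetical names
invented for this example and are not the Ignite ML API.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch; none of these types are the Apache Ignite ML API. */
public class MixedVectorSketch {
    /** A vector whose elements may be any Serializable value, not only doubles. */
    static final class MixedVector implements Serializable {
        private final Serializable[] elems;

        MixedVector(Serializable... elems) {
            this.elems = elems.clone();
        }

        int size() {
            return elems.length;
        }

        /** Returns the element as stored, whatever its type. */
        Serializable getRaw(int idx) {
            return elems[idx];
        }

        /** Returns the element as a double; fails if it is not numeric. */
        double get(int idx) {
            Serializable v = elems[idx];
            if (v instanceof Number)
                return ((Number) v).doubleValue();
            throw new ClassCastException("Element " + idx + " is not numeric: " + v);
        }
    }

    /** The single step a user supplies: convert an upstream record to a vector. */
    interface Vectorizer<R> extends Function<R, MixedVector> { }

    /** Example preprocessing step: replaces the String at idx with its ordinal. */
    static MixedVector encodeCategory(MixedVector v, int idx, List<String> categories) {
        Serializable[] out = new Serializable[v.size()];
        for (int i = 0; i < v.size(); i++)
            out[i] = v.getRaw(i);

        int ordinal = categories.indexOf(v.getRaw(idx));
        if (ordinal < 0)
            throw new IllegalArgumentException("Unknown category: " + v.getRaw(idx));

        out[idx] = Double.valueOf(ordinal);
        return new MixedVector(out);
    }

    public static void main(String[] args) {
        // The raw record is a String[]; the vectorizer keeps the label as a String.
        Vectorizer<String[]> vectorizer =
            row -> new MixedVector(Double.parseDouble(row[0]), row[1]);

        MixedVector raw = vectorizer.apply(new String[] {"42.5", "red"});
        MixedVector encoded = encodeCategory(raw, 1, Arrays.asList("green", "red"));

        System.out.println(encoded.get(0) + " " + encoded.get(1)); // prints "42.5 1.0"
    }
}
```

In this sketch the String label stays inside the vector until a preprocessing
step encodes it, so the user only has to convert a record to a vector once.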

Join the discussion, everyone interested in ML, and share your opinion here!



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

Re: [ML][DISCUSSION] The future of Vectorizer

dmitrievanthony
It's a brilliant idea, I agree!




Re: [ML][DISCUSSION] The future of Vectorizer

aplatonov
In reply to this post by Alexey Zinoviev
Yep, I definitely agree with you.

Moreover, such an improvement should reduce the parallel hierarchies of
trainers and preprocessors: from this point of view, a preprocessor becomes
equivalent to a trainer. In my opinion, this improvement is very important for
the ML module because it can give us a flexible hierarchy of components.
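
A minimal sketch of what "a preprocessor is just a trainer" could look like.
The names here (Trainer, MIN_MAX, MEAN_THRESHOLD, fitPipeline) are hypothetical
and are not the Ignite ML API, and double[] stands in for Vector only to keep
the example short.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch; none of these types are the Apache Ignite ML API. */
public class UnifiedTrainerSketch {
    /** One interface for both roles: learn a vector-to-vector function from data. */
    interface Trainer {
        Function<double[], double[]> fit(List<double[]> data);
    }

    /** "Preprocessor": learns per-feature min/max and rescales features to [0, 1]. */
    static final Trainer MIN_MAX = data -> {
        int n = data.get(0).length;
        double[] min = new double[n];
        double[] max = new double[n];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);

        for (double[] row : data)
            for (int i = 0; i < n; i++) {
                min[i] = Math.min(min[i], row[i]);
                max[i] = Math.max(max[i], row[i]);
            }

        return row -> {
            double[] out = new double[n];
            for (int i = 0; i < n; i++)
                out[i] = max[i] == min[i] ? 0.0 : (row[i] - min[i]) / (max[i] - min[i]);
            return out;
        };
    };

    /** "Model": learns the mean of feature 0 and outputs 1.0 for values above it. */
    static final Trainer MEAN_THRESHOLD = data -> {
        double mean = data.stream().mapToDouble(row -> row[0]).average().orElse(0.0);
        return row -> new double[] {row[0] > mean ? 1.0 : 0.0};
    };

    /** Fits the stages in order, feeding each stage the output of the previous one. */
    static Function<double[], double[]> fitPipeline(List<double[]> data, Trainer... stages) {
        Function<double[], double[]> pipeline = Function.identity();
        List<double[]> current = data;

        for (Trainer stage : stages) {
            Function<double[], double[]> mdl = stage.fit(current);
            current = current.stream().map(mdl).collect(Collectors.toList());
            pipeline = pipeline.andThen(mdl);
        }

        return pipeline;
    }

    public static void main(String[] args) {
        List<double[]> data = Arrays.asList(new double[] {0.0}, new double[] {10.0});
        Function<double[], double[]> pipeline = fitPipeline(data, MIN_MAX, MEAN_THRESHOLD);

        System.out.println(pipeline.apply(new double[] {10.0})[0]); // prints 1.0
    }
}
```

Because both stages share one interface, the pipeline code does not need to
know which stage is a preprocessor and which is a model.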

I created a ticket for serializable object support in Vectors:
https://issues.apache.org/jira/browse/IGNITE-11647
Another related ticket to this thread:
https://issues.apache.org/jira/browse/IGNITE-11642

In reply to the message from Alexey Zinoviev <[hidden email]> of Thu, 28 Mar 2019, 11:27


Re: [ML][DISCUSSION] The future of Vectorizer

aplatonov
Hey, Igniters!
I have prepared a PR for Serializable support in our Vectors.
Could you review it: https://github.com/apache/ignite/pull/6378 ?

In reply to the message from Алексей Платонов <[hidden email]> of Thu, 28 Mar 2019, 11:47


Re: [ML][DISCUSSION] The future of Vectorizer

Alexey Zinoviev
Yes, I will review it in a few days. Great PR, many thanks!

In reply to the message from Алексей Платонов [hidden email] of Fri, 29 Mar 2019, 18:49
