Data compression in Ignite 2.0

Re: Data compression in Ignite 2.0

daradurvs
Dmitriy,

I have a prototype ready and would like to show it.
It is always easier to discuss with an example.
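
As a reading aid for the quoted discussion below, here is a minimal sketch of the two API shapes raised there: a pluggable Compressor and a marshaller decorator that compresses after marshalling. Everything in it is an assumption made for illustration. The Compressor and Marshaller interfaces, DeflateCompressor, Utf8Marshaller and CompressingMarshaller are hypothetical names, not Ignite classes, and the real BinaryMarshaller API is different. DEFLATE from java.util.zip is used only because it ships with the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class CompressorSketch {
    /** Hypothetical pluggable interface; not part of the Ignite API. */
    interface Compressor {
        byte[] compress(byte[] data);
        byte[] decompress(byte[] data);
    }

    /** Stand-in for a marshaller; the real BinaryMarshaller API is different. */
    interface Marshaller {
        byte[] marshal(String value);
        String unmarshal(byte[] bytes);
    }

    /** One possible Compressor, based on the JDK's DEFLATE support. */
    static class DeflateCompressor implements Compressor {
        @Override public byte[] compress(byte[] data) {
            Deflater deflater = new Deflater();
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!deflater.finished())
                out.write(buf, 0, deflater.deflate(buf));
            deflater.end();
            return out.toByteArray();
        }

        @Override public byte[] decompress(byte[] data) {
            Inflater inflater = new Inflater();
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            try {
                while (!inflater.finished()) {
                    int n = inflater.inflate(buf);
                    // No progress and not finished: the input is incomplete or needs a dictionary.
                    if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary()))
                        throw new IllegalArgumentException("Truncated or unsupported input");
                    out.write(buf, 0, n);
                }
            }
            catch (DataFormatException e) {
                throw new IllegalArgumentException("Corrupted compressed data", e);
            }
            finally {
                inflater.end();
            }
            return out.toByteArray();
        }
    }

    /** Toy marshaller that only handles strings. */
    static class Utf8Marshaller implements Marshaller {
        @Override public byte[] marshal(String value) {
            return value.getBytes(StandardCharsets.UTF_8);
        }

        @Override public String unmarshal(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Decorator: compresses after marshalling, decompresses before unmarshalling. */
    static class CompressingMarshaller implements Marshaller {
        private final Marshaller delegate;
        private final Compressor compressor;

        CompressingMarshaller(Marshaller delegate, Compressor compressor) {
            this.delegate = delegate;
            this.compressor = compressor;
        }

        @Override public byte[] marshal(String value) {
            return compressor.compress(delegate.marshal(value));
        }

        @Override public String unmarshal(byte[] bytes) {
            return delegate.unmarshal(compressor.decompress(bytes));
        }
    }

    /** Marshals and unmarshals a value through the decorated marshaller. */
    static String roundTrip(String value) {
        Marshaller m = new CompressingMarshaller(new Utf8Marshaller(), new DeflateCompressor());
        return m.unmarshal(m.marshal(value));
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Hello, Igniters!"));
    }
}
```

Note that a decorator of this kind compresses the whole marshalled object, header included, which is the property the quoted discussion identifies as a problem for querying and indexing.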

2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:

> Vyacheslav,
>
> I think it is a bit premature to provide a PR without getting a community
> consensus on the dev list. Please allow some time for the community to
> respond.
>
> D.
>
> On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur <[hidden email]>
> wrote:
>
> > I created the ticket: https://issues.apache.org/jira/browse/IGNITE-5226
> >
> > I'll prepare a PR with the described solution in a couple of days.
> >
> > 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur <[hidden email]>:
> >
> > > Hi, Igniters!
> > >
> > > Apache Ignite 2.0 is released.
> > >
> > > Let's continue the discussion about a compression design.
> > >
> > > At the moment, I have found only one solution that is compatible with
> > > querying and indexing: per-field compression of objects.
> > > Per-field compression means that the metadata (header) of an object
> > > won't be compressed; only the serialized values of the object's fields
> > > (in byte-array form) will be compressed.
> > >
> > > This solution has some contentious issues:
> > > - small values, like primitives and short arrays: there is no sense in
> > > compressing them;
> > > - it is not possible to use compression with Java predefined types.
> > >
> > > We could provide an annotation, for example @IgniteCompression, which
> > > users can apply to mark the fields to compress.
> > >
> > > Any thoughts?
> > >
> > > Maybe someone already has a design ready?
> > >
> > > 2017-04-10 11:06 GMT+03:00 Vyacheslav Daradur <[hidden email]>:
> > >
> > >> Alexey,
> > >>
> > >> Yes, I've read it.
> > >>
> > >> OK, let's discuss the public API design.
> > >>
> > >> I think we need to add a configuration entity to CacheConfiguration
> > >> that will contain the Compressor interface implementation and some
> > >> useful parameters.
> > >> Or maybe provide a BinaryMarshaller decorator that will compress
> > >> data after marshalling.
> > >>
> > >>
> > >> 2017-04-10 10:40 GMT+03:00 Alexey Kuznetsov <[hidden email]>:
> > >>
> > >>> Vyacheslav,
> > >>>
> > >>> Did you read the initial discussion [1] about compression?
> > >>> As far as I remember, we agreed to add only some "top-level" API in
> > >>> order to provide a way for Ignite users to inject some sort of
> > >>> custom compression.
> > >>>
> > >>>
> > >>> [1]
> > >>> http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-2-0-td10099.html
> > >>>
> > >>> On Mon, Apr 10, 2017 at 2:19 PM, daradurvs <[hidden email]> wrote:
> > >>>
> > >>> > Hi Igniters!
> > >>> >
> > >>> > I am interested in this task:
> > >>> > Provide some kind of pluggable compression SPI support
> > >>> > <https://issues.apache.org/jira/browse/IGNITE-3592>
> > >>> >
> > >>> > I developed a solution at the BinaryMarshaller level, but the reviewer
> > >>> > rejected it.
> > >>> >
> > >>> > Let's continue the discussion of the task goals and solution design.
> > >>> > As I understand it, the main goal of this task is to store data in
> > >>> > compressed form.
> > >>> > This is what I need from Ignite as a user. Compression saves server
> > >>> > resources: we can store more data on the same servers at the cost of
> > >>> > increased CPU utilization.
> > >>> >
> > >>> > I'm researching the possibility of implementing compression at the
> > >>> > cache level.
> > >>> >
> > >>> > Any thoughts?
> > >>> >
> > >>> > --
> > >>> > Best regards,
> > >>> > Vyacheslav
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>> > --
> > >>> > View this message in context:
> > >>> > http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-2-0-tp10099p16317.html
> > >>> > Sent from the Apache Ignite Developers mailing list archive at Nabble.com.
> > >>> >
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Alexey Kuznetsov
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Best Regards, Vyacheslav
> > >>
> > >
> > >
> > >
> > > --
> > > Best Regards, Vyacheslav
> > >
> >
> >
> >
> > --
> > Best Regards, Vyacheslav
> >
>



--
Best Regards, Vyacheslav

Re: Data compression in Ignite 2.0

daradurvs
Hi guys,

I've prepared the PR to show my idea.
https://github.com/apache/ignite/pull/1951/files

As for querying, I've copied the existing tests and annotated the
test data:
https://github.com/apache/ignite/pull/1951/files#diff-c19a9df4058141d059bb577e75244764

The idea is that fields marked with @BinaryCompression are
compressed during marshalling by BinaryMarshaller.

This solution has no effect on existing data or project architecture.
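
To make the per-field idea easier to picture, here is a rough sketch. It is not the code from the PR. The annotation name @BinaryCompression comes from this thread, but its declaration here, the Person class, the deflate helper and marshalFields are assumptions for illustration, and the toy "marshalling" (a map from field name to bytes, String fields only) has nothing to do with the real binary format. The only point is that unannotated fields stay as plain bytes while annotated ones are compressed individually.

```java
import java.io.ByteArrayOutputStream;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.Deflater;

public class FieldCompressionSketch {
    /** Illustrative marker; the annotation in the PR may be declared differently. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface BinaryCompression { }

    /** Hypothetical value class: only the large field is marked for compression. */
    static class Person {
        String name;
        @BinaryCompression String biography;

        Person(String name, String biography) {
            this.name = name;
            this.biography = biography;
        }
    }

    /** Compresses a byte array with the JDK's DEFLATE implementation. */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    /** Toy marshalling: field name -> UTF-8 bytes, compressed only if the field is annotated. */
    static Map<String, byte[]> marshalFields(Object obj) {
        Map<String, byte[]> out = new LinkedHashMap<>();
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (f.isSynthetic() || f.getType() != String.class)
                continue; // this sketch handles String fields only
            f.setAccessible(true);
            String val;
            try {
                val = (String) f.get(obj);
            }
            catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (val == null)
                continue;
            byte[] raw = val.getBytes(StandardCharsets.UTF_8);
            out.put(f.getName(), f.isAnnotationPresent(BinaryCompression.class) ? deflate(raw) : raw);
        }
        return out;
    }

    public static void main(String[] args) {
        StringBuilder bio = new StringBuilder();
        for (int i = 0; i < 100; i++)
            bio.append("Works on Apache Ignite. ");
        Map<String, byte[]> fields = marshalFields(new Person("Vyacheslav", bio.toString()));
        fields.forEach((name, bytes) -> System.out.println(name + ": " + bytes.length + " bytes"));
    }
}
```

Because the small "name" field is left untouched, it can still be read directly, which is the property that keeps this approach compatible with querying on unannotated fields.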

I'd be glad to hear your thoughts.


2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur <[hidden email]>:

> Dmitriy,
>
> I have a prototype ready and would like to show it.
> It is always easier to discuss with an example.



--
Best Regards, Vyacheslav

Re: Data compression in Ignite 2.0

daradurvs
Guys, any thoughts?

2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur <[hidden email]>:

> Hi guys,
>
> I've prepared the PR to show my idea.
> https://github.com/apache/ignite/pull/1951/files
>
> As for querying, I've copied the existing tests and annotated the
> test data:
> https://github.com/apache/ignite/pull/1951/files#diff-c19a9df4058141d059bb577e75244764
>
> The idea is that fields marked with @BinaryCompression are
> compressed during marshalling by BinaryMarshaller.
>
> This solution has no effect on existing data or project architecture.
>
> I'd be glad to hear your thoughts.



--
Best Regards, Vyacheslav

Re: Data compression in Ignite 2.0

daradurvs
Hi, Igniters.

I've run some benchmarks; the results are in [1].

I've also prepared an evaluation in the form of diagrams [2].

I hope this helps interest the community and speeds up a reaction to
this improvement :)

[1]
https://github.com/daradurvs/ignite-compression/tree/master/src/main/resources/result
[2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk/view



2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur <[hidden email]>:

> Guys, any thoughts?



--
Best Regards, Vyacheslav

Re: Data compression in Ignite 2.0

dsetrakyan
Vladimir, I am not sure how to interpret the graphs. What are we looking at?

On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur <[hidden email]>
wrote:

> Hi, Igniters.
>
> I've run some benchmarks; the results are in [1].
>
> I've also prepared an evaluation in the form of diagrams [2].
>
> I hope this helps interest the community and speeds up a reaction to
> this improvement :)
>
> [1]
> https://github.com/daradurvs/ignite-compression/tree/master/src/main/resources/result
> [2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk/view

Re: Data compression in Ignite 2.0

daradurvs
Dmitry,

Excel sheets:

1) "Compression ratio (2)" - shows object size with and without
compression. (Conditions: literal text.)
The 1st graph shows the compression ratios of different compression
algorithms depending on the size of the compressed field.
The 2nd graph shows how object size changes depending on size and
compression algorithm.

2) "Compression ratio (1)" - shows object size with and without
compression. (Conditions: a poorly compressible character sequence.)
The 1st graph shows the compression ratios of different compression
algorithms depending on the size of the compressed field.
The 2nd graph shows how object size changes depending on size and
compression algorithm.

3) "put-avg" - shows the average time of the "put" operation depending on
size and compression algorithm.

4) "put-thrpt" - shows the throughput of the "put" operation depending on
size and compression algorithm.

5) "get-avg" - shows the average time of the "get" operation depending on
size and compression algorithm.

6) "get-thrpt" - shows the throughput of the "get" operation depending on
size and compression algorithm.
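
For readers without access to the spreadsheet, the "compression ratio" measurement can be sketched roughly as follows. This is not the benchmark code from the linked repository. It assumes DEFLATE from java.util.zip as the only algorithm, a repeated English sentence as a stand-in for "literal text" (which compresses better than real prose would), and seeded random bytes as a stand-in for a poorly compressible sequence. The class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

public class CompressionRatioSketch {
    /** Returns the number of bytes DEFLATE produces for the given input. */
    static int deflatedSize(byte[] data) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[4096];
        int total = 0;
        while (!deflater.finished())
            total += deflater.deflate(buf);
        deflater.end();
        return total;
    }

    /** Compression ratio: original size divided by compressed size. */
    static double ratio(byte[] data) {
        return (double) data.length / deflatedSize(data);
    }

    /** Stand-in for "literal text": a repeated English sentence cut to the requested size. */
    static byte[] literalText(int size) {
        String sentence = "Apache Ignite stores data in memory across a cluster of nodes. ";
        StringBuilder sb = new StringBuilder(size + sentence.length());
        while (sb.length() < size)
            sb.append(sentence);
        return sb.substring(0, size).getBytes(StandardCharsets.US_ASCII);
    }

    /** Stand-in for a poorly compressible sequence: seeded pseudo-random bytes. */
    static byte[] randomBytes(int size, long seed) {
        byte[] data = new byte[size];
        new Random(seed).nextBytes(data);
        return data;
    }

    public static void main(String[] args) {
        // Field sizes chosen arbitrarily for the example.
        for (int size : new int[] {64, 1024, 16384}) {
            System.out.printf("size=%d text ratio=%.2f random ratio=%.2f%n",
                size, ratio(literalText(size)), ratio(randomBytes(size, 42L)));
        }
    }
}
```

A ratio below 1 means the "compressed" form is larger than the original, which is what to expect for very small or random inputs and matches the earlier remark that compressing small values makes little sense.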




2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:

> Vladimir, I am not sure how to interpret the graphs. What are we looking
> at?
> > >>>> > >>>
> > >>>> > >>> > Hi Igniters!
> > >>>> > >>> >
> > >>>> > >>> > I am interested in this task.
> > >>>> > >>> > Provide some kind of pluggable compression SPI support
> > >>>> > >>> > <https://issues.apache.org/jira/browse/IGNITE-3592>
> > >>>> > >>> >
> > >>>> > >>> > I developed a solution on BinaryMarshaller-level, but
> reviewer
> > >>>> has
> > >>>> > >>> rejected
> > >>>> > >>> > it.
> > >>>> > >>> >
> > >>>> > >>> > Let's continue discussion of task goals and solution design.
> > >>>> > >>> > As I understood that, the main goal of this task is to store
> > >>>> data in
> > >>>> > >>> > compressed form.
> > >>>> > >>> > This is what I need from Ignite as its user. Compression
> > >>>> provides
> > >>>> > >>> economy
> > >>>> > >>> > on
> > >>>> > >>> > servers.
> > >>>> > >>> > We can store more data on same servers at the cost of
> > >>>> increasing CPU
> > >>>> > >>> > utilization.
> > >>>> > >>> >
> > >>>> > >>> > I'm researching a possibility of implementation of
> compression
> > >>>> at the
> > >>>> > >>> > cache-level.
> > >>>> > >>> >
> > >>>> > >>> > Any thoughts?
> > >>>> > >>> >
> > >>>> > >>> > --
> > >>>> > >>> > Best regards,
> > >>>> > >>> > Vyacheslav
> > >>>> > >>> >
> > >>>> > >>> >
> > >>>> > >>> >
> > >>>> > >>> >
> > >>>> > >>> > --
> > >>>> > >>> > View this message in context: http://apache-ignite-
> > >>>> > >>> > developers.2346864.n4.nabble.com/Data-compression-in-
> > >>>> > >>> > Ignite-2-0-tp10099p16317.html
> > >>>> > >>> > Sent from the Apache Ignite Developers mailing list archive
> at
> > >>>> > >>> Nabble.com.
> > >>>> > >>> >
> > >>>> > >>>
> > >>>> > >>>
> > >>>> > >>>
> > >>>> > >>> --
> > >>>> > >>> Alexey Kuznetsov
> > >>>> > >>>
> > >>>> > >>
> > >>>> > >>
> > >>>> > >>
> > >>>> > >> --
> > >>>> > >> Best Regards, Vyacheslav
> > >>>> > >>
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > > --
> > >>>> > > Best Regards, Vyacheslav
> > >>>> > >
> > >>>> >
> > >>>> >
> > >>>> >
> > >>>> > --
> > >>>> > Best Regards, Vyacheslav
> > >>>> >
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Best Regards, Vyacheslav
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Best Regards, Vyacheslav
> > >>
> > >
> > >
> > >
> > > --
> > > Best Regards, Vyacheslav
> > >
> >
> >
> >
> > --
> > Best Regards, Vyacheslav
> >
>



--
Best Regards, Vyacheslav

Re: Data compression in Ignite 2.0

daradurvs
All metrics were taken from an application built on a custom assembly of
Apache Ignite that includes the provided PR.

2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur <[hidden email]>:

> Dmitry,
>
> Excel sheets:
>
> 1) "Compression ratio (2)" - shows object size with and without
> compression. (Conditions: literal text)
> The 1st graph shows the compression ratio of each compression algorithm
> depending on the size of the compressed field.
> The 2nd graph shows how object size changes depending on field size and
> compression algorithm.
>
> 2) "Compression ratio (1)" - shows object size with and without
> compression. (Conditions: poorly compressible character sequence)
> The 1st graph shows the compression ratio of each compression algorithm
> depending on the size of the compressed field.
> The 2nd graph shows how object size changes depending on field size and
> compression algorithm.
>
> 3) "put-avg" - shows the average time of the "put" operation depending on
> size and compression algorithm.
>
> 4) "put-thrpt" - shows the throughput of the "put" operation depending on
> size and compression algorithm.
>
> 5) "get-avg" - shows the average time of the "get" operation depending on
> size and compression algorithm.
>
> 6) "get-thrpt" - shows the throughput of the "get" operation depending on
> size and compression algorithm.
>
>
>
>
> 2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:
>
>> Vladimir, I am not sure how to interpret the graphs? What are we looking
>> at?
>>
>> On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur <[hidden email]>
>> wrote:
>>
>> > Hi, Igniters.
>> >
>> > I've prepared some benchmarking. Results [1].
>> >
>> > And I've prepared the evaluation in the form of diagrams [2].
>> >
>> > I hope that helps to interest the community and accelerates a reaction
>> to
>> > this improvment :)
>> >
>> > [1]
>> > https://github.com/daradurvs/ignite-compression/tree/
>> > master/src/main/resources/result
>> > [2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk/view
>> >
>> >
>> >
>> > 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur <[hidden email]>:
>> >
>> > > Guys, any thoughts?
>> > >
>> > > 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur <[hidden email]>:
>> > >
>> > >> Hi guys,
>> > >>
>> > >> I've prepared the PR to show my idea.
>> > >> https://github.com/apache/ignite/pull/1951/files
>> > >>
>> > >> About querying - I've just copied existing tests and have annotated
>> the
>> > >> testing data.
>> > >> https://github.com/apache/ignite/pull/1951/files#diff-c19a9d
>> > >> f4058141d059bb577e75244764
>> > >>
>> > >> It means fields which will be marked by @BinaryCompression will be
>> > >> compressed at marshalling via BinaryMarshaller.
>> > >>
>> > >> This solution has no effect on existing data or project architecture.
>> > >>
>> > >> I'll be glad to see your thougths.
>> > >>
>> > >>
>> > >> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur <[hidden email]>:
>> > >>
>> > >>> Dmitriy,
>> > >>>
>> > >>> I have ready prototype. I want to show it.
>> > >>> It is always easier to discuss on example.
>> > >>>
>> > >>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan <[hidden email]
>> >:
>> > >>>
>> > >>>> Vyacheslav,
>> > >>>>
>> > >>>> I think it is a bit premature to provide a PR without getting a
>> > >>>> community
>> > >>>> consensus on the dev list. Please allow some time for the
>> community to
>> > >>>> respond.
>> > >>>>
>> > >>>> D.
>> > >>>>
>> > >>>> On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur <
>> > >>>> [hidden email]>
>> > >>>> wrote:
>> > >>>>
>> > >>>> > I created the ticket: https://issues.apache.org/jira
>> > >>>> /browse/IGNITE-5226
>> > >>>> >
>> > >>>> > I'll prepare a PR with described solution in couple of days.
>> > >>>> >
>> > >>>> > 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur <
>> [hidden email]
>> > >:
>> > >>>> >
>> > >>>> > > Hi, Igniters!
>> > >>>> > >
>> > >>>> > > Apache 2.0 is released.
>> > >>>> > >
>> > >>>> > > Let's continue the discussion about a compression design.
>> > >>>> > >
>> > >>>> > > At the moment, I found only one solution which is compatible
>> with
>> > >>>> > querying
>> > >>>> > > and indexing, this is per-objects-field compression.
>> > >>>> > > Per-fields compression means that metadata (a header) of an
>> object
>> > >>>> won't
>> > >>>> > > be compressed, only serialized values of an object fields (in
>> > bytes
>> > >>>> array
>> > >>>> > > form) will be compressed.
>> > >>>> > >
>> > >>>> > > This solution have some contentious issues:
>> > >>>> > > - small values, like primitives and short arrays - there isn't
>> > >>>> sense to
>> > >>>> > > compress them;
>> > >>>> > > - there is no possible to use compression with java-predefined
>> > >>>> types;
>> > >>>> > >
>> > >>>> > > We can provide an annotation, @IgniteCompression - for example,
>> > >>>> which can
>> > >>>> > > be used by users for marking fields to compress.
>> > >>>> > >
>> > >>>> > > Any thoughts?
>> > >>>> > >
>> > >>>> > > Maybe someone already have ready design?
>> > >>>> > >
>> > >>>> > > 2017-04-10 11:06 GMT+03:00 Vyacheslav Daradur <
>> > [hidden email]
>> > >>>> >:
>> > >>>> > >
>> > >>>> > >> Alexey,
>> > >>>> > >>
>> > >>>> > >> Yes, I've read it.
>> > >>>> > >>
>> > >>>> > >> Ok, let's discuss about public API design.
>> > >>>> > >>
>> > >>>> > >> I think we need to add some a configure entity to
>> > >>>> CacheConfiguration,
>> > >>>> > >> which will contain the Compressor interface implementation and
>> > some
>> > >>>> > usefull
>> > >>>> > >> parameters.
>> > >>>> > >> Or maybe to provide a BinaryMarshaller decorator, which will
>> be
>> > >>>> compress
>> > >>>> > >> data after marshalling.
>> > >>>> > >>
>> > >>>> > >>
>> > >>>> > >> 2017-04-10 10:40 GMT+03:00 Alexey Kuznetsov <
>> > [hidden email]
>> > >>>> >:
>> > >>>> > >>
>> > >>>> > >>> Vyacheslav,
>> > >>>> > >>>
>> > >>>> > >>> Did you read initial discussion [1] about compression?
>> > >>>> > >>> As far as I remember we agreed to add only some "top-level"
>> API
>> > in
>> > >>>> > order
>> > >>>> > >>> to
>> > >>>> > >>> provide a way for
>> > >>>> > >>> Ignite users to inject some sort of custom compression.
>> > >>>> > >>>
>> > >>>> > >>>
>> > >>>> > >>> [1]
>> > >>>> > >>> http://apache-ignite-developers.2346864.n4.nabble.com/Data-c
>> > >>>> > >>> ompression-in-Ignite-2-0-td10099.html
>> > >>>> > >>>
>> > >>>> > >>> On Mon, Apr 10, 2017 at 2:19 PM, daradurvs <
>> [hidden email]
>> > >
>> > >>>> > wrote:
>> > >>>> > >>>
>> > >>>> > >>> > Hi Igniters!
>> > >>>> > >>> >
>> > >>>> > >>> > I am interested in this task.
>> > >>>> > >>> > Provide some kind of pluggable compression SPI support
>> > >>>> > >>> > <https://issues.apache.org/jira/browse/IGNITE-3592>
>> > >>>> > >>> >
>> > >>>> > >>> > I developed a solution on BinaryMarshaller-level, but
>> reviewer
>> > >>>> has
>> > >>>> > >>> rejected
>> > >>>> > >>> > it.
>> > >>>> > >>> >
>> > >>>> > >>> > Let's continue discussion of task goals and solution
>> design.
>> > >>>> > >>> > As I understood that, the main goal of this task is to
>> store
>> > >>>> data in
>> > >>>> > >>> > compressed form.
>> > >>>> > >>> > This is what I need from Ignite as its user. Compression
>> > >>>> provides
>> > >>>> > >>> economy
>> > >>>> > >>> > on
>> > >>>> > >>> > servers.
>> > >>>> > >>> > We can store more data on same servers at the cost of
>> > >>>> increasing CPU
>> > >>>> > >>> > utilization.
>> > >>>> > >>> >
>> > >>>> > >>> > I'm researching a possibility of implementation of
>> compression
>> > >>>> at the
>> > >>>> > >>> > cache-level.
>> > >>>> > >>> >
>> > >>>> > >>> > Any thoughts?
>> > >>>> > >>> >
>> > >>>> > >>> > --
>> > >>>> > >>> > Best regards,
>> > >>>> > >>> > Vyacheslav
>> > >>>> > >>> >
>> > >>>> > >>> >
>> > >>>> > >>> >
>> > >>>> > >>> >
>> > >>>> > >>> > --
>> > >>>> > >>> > View this message in context: http://apache-ignite-
>> > >>>> > >>> > developers.2346864.n4.nabble.com/Data-compression-in-
>> > >>>> > >>> > Ignite-2-0-tp10099p16317.html
>> > >>>> > >>> > Sent from the Apache Ignite Developers mailing list
>> archive at
>> > >>>> > >>> Nabble.com.
>> > >>>> > >>> >
>> > >>>> > >>>
>> > >>>> > >>>
>> > >>>> > >>>
>> > >>>> > >>> --
>> > >>>> > >>> Alexey Kuznetsov
>> > >>>> > >>>
>> > >>>> > >>
>> > >>>> > >>
>> > >>>> > >>
>> > >>>> > >> --
>> > >>>> > >> Best Regards, Vyacheslav
>> > >>>> > >>
>> > >>>> > >
>> > >>>> > >
>> > >>>> > >
>> > >>>> > > --
>> > >>>> > > Best Regards, Vyacheslav
>> > >>>> > >
>> > >>>> >
>> > >>>> >
>> > >>>> >
>> > >>>> > --
>> > >>>> > Best Regards, Vyacheslav
>> > >>>> >
>> > >>>>
>> > >>>
>> > >>>
>> > >>>
>> > >>> --
>> > >>> Best Regards, Vyacheslav
>> > >>>
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Best Regards, Vyacheslav
>> > >>
>> > >
>> > >
>> > >
>> > > --
>> > > Best Regards, Vyacheslav
>> > >
>> >
>> >
>> >
>> > --
>> > Best Regards, Vyacheslav
>> >
>>
>
>
>
> --
> Best Regards, Vyacheslav
>



--
Best Regards, Vyacheslav

Re: Data compression in Ignite 2.0

Антон Чураев
In reply to this post by daradurvs
Vyacheslav, thank you! But could you please provide conclusions or
proposals based on these benchmarks?

2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur <[hidden email]>:

> Dmitry,
>
> Excel sheets:
>
> 1) "Compression ratio (2)" - shows object size with and without
> compression. (Conditions: literal text)
> The 1st graph shows the compression ratio of each compression algorithm
> depending on the size of the compressed field.
> The 2nd graph shows how object size changes depending on field size and
> compression algorithm.
>
> 2) "Compression ratio (1)" - shows object size with and without
> compression. (Conditions: poorly compressible character sequence)
> The 1st graph shows the compression ratio of each compression algorithm
> depending on the size of the compressed field.
> The 2nd graph shows how object size changes depending on field size and
> compression algorithm.
>
> 3) "put-avg" - shows the average time of the "put" operation depending on
> size and compression algorithm.
>
> 4) "put-thrpt" - shows the throughput of the "put" operation depending on
> size and compression algorithm.
>
> 5) "get-avg" - shows the average time of the "get" operation depending on
> size and compression algorithm.
>
> 6) "get-thrpt" - shows the throughput of the "get" operation depending on
> size and compression algorithm.
>
>
>
>
> 2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:
>
> > Vladimir, I am not sure how to interpret the graphs? What are we looking
> > at?
> >
> > On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur <[hidden email]
> >
> > wrote:
> >
> > > Hi, Igniters.
> > >
> > > I've prepared some benchmarking. Results [1].
> > >
> > > And I've prepared the evaluation in the form of diagrams [2].
> > >
> > > I hope that helps to interest the community and accelerates a reaction
> to
> > > this improvment :)
> > >
> > > [1]
> > > https://github.com/daradurvs/ignite-compression/tree/
> > > master/src/main/resources/result
> > > [2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk/view
> > >
> > >
> > >
> > > 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur <[hidden email]>:
> > >
> > > > Guys, any thoughts?
> > > >
> > > > 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur <[hidden email]>:
> > > >
> > > >> Hi guys,
> > > >>
> > > >> I've prepared the PR to show my idea.
> > > >> https://github.com/apache/ignite/pull/1951/files
> > > >>
> > > >> About querying - I've just copied existing tests and have annotated
> > the
> > > >> testing data.
> > > >> https://github.com/apache/ignite/pull/1951/files#diff-c19a9d
> > > >> f4058141d059bb577e75244764
> > > >>
> > > >> It means fields which will be marked by @BinaryCompression will be
> > > >> compressed at marshalling via BinaryMarshaller.
> > > >>
> > > >> This solution has no effect on existing data or project
> architecture.
> > > >>
> > > >> I'll be glad to see your thougths.
> > > >>
> > > >>
> > > >> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur <[hidden email]
> >:
> > > >>
> > > >>> Dmitriy,
> > > >>>
> > > >>> I have ready prototype. I want to show it.
> > > >>> It is always easier to discuss on example.
> > > >>>
> > > >>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan <
> [hidden email]
> > >:
> > > >>>
> > > >>>> Vyacheslav,
> > > >>>>
> > > >>>> I think it is a bit premature to provide a PR without getting a
> > > >>>> community
> > > >>>> consensus on the dev list. Please allow some time for the
> community
> > to
> > > >>>> respond.
> > > >>>>
> > > >>>> D.
> > > >>>>
> > > >>>> On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur <
> > > >>>> [hidden email]>
> > > >>>> wrote:
> > > >>>>
> > > >>>> > I created the ticket: https://issues.apache.org/jira
> > > >>>> /browse/IGNITE-5226
> > > >>>> >
> > > >>>> > I'll prepare a PR with described solution in couple of days.
> > > >>>> >
> > > >>>> > 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur <
> > [hidden email]
> > > >:
> > > >>>> >
> > > >>>> > > Hi, Igniters!
> > > >>>> > >
> > > >>>> > > Apache 2.0 is released.
> > > >>>> > >
> > > >>>> > > Let's continue the discussion about a compression design.
> > > >>>> > >
> > > >>>> > > At the moment, I found only one solution which is compatible
> > with
> > > >>>> > querying
> > > >>>> > > and indexing, this is per-objects-field compression.
> > > >>>> > > Per-fields compression means that metadata (a header) of an
> > object
> > > >>>> won't
> > > >>>> > > be compressed, only serialized values of an object fields (in
> > > bytes
> > > >>>> array
> > > >>>> > > form) will be compressed.
> > > >>>> > >
> > > >>>> > > This solution have some contentious issues:
> > > >>>> > > - small values, like primitives and short arrays - there isn't
> > > >>>> sense to
> > > >>>> > > compress them;
> > > >>>> > > - there is no possible to use compression with java-predefined
> > > >>>> types;
> > > >>>> > >
> > > >>>> > > We can provide an annotation, @IgniteCompression - for
> example,
> > > >>>> which can
> > > >>>> > > be used by users for marking fields to compress.
> > > >>>> > >
> > > >>>> > > Any thoughts?
> > > >>>> > >
> > > >>>> > > Maybe someone already have ready design?
> > > >>>> > >
> > > >>>> > > 2017-04-10 11:06 GMT+03:00 Vyacheslav Daradur <
> > > [hidden email]
> > > >>>> >:
> > > >>>> > >
> > > >>>> > >> Alexey,
> > > >>>> > >>
> > > >>>> > >> Yes, I've read it.
> > > >>>> > >>
> > > >>>> > >> Ok, let's discuss about public API design.
> > > >>>> > >>
> > > >>>> > >> I think we need to add some a configure entity to
> > > >>>> CacheConfiguration,
> > > >>>> > >> which will contain the Compressor interface implementation
> and
> > > some
> > > >>>> > usefull
> > > >>>> > >> parameters.
> > > >>>> > >> Or maybe to provide a BinaryMarshaller decorator, which will
> be
> > > >>>> compress
> > > >>>> > >> data after marshalling.
> > > >>>> > >>
> > > >>>> > >>
> > > >>>> > >> 2017-04-10 10:40 GMT+03:00 Alexey Kuznetsov <
> > > [hidden email]
> > > >>>> >:
> > > >>>> > >>
> > > >>>> > >>> Vyacheslav,
> > > >>>> > >>>
> > > >>>> > >>> Did you read initial discussion [1] about compression?
> > > >>>> > >>> As far as I remember we agreed to add only some "top-level"
> > API
> > > in
> > > >>>> > order
> > > >>>> > >>> to
> > > >>>> > >>> provide a way for
> > > >>>> > >>> Ignite users to inject some sort of custom compression.
> > > >>>> > >>>
> > > >>>> > >>>
> > > >>>> > >>> [1]
> > > >>>> > >>> http://apache-ignite-developers.2346864.n4.nabble.
> com/Data-c
> > > >>>> > >>> ompression-in-Ignite-2-0-td10099.html
> > > >>>> > >>>
> > > >>>> > >>> On Mon, Apr 10, 2017 at 2:19 PM, daradurvs <
> > [hidden email]
> > > >
> > > >>>> > wrote:
> > > >>>> > >>>
> > > >>>> > >>> > Hi Igniters!
> > > >>>> > >>> >
> > > >>>> > >>> > I am interested in this task.
> > > >>>> > >>> > Provide some kind of pluggable compression SPI support
> > > >>>> > >>> > <https://issues.apache.org/jira/browse/IGNITE-3592>
> > > >>>> > >>> >
> > > >>>> > >>> > I developed a solution on BinaryMarshaller-level, but
> > reviewer
> > > >>>> has
> > > >>>> > >>> rejected
> > > >>>> > >>> > it.
> > > >>>> > >>> >
> > > >>>> > >>> > Let's continue discussion of task goals and solution
> design.
> > > >>>> > >>> > As I understood that, the main goal of this task is to
> store
> > > >>>> data in
> > > >>>> > >>> > compressed form.
> > > >>>> > >>> > This is what I need from Ignite as its user. Compression
> > > >>>> provides
> > > >>>> > >>> economy
> > > >>>> > >>> > on
> > > >>>> > >>> > servers.
> > > >>>> > >>> > We can store more data on same servers at the cost of
> > > >>>> increasing CPU
> > > >>>> > >>> > utilization.
> > > >>>> > >>> >
> > > >>>> > >>> > I'm researching a possibility of implementation of
> > compression
> > > >>>> at the
> > > >>>> > >>> > cache-level.
> > > >>>> > >>> >
> > > >>>> > >>> > Any thoughts?
> > > >>>> > >>> >
> > > >>>> > >>> > --
> > > >>>> > >>> > Best regards,
> > > >>>> > >>> > Vyacheslav
> > > >>>> > >>> >
> > > >>>> > >>> >
> > > >>>> > >>> >
> > > >>>> > >>> >
> > > >>>> > >>> > --
> > > >>>> > >>> > View this message in context: http://apache-ignite-
> > > >>>> > >>> > developers.2346864.n4.nabble.com/Data-compression-in-
> > > >>>> > >>> > Ignite-2-0-tp10099p16317.html
> > > >>>> > >>> > Sent from the Apache Ignite Developers mailing list
> archive
> > at
> > > >>>> > >>> Nabble.com.
> > > >>>> > >>> >
> > > >>>> > >>>
> > > >>>> > >>>
> > > >>>> > >>>
> > > >>>> > >>> --
> > > >>>> > >>> Alexey Kuznetsov
> > > >>>> > >>>
> > > >>>> > >>
> > > >>>> > >>
> > > >>>> > >>
> > > >>>> > >> --
> > > >>>> > >> Best Regards, Vyacheslav
> > > >>>> > >>
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > > --
> > > >>>> > > Best Regards, Vyacheslav
> > > >>>> > >
> > > >>>> >
> > > >>>> >
> > > >>>> >
> > > >>>> > --
> > > >>>> > Best Regards, Vyacheslav
> > > >>>> >
> > > >>>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Best Regards, Vyacheslav
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Best Regards, Vyacheslav
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards, Vyacheslav
> > > >
> > >
> > >
> > >
> > > --
> > > Best Regards, Vyacheslav
> > >
> >
>
>
>
> --
> Best Regards, Vyacheslav
>



--

Best Regards, Anton Churaev

Re: Data compression in Ignite 2.0

daradurvs
Conclusion:
The provided solution reduces the size of an object in IgniteCache at the
cost of some throughput (a small cost in some cases). The trade-off depends
on how much of the object is compressed and on the compression algorithm.
In other words, we can use memory more effectively, and in some cases we can
also reduce the load on the interconnect (replication, rebalancing).

It will be particularly useful for object fields that hold large text
(more than ~250 bytes) and compress well.
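
To illustrate the point about field size, here is a minimal sketch of the
idea. It is not the code from the PR and not an Ignite API: the class name,
the 250-byte cut-off and the one-byte flag are assumptions made only for this
example, and Deflater from the JDK stands in for whatever algorithm is chosen.
A field is compressed only when it is large enough and the compressed form is
actually smaller; otherwise it is stored as is.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper for illustration; not part of Ignite or of the PR. */
public class FieldCompressionSketch {
    /** Assumed cut-off, based on the ~250 bytes mentioned above. */
    public static final int MIN_SIZE = 250;

    /** Flag values stored in the first byte of an encoded field. */
    public static final byte RAW = 0;
    public static final byte DEFLATED = 1;

    /** Returns a 1-byte flag followed by either the raw or the deflated bytes. */
    public static byte[] encode(byte[] field) {
        if (field.length >= MIN_SIZE) {
            byte[] packed = deflate(field);
            // Keep the compressed form only if it really saves space.
            if (packed.length < field.length)
                return prepend(DEFLATED, packed);
        }
        return prepend(RAW, field);
    }

    /** Reverses encode(): strips the flag and inflates the body if needed. */
    public static byte[] decode(byte[] stored) {
        byte[] body = Arrays.copyOfRange(stored, 1, stored.length);
        return stored[0] == DEFLATED ? inflate(body) : body;
    }

    private static byte[] prepend(byte flag, byte[] body) {
        byte[] out = new byte[body.length + 1];
        out[0] = flag;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    private static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    private static byte[] inflate(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // Guard against looping forever on truncated input.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                    throw new IllegalStateException("Truncated or unsupported input");
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupted compressed field", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

The flag byte costs one extra byte per field, which is another reason why
compressing primitives and short arrays does not pay off.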

2017-06-06 12:00 GMT+03:00 Антон Чураев <[hidden email]>:

> Vyacheslav, thank you! But could you please provide conclusions or
> proposals based on these benchmarks?
>
> 2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur <[hidden email]>:
>
> > Dmitry,
> >
> > Excel sheets:
> >
> > 1) "Compression ratio (2)" - shows object size with and without
> > compression. (Conditions: literal text)
> > The 1st graph shows the compression ratio of each compression algorithm
> > depending on the size of the compressed field.
> > The 2nd graph shows how object size changes depending on field size and
> > compression algorithm.
> >
> > 2) "Compression ratio (1)" - shows object size with and without
> > compression. (Conditions: poorly compressible character sequence)
> > The 1st graph shows the compression ratio of each compression algorithm
> > depending on the size of the compressed field.
> > The 2nd graph shows how object size changes depending on field size and
> > compression algorithm.
> >
> > 3) "put-avg" - shows the average time of the "put" operation depending
> > on size and compression algorithm.
> >
> > 4) "put-thrpt" - shows the throughput of the "put" operation depending
> > on size and compression algorithm.
> >
> > 5) "get-avg" - shows the average time of the "get" operation depending
> > on size and compression algorithm.
> >
> > 6) "get-thrpt" - shows the throughput of the "get" operation depending
> > on size and compression algorithm.
> >
> >
> >
> >
> > 2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:
> >
> > > Vladimir, I am not sure how to interpret the graphs? What are we
> looking
> > > at?
> > >
> > > On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur <
> [hidden email]
> > >
> > > wrote:
> > >
> > > > Hi, Igniters.
> > > >
> > > > I've prepared some benchmarking. Results [1].
> > > >
> > > > And I've prepared the evaluation in the form of diagrams [2].
> > > >
> > > > I hope that helps to interest the community and accelerates a
> reaction
> > to
> > > > this improvment :)
> > > >
> > > > [1]
> > > > https://github.com/daradurvs/ignite-compression/tree/
> > > > master/src/main/resources/result
> > > > [2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk/
> view
> > > >
> > > >
> > > >
> > > > 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur <[hidden email]>:
> > > >
> > > > > Guys, any thoughts?
> > > > >
> > > > > 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur <[hidden email]
> >:
> > > > >
> > > > >> Hi guys,
> > > > >>
> > > > >> I've prepared the PR to show my idea.
> > > > >> https://github.com/apache/ignite/pull/1951/files
> > > > >>
> > > > >> About querying - I've just copied existing tests and have
> annotated
> > > the
> > > > >> testing data.
> > > > >> https://github.com/apache/ignite/pull/1951/files#diff-c19a9d
> > > > >> f4058141d059bb577e75244764
> > > > >>
> > > > >> It means fields which will be marked by @BinaryCompression will be
> > > > >> compressed at marshalling via BinaryMarshaller.
> > > > >>
> > > > >> This solution has no effect on existing data or project
> > architecture.
> > > > >>
> > > > >> I'll be glad to see your thougths.
> > > > >>
> > > > >>
> > > > >> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur <
> [hidden email]
> > >:
> > > > >>
> > > > >>> Dmitriy,
> > > > >>>
> > > > >>> I have ready prototype. I want to show it.
> > > > >>> It is always easier to discuss on example.
> > > > >>>
> > > > >>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan <
> > [hidden email]
> > > >:
> > > > >>>
> > > > >>>> Vyacheslav,
> > > > >>>>
> > > > >>>> I think it is a bit premature to provide a PR without getting a
> > > > >>>> community
> > > > >>>> consensus on the dev list. Please allow some time for the
> > community
> > > to
> > > > >>>> respond.
> > > > >>>>
> > > > >>>> D.
> > > > >>>>
> > > > >>>> On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur <
> > > > >>>> [hidden email]>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>> > I created the ticket: https://issues.apache.org/jira
> > > > >>>> /browse/IGNITE-5226
> > > > >>>> >
> > > > >>>> > I'll prepare a PR with described solution in couple of days.
> > > > >>>> >
> > > > >>>> > 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur <
> > > [hidden email]
> > > > >:
> > > > >>>> >
> > > > >>>> > > Hi, Igniters!
> > > > >>>> > >
> > > > >>>> > > Apache 2.0 is released.
> > > > >>>> > >
> > > > >>>> > > Let's continue the discussion about a compression design.
> > > > >>>> > >
> > > > >>>> > > At the moment, I found only one solution which is compatible
> > > with
> > > > >>>> > querying
> > > > >>>> > > and indexing, this is per-objects-field compression.
> > > > >>>> > > Per-fields compression means that metadata (a header) of an
> > > object
> > > > >>>> won't
> > > > >>>> > > be compressed, only serialized values of an object fields
> (in
> > > > bytes
> > > > >>>> array
> > > > >>>> > > form) will be compressed.
> > > > >>>> > >
> > > > >>>> > > This solution have some contentious issues:
> > > > >>>> > > - small values, like primitives and short arrays - there
> isn't
> > > > >>>> sense to
> > > > >>>> > > compress them;
> > > > >>>> > > - there is no possible to use compression with
> java-predefined
> > > > >>>> types;
> > > > >>>> > >
> > > > >>>> > > We can provide an annotation, @IgniteCompression - for
> > example,
> > > > >>>> which can
> > > > >>>> > > be used by users for marking fields to compress.
> > > > >>>> > >
> > > > >>>> > > Any thoughts?
> > > > >>>> > >
> > > > >>>> > > Maybe someone already have ready design?
> > > > >>>> > >
> > > > >>>> > > 2017-04-10 11:06 GMT+03:00 Vyacheslav Daradur <
> > > > [hidden email]
> > > > >>>> >:
> > > > >>>> > >
> > > > >>>> > >> Alexey,
> > > > >>>> > >>
> > > > >>>> > >> Yes, I've read it.
> > > > >>>> > >>
> > > > >>>> > >> Ok, let's discuss about public API design.
> > > > >>>> > >>
> > > > >>>> > >> I think we need to add some a configure entity to
> > > > >>>> CacheConfiguration,
> > > > >>>> > >> which will contain the Compressor interface implementation
> > and
> > > > some
> > > > >>>> > usefull
> > > > >>>> > >> parameters.
> > > > >>>> > >> Or maybe to provide a BinaryMarshaller decorator, which
> will
> > be
> > > > >>>> compress
> > > > >>>> > >> data after marshalling.
> > > > >>>> > >>
> > > > >>>> > >>
> > > > >>>> > >> 2017-04-10 10:40 GMT+03:00 Alexey Kuznetsov <
> > > > [hidden email]
> > > > >>>> >:
> > > > >>>> > >>
> > > > >>>> > >>> Vyacheslav,
> > > > >>>> > >>>
> > > > >>>> > >>> Did you read initial discussion [1] about compression?
> > > > >>>> > >>> As far as I remember we agreed to add only some
> "top-level"
> > > API
> > > > in
> > > > >>>> > order
> > > > >>>> > >>> to
> > > > >>>> > >>> provide a way for
> > > > >>>> > >>> Ignite users to inject some sort of custom compression.



--
Best Regards, Vyacheslav

Re: Data compression in Ignite 2.0

daradurvs
I'd like to note that the benchmark results show metrics from
stress testing.
In real scenarios, for example business operations that take
milliseconds or seconds, the increase in put/get time will be
insignificant.

2017-06-06 14:48 GMT+03:00 Vyacheslav Daradur <[hidden email]>:

> Conclusion:
> The provided solution reduces the size of an object in IgniteCache at the
> cost of reduced throughput (a small reduction in some cases); how much depends
> on which part of the object is compressed and on the compression algorithm.
> That is, we can use memory more effectively, and in some cases it can also
> reduce the load on the interconnect (replication, rebalancing).
>
> It will be particularly useful for object fields that hold
> large text (>~ 250 bytes) and compress well.



--
Best Regards, Vyacheslav

Re: Data compression in Ignite 2.0

Антон Чураев
In reply to this post by daradurvs
Vyacheslav,

Is it possible to propose an implementation that can be switched on on demand?
In that case it should not affect the performance of the current solution.

I mean that users should decide what is more important for them:
throughput or memory/network usage.
Maybe they will choose to compress not all objects, or only some attributes
of objects.

2017-06-06 14:48 GMT+03:00 Vyacheslav Daradur <[hidden email]>:

> Conclusion:
> The provided solution reduces the size of an object in IgniteCache at the
> cost of reduced throughput (a small reduction in some cases); how much depends
> on which part of the object is compressed and on the compression algorithm.
> That is, we can use memory more effectively, and in some cases it can also
> reduce the load on the interconnect (replication, rebalancing).
>
> It will be particularly useful for object fields that hold
> large text (>~ 250 bytes) and compress well.



--

Best Regards, Anton Churaev

Re: Data compression in Ignite 2.0

daradurvs
Anton,

Of course, the solution does not affect the existing implementation. I mean,
nothing changes if the user does not use the @BinaryCompression annotation (no
performance changes).
Only if the user decides to compress a specific field or fields
of a class will compression be applied, at marshalling time, to the
annotated fields.
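
To make the opt-in behaviour concrete, here is a minimal, self-contained sketch.
It is not the code from the PR: the @CompressedField annotation, the Article
class and the marshalStrings helper are hypothetical stand-ins, and
java.util.zip Deflater/Inflater are used only as an example codec. It shows that
fields without the annotation are serialized unchanged, while annotated fields
are compressed.

```java
import java.io.ByteArrayOutputStream;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FieldCompressionSketch {
    /** Hypothetical marker; stands in for the annotation discussed in the thread. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface CompressedField { }

    /** Example value class: only the large text field opts in. */
    static class Article {
        int id = 42;
        String title = "short title";
        @CompressedField
        String body = repeat("lorem ipsum ", 100);
    }

    static String repeat(String s, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append(s);
        return sb.toString();
    }

    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] packed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(packed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // Guard against looping forever on truncated input.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    /** Serializes String fields to UTF-8 bytes; only annotated fields are deflated. */
    static Map<String, byte[]> marshalStrings(Object obj) throws IllegalAccessException {
        Map<String, byte[]> out = new LinkedHashMap<>();
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (f.getType() != String.class) continue; // primitives are left alone
            f.setAccessible(true);
            byte[] raw = ((String) f.get(obj)).getBytes(StandardCharsets.UTF_8);
            out.put(f.getName(), f.isAnnotationPresent(CompressedField.class) ? deflate(raw) : raw);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Article a = new Article();
        Map<String, byte[]> fields = marshalStrings(a);
        for (Map.Entry<String, byte[]> e : fields.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().length + " bytes");
        }
        // Round trip of the compressed field.
        String restored = new String(inflate(fields.get("body")), StandardCharsets.UTF_8);
        System.out.println("body restored: " + restored.equals(a.body));
    }
}
```

In this sketch the "title" field costs nothing extra, because it is never passed
to the codec; only "body" pays the CPU cost in exchange for a smaller footprint.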

2017-06-06 15:10 GMT+03:00 Антон Чураев <[hidden email]>:

> Vyacheslav,
>
> Is it possible to propose an implementation that can be switched on on demand?
> In that case it should not affect the performance of the current solution.
>
> I mean that users should decide what is more important for them:
> throughput or memory/network usage.
> Maybe they will choose to compress not all objects, or only some attributes
> of objects.
> > > > > > >>>> > >>> > Provide some kind of pluggable compression SPI
> support
> > > > > > >>>> > >>> > <https://issues.apache.org/jira/browse/IGNITE-3592>
> > > > > > >>>> > >>> >
> > > > > > >>>> > >>> > I developed a solution on BinaryMarshaller-level,
> but
> > > > > reviewer
> > > > > > >>>> has
> > > > > > >>>> > >>> rejected
> > > > > > >>>> > >>> > it.
> > > > > > >>>> > >>> >
> > > > > > >>>> > >>> > Let's continue discussion of task goals and solution
> > > > design.
> > > > > > >>>> > >>> > As I understood that, the main goal of this task is
> to
> > > > store
> > > > > > >>>> data in
> > > > > > >>>> > >>> > compressed form.
> > > > > > >>>> > >>> > This is what I need from Ignite as its user.
> > Compression
> > > > > > >>>> provides
> > > > > > >>>> > >>> economy
> > > > > > >>>> > >>> > on
> > > > > > >>>> > >>> > servers.
> > > > > > >>>> > >>> > We can store more data on same servers at the cost
> of
> > > > > > >>>> increasing CPU
> > > > > > >>>> > >>> > utilization.
> > > > > > >>>> > >>> >
> > > > > > >>>> > >>> > I'm researching a possibility of implementation of
> > > > > compression
> > > > > > >>>> at the
> > > > > > >>>> > >>> > cache-level.
> > > > > > >>>> > >>> >
> > > > > > >>>> > >>> > Any thoughts?
> > > > > > >>>> > >>> >
> > > > > > >>>> > >>> > --
> > > > > > >>>> > >>> > Best regards,
> > > > > > >>>> > >>> > Vyacheslav
> > > > > > >>>> > >>> >
> > > > > > >>>> > >>> >
> > > > > > >>>> > >>> >
> > > > > > >>>> > >>> >
> > > > > > >>>> > >>> > --
> > > > > > >>>> > >>> > View this message in context: http://apache-ignite-
> > > > > > >>>> > >>> > developers.2346864.n4.nabble.
> com/Data-compression-in-
> > > > > > >>>> > >>> > Ignite-2-0-tp10099p16317.html
> > > > > > >>>> > >>> > Sent from the Apache Ignite Developers mailing list
> > > > archive
> > > > > at
> > > > > > >>>> > >>> Nabble.com.
> > > > > > >>>> > >>> >
> > > > > > >>>> > >>>
> > > > > > >>>> > >>>
> > > > > > >>>> > >>>
> > > > > > >>>> > >>> --
> > > > > > >>>> > >>> Alexey Kuznetsov
> > > > > > >>>> > >>>
> > > > > > >>>> > >>
> > > > > > >>>> > >>
> > > > > > >>>> > >>
> > > > > > >>>> > >> --
> > > > > > >>>> > >> Best Regards, Vyacheslav
> > > > > > >>>> > >>
> > > > > > >>>> > >
> > > > > > >>>> > >
> > > > > > >>>> > >
> > > > > > >>>> > > --
> > > > > > >>>> > > Best Regards, Vyacheslav
> > > > > > >>>> > >
> > > > > > >>>> >
> > > > > > >>>> >
> > > > > > >>>> >
> > > > > > >>>> > --
> > > > > > >>>> > Best Regards, Vyacheslav
> > > > > > >>>> >
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> --
> > > > > > >>> Best Regards, Vyacheslav
> > > > > > >>>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> Best Regards, Vyacheslav
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best Regards, Vyacheslav
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best Regards, Vyacheslav
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards, Vyacheslav
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Best Regards, Anton Churaev
> > >
> >
> >
> >
> > --
> > Best Regards, Vyacheslav
> >
>
>
>
> --
>
> Best Regards, Anton Churaev
>



--
Best Regards, Vyacheslav

Re: Data compression in Ignite 2.0

Антон Чураев
Looks good to me.

Could you outline the implementation design in a couple of sentences,
so that we can estimate the completeness and complexity of the proposal?

--

Best Regards, Anton Churaev

Re: Data compression in Ignite 2.0

daradurvs
In short,

During marshalling, each field is represented by a BinaryFieldAccessor, which
manages its marshalling. The accessor checks whether the field is marked with
the @BinaryCompression annotation; if so, the binary representation of the
field (a byte array) is compressed and tagged as compressed with a type
constant (GridBinaryMarshaller.COMPRESSED). The compressed byte array is then
included in the binary representation of the whole object. Note that the
header of the marshalled object is not compressed; compression affects only
the representation of the object's fields.
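
To make the write path concrete, here is a minimal sketch of the idea, not the code from the PR. The annotation is declared locally as a stand-in, the marker value 0x7F, the Article class, the helper names and the [marker][raw length][packed length][payload] layout are all assumptions made for illustration, and java.util.zip.Deflater stands in for whatever compressor is finally chosen.

```java
import java.io.ByteArrayOutputStream;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.nio.ByteBuffer;
import java.util.zip.Deflater;

/** Stand-in for the proposed marker annotation (declared locally for this sketch). */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface BinaryCompression {}

/** Hypothetical value class: only the large text field is marked. */
class Article {
    int id;
    @BinaryCompression String body;
}

final class FieldCompressionSketch {
    /** Assumed marker value; the real constant would live in GridBinaryMarshaller. */
    static final byte COMPRESSED = (byte) 0x7F;

    /** Returns true if the named field carries the marker annotation. */
    static boolean isMarked(Class<?> owner, String fieldName) {
        try {
            return owner.getDeclaredField(fieldName).isAnnotationPresent(BinaryCompression.class);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /**
     * Takes the already-marshalled bytes of one field. Unmarked fields are
     * returned untouched; marked fields are wrapped as
     * [marker][raw length][packed length][payload].
     */
    static byte[] marshalField(Class<?> owner, String fieldName, byte[] fieldBytes) {
        if (!isMarked(owner, fieldName))
            return fieldBytes;

        Deflater deflater = new Deflater();
        deflater.setInput(fieldBytes);
        deflater.finish();

        ByteArrayOutputStream packed = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            packed.write(chunk, 0, n);
        }
        deflater.end();

        byte[] payload = packed.toByteArray();
        return ByteBuffer.allocate(1 + 4 + 4 + payload.length)
            .put(COMPRESSED)
            .putInt(fieldBytes.length)
            .putInt(payload.length)
            .put(payload)
            .array();
    }
}
```

Only the field bytes pass through the compressor here; whatever writes the object header is left alone, which is the property that keeps the rest of the binary format unchanged.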

Objects in IgniteCache are represented as BinaryObject, which is a wrapper
over the byte array of the marshalled object.
BinaryObject provides some useful methods that are used by Ignite's internal
systems.
For example, queries use the BinaryObject#field method, which deserializes
a single field of an object without deserializing the whole object.
If BinaryObject#field meets the compressed type constant during
deserialization, it decompresses the byte array and then continues
unmarshalling as usual.
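
The read path can be sketched in the same spirit. This is only an illustration under assumptions: the marker value 0x7F and the [marker][raw length][packed length][payload] layout are invented for the example, unwrapField is a hypothetical helper rather than a BinaryObject method, and java.util.zip.Inflater stands in for the real decompressor.

```java
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class FieldReadSketch {
    /** Assumed marker value; the real constant would live in GridBinaryMarshaller. */
    static final byte COMPRESSED = (byte) 0x7F;

    /**
     * Returns the plain field bytes. If the stored bytes start with the
     * compressed marker, the payload is inflated first; otherwise the
     * bytes are handed back unchanged so normal unmarshalling can proceed.
     */
    static byte[] unwrapField(byte[] stored) {
        if (stored.length == 0 || stored[0] != COMPRESSED)
            return stored;

        // Assumed layout: [marker][raw length][packed length][payload].
        ByteBuffer buf = ByteBuffer.wrap(stored, 1, stored.length - 1);
        int rawLen = buf.getInt();
        int packedLen = buf.getInt();

        Inflater inflater = new Inflater();
        try {
            inflater.setInput(stored, buf.position(), packedLen);
            byte[] raw = new byte[rawLen];
            int off = 0;
            while (off < rawLen && !inflater.finished()) {
                int n = inflater.inflate(raw, off, rawLen - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                    throw new IllegalStateException("Truncated or unsupported compressed field");
                off += n;
            }
            return raw;
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupted compressed field", e);
        } finally {
            inflater.end();
        }
    }
}
```

Because the check happens per field, a query that touches only uncompressed fields never pays the decompression cost.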

For now, I have introduced the Compressor interface in IgniteConfiguration; it
allows users to plug in their own compressor implementation, which is a
requirement of the task [1].
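
As an illustration of what a user-supplied implementation might look like, here is a small sketch. The two-method shape of Compressor below is an assumption for this example and may differ from the interface in the PR; DeflateCompressor is a hypothetical implementation built on the JDK's java.util.zip streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Assumed shape of the pluggable interface: bytes in, bytes out. */
interface Compressor {
    byte[] compress(byte[] bytes);

    byte[] decompress(byte[] bytes);
}

/** Hypothetical user implementation backed by the JDK's DEFLATE streams. */
final class DeflateCompressor implements Compressor {
    @Override public byte[] compress(byte[] bytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the stream flushes the final compressed block into 'out'.
        try (OutputStream z = new DeflaterOutputStream(out)) {
            z.write(bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    @Override public byte[] decompress(byte[] bytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream z = new InflaterInputStream(new ByteArrayInputStream(bytes))) {
            byte[] chunk = new byte[4096];
            for (int n; (n = z.read(chunk)) != -1; )
                out.write(chunk, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

An implementation like this has to be stateless or thread-safe if a single instance is shared by all marshalling threads, which is one more thing to settle if the interface stays public.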

As far as I know, Vladimir Ozerov doesn't like the idea of giving users this
option.
In that case we can choose one compression algorithm to provide by default
and move the interface into the internals of the binary infrastructure.
For this case I've prepared the benchmarks that I sent earlier.

I vote for the ZSTD algorithm [2]: it provides a good compression ratio and
good throughput. It has implementations in Java, .NET and C++ and an
ASF-friendly license, so we can use it on all Ignite platforms.
You can see an assessment of this algorithm in my benchmarks.

[1] https://issues.apache.org/jira/browse/IGNITE-3592
[2] https://github.com/facebook/zstd


> > > > > > > > >>>> > >>> > Best regards,
> > > > > > > > >>>> > >>> > Vyacheslav
> > > > > > > > >>>> > >>> >
> > > > > > > > >>>> > >>> >
> > > > > > > > >>>> > >>> >
> > > > > > > > >>>> > >>> >
> > > > > > > > >>>> > >>> > --
> > > > > > > > >>>> > >>> > View this message in context:
> > http://apache-ignite-
> > > > > > > > >>>> > >>> > developers.2346864.n4.nabble.
> > > com/Data-compression-in-
> > > > > > > > >>>> > >>> > Ignite-2-0-tp10099p16317.html
> > > > > > > > >>>> > >>> > Sent from the Apache Ignite Developers mailing
> > list
> > > > > > archive
> > > > > > > at
> > > > > > > > >>>> > >>> Nabble.com.
> > > > > > > > >>>> > >>> >
> > > > > > > > >>>> > >>>
> > > > > > > > >>>> > >>>
> > > > > > > > >>>> > >>>
> > > > > > > > >>>> > >>> --
> > > > > > > > >>>> > >>> Alexey Kuznetsov
> > > > > > > > >>>> > >>>
> > > > > > > > >>>> > >>
> > > > > > > > >>>> > >>
> > > > > > > > >>>> > >>
> > > > > > > > >>>> > >> --
> > > > > > > > >>>> > >> Best Regards, Vyacheslav
> > > > > > > > >>>> > >>
> > > > > > > > >>>> > >
> > > > > > > > >>>> > >
> > > > > > > > >>>> > >
> > > > > > > > >>>> > > --
> > > > > > > > >>>> > > Best Regards, Vyacheslav
> > > > > > > > >>>> > >
> > > > > > > > >>>> >
> > > > > > > > >>>> >
> > > > > > > > >>>> >
> > > > > > > > >>>> > --
> > > > > > > > >>>> > Best Regards, Vyacheslav
> > > > > > > > >>>> >
> > > > > > > > >>>>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> --
> > > > > > > > >>> Best Regards, Vyacheslav
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> --
> > > > > > > > >> Best Regards, Vyacheslav
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best Regards, Vyacheslav
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards, Vyacheslav
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best Regards, Vyacheslav
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best Regards, Anton Churaev
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards, Vyacheslav
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Best Regards, Anton Churaev
> > >
> >
> >
> >
> > --
> > Best Regards, Vyacheslav
> >
>
>
>
> --
>
> Best Regards, Anton Churaev
>



--
Best Regards, Vyacheslav

Re: Data compression in Ignite 2.0

Антон Чураев
Vyacheslav, correct me if I have something wrong.

We could let users choose between CPU usage and memory/network usage
by compressing selected attributes of stored objects.
You have studied the design, and it is possible to localize the changes in
marshalling without affecting performance or current functionality.

I think this is useful for our project and its users.
Community, what do you think about this proposal?
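
To make the proposal easier to picture, here is a minimal sketch of the per-field idea described in the quoted message below. It is not the code from the PR. The annotation name @BinaryCompression comes from the thread; everything else (the Compressor interface shape, DeflateCompressor, FieldCodecSketch, the Article class and the one-byte PLAIN/COMPRESSED flag) is a hypothetical stand-in chosen for illustration, and java.util.zip is used instead of ZSTD only to keep the example dependency-free.

```java
import java.io.ByteArrayOutputStream;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical marker, named after the @BinaryCompression annotation in the thread. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface BinaryCompression {}

/** Hypothetical pluggable compressor; the interface in the actual PR may differ. */
interface Compressor {
    byte[] compress(byte[] raw);
    byte[] decompress(byte[] packed);
}

/** Assumption: JDK Deflater/Inflater stand in for ZSTD. */
final class DeflateCompressor implements Compressor {
    @Override public byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    @Override public byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished())
                    throw new IllegalArgumentException("Truncated compressed field");
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Corrupt compressed field", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}

/** Example value class: only the large text field is marked for compression. */
final class Article {
    String title;
    @BinaryCompression String body;

    Article(String title, String body) {
        this.title = title;
        this.body = body;
    }
}

/** Writes a String field as [flag byte][payload]; the object header is out of scope here. */
final class FieldCodecSketch {
    static final byte PLAIN = 0;
    static final byte COMPRESSED = 1; // stand-in for a type constant such as COMPRESSED

    static byte[] writeField(Object obj, String fieldName, Compressor compressor) {
        try {
            Field field = obj.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            byte[] raw = ((String) field.get(obj)).getBytes(StandardCharsets.UTF_8);
            // Only annotated fields are compressed; all others are written unchanged.
            boolean compress = field.isAnnotationPresent(BinaryCompression.class);
            byte[] payload = compress ? compressor.compress(raw) : raw;
            byte[] out = new byte[payload.length + 1];
            out[0] = compress ? COMPRESSED : PLAIN;
            System.arraycopy(payload, 0, out, 1, payload.length);
            return out;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot read field " + fieldName, e);
        }
    }

    static String readField(byte[] data, Compressor compressor) {
        byte[] payload = Arrays.copyOfRange(data, 1, data.length);
        // Decompress only when the flag says so, then continue as for a plain field.
        if (data[0] == COMPRESSED)
            payload = compressor.decompress(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }
}
```

In this sketch a class without annotated fields takes exactly the same path as before, which is the "switched on on-demand" property discussed in the thread. A single field can also be read back without touching the rest of the object, which is what querying needs.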


2017-06-06 17:29 GMT+03:00 Vyacheslav Daradur <[hidden email]>:

> In short,
>
> During marshalling, each field is represented by a BinaryFieldAccessor, which
> manages its marshalling. It checks whether the field is marked with the
> @BinaryCompression annotation; if so, the binary representation of the field
> (a byte array) is compressed and marked as compressed with a type
> constant (GridBinaryMarshaller.COMPRESSED). The compressed byte array is
> then included in the binary representation of the whole object. Note that
> the header of the marshalled object is not compressed. Compression affects
> only the representation of the object's fields.
>
> Objects in IgniteCache are represented as BinaryObject, which is a wrapper
> over the byte array of the marshalled object.
> BinaryObject provides some useful methods, which are used by Ignite
> systems.
> For example, queries use the BinaryObject#field method, which deserializes
> only one field of an object without deserializing the whole object.
> If BinaryObject#field encounters the compressed-type constant during
> deserialization, it decompresses the byte array and then continues
> unmarshalling as usual.
>
> For now, I have introduced the Compressor interface in IgniteConfiguration; it
> allows users to plug in their own compressor implementation, which is a
> requirement of the task [1].
>
> As far as I know, Vladimir Ozerov doesn't like the idea of granting this
> opportunity to the user.
> In that case we can choose a compression algorithm to provide by
> default and move the interface into the internals of the binary
> infrastructure.
> For this case I've prepared the benchmarks, which I sent earlier.
>
> I vote for the ZSTD algorithm [2]: it provides a good compression ratio and
> good throughput. It has implementations in Java, .NET and C++, and has an
> ASF-friendly license, so we can use it on all Ignite platforms.
> You can look at an assessment of this algorithm in my benchmarks.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-3592
> [2] https://github.com/facebook/zstd
>
>



--

Best Regards, Anton Churaev

Re: Data compression in Ignite 2.0

Valentin Kulichenko
Vyacheslav, Anton,

Are there any ideas and/or prototypes for the API? Your design suggestions
seem to make sense, but I would like to see how all this will look from the
user's standpoint.

-Val

On Wed, Jun 7, 2017 at 1:06 AM, Антон Чураев <[hidden email]> wrote:

> Vyacheslav, correct me if I have something wrong.
>
> We could let users choose between CPU usage and memory/network usage
> by compressing selected attributes of stored objects.
> You have studied the design, and it is possible to localize the changes in
> marshalling without affecting performance or current functionality.
>
> I think this is useful for our project and its users.
> Community, what do you think about this proposal?
>
>
> 2017-06-06 17:29 GMT+03:00 Vyacheslav Daradur <[hidden email]>:
>
> > In short,
> >
> > During marshalling, each field is represented by a BinaryFieldAccessor,
> > which manages its marshalling. It checks whether the field is marked with
> > the @BinaryCompression annotation; if so, the binary representation of the
> > field (a byte array) is compressed and marked as compressed with a type
> > constant (GridBinaryMarshaller.COMPRESSED). The compressed byte array is
> > then included in the binary representation of the whole object. Note that
> > the header of the marshalled object is not compressed. Compression affects
> > only the representation of the object's fields.
> >
> > Objects in IgniteCache are represented as BinaryObject, which is a wrapper
> > over the byte array of the marshalled object.
> > BinaryObject provides some useful methods, which are used by Ignite
> > systems.
> > For example, queries use the BinaryObject#field method, which deserializes
> > only one field of an object without deserializing the whole object.
> > If BinaryObject#field encounters the compressed-type constant during
> > deserialization, it decompresses the byte array and then continues
> > unmarshalling as usual.
> >
> > For now, I have introduced the Compressor interface in
> > IgniteConfiguration; it allows users to plug in their own compressor
> > implementation, which is a requirement of the task [1].
> >
> > As far as I know, Vladimir Ozerov doesn't like the idea of granting this
> > opportunity to the user.
> > In that case we can choose a compression algorithm to provide by
> > default and move the interface into the internals of the binary
> > infrastructure.
> > For this case I've prepared the benchmarks, which I sent earlier.
> >
> > I vote for the ZSTD algorithm [2]: it provides a good compression ratio and
> > good throughput. It has implementations in Java, .NET and C++, and has an
> > ASF-friendly license, so we can use it on all Ignite platforms.
> > You can look at an assessment of this algorithm in my benchmarks.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-3592
> > [2] https://github.com/facebook/zstd
> >
> >
> > 2017-06-06 16:02 GMT+03:00 Антон Чураев <[hidden email]>:
> >
> > > Looks good for me.
> > >
> > > Could You propose design of implementation in couple of sentences?
> > > So that we can estimate the completeness and complexity of the
> proposal.
> > >
> > > 2017-06-06 15:26 GMT+03:00 Vyacheslav Daradur <[hidden email]>:
> > >
> > > > Anton,
> > > >
> > > > Of course, the solution does not affect on existing implementation. I
> > > mean,
> > > > there is no changes if user not use the annotation
> @BinaryCompression.
> > > (no
> > > > performance changes)
> > > > Only if user make decision to use compression on specific field or
> > fields
> > > > of a class - in that case compression will be used at marshalling in
> > > > relation to annotated fields.
> > > >
> > > > 2017-06-06 15:10 GMT+03:00 Антон Чураев <[hidden email]>:
> > > >
> > > > > Vyacheslav,
> > > > >
> > > > > Is it possible to propose implementation that can be switched on
> > > > on-demand?
> > > > > In this case it should not affect performance of current solution.
> > > > >
> > > > > I mean, that users should make decision what is more important for
> > > them:
> > > > > throutput or memory/net usage.
> > > > > May be they will be choose not all objects, or only some attributes
> > of
> > > > > objects for compress.
> > > > >
> > > > > 2017-06-06 14:48 GMT+03:00 Vyacheslav Daradur <[hidden email]
> >:
> > > > >
> > > > > > Conclusion:
> > > > > > Provided solution allows reduce size of an object in IgniteCache
> at
> > > the
> > > > > > cost of throughput reduction (small - in some cases), it depends
> on
> > > > part
> > > > > of
> > > > > > object which will be compressed and compression algorithm.
> > > > > > I mean, we can make more effective use of memory, and in some
> cases
> > > it
> > > > > can
> > > > > > reduce loading of the interconnect. (replication, rebalancing)
> > > > > >
> > > > > > Especially, it will be particularly useful for object's fields
> > which
> > > > are
> > > > > > large text (>~ 250 bytes) and can be effectively compressed.
> > > > > >
> > > > > > 2017-06-06 12:00 GMT+03:00 Антон Чураев <[hidden email]>:
> > > > > >
> > > > > > > Vyacheslav, thank you! But could you please provide a
> conclusions
> > > or
> > > > > > > proposals based on this benchmarks?
> > > > > > >
> > > > > > > 2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur <
> > [hidden email]
> > > >:
> > > > > > >
> > > > > > > > Dmitry,
> > > > > > > >
> > > > > > > > Excel-pages:
> > > > > > > >
> > > > > > > > 1). "Compression ratio (2)" - shows object size, with
> > compression
> > > > and
> > > > > > > > without compression. (Conditions: literal text)
> > > > > > > > 1st graph shows compression ratios of using different
> > compression
> > > > > > > algrithms
> > > > > > > > depending on size of compressed field.
> > > > > > > > 2nd graph shows evaluation of size of objects depending on
> > sizes
> > > > and
> > > > > > > > compression algorithms.
> > > > > > > >
> > > > > > > > 2). "Compression ratio (1)" - shows object size, with
> > compression
> > > > and
> > > > > > > > without compression. (Conditions:  badly compressed character
> > > > > sequence)
> > > > > > > > 1st graph shows compression ratios of using different
> > compression
> > > > > > > > algrithms depending on size of compressed field.
> > > > > > > > 2nd graph shows evaluation of size of objects depending on
> > sizes
> > > > and
> > > > > > > > compression algorithms.
> > > > > > > >
> > > > > > > > 3) 'put-avg" - shows average time of the "put" operation
> > > depending
> > > > on
> > > > > > > size
> > > > > > > > and compression algorithms.
> > > > > > > >
> > > > > > > > 4) 'put-thrpt" - shows throughput of the "put" operation
> > > depending
> > > > on
> > > > > > > size
> > > > > > > > and compression algorithms.
> > > > > > > >
> > > > > > > > 5) 'get-avg" - shows average time of the "get" operation
> > > depending
> > > > on
> > > > > > > size
> > > > > > > > and compression algorithms.
> > > > > > > >
> > > > > > > > 6) 'get-thrpt" - shows throughput of the "get" operation
> > > depending
> > > > on
> > > > > > > size
> > > > > > > > and compression algorithms.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > 2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan <
> > > > [hidden email]
> > > > > >:
> > > > > > > >
> > > > > > > > > Vladimir, I am not sure how to interpret the graphs? What
> are
> > > we
> > > > > > > looking
> > > > > > > > > at?
> > > > > > > > >
> > > > > > > > > On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur <
> > > > > > > [hidden email]
> > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi, Igniters.
> > > > > > > > > >
> > > > > > > > > > I've prepared some benchmarking. Results [1].
> > > > > > > > > >
> > > > > > > > > > And I've prepared the evaluation in the form of diagrams
> > [2].
> > > > > > > > > >
> > > > > > > > > > I hope that helps to interest the community and
> > accelerates a
> > > > > > > reaction
> > > > > > > > to
> > > > > > > > > > this improvment :)
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > > https://github.com/daradurvs/ignite-compression/tree/
> > > > > > > > > > master/src/main/resources/result
> > > > > > > > > > [2] https://drive.google.com/file/d/
> > > > > 0B2CeUAOgrHkoMklyZ25YTEdKcEk/
> > > > > > > view
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur <
> > > > > [hidden email]
> > > > > > >:
> > > > > > > > > >
> > > > > > > > > > > Guys, any thoughts?
> > > > > > > > > > >
> > > > > > > > > > > 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur <
> > > > > > [hidden email]
> > > > > > > >:
> > > > > > > > > > >
> > > > > > > > > > >> Hi guys,
> > > > > > > > > > >>
> > > > > > > > > > >> I've prepared the PR to show my idea.
> > > > > > > > > > >> https://github.com/apache/ignite/pull/1951/files
> > > > > > > > > > >>
> > > > > > > > > > >> About querying - I've just copied existing tests and
> > have
> > > > > > > annotated
> > > > > > > > > the
> > > > > > > > > > >> testing data.
> > > > > > > > > > >> https://github.com/apache/
> ignite/pull/1951/files#diff-
> > > > c19a9d
> > > > > > > > > > >> f4058141d059bb577e75244764
> > > > > > > > > > >>
> > > > > > > > > > >> It means fields which will be marked by
> > @BinaryCompression
> > > > > will
> > > > > > be
> > > > > > > > > > >> compressed at marshalling via BinaryMarshaller.
> > > > > > > > > > >>
> > > > > > > > > > >> This solution has no effect on existing data or
> project
> > > > > > > > architecture.
> > > > > > > > > > >>
> > > > > > > > > > >> I'll be glad to see your thougths.
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur <
> > > > > > > [hidden email]
> > > > > > > > >:
> > > > > > > > > > >>
> > > > > > > > > > >>> Dmitriy,
> > > > > > > > > > >>>
> > > > > > > > > > >>> I have ready prototype. I want to show it.
> > > > > > > > > > >>> It is always easier to discuss on example.
> > > > > > > > > > >>>
> > > > > > > > > > >>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan <
> > > > > > > > [hidden email]
> > > > > > > > > >:
> > > > > > > > > > >>>
> > > > > > > > > > >>>> Vyacheslav,
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> I think it is a bit premature to provide a PR
> without
> > > > > getting
> > > > > > a
> > > > > > > > > > >>>> community
> > > > > > > > > > >>>> consensus on the dev list. Please allow some time
> for
> > > the
> > > > > > > > community
> > > > > > > > > to
> > > > > > > > > > >>>> respond.
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> D.
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur
> <
> > > > > > > > > > >>>> [hidden email]>
> > > > > > > > > > >>>> wrote:
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> > I created the ticket:
> > https://issues.apache.org/jira
> > > > > > > > > > >>>> /browse/IGNITE-5226
> > > > > > > > > > >>>> >
> > > > > > > > > > >>>> > I'll prepare a PR with described solution in
> couple
> > of
> > > > > days.
> > > > > > > > > > >>>> >
> > > > > > > > > > >>>> > 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur <
> > > > > > > > > [hidden email]
> > > > > > > > > > >:
> > > > > > > > > > >>>> >
> > > > > > > > > > >>>> > > Hi, Igniters!
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > Apache 2.0 is released.
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > Let's continue the discussion about a
> compression
> > > > > design.
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > At the moment, I found only one solution which
> is
> > > > > > compatible
> > > > > > > > > with
> > > > > > > > > > >>>> > querying
> > > > > > > > > > >>>> > > and indexing, this is per-objects-field
> > compression.
> > > > > > > > > > >>>> > > Per-fields compression means that metadata (a
> > > header)
> > > > of
> > > > > > an
> > > > > > > > > object
> > > > > > > > > > >>>> won't
> > > > > > > > > > >>>> > > be compressed, only serialized values of an
> object
> > > > > fields
> > > > > > > (in
> > > > > > > > > > bytes
> > > > > > > > > > >>>> array
> > > > > > > > > > >>>> > > form) will be compressed.
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > This solution have some contentious issues:
> > > > > > > > > > >>>> > > - small values, like primitives and short
> arrays -
> > > > there
> > > > > > > isn't
> > > > > > > > > > >>>> sense to
> > > > > > > > > > >>>> > > compress them;
> > > > > > > > > > >>>> > > - there is no possible to use compression with
> > > > > > > java-predefined
> > > > > > > > > > >>>> types;
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > We can provide an annotation,
> @IgniteCompression -
> > > for
> > > > > > > > example,
> > > > > > > > > > >>>> which can
> > > > > > > > > > >>>> > > be used by users for marking fields to compress.
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > Any thoughts?
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > Maybe someone already have ready design?
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > 2017-04-10 11:06 GMT+03:00 Vyacheslav Daradur <
> > > > > > > > > > [hidden email]
> > > > > > > > > > >>>> >:
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > >> Alexey,
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >> Yes, I've read it.
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >> Ok, let's discuss about public API design.
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >> I think we need to add some a configure entity
> to
> > > > > > > > > > >>>> CacheConfiguration,
> > > > > > > > > > >>>> > >> which will contain the Compressor interface
> > > > > > implementation
> > > > > > > > and
> > > > > > > > > > some
> > > > > > > > > > >>>> > usefull
> > > > > > > > > > >>>> > >> parameters.
> > > > > > > > > > >>>> > >> Or maybe to provide a BinaryMarshaller
> decorator,
> > > > which
> > > > > > > will
> > > > > > > > be
> > > > > > > > > > >>>> compress
> > > > > > > > > > >>>> > >> data after marshalling.
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >> 2017-04-10 10:40 GMT+03:00 Alexey Kuznetsov <
> > > > > > > > > > [hidden email]
> > > > > > > > > > >>>> >:
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >>> Vyacheslav,
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>> Did you read initial discussion [1] about
> > > > compression?
> > > > > > > > > > >>>> > >>> As far as I remember we agreed to add only
> some
> > > > > > > "top-level"
> > > > > > > > > API
> > > > > > > > > > in
> > > > > > > > > > >>>> > order
> > > > > > > > > > >>>> > >>> to
> > > > > > > > > > >>>> > >>> provide a way for
> > > > > > > > > > >>>> > >>> Ignite users to inject some sort of custom
> > > > > compression.
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>> [1]
> > > > > > > > > > >>>> > >>> http://apache-ignite-developer
> > s.2346864.n4.nabble
> > > .
> > > > > > > > com/Data-c
> > > > > > > > > > >>>> > >>> ompression-in-Ignite-2-0-td10099.html
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>> On Mon, Apr 10, 2017 at 2:19 PM, daradurvs <
> > > > > > > > > [hidden email]
> > > > > > > > > > >
> > > > > > > > > > >>>> > wrote:
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>> > Hi Igniters!
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> > I am interested in this task.
> > > > > > > > > > >>>> > >>> > Provide some kind of pluggable compression
> SPI
> > > > > support
> > > > > > > > > > >>>> > >>> > <https://issues.apache.org/
> > > > jira/browse/IGNITE-3592>
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> > I developed a solution on
> > > BinaryMarshaller-level,
> > > > > but
> > > > > > > > > reviewer
> > > > > > > > > > >>>> has
> > > > > > > > > > >>>> > >>> rejected
> > > > > > > > > > >>>> > >>> > it.
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> > Let's continue discussion of task goals and
> > > > solution
> > > > > > > > design.
> > > > > > > > > > >>>> > >>> > As I understood that, the main goal of this
> > task
> > > > is
> > > > > to
> > > > > > > > store
> > > > > > > > > > >>>> data in
> > > > > > > > > > >>>> > >>> > compressed form.
> > > > > > > > > > >>>> > >>> > This is what I need from Ignite as its user.
> > > > > > Compression
> > > > > > > > > > >>>> provides
> > > > > > > > > > >>>> > >>> economy
> > > > > > > > > > >>>> > >>> > on
> > > > > > > > > > >>>> > >>> > servers.
> > > > > > > > > > >>>> > >>> > We can store more data on same servers at
> the
> > > cost
> > > > > of
> > > > > > > > > > >>>> increasing CPU
> > > > > > > > > > >>>> > >>> > utilization.
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> > I'm researching a possibility of
> > implementation
> > > of
> > > > > > > > > compression
> > > > > > > > > > >>>> at the
> > > > > > > > > > >>>> > >>> > cache-level.
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> > Any thoughts?
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> > --
> > > > > > > > > > >>>> > >>> > Best regards,
> > > > > > > > > > >>>> > >>> > Vyacheslav
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> > --
> > > > > > > > > > >>>> > >>> > View this message in context:
> > > > http://apache-ignite-
> > > > > > > > > > >>>> > >>> > developers.2346864.n4.nabble.
> > > > > com/Data-compression-in-
> > > > > > > > > > >>>> > >>> > Ignite-2-0-tp10099p16317.html
> > > > > > > > > > >>>> > >>> > Sent from the Apache Ignite Developers
> mailing
> > > > list
> > > > > > > > archive
> > > > > > > > > at
> > > > > > > > > > >>>> > >>> Nabble.com.
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>> --
> > > > > > > > > > >>>> > >>> Alexey Kuznetsov
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >> --
> > > > > > > > > > >>>> > >> Best Regards, Vyacheslav
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > --
> > > > > > > > > > >>>> > > Best Regards, Vyacheslav
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> >
> > > > > > > > > > >>>> >
> > > > > > > > > > >>>> >
> > > > > > > > > > >>>> > --
> > > > > > > > > > >>>> > Best Regards, Vyacheslav
> > > > > > > > > > >>>> >
> > > > > > > > > > >>>>
> > > > > > > > > > >>>
> > > > > > > > > > >>>
> > > > > > > > > > >>>
> > > > > > > > > > >>> --
> > > > > > > > > > >>> Best Regards, Vyacheslav
> > > > > > > > > > >>>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >> --
> > > > > > > > > > >> Best Regards, Vyacheslav
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Best Regards, Vyacheslav
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best Regards, Vyacheslav
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards, Vyacheslav
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Best Regards, Anton Churaev
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best Regards, Vyacheslav
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best Regards, Anton Churaev
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards, Vyacheslav
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Best Regards, Anton Churaev
> > >
> >
> >
> >
> > --
> > Best Regards, Vyacheslav
> >
>
>
>
> --
>
> Best Regards, Anton Churaev
>

Re: Data compression in Ignite 2.0

daradurvs
Valentin,

Yes, I have a prototype [1][2].

You can see an example of the Java class [3] that I used in my benchmark.
For example:

class Foo {
    @BinaryCompression
    String data;
}

If a user decides to store the object in compressed form, they can use
the annotation @BinaryCompression as shown above.
It means the annotated field 'data' will be compressed at marshalling.
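
To make the idea easier to discuss, here is a minimal, self-contained sketch of "compress only annotated fields". It is not the code from the PR: the annotation below is a local stand-in declared only for this example, the names FieldCompressionSketch and marshalField are hypothetical, reflection replaces BinaryFieldAccessor, and DEFLATE from java.util.zip stands in for whichever codec is finally chosen.

```java
import java.io.ByteArrayOutputStream;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Assumption: local stand-in for the annotation proposed in the PR.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface BinaryCompression {}

class Foo {
    @BinaryCompression
    String data;   // compressed at marshalling

    String name;   // left as is
}

// Hypothetical helper, not part of Ignite.
class FieldCompressionSketch {
    /** Compresses bytes with DEFLATE (stand-in for the real codec). */
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Restores bytes produced by compress(). */
    static byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and nothing left to read: the input is incomplete.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                    throw new IllegalArgumentException("Truncated or unsupported input");
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Serializes one String field to UTF-8, compressing it only if annotated. */
    static byte[] marshalField(Object obj, String fieldName) {
        try {
            Field f = obj.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            byte[] raw = ((String) f.get(obj)).getBytes(StandardCharsets.UTF_8);
            return f.isAnnotationPresent(BinaryCompression.class) ? compress(raw) : raw;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read field " + fieldName, e);
        }
    }
}
```

With this sketch, marshalField(foo, "data") returns compressed bytes and marshalField(foo, "name") returns the plain UTF-8 bytes, so fields without the annotation take the same path as before.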

[1] https://github.com/apache/ignite/pull/1951
[2] https://issues.apache.org/jira/browse/IGNITE-5226
[3] https://github.com/daradurvs/ignite-compression/blob/master/src/main/java/ru/daradurvs/ignite/compression/model/Audit1F.java



2017-06-08 2:04 GMT+03:00 Valentin Kulichenko <[hidden email]>:

> Vyacheslav, Anton,
>
> Are there any ideas and/or prototypes for the API? Your design suggestions
> seem to make sense, but I would like to see how all this will look from
> the user's standpoint.
>
> -Val
>
> On Wed, Jun 7, 2017 at 1:06 AM, Антон Чураев <[hidden email]> wrote:
>
> > Vyacheslav, correct me if something is wrong.
> >
> > We could give users the opportunity to choose between CPU usage and
> > MEM/NET usage by compressing some attributes of stored objects.
> > You have studied the design, and it is possible to localize the changes
> > in marshalling without affecting performance or current functionality.
> >
> > I think that it's useful for our project and users.
> > Community, what do you think about this proposal?
> >
> >
> > 2017-06-06 17:29 GMT+03:00 Vyacheslav Daradur <[hidden email]>:
> >
> > > In short,
> > >
> > > During marshalling, a field is represented as a BinaryFieldAccessor,
> > > which manages its marshalling. It checks whether the field is marked
> > > with the annotation @BinaryCompression; in that case the binary
> > > representation of the field (bytes array) will be compressed. It will be
> > > marked as compressed by a type constant
> > > (GridBinaryMarshaller.COMPRESSED), and after this the compressed bytes
> > > array will be included in the binary representation of the whole object.
> > > Note, the header of the marshalled object will not be compressed.
> > > Compression affects only the object's field representation.
> > >
> > > Objects in IgniteCache are represented as BinaryObject, which is a
> > > wrapper over the bytes array of the marshalled object.
> > > BinaryObject provides some useful methods, which are used by Ignite
> > > systems.
> > > For example, the Queries use the BinaryObject#field method, which
> > > deserializes only one field of the object, without deserializing the
> > > whole object.
> > > If the BinaryObject#field method meets the constant of the compressed
> > > type during deserialization, it decompresses this bytes array, then
> > > continues unmarshalling as usual.
> > >
> > > Now, I introduced the Compressor interface in IgniteConfigurations; it
> > > allows a user to use their own implementation of a compressor - it is
> > > the requirement in the task [1].
> > >
> > > As far as I know, Vladimir Ozerov doesn't like the idea of granting this
> > > opportunity to the user.
> > > In that case we can choose a compression algorithm which we will provide
> > > by default and move the interface to the internals of the binary
> > > infrastructure.
> > > For this case I've prepared the benchmarks, which I've sent earlier.
> > >
> > > I vote for the ZSTD algorithm [2]; it provides a good compression ratio
> > > and good throughput. It has implementations in Java, .NET and C++, and
> > > has an ASF-friendly license, so we can use it on all the Ignite
> > > platforms.
> > > You can look at an assessment of this algorithm in my benchmarks.
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-3592
> > > [2] https://github.com/facebook/zstd
> > >
> > >
> > > 2017-06-06 16:02 GMT+03:00 Антон Чураев <[hidden email]>:
> > >
> > > > Looks good to me.
> > > >
> > > > Could you propose a design of the implementation in a couple of
> > > > sentences, so that we can estimate the completeness and complexity of
> > > > the proposal?
> > > >
> > > > 2017-06-06 15:26 GMT+03:00 Vyacheslav Daradur <[hidden email]>:
> > > >
> > > > > Anton,
> > > > >
> > > > > Of course, the solution does not affect the existing implementation.
> > > > > I mean, there are no changes (no performance changes) if the user
> > > > > does not use the annotation @BinaryCompression.
> > > > > Only if the user decides to use compression on a specific field or
> > > > > fields of a class will compression be used at marshalling for the
> > > > > annotated fields.
> > > > >
> > > > > 2017-06-06 15:10 GMT+03:00 Антон Чураев <[hidden email]>:
> > > > >
> > > > > > Vyacheslav,
> > > > > >
> > > > > > Is it possible to propose an implementation that can be switched
> > > > > > on on demand?
> > > > > > In this case it should not affect the performance of the current
> > > > > > solution.
> > > > > >
> > > > > > I mean that users should decide what is more important for them:
> > > > > > throughput or memory/net usage.
> > > > > > Maybe they will choose to compress not all objects, or only some
> > > > > > attributes of objects.
> > > > > >
> > > > > > 2017-06-06 14:48 GMT+03:00 Vyacheslav Daradur <[hidden email]>:
> > > > > >
> > > > > > > Conclusion:
> > > > > > > The provided solution allows reducing the size of an object in
> > > > > > > IgniteCache at the cost of a throughput reduction (small in some
> > > > > > > cases); it depends on the part of the object which will be
> > > > > > > compressed and on the compression algorithm.
> > > > > > > I mean, we can make more effective use of memory, and in some
> > > > > > > cases it can reduce the load on the interconnect (replication,
> > > > > > > rebalancing).
> > > > > > >
> > > > > > > It will be particularly useful for object fields which are large
> > > > > > > text (>~ 250 bytes) and can be effectively compressed.
> > > > > > >
> > > > > > > 2017-06-06 12:00 GMT+03:00 Антон Чураев <[hidden email]>:
> > > > > > >
> > > > > > > > Vyacheslav, thank you! But could you please provide conclusions
> > > > > > > > or proposals based on these benchmarks?
> > > > > > > >
> > > > > > > > 2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur <[hidden email]>:
> > > > > > > >
> > > > > > > > > Dmitry,
> > > > > > > > >
> > > > > > > > > Excel-pages:
> > > > > > > > >
> > > > > > > > > 1) "Compression ratio (2)" - shows object size, with compression
> > > > > > > > > and without compression. (Conditions: literal text)
> > > > > > > > > 1st graph shows compression ratios of using different compression
> > > > > > > > > algorithms depending on size of compressed field.
> > > > > > > > > 2nd graph shows evaluation of size of objects depending on sizes
> > > > > > > > > and compression algorithms.
> > > > > > > > >
> > > > > > > > > 2) "Compression ratio (1)" - shows object size, with compression
> > > > > > > > > and without compression. (Conditions: badly compressed character
> > > > > > > > > sequence)
> > > > > > > > > 1st graph shows compression ratios of using different compression
> > > > > > > > > algorithms depending on size of compressed field.
> > > > > > > > > 2nd graph shows evaluation of size of objects depending on sizes
> > > > > > > > > and compression algorithms.
> > > > > > > > >
> > > > > > > > > 3) "put-avg" - shows average time of the "put" operation
> > > > > > > > > depending on size and compression algorithms.
> > > > > > > > >
> > > > > > > > > 4) "put-thrpt" - shows throughput of the "put" operation
> > > > > > > > > depending on size and compression algorithms.
> > > > > > > > >
> > > > > > > > > 5) "get-avg" - shows average time of the "get" operation
> > > > > > > > > depending on size and compression algorithms.
> > > > > > > > >
> > > > > > > > > 6) "get-thrpt" - shows throughput of the "get" operation
> > > > > > > > > depending on size and compression algorithms.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan <
> > > > > [hidden email]
> > > > > > >:
> > > > > > > > >
> > > > > > > > > > Vladimir, I am not sure how to interpret the graphs? What
> > are
> > > > we
> > > > > > > > looking
> > > > > > > > > > at?
> > > > > > > > > >
> > > > > > > > > > On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur <
> > > > > > > > [hidden email]
> > > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi, Igniters.
> > > > > > > > > > >
> > > > > > > > > > > I've prepared some benchmarking. Results [1].
> > > > > > > > > > >
> > > > > > > > > > > And I've prepared the evaluation in the form of
> diagrams
> > > [2].
> > > > > > > > > > >
> > > > > > > > > > > I hope that helps to interest the community and
> > > accelerates a
> > > > > > > > reaction
> > > > > > > > > to
> > > > > > > > > > > this improvment :)
> > > > > > > > > > >
> > > > > > > > > > > [1]
> > > > > > > > > > > https://github.com/daradurvs/ignite-compression/tree/
> > > > > > > > > > > master/src/main/resources/result
> > > > > > > > > > > [2] https://drive.google.com/file/d/
> > > > > > 0B2CeUAOgrHkoMklyZ25YTEdKcEk/
> > > > > > > > view
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur <
> > > > > > [hidden email]
> > > > > > > >:
> > > > > > > > > > >
> > > > > > > > > > > > Guys, any thoughts?
> > > > > > > > > > > >
> > > > > > > > > > > > 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur <
> > > > > > > [hidden email]
> > > > > > > > >:
> > > > > > > > > > > >
> > > > > > > > > > > >> Hi guys,
> > > > > > > > > > > >>
> > > > > > > > > > > >> I've prepared the PR to show my idea.
> > > > > > > > > > > >> https://github.com/apache/ignite/pull/1951/files
> > > > > > > > > > > >>
> > > > > > > > > > > >> About querying - I've just copied existing tests and
> > > have
> > > > > > > > annotated
> > > > > > > > > > the
> > > > > > > > > > > >> testing data.
> > > > > > > > > > > >> https://github.com/apache/
> > ignite/pull/1951/files#diff-
> > > > > c19a9d
> > > > > > > > > > > >> f4058141d059bb577e75244764
> > > > > > > > > > > >>
> > > > > > > > > > > >> It means fields which will be marked by
> > > @BinaryCompression
> > > > > > will
> > > > > > > be
> > > > > > > > > > > >> compressed at marshalling via BinaryMarshaller.
> > > > > > > > > > > >>
> > > > > > > > > > > >> This solution has no effect on existing data or
> > project
> > > > > > > > > architecture.
> > > > > > > > > > > >>
> > > > > > > > > > > >> I'll be glad to see your thougths.
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > > >> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur <
> > > > > > > > [hidden email]
> > > > > > > > > >:
> > > > > > > > > > > >>
> > > > > > > > > > > >>> Dmitriy,
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> I have ready prototype. I want to show it.
> > > > > > > > > > > >>> It is always easier to discuss on example.
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan <
> > > > > > > > > [hidden email]
> > > > > > > > > > >:
> > > > > > > > > > > >>>
> > > > > > > > > > > >>>> Vyacheslav,
> > > > > > > > > > > >>>>
> > > > > > > > > > > >>>> I think it is a bit premature to provide a PR
> > without
> > > > > > getting
> > > > > > > a
> > > > > > > > > > > >>>> community
> > > > > > > > > > > >>>> consensus on the dev list. Please allow some time
> > for
> > > > the
> > > > > > > > > community
> > > > > > > > > > to
> > > > > > > > > > > >>>> respond.
> > > > > > > > > > > >>>>
> > > > > > > > > > > >>>> D.
> > > > > > > > > > > >>>>
> > > > > > > > > > > >>>> On Mon, May 15, 2017 at 6:36 AM, Vyacheslav
> Daradur
> > <
> > > > > > > > > > > >>>> [hidden email]>
> > > > > > > > > > > >>>> wrote:
> > > > > > > > > > > >>>>
> > > > > > > > > > > >>>> > I created the ticket:
> > > https://issues.apache.org/jira
> > > > > > > > > > > >>>> /browse/IGNITE-5226
> > > > > > > > > > > >>>> >
> > > > > > > > > > > >>>> > I'll prepare a PR with described solution in
> > couple
> > > of
> > > > > > days.
> > > > > > > > > > > >>>> >
> > > > > > > > > > > >>>> > 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur <
> > > > > > > > > > [hidden email]
> > > > > > > > > > > >:
> > > > > > > > > > > >>>> >
> > > > > > > > > > > >>>> > > Hi, Igniters!
> > > > > > > > > > > >>>> > >
> > > > > > > > > > > >>>> > > Apache 2.0 is released.
> > > > > > > > > > > >>>> > >
> > > > > > > > > > > >>>> > > Let's continue the discussion about a
> > compression
> > > > > > design.
> > > > > > > > > > > >>>> > >
> > > > > > > > > > > >>>> > > At the moment, I found only one solution which
> > is
> > > > > > > compatible
> > > > > > > > > > with
> > > > > > > > > > > >>>> > querying
> > > > > > > > > > > >>>> > > and indexing, this is per-objects-field
> > > compression.
> > > > > > > > > > > >>>> > > Per-fields compression means that metadata (a
> > > > header)
> > > > > of
> > > > > > > an
> > > > > > > > > > object
> > > > > > > > > > > >>>> won't
> > > > > > > > > > > >>>> > > be compressed, only serialized values of an
> > object
> > > > > > fields
> > > > > > > > (in
> > > > > > > > > > > bytes
> > > > > > > > > > > >>>> array
> > > > > > > > > > > >>>> > > form) will be compressed.
> > > > > > > > > > > >>>> > >
> > > > > > > > > > > >>>> > > This solution have some contentious issues:
> > > > > > > > > > > >>>> > > - small values, like primitives and short
> > arrays -
> > > > > there
> > > > > > > > isn't
> > > > > > > > > > > >>>> sense to
> > > > > > > > > > > >>>> > > compress them;
> > > > > > > > > > > >>>> > > - there is no possible to use compression with
> > > > > > > > java-predefined
> > > > > > > > > > > >>>> types;
> > > > > > > > > > > >>>> > >
> > > > > > > > > > > >>>> > > We can provide an annotation,
> > @IgniteCompression -
> > > > for
> > > > > > > > > example,
> > > > > > > > > > > >>>> which can
> > > > > > > > > > > >>>> > > be used by users for marking fields to
> compress.
> > > > > > > > > > > >>>> > >
> > > > > > > > > > > >>>> > > Any thoughts?
> > > > > > > > > > > >>>> > >
> > > > > > > > > > > >>>> > > Maybe someone already have ready design?
> > > > > > > > > > > >>>> > >
> > > > > > > > > > > >>>> > > 2017-04-10 11:06 GMT+03:00 Vyacheslav Daradur
> <
> > > > > > > > > > > [hidden email]
> > > > > > > > > > > >>>> >:
> > > > > > > > > > > >>>> > >
> > > > > > > > > > > >>>> > >> Alexey,
> > > > > > > > > > > >>>> > >>
> > > > > > > > > > > >>>> > >> Yes, I've read it.
> > > > > > > > > > > >>>> > >>
> > > > > > > > > > > >>>> > >> Ok, let's discuss about public API design.
> > > > > > > > > > > >>>> > >>
> > > > > > > > > > > >>>> > >> I think we need to add some a configure
> entity
> > to
> > > > > > > > > > > >>>> CacheConfiguration,
> > > > > > > > > > > >>>> > >> which will contain the Compressor interface
> > > > > > > implementation
> > > > > > > > > and
> > > > > > > > > > > some
> > > > > > > > > > > >>>> > usefull
> > > > > > > > > > > >>>> > >> parameters.
> > > > > > > > > > > >>>> > >> Or maybe to provide a BinaryMarshaller
> > decorator,
> > > > > which
> > > > > > > > will
> > > > > > > > > be
> > > > > > > > > > > >>>> compress
> > > > > > > > > > > >>>> > >> data after marshalling.
> > > > > > > > > > > >>>> > >>
> > > > > > > > > > > >>>> > >>
> > > > > > > > > > > >>>> > >> 2017-04-10 10:40 GMT+03:00 Alexey Kuznetsov <
> > > > > > > > > > > [hidden email]
> > > > > > > > > > > >>>> >:
> > > > > > > > > > > >>>> > >>
> > > > > > > > > > > >>>> > >>> Vyacheslav,
> > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > >>>> > >>> Did you read initial discussion [1] about
> > > > > compression?
> > > > > > > > > > > >>>> > >>> As far as I remember we agreed to add only
> > some
> > > > > > > > "top-level"
> > > > > > > > > > API
> > > > > > > > > > > in
> > > > > > > > > > > >>>> > order
> > > > > > > > > > > >>>> > >>> to
> > > > > > > > > > > >>>> > >>> provide a way for
> > > > > > > > > > > >>>> > >>> Ignite users to inject some sort of custom
> > > > > > compression.
> > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > >>>> > >>> [1]
> > > > > > > > > > > >>>> > >>> http://apache-ignite-developer
> > > s.2346864.n4.nabble
> > > > .
> > > > > > > > > com/Data-c
> > > > > > > > > > > >>>> > >>> ompression-in-Ignite-2-0-td10099.html
> > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > >>>> > >>> On Mon, Apr 10, 2017 at 2:19 PM, daradurvs <
> > > > > > > > > > [hidden email]
> > > > > > > > > > > >
> > > > > > > > > > > >>>> > wrote:
> > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > >>>> > >>> > Hi Igniters!
> > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > >>>> > >>> > I am interested in this task.
> > > > > > > > > > > >>>> > >>> > Provide some kind of pluggable compression
> > SPI
> > > > > > support
> > > > > > > > > > > >>>> > >>> > <https://issues.apache.org/
> > > > > jira/browse/IGNITE-3592>
> > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > >>>> > >>> > I developed a solution on
> > > > BinaryMarshaller-level,
> > > > > > but
> > > > > > > > > > reviewer
> > > > > > > > > > > >>>> has
> > > > > > > > > > > >>>> > >>> rejected
> > > > > > > > > > > >>>> > >>> > it.
> > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > >>>> > >>> > Let's continue discussion of task goals
> and
> > > > > solution
> > > > > > > > > design.
> > > > > > > > > > > >>>> > >>> > As I understood that, the main goal of
> this
> > > task
> > > > > is
> > > > > > to
> > > > > > > > > store
> > > > > > > > > > > >>>> data in
> > > > > > > > > > > >>>> > >>> > compressed form.
> > > > > > > > > > > >>>> > >>> > This is what I need from Ignite as its
> user.
> > > > > > > Compression
> > > > > > > > > > > >>>> provides
> > > > > > > > > > > >>>> > >>> economy
> > > > > > > > > > > >>>> > >>> > on
> > > > > > > > > > > >>>> > >>> > servers.
> > > > > > > > > > > >>>> > >>> > We can store more data on same servers at
> > the
> > > > cost
> > > > > > of
> > > > > > > > > > > >>>> increasing CPU
> > > > > > > > > > > >>>> > >>> > utilization.
> > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > >>>> > >>> > I'm researching a possibility of
> > > implementation
> > > > of
> > > > > > > > > > compression
> > > > > > > > > > > >>>> at the
> > > > > > > > > > > >>>> > >>> > cache-level.
> > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > >>>> > >>> > Any thoughts?
> > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > >>>> > >>> > --
> > > > > > > > > > > >>>> > >>> > Best regards,
> > > > > > > > > > > >>>> > >>> > Vyacheslav



--
Best Regards, Vyacheslav

Re: Data compression in Ignite 2.0

Vladimir Ozerov
Igniters,

Honestly, I still do not see how to apply this feature to Ignite
gracefully. The overall approach of compressing only particular fields
looks overcomplicated to me. Remember that our main use case is an
application without classes on the server, which means that annotations of
any kind are inapplicable. To be more precise: a proper API should first be
implemented to handle the no-class case (e.g. how would one build such an
object through BinaryBuilder without a class?), and only then should
annotations be added as a convenient addition to that more basic API.

It seems to me that a full implementation, which takes into account a
proper "classless" API, changes to binary metadata to reflect compressed
fields, changes to SQL, changes to the binary protocol, and porting to .NET
and CPP, will yield a very complex solution with little value to the
product.

Instead, as I proposed earlier, it seems that we'd better start with the
problem we are trying to solve. Basically, compression could help in two
cases:
1) Transmitting data over the wire - it should be implemented on the
communication layer and should not affect the binary serialization
component a lot.
2) Storing data in memory - here the much simpler step would be to do full
compression on a per-cache basis rather than dealing with the per-field
case.

In the end, if a user would like to compress a particular field, he can
always do it on his own and set the already compressed field to our
BinaryObject.
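
A minimal sketch of that last option, assuming the field is a String and
that DEFLATE from the JDK (java.util.zip) is an acceptable codec. The class
and method names are hypothetical and not part of Ignite; the thread itself
favours ZSTD, which is swapped out here only to keep the example
dependency-free.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical user-side helper: compress a field before it is stored. */
class FieldCompression {
    /** Encodes the text as UTF-8 and deflates it into a byte array. */
    static byte[] compress(String value) {
        Deflater deflater = new Deflater();
        deflater.setInput(value.getBytes(StandardCharsets.UTF_8));
        deflater.finish();

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();

        return out.toByteArray();
    }

    /** Inflates a byte array produced by compress() back into the text. */
    static String decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // Guard against looping forever on damaged input.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                    throw new IllegalStateException("corrupt or truncated data");
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }

        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Assumed usage with Ignite (not executed here): the resulting byte[] is
    // stored as an ordinary field, e.g. through a binary object builder:
    //   builder.setField("data", FieldCompression.compress(text));
}
```

With this approach the server only sees an opaque byte array, so the field
works without any classes on the server side, but its text content is
presumably not usable in SQL predicates or indexes.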

Vladimir.



Re: Data compression in Ignite 2.0

daradurvs
Vladimir,

The main problem which I am trying to solve is storing data in memory in a
compressed form via Ignite.
The main goal is to use memory more effectively.

>> here the much simpler step would be to do full compression on a
>> per-cache basis rather than dealing with the per-field case.

Please explain your idea. Compress data by memory page?
Is it compatible with querying and indexing?

>> In the end, if a user would like to compress a particular field, he can
>> always do it on his own
I think we mustn't think this way: if a user needs something, he tries to
choose a tool which has this feature OOTB.
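
The querying question above can be made concrete with a toy sketch. It
assumes a made-up record layout ([int id][int length][payload]) that is not
Ignite's binary format, and uses DEFLATE from the JDK only for
illustration; all names are hypothetical. It shows that with per-field
compression an uncompressed field stays directly readable, while with
whole-value compression every field read has to inflate the value first.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Toy comparison of per-field and whole-value compression. */
class CompressionGranularity {
    static byte[] deflate(byte[] input) {
        Deflater d = new Deflater();
        d.setInput(input);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!d.finished())
            out.write(buf, 0, d.deflate(buf));
        d.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] packed) {
        Inflater inf = new Inflater();
        inf.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                // Guard against looping forever on damaged input.
                if (n == 0 && (inf.needsInput() || inf.needsDictionary()))
                    throw new IllegalStateException("corrupt or truncated data");
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inf.end();
        }
        return out.toByteArray();
    }

    /** Toy record layout: [int id][int length][payload bytes]. */
    static byte[] record(int id, byte[] payload) {
        return ByteBuffer.allocate(8 + payload.length)
            .putInt(id).putInt(payload.length).put(payload).array();
    }

    /** Per-field: only the text is deflated, the id stays in plain form. */
    static byte[] perField(int id, String text) {
        return record(id, deflate(text.getBytes(StandardCharsets.UTF_8)));
    }

    /** Whole-value: the complete record is deflated as one block. */
    static byte[] wholeValue(int id, String text) {
        return deflate(record(id, text.getBytes(StandardCharsets.UTF_8)));
    }

    /** Reads the id straight from the bytes, no decompression needed. */
    static int idFromPerField(byte[] rec) {
        return ByteBuffer.wrap(rec).getInt();
    }

    /** Has to inflate the whole value before any field can be read. */
    static int idFromWholeValue(byte[] packed) {
        return ByteBuffer.wrap(inflate(packed)).getInt();
    }
}
```

Whether page-level or per-cache compression in Ignite would behave like the
whole-value case here is exactly what the question asks; the sketch only
illustrates the trade-off, not an actual design.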



2017-06-08 12:53 GMT+03:00 Vladimir Ozerov <[hidden email]>:

> Igniters,
>
> Honestly I still do not see how to apply it gracefully this feature ti
> Ignite. And overall approach to compress only particular fields looks
> overcomplicated to me. Remember, that our main use case is an application
> without classes on the server. It means that any kind of annotations are
> inapplicable. To be more precise: proper API should be implemented to
> handle no-class case (e.g. how would build such an object through
> BinaryBuilder without a class?), and only then add annotations as
> convenient addition to more basic API.
>
> It seems to me that full implementation, which takes in count proper
> "classless" API, changes to binary metadata to reflect compressed fields,
> changes to SQL, changes to binary protocol, and porting to .NET and CPP,
> will yield very complex solution with little value to the product.
>
> Instead, as I proposed earlier, it seems that we'd better start with the
> problem we are trying to solve. Basically, compression could help in two
> cases:
> 1) Transmitting data over wire - it should be implemented on communication
> layer and should not affect binary serialization component a lot.
> 2) Storing data in memory - here the much simpler step would be to full
> compression on per-cache basis rather than dealing with per-fields case.
>
> In the end, if user would like to compress particular field, he can always
> to it on his own, and set already compressed field to our BinaryObject.
>
> Vladimir.
>
>
> On Thu, Jun 8, 2017 at 12:37 PM, Vyacheslav Daradur <[hidden email]>
> wrote:
>
> > Valentin,
> >
> > Yes, I have the prototype[1][2]
> >
> > You can see an example of Java class[3] that I used in my benchmark.
> > For example:
> > class Foo {
> > @BinaryCompression
> > String data;
> > }
> > If user make decision to store the object in compressed form, he can use
> > the annotation @BinaryCompression as shown above.
> > It means annotated field 'data' will be compressed at marshalling.
> >
> > [1] https://github.com/apache/ignite/pull/1951
> > [2] https://issues.apache.org/jira/browse/IGNITE-5226
> > [3]
> > https://github.com/daradurvs/ignite-compression/blob/
> > master/src/main/java/ru/daradurvs/ignite/compression/model/Audit1F.java
> >
> >
> >
> > 2017-06-08 2:04 GMT+03:00 Valentin Kulichenko <
> > [hidden email]
> > >:
> >
> > > Vyacheslav, Anton,
> > >
> > > Are there any ideas and/or prototypes for the API? Your design
> > suggestions
> > > seem to make sense, but I would like to see how it all this will like
> > from
> > > user's standpoint.
> > >
> > > -Val
> > >
> > > On Wed, Jun 7, 2017 at 1:06 AM, Антон Чураев <[hidden email]>
> > wrote:
> > >
> > > > Vyacheslav, correct me if something wrong
> > > >
> > > > We could provide opportunity of choose between CPU usage and MEM/NET
> > > usage
> > > > for users by compression some attributes of stored objects.
> > > > You have learned design, and it is possible to localize changes in
> > > > marshalling without performance affect and current functionality.
> > > >
> > > > I think, that it's usefull for our project and users.
> > > > Community, what do you think about this proposal?
> > > >
> > > >
> > > > 2017-06-06 17:29 GMT+03:00 Vyacheslav Daradur <[hidden email]>:
> > > >
> > > > > > In short,
> > > > > >
> > > > > > During marshalling, each field is represented by a BinaryFieldAccessor,
> > > > > > which manages its marshalling. It checks whether the field is marked
> > > > > > with the @BinaryCompression annotation; if so, the binary representation
> > > > > > of the field (a byte array) is compressed and marked as compressed by a
> > > > > > type constant (GridBinaryMarshaller.COMPRESSED). The compressed byte
> > > > > > array is then included in the binary representation of the whole object.
> > > > > > Note that the header of the marshalled object is not compressed;
> > > > > > compression affects only the representation of the object's fields.
> > > > > >
> > > > > > Objects in IgniteCache are represented as BinaryObject, which is a
> > > > > > wrapper over the byte array of the marshalled object.
> > > > > > BinaryObject provides some useful methods that are used by Ignite
> > > > > > systems.
> > > > > > For example, queries use the BinaryObject#field method, which
> > > > > > deserializes only one field of the object without deserializing the
> > > > > > whole object.
> > > > > > If BinaryObject#field meets the compressed-type constant during
> > > > > > deserialization, it decompresses the byte array and then continues
> > > > > > unmarshalling as usual.
> > > > > >
> > > > > > For now, I have introduced the Compressor interface in
> > > > > > IgniteConfiguration; it allows a user to plug in their own compressor
> > > > > > implementation, which is the requirement in the task [1].
> > > > > >
> > > > > > As far as I know, Vladimir Ozerov doesn't like the idea of giving users
> > > > > > this option.
> > > > > > In that case we can choose a compression algorithm to provide by
> > > > > > default and move the interface into the internals of the binary
> > > > > > infrastructure.
> > > > > > For this case I've prepared the benchmarks that I sent earlier.
> > > > > >
> > > > > > I vote for the ZSTD algorithm [2]: it provides a good compression ratio
> > > > > > and good throughput. It has implementations in Java, .NET and C++ and an
> > > > > > ASF-friendly license, so we can use it on all Ignite platforms.
> > > > > > You can look at an assessment of this algorithm in my benchmarks.
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-3592
> > > > > > [2] https://github.com/facebook/zstd
> > > > >
> > > > >
> > > > > 2017-06-06 16:02 GMT+03:00 Антон Чураев <[hidden email]>:
> > > > >
> > > > > > > Looks good to me.
> > > > > > >
> > > > > > > Could you describe the implementation design in a couple of sentences,
> > > > > > > so that we can estimate the completeness and complexity of the proposal?
> > > > > >
> > > > > > 2017-06-06 15:26 GMT+03:00 Vyacheslav Daradur <
> [hidden email]
> > >:
> > > > > >
> > > > > > > > Anton,
> > > > > > > >
> > > > > > > > Of course, the solution does not affect the existing implementation.
> > > > > > > > I mean, nothing changes (including performance) if the user does not
> > > > > > > > use the @BinaryCompression annotation.
> > > > > > > > Only if the user decides to use compression on a specific field or
> > > > > > > > fields of a class will compression be applied, at marshalling, to the
> > > > > > > > annotated fields.
> > > > > > >
> > > > > > > 2017-06-06 15:10 GMT+03:00 Антон Чураев <[hidden email]
> >:
> > > > > > >
> > > > > > > > > Vyacheslav,
> > > > > > > > >
> > > > > > > > > Is it possible to propose an implementation that can be switched on
> > > > > > > > > on demand?
> > > > > > > > > In this case it should not affect the performance of the current
> > > > > > > > > solution.
> > > > > > > > >
> > > > > > > > > I mean that users should decide what is more important for them:
> > > > > > > > > throughput or memory/network usage.
> > > > > > > > > Maybe they will choose to compress not all objects, or only some
> > > > > > > > > attributes of objects.
> > > > > > > >
> > > > > > > > 2017-06-06 14:48 GMT+03:00 Vyacheslav Daradur <
> > > [hidden email]
> > > > >:
> > > > > > > >
> > > > > > > > > > Conclusion:
> > > > > > > > > > The provided solution allows reducing the size of an object in
> > > > > > > > > > IgniteCache at the cost of a throughput reduction (small in some
> > > > > > > > > > cases); the cost depends on which part of the object is compressed
> > > > > > > > > > and on the compression algorithm.
> > > > > > > > > > I mean, we can use memory more effectively, and in some cases it
> > > > > > > > > > can reduce the load on the interconnect (replication, rebalancing).
> > > > > > > > > >
> > > > > > > > > > It will be particularly useful for object fields that hold large
> > > > > > > > > > text (>~ 250 bytes) and can be compressed effectively.
> > > > > > > > >
> > > > > > > > > 2017-06-06 12:00 GMT+03:00 Антон Чураев <
> > [hidden email]
> > > >:
> > > > > > > > >
> > > > > > > > > > > Vyacheslav, thank you! But could you please provide conclusions or
> > > > > > > > > > > proposals based on these benchmarks?
> > > > > > > > > >
> > > > > > > > > > 2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur <
> > > > > [hidden email]
> > > > > > >:
> > > > > > > > > >
> > > > > > > > > > > > Dmitry,
> > > > > > > > > > > >
> > > > > > > > > > > > Excel pages:
> > > > > > > > > > > >
> > > > > > > > > > > > 1) "Compression ratio (2)" - shows object size with and without
> > > > > > > > > > > > compression. (Conditions: literal text)
> > > > > > > > > > > > The 1st graph shows the compression ratios of different compression
> > > > > > > > > > > > algorithms depending on the size of the compressed field.
> > > > > > > > > > > > The 2nd graph shows the size of objects depending on sizes and
> > > > > > > > > > > > compression algorithms.
> > > > > > > > > > > >
> > > > > > > > > > > > 2) "Compression ratio (1)" - shows object size with and without
> > > > > > > > > > > > compression. (Conditions: badly compressible character sequence)
> > > > > > > > > > > > The 1st graph shows the compression ratios of different compression
> > > > > > > > > > > > algorithms depending on the size of the compressed field.
> > > > > > > > > > > > The 2nd graph shows the size of objects depending on sizes and
> > > > > > > > > > > > compression algorithms.
> > > > > > > > > > > >
> > > > > > > > > > > > 3) "put-avg" - shows the average time of the "put" operation
> > > > > > > > > > > > depending on size and compression algorithm.
> > > > > > > > > > > >
> > > > > > > > > > > > 4) "put-thrpt" - shows the throughput of the "put" operation
> > > > > > > > > > > > depending on size and compression algorithm.
> > > > > > > > > > > >
> > > > > > > > > > > > 5) "get-avg" - shows the average time of the "get" operation
> > > > > > > > > > > > depending on size and compression algorithm.
> > > > > > > > > > > >
> > > > > > > > > > > > 6) "get-thrpt" - shows the throughput of the "get" operation
> > > > > > > > > > > > depending on size and compression algorithm.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > 2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan <
> > > > > > > [hidden email]
> > > > > > > > >:
> > > > > > > > > > >
> > > > > > > > > > > > > Vladimir, I am not sure how to interpret the graphs. What are we
> > > > > > > > > > > > > looking at?
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur <
> > > > > > > > > > [hidden email]
> > > > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > > Hi, Igniters.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I've prepared some benchmarks. Results: [1].
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I've also prepared an evaluation in the form of diagrams [2].
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I hope this helps to interest the community and speeds up the
> > > > > > > > > > > > > > reaction to this improvement :)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1] https://github.com/daradurvs/ignite-compression/tree/master/src/main/resources/result
> > > > > > > > > > > > > > [2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk/view
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur <
> > > > > > > > [hidden email]
> > > > > > > > > >:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Guys, any thoughts?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur <
> > > > > > > > > [hidden email]
> > > > > > > > > > >:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >> Hi guys,
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> I've prepared the PR to show my idea.
> > > > > > > > > > > > > > >> https://github.com/apache/ignite/pull/1951/files
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> About querying - I've just copied the existing tests and
> > > > > > > > > > > > > > >> annotated the test data.
> > > > > > > > > > > > > > >> https://github.com/apache/ignite/pull/1951/files#diff-c19a9df4058141d059bb577e75244764
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> It means that fields marked with @BinaryCompression will be
> > > > > > > > > > > > > > >> compressed at marshalling via BinaryMarshaller.
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> This solution has no effect on existing data or project
> > > > > > > > > > > > > > >> architecture.
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> I'll be glad to see your thoughts.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur <
> > > > > > > > > > [hidden email]
> > > > > > > > > > > >:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >>> Dmitriy,
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>> I have ready prototype. I want to show it.
> > > > > > > > > > > > > >>> It is always easier to discuss on example.
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan <
> > > > > > > > > > > [hidden email]
> > > > > > > > > > > > >:
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>>> Vyacheslav,
> > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > >>>> I think it is a bit premature to provide a PR
> > > > without
> > > > > > > > getting
> > > > > > > > > a
> > > > > > > > > > > > > >>>> community
> > > > > > > > > > > > > >>>> consensus on the dev list. Please allow some
> > time
> > > > for
> > > > > > the
> > > > > > > > > > > community
> > > > > > > > > > > > to
> > > > > > > > > > > > > >>>> respond.
> > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > >>>> D.
> > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > >>>> On Mon, May 15, 2017 at 6:36 AM, Vyacheslav
> > > Daradur
> > > > <
> > > > > > > > > > > > > >>>> [hidden email]>
> > > > > > > > > > > > > >>>> wrote:
> > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > >>>> > I created the ticket:
> > > > > https://issues.apache.org/jira
> > > > > > > > > > > > > >>>> /browse/IGNITE-5226
> > > > > > > > > > > > > >>>> >
> > > > > > > > > > > > > >>>> > I'll prepare a PR with described solution in
> > > > couple
> > > > > of
> > > > > > > > days.
> > > > > > > > > > > > > >>>> >
> > > > > > > > > > > > > >>>> > 2017-05-15 15:05 GMT+03:00 Vyacheslav
> Daradur
> > <
> > > > > > > > > > > > [hidden email]
> > > > > > > > > > > > > >:
> > > > > > > > > > > > > >>>> >
> > > > > > > > > > > > > >>>> > > Hi, Igniters!
> > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > >>>> > > Apache 2.0 is released.
> > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > >>>> > > Let's continue the discussion about a
> > > > compression
> > > > > > > > design.
> > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > >>>> > > At the moment, I found only one solution
> > which
> > > > is
> > > > > > > > > compatible
> > > > > > > > > > > > with
> > > > > > > > > > > > > >>>> > querying
> > > > > > > > > > > > > >>>> > > and indexing, this is per-objects-field
> > > > > compression.
> > > > > > > > > > > > > >>>> > > Per-fields compression means that metadata
> > (a
> > > > > > header)
> > > > > > > of
> > > > > > > > > an
> > > > > > > > > > > > object
> > > > > > > > > > > > > >>>> won't
> > > > > > > > > > > > > >>>> > > be compressed, only serialized values of
> an
> > > > object
> > > > > > > > fields
> > > > > > > > > > (in
> > > > > > > > > > > > > bytes
> > > > > > > > > > > > > >>>> array
> > > > > > > > > > > > > >>>> > > form) will be compressed.
> > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > >>>> > > This solution have some contentious
> issues:
> > > > > > > > > > > > > >>>> > > - small values, like primitives and short
> > > > arrays -
> > > > > > > there
> > > > > > > > > > isn't
> > > > > > > > > > > > > >>>> sense to
> > > > > > > > > > > > > >>>> > > compress them;
> > > > > > > > > > > > > >>>> > > - there is no possible to use compression
> > with
> > > > > > > > > > java-predefined
> > > > > > > > > > > > > >>>> types;
> > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > >>>> > > We can provide an annotation,
> > > > @IgniteCompression -
> > > > > > for
> > > > > > > > > > > example,
> > > > > > > > > > > > > >>>> which can
> > > > > > > > > > > > > >>>> > > be used by users for marking fields to
> > > compress.
> > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > >>>> > > Any thoughts?
> > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > >>>> > > Maybe someone already have ready design?
> > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > >>>> > > 2017-04-10 11:06 GMT+03:00 Vyacheslav
> > Daradur
> > > <
> > > > > > > > > > > > > [hidden email]
> > > > > > > > > > > > > >>>> >:
> > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > >>>> > >> Alexey,
> > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > >>>> > >> Yes, I've read it.
> > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > >>>> > >> Ok, let's discuss about public API
> design.
> > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > >>>> > >> I think we need to add some a configure
> > > entity
> > > > to
> > > > > > > > > > > > > >>>> CacheConfiguration,
> > > > > > > > > > > > > >>>> > >> which will contain the Compressor
> interface
> > > > > > > > > implementation
> > > > > > > > > > > and
> > > > > > > > > > > > > some
> > > > > > > > > > > > > >>>> > usefull
> > > > > > > > > > > > > >>>> > >> parameters.
> > > > > > > > > > > > > >>>> > >> Or maybe to provide a BinaryMarshaller
> > > > decorator,
> > > > > > > which
> > > > > > > > > > will
> > > > > > > > > > > be
> > > > > > > > > > > > > >>>> compress
> > > > > > > > > > > > > >>>> > >> data after marshalling.
> > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > >>>> > >> 2017-04-10 10:40 GMT+03:00 Alexey
> > Kuznetsov <
> > > > > > > > > > > > > [hidden email]
> > > > > > > > > > > > > >>>> >:
> > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > >>>> > >>> Vyacheslav,
> > > > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > > > >>>> > >>> Did you read initial discussion [1]
> about
> > > > > > > compression?
> > > > > > > > > > > > > >>>> > >>> As far as I remember we agreed to add
> only
> > > > some
> > > > > > > > > > "top-level"
> > > > > > > > > > > > API
> > > > > > > > > > > > > in
> > > > > > > > > > > > > >>>> > order
> > > > > > > > > > > > > >>>> > >>> to
> > > > > > > > > > > > > >>>> > >>> provide a way for
> > > > > > > > > > > > > >>>> > >>> Ignite users to inject some sort of
> custom
> > > > > > > > compression.
> > > > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > > > >>>> > >>> [1]
> > > > > > > > > > > > > >>>> > >>> http://apache-ignite-developer
> > > > > s.2346864.n4.nabble
> > > > > > .
> > > > > > > > > > > com/Data-c
> > > > > > > > > > > > > >>>> > >>> ompression-in-Ignite-2-0-td10099.html
> > > > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > > > >>>> > >>> On Mon, Apr 10, 2017 at 2:19 PM,
> > daradurvs <
> > > > > > > > > > > > [hidden email]
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >>>> > wrote:
> > > > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > > > >>>> > >>> > Hi Igniters!
> > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > >>>> > >>> > I am interested in this task.
> > > > > > > > > > > > > >>>> > >>> > Provide some kind of pluggable
> > compression
> > > > SPI
> > > > > > > > support
> > > > > > > > > > > > > >>>> > >>> > <https://issues.apache.org/
> > > > > > > jira/browse/IGNITE-3592>
> > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > >>>> > >>> > I developed a solution on
> > > > > > BinaryMarshaller-level,
> > > > > > > > but
> > > > > > > > > > > > reviewer
> > > > > > > > > > > > > >>>> has
> > > > > > > > > > > > > >>>> > >>> rejected
> > > > > > > > > > > > > >>>> > >>> > it.
> > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > >>>> > >>> > Let's continue discussion of task
> goals
> > > and
> > > > > > > solution
> > > > > > > > > > > design.
> > > > > > > > > > > > > >>>> > >>> > As I understood that, the main goal of
> > > this
> > > > > task
> > > > > > > is
> > > > > > > > to
> > > > > > > > > > > store
> > > > > > > > > > > > > >>>> data in
> > > > > > > > > > > > > >>>> > >>> > compressed form.
> > > > > > > > > > > > > >>>> > >>> > This is what I need from Ignite as its
> > > user.
> > > > > > > > > Compression
> > > > > > > > > > > > > >>>> provides
> > > > > > > > > > > > > >>>> > >>> economy
> > > > > > > > > > > > > >>>> > >>> > on
> > > > > > > > > > > > > >>>> > >>> > servers.
> > > > > > > > > > > > > >>>> > >>> > We can store more data on same servers
> > at
> > > > the
> > > > > > cost
> > > > > > > > of
> > > > > > > > > > > > > >>>> increasing CPU
> > > > > > > > > > > > > >>>> > >>> > utilization.
> > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > >>>> > >>> > I'm researching a possibility of
> > > > > implementation
> > > > > > of
> > > > > > > > > > > > compression
> > > > > > > > > > > > > >>>> at the
> > > > > > > > > > > > > >>>> > >>> > cache-level.
> > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > >>>> > >>> > Any thoughts?
> > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > >>>> > >>> > --
> > > > > > > > > > > > > >>>> > >>> > Best regards,
> > > > > > > > > > > > > >>>> > >>> > Vyacheslav
> > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > >>>> > >>> > --
> > > > > > > > > > > > > >>>> > >>> > View this message in context:
> > > > > > > http://apache-ignite-
> > > > > > > > > > > > > >>>> > >>> > developers.2346864.n4.nabble.
> > > > > > > > com/Data-compression-in-
> > > > > > > > > > > > > >>>> > >>> > Ignite-2-0-tp10099p16317.html
> > > > > > > > > > > > > >>>> > >>> > Sent from the Apache Ignite Developers
> > > > mailing
> > > > > > > list
> > > > > > > > > > > archive
> > > > > > > > > > > > at
> > > > > > > > > > > > > >>>> > >>> Nabble.com.
> > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > > > >>>> > >>> --
> > > > > > > > > > > > > >>>> > >>> Alexey Kuznetsov
> > > > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > >>>> > >> --
> > > > > > > > > > > > > >>>> > >> Best Regards, Vyacheslav
> > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > >>>> > > --
> > > > > > > > > > > > > >>>> > > Best Regards, Vyacheslav
> > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > >>>> >
> > > > > > > > > > > > > >>>> >
> > > > > > > > > > > > > >>>> >
> > > > > > > > > > > > > >>>> > --
> > > > > > > > > > > > > >>>> > Best Regards, Vyacheslav
> > > > > > > > > > > > > >>>> >
> > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>> --
> > > > > > > > > > > > > >>> Best Regards, Vyacheslav
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> --
> > > > > > > > > > > > > >> Best Regards, Vyacheslav
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > Best Regards, Vyacheslav
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Best Regards, Vyacheslav
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Best Regards, Vyacheslav
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > >
> > > > > > > > > > Best Regards, Anton Churaev
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best Regards, Vyacheslav
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Best Regards, Anton Churaev
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best Regards, Vyacheslav
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Best Regards, Anton Churaev
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best Regards, Vyacheslav
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best Regards, Anton Churaev
> > > >
> > >
> >
> >
> >
> > --
> > Best Regards, Vyacheslav
> >
>



--
Best Regards, Vyacheslav
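
The per-field scheme summarized in the quoted design description above (compress an annotated field's bytes, tag them with a type constant, decompress when the field is read) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: the class and method names and both marker values are hypothetical, and java.util.zip's Deflater stands in for ZSTD so that the example needs nothing beyond the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class FieldCompressionSketch {
    // Hypothetical type markers; the real GridBinaryMarshaller constants may differ.
    static final byte STRING = 9;
    static final byte COMPRESSED = 100;

    /** Plain encoding of a string field: [STRING][int length][UTF-8 bytes]. */
    static byte[] writePlain(String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(5 + utf8.length)
            .put(STRING).putInt(utf8.length).put(utf8).array();
    }

    /** Compressed encoding: [COMPRESSED][int length][deflated plain encoding]. */
    static byte[] writeCompressed(String value) {
        byte[] packed = deflate(writePlain(value));
        return ByteBuffer.allocate(5 + packed.length)
            .put(COMPRESSED).putInt(packed.length).put(packed).array();
    }

    /** Reads one field; a COMPRESSED payload is inflated, then read as usual. */
    static String readField(byte[] field) {
        ByteBuffer buf = ByteBuffer.wrap(field);
        byte type = buf.get();
        byte[] payload = new byte[buf.getInt()];
        buf.get(payload);
        if (type == COMPRESSED)
            return readField(inflate(payload));
        if (type != STRING)
            throw new IllegalArgumentException("Unknown type marker: " + type);
        return new String(payload, StandardCharsets.UTF_8);
    }

    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        while (!deflater.finished())
            out.write(chunk, 0, deflater.deflate(chunk));
        deflater.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && inflater.needsInput())
                    throw new IllegalStateException("Truncated compressed field");
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Because the compressed bytes live inside the field's own slot, whatever carries the marshalled byte array carries the field in compressed form, and only a reader that actually touches the field pays for decompression.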

Re: Data compression in Ignite 2.0

Антон Чураев
Guys, could you please help me?
I thought that if data is stored compressed in memory, it will also be
transmitted over the wire in compressed form. Is that right?

2017-06-08 13:30 GMT+03:00 Vyacheslav Daradur <[hidden email]>:

> Vladimir,
>
> The main problem I am trying to solve is storing data in memory in
> compressed form via Ignite.
> The main goal is to use memory more effectively.
>
> >> here the much simpler step would be to do full
> compression on a per-cache basis rather than dealing with the per-field case.
>
> Please explain your idea. Compress data by memory page?
> Is it compatible with querying and indexing?
>
> >> In the end, if a user would like to compress a particular field, he can
> always do it on his own
> I don't think we should reason this way: if a user needs something, he
> tries to choose a tool that has this feature OOTB.
>
>
>
> 2017-06-08 12:53 GMT+03:00 Vladimir Ozerov <[hidden email]>:
>
> > Igniters,
> >
> > Honestly, I still do not see how to apply this feature to Ignite
> > gracefully. And the overall approach of compressing only particular fields
> > looks overcomplicated to me. Remember that our main use case is an
> > application without classes on the server. It means that any kind of
> > annotation is inapplicable. To be more precise: a proper API should be
> > implemented first to handle the no-class case (e.g. how would one build
> > such an object through BinaryBuilder without a class?), and only then
> > should annotations be added as a convenient addition to the more basic API.
> >
> > It seems to me that a full implementation, which takes into account a
> > proper "classless" API, changes to binary metadata to reflect compressed
> > fields, changes to SQL, changes to the binary protocol, and porting to .NET
> > and CPP, will yield a very complex solution with little value to the
> > product.
> >
> > Instead, as I proposed earlier, it seems that we'd better start with the
> > problem we are trying to solve. Basically, compression could help in two
> > cases:
> > 1) Transmitting data over the wire - it should be implemented on the
> > communication layer and should not affect the binary serialization
> > component much.
> > 2) Storing data in memory - here the much simpler step would be to do full
> > compression on a per-cache basis rather than dealing with the per-field
> > case.
> >
> > In the end, if a user would like to compress a particular field, he can
> > always do it on his own and set the already compressed field on our
> > BinaryObject.
> >
> > Vladimir.
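
The do-it-yourself option mentioned at the end of the quoted message could look roughly like the sketch below. It is a minimal illustration using only the JDK's GZIP streams; the class and method names are hypothetical, and the choice of GZIP is an assumption, not something proposed in the thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ManualFieldCompression {
    /** Compresses a text value before it is put into the object as a byte[] field. */
    static byte[] compress(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Restores the text after the byte[] field has been read back. */
    static String decompress(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] chunk = new byte[1024];
            int n;
            while ((n = gzip.read(chunk)) != -1)
                out.write(chunk, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With Ignite, the resulting byte[] would presumably be stored as an ordinary field, for example through BinaryObjectBuilder#setField. The price of this approach is that the application must remember which fields are compressed, and queries see only opaque bytes for them.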
> >
> >
> > On Thu, Jun 8, 2017 at 12:37 PM, Vyacheslav Daradur <[hidden email]
> >
> > wrote:
> >
> > > Valentin,
> > >
> > > Yes, I have the prototype [1][2].
> > >
> > > You can see an example of a Java class [3] that I used in my benchmark.
> > > For example:
> > > class Foo {
> > >     @BinaryCompression
> > >     String data;
> > > }
> > > If a user decides to store the object in compressed form, he can use
> > > the @BinaryCompression annotation as shown above.
> > > It means the annotated field 'data' will be compressed at marshalling.
> > >
> > > [1] https://github.com/apache/ignite/pull/1951
> > > [2] https://issues.apache.org/jira/browse/IGNITE-5226
> > > [3] https://github.com/daradurvs/ignite-compression/blob/master/src/main/java/ru/daradurvs/ignite/compression/model/Audit1F.java
> > >
> > >
> > > 2017-06-08 2:04 GMT+03:00 Valentin Kulichenko <
> > > [hidden email]
> > > >:
> > >
> > > > Vyacheslav, Anton,
> > > >
> > > > Are there any ideas and/or prototypes for the API? Your design
> > > > suggestions seem to make sense, but I would like to see how all this
> > > > will look from the user's standpoint.
> > > >
> > > > -Val
> > > >
> > > > On Wed, Jun 7, 2017 at 1:06 AM, Антон Чураев <[hidden email]>
> > > wrote:
> > > >
> > > > > Vyacheslav, correct me if something is wrong.
> > > > >
> > > > > We could give users the opportunity to choose between CPU usage and
> > > > > MEM/NET usage by compressing some attributes of stored objects.
> > > > > You have studied the design, and it is possible to localize the changes
> > > > > in marshalling without affecting performance or current functionality.
> > > > >
> > > > > I think that it's useful for our project and users.
> > > > > Community, what do you think about this proposal?
> > > > >
> > > > >
> > > > > 2017-06-06 17:29 GMT+03:00 Vyacheslav Daradur <[hidden email]
> >:
> > > > >
> > > > > > > In short,
> > > > > > >
> > > > > > > During marshalling, each field is represented by a BinaryFieldAccessor,
> > > > > > > which manages its marshalling. It checks whether the field is marked
> > > > > > > with the @BinaryCompression annotation; if so, the binary representation
> > > > > > > of the field (a byte array) is compressed and marked as compressed by a
> > > > > > > type constant (GridBinaryMarshaller.COMPRESSED). The compressed byte
> > > > > > > array is then included in the binary representation of the whole object.
> > > > > > > Note that the header of the marshalled object is not compressed;
> > > > > > > compression affects only the representation of the object's fields.
> > > > > > >
> > > > > > > Objects in IgniteCache are represented as BinaryObject, which is a
> > > > > > > wrapper over the byte array of the marshalled object.
> > > > > > > BinaryObject provides some useful methods that are used by Ignite
> > > > > > > systems.
> > > > > > > For example, queries use the BinaryObject#field method, which
> > > > > > > deserializes only one field of the object without deserializing the
> > > > > > > whole object.
> > > > > > > If BinaryObject#field meets the compressed-type constant during
> > > > > > > deserialization, it decompresses the byte array and then continues
> > > > > > > unmarshalling as usual.
> > > > > > >
> > > > > > > For now, I have introduced the Compressor interface in
> > > > > > > IgniteConfiguration; it allows a user to plug in their own compressor
> > > > > > > implementation, which is the requirement in the task [1].
> > > > > > >
> > > > > > > As far as I know, Vladimir Ozerov doesn't like the idea of giving users
> > > > > > > this option.
> > > > > > > In that case we can choose a compression algorithm to provide by
> > > > > > > default and move the interface into the internals of the binary
> > > > > > > infrastructure.
> > > > > > > For this case I've prepared the benchmarks that I sent earlier.
> > > > > > >
> > > > > > > I vote for the ZSTD algorithm [2]: it provides a good compression ratio
> > > > > > > and good throughput. It has implementations in Java, .NET and C++ and an
> > > > > > > ASF-friendly license, so we can use it on all Ignite platforms.
> > > > > > > You can look at an assessment of this algorithm in my benchmarks.
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-3592
> > > > > > > [2] https://github.com/facebook/zstd
> > > > > >
> > > > > >
> > > > > > 2017-06-06 16:02 GMT+03:00 Антон Чураев <[hidden email]>:
> > > > > >
> > > > > > > Looks good for me.
> > > > > > >
> > > > > > > Could You propose design of implementation in couple of
> > sentences?
> > > > > > > So that we can estimate the completeness and complexity of the
> > > > > proposal.
> > > > > > >
> > > > > > > 2017-06-06 15:26 GMT+03:00 Vyacheslav Daradur <
> > [hidden email]
> > > >:
> > > > > > >
> > > > > > > > Anton,
> > > > > > > >
> > > > > > > > Of course, the solution does not affect on existing
> > > > implementation. I
> > > > > > > mean,
> > > > > > > > there is no changes if user not use the annotation
> > > > > @BinaryCompression.
> > > > > > > (no
> > > > > > > > performance changes)
> > > > > > > > Only if user make decision to use compression on specific
> field
> > > or
> > > > > > fields
> > > > > > > > of a class - in that case compression will be used at
> > marshalling
> > > > in
> > > > > > > > relation to annotated fields.
> > > > > > > >
> > > > > > > > 2017-06-06 15:10 GMT+03:00 Антон Чураев <
> [hidden email]
> > >:
> > > > > > > >
> > > > > > > > > Vyacheslav,
> > > > > > > > >
> > > > > > > > > Is it possible to propose implementation that can be
> switched
> > > on
> > > > > > > > on-demand?
> > > > > > > > > In this case it should not affect performance of current
> > > > solution.
> > > > > > > > >
> > > > > > > > > I mean, that users should make decision what is more
> > important
> > > > for
> > > > > > > them:
> > > > > > > > > throutput or memory/net usage.
> > > > > > > > > May be they will be choose not all objects, or only some
> > > > attributes
> > > > > > of
> > > > > > > > > objects for compress.
> > > > > > > > >
> > > > > > > > > 2017-06-06 14:48 GMT+03:00 Vyacheslav Daradur <
> > > > [hidden email]
> > > > > >:
> > > > > > > > >
> > > > > > > > > > Conclusion:
> > > > > > > > > > Provided solution allows reduce size of an object in
> > > > IgniteCache
> > > > > at
> > > > > > > the
> > > > > > > > > > cost of throughput reduction (small - in some cases), it
> > > > depends
> > > > > on
> > > > > > > > part
> > > > > > > > > of
> > > > > > > > > > object which will be compressed and compression
> algorithm.
> > > > > > > > > > I mean, we can make more effective use of memory, and in
> > some
> > > > > cases
> > > > > > > it
> > > > > > > > > can
> > > > > > > > > > reduce loading of the interconnect. (replication,
> > > rebalancing)
> > > > > > > > > >
> > > > > > > > > > Especially, it will be particularly useful for object's
> > > fields
> > > > > > which
> > > > > > > > are
> > > > > > > > > > large text (>~ 250 bytes) and can be effectively
> > compressed.
> > > > > > > > > >
> > > > > > > > > > 2017-06-06 12:00 GMT+03:00 Антон Чураев <
> > > [hidden email]
> > > > >:
> > > > > > > > > >
> > > > > > > > > > > Vyacheslav, thank you! But could you please provide a
> > > > > conclusions
> > > > > > > or
> > > > > > > > > > > proposals based on this benchmarks?
> > > > > > > > > > >
> > > > > > > > > > > 2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur <
> > > > > > [hidden email]
> > > > > > > >:
> > > > > > > > > > >
> > > > > > > > > > > > Dmitry,
> > > > > > > > > > > >
> > > > > > > > > > > > Excel-pages:
> > > > > > > > > > > >
> > > > > > > > > > > > 1). "Compression ratio (2)" - shows object size, with
> > > > > > compression
> > > > > > > > and
> > > > > > > > > > > > without compression. (Conditions: literal text)
> > > > > > > > > > > > 1st graph shows compression ratios of using different
> > > > > > compression
> > > > > > > > > > > algrithms
> > > > > > > > > > > > depending on size of compressed field.
> > > > > > > > > > > > 2nd graph shows evaluation of size of objects
> depending
> > > on
> > > > > > sizes
> > > > > > > > and
> > > > > > > > > > > > compression algorithms.
> > > > > > > > > > > >
> > > > > > > > > > > > 2). "Compression ratio (1)" - shows object size, with
> > > > > > compression
> > > > > > > > and
> > > > > > > > > > > > without compression. (Conditions:  badly compressed
> > > > character
> > > > > > > > > sequence)
> > > > > > > > > > > > 1st graph shows compression ratios of using different
> > > > > > compression
> > > > > > > > > > > > algrithms depending on size of compressed field.
> > > > > > > > > > > > 2nd graph shows evaluation of size of objects
> depending
> > > on
> > > > > > sizes
> > > > > > > > and
> > > > > > > > > > > > compression algorithms.
> > > > > > > > > > > >
> > > > > > > > > > > > 3) 'put-avg" - shows average time of the "put"
> > operation
> > > > > > > depending
> > > > > > > > on
> > > > > > > > > > > size
> > > > > > > > > > > > and compression algorithms.
> > > > > > > > > > > >
> > > > > > > > > > > > 4) 'put-thrpt" - shows throughput of the "put"
> > operation
> > > > > > > depending
> > > > > > > > on
> > > > > > > > > > > size
> > > > > > > > > > > > and compression algorithms.
> > > > > > > > > > > >
> > > > > > > > > > > > 5) 'get-avg" - shows average time of the "get"
> > operation
> > > > > > > depending
> > > > > > > > on
> > > > > > > > > > > size
> > > > > > > > > > > > and compression algorithms.
> > > > > > > > > > > >
> > > > > > > > > > > > 6) 'get-thrpt" - shows throughput of the "get"
> > operation
> > > > > > > depending
> > > > > > > > on
> > > > > > > > > > > size
> > > > > > > > > > > > and compression algorithms.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > 2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan <
> > > > > > > > [hidden email]
> > > > > > > > > >:
> > > > > > > > > > > >
> > > > > > > > > > > > > Vladimir, I am not sure how to interpret the
> graphs?
> > > What
> > > > > are
> > > > > > > we
> > > > > > > > > > > looking
> > > > > > > > > > > > > at?
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur <[hidden email]>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi, Igniters.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I've prepared some benchmarks. Results: [1].
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I've also prepared an evaluation in the form of diagrams [2].
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I hope this helps to interest the community and speeds up the
> > > > > > > > > > > > > > > reaction to this improvement :)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [1] https://github.com/daradurvs/ignite-compression/tree/master/src/main/resources/result
> > > > > > > > > > > > > > > [2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk/view
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur <
> > > > > > > > > [hidden email]
> > > > > > > > > > >:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Guys, any thoughts?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur <
> > > > > > > > > > [hidden email]
> > > > > > > > > > > >:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >> Hi guys,
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> I've prepared the PR to show my idea.
> > > > > > > > > > > > > > >> https://github.com/apache/
> > ignite/pull/1951/files
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> About querying - I've just copied existing
> tests
> > > and
> > > > > > have
> > > > > > > > > > > annotated
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > >> testing data.
> > > > > > > > > > > > > > >> https://github.com/apache/
> > > > > ignite/pull/1951/files#diff-
> > > > > > > > c19a9d
> > > > > > > > > > > > > > >> f4058141d059bb577e75244764
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> It means fields which will be marked by
> > > > > > @BinaryCompression
> > > > > > > > > will
> > > > > > > > > > be
> > > > > > > > > > > > > > >> compressed at marshalling via
> BinaryMarshaller.
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> This solution has no effect on existing data
> or
> > > > > project
> > > > > > > > > > > > architecture.
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> I'll be glad to see your thougths.
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur
> <
> > > > > > > > > > > [hidden email]
> > > > > > > > > > > > >:
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >>> Dmitriy,
> > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > >>> I have ready prototype. I want to show it.
> > > > > > > > > > > > > > >>> It is always easier to discuss on example.
> > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > >>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan
> <
> > > > > > > > > > > > [hidden email]
> > > > > > > > > > > > > >:
> > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > >>>> Vyacheslav,
> > > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > > >>>> I think it is a bit premature to provide a
> PR
> > > > > without
> > > > > > > > > getting
> > > > > > > > > > a
> > > > > > > > > > > > > > >>>> community
> > > > > > > > > > > > > > >>>> consensus on the dev list. Please allow some
> > > time
> > > > > for
> > > > > > > the
> > > > > > > > > > > > community
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > >>>> respond.
> > > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > > >>>> D.
> > > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > > > >>>> On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur <[hidden email]>
> > > > > > > > > > > > > > > >>>> wrote:
> > > > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > > > >>>> > I created the ticket: https://issues.apache.org/jira/browse/IGNITE-5226
> > > > > > > > > > > > > > > >>>> >
> > > > > > > > > > > > > > > >>>> > I'll prepare a PR with the described solution in a couple of
> > > > > > > > > > > > > > > >>>> > days.
> > > > > > > > > > > > > > >>>> >
> > > > > > > > > > > > > > >>>> > 2017-05-15 15:05 GMT+03:00 Vyacheslav
> > Daradur
> > > <
> > > > > > > > > > > > > [hidden email]
> > > > > > > > > > > > > > >:
> > > > > > > > > > > > > > >>>> >
> > > > > > > > > > > > > > >>>> > > Hi, Igniters!
> > > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > > >>>> > > Apache 2.0 is released.
> > > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > > >>>> > > Let's continue the discussion about a
> > > > > compression
> > > > > > > > > design.
> > > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > > >>>> > > At the moment, I found only one solution
> > > which
> > > > > is
> > > > > > > > > > compatible
> > > > > > > > > > > > > with
> > > > > > > > > > > > > > >>>> > querying
> > > > > > > > > > > > > > >>>> > > and indexing, this is per-objects-field
> > > > > > compression.
> > > > > > > > > > > > > > >>>> > > Per-fields compression means that
> metadata
> > > (a
> > > > > > > header)
> > > > > > > > of
> > > > > > > > > > an
> > > > > > > > > > > > > object
> > > > > > > > > > > > > > >>>> won't
> > > > > > > > > > > > > > >>>> > > be compressed, only serialized values of
> > an
> > > > > object
> > > > > > > > > fields
> > > > > > > > > > > (in
> > > > > > > > > > > > > > bytes
> > > > > > > > > > > > > > >>>> array
> > > > > > > > > > > > > > >>>> > > form) will be compressed.
> > > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > > >>>> > > This solution have some contentious
> > issues:
> > > > > > > > > > > > > > >>>> > > - small values, like primitives and
> short
> > > > > arrays -
> > > > > > > > there
> > > > > > > > > > > isn't
> > > > > > > > > > > > > > >>>> sense to
> > > > > > > > > > > > > > >>>> > > compress them;
> > > > > > > > > > > > > > >>>> > > - there is no possible to use
> compression
> > > with
> > > > > > > > > > > java-predefined
> > > > > > > > > > > > > > >>>> types;
> > > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > > >>>> > > We can provide an annotation,
> > > > > @IgniteCompression -
> > > > > > > for
> > > > > > > > > > > > example,
> > > > > > > > > > > > > > >>>> which can
> > > > > > > > > > > > > > >>>> > > be used by users for marking fields to
> > > > compress.
> > > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > > >>>> > > Any thoughts?
> > > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > > >>>> > > Maybe someone already have ready design?
> > > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > > >>>> > > 2017-04-10 11:06 GMT+03:00 Vyacheslav
> > > Daradur
> > > > <
> > > > > > > > > > > > > > [hidden email]
> > > > > > > > > > > > > > >>>> >:
> > > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > > >>>> > >> Alexey,
> > > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > > >>>> > >> Yes, I've read it.
> > > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > > >>>> > >> Ok, let's discuss about public API
> > design.
> > > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > > >>>> > >> I think we need to add some a configure
> > > > entity
> > > > > to
> > > > > > > > > > > > > > >>>> CacheConfiguration,
> > > > > > > > > > > > > > >>>> > >> which will contain the Compressor
> > interface
> > > > > > > > > > implementation
> > > > > > > > > > > > and
> > > > > > > > > > > > > > some
> > > > > > > > > > > > > > >>>> > usefull
> > > > > > > > > > > > > > >>>> > >> parameters.
> > > > > > > > > > > > > > >>>> > >> Or maybe to provide a BinaryMarshaller
> > > > > decorator,
> > > > > > > > which
> > > > > > > > > > > will
> > > > > > > > > > > > be
> > > > > > > > > > > > > > >>>> compress
> > > > > > > > > > > > > > >>>> > >> data after marshalling.
> > > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > > >>>> > >> 2017-04-10 10:40 GMT+03:00 Alexey
> > > Kuznetsov <
> > > > > > > > > > > > > > [hidden email]
> > > > > > > > > > > > > > >>>> >:
> > > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > > >>>> > >>> Vyacheslav,
> > > > > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > > > > >>>> > >>> Did you read initial discussion [1]
> > about
> > > > > > > > compression?
> > > > > > > > > > > > > > >>>> > >>> As far as I remember we agreed to add
> > only
> > > > > some
> > > > > > > > > > > "top-level"
> > > > > > > > > > > > > API
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > >>>> > order
> > > > > > > > > > > > > > >>>> > >>> to
> > > > > > > > > > > > > > >>>> > >>> provide a way for
> > > > > > > > > > > > > > >>>> > >>> Ignite users to inject some sort of
> > custom
> > > > > > > > > compression.
> > > > > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > > > > >>>> > >>> [1]
> > > > > > > > > > > > > > >>>> > >>> http://apache-ignite-developer
> > > > > > s.2346864.n4.nabble
> > > > > > > .
> > > > > > > > > > > > com/Data-c
> > > > > > > > > > > > > > >>>> > >>> ompression-in-Ignite-2-0-td10099.html
> > > > > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > > > > > >>>> > >>> On Mon, Apr 10, 2017 at 2:19 PM, daradurvs <[hidden email]> wrote:
> > > > > > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > > > > > >>>> > >>> > Hi Igniters!
> > > > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > > > >>>> > >>> > I am interested in this task:
> > > > > > > > > > > > > > > >>>> > >>> > Provide some kind of pluggable compression SPI support
> > > > > > > > > > > > > > > >>>> > >>> > <https://issues.apache.org/jira/browse/IGNITE-3592>
> > > > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > > > >>>> > >>> > I developed a solution at the BinaryMarshaller level, but the
> > > > > > > > > > > > > > > >>>> > >>> > reviewer rejected it.
> > > > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > > > >>>> > >>> > Let's continue the discussion of the task's goals and solution
> > > > > > > > > > > > > > > >>>> > >>> > design.
> > > > > > > > > > > > > > > >>>> > >>> > As I understand it, the main goal of this task is to store
> > > > > > > > > > > > > > > >>>> > >>> > data in compressed form.
> > > > > > > > > > > > > > > >>>> > >>> > This is what I need from Ignite as its user. Compression
> > > > > > > > > > > > > > > >>>> > >>> > provides savings on servers:
> > > > > > > > > > > > > > > >>>> > >>> > we can store more data on the same servers at the cost of
> > > > > > > > > > > > > > > >>>> > >>> > increased CPU utilization.
> > > > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > > > >>>> > >>> > I'm researching the possibility of implementing compression
> > > > > > > > > > > > > > > >>>> > >>> > at the cache level.
> > > > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > > > >>>> > >>> > Any thoughts?
> > > > > > > > > > > > > > >>>> > >>> > --
> > > > > > > > > > > > > > >>>> > >>> > Best regards,
> > > > > > > > > > > > > > >>>> > >>> > Vyacheslav
> > > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > > >>>> > >>> > --
> > > > > > > > > > > > > > > >>>> > >>> > View this message in context:
> > > > > > > > > > > > > > > >>>> > >>> > http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-2-0-tp10099p16317.html
> > > > > > > > > > > > > > > >>>> > >>> > Sent from the Apache Ignite Developers mailing list archive at
> > > > > > > > > > > > > > > >>>> > >>> > Nabble.com.
> > > > > > > > > > > > > > >>>> > >>> >
> > > > > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > > > > >>>> > >>> --
> > > > > > > > > > > > > > >>>> > >>> Alexey Kuznetsov
> > > > > > > > > > > > > > >>>> > >>>
> > > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > > >>>> > >> --
> > > > > > > > > > > > > > >>>> > >> Best Regards, Vyacheslav
> > > > > > > > > > > > > > >>>> > >>
> > > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > > >>>> > > --
> > > > > > > > > > > > > > >>>> > > Best Regards, Vyacheslav
> > > > > > > > > > > > > > >>>> > >
> > > > > > > > > > > > > > >>>> >
> > > > > > > > > > > > > > >>>> >
> > > > > > > > > > > > > > >>>> >
> > > > > > > > > > > > > > >>>> > --
> > > > > > > > > > > > > > >>>> > Best Regards, Vyacheslav
> > > > > > > > > > > > > > >>>> >
> > > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > >>> --
> > > > > > > > > > > > > > >>> Best Regards, Vyacheslav
> > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> --
> > > > > > > > > > > > > > >> Best Regards, Vyacheslav
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > Best Regards, Vyacheslav
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > Best Regards, Vyacheslav
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Best Regards, Vyacheslav
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > >
> > > > > > > > > > > Best Regards, Anton Churaev
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best Regards, Vyacheslav
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > Best Regards, Anton Churaev
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards, Vyacheslav
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Best Regards, Anton Churaev
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best Regards, Vyacheslav
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best Regards, Anton Churaev
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Best Regards, Vyacheslav
> > >
> >
>
>
>
> --
> Best Regards, Vyacheslav
>



--

Best Regards, Anton Churaev
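
To make the per-field idea quoted above more concrete, here is a rough sketch of what annotation-driven field compression could look like. It is only an illustration under stated assumptions: the names CompressedField, Article, marshal, deflate and inflate are hypothetical and are not Ignite API, the sketch handles String fields only, and java.util.zip's DEFLATE stands in for whichever algorithm would actually be chosen. It does not reproduce the @BinaryCompression prototype from the PR.

```java
import java.io.ByteArrayOutputStream;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class FieldCompressionSketch {
    /** Hypothetical marker annotation (NOT an Ignite API). */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface CompressedField { }

    /** Hypothetical value class: only the large text field is marked. */
    static class Article {
        String title;
        @CompressedField String body;

        Article(String title, String body) {
            this.title = title;
            this.body = body;
        }
    }

    /** Compresses a byte array with DEFLATE (an assumed algorithm choice). */
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    /** Restores a byte array produced by deflate(). */
    static byte[] inflate(byte[] packed) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // Guard against looping forever on truncated or unsupported input.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                    throw new IllegalArgumentException("Incomplete DEFLATE data");
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Not DEFLATE data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Serializes String fields by name; only annotated fields are compressed. */
    static Map<String, byte[]> marshal(Object obj) {
        Map<String, byte[]> out = new LinkedHashMap<>();
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (f.getType() != String.class)
                continue; // the sketch handles String fields only
            try {
                f.setAccessible(true);
                byte[] raw = ((String) f.get(obj)).getBytes(StandardCharsets.UTF_8);
                boolean marked = f.isAnnotationPresent(CompressedField.class);
                out.put(f.getName(), marked ? deflate(raw) : raw);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 100; i++)
            text.append("compressible text ");
        Article a = new Article("Short title", text.toString());
        Map<String, byte[]> fields = marshal(a);
        System.out.println("title bytes: " + fields.get("title").length);
        System.out.println("body bytes: " + fields.get("body").length
            + " (raw chars: " + a.body.length() + ")");
        byte[] restored = inflate(fields.get("body"));
        System.out.println("round trip ok: "
            + new String(restored, StandardCharsets.UTF_8).equals(a.body));
    }
}
```

Field names stay uncompressed here, which loosely mirrors the point made in the thread that the object header is left alone and only the serialized field values are compressed. A real implementation would presumably also skip values below some size threshold, as the thread notes that compressing small values makes little sense.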