Data compression in Ignite 2.0

Data compression in Ignite 2.0

Alexey Kuznetsov-2
Hi, All!

I would like to propose one more feature for Ignite 2.0.

Data compression for data in binary format.

The binary format is stored as field name + field data, so we already have a
descriptor. How about adding one more byte to the binary data descriptor:

*Compressed*:
 0 - Data stored as is (no compression).
 1 - Data compressed by dictionary (something like DB2 row compression [1],
 but for all binary types). We could have a system- or user-defined replicated
cache for such a dictionary and a *cache.compact()* method that would scan the
cache, build the dictionary, and compact the data.
 2 - Data compressed by Java's built-in ZIP.
 3 - Data compressed by some custom user algorithm.

Of course it is possible to compress data in current Ignite 1.x, but in that
case the compressed data cannot be accessed from the SQL engine. If we
implement support for compression at the Ignite core level, the SQL engine
will be able to detect that data is compressed and handle such data properly.
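
To make this concrete, here is a minimal sketch of how the marker byte could
be modeled. All names are hypothetical, not part of any existing Ignite API:

public enum CompressionType {
    NONE((byte)0),       // data stored as is (no compression)
    DICTIONARY((byte)1), // dictionary substitution, DB2-style
    ZIP((byte)2),        // Java built-in ZIP/DEFLATE
    CUSTOM((byte)3);     // user-provided algorithm

    private final byte code;

    CompressionType(byte code) {
        this.code = code;
    }

    /** Value written into the binary data descriptor. */
    public byte code() {
        return code;
    }

    /** Resolves a marker byte read back from the descriptor. */
    public static CompressionType fromCode(byte code) {
        for (CompressionType t : values())
            if (t.code == code)
                return t;

        throw new IllegalArgumentException("Unknown compression code: " + code);
    }
}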

What do you think?
If the community considers this feature useful, I will create an issue in JIRA.

[1]
http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/

--
Alexey Kuznetsov

Re: Data compression in Ignite 2.0

Sergi
This will make sense only in rare cases where you have very large objects
stored that can be effectively compressed. And even then it will introduce a
slowdown on all operations, which often will not be acceptable. I guess only
a few users will find this feature useful, so I think it is not worth the
effort.

Sergi




Re: Data compression in Ignite 2.0

Alexey Kuznetsov-2
Sergi,

Of course it will introduce some slowdown, but with compression more data
could be stored in memory and not be evicted to disk. In the case of
compression by dictionary substitution it will be only one more lookup,
which should be fast.

In general we could provide only an API for compression out of the box, and
users that really need some sort of compression will implement it themselves.
This will not require much effort, I think.
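
For illustration, here is a minimal sketch of what such a pluggable hook
could look like, together with an implementation backed by Java's built-in
DEFLATE. The interface and class names are hypothetical, not an existing
Ignite API:

import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical user-pluggable hook; not an existing Ignite interface.
public interface Compressor {
    byte[] compress(byte[] data);
    byte[] decompress(byte[] data);
}

// Example implementation backed by Java's built-in DEFLATE.
class DeflateCompressor implements Compressor {
    @Override public byte[] compress(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();

        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
        byte[] buf = new byte[4096];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();

        return out.toByteArray();
    }

    @Override public byte[] decompress(byte[] data) {
        try {
            Inflater inflater = new Inflater();
            inflater.setInput(data);

            ByteArrayOutputStream out = new ByteArrayOutputStream(data.length * 2);
            byte[] buf = new byte[4096];
            while (!inflater.finished())
                out.write(buf, 0, inflater.inflate(buf));
            inflater.end();

            return out.toByteArray();
        }
        catch (DataFormatException e) {
            throw new IllegalArgumentException("Corrupted compressed data", e);
        }
    }
}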



--
Alexey Kuznetsov

Re: Data compression in Ignite 2.0

Sergey Kozlov
Hi

For approach 1: putting a large object into a partitioned cache will force an
update of the dictionary placed in the replicated cache. It seems this may be
a time-expensive operation.
Approaches 2-3 make sense only for rare cases, as Sergi commented.
Also, I see a danger of OOM if we've got a high compression ratio and try to
restore the original value in memory.

--
Sergey Kozlov
GridGain Systems
www.gridgain.com

Re: Data compression in Ignite 2.0

nivanov
SAP HANA does compression by 1) compressing SQL parameters before execution,
and 2) storing only compressed data in memory. This way all SQL queries work
as normal, with zero modifications or performance overhead. Only the results
of a query are (optionally) decompressed before being returned to the user.
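
A toy example of the idea in plain Java (not HANA or Ignite code): the stored
values and the query parameter are encoded with the same dictionary, so an
equality predicate is evaluated on small integer codes without decompressing
anything:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DictionaryScanExample {
    public static void main(String[] args) {
        // Shared dictionary: value -> code.
        Map<String, Integer> dict = new HashMap<>();
        dict.put("New York", 0);
        dict.put("London", 1);

        // "Compressed" storage: a column of city codes instead of strings.
        int[] cityColumn = {0, 1, 1, 0, 1};

        // Compress the SQL parameter once, before execution.
        int param = dict.get("London");

        // The scan compares small codes, never the original strings.
        List<Integer> matchingRows = new ArrayList<>();
        for (int row = 0; row < cityColumn.length; row++)
            if (cityColumn[row] == param)
                matchingRows.add(row);

        System.out.println("Rows where city = 'London': " + matchingRows);
    }
}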

--
Nikita Ivanov



Re: Data compression in Ignite 2.0

dsetrakyan
Nikita, this sounds like a pretty elegant approach.

Does anyone in the community see a problem with this design?


Re: Data compression in Ignite 2.0

Andrey Kornev
I'm guessing the suggestion here is to use the compressed form directly for WHERE clause evaluation. If that's the case, I think there are a couple of issues:

1) the LIKE predicate.

2) predicates other than equality (for example, <, >, etc.)


But since Ignite isn't just about SQL queries (surprisingly, some people still use it just as a distributed cache!), in general I think compression is a great idea. The cleanest way to achieve that would be to make it possible to chain the marshallers. It is already possible to do this without any Ignite code changes, but unfortunately it would force people to use the non-public BinaryMarshaller class directly (as the first element of the chain).
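
To illustrate the chaining idea, here is a minimal sketch using a simplified
marshaller shape rather than Ignite's actual Marshaller interface (which has
more methods and checked exceptions); the Compressor hook is the hypothetical
one sketched earlier in this thread:

// Simplified stand-in for a marshaller; not Ignite's real interface.
interface SimpleMarshaller {
    byte[] marshal(Object obj);
    <T> T unmarshal(byte[] bytes, ClassLoader ldr);
}

// Decorator that compresses whatever the wrapped marshaller produces.
class CompressingMarshaller implements SimpleMarshaller {
    private final SimpleMarshaller delegate; // e.g. the binary marshaller
    private final Compressor compressor;     // pluggable algorithm

    CompressingMarshaller(SimpleMarshaller delegate, Compressor compressor) {
        this.delegate = delegate;
        this.compressor = compressor;
    }

    @Override public byte[] marshal(Object obj) {
        return compressor.compress(delegate.marshal(obj));
    }

    @Override public <T> T unmarshal(byte[] bytes, ClassLoader ldr) {
        return delegate.unmarshal(compressor.decompress(bytes), ldr);
    }
}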


Cheers

Andrey


Re: Data compression in Ignite 2.0

Alexey Kuznetsov-2
Sergey Kozlov wrote:
>> For approach 1: putting a large object into a partitioned cache will force
an update of the dictionary placed in the replicated cache. It may be a
time-expensive operation.
The dictionary will be built only once. And we could control what should be
put into the dictionary; for example, we could check min and max sizes and
decide whether to put a value into the dictionary or not.

>> Approaches 2-3 make sense only for rare cases, as Sergi commented.
But it is better to at least have the possibility to plug in user code for
compression than not to have it at all.

>> Also, I see a danger of OOM if we've got a high compression ratio and try
to restore the original value in memory.
We could easily get OOM with many other operations right now even without
compression. I think it is not an issue; we could add a NOTE to the
documentation about this possibility.

Andrey Kornev wrote:
>> ... in general I think compression is a great idea. The cleanest way to
achieve that would be to just make it possible to chain the marshallers...
I think it is also a good idea. And it looks like it could be used for
compression with some sort of ZIP algorithm, but how do we deal with
compression by dictionary substitution?
We need to build the dictionary first. Any ideas?
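
One possible starting point, sketched in plain Java under the assumption that
*cache.compact()* would sample existing values offline: count value
frequencies and assign short codes to the most frequent values. All names
here are illustrative:

import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DictionaryBuilder {
    /** Builds a value -> code dictionary from a sample of cache values. */
    public static Map<String, Integer> build(List<String> sampledValues, int maxEntries) {
        // Count how often each value occurs in the sample.
        Map<String, Long> freq = new HashMap<>();
        for (String v : sampledValues)
            freq.merge(v, 1L, Long::sum);

        // Keep only the most frequent values; everything else stays uncompressed.
        Map<String, Integer> dict = new HashMap<>();
        freq.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
            .limit(maxEntries)
            .forEach(e -> dict.put(e.getKey(), dict.size()));

        return dict;
    }
}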

Nikita Ivanov wrote:
>> SAP HANA does compression by 1) compressing SQL parameters before
execution...
Looks interesting, but my initial point was about compression of cache
data, not SQL queries.
My idea was to make compression transparent to the SQL engine when it looks
up data.

But the idea of compressing SQL query results looks very interesting, because
it is a known fact that the SQL engine can consume quite a lot of heap for
storing result sets.
I think this should be discussed in a separate thread.

Just for your information: in the first message I mentioned that DB2 has
compression by dictionary, and according to them it is possible to compress
typical data by 50-80%.
I have some experience with DB2 and can confirm this.

--
Alexey Kuznetsov

Re: Data compression in Ignite 2.0

Andrey Kornev
Dictionary compression requires some knowledge about the data being compressed. For example, for numeric types a range of values must be known so that the dictionary can be generated. For strings, the number of unique values in the column is the key piece of input into the dictionary generation.

SAP HANA is a column-based database system: it stores the fields of a data tuple individually, using the best compression for the given data type and the particular set of values. HANA has been specifically built as a general-purpose database, rather than as an afterthought layer on top of an already existing distributed cache.

On the other hand, Ignite is a distributed cache implementation (a pretty good one!) that in general requires no schema and stores its data in a row-based fashion. Its current design doesn't lend itself readily to the kind of optimizations HANA provides out of the box.

For the curious types among us, the implementation details of HANA are well documented in "In-Memory Data Management" by Hasso Plattner & Alexander Zeier.
Cheers
Andrey



Re: Data compression in Ignite 2.0

nivanov
Very good points indeed. I get the compression-in-Ignite question quite
often, and the HANA reference is a typical lead-in.

My personal opinion is still that in Ignite *specifically* compression is
best left to the end user. But we may need to provide a better facility to
inject the user's logic here...

--
Nikita Ivanov



Re: Data compression in Ignite 2.0

Alexey Kuznetsov-2
Nikita,

That was my intention: "we may need to provide a better facility to inject
user's logic here..."

Andrey,
About compression, once again: DB2 is a row-based DB, and they can compress. :)



--
Alexey Kuznetsov

Re: Data compression in Ignite 2.0

Sebastien DIAZ
Hi

I'd add Redis as an example of a memory compression strategy:

http://labs.octivi.com/how-we-cut-down-memory-usage-by-82/

http://redis.io/topics/memory-optimization

Regards

S DIAZ




Re: Data compression in Ignite 2.0

Sergi
Nikita,

I agree with Andrey: HANA is a bad comparison to Ignite in this respect. I
did not find any evidence on the internet that their row store is very
efficient with compression. It was always about the column store.

Alexey,

As for DB2, can you check what exactly, when, and how it compresses, and
whether it gives any decent results, before suggesting it as an example to
follow? I don't think it is a good idea to repeat every bad idea from other
products.

And even if there are good results in DB2, will all of this be applicable to
Ignite? PostgreSQL, for example, provides TOAST compression, and it can be
useful when used in a smart way, but that is a very different architecture
from what we have.

All in all, I agree that maybe we should provide some kind of pluggable
compression SPI support, but do not expect much from it; usually it will be
just useless.

Sergi




Re: Data compression in Ignite 2.0

Alexey Kuznetsov-2
FYI, I created an issue for Ignite 2.0:
https://issues.apache.org/jira/browse/IGNITE-3592

Thanks!




--
Alexey Kuznetsov
GridGain Systems
www.gridgain.com

Re: Data compression in Ignite 2.0

daradurvs
Hi Igniters!

I am interested in this task:
"Provide some kind of pluggable compression SPI support"
<https://issues.apache.org/jira/browse/IGNITE-3592>

I developed a solution at the BinaryMarshaller level, but the reviewer
rejected it.

Let's continue the discussion of the task's goals and the solution design.
As I understand it, the main goal of this task is to store data in
compressed form.
This is what I need from Ignite as a user: compression saves money on
servers. We can store more data on the same servers at the cost of increased
CPU utilization.

I'm researching the possibility of implementing compression at the cache
level.

Any thoughts?

--
Best regards,
Vyacheslav

Re: Data compression in Ignite 2.0

Alexey Kuznetsov
Vyacheslav,

Did you read the initial discussion [1] about compression?
As far as I remember, we agreed to add only some "top-level" API in order to
provide a way for Ignite users to inject some sort of custom compression.


[1]
http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-2-0-td10099.html




--
Alexey Kuznetsov

Re: Data compression in Ignite 2.0

daradurvs
Alexey,

Yes, I've read it.

OK, let's discuss the public API design.

I think we need to add a configuration entity to CacheConfiguration which
will contain a Compressor interface implementation and some useful
parameters.
Or maybe provide a BinaryMarshaller decorator which will compress data after
marshalling.
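
A rough sketch of the first option, purely to illustrate the API shape;
CompressionConfiguration and setCompressionConfiguration() do not exist in
Ignite and are shown here only as a proposal:

// Hypothetical configuration entity; none of these names exist in Ignite.
class CompressionConfiguration {
    private Compressor compressor; // user-provided algorithm
    private int minValueSize;      // skip values smaller than this (bytes)

    CompressionConfiguration setCompressor(Compressor compressor) {
        this.compressor = compressor;
        return this;
    }

    CompressionConfiguration setMinValueSize(int bytes) {
        this.minValueSize = bytes;
        return this;
    }
}

// Intended usage (setCompressionConfiguration is the hypothetical part):
//
// CacheConfiguration<Long, Person> ccfg = new CacheConfiguration<>("persons");
// ccfg.setCompressionConfiguration(
//     new CompressionConfiguration()
//         .setCompressor(new DeflateCompressor()) // from the earlier sketch
//         .setMinValueSize(1024));                // don't compress small values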





--
Best Regards, Vyacheslav

Re: Data compression in Ignite 2.0

daradurvs
Hi Igniters!

Apache Ignite 2.0 is released.

Let's continue the discussion of the compression design.

At the moment, I have found only one solution that is compatible with
querying and indexing: per-field compression.
Per-field compression means that the metadata (the header) of an object is
not compressed; only the serialized values of the object's fields (as byte
arrays) are compressed.

This solution has some contentious issues:
- small values, such as primitives and short arrays, are not worth
compressing;
- it is not possible to use compression with Java-predefined types.

We could provide an annotation - @IgniteCompression, for example - that
users can apply to mark the fields to compress.
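
To illustrate, here is a minimal sketch of the proposed annotation and its
intended usage; both the annotation declaration and the PersonRecord class
are hypothetical, written only to show which fields would be marked.

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation: marks fields whose serialized values should be
// compressed. The object's binary header stays uncompressed, so querying
// and indexing keep working.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IgniteCompression {
}

class PersonRecord {
    private long id;                 // primitive: too small to be worth compressing

    @IgniteCompression
    private String biography;        // large text field: a good candidate

    @IgniteCompression
    private byte[] scannedDocument;  // large binary payload: a good candidate
}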

Any thoughts?

Maybe someone already has a ready design?




--
Best Regards, Vyacheslav

Re: Data compression in Ignite 2.0

daradurvs
I created the ticket: https://issues.apache.org/jira/browse/IGNITE-5226

I'll prepare a PR with the described solution in a couple of days.




--
Best Regards, Vyacheslav

Re: Data compression in Ignite 2.0

dsetrakyan
Vyacheslav,

I think it is a bit premature to provide a PR without getting community
consensus on the dev list. Please allow some time for the community to
respond.

D.
