Hi, All!
I would like to propose one more feature for Ignite 2.0: data compression for data in binary format.

The binary format is stored as field name + field data, so we have a description. How about adding one more byte to the binary data descriptor:

*Compressed*:
0 - Data stored as is (no compression).
1 - Data compressed by dictionary (something like DB2 row compression [1], but for all binary types). We could have a system- or user-defined replicated cache for such a dictionary and a *cache.compact()* method that will scan the cache, build the dictionary and compact the data.
2 - Data compressed by Java's built-in ZIP.
3 - Data compressed by some custom user algorithm.

Of course, it is possible to compress data in current Ignite 1.x, but in that case the compressed data cannot be accessed from the SQL engine. If we implement support for compression at the Ignite core level, the SQL engine will be able to detect that data is compressed and handle it properly.

What do you think? If the community considers this feature useful, I will create an issue in JIRA.

[1] http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/

--
Alexey Kuznetsov
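To make the proposal concrete, here is a minimal sketch of how a reader could dispatch on such a descriptor byte. The constant names, the decode() helper and its stubs are illustrative only, not actual Ignite internals; only the ZIP branch is filled in, using java.util.zip:

import java.io.ByteArrayOutputStream;
import java.util.zip.Inflater;

final class Compression {
    static final byte NONE = 0;       // data stored as is
    static final byte DICTIONARY = 1; // dictionary substitution (DB2-style)
    static final byte ZIP = 2;        // Java built-in Deflate/ZIP
    static final byte CUSTOM = 3;     // user-defined algorithm

    // Dispatch on the descriptor byte when reading a binary value.
    static byte[] decode(byte flag, byte[] payload) throws Exception {
        switch (flag) {
            case NONE:
                return payload;
            case ZIP:
                Inflater inf = new Inflater();
                inf.setInput(payload);
                ByteArrayOutputStream out = new ByteArrayOutputStream(payload.length * 2);
                byte[] buf = new byte[1024];
                while (!inf.finished())
                    out.write(buf, 0, inf.inflate(buf));
                inf.end();
                return out.toByteArray();
            case DICTIONARY: // would consult the replicated dictionary cache
            case CUSTOM:     // would delegate to a user-supplied codec
            default:
                throw new UnsupportedOperationException("Compression flag: " + flag);
        }
    }

    private Compression() {}
}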
This will make sense only in rare cases when you have very large objects stored that can be compressed effectively. And even then it will introduce a slowdown on all operations, which often will not be acceptable. I guess only a few users will find this feature useful, so I think it is not worth the effort.

Sergi
Sergi,
Of course it will introduce some slowdown, but with compression more data can be stored in memory and will not be evicted to disk. In the case of compression by dictionary substitution, it will be only one more lookup and should be fast.

In general, we could provide only an API for compression out of the box, and users that really need some sort of compression will implement it themselves. This will not require much effort, I think.

--
Alexey Kuznetsov
Hi
For approach 1: putting a large object into a partitioned cache will force an update of the dictionary placed in the replicated cache. It seems this may be a time-expensive operation.
Approaches 2 and 3 make sense only in rare cases, as Sergi commented.
Also, I see a danger of OOM if we have a high compression ratio and try to restore the original value in memory.

--
Sergey Kozlov
GridGain Systems
www.gridgain.com
SAP HANA does compression by 1) compressing SQL parameters before execution, and 2) storing only compressed data in memory. This way all SQL queries work as normal, with zero modifications or performance overhead. Only the results of a query can be (optionally) decompressed before being returned to the user.

--
Nikita Ivanov
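To illustrate why this works for lookups (a toy sketch; the names are hypothetical and this is neither HANA's nor Ignite's API): if the encoding is deterministic, equal values always map to equal codes, so an equality predicate can be evaluated on the encoded form directly by encoding the parameter once.

import java.util.HashMap;
import java.util.Map;

// Toy deterministic dictionary encoder: equal inputs always get equal codes.
final class DictEncoder {
    private final Map<String, Integer> codes = new HashMap<>();

    int encode(String value) {
        // computeIfAbsent keeps the mapping stable within this dictionary
        return codes.computeIfAbsent(value, v -> codes.size());
    }
}

// Evaluating "WHERE city = ?" against encoded rows without decompressing:
//   int param = encoder.encode("London");
//   match if a row's encoded city column == param
// Equality works this way; range predicates (<, >) and LIKE generally do not.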
Nikita, this sounds like a pretty elegant approach.
Does anyone in the community see a problem with this design?
I'm guessing the suggestion here is to use the compressed form directly for WHERE clause evaluation. If that's the case, I think there are a couple of issues:
1) the LIKE predicate;
2) predicates other than equality (for example, <, >, etc.).

But since Ignite isn't just about SQL queries (surprisingly, some people still use it just as a distributed cache!), in general I think compression is a great idea. The cleanest way to achieve it would be to make it possible to chain the marshallers. It is possible to do this already without any Ignite code changes, but unfortunately it would force people to use the non-public BinaryMarshaller class directly (as the first element of the chain).

Cheers
Andrey
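A minimal sketch of such a chain, with the marshaller contract simplified to byte[] in and out (the real Ignite Marshaller interface differs, and the delegate stands in for BinaryMarshaller):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Simplified contract; the real Ignite interface has more methods.
interface SimpleMarshaller {
    byte[] marshal(Object obj) throws IOException;
    <T> T unmarshal(byte[] bytes, ClassLoader ldr) throws IOException;
}

// Decorator that GZIP-compresses whatever the first element of the chain produced.
final class CompressingMarshaller implements SimpleMarshaller {
    private final SimpleMarshaller delegate;

    CompressingMarshaller(SimpleMarshaller delegate) { this.delegate = delegate; }

    @Override public byte[] marshal(Object obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
            gzip.write(delegate.marshal(obj));
        }
        return bos.toByteArray();
    }

    @Override public <T> T unmarshal(byte[] bytes, ClassLoader ldr) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
            return delegate.unmarshal(gzip.readAllBytes(), ldr);
        }
    }
}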
Sergey Kozlov wrote:
>> For approach 1: putting a large object into a partitioned cache will force an update of the dictionary placed in the replicated cache. It may be a time-expensive operation.
The dictionary will be built only once. And we could control what should be put into the dictionary; for example, we could check min and max sizes and decide whether to put a value into the dictionary or not.

>> Approaches 2 and 3 make sense only in rare cases, as Sergi commented.
But it is better to at least have the possibility to plug in user code for compression than not to have it at all.

>> Also I see a danger of OOM if we have a high compression ratio and try to restore the original value in memory.
We can easily get OOM with many other operations right now, without compression. I think it is not an issue; we could add a NOTE about this possibility to the documentation.

Andrey Kornev wrote:
>> ... in general I think compression is a great idea. The cleanest way to achieve that would be to just make it possible to chain the marshallers...
I think it is also a good idea. And it looks like it could be used for compression with some sort of ZIP algorithm, but how do we deal with compression by dictionary substitution? We need to build the dictionary first. Any ideas?

Nikita Ivanov wrote:
>> SAP HANA does compression by 1) compressing SQL parameters before execution...
Looks interesting, but my initial point was about compression of cache data, not SQL queries. My idea was to make compression transparent for the SQL engine when it looks up data.

But the idea of compressing SQL query results looks very interesting, because it is a known fact that the SQL engine can consume quite a lot of heap for storing result sets. I think this should be discussed in a separate thread.

Just for your information: in my first message I mentioned that DB2 has compression by dictionary, and according to them it is possible to compress typical data by 50-80%. I have some experience with DB2 and can confirm this.

--
Alexey Kuznetsov
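One possible answer, sketched below under loose assumptions (a two-pass, frequency-based build; every name here is hypothetical): scan the cache once to count candidate field values, then keep only the most frequent ones in the replicated dictionary cache.

import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy two-pass dictionary builder: observe values during a scan, then keep the top N.
final class DictionaryBuilder {
    private final Map<String, Long> freq = new HashMap<>();

    // Pass 1: called for every candidate field value during a cache scan.
    void observe(String value) {
        if (value.length() >= 8) // skip values too small to be worth a code
            freq.merge(value, 1L, Long::sum);
    }

    // Pass 2: assign short integer codes to the N most frequent values.
    Map<String, Integer> build(int maxEntries) {
        List<String> top = freq.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
            .limit(maxEntries)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());

        Map<String, Integer> dict = new HashMap<>();
        for (int code = 0; code < top.size(); code++)
            dict.put(top.get(code), code);
        return dict;
    }
}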
Dictionary compression requires some knowledge about the data being compressed. For example, for numeric types a range of values must be known so that the dictionary can be generated. For strings, the number of unique values in the column is the key piece of input into dictionary generation.

SAP HANA is a column-based database system: it stores the fields of the data tuple individually, using the best compression for the given data type and the particular set of values. HANA was built from the start as a general-purpose database, rather than as an afterthought layer on top of an already existing distributed cache. On the other hand, Ignite is a distributed cache implementation (a pretty good one!) that in general requires no schema and stores its data in row-based fashion. Its current design doesn't lend itself readily to the kind of optimizations HANA provides out of the box.

For the curious types among us, the implementation details of HANA are well documented in "In-Memory Data Management" by Hasso Plattner & Alexander Zeier.

Cheers
Andrey
Very good points indeed. I get the compression-in-Ignite question quite often, and a HANA reference is a typical lead-in.

My personal opinion is still that in Ignite *specifically* compression is best left to the end user. But we may need to provide a better facility to inject the user's logic here...

--
Nikita Ivanov
Nikita,
That was my intention: "we may need to provide a better facility to inject user's logic here..."

Andrey,
About compression, once again: DB2 is a row-based DB and they can compress :)

--
Alexey Kuznetsov
Hi
I'd add Redis as a sample of a memory-compression strategy:

http://labs.octivi.com/how-we-cut-down-memory-usage-by-82/
http://redis.io/topics/memory-optimization

Regards
S DIAZ
Nikita,
I agree with Andrey: HANA is a bad comparison to Ignite in this respect. I did not find any evidence on the internet that their row store is very efficient with compression; it was always about the column store.

Alexey,

As for DB2, can you check what exactly, when and how it compresses, and whether it gives any decent results, before suggesting it as an example to follow? I don't think it is a good idea to repeat every bad idea from other products.

And even if there are good results in DB2, will all of this be applicable to Ignite? PostgreSQL, for example, provides TOAST compression, and it can be useful when used in a smart way, but that is a very different architecture from what we have.

All in all, I agree that maybe we should provide some kind of pluggable compression SPI support, but do not expect much from it; usually it will just be useless.

Sergi
FYI, I created an issue for Ignite 2.0:
https://issues.apache.org/jira/browse/IGNITE-3592

Thanks!

--
Alexey Kuznetsov
GridGain Systems
www.gridgain.com
Hi Igniters!
I am interested in this task: "Provide some kind of pluggable compression SPI support" <https://issues.apache.org/jira/browse/IGNITE-3592>

I developed a solution at the BinaryMarshaller level, but the reviewer rejected it.

Let's continue the discussion of the task's goals and solution design. As I understand it, the main goal of this task is to store data in compressed form. This is what I need from Ignite as its user: compression provides savings on servers. We can store more data on the same servers at the cost of increased CPU utilization.

I'm researching the possibility of implementing compression at the cache level.

Any thoughts?

--
Best regards,
Vyacheslav
Vyacheslav,
Did you read the initial discussion [1] about compression? As far as I remember, we agreed to add only some "top-level" API in order to provide a way for Ignite users to inject some sort of custom compression.

[1] http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-2-0-td10099.html

--
Alexey Kuznetsov
Alexey,
Yes, I've read it.

OK, let's discuss the public API design.

I think we need to add a configuration entity to CacheConfiguration which will contain a Compressor interface implementation and some useful parameters. Or maybe provide a BinaryMarshaller decorator which will compress data after marshalling.

--
Best Regards, Vyacheslav
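A rough sketch of what such a configuration entity and codec contract might look like; every name below is hypothetical, none of this is existing Ignite API:

// Hypothetical pluggable codec contract.
interface Compressor {
    byte[] compress(byte[] input);
    byte[] decompress(byte[] input);
}

// Hypothetical per-cache settings that could hang off CacheConfiguration.
class CompressionConfiguration {
    private Compressor compressor;   // user-supplied implementation
    private int minValueSize = 128;  // skip values smaller than this (bytes)

    public Compressor getCompressor() { return compressor; }

    public CompressionConfiguration setCompressor(Compressor c) {
        this.compressor = c;
        return this;
    }

    public int getMinValueSize() { return minValueSize; }

    public CompressionConfiguration setMinValueSize(int size) {
        this.minValueSize = size;
        return this;
    }
}

// Imagined usage (ZstdCompressor is a made-up user implementation):
//   cacheCfg.setCompressionConfiguration(
//       new CompressionConfiguration()
//           .setCompressor(new ZstdCompressor())
//           .setMinValueSize(256));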
Hi, Igniters!
Apache Ignite 2.0 is released.

Let's continue the discussion about the compression design.

At the moment, I have found only one solution that is compatible with querying and indexing: per-object-field compression. Per-field compression means that the metadata (header) of an object won't be compressed; only the serialized values of the object's fields (in byte-array form) will be compressed.

This solution has some contentious issues:
- small values, like primitives and short arrays: there is no sense in compressing them;
- it is not possible to use compression with Java-predefined types.

We could provide an annotation - @IgniteCompression, for example - that users can apply to mark the fields to compress, as sketched below.

Any thoughts?

Maybe someone already has a ready design?

--
Best Regards, Vyacheslav
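A minimal sketch of the suggested marker annotation and its intended use; @IgniteCompression does not exist in Ignite, and the Document class is purely illustrative:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker for fields whose serialized bytes should be compressed.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IgniteCompression {
}

// Example domain class: only the large text field is marked for compression.
class Document {
    private long id;       // primitive: not worth compressing
    private String title;  // short string: left as is

    @IgniteCompression
    private String body;   // large payload: compress its serialized bytes
}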
I created the ticket: https://issues.apache.org/jira/browse/IGNITE-5226
I'll prepare a PR with the described solution in a couple of days.

-- Best Regards, Vyacheslav
Vyacheslav,
I think it is a bit premature to provide a PR without first reaching community consensus on the dev list. Please allow some time for the community to respond.

D.