Igniters,
I had several private talks with Igniters about data compression and would like to share the summary with ... Igniters :-)

Currently all of Ignite's data is uncompressed. This leads to excessive network traffic, GC pressure and disk IO (in the case of persistence). Most modern databases are able to compress data, which gives them a 2-4x size reduction on typical workloads. We need compression in Ignite.

There are several options I'd like to discuss. The main difference between them is the "level" at which to compress: per-entry, per-data-page or per-cache.

*1) Per-entry compression*
Apache Geode uses this approach. Every cache entry is compressed using Snappy. This is very easy to implement, but every entry access (e.g. reading a single field) requires full decompression or even re-compression, which could lead to higher CPU consumption and worse performance.

*2) Per-data-page compression*
Oracle and DB2 use this approach. Pages are compressed with a dictionary-based approach (e.g. LZV). Importantly, they do not compress the whole page. Instead, only the actual data is compressed, while the page structure remains intact. The dictionary is placed within the page. This way it is possible to work with individual entries and even individual fields without full page decompression. Another important thing: it is not necessary to re-compress the page on each write. Instead, data is stored in uncompressed form first and compressed only after a certain threshold is reached, so the negative CPU impact is minimal. The typical compression rate would be higher than in the per-entry case, because the more data you have, the better it can be compressed. (A small sketch illustrating this idea follows this message.)

*3) Per-cache compression*
Suggested by Alex Goncharuk. We could have a dictionary for the whole cache. This way we could achieve the highest compression rate possible. The downside is a complex implementation - we would have to develop an algorithm for sharing the dictionary within the cluster. At some point the dictionary could become too large to fit in memory, so we would either have to control its size or spill it to disk.

I propose to use the per-data-page approach, as it both gives a nice compression rate and is relatively easy to implement.

Please share your thoughts.

Vladimir.
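For illustration, here is a minimal Java sketch of the per-data-page idea under simplified assumptions: each entry payload is deflated against a small dictionary that conceptually lives inside the same page, so a single entry can be decompressed without touching its neighbours and without disturbing the page structure. The PageDictionaryCodec class and the sample dictionary content are hypothetical; real LZV-style dictionary construction and the "compress only after a threshold" logic are not shown.

import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/**
 * Illustrative per-page codec: compresses individual entry payloads against
 * a dictionary stored in the same page, leaving the page structure intact.
 */
public class PageDictionaryCodec {
    /** Dictionary of byte sequences frequent within this page (hypothetical content). */
    private final byte[] pageDict;

    public PageDictionaryCodec(byte[] pageDict) {
        this.pageDict = pageDict;
    }

    /** Compresses a single entry payload; other entries on the page are untouched. */
    public byte[] compressEntry(byte[] entry) {
        Deflater def = new Deflater(Deflater.BEST_SPEED);
        def.setDictionary(pageDict);
        def.setInput(entry);
        def.finish();

        byte[] buf = new byte[entry.length + 64]; // small slack for incompressible data.
        int len = def.deflate(buf);
        def.end();

        byte[] out = new byte[len];
        System.arraycopy(buf, 0, out, 0, len);
        return out;
    }

    /** Decompresses a single entry payload using the same in-page dictionary. */
    public byte[] decompressEntry(byte[] compressed, int originalLen) throws DataFormatException {
        Inflater inf = new Inflater();
        inf.setInput(compressed);

        byte[] out = new byte[originalLen];
        int n = inf.inflate(out);
        if (n == 0 && inf.needsDictionary()) { // zlib asks for the preset dictionary.
            inf.setDictionary(pageDict);
            inf.inflate(out);
        }
        inf.end();
        return out;
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] dict = "org.apache.ignite.PersonAccountAddress".getBytes();
        PageDictionaryCodec codec = new PageDictionaryCodec(dict);

        byte[] entry = "Person{org.apache.ignite.Address=Address}".getBytes();
        byte[] packed = codec.compressEntry(entry);
        byte[] restored = codec.decompressEntry(packed, entry.length);

        System.out.println(new String(restored) + ", " + entry.length + " -> " + packed.length + " bytes");
    }
}

The point of the preset dictionary is that even small payloads compress reasonably well, because repeated class names and field values are factored out once per page rather than once per entry.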
Agree, I like 2 best of all.
--Yakov
Vova,
I do not understand how (2) will help us reduce the size of the serialized object. It sounds like we need to serialize an object the way we do today, and then compress it once it arrives on the server side and gets stored in some page. This looks like a double whammy.

I do not understand why approach (3) is difficult. Yes, we will have to implement a shared compression table, but I am not sure it will ever grow too big. As a matter of fact, we can stop adding entries to it after it reaches a certain pre-defined size. Also, we already know how to persist data, so spilling the compression table to disk should not be an issue.

If my assumption about (2) is correct, I would like to ask you to give another thought to (3).

D.
+1 for per-page approach (2).
I'm not sure that dictionary-based encoding within a page is really efficient. Compression quality grows with the size of the sample (the page size in our case - not so big), and we can't encode "words" that span the border of two entries. Also, we'll have to "garbage-collect" the page dictionary from time to time (as the set of entries changes, some "words" may become redundant). However, if Oracle implemented it in a profitable way, we can do it as well :)

I have a concern about (3): it will add significant latency/network overhead. Accessing a distributed cache with the dictionary on every put is not free.

Also, I'd like to propose one more approach (let it be (4)) - a local per-partition dictionary. We can store the set of dictionary "words" in a B+ tree or any other structure in pages - it will be persisted automatically. The efficiency of encoding will be higher than in the per-page case. Thoughts?

The open question for a per-partition dictionary is how to "garbage-collect" it. In the per-page case we can do it under the page write lock, but here we have to do it in a more tricky way.

Best Regards,
Ivan Rakov
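As a rough sketch of the bookkeeping such a per-partition dictionary might need, the toy class below reference-counts each dictionary "word" so that words no longer used by any entry can be garbage-collected. All names here are made up for illustration; persisting the word set in a B+ tree, as proposed above, is not shown.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Toy per-partition dictionary: maps frequent sequences (strings here, for brevity)
 * to short codes and tracks how many stored entries use each word, so that unused
 * words can be "garbage-collected".
 */
public class PartitionDictionary {
    private static final class Word {
        final int code;
        long refCnt;
        Word(int code) { this.code = code; }
    }

    private final Map<String, Word> words = new ConcurrentHashMap<>();
    private int nextCode;

    /** Registers (or re-uses) a word when an entry referencing it is written. */
    public synchronized int acquire(String word) {
        Word w = words.computeIfAbsent(word, k -> new Word(nextCode++));
        w.refCnt++;
        return w.code;
    }

    /** Releases a word when an entry referencing it is removed or rewritten. */
    public synchronized void release(String word) {
        Word w = words.get(word);
        if (w != null && --w.refCnt == 0)
            words.remove(word); // "garbage-collect" the unused word.
    }

    public synchronized int size() {
        return words.size();
    }

    public static void main(String[] args) {
        PartitionDictionary dict = new PartitionDictionary();
        int code = dict.acquire("org.apache.ignite.Person");
        dict.acquire("org.apache.ignite.Person");
        dict.release("org.apache.ignite.Person");
        System.out.println("code=" + code + ", words=" + dict.size()); // still referenced once.
        dict.release("org.apache.ignite.Person");
        System.out.println("words=" + dict.size());                    // now collected.
    }
}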
Ivan, your points definitely make sense; however, I see at least one issue compared to the per-page approach. With per-page we can split the page into compressed and uncompressed regions and change the dictionary if more efficient compression is possible. With bigger areas such as a partition or the entire cache, dynamic dictionary changes may be very complex.

--Yakov
I still don't understand per-partition compression, which is only local.
How do entries get updated in the middle of a partition file?

D.
Igniters,
This is to inform you that we had another conversation about compression design, where Ivan's proposal about per-partition compression was discussed. Let me share our current vision.

1) The per-partition approach will have a higher compression rate than the per-page approach. This is beyond doubt.
2) But it is much more difficult to implement compared to per-page, in particular the dictionary management. Moreover, write operations might suffer.

As a result, we think that the per-page approach is the way to go as a first iteration. In the future we may consider per-partition compression as a kind of "one-time" batch operation which could be performed on rarely updated and historical data - a tiered approach. Mature commercial vendors work in exactly the same way.

Sergey Puchnin and I are trying to better understand how exactly compression could be implemented.

What we understand now:
1) This will be dictionary-based compression (e.g. LZV).
2) A page will be compressed in batch mode, i.e. not on every change, but when a certain threshold is reached (e.g. the page's free space drops below 20%).

What we do not understand yet:
1) Granularity of the compression algorithm.
1.1) It could be per-entry - i.e. we compress the whole entry content, but respect boundaries between entries. E.g.: before - [ENTRY_1][ENTRY_2], after - [COMPRESSED_ENTRY_1][COMPRESSED_ENTRY_2] (as opposed to [COMPRESSED ENTRY_1 and ENTRY_2]).
1.2) Or it could be per-field - i.e. we compress fields, but respect the binary object layout.

The first approach is simple and straightforward and will give an acceptable compression rate, but we will have to decompress (or even re-compress) the whole binary object on every field access, which may ruin our SQL performance. The second approach is more complex, and we are not sure about its compression rate, but as the BinaryObject structure is preserved, we will still have fast constant-time per-field access. (A sketch contrasting the two layouts follows this message.)

Please share your thoughts.

Vladimir.
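To make the two granularities concrete, the sketch below models a binary object as an ordered map of field name to serialized value and contrasts option 1.1 (compress the whole entry as one blob) with option 1.2 (compress each field value separately while keeping the layout intact). The class, the field names and the identity "codec" are illustrative assumptions only, not Ignite's real BinaryObject format.

import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Structural contrast of the two granularities, on a toy "binary object"
 * modelled as an ordered map of field name -> serialized field value.
 */
public class GranularityExample {
    /** Placeholder for a real dictionary-based codec (e.g. LZV-style); identity here on purpose. */
    static byte[] codec(byte[] data) {
        return data.clone();
    }

    /** 1.1 Per-entry: all fields are concatenated and encoded as a single blob.
     *  Reading one field later requires decoding the whole blob first. */
    static byte[] packPerEntry(Map<String, byte[]> fields) {
        StringBuilder whole = new StringBuilder();
        for (Map.Entry<String, byte[]> f : fields.entrySet())
            whole.append(f.getKey()).append('=').append(new String(f.getValue(), StandardCharsets.UTF_8)).append(';');
        return codec(whole.toString().getBytes(StandardCharsets.UTF_8));
    }

    /** 1.2 Per-field: each value is encoded separately and the layout (field map) stays intact,
     *  so a single field can be decoded without touching the others. */
    static Map<String, byte[]> packPerField(Map<String, byte[]> fields) {
        Map<String, byte[]> res = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> f : fields.entrySet())
            res.put(f.getKey(), codec(f.getValue()));
        return res;
    }

    public static void main(String[] args) {
        Map<String, byte[]> person = new LinkedHashMap<>();
        person.put("name", "Vladimir".getBytes(StandardCharsets.UTF_8));
        person.put("city", "Saint Petersburg".getBytes(StandardCharsets.UTF_8));

        byte[] blob = packPerEntry(person);                  // one opaque blob -> full decode per field access.
        Map<String, byte[]> perField = packPerField(person); // layout preserved -> direct access to "city".

        System.out.println("per-entry blob size: " + blob.length + " bytes");
        System.out.println("per-field 'city': " + new String(codec(perField.get("city")), StandardCharsets.UTF_8));
    }
}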
Some more comments:
1. I see advantages in both approaches - any chance we can support both of them?
2. We have to make sure that it is possible to enable different types of compression at the per-cache level.
3. It would also be nice to show the compression ratio, i.e. how much data space was saved, as part of monitoring.

D.
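Purely as a strawman for points 2 and 3, the sketch below imagines what a per-cache compression setting and a compression-ratio metric could look like. None of these types or methods exist in Ignite; they are hypothetical names intended only to make the requirements concrete.

/** Hypothetical API sketch only -- none of these types exist in Ignite today. */
public class CompressionConfigSketch {

    /** Per-cache choice of compression strategy (point 2 above). */
    enum CompressionMode {
        NONE,         // no compression for this cache
        PER_PAGE,     // in-page dictionary, batch compression on threshold
        PER_PARTITION // possible future "one-time" compression of cold data
    }

    /** Imaginary per-cache settings that a cache configuration could carry. */
    static class CacheCompressionConfig {
        CompressionMode mode = CompressionMode.PER_PAGE;
        double pageFreeSpaceThreshold = 0.2; // compress when free space drops below 20%
    }

    /** Imaginary metrics for point 3: show how much space compression saved. */
    interface CompressionMetrics {
        long uncompressedSize();
        long compressedSize();

        /** e.g. 3.0 means the data shrank three times. */
        default double compressionRatio() {
            long c = compressedSize();
            return c == 0 ? 1.0 : (double) uncompressedSize() / c;
        }
    }

    public static void main(String[] args) {
        CompressionMetrics m = new CompressionMetrics() {
            @Override public long uncompressedSize() { return 300; }
            @Override public long compressedSize()   { return 100; }
        };
        System.out.printf("ratio: %.1fx%n", m.compressionRatio());
    }
}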
Hi, Igniters!
Are there any results of the research, or a prototype of the compression feature?
Hi Igniters,
What do you think about implementing an analogue of the Java G1 collector feature 'String deduplication' (-XX:+UseG1GC -XX:+UseStringDeduplication)?

Most business applications have a large share of objects of type String. As a result, char[] arrays are often at the top of heap usage. To reduce the memory consumed by duplicates, the G1 collector identifies equal strings in the background and deduplicates them so that they share a single backing array (which is safe because String is immutable). Unfortunately we can't reuse the collector's feature, as Ignite stores data off-heap.

What if we consider implementing the same deduplication feature for Ignite Durable Memory?

Sincerely,
Dmitry Pavlov
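As a rough, on-heap illustration of the deduplication idea (an off-heap variant for Durable Memory would store page offsets instead of object references), the toy table below keeps one canonical copy per distinct string value. The DedupTable class is a made-up example, not a proposal for a concrete Ignite API.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Toy deduplication table in the spirit of G1's string deduplication:
 * identical values are collapsed to a single canonical instance.
 * (G1 itself deduplicates the backing char[]; an Ignite variant would have
 * to work on serialized values in off-heap pages instead.)
 */
public class DedupTable {
    private final Map<String, String> canonical = new ConcurrentHashMap<>();

    /** Returns the single shared instance for the given value. */
    public String dedup(String val) {
        String prev = canonical.putIfAbsent(val, val);
        return prev != null ? prev : val;
    }

    public int distinctValues() {
        return canonical.size();
    }

    public static void main(String[] args) {
        DedupTable t = new DedupTable();
        String a = t.dedup(new String("Moscow"));
        String b = t.dedup(new String("Moscow"));
        System.out.println("same instance: " + (a == b) + ", distinct: " + t.distinctValues());
    }
}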
Dmitry,
What we've discussed so far in this topic is essentially the same concept. We will deduplicate identical byte sequences at the page level.
Vladimir,
Focusing on strings would also allow us to deduplicate strings in objects during unmarshalling from a page into the heap.

Moreover, this could be a first, simple step towards implementing a more complex algorithm.

Sincerely,
Dmitriy Pavlov
This would require a shared dictionary, which is complex to maintain. We evaluated this option, but rejected it due to its complexity. Another important thing is that the String type does not dominate in user models, so I do not see why it should be treated as a special case.
Hi Vladimir,
In my experience, String is a frequently used data type in business applications and is, moreover, often indexed.

> String type doesn't dominate in user models

What is the basis of this assumption?

Could you explain why String compression is more complex than byte[] compression? It seems they both require dictionaries.

Sincerely,
Dmitriy Pavlov
Dmitry,
Ignite is used by a variety of applications. Some models I have seen were made entirely of strings, others of longs, decimals, etc. It is impossible to either prove or disprove which data type is dominant. My position is based on experience with Ignite users and on the approaches used in other databases.

Strings are more complex because your approach assumes that there is a common dictionary of strings, with references to these strings from data pages. As soon as you have cross-page references, you are in trouble, because you need to maintain that dictionary. With the page-based approach we agreed on previously, the dictionary is generic (i.e. it can compress not only strings, but any byte sequence) and is located inside the page, meaning that all you need to maintain this dictionary is a page lock.