Folks,
I've implemented page compression for the persistent store and am going to merge it to master.

https://github.com/apache/ignite/pull/5200

Some design notes:

It employs a "hole punching" approach: pages are kept uncompressed in memory, but when they get written to disk they are compressed, and all the extra file system blocks for the page are released. Thus the storage files become sparse.

Right now we support 4 compression methods: ZSTD, LZ4, SNAPPY and SKIP_GARBAGE. All of them are self-explanatory except SKIP_GARBAGE, which does not apply any compression but simply keeps only the meaningful data from half-filled pages. It is easy to add more methods if needed.

Since we can only release full file system blocks, which are typically 4k in size, the user must configure the page size to be a multiple of the FS block size, e.g. 8k or 16k. It also means that the maximum compression ratio here is fsBlockSize / pageSize = 4k / 16k = 0.25.

It is possible to enable compression for existing databases if they were configured with a large enough page size. In this case pages will be written to disk in compressed form as they are updated, and the database will become compressed gradually.

There will be 2 new properties on CacheConfiguration (setDiskPageCompression and setDiskPageCompressionLevel) to set up disk page compression.

Compression dictionaries are not supported at the moment, but may be in the future. IMO that should be added as a separate feature if needed.

The only supported platform for now is Linux. Since all popular file systems support sparse files, it should be relatively easy to support more platforms.

Please take a look and provide your thoughts and suggestions.

Thanks!

Sergi
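A minimal configuration sketch of what is described above, for reference. The two CacheConfiguration setters and the method names ZSTD/LZ4/SNAPPY/SKIP_GARBAGE come from this message; the DiskPageCompression enum type, the package names and the builder-style chaining are assumptions and may differ in the released API.

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.DiskPageCompression;
import org.apache.ignite.configuration.IgniteConfiguration;

public class PageCompressionSketch {
    public static void main(String[] args) {
        // Page size must span several FS blocks (typically 4k each), e.g. 8k or 16k.
        DataStorageConfiguration dsCfg = new DataStorageConfiguration().setPageSize(16 * 1024);
        dsCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        // The two new per-cache properties: compression method and (optional) level.
        CacheConfiguration<Integer, String> cacheCfg = new CacheConfiguration<Integer, String>("myCache")
            .setDiskPageCompression(DiskPageCompression.ZSTD)
            .setDiskPageCompressionLevel(3);

        Ignition.start(new IgniteConfiguration()
            .setDataStorageConfiguration(dsCfg)
            .setCacheConfiguration(cacheCfg));
    }
}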
Hi Sergi,
It is not clear to me whether your changes affect the PageSnapshot WAL record. Is it possible to add compression support for the PageSnapshot WAL record as well, to reduce the WAL size?

Thanks.

--
Best regards,
Andrey V. Mashenkov
Right now this functionality has nothing to do with the WAL, but your idea definitely makes sense and is worth implementing as a next step.

Sergi
Hello!
You have a zstd default level of 3. In my tests, zstd usually performed much better with compression level 2. Please consider it.

I admire your effort!

Regards,
--
Ilya Kasnacheev
Ilya,
Zstd itself has a default compression level of 3; I just used that number to be consistent with the library defaults. I will check whether there is a significant difference in performance.

Sergi
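Trying level 2 side by side with the default would just mean restating the per-cache setters from the opening message with an explicit level. A sketch, again assuming the DiskPageCompression enum type:

import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DiskPageCompression;

public class ZstdLevelTweakSketch {
    public static void main(String[] args) {
        // Same setters as in the opening message, but with zstd level 2
        // instead of the library default of 3, for a side-by-side benchmark.
        CacheConfiguration<Integer, String> ccfg = new CacheConfiguration<Integer, String>("testCache")
            .setDiskPageCompression(DiskPageCompression.ZSTD)
            .setDiskPageCompressionLevel(2);

        System.out.println("Configured " + ccfg.getName() + " with ZSTD level " + ccfg.getDiskPageCompressionLevel());
    }
}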
Hi Sergi,
Didn't know you were cooking this dish in the background ) Excellent. Just to be sure, that's part of this IEP, right?
https://cwiki.apache.org/confluence/display/IGNITE/IEP-20%3A+Data+Compression+in+Ignite#IEP-20:DataCompressioninIgnite-Withoutin-memorycompression

> Since we can release only full file system blocks which are typically 4k
> size, user must configure page size to be at least multiple FS blocks, e.g.
> 8k or 16k. It also means that max compression ratio here is fsBlockSize /
> pageSize = 4k / 16k = 0.25

How do we handle the case where the page size is not a multiple of 4K? What is the most optimal page size if the user wants to get the best compression? Probably we could adjust the default page size automatically if it's a clean deployment.

> There will be 2 new properties on CacheConfiguration
> (setDiskPageCompression and setDiskPageCompressionLevel) to setup disk page
> compression.

How about setting it at the DataRegionConfiguration level as well, so that it's applied to all the caches/tables from there?

--
Denis
Denis,
See inline.

> Didn't know you were cooking this dish in the background ) Excellent. Just
> to be sure, that's part of this IEP, right?
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-20%3A+Data+Compression+in+Ignite#IEP-20:DataCompressioninIgnite-Withoutin-memorycompression

Correct.

> How do we handle the case where the page size is not a multiple of 4K? What is
> the most optimal page size if the user wants to get the best compression?
> Probably we could adjust the default page size automatically if it's a clean
> deployment.

We already force the page size to be between 1k and 16k and to be a power of 2. Thus there are only 2 options greater than 4k: either 8k or 16k. So the page must just be large enough.

Obviously, the greater the page size, the better compression you get, but having very large pages may affect performance badly. Thus 8k with ratio 0.5 or 16k with ratio 0.25 should be OK for most cases.

> How about setting it at the DataRegionConfiguration level as well, so that it's
> applied to all the caches/tables from there?

Does not seem to make much sense until we can tweak the page size for different data regions independently (right now we can't). I would start with that one first.

Sergi
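A tiny worked illustration of the ratios above, assuming the typical 4k FS block size mentioned earlier in the thread:

public class CompressionRatioSketch {
    public static void main(String[] args) {
        int fsBlockSize = 4 * 1024; // typical file system block size

        // Only whole FS blocks can be punched out, so the best achievable
        // on-disk fraction of a page is fsBlockSize / pageSize.
        for (int pageSize : new int[] {8 * 1024, 16 * 1024})
            System.out.printf("pageSize = %2d KiB -> best-case on-disk fraction = %.2f%n",
                pageSize / 1024, (double) fsBlockSize / pageSize);
    }
}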
Hello Igniters!
I've implemented compression of WAL page snapshot records. Ticket [1], PR [2]. I've reused the page compression module implemented by Sergi Vladykin for the page store.

To configure WAL page record compression, there are 2 new properties on DataStorageConfiguration: walPageCompression and walPageCompressionLevel. Unlike page store compression, WAL compression doesn't use sparse files and can be used on any file system (it is also not necessary to enable page store compression in order to enable WAL page record compression).

WAL page snapshot compression is most useful and performs best when an Ignite instance has many caches and partitions. In that case, page snapshot records take up a considerable part of the WAL (in my tests, more than 90% of the WAL size).

I've done some benchmarks using the yardstick framework and got pretty good results: not only is the WAL size significantly reduced, but there is also an improvement in throughput and latency. I've attached some of the benchmark results to the ticket.

Can anyone review the patch?

[1]: https://issues.apache.org/jira/browse/IGNITE-11336
[2]: https://github.com/apache/ignite/pull/6116
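A minimal sketch of the storage-wide WAL setting described above. The setter names are assumed from the property names walPageCompression and walPageCompressionLevel mentioned in this message, and the DiskPageCompression enum type is again an assumption:

import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.DiskPageCompression;
import org.apache.ignite.configuration.IgniteConfiguration;

public class WalPageCompressionSketch {
    public static void main(String[] args) {
        // WAL page snapshot record compression is configured on DataStorageConfiguration,
        // does not rely on sparse files, and is independent of page store compression.
        DataStorageConfiguration dsCfg = new DataStorageConfiguration()
            .setWalPageCompression(DiskPageCompression.LZ4)
            .setWalPageCompressionLevel(1);

        dsCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        IgniteConfiguration cfg = new IgniteConfiguration().setDataStorageConfiguration(dsCfg);
        System.out.println("WAL page compression: " + dsCfg.getWalPageCompression());
    }
}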
Hi Sergi,
We are planning to on-board the Ignite persistent store, and since we have a huge volume of data we need page compression. After checking a few links, I would like to confirm: are any critical issues still pending, or do we need to wait for a specific release? It would be a great help for planning our development further, since we are only in the first phase.

-Vimal