Igniters,
Initial data load is one of the most important use cases for our product. It is one of the first things users try to do with Ignite, and if it takes too much time, it is very likely that they will look for other solutions.

We have made good progress in this area recently, specifically: a set of internal improvements to our indexes, streaming mode for the JDBC driver, and the COPY command. But our internals are still not very efficient: every single update goes through the whole set of Ignite components, such as the page cache, free lists, BTrees, etc.

I created IEP-22 [1]. Its goal is to implement a special direct data load mode which will bypass our page cache and use an alternative algorithm for index updates. Together with the COPY command and streaming, this improvement will allow Ignite to load data at very high speed.

Please review the IEP and share your comments.

Vladimir.

[1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-22%3A+Direct+Data+Load
Hello, Vladimir.
Does this IEP fit with IEP-18: TDE? Do we allow the user to load data into an encrypted cache?

On Wed, 20/06/2018 at 18:08 +0300, Vladimir Ozerov wrote:
> I created IEP-22 [1]. Its goal is to implement a special direct data load
> mode which will bypass our page cache and use an alternative algorithm for
> index updates.
In reply to this post by Vladimir Ozerov
Vladimir,
Great IEP, but I couldn't comprehend the beginning of the "Direct Data Load" paragraph. Maybe there are some typos?

Best regards,
Andrey Kuznetsov.
On Wed, Jun 20, 2018 at 8:40 AM, Andrey Kuznetsov <[hidden email]> wrote:

> Vladimir,
>
> Great IEP, but I couldn't comprehend the beginning of the "Direct Data
> Load" paragraph. Maybe, there are some typos?

I fixed some typos, it is more readable now.
In reply to this post by Nikolay Izhikov-2
Hi Nikolay,
I do not see any problems with TDE for now.

On Wed, Jun 20, 2018 at 6:16 PM, Nikolay Izhikov <[hidden email]> wrote:
> Hello, Vladimir.
>
> Does this IEP fit with IEP-18: TDE?
>
> Do we allow to user to load data into encrypted cache?
In reply to this post by Vladimir Ozerov
Vladimir,
As I see from the IEP, this data loading technique is supposed to be used for deployments with Ignite persistence enabled. Is it possible to generalize this solution and use it for pure in-memory and in-memory + 3rd party DB scenarios?

--
Denis
Hi Denis,
This IEP is mostly about how we work with our own indexes and pages. So 3rd party DB is out of question.

On Thu, Jun 21, 2018 at 10:38 PM Denis Magda <[hidden email]> wrote:
> Vladimir,
>
> As I see from the IEP, this data loading technique is supposed to be used
> for deployments with Ignite persistence enabled. Is it possible to
> generalize this solution and use for pure in-memory and in-memory + 3rd
> party DB scenarios?
>
> --
> Denis
On Thu, Aug 16, 2018 at 1:24 AM, Vladimir Ozerov <[hidden email]> wrote:

> Hi Denis,
>
> This IEP is mostly about how we work with our own indexes and pages. So 3rd
> party DB is out of question.

Why? I think 3rd party DB will be supported automatically with CacheStore. However, do we need to do something different for memory-only vs. memory+disk?

D.
Dima,
By "out of question" I meant that 3rd party persistence should work out of the box when IEP-22 is ready. No changes should be required there.

As for persistence vs. memory, most probably yes, there might be some differences. Specifically, when data load starts and persistence is enabled, we will bypass free lists and write data to new blocks. This way, the data will need more pages overall than when loaded in normal mode. This is the kind of trade-off you face when loading speed is important (at the very least Oracle works this way; most probably other vendors do the same). But this approach may not be applicable to in-memory mode, where the total number of pages is limited and we do not want to hit page eviction.

To summarize: some optimizations which are applicable to persistent mode will not be applicable to in-memory mode.

Vladimir.

On Thu, Aug 16, 2018 at 11:41 AM Dmitriy Setrakyan <[hidden email]> wrote:
> Why? I think 3rd party DB will be supported automatically with CacheStore.
> However, do we need to do something different for memory-only vs.
> memory+disk?
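
To make the page trade-off described above concrete, here is a toy model in Java. It is only a sketch under stated assumptions, not Ignite code and not the IEP-22 design: the class name `DirectLoadSketch`, the 4096-byte page size, the first-fit free-list policy, and the row sizes are all hypothetical, chosen only to show why bypassing free lists can leave the same data occupying more pages.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: nothing here mirrors Ignite's real page memory or free lists.
class DirectLoadSketch {
    // Hypothetical page size in bytes; every row is assumed to fit in one page.
    static final int PAGE_SIZE = 4096;

    /**
     * "Normal" mode: each row goes into the first existing page that has
     * enough free space (a simplified first-fit free list); a new page is
     * allocated only when no existing page fits.
     * Returns the total number of pages after the load.
     */
    static int loadViaFreeList(int[] freeBytesInExistingPages, int[] rowSizes) {
        List<Integer> free = new ArrayList<>();
        for (int f : freeBytesInExistingPages) {
            free.add(f);
        }
        for (int row : rowSizes) {
            boolean placed = false;
            for (int i = 0; i < free.size(); i++) {
                if (free.get(i) >= row) {
                    free.set(i, free.get(i) - row);
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                free.add(PAGE_SIZE - row); // allocate a fresh page for this row
            }
        }
        return free.size();
    }

    /**
     * "Direct" mode: existing pages are left untouched and rows are appended
     * to freshly allocated pages only.
     * Returns the total number of pages after the load.
     */
    static int loadDirect(int[] freeBytesInExistingPages, int[] rowSizes) {
        int pages = freeBytesInExistingPages.length;
        int freeInCurrent = 0; // no fresh page has been allocated yet
        for (int row : rowSizes) {
            if (freeInCurrent < row) {
                pages++;
                freeInCurrent = PAGE_SIZE;
            }
            freeInCurrent -= row;
        }
        return pages;
    }

    public static void main(String[] args) {
        int[] existing = {3000, 2500, 2000, 1000}; // free bytes in four half-filled pages
        int[] rows = {1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000};

        System.out.println("free-list mode pages: " + loadViaFreeList(existing, rows));
        System.out.println("direct mode pages:    " + loadDirect(existing, rows));
    }
}
```

In this model the direct path never scans or updates the free-space bookkeeping, which is where the speed-up would come from, but it pays for that with extra pages. With persistence those extra pages can go to disk; with a fixed in-memory page budget they would bring the region closer to page eviction.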