IEP-22: Direct Data Load proposal

Vladimir Ozerov
Igniters,

Initial data load is one of the most important use cases for our product.
This is one of the first things users try to do with Ignite. And if it takes
too much time, it is very likely that the user will look for other solutions.

We have made good progress in this area recently: a set of internal
improvements to our indexes, streaming mode for the JDBC driver, and the COPY
command. But our internals are still not very efficient - every single
update goes through the whole set of Ignite components, such as page cache,
free-lists, BTrees, etc.

I created IEP-22 [1]. Its goal is to implement a special direct data load
mode which will bypass our page cache and use an alternative algorithm for
index updates. Together with the COPY command and streaming, this improvement
will allow Ignite to load data at very high speed.

Please review the IEP and share your comments.

Vladimir.

[1]
https://cwiki.apache.org/confluence/display/IGNITE/IEP-22%3A+Direct+Data+Load

Re: IEP-22: Direct Data Load proposal

Nikolay Izhikov-2
Hello, Vladimir.

Does this IEP fit with IEP-18: TDE?

Do we allow the user to load data into an encrypted cache?

On Wed, 20/06/2018 at 18:08 +0300, Vladimir Ozerov wrote:

> Igniters,
>
> Initial data load is one of the most important use cases for our product.
> This is one of the first things users try to do with Ignite. And if it takes
> too much time, it is very likely that the user will look for other solutions.
>
> We have made good progress in this area recently: a set of internal
> improvements to our indexes, streaming mode for the JDBC driver, and the COPY
> command. But our internals are still not very efficient - every single
> update goes through the whole set of Ignite components, such as page cache,
> free-lists, BTrees, etc.
>
> I created IEP-22 [1]. Its goal is to implement a special direct data load
> mode which will bypass our page cache and use an alternative algorithm for
> index updates. Together with the COPY command and streaming, this improvement
> will allow Ignite to load data at very high speed.
>
> Please review the IEP and share your comments.
>
> Vladimir.
>
> [1]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-22%3A+Direct+Data+Load


Re: IEP-22: Direct Data Load proposal

Andrey Kuznetsov
In reply to this post by Vladimir Ozerov
Vladimir,

Great IEP, but I couldn't comprehend the beginning of the "Direct Data
Load" paragraph. Maybe there are some typos?

On Wed, Jun 20, 2018 at 18:08, Vladimir Ozerov <[hidden email]> wrote:

> Igniters,
>
> Initial data load is one of the most important use cases for our product.
> This is one of the first things users try to do with Ignite. And if it takes
> too much time, it is very likely that the user will look for other solutions.
>

Best regards,
  Andrey Kuznetsov.

Re: IEP-22: Direct Data Load proposal

dsetrakyan
On Wed, Jun 20, 2018 at 8:40 AM, Andrey Kuznetsov <[hidden email]> wrote:

> Vladimir,
>
> Great IEP, but I couldn't comprehend the beginning of the "Direct Data
> Load" paragraph. Maybe there are some typos?
>

I fixed some typos; it is more readable now.

Re: IEP-22: Direct Data Load proposal

Vladimir Ozerov
In reply to this post by Nikolay Izhikov-2
Hi Nikolay,

I do not see any problems with TDE for now.

On Wed, Jun 20, 2018 at 6:16 PM, Nikolay Izhikov <[hidden email]>
wrote:

> Hello, Vladimir.
>
> Does this IEP fit with IEP-18: TDE?
>
> Do we allow the user to load data into an encrypted cache?

Re: IEP-22: Direct Data Load proposal

dmagda
In reply to this post by Vladimir Ozerov
Vladimir,

As I see from the IEP, this data loading technique is supposed to be used
for deployments with Ignite persistence enabled. Is it possible to
generalize this solution and use it for pure in-memory and in-memory + 3rd
party DB scenarios?

--
Denis

On Wed, Jun 20, 2018 at 8:08 AM Vladimir Ozerov <[hidden email]>
wrote:

> Igniters,
>
> Initial data load is one of the most important use cases for our product.
> This is one of the first things users try to do with Ignite. And if it takes
> too much time, it is very likely that the user will look for other solutions.
>
> We have made good progress in this area recently: a set of internal
> improvements to our indexes, streaming mode for the JDBC driver, and the COPY
> command. But our internals are still not very efficient - every single
> update goes through the whole set of Ignite components, such as page cache,
> free-lists, BTrees, etc.
>
> I created IEP-22 [1]. Its goal is to implement a special direct data load
> mode which will bypass our page cache and use an alternative algorithm for
> index updates. Together with the COPY command and streaming, this improvement
> will allow Ignite to load data at very high speed.
>
> Please review the IEP and share your comments.
>
> Vladimir.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-22%3A+Direct+Data+Load
>

Re: IEP-22: Direct Data Load proposal

Vladimir Ozerov
Hi Denis,

This IEP is mostly about how we work with our own indexes and pages. So 3rd
party DB is out of question.

On Thu, Jun 21, 2018 at 10:38 PM Denis Magda <[hidden email]> wrote:

> Vladimir,
>
> As I see from the IEP, this data loading technique is supposed to be used
> for deployments with Ignite persistence enabled. Is it possible to
> generalize this solution and use it for pure in-memory and in-memory + 3rd
> party DB scenarios?
>
> --
> Denis

Re: IEP-22: Direct Data Load proposal

dsetrakyan
On Thu, Aug 16, 2018 at 1:24 AM, Vladimir Ozerov <[hidden email]>
wrote:

> Hi Denis,
>
> This IEP is mostly about how we work with our own indexes and pages. So 3rd
> party DB is out of question.
>

Why? I think 3rd party DB will be supported automatically with CacheStore.
However, do we need to do something different for memory-only vs.
memory+disk?

D.

Re: IEP-22: Direct Data Load proposal

Vladimir Ozerov
Dima,

By "out of question" I meant that 3rd party persistence should work out of
the box when IEP-22 is ready. No changes should be required there.

As for persistence vs. memory: most probably yes, there will be some
differences. Specifically, when data load starts and persistence is
enabled, we will bypass free lists and write data to new blocks. As a
result, the loaded data will occupy more pages than it would in normal mode.
This is the kind of trade-off you face when loading speed is important (at
the very least Oracle works this way; most probably other vendors do the
same). But this approach may not be applicable to in-memory mode, where the
total number of pages is limited and we do not want to hit page eviction.

To summarize - some optimizations which are applicable to persistent mode
will not be applicable to in-memory mode.
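
To make the page-count trade-off concrete, here is a toy model of the two
allocation strategies. It is only a sketch and not Ignite code: the class
name, method names, the 100-byte page size and the row sizes are all
hypothetical, and rows are assumed to be no larger than one page.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of page allocation. Not Ignite code; all names and sizes are hypothetical. */
class PageAllocationSketch {
    /** Hypothetical page capacity in bytes; every row is assumed to fit in one page. */
    static final int PAGE_SIZE = 100;

    /** Normal mode: consult a "free list" and reuse the first page with enough free space. */
    static int pagesAfterNormalLoad(int[] usedBytesPerPage, int[] rowSizes) {
        List<Integer> used = new ArrayList<>();
        for (int u : usedBytesPerPage)
            used.add(u);

        for (int row : rowSizes) {
            int target = -1;
            for (int i = 0; i < used.size(); i++) {
                if (PAGE_SIZE - used.get(i) >= row) {
                    target = i;
                    break;
                }
            }
            if (target < 0) { // no existing page has room: allocate a new one
                used.add(0);
                target = used.size() - 1;
            }
            used.set(target, used.get(target) + row);
        }
        return used.size();
    }

    /** Direct load: skip the free list and append rows only to freshly allocated pages. */
    static int pagesAfterDirectLoad(int[] usedBytesPerPage, int[] rowSizes) {
        int pages = usedBytesPerPage.length;
        int freeInCurrent = 0; // no fresh page has been allocated yet

        for (int row : rowSizes) {
            if (freeInCurrent < row) { // current fresh page is full: take another one
                pages++;
                freeInCurrent = PAGE_SIZE;
            }
            freeInCurrent -= row;
        }
        return pages;
    }

    public static void main(String[] args) {
        int[] existing = {60, 70, 50}; // bytes already used in three existing pages
        int[] rows = {40, 30, 50, 40}; // sizes of the rows being loaded

        System.out.println("normal mode pages: " + pagesAfterNormalLoad(existing, rows));
        System.out.println("direct load pages: " + pagesAfterDirectLoad(existing, rows));
    }
}
```

With these made-up numbers the normal path ends with 4 pages and the direct
path with 5: the direct path never scans for free space, but it leaves the
holes in the existing pages unused.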

Vladimir.

On Thu, Aug 16, 2018 at 11:41 AM Dmitriy Setrakyan <[hidden email]>
wrote:

> On Thu, Aug 16, 2018 at 1:24 AM, Vladimir Ozerov <[hidden email]>
> wrote:
>
> > Hi Denis,
> >
> > This IEP is mostly about how we work with our own indexes and pages. So 3rd
> > party DB is out of question.
> >
>
> Why? I think 3rd party DB will be supported automatically with CacheStore.
> However, do we need to do something different for memory-only vs.
> memory+disk?
>
> D.
>