Cache scan efficiency

Cache scan efficiency

Alexei Scherbakov
Igniters,

My use case involves a scenario where it's necessary to iterate over a
large (many TBs) persistent cache, doing some calculation on the data read.

The basic solution is to iterate over the cache using a ScanQuery.

This turns out to be slow because iterating over the cache involves a lot of
random disk access to read the data pages referenced from leaf pages by
links.

This is especially true when data is stored on disks with slow random
access, like SAS disks. In my case, on a modern SAS disk array, the read
speed was several MB/sec, while the sequential read speed in a perf test
was about a GB/sec.

I was able to fix the issue by using a ScanQuery with an explicit partition
set and running simple warmup code before each partition scan.

The code pins cold pages in memory in sequential order, thus eliminating
random disk access. The speedup was roughly 100x.
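
Roughly, the approach looks like this (a simplified sketch, not the exact
code I used; warmUpPartition() is just a placeholder for the warmup step,
everything else is the public Ignite Java API):

import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;

public class PartitionedScan {
    static void scanAll(Ignite ignite, String cacheName) {
        IgniteCache<Long, byte[]> cache = ignite.cache(cacheName);
        int parts = ignite.affinity(cacheName).partitions();

        for (int p = 0; p < parts; p++) {
            warmUpPartition(cache, p); // pull the partition's pages into memory sequentially

            ScanQuery<Long, byte[]> qry = new ScanQuery<>();
            qry.setPartition(p); // restrict the scan to a single partition

            try (QueryCursor<Cache.Entry<Long, byte[]>> cur = cache.query(qry)) {
                for (Cache.Entry<Long, byte[]> e : cur) {
                    // calculation on the read data goes here
                }
            }
        }
    }

    static void warmUpPartition(IgniteCache<?, ?> cache, int part) {
        // Placeholder for the custom warmup code described above.
    }
}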

I suggest adding this improvement to the product's core by always
sequentially preloading pages for all internal partition iterations (cache
iterators, scan queries, SQL queries with a scan plan) if the partition is
cold (a low number of pinned pages).

This should also speed up rebalancing from cold partitions.

Ignite JIRA ticket [1]

Thoughts?

[1] https://issues.apache.org/jira/browse/IGNITE-8873

--

Best regards,
Alexei Scherbakov

Re: Cache scan efficiency

dsetrakyan
Alexey, this is a great feature. Can you explain what you meant by
"warm-up" when iterating through pages? Do you have this feature already
implemented?

D.

Re: Cache scan efficiency

Vladimir Ozerov
In reply to this post by Alexei Scherbakov
Hi Alex,

It is good that you observed a speedup, but I do not think this solution
works for the product in the general case. The amount of RAM is limited, and
even a single partition may need more space than the RAM available. Moving a
lot of pages into page memory for a scan means evicting a lot of other
pages, which will ultimately lead to bad performance of subsequent queries
and defeat the LRU algorithms, which are of great importance for good
database performance.

Database vendors choose another approach: skip the B-trees, iterate directly
over data pages, read them in a multi-block fashion, and use a separate scan
buffer to avoid excessive eviction of other hot pages. A corresponding
ticket exists for SQL [1], but the idea is common to all parts of the system
that require scans.
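
To illustrate the scan-buffer part (a sketch only, unrelated to Ignite
internals; all names here are made up): scan reads go through a small fixed
pool of reusable buffers instead of the shared page memory, so a full scan
cannot push out hot pages.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayDeque;

class ScanBuffer {
    private final ArrayDeque<ByteBuffer> pool = new ArrayDeque<>();
    private final int pageSize;

    ScanBuffer(int slots, int pageSize) {
        this.pageSize = pageSize;
        for (int i = 0; i < slots; i++)
            pool.add(ByteBuffer.allocateDirect(pageSize));
    }

    /** Reads the page at the given file offset into a recycled buffer. */
    ByteBuffer readPage(FileChannel file, long offset) throws IOException {
        ByteBuffer buf = pool.isEmpty() ? ByteBuffer.allocateDirect(pageSize) : pool.poll();
        buf.clear();
        file.read(buf, offset);
        buf.flip();
        return buf;
    }

    /** Returns the buffer to the pool once the scan has consumed the page. */
    void release(ByteBuffer buf) {
        pool.add(buf);
    }
}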

As for the proposed solution, it might be a good idea to add a special API
to "warm up" a partition, with a clear explanation of the pros (fast scan
after warmup) and cons (slowdown of any other operations). But I think we
should not make this approach part of normal scans.

Vladimir.

[1] https://issues.apache.org/jira/browse/IGNITE-6057

Re: Cache scan efficiency

Dmitriy Pavlov
Hi Alexei,

I did not find any PRs associated with the ticket to check the code changes
behind this idea. Are there any PRs?

If we create some forward scan of pages, it should be a fairly intelligent
algorithm that takes a lot of parameters into account (how much RAM is free,
how likely it is that the next page will be needed, etc.). We had a private
talk about such an idea some time ago.

In my experience, Linux already does such forward reading of file data
(read-ahead for file descriptors flagged for sequential access), but some
prefetching of data at the application level may be useful for O_DIRECT
file descriptors.
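
As an illustration only (the partition file path and chunk size are
arbitrary examples, and it assumes that simply reading the file sequentially
and discarding the bytes is enough to get it cached ahead of the real scan):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SequentialPrefetch {
    /** Reads the whole file sequentially in big chunks so the OS caches it. */
    static void prefetch(Path partitionFile) throws IOException {
        ByteBuffer chunk = ByteBuffer.allocateDirect(8 * 1024 * 1024); // 8 MB chunks

        try (FileChannel ch = FileChannel.open(partitionFile, StandardOpenOption.READ)) {
            while (ch.read(chunk) > 0) // sequential access lets the OS read-ahead kick in
                chunk.clear();         // discard the bytes; we only want them cached
        }
    }
}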

One more concern from me is about selecting the right place in the system
to do such a prefetch.

Sincerely,
Dmitriy Pavlov

Re: Cache scan efficiency

Mmuzaf
Folks,

Such a warm-up can be an effective technique for calculations that require
large cache data reads, but I think it's a single narrow use case among all
Ignite store usages. Like all other powerful techniques, we should use it
wisely. In the general case, I think we should consider the other techniques
mentioned by Vladimir and maybe create something like `global statistics of
cache data usage` to choose the best technique in each case.

For instance, it's not obvious what would take longer: a multi-block read or
50 single-block reads issued sequentially. It strongly depends on the
underlying hardware and might also depend on the workload's system resources
(CPU-intensive calculations and I/O access). But `statistics` would help us
choose the right way.


--
Maxim Muzafarov

Re: Cache scan efficiency

Alexey Goncharuk
I think it would be beneficial for some Ignite users if we added such a
partition warmup method to the public API. The method should be
well-documented and state that it may invalidate the existing page cache. It
will be a very effective instrument until we add the proper scan capability
that Vladimir was referring to.


Re: Cache scan efficiency

dsetrakyan
I would rather fix the scan than hack the scan. Is there any technical
reason for hacking it now instead of fixing it properly? Can some of the
experts in this thread provide an estimate of the complexity and the
difference in work required for each approach?

D.


Re: Cache scan efficiency

Dmitriy Pavlov
As I understood it, this is not a hack; it is an advanced feature for
warming up a partition. We can build a warm-up of the overall cache by
calling the warm-up for each of its partitions. Users often ask about this
feature and are not comfortable with our lazy loading.

Please correct me if I misunderstood the idea.


Re: Cache scan efficiency

Alexey Goncharuk
Dmitriy,

In my understanding, the proper fix for the scan query looks like a big
change, and it is unlikely that we can include it in Ignite 2.7. On the
other hand, the method suggested by Alexei is quite simple and definitely
fits into Ignite 2.7, which will provide a better user experience. Even with
a proper scan query implemented, this method can be useful in some specific
scenarios, so we will not have to deprecate it.

--AG


Re: Cache scan efficiency

Alexei Scherbakov
Summing up, I suggest adding a new public method,
IgniteCache.preloadPartition(partId).

I will start preparing a PR for IGNITE-8873
<https://issues.apache.org/jira/browse/IGNITE-8873> if no more objections
follow.
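
A rough usage sketch, assuming the method lands with the proposed signature
(the cache name and value types below are placeholders):

import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;

class PreloadThenScan {
    static void run(Ignite ignite) {
        IgniteCache<Long, byte[]> cache = ignite.cache("myCache"); // placeholder cache name

        for (int p = 0; p < ignite.affinity("myCache").partitions(); p++) {
            cache.preloadPartition(p); // proposed method: sequentially preload one partition

            try (QueryCursor<Cache.Entry<Long, byte[]>> cur =
                     cache.query(new ScanQuery<Long, byte[]>().setPartition(p))) {
                for (Cache.Entry<Long, byte[]> e : cur) {
                    // calculation on the read data
                }
            }
        }
    }
}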



--

Best regards,
Alexei Scherbakov

Re: Cache scan efficiency

dsetrakyan
Alexey, let's make sure we document this feature very well in the Javadoc,
as well as in the public readme.io documentation. Also, all cache iterator
methods and SCAN queries should be documented to suggest when partitions
should be preloaded to achieve better performance.

D.

Re: Cache scan efficiency

dmagda
In reply to this post by Alexei Scherbakov
Folks,

Since we're adding a method that preloads a certain partition, can we add
one that preloads the whole cache? The Ignite persistence users I've been
working with look puzzled once they realize there is no way to warm up RAM
after a restart. There are use cases that require this.

Can the current optimizations be expanded to the cache preloading use case?

--
Denis

On Tue, Sep 18, 2018 at 3:58 AM Alexei Scherbakov <
[hidden email]> wrote:

> Summing up, I suggest adding new public
> method IgniteCache.preloadPartition(partId).
>
> I will start preparing PR for IGNITE-8873
> <https://issues.apache.org/jira/browse/IGNITE-8873> if no more objections
> follow.
>
>
>
> вт, 18 сент. 2018 г. в 10:50, Alexey Goncharuk <[hidden email]
> >:
>
> > Dmitriy,
> >
> > In my understanding, the proper fix for the scan query looks like a big
> > change and it is unlikely that we include it in Ignite 2.7. On the other
> > hand, the method suggested by Alexei is quite simple  and it definitely
> > fits Ignite 2.7, which will provide a better user experience. Even
> having a
> > proper scan query implemented this method can be useful in some specific
> > scenarios, so we will not have to deprecate it.
> >
> > --AG
> >
> > пн, 17 сент. 2018 г. в 19:15, Dmitriy Pavlov <[hidden email]>:
> >
> > > As I understood it is not a hack, it is an advanced feature for warming
> > up
> > > the partition. We can build warm-up of the overall cache by calling its
> > > partitions warm-up. Users often ask about this feature and are not
> > > confident with our lazy upload.
> > >
> > > Please correct me if I misunderstood the idea.
> > >
> > > пн, 17 сент. 2018 г. в 18:37, Dmitriy Setrakyan <[hidden email]
> >:
> > >
> > > > I would rather fix the scan than hack the scan. Is there any
> technical
> > > > reason for hacking it now instead of fixing it properly? Can some of
> > the
> > > > experts in this thread provide an estimate of complexity and
> difference
> > > in
> > > > work that would be required for each approach?
> > > >
> > > > D.
> > > >
> > > > On Mon, Sep 17, 2018 at 4:42 PM Alexey Goncharuk <
> > > > [hidden email]>
> > > > wrote:
> > > >
> > > > > I think it would be beneficial for some Ignite users if we added
> > such a
> > > > > partition warmup method to the public API. The method should be
> > > > > well-documented and state that it may invalidate existing page
> cache.
> > > It
> > > > > will be a very effective instrument until we add the proper scan
> > > ability
> > > > > that Vladimir was referring to.
> > > > >
> > > > > пн, 17 сент. 2018 г. в 13:05, Maxim Muzafarov <[hidden email]
> >:
> > > > >
> > > > > > Folks,
> > > > > >
> > > > > > Such warming up can be an effective technique for performing
> > > > calculations
> > > > > > which required large cache
> > > > > > data reads, but I think it's the single narrow use case of all
> over
> > > > > Ignite
> > > > > > store usages. Like all other
> > > > > > powerfull techniques, we should use it wisely. In the general
> > case, I
> > > > > think
> > > > > > we should consider other
> > > > > > techniques mentioned by Vladimir and may create something like
> > > `global
> > > > > > statistics of cache data usage`
> > > > > > to choose the best technique in each case.
> > > > > >
> > > > > > For instance, it's not obvious what would take longer:
> multi-block
> > > > reads
> > > > > or
> > > > > > 50 single-block reads issues
> > > > > > sequentially. It strongly depends on used hardware under the hood
> > and
> > > > > might
> > > > > > depend on workload system
> > > > > > resources (CPU-intensive calculations and I\O access) as well.
> But
> > > > > > `statistics` will help us to choose
> > > > > > the right way.
> > > > > >
> > > > > >
> > > > > > On Sun, 16 Sep 2018 at 23:59 Dmitriy Pavlov <
> [hidden email]
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Hi Alexei,
> > > > > > >
> > > > > > > I did not find any PRs associated with the ticket for check
> code
> > > > > changes
> > > > > > > behind this idea. Are there any PRs?
> > > > > > >
> > > > > > > If we create some forwards scan of pages, it should be a very
> > > > > > intellectual
> > > > > > > algorithm including a lot of parameters (how much RAM is free,
> > how
> > > > > > probably
> > > > > > > we will need next page, etc). We had the private talk about
> such
> > > idea
> > > > > > some
> > > > > > > time ago.
> > > > > > >
> > > > > > > By my experience, Linux systems already do such forward reading
> > of
> > > > file
> > > > > > > data (for corresponding sequential flagged file descriptors),
> but
> > > > some
> > > > > > > prefetching of data at the level of application may be useful
> for
> > > > > > O_DIRECT
> > > > > > > file descriptors.
> > > > > > >
> > > > > > > And one more concern from me is about selecting a right place
> in
> > > the
> > > > > > system
> > > > > > > to do such prefetch.
> > > > > > >
> > > > > > > Sincerely,
> > > > > > > Dmitriy Pavlov
> > > > > > >
> > > > > > > вс, 16 сент. 2018 г. в 19:54, Vladimir Ozerov <
> > > [hidden email]
> > > > >:
> > > > > > >
> > > > > > > > HI Alex,
> > > > > > > >
> > > > > > > > This is good that you observed speedup. But I do not think
> this
> > > > > > solution
> > > > > > > > works for the product in general case. Amount of RAM is
> > limited,
> > > > and
> > > > > > > even a
> > > > > > > > single partition may need more space than RAM available.
> > Moving a
> > > > lot
> > > > > > of
> > > > > > > > pages to page memory for scan means that you evict a lot of
> > other
> > > > > > pages,
> > > > > > > > what will ultimately lead to bad performance of subsequent
> > > queries
> > > > > and
> > > > > > > > defeat LRU algorithms, which are of great improtance for good
> > > > > database
> > > > > > > > performance.
> > > > > > > >
> > > > > > > > Database vendors choose another approach - skip BTrees,
> iterate
> > > > > > direclty
> > > > > > > > over data pages, read them in multi-block fashion, use
> separate
> > > > scan
> > > > > > > buffer
> > > > > > > > to avoid excessive evictions of other hot pages.
> Corresponding
> > > > ticket
> > > > > > for
> > > > > > > > SQL exists [1], but idea is common for all parts of the
> system,
> > > > > > requiring
> > > > > > > > scans.
> > > > > > > >
> > > > > > > > As far as proposed solution, it might be good idea to add
> > special
> > > > API
> > > > > > to
> > > > > > > > "warmup" partition with clear explanation of pros (fast scan
> > > after
> > > > > > > warmup)
> > > > > > > > and cons (slowdown of any other operations). But I think we
> > > should
> > > > > not
> > > > > > > make
> > > > > > > > this approach part of normal scans.
> > > > > > > >
> > > > > > > > Vladimir.
> > > > > > > >
> > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-6057
> > > > > > > >
> > > > > > > >
> > > > > > > > On Sun, Sep 16, 2018 at 6:44 PM Alexei Scherbakov <
> > > > > > > > [hidden email]> wrote:
> > > > > > > >
> > > > > > > > > Igniters,
> > > > > > > > >
> > > > > > > > > My use case involves scenario where it's necessary to
> iterate
> > > > over
> > > > > > > > > large(many TBs) persistent cache doing some calculation on
> > read
> > > > > data.
> > > > > > > > >
> > > > > > > > > The basic solution is to iterate cache using ScanQuery.
> > > > > > > > >
> > > > > > > > > This turns out to be slow because iteration over cache
> > > involves a
> > > > > lot
> > > > > > > of
> > > > > > > > > random disk access for reading data pages referenced from
> > leaf
> > > > > pages
> > > > > > by
> > > > > > > > > links.
> > > > > > > > >
> > > > > > > > > This is especially true when data is stored on disks with
> > slow
> > > > > random
> > > > > > > > > access, like SAS disks. In my case on modern SAS disks
> array
> > > > > reading
> > > > > > > > speed
> > > > > > > > > was like several MB/sec while sequential read speed in perf
> > > test
> > > > > was
> > > > > > > > about
> > > > > > > > > GB/sec.
> > > > > > > > >
> > > > > > > > > I was able to fix the issue by using ScanQuery with
> explicit
> > > > > > partition
> > > > > > > > set
> > > > > > > > > and running simple warmup code before each partition scan.
> > > > > > > > >
> > > > > > > > > The code pins cold pages in memory in sequential order thus
> > > > > > eliminating
> > > > > > > > > random disk access. Speedup was like x100 magnitude.
> > > > > > > > >
> > > > > > > > > I suggest adding the improvement to the product's core  by
> > > always
> > > > > > > > > sequentially preloading pages for all internal partition
> > > > iterations
> > > > > > > > (cache
> > > > > > > > > iterators, scan queries, sql queries with scan plan) if
> > > partition
> > > > > is
> > > > > > > cold
> > > > > > > > > (low number of pinned pages).
> > > > > > > > >
> > > > > > > > > This also should speed up rebalancing from cold partitions.
> > > > > > > > >
> > > > > > > > > Ignite JIRA ticket [1]
> > > > > > > > >
> > > > > > > > > Thoughts ?
> > > > > > > > >
> > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-8873
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > > Alexei Scherbakov
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > --
> > > > > > --
> > > > > > Maxim Muzafarov
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>

Re: Cache scan efficiency

Dmitriy Pavlov
Hi,

I totally support the idea of cache preloading.

IMO it can be expanded: we can iterate over the local partitions of the cache
group and preload each of them.
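
Roughly, a minimal sketch of such a whole-cache warm-up, assuming the proposed
IgniteCache.preloadPartition(int) method is available (class and method names
here are illustrative only):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.affinity.Affinity;

public class CacheWarmup {
    /** Preloads every partition of the cache that the local node owns. */
    public static void warmupLocalPartitions(Ignite ignite, String cacheName) {
        IgniteCache<Object, Object> cache = ignite.cache(cacheName);
        Affinity<Object> aff = ignite.affinity(cacheName);

        // Partitions for which the local node is either primary or backup.
        for (int part : aff.allPartitions(ignite.cluster().localNode())) {
            // Sequentially loads the partition's pages into page memory,
            // so the subsequent scan avoids random disk access.
            cache.preloadPartition(part);
        }
    }
}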

But such methods should be clearly documented so that a user is aware of their
benefits and prerequisites (e.g. that the RAM region is big enough, etc.).

Sincerely,
Dmitriy Pavlov

вт, 18 сент. 2018 г. в 21:36, Denis Magda <[hidden email]>:

> Folks,
>
> Since we're adding a method that would preload a certain partition, can we
> add the one which will preload the whole cache? Ignite persistence users
> I've been working with look puzzled once they realize there is no way to
> warm up RAM after the restart. There are use cases that require this.
>
> Can the current optimizations be expanded to the cache preloading use case?
>
> --
> Denis
>
> On Tue, Sep 18, 2018 at 3:58 AM Alexei Scherbakov <
> [hidden email]> wrote:
>
> > Summing up, I suggest adding new public
> > method IgniteCache.preloadPartition(partId).
> >
> > I will start preparing PR for IGNITE-8873
> > <https://issues.apache.org/jira/browse/IGNITE-8873> if no more
> objections
> > follow.
> >
> >
> >
> > вт, 18 сент. 2018 г. в 10:50, Alexey Goncharuk <
> [hidden email]
> > >:
> >
> > > Dmitriy,
> > >
> > > In my understanding, the proper fix for the scan query looks like a big
> > > change and it is unlikely that we include it in Ignite 2.7. On the
> other
> > > hand, the method suggested by Alexei is quite simple  and it definitely
> > > fits Ignite 2.7, which will provide a better user experience. Even
> > having a
> > > proper scan query implemented this method can be useful in some
> specific
> > > scenarios, so we will not have to deprecate it.
> > >
> > > --AG
> > >
> > > пн, 17 сент. 2018 г. в 19:15, Dmitriy Pavlov <[hidden email]>:
> > >
> > > > As I understood it is not a hack, it is an advanced feature for
> warming
> > > up
> > > > the partition. We can build warm-up of the overall cache by calling
> its
> > > > partitions warm-up. Users often ask about this feature and are not
> > > > confident with our lazy upload.
> > > >
> > > > Please correct me if I misunderstood the idea.
> > > >
> > > > пн, 17 сент. 2018 г. в 18:37, Dmitriy Setrakyan <
> [hidden email]
> > >:
> > > >
> > > > > I would rather fix the scan than hack the scan. Is there any
> > technical
> > > > > reason for hacking it now instead of fixing it properly? Can some
> of
> > > the
> > > > > experts in this thread provide an estimate of complexity and
> > difference
> > > > in
> > > > > work that would be required for each approach?
> > > > >
> > > > > D.
> > > > >
> > > > > On Mon, Sep 17, 2018 at 4:42 PM Alexey Goncharuk <
> > > > > [hidden email]>
> > > > > wrote:
> > > > >
> > > > > > I think it would be beneficial for some Ignite users if we added
> > > such a
> > > > > > partition warmup method to the public API. The method should be
> > > > > > well-documented and state that it may invalidate existing page
> > cache.
> > > > It
> > > > > > will be a very effective instrument until we add the proper scan
> > > > ability
> > > > > > that Vladimir was referring to.
> > > > > >
> > > > > > пн, 17 сент. 2018 г. в 13:05, Maxim Muzafarov <
> [hidden email]
> > >:
> > > > > >
> > > > > > > Folks,
> > > > > > >
> > > > > > > Such warming up can be an effective technique for performing
> > > > > calculations
> > > > > > > which required large cache
> > > > > > > data reads, but I think it's the single narrow use case of all
> > over
> > > > > > Ignite
> > > > > > > store usages. Like all other
> > > > > > > powerfull techniques, we should use it wisely. In the general
> > > case, I
> > > > > > think
> > > > > > > we should consider other
> > > > > > > techniques mentioned by Vladimir and may create something like
> > > > `global
> > > > > > > statistics of cache data usage`
> > > > > > > to choose the best technique in each case.
> > > > > > >
> > > > > > > For instance, it's not obvious what would take longer:
> > multi-block
> > > > > reads
> > > > > > or
> > > > > > > 50 single-block reads issues
> > > > > > > sequentially. It strongly depends on used hardware under the
> hood
> > > and
> > > > > > might
> > > > > > > depend on workload system
> > > > > > > resources (CPU-intensive calculations and I\O access) as well.
> > But
> > > > > > > `statistics` will help us to choose
> > > > > > > the right way.
> > > > > > >
> > > > > > >
> > > > > > > On Sun, 16 Sep 2018 at 23:59 Dmitriy Pavlov <
> > [hidden email]
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Alexei,
> > > > > > > >
> > > > > > > > I did not find any PRs associated with the ticket for check
> > code
> > > > > > changes
> > > > > > > > behind this idea. Are there any PRs?
> > > > > > > >
> > > > > > > > If we create some forwards scan of pages, it should be a very
> > > > > > > intellectual
> > > > > > > > algorithm including a lot of parameters (how much RAM is
> free,
> > > how
> > > > > > > probably
> > > > > > > > we will need next page, etc). We had the private talk about
> > such
> > > > idea
> > > > > > > some
> > > > > > > > time ago.
> > > > > > > >
> > > > > > > > By my experience, Linux systems already do such forward
> reading
> > > of
> > > > > file
> > > > > > > > data (for corresponding sequential flagged file descriptors),
> > but
> > > > > some
> > > > > > > > prefetching of data at the level of application may be useful
> > for
> > > > > > > O_DIRECT
> > > > > > > > file descriptors.
> > > > > > > >
> > > > > > > > And one more concern from me is about selecting a right place
> > in
> > > > the
> > > > > > > system
> > > > > > > > to do such prefetch.
> > > > > > > >
> > > > > > > > Sincerely,
> > > > > > > > Dmitriy Pavlov
> > > > > > > >
> > > > > > > > вс, 16 сент. 2018 г. в 19:54, Vladimir Ozerov <
> > > > [hidden email]
> > > > > >:
> > > > > > > >
> > > > > > > > > HI Alex,
> > > > > > > > >
> > > > > > > > > This is good that you observed speedup. But I do not think
> > this
> > > > > > > solution
> > > > > > > > > works for the product in general case. Amount of RAM is
> > > limited,
> > > > > and
> > > > > > > > even a
> > > > > > > > > single partition may need more space than RAM available.
> > > Moving a
> > > > > lot
> > > > > > > of
> > > > > > > > > pages to page memory for scan means that you evict a lot of
> > > other
> > > > > > > pages,
> > > > > > > > > what will ultimately lead to bad performance of subsequent
> > > > queries
> > > > > > and
> > > > > > > > > defeat LRU algorithms, which are of great improtance for
> good
> > > > > > database
> > > > > > > > > performance.
> > > > > > > > >
> > > > > > > > > Database vendors choose another approach - skip BTrees,
> > iterate
> > > > > > > direclty
> > > > > > > > > over data pages, read them in multi-block fashion, use
> > separate
> > > > > scan
> > > > > > > > buffer
> > > > > > > > > to avoid excessive evictions of other hot pages.
> > Corresponding
> > > > > ticket
> > > > > > > for
> > > > > > > > > SQL exists [1], but idea is common for all parts of the
> > system,
> > > > > > > requiring
> > > > > > > > > scans.
> > > > > > > > >
> > > > > > > > > As far as proposed solution, it might be good idea to add
> > > special
> > > > > API
> > > > > > > to
> > > > > > > > > "warmup" partition with clear explanation of pros (fast
> scan
> > > > after
> > > > > > > > warmup)
> > > > > > > > > and cons (slowdown of any other operations). But I think we
> > > > should
> > > > > > not
> > > > > > > > make
> > > > > > > > > this approach part of normal scans.
> > > > > > > > >
> > > > > > > > > Vladimir.
> > > > > > > > >
> > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-6057
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Sun, Sep 16, 2018 at 6:44 PM Alexei Scherbakov <
> > > > > > > > > [hidden email]> wrote:
> > > > > > > > >
> > > > > > > > > > Igniters,
> > > > > > > > > >
> > > > > > > > > > My use case involves scenario where it's necessary to
> > iterate
> > > > > over
> > > > > > > > > > large(many TBs) persistent cache doing some calculation
> on
> > > read
> > > > > > data.
> > > > > > > > > >
> > > > > > > > > > The basic solution is to iterate cache using ScanQuery.
> > > > > > > > > >
> > > > > > > > > > This turns out to be slow because iteration over cache
> > > > involves a
> > > > > > lot
> > > > > > > > of
> > > > > > > > > > random disk access for reading data pages referenced from
> > > leaf
> > > > > > pages
> > > > > > > by
> > > > > > > > > > links.
> > > > > > > > > >
> > > > > > > > > > This is especially true when data is stored on disks with
> > > slow
> > > > > > random
> > > > > > > > > > access, like SAS disks. In my case on modern SAS disks
> > array
> > > > > > reading
> > > > > > > > > speed
> > > > > > > > > > was like several MB/sec while sequential read speed in
> perf
> > > > test
> > > > > > was
> > > > > > > > > about
> > > > > > > > > > GB/sec.
> > > > > > > > > >
> > > > > > > > > > I was able to fix the issue by using ScanQuery with
> > explicit
> > > > > > > partition
> > > > > > > > > set
> > > > > > > > > > and running simple warmup code before each partition
> scan.
> > > > > > > > > >
> > > > > > > > > > The code pins cold pages in memory in sequential order
> thus
> > > > > > > eliminating
> > > > > > > > > > random disk access. Speedup was like x100 magnitude.
> > > > > > > > > >
> > > > > > > > > > I suggest adding the improvement to the product's core
> by
> > > > always
> > > > > > > > > > sequentially preloading pages for all internal partition
> > > > > iterations
> > > > > > > > > (cache
> > > > > > > > > > iterators, scan queries, sql queries with scan plan) if
> > > > partition
> > > > > > is
> > > > > > > > cold
> > > > > > > > > > (low number of pinned pages).
> > > > > > > > > >
> > > > > > > > > > This also should speed up rebalancing from cold
> partitions.
> > > > > > > > > >
> > > > > > > > > > Ignite JIRA ticket [1]
> > > > > > > > > >
> > > > > > > > > > Thoughts ?
> > > > > > > > > >
> > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-8873
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > > Alexei Scherbakov
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > --
> > > > > > > --
> > > > > > > Maxim Muzafarov
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>

Re: Cache scan efficiency

dmagda
Agreed, it's just a matter of documentation. If a user stores 100% of the data
both in RAM and on disk and just wants to warm RAM up after a restart, then he
knows everything will fit there. If we detect during the preloading that RAM is
exhausted, we can halt it and print out a warning.
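
A rough sketch of such a check, assuming the public data region metrics API
(the 90% threshold and class/method names are just examples; data region
metrics may need to be enabled via DataRegionConfiguration.setMetricsEnabled(true)):

import org.apache.ignite.DataRegionMetrics;
import org.apache.ignite.Ignite;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;

public class WarmupGuard {
    /** Returns true if the default data region is close to its configured max size. */
    public static boolean defaultRegionNearlyFull(Ignite ignite) {
        DataStorageConfiguration storageCfg =
            ignite.configuration().getDataStorageConfiguration();

        DataRegionConfiguration dfltRegion = storageCfg.getDefaultDataRegionConfiguration();

        // Fall back to the default page size if it is not explicitly configured.
        int pageSize = storageCfg.getPageSize() > 0
            ? storageCfg.getPageSize()
            : DataStorageConfiguration.DFLT_PAGE_SIZE;

        for (DataRegionMetrics m : ignite.dataRegionMetrics()) {
            if (m.getName().equals(dfltRegion.getName())) {
                long allocatedBytes = m.getTotalAllocatedPages() * pageSize;

                // 90% is an arbitrary example threshold for "nearly full".
                return allocatedBytes > 0.9 * dfltRegion.getMaxSize();
            }
        }

        return false;
    }
}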

--
Denis

On Tue, Sep 18, 2018 at 2:10 PM Dmitriy Pavlov <[hidden email]>
wrote:

> Hi,
>
> I totally support the idea of cache preload.
>
> IMO it can be expanded. We can iterate over local partitions of the cache
> group and preload each.
>
> But it should be really clear documented methods so a user can be aware of
> the benefits of such method (e.g. if RAM region is big enough, etc).
>
> Sincerely,
> Dmitriy Pavlov
>
> вт, 18 сент. 2018 г. в 21:36, Denis Magda <[hidden email]>:
>
> > Folks,
> >
> > Since we're adding a method that would preload a certain partition, can
> we
> > add the one which will preload the whole cache? Ignite persistence users
> > I've been working with look puzzled once they realize there is no way to
> > warm up RAM after the restart. There are use cases that require this.
> >
> > Can the current optimizations be expanded to the cache preloading use
> case?
> >
> > --
> > Denis
> >
> > On Tue, Sep 18, 2018 at 3:58 AM Alexei Scherbakov <
> > [hidden email]> wrote:
> >
> > > Summing up, I suggest adding new public
> > > method IgniteCache.preloadPartition(partId).
> > >
> > > I will start preparing PR for IGNITE-8873
> > > <https://issues.apache.org/jira/browse/IGNITE-8873> if no more
> > objections
> > > follow.
> > >
> > >
> > >
> > > вт, 18 сент. 2018 г. в 10:50, Alexey Goncharuk <
> > [hidden email]
> > > >:
> > >
> > > > Dmitriy,
> > > >
> > > > In my understanding, the proper fix for the scan query looks like a
> big
> > > > change and it is unlikely that we include it in Ignite 2.7. On the
> > other
> > > > hand, the method suggested by Alexei is quite simple  and it
> definitely
> > > > fits Ignite 2.7, which will provide a better user experience. Even
> > > having a
> > > > proper scan query implemented this method can be useful in some
> > specific
> > > > scenarios, so we will not have to deprecate it.
> > > >
> > > > --AG
> > > >
> > > > пн, 17 сент. 2018 г. в 19:15, Dmitriy Pavlov <[hidden email]
> >:
> > > >
> > > > > As I understood it is not a hack, it is an advanced feature for
> > warming
> > > > up
> > > > > the partition. We can build warm-up of the overall cache by calling
> > its
> > > > > partitions warm-up. Users often ask about this feature and are not
> > > > > confident with our lazy upload.
> > > > >
> > > > > Please correct me if I misunderstood the idea.
> > > > >
> > > > > пн, 17 сент. 2018 г. в 18:37, Dmitriy Setrakyan <
> > [hidden email]
> > > >:
> > > > >
> > > > > > I would rather fix the scan than hack the scan. Is there any
> > > technical
> > > > > > reason for hacking it now instead of fixing it properly? Can some
> > of
> > > > the
> > > > > > experts in this thread provide an estimate of complexity and
> > > difference
> > > > > in
> > > > > > work that would be required for each approach?
> > > > > >
> > > > > > D.
> > > > > >
> > > > > > On Mon, Sep 17, 2018 at 4:42 PM Alexey Goncharuk <
> > > > > > [hidden email]>
> > > > > > wrote:
> > > > > >
> > > > > > > I think it would be beneficial for some Ignite users if we
> added
> > > > such a
> > > > > > > partition warmup method to the public API. The method should be
> > > > > > > well-documented and state that it may invalidate existing page
> > > cache.
> > > > > It
> > > > > > > will be a very effective instrument until we add the proper
> scan
> > > > > ability
> > > > > > > that Vladimir was referring to.
> > > > > > >
> > > > > > > пн, 17 сент. 2018 г. в 13:05, Maxim Muzafarov <
> > [hidden email]
> > > >:
> > > > > > >
> > > > > > > > Folks,
> > > > > > > >
> > > > > > > > Such warming up can be an effective technique for performing
> > > > > > calculations
> > > > > > > > which required large cache
> > > > > > > > data reads, but I think it's the single narrow use case of
> all
> > > over
> > > > > > > Ignite
> > > > > > > > store usages. Like all other
> > > > > > > > powerfull techniques, we should use it wisely. In the general
> > > > case, I
> > > > > > > think
> > > > > > > > we should consider other
> > > > > > > > techniques mentioned by Vladimir and may create something
> like
> > > > > `global
> > > > > > > > statistics of cache data usage`
> > > > > > > > to choose the best technique in each case.
> > > > > > > >
> > > > > > > > For instance, it's not obvious what would take longer:
> > > multi-block
> > > > > > reads
> > > > > > > or
> > > > > > > > 50 single-block reads issues
> > > > > > > > sequentially. It strongly depends on used hardware under the
> > hood
> > > > and
> > > > > > > might
> > > > > > > > depend on workload system
> > > > > > > > resources (CPU-intensive calculations and I\O access) as
> well.
> > > But
> > > > > > > > `statistics` will help us to choose
> > > > > > > > the right way.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Sun, 16 Sep 2018 at 23:59 Dmitriy Pavlov <
> > > [hidden email]
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Alexei,
> > > > > > > > >
> > > > > > > > > I did not find any PRs associated with the ticket for check
> > > code
> > > > > > > changes
> > > > > > > > > behind this idea. Are there any PRs?
> > > > > > > > >
> > > > > > > > > If we create some forwards scan of pages, it should be a
> very
> > > > > > > > intellectual
> > > > > > > > > algorithm including a lot of parameters (how much RAM is
> > free,
> > > > how
> > > > > > > > probably
> > > > > > > > > we will need next page, etc). We had the private talk about
> > > such
> > > > > idea
> > > > > > > > some
> > > > > > > > > time ago.
> > > > > > > > >
> > > > > > > > > By my experience, Linux systems already do such forward
> > reading
> > > > of
> > > > > > file
> > > > > > > > > data (for corresponding sequential flagged file
> descriptors),
> > > but
> > > > > > some
> > > > > > > > > prefetching of data at the level of application may be
> useful
> > > for
> > > > > > > > O_DIRECT
> > > > > > > > > file descriptors.
> > > > > > > > >
> > > > > > > > > And one more concern from me is about selecting a right
> place
> > > in
> > > > > the
> > > > > > > > system
> > > > > > > > > to do such prefetch.
> > > > > > > > >
> > > > > > > > > Sincerely,
> > > > > > > > > Dmitriy Pavlov
> > > > > > > > >
> > > > > > > > > вс, 16 сент. 2018 г. в 19:54, Vladimir Ozerov <
> > > > > [hidden email]
> > > > > > >:
> > > > > > > > >
> > > > > > > > > > HI Alex,
> > > > > > > > > >
> > > > > > > > > > This is good that you observed speedup. But I do not
> think
> > > this
> > > > > > > > solution
> > > > > > > > > > works for the product in general case. Amount of RAM is
> > > > limited,
> > > > > > and
> > > > > > > > > even a
> > > > > > > > > > single partition may need more space than RAM available.
> > > > Moving a
> > > > > > lot
> > > > > > > > of
> > > > > > > > > > pages to page memory for scan means that you evict a lot
> of
> > > > other
> > > > > > > > pages,
> > > > > > > > > > what will ultimately lead to bad performance of
> subsequent
> > > > > queries
> > > > > > > and
> > > > > > > > > > defeat LRU algorithms, which are of great improtance for
> > good
> > > > > > > database
> > > > > > > > > > performance.
> > > > > > > > > >
> > > > > > > > > > Database vendors choose another approach - skip BTrees,
> > > iterate
> > > > > > > > direclty
> > > > > > > > > > over data pages, read them in multi-block fashion, use
> > > separate
> > > > > > scan
> > > > > > > > > buffer
> > > > > > > > > > to avoid excessive evictions of other hot pages.
> > > Corresponding
> > > > > > ticket
> > > > > > > > for
> > > > > > > > > > SQL exists [1], but idea is common for all parts of the
> > > system,
> > > > > > > > requiring
> > > > > > > > > > scans.
> > > > > > > > > >
> > > > > > > > > > As far as proposed solution, it might be good idea to add
> > > > special
> > > > > > API
> > > > > > > > to
> > > > > > > > > > "warmup" partition with clear explanation of pros (fast
> > scan
> > > > > after
> > > > > > > > > warmup)
> > > > > > > > > > and cons (slowdown of any other operations). But I think
> we
> > > > > should
> > > > > > > not
> > > > > > > > > make
> > > > > > > > > > this approach part of normal scans.
> > > > > > > > > >
> > > > > > > > > > Vladimir.
> > > > > > > > > >
> > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-6057
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Sun, Sep 16, 2018 at 6:44 PM Alexei Scherbakov <
> > > > > > > > > > [hidden email]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Igniters,
> > > > > > > > > > >
> > > > > > > > > > > My use case involves scenario where it's necessary to
> > > iterate
> > > > > > over
> > > > > > > > > > > large(many TBs) persistent cache doing some calculation
> > on
> > > > read
> > > > > > > data.
> > > > > > > > > > >
> > > > > > > > > > > The basic solution is to iterate cache using ScanQuery.
> > > > > > > > > > >
> > > > > > > > > > > This turns out to be slow because iteration over cache
> > > > > involves a
> > > > > > > lot
> > > > > > > > > of
> > > > > > > > > > > random disk access for reading data pages referenced
> from
> > > > leaf
> > > > > > > pages
> > > > > > > > by
> > > > > > > > > > > links.
> > > > > > > > > > >
> > > > > > > > > > > This is especially true when data is stored on disks
> with
> > > > slow
> > > > > > > random
> > > > > > > > > > > access, like SAS disks. In my case on modern SAS disks
> > > array
> > > > > > > reading
> > > > > > > > > > speed
> > > > > > > > > > > was like several MB/sec while sequential read speed in
> > perf
> > > > > test
> > > > > > > was
> > > > > > > > > > about
> > > > > > > > > > > GB/sec.
> > > > > > > > > > >
> > > > > > > > > > > I was able to fix the issue by using ScanQuery with
> > > explicit
> > > > > > > > partition
> > > > > > > > > > set
> > > > > > > > > > > and running simple warmup code before each partition
> > scan.
> > > > > > > > > > >
> > > > > > > > > > > The code pins cold pages in memory in sequential order
> > thus
> > > > > > > > eliminating
> > > > > > > > > > > random disk access. Speedup was like x100 magnitude.
> > > > > > > > > > >
> > > > > > > > > > > I suggest adding the improvement to the product's core
> > by
> > > > > always
> > > > > > > > > > > sequentially preloading pages for all internal
> partition
> > > > > > iterations
> > > > > > > > > > (cache
> > > > > > > > > > > iterators, scan queries, sql queries with scan plan) if
> > > > > partition
> > > > > > > is
> > > > > > > > > cold
> > > > > > > > > > > (low number of pinned pages).
> > > > > > > > > > >
> > > > > > > > > > > This also should speed up rebalancing from cold
> > partitions.
> > > > > > > > > > >
> > > > > > > > > > > Ignite JIRA ticket [1]
> > > > > > > > > > >
> > > > > > > > > > > Thoughts ?
> > > > > > > > > > >
> > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-8873
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > >
> > > > > > > > > > > Best regards,
> > > > > > > > > > > Alexei Scherbakov
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > --
> > > > > > > > --
> > > > > > > > Maxim Muzafarov
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Best regards,
> > > Alexei Scherbakov
> > >
> >
>

Re: Cache scan efficiency

Dmitriy Pavlov
Even better: if RAM is exhausted, the page replacement process will be started.
https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood#IgniteDurableMemory-underthehood-Pagereplacement(rotationwithdisk)

The effect of the preloading will still be noticeable, but not as pronounced as
when the data fully fits into RAM. Later I can review or improve the javadocs if
necessary.

ср, 19 сент. 2018 г. в 0:18, Denis Magda <[hidden email]>:

> Agree, it's just a matter of the documentation. If a user stores 100% in
> RAM and on disk, and just wants to warm RAM up after a restart then he
> knows everything will fit there. If during the preloading we detect that
> the RAM is exhausted we can halt it and print out a warning.
>
> --
> Denis
>
> On Tue, Sep 18, 2018 at 2:10 PM Dmitriy Pavlov <[hidden email]>
> wrote:
>
> > Hi,
> >
> > I totally support the idea of cache preload.
> >
> > IMO it can be expanded. We can iterate over local partitions of the cache
> > group and preload each.
> >
> > But it should be really clear documented methods so a user can be aware
> of
> > the benefits of such method (e.g. if RAM region is big enough, etc).
> >
> > Sincerely,
> > Dmitriy Pavlov
> >
> > вт, 18 сент. 2018 г. в 21:36, Denis Magda <[hidden email]>:
> >
> > > Folks,
> > >
> > > Since we're adding a method that would preload a certain partition, can
> > we
> > > add the one which will preload the whole cache? Ignite persistence
> users
> > > I've been working with look puzzled once they realize there is no way
> to
> > > warm up RAM after the restart. There are use cases that require this.
> > >
> > > Can the current optimizations be expanded to the cache preloading use
> > case?
> > >
> > > --
> > > Denis
> > >
> > > On Tue, Sep 18, 2018 at 3:58 AM Alexei Scherbakov <
> > > [hidden email]> wrote:
> > >
> > > > Summing up, I suggest adding new public
> > > > method IgniteCache.preloadPartition(partId).
> > > >
> > > > I will start preparing PR for IGNITE-8873
> > > > <https://issues.apache.org/jira/browse/IGNITE-8873> if no more
> > > objections
> > > > follow.
> > > >
> > > >
> > > >
> > > > вт, 18 сент. 2018 г. в 10:50, Alexey Goncharuk <
> > > [hidden email]
> > > > >:
> > > >
> > > > > Dmitriy,
> > > > >
> > > > > In my understanding, the proper fix for the scan query looks like a
> > big
> > > > > change and it is unlikely that we include it in Ignite 2.7. On the
> > > other
> > > > > hand, the method suggested by Alexei is quite simple  and it
> > definitely
> > > > > fits Ignite 2.7, which will provide a better user experience. Even
> > > > having a
> > > > > proper scan query implemented this method can be useful in some
> > > specific
> > > > > scenarios, so we will not have to deprecate it.
> > > > >
> > > > > --AG
> > > > >
> > > > > пн, 17 сент. 2018 г. в 19:15, Dmitriy Pavlov <
> [hidden email]
> > >:
> > > > >
> > > > > > As I understood it is not a hack, it is an advanced feature for
> > > warming
> > > > > up
> > > > > > the partition. We can build warm-up of the overall cache by
> calling
> > > its
> > > > > > partitions warm-up. Users often ask about this feature and are
> not
> > > > > > confident with our lazy upload.
> > > > > >
> > > > > > Please correct me if I misunderstood the idea.
> > > > > >
> > > > > > пн, 17 сент. 2018 г. в 18:37, Dmitriy Setrakyan <
> > > [hidden email]
> > > > >:
> > > > > >
> > > > > > > I would rather fix the scan than hack the scan. Is there any
> > > > technical
> > > > > > > reason for hacking it now instead of fixing it properly? Can
> some
> > > of
> > > > > the
> > > > > > > experts in this thread provide an estimate of complexity and
> > > > difference
> > > > > > in
> > > > > > > work that would be required for each approach?
> > > > > > >
> > > > > > > D.
> > > > > > >
> > > > > > > On Mon, Sep 17, 2018 at 4:42 PM Alexey Goncharuk <
> > > > > > > [hidden email]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I think it would be beneficial for some Ignite users if we
> > added
> > > > > such a
> > > > > > > > partition warmup method to the public API. The method should
> be
> > > > > > > > well-documented and state that it may invalidate existing
> page
> > > > cache.
> > > > > > It
> > > > > > > > will be a very effective instrument until we add the proper
> > scan
> > > > > > ability
> > > > > > > > that Vladimir was referring to.
> > > > > > > >
> > > > > > > > пн, 17 сент. 2018 г. в 13:05, Maxim Muzafarov <
> > > [hidden email]
> > > > >:
> > > > > > > >
> > > > > > > > > Folks,
> > > > > > > > >
> > > > > > > > > Such warming up can be an effective technique for
> performing
> > > > > > > calculations
> > > > > > > > > which required large cache
> > > > > > > > > data reads, but I think it's the single narrow use case of
> > all
> > > > over
> > > > > > > > Ignite
> > > > > > > > > store usages. Like all other
> > > > > > > > > powerfull techniques, we should use it wisely. In the
> general
> > > > > case, I
> > > > > > > > think
> > > > > > > > > we should consider other
> > > > > > > > > techniques mentioned by Vladimir and may create something
> > like
> > > > > > `global
> > > > > > > > > statistics of cache data usage`
> > > > > > > > > to choose the best technique in each case.
> > > > > > > > >
> > > > > > > > > For instance, it's not obvious what would take longer:
> > > > multi-block
> > > > > > > reads
> > > > > > > > or
> > > > > > > > > 50 single-block reads issues
> > > > > > > > > sequentially. It strongly depends on used hardware under
> the
> > > hood
> > > > > and
> > > > > > > > might
> > > > > > > > > depend on workload system
> > > > > > > > > resources (CPU-intensive calculations and I\O access) as
> > well.
> > > > But
> > > > > > > > > `statistics` will help us to choose
> > > > > > > > > the right way.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Sun, 16 Sep 2018 at 23:59 Dmitriy Pavlov <
> > > > [hidden email]
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Alexei,
> > > > > > > > > >
> > > > > > > > > > I did not find any PRs associated with the ticket for
> check
> > > > code
> > > > > > > > changes
> > > > > > > > > > behind this idea. Are there any PRs?
> > > > > > > > > >
> > > > > > > > > > If we create some forwards scan of pages, it should be a
> > very
> > > > > > > > > intellectual
> > > > > > > > > > algorithm including a lot of parameters (how much RAM is
> > > free,
> > > > > how
> > > > > > > > > probably
> > > > > > > > > > we will need next page, etc). We had the private talk
> about
> > > > such
> > > > > > idea
> > > > > > > > > some
> > > > > > > > > > time ago.
> > > > > > > > > >
> > > > > > > > > > By my experience, Linux systems already do such forward
> > > reading
> > > > > of
> > > > > > > file
> > > > > > > > > > data (for corresponding sequential flagged file
> > descriptors),
> > > > but
> > > > > > > some
> > > > > > > > > > prefetching of data at the level of application may be
> > useful
> > > > for
> > > > > > > > > O_DIRECT
> > > > > > > > > > file descriptors.
> > > > > > > > > >
> > > > > > > > > > And one more concern from me is about selecting a right
> > place
> > > > in
> > > > > > the
> > > > > > > > > system
> > > > > > > > > > to do such prefetch.
> > > > > > > > > >
> > > > > > > > > > Sincerely,
> > > > > > > > > > Dmitriy Pavlov
> > > > > > > > > >
> > > > > > > > > > вс, 16 сент. 2018 г. в 19:54, Vladimir Ozerov <
> > > > > > [hidden email]
> > > > > > > >:
> > > > > > > > > >
> > > > > > > > > > > HI Alex,
> > > > > > > > > > >
> > > > > > > > > > > This is good that you observed speedup. But I do not
> > think
> > > > this
> > > > > > > > > solution
> > > > > > > > > > > works for the product in general case. Amount of RAM is
> > > > > limited,
> > > > > > > and
> > > > > > > > > > even a
> > > > > > > > > > > single partition may need more space than RAM
> available.
> > > > > Moving a
> > > > > > > lot
> > > > > > > > > of
> > > > > > > > > > > pages to page memory for scan means that you evict a
> lot
> > of
> > > > > other
> > > > > > > > > pages,
> > > > > > > > > > > what will ultimately lead to bad performance of
> > subsequent
> > > > > > queries
> > > > > > > > and
> > > > > > > > > > > defeat LRU algorithms, which are of great improtance
> for
> > > good
> > > > > > > > database
> > > > > > > > > > > performance.
> > > > > > > > > > >
> > > > > > > > > > > Database vendors choose another approach - skip BTrees,
> > > > iterate
> > > > > > > > > direclty
> > > > > > > > > > > over data pages, read them in multi-block fashion, use
> > > > separate
> > > > > > > scan
> > > > > > > > > > buffer
> > > > > > > > > > > to avoid excessive evictions of other hot pages.
> > > > Corresponding
> > > > > > > ticket
> > > > > > > > > for
> > > > > > > > > > > SQL exists [1], but idea is common for all parts of the
> > > > system,
> > > > > > > > > requiring
> > > > > > > > > > > scans.
> > > > > > > > > > >
> > > > > > > > > > > As far as proposed solution, it might be good idea to
> add
> > > > > special
> > > > > > > API
> > > > > > > > > to
> > > > > > > > > > > "warmup" partition with clear explanation of pros (fast
> > > scan
> > > > > > after
> > > > > > > > > > warmup)
> > > > > > > > > > > and cons (slowdown of any other operations). But I
> think
> > we
> > > > > > should
> > > > > > > > not
> > > > > > > > > > make
> > > > > > > > > > > this approach part of normal scans.
> > > > > > > > > > >
> > > > > > > > > > > Vladimir.
> > > > > > > > > > >
> > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-6057
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Sun, Sep 16, 2018 at 6:44 PM Alexei Scherbakov <
> > > > > > > > > > > [hidden email]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Igniters,
> > > > > > > > > > > >
> > > > > > > > > > > > My use case involves scenario where it's necessary to
> > > > iterate
> > > > > > > over
> > > > > > > > > > > > large(many TBs) persistent cache doing some
> calculation
> > > on
> > > > > read
> > > > > > > > data.
> > > > > > > > > > > >
> > > > > > > > > > > > The basic solution is to iterate cache using
> ScanQuery.
> > > > > > > > > > > >
> > > > > > > > > > > > This turns out to be slow because iteration over
> cache
> > > > > > involves a
> > > > > > > > lot
> > > > > > > > > > of
> > > > > > > > > > > > random disk access for reading data pages referenced
> > from
> > > > > leaf
> > > > > > > > pages
> > > > > > > > > by
> > > > > > > > > > > > links.
> > > > > > > > > > > >
> > > > > > > > > > > > This is especially true when data is stored on disks
> > with
> > > > > slow
> > > > > > > > random
> > > > > > > > > > > > access, like SAS disks. In my case on modern SAS
> disks
> > > > array
> > > > > > > > reading
> > > > > > > > > > > speed
> > > > > > > > > > > > was like several MB/sec while sequential read speed
> in
> > > perf
> > > > > > test
> > > > > > > > was
> > > > > > > > > > > about
> > > > > > > > > > > > GB/sec.
> > > > > > > > > > > >
> > > > > > > > > > > > I was able to fix the issue by using ScanQuery with
> > > > explicit
> > > > > > > > > partition
> > > > > > > > > > > set
> > > > > > > > > > > > and running simple warmup code before each partition
> > > scan.
> > > > > > > > > > > >
> > > > > > > > > > > > The code pins cold pages in memory in sequential
> order
> > > thus
> > > > > > > > > eliminating
> > > > > > > > > > > > random disk access. Speedup was like x100 magnitude.
> > > > > > > > > > > >
> > > > > > > > > > > > I suggest adding the improvement to the product's
> core
> > > by
> > > > > > always
> > > > > > > > > > > > sequentially preloading pages for all internal
> > partition
> > > > > > > iterations
> > > > > > > > > > > (cache
> > > > > > > > > > > > iterators, scan queries, sql queries with scan plan)
> if
> > > > > > partition
> > > > > > > > is
> > > > > > > > > > cold
> > > > > > > > > > > > (low number of pinned pages).
> > > > > > > > > > > >
> > > > > > > > > > > > This also should speed up rebalancing from cold
> > > partitions.
> > > > > > > > > > > >
> > > > > > > > > > > > Ignite JIRA ticket [1]
> > > > > > > > > > > >
> > > > > > > > > > > > Thoughts ?
> > > > > > > > > > > >
> > > > > > > > > > > > [1]
> https://issues.apache.org/jira/browse/IGNITE-8873
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > >
> > > > > > > > > > > > Best regards,
> > > > > > > > > > > > Alexei Scherbakov
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > --
> > > > > > > > > --
> > > > > > > > > Maxim Muzafarov
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Alexei Scherbakov
> > > >
> > >
> >
>

Re[2]: Cache scan efficiency

Zhenya Stanilovsky
Hi, but how do we deal with the page replacements that Dmitriy Pavlov mentioned?
This approach is only efficient if all the data fits into memory; maybe it would be better to have a method to pin some critical caches?


>Среда, 19 сентября 2018, 0:26 +03:00 от Dmitriy Pavlov <[hidden email]>:
>
>Even better, if RAM is exhausted page replacement process will be started.
>https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood#IgniteDurableMemory-underthehood-Pagereplacement(rotationwithdisk )
>
>Effect of the preloading will be still markable, but not as excelled as
>with full-fitting into RAM. Later I can review or improve javadocs if it is
>necessary.
>
>ср, 19 сент. 2018 г. в 0:18, Denis Magda < [hidden email] >:
>
>> Agree, it's just a matter of the documentation. If a user stores 100% in
>> RAM and on disk, and just wants to warm RAM up after a restart then he
>> knows everything will fit there. If during the preloading we detect that
>> the RAM is exhausted we can halt it and print out a warning.
>>
>> --
>> Denis
>>
>> On Tue, Sep 18, 2018 at 2:10 PM Dmitriy Pavlov < [hidden email] >
>> wrote:
>>
>> > Hi,
>> >
>> > I totally support the idea of cache preload.
>> >
>> > IMO it can be expanded. We can iterate over local partitions of the cache
>> > group and preload each.
>> >
>> > But it should be really clear documented methods so a user can be aware
>> of
>> > the benefits of such method (e.g. if RAM region is big enough, etc).
>> >
>> > Sincerely,
>> > Dmitriy Pavlov
>> >
>> > вт, 18 сент. 2018 г. в 21:36, Denis Magda < [hidden email] >:
>> >
>> > > Folks,
>> > >
>> > > Since we're adding a method that would preload a certain partition, can
>> > we
>> > > add the one which will preload the whole cache? Ignite persistence
>> users
>> > > I've been working with look puzzled once they realize there is no way
>> to
>> > > warm up RAM after the restart. There are use cases that require this.
>> > >
>> > > Can the current optimizations be expanded to the cache preloading use
>> > case?
>> > >
>> > > --
>> > > Denis
>> > >
>> > > On Tue, Sep 18, 2018 at 3:58 AM Alexei Scherbakov <
>> > >  [hidden email] > wrote:
>> > >
>> > > > Summing up, I suggest adding new public
>> > > > method IgniteCache.preloadPartition(partId).
>> > > >
>> > > > I will start preparing PR for IGNITE-8873
>> > > > < https://issues.apache.org/jira/browse/IGNITE-8873 > if no more
>> > > objections
>> > > > follow.
>> > > >
>> > > >
>> > > >
>> > > > вт, 18 сент. 2018 г. в 10:50, Alexey Goncharuk <
>> > >  [hidden email]
>> > > > >:
>> > > >
>> > > > > Dmitriy,
>> > > > >
>> > > > > In my understanding, the proper fix for the scan query looks like a
>> > big
>> > > > > change and it is unlikely that we include it in Ignite 2.7. On the
>> > > other
>> > > > > hand, the method suggested by Alexei is quite simple  and it
>> > definitely
>> > > > > fits Ignite 2.7, which will provide a better user experience. Even
>> > > > having a
>> > > > > proper scan query implemented this method can be useful in some
>> > > specific
>> > > > > scenarios, so we will not have to deprecate it.
>> > > > >
>> > > > > --AG
>> > > > >
>> > > > > пн, 17 сент. 2018 г. в 19:15, Dmitriy Pavlov <
>>  [hidden email]
>> > >:
>> > > > >
>> > > > > > As I understood it is not a hack, it is an advanced feature for
>> > > warming
>> > > > > up
>> > > > > > the partition. We can build warm-up of the overall cache by
>> calling
>> > > its
>> > > > > > partitions warm-up. Users often ask about this feature and are
>> not
>> > > > > > confident with our lazy upload.
>> > > > > >
>> > > > > > Please correct me if I misunderstood the idea.
>> > > > > >
>> > > > > > пн, 17 сент. 2018 г. в 18:37, Dmitriy Setrakyan <
>> > >  [hidden email]
>> > > > >:
>> > > > > >
>> > > > > > > I would rather fix the scan than hack the scan. Is there any
>> > > > technical
>> > > > > > > reason for hacking it now instead of fixing it properly? Can
>> some
>> > > of
>> > > > > the
>> > > > > > > experts in this thread provide an estimate of complexity and
>> > > > difference
>> > > > > > in
>> > > > > > > work that would be required for each approach?
>> > > > > > >
>> > > > > > > D.
>> > > > > > >
>> > > > > > > On Mon, Sep 17, 2018 at 4:42 PM Alexey Goncharuk <
>> > > > > > >  [hidden email] >
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > I think it would be beneficial for some Ignite users if we
>> > added
>> > > > > such a
>> > > > > > > > partition warmup method to the public API. The method should
>> be
>> > > > > > > > well-documented and state that it may invalidate existing
>> page
>> > > > cache.
>> > > > > > It
>> > > > > > > > will be a very effective instrument until we add the proper
>> > scan
>> > > > > > ability
>> > > > > > > > that Vladimir was referring to.
>> > > > > > > >
>> > > > > > > > пн, 17 сент. 2018 г. в 13:05, Maxim Muzafarov <
>> > >  [hidden email]
>> > > > >:
>> > > > > > > >
>> > > > > > > > > Folks,
>> > > > > > > > >
>> > > > > > > > > Such warming up can be an effective technique for
>> performing
>> > > > > > > calculations
>> > > > > > > > > which required large cache
>> > > > > > > > > data reads, but I think it's the single narrow use case of
>> > all
>> > > > over
>> > > > > > > > Ignite
>> > > > > > > > > store usages. Like all other
>> > > > > > > > > powerfull techniques, we should use it wisely. In the
>> general
>> > > > > case, I
>> > > > > > > > think
>> > > > > > > > > we should consider other
>> > > > > > > > > techniques mentioned by Vladimir and may create something
>> > like
>> > > > > > `global
>> > > > > > > > > statistics of cache data usage`
>> > > > > > > > > to choose the best technique in each case.
>> > > > > > > > >
>> > > > > > > > > For instance, it's not obvious what would take longer:
>> > > > multi-block
>> > > > > > > reads
>> > > > > > > > or
>> > > > > > > > > 50 single-block reads issues
>> > > > > > > > > sequentially. It strongly depends on used hardware under
>> the
>> > > hood
>> > > > > and
>> > > > > > > > might
>> > > > > > > > > depend on workload system
>> > > > > > > > > resources (CPU-intensive calculations and I\O access) as
>> > well.
>> > > > But
>> > > > > > > > > `statistics` will help us to choose
>> > > > > > > > > the right way.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Sun, 16 Sep 2018 at 23:59 Dmitriy Pavlov <
>> > > >  [hidden email]
>> > > > > >
>> > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Hi Alexei,
>> > > > > > > > > >
>> > > > > > > > > > I did not find any PRs associated with the ticket for
>> check
>> > > > code
>> > > > > > > > changes
>> > > > > > > > > > behind this idea. Are there any PRs?
>> > > > > > > > > >
>> > > > > > > > > > If we create some forwards scan of pages, it should be a
>> > very
>> > > > > > > > > intellectual
>> > > > > > > > > > algorithm including a lot of parameters (how much RAM is
>> > > free,
>> > > > > how
>> > > > > > > > > probably
>> > > > > > > > > > we will need next page, etc). We had the private talk
>> about
>> > > > such
>> > > > > > idea
>> > > > > > > > > some
>> > > > > > > > > > time ago.
>> > > > > > > > > >
>> > > > > > > > > > By my experience, Linux systems already do such forward
>> > > reading
>> > > > > of
>> > > > > > > file
>> > > > > > > > > > data (for corresponding sequential flagged file
>> > descriptors),
>> > > > but
>> > > > > > > some
>> > > > > > > > > > prefetching of data at the level of application may be
>> > useful
>> > > > for
>> > > > > > > > > O_DIRECT
>> > > > > > > > > > file descriptors.
>> > > > > > > > > >
>> > > > > > > > > > And one more concern from me is about selecting a right
>> > place
>> > > > in
>> > > > > > the
>> > > > > > > > > system
>> > > > > > > > > > to do such prefetch.
>> > > > > > > > > >
>> > > > > > > > > > Sincerely,
>> > > > > > > > > > Dmitriy Pavlov
>> > > > > > > > > >
>> > > > > > > > > > вс, 16 сент. 2018 г. в 19:54, Vladimir Ozerov <
>> > > > > >  [hidden email]
>> > > > > > > >:
>> > > > > > > > > >
>> > > > > > > > > > > HI Alex,
>> > > > > > > > > > >
>> > > > > > > > > > > This is good that you observed speedup. But I do not
>> > think
>> > > > this
>> > > > > > > > > solution
>> > > > > > > > > > > works for the product in general case. Amount of RAM is
>> > > > > limited,
>> > > > > > > and
>> > > > > > > > > > even a
>> > > > > > > > > > > single partition may need more space than RAM
>> available.
>> > > > > Moving a
>> > > > > > > lot
>> > > > > > > > > of
>> > > > > > > > > > > pages to page memory for scan means that you evict a
>> lot
>> > of
>> > > > > other
>> > > > > > > > > pages,
>> > > > > > > > > > > what will ultimately lead to bad performance of
>> > subsequent
>> > > > > > queries
>> > > > > > > > and
>> > > > > > > > > > > defeat LRU algorithms, which are of great improtance
>> for
>> > > good
>> > > > > > > > database
>> > > > > > > > > > > performance.
>> > > > > > > > > > >
>> > > > > > > > > > > Database vendors choose another approach - skip BTrees,
>> > > > iterate
>> > > > > > > > > direclty
>> > > > > > > > > > > over data pages, read them in multi-block fashion, use
>> > > > separate
>> > > > > > > scan
>> > > > > > > > > > buffer
>> > > > > > > > > > > to avoid excessive evictions of other hot pages.
>> > > > Corresponding
>> > > > > > > ticket
>> > > > > > > > > for
>> > > > > > > > > > > SQL exists [1], but idea is common for all parts of the
>> > > > system,
>> > > > > > > > > requiring
>> > > > > > > > > > > scans.
>> > > > > > > > > > >
>> > > > > > > > > > > As far as proposed solution, it might be good idea to
>> add
>> > > > > special
>> > > > > > > API
>> > > > > > > > > to
>> > > > > > > > > > > "warmup" partition with clear explanation of pros (fast
>> > > scan
>> > > > > > after
>> > > > > > > > > > warmup)
>> > > > > > > > > > > and cons (slowdown of any other operations). But I
>> think
>> > we
>> > > > > > should
>> > > > > > > > not
>> > > > > > > > > > make
>> > > > > > > > > > > this approach part of normal scans.
>> > > > > > > > > > >
>> > > > > > > > > > > Vladimir.
>> > > > > > > > > > >
>> > > > > > > > > > > [1]  https://issues.apache.org/jira/browse/IGNITE-6057
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > On Sun, Sep 16, 2018 at 6:44 PM Alexei Scherbakov <
>> > > > > > > > > > >  [hidden email] > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > Igniters,
>> > > > > > > > > > > >
>> > > > > > > > > > > > My use case involves scenario where it's necessary to
>> > > > iterate
>> > > > > > > over
>> > > > > > > > > > > > large(many TBs) persistent cache doing some
>> calculation
>> > > on
>> > > > > read
>> > > > > > > > data.
>> > > > > > > > > > > >
>> > > > > > > > > > > > The basic solution is to iterate cache using
>> ScanQuery.
>> > > > > > > > > > > >
>> > > > > > > > > > > > This turns out to be slow because iteration over
>> cache
>> > > > > > involves a
>> > > > > > > > lot
>> > > > > > > > > > of
>> > > > > > > > > > > > random disk access for reading data pages referenced
>> > from
>> > > > > leaf
>> > > > > > > > pages
>> > > > > > > > > by
>> > > > > > > > > > > > links.
>> > > > > > > > > > > >
>> > > > > > > > > > > > This is especially true when data is stored on disks
>> > with
>> > > > > slow
>> > > > > > > > random
>> > > > > > > > > > > > access, like SAS disks. In my case on modern SAS
>> disks
>> > > > array
>> > > > > > > > reading
>> > > > > > > > > > > speed
>> > > > > > > > > > > > was like several MB/sec while sequential read speed
>> in
>> > > perf
>> > > > > > test
>> > > > > > > > was
>> > > > > > > > > > > about
>> > > > > > > > > > > > GB/sec.
>> > > > > > > > > > > >
>> > > > > > > > > > > > I was able to fix the issue by using ScanQuery with
>> > > > explicit
>> > > > > > > > > partition
>> > > > > > > > > > > set
>> > > > > > > > > > > > and running simple warmup code before each partition
>> > > scan.
>> > > > > > > > > > > >
>> > > > > > > > > > > > The code pins cold pages in memory in sequential
>> order
>> > > thus
>> > > > > > > > > eliminating
>> > > > > > > > > > > > random disk access. Speedup was like x100 magnitude.
>> > > > > > > > > > > >
>> > > > > > > > > > > > I suggest adding the improvement to the product's
>> core
>> > > by
>> > > > > > always
>> > > > > > > > > > > > sequentially preloading pages for all internal
>> > partition
>> > > > > > > iterations
>> > > > > > > > > > > (cache
>> > > > > > > > > > > > iterators, scan queries, sql queries with scan plan)
>> if
>> > > > > > partition
>> > > > > > > > is
>> > > > > > > > > > cold
>> > > > > > > > > > > > (low number of pinned pages).
>> > > > > > > > > > > >
>> > > > > > > > > > > > This also should speed up rebalancing from cold
>> > > partitions.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Ignite JIRA ticket [1]
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thoughts ?
>> > > > > > > > > > > >
>> > > > > > > > > > > > [1]
>>  https://issues.apache.org/jira/browse/IGNITE-8873
>> > > > > > > > > > > >
>> > > > > > > > > > > > --
>> > > > > > > > > > > >
>> > > > > > > > > > > > Best regards,
>> > > > > > > > > > > > Alexei Scherbakov
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > > --
>> > > > > > > > > Maxim Muzafarov
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > >
>> > > > Best regards,
>> > > > Alexei Scherbakov
>> > > >
>> > >
>> >
>>


--
Zhenya Stanilovsky

Re: Re[2]: Cache scan efficiency

Vladimir Ozerov
Pinning is an even worse option, because you lose control over how data is moved
within a single region. Instead, I would suggest using partition warmup plus a
separate data region to achieve the "pinning" semantics.
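
For example, a minimal configuration sketch of such a dedicated region (region
and cache names and sizes are illustrative only); after node start, each
partition of the critical cache can then be warmed up as discussed earlier in
this thread:

import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class PinnedRegionConfig {
    public static IgniteConfiguration config() {
        // Dedicated persistent region sized to hold the critical cache entirely,
        // so scans of other caches cannot evict its pages.
        DataRegionConfiguration pinnedRegion = new DataRegionConfiguration()
            .setName("pinnedRegion")
            .setPersistenceEnabled(true)
            .setInitialSize(8L * 1024 * 1024 * 1024)
            .setMaxSize(8L * 1024 * 1024 * 1024);

        DataStorageConfiguration storageCfg = new DataStorageConfiguration()
            .setDataRegionConfigurations(pinnedRegion);

        // The critical cache is bound to the dedicated region.
        CacheConfiguration<Object, Object> criticalCache =
            new CacheConfiguration<>("criticalCache")
                .setDataRegionName("pinnedRegion");

        return new IgniteConfiguration()
            .setDataStorageConfiguration(storageCfg)
            .setCacheConfiguration(criticalCache);
    }
}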

On Wed, Sep 19, 2018 at 8:34 AM Zhenya Stanilovsky
<[hidden email]> wrote:

> hi, but how to deal with page replacements, which Dmitriy Pavlov mentioned?
> this approach would be efficient if all data fits into memory, may be
> better to have method to pin some critical caches?
>
>
> >Среда, 19 сентября 2018, 0:26 +03:00 от Dmitriy Pavlov <
> [hidden email]>:
> >
> >Even better, if RAM is exhausted page replacement process will be started.
> >
> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood#IgniteDurableMemory-underthehood-Pagereplacement(rotationwithdisk
> )
> >
> >Effect of the preloading will be still markable, but not as excelled as
> >with full-fitting into RAM. Later I can review or improve javadocs if it
> is
> >necessary.
> >
> >ср, 19 сент. 2018 г. в 0:18, Denis Magda < [hidden email] >:
> >
> >> Agree, it's just a matter of the documentation. If a user stores 100% in
> >> RAM and on disk, and just wants to warm RAM up after a restart then he
> >> knows everything will fit there. If during the preloading we detect that
> >> the RAM is exhausted we can halt it and print out a warning.
> >>
> >> --
> >> Denis
> >>
> >> On Tue, Sep 18, 2018 at 2:10 PM Dmitriy Pavlov < [hidden email]
> >
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > I totally support the idea of cache preload.
> >> >
> >> > IMO it can be expanded. We can iterate over local partitions of the
> cache
> >> > group and preload each.
> >> >
> >> > But it should be really clear documented methods so a user can be
> aware
> >> of
> >> > the benefits of such method (e.g. if RAM region is big enough, etc).
> >> >
> >> > Sincerely,
> >> > Dmitriy Pavlov
> >> >
> >> > вт, 18 сент. 2018 г. в 21:36, Denis Magda < [hidden email] >:
> >> >
> >> > > Folks,
> >> > >
> >> > > Since we're adding a method that would preload a certain partition,
> can
> >> > we
> >> > > add the one which will preload the whole cache? Ignite persistence
> >> users
> >> > > I've been working with look puzzled once they realize there is no
> way
> >> to
> >> > > warm up RAM after the restart. There are use cases that require
> this.
> >> > >
> >> > > Can the current optimizations be expanded to the cache preloading
> use
> >> > case?
> >> > >
> >> > > --
> >> > > Denis
> >> > >
> >> > > On Tue, Sep 18, 2018 at 3:58 AM Alexei Scherbakov <
> >> > >  [hidden email] > wrote:
> >> > >
> >> > > > Summing up, I suggest adding new public
> >> > > > method IgniteCache.preloadPartition(partId).
> >> > > >
> >> > > > I will start preparing PR for IGNITE-8873
> >> > > > < https://issues.apache.org/jira/browse/IGNITE-8873 > if no more
> >> > > objections
> >> > > > follow.
> >> > > >
> >> > > >
> >> > > >
> >> > > > вт, 18 сент. 2018 г. в 10:50, Alexey Goncharuk <
> >> > >  [hidden email]
> >> > > > >:
> >> > > >
> >> > > > > Dmitriy,
> >> > > > >
> >> > > > > In my understanding, the proper fix for the scan query looks
> like a
> >> > big
> >> > > > > change and it is unlikely that we include it in Ignite 2.7. On
> the
> >> > > other
Reply | Threaded
Open this post in threaded view
|

Re[4]: Cache scan efficiency

Zhenya Stanilovsky
Great, I didn't think about that.


>Wednesday, September 19, 2018, 9:40 +03:00, from Vladimir Ozerov <[hidden email]>:
>
>Pinning is an even worse option, because you lose control over how data is moved
>within a single region. Instead, I would suggest using partition warmup +
>a separate data region to achieve "pinning" semantics.
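
For context, a minimal sketch of the "partition warmup + separate data region"
setup suggested above. The region name, region size and cache name are
illustrative assumptions only, not taken from this thread:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.CacheConfiguration;
    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class ScanRegionSketch {
        public static void main(String[] args) {
            // Dedicated persistent region for the scan-heavy cache, so that
            // warming its partitions up cannot evict hot pages of other caches.
            DataRegionConfiguration scanRegion = new DataRegionConfiguration()
                .setName("scanRegion")
                .setPersistenceEnabled(true)
                .setMaxSize(8L * 1024 * 1024 * 1024); // 8 GB, an assumed size.

            DataStorageConfiguration storageCfg = new DataStorageConfiguration()
                .setDataRegionConfigurations(scanRegion);

            // The cache to be scanned keeps its pages only in the dedicated region.
            CacheConfiguration<Long, byte[]> cacheCfg =
                new CacheConfiguration<Long, byte[]>("bigCache")
                    .setDataRegionName("scanRegion");

            IgniteConfiguration cfg = new IgniteConfiguration()
                .setDataStorageConfiguration(storageCfg)
                .setCacheConfiguration(cacheCfg);

            try (Ignite ignite = Ignition.start(cfg)) {
                ignite.cluster().active(true); // persistence requires cluster activation
                // ... warm up and scan "bigCache" here.
            }
        }
    }
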
--
Zhenya Stanilovsky
Reply | Threaded
Open this post in threaded view
|

Re: Re[4]: Cache scan efficiency

Mmuzaf
Alexei,

> Summing up, I suggest adding new public
method IgniteCache.preloadPartition(partId).

If I understand the use case correctly, the user wants to retrieve the whole
data set from the cache (not only a single partition) on a slow HDD. So, my
suggestion is to create public API methods like these:

`public IgniteCache<K, V> withPartitionsWarmup();`
`public IgniteCache<K, V> withPartitionWarmup(int partition);`
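
As an illustration, a sketch of how a partition-by-partition scan might use such
a hook. The cache name and value type are assumptions, and the warmup calls shown
in the comment are the methods proposed in this thread, not existing IgniteCache API:

    import javax.cache.Cache;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.cache.query.QueryCursor;
    import org.apache.ignite.cache.query.ScanQuery;

    public class PartitionScanSketch {
        /** Scans a persistent cache partition by partition, summing value sizes. */
        static long scanAllPartitions(Ignite ignite, String cacheName) {
            IgniteCache<Long, byte[]> cache = ignite.cache(cacheName);
            int parts = ignite.affinity(cacheName).partitions();

            long totalBytes = 0;

            for (int p = 0; p < parts; p++) {
                // The warmup proposed in this thread would go here, e.g. one of
                //   cache.preloadPartition(p);              // Alexei's proposal
                //   cache = cache.withPartitionWarmup(p);   // proposal above
                // so the partition's data pages are read from disk sequentially
                // before the scan follows B+ tree links into them.

                try (QueryCursor<Cache.Entry<Long, byte[]>> cur =
                         cache.query(new ScanQuery<Long, byte[]>().setPartition(p))) {
                    for (Cache.Entry<Long, byte[]> e : cur)
                        totalBytes += e.getValue().length; // placeholder calculation
                }
            }

            return totalBytes;
        }
    }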

--
Maxim Muzafarov