PageMemory approach for Ignite 2.0


PageMemory approach for Ignite 2.0

Alexey Goncharuk
Folks,

I pushed an initial implementation of IGNITE-3477 to ignite-3477 branch for
community review and further discussion.

Note that the implementation lacks the following features:
 - On-heap deserialized values cache
 - Full LOCAL cache support
 - Eviction policies
 - Multiple memory pools
 - Distributed joins support
 - Off-heap circular remove buffer
 - Maybe something else I missed

The subject of this discussion is to determine whether the PageMemory
approach is the way to go, because this implementation is almost 2x
slower than the current 2.0 branch. There is some room for improvement, but
I am not completely sure we can reach the same performance numbers as in 2.0.

I encourage the community to review the code and architecture and share
their thoughts here.

Thanks,
AG

Re: PageMemory approach for Ignite 2.0

Sergi
I agree. We should fix all the outstanding issues and resolve the
performance problems before merging it into 2.0.

Sergi


Re: PageMemory approach for Ignite 2.0

dmagda
In reply to this post by Alexey Goncharuk
Alex,

> On Dec 29, 2016, at 1:37 AM, Alexey Goncharuk <[hidden email]> wrote:
>
> The subject of this discussion is to determine whether the PageMemory
> approach is a way to go, because the this implementation is almost 2x
> slower than current 2.0 branch.

What is the main reason for that? Is it an architecture flaw you didn’t consider before, or performance issues that have to be cleaned up?


Denis

Re: PageMemory approach for Ignite 2.0

Dmitriy Setrakyan
Denis, this is not "dreaming". We can't have a release with fewer features than 1.8. We can only add features, not subtract them.

Dmitriy




Re: PageMemory approach for Ignite 2.0

dsetrakyan
In reply to this post by Alexey Goncharuk
On Thu, Dec 29, 2016 at 1:37 AM, Alexey Goncharuk <
[hidden email]> wrote:

> Folks,
>
> I pushed an initial implementation of IGNITE-3477 to ignite-3477 branch for
> community review and further discussion.
>
> Note that the implementation lacks the following features:
>  - On-heap deserialized values cache
>  - Full LOCAL cache support
>  - Eviction policies
>  - Multiple memory pools
>  - Distributed joins support
>  - Off-heap circular remove buffer
>  - Maybe something else I missed
>

Do we have *blocker* tickets for all the remaining issues? Ignite 2.0 will
have to support everything in Ignite 1.0. Otherwise we will not be able to
release.


> The subject of this discussion is to determine whether the PageMemory
> approach is a way to go, because the this implementation is almost 2x
> slower than current 2.0 branch. There is some room for improvement, but I
> am not completely sure we can gain the same performance numbers as in 2.0.
>

I would rephrase this. We should all assume that the PageMemory approach is
the right approach. Here are the main benefits:

- completely off-heap (minimal GC overhead)
- predictable memory size
- ability to extend to external store, like disk, without serialization
- etc...
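To make the "predictable memory size" point concrete, here is a toy sketch of the idea only, not Ignite's actual PageMemory API (the `PagePool` name and the 4 KB page size are assumptions): the total off-heap footprint is fixed up front, and a "page" is just a slice of one pre-allocated region addressed by page id.

```java
import java.nio.ByteBuffer;

// Toy sketch of a fixed pool of off-heap pages (illustrative names,
// not Ignite code). Total memory is known up front: pages * PAGE_SIZE
// bytes, allocated outside the Java heap, so the data itself adds no
// GC pressure.
public class PagePool {
    static final int PAGE_SIZE = 4096; // fixed page size (assumption)

    private final ByteBuffer pool; // single off-heap region
    private final int pages;

    PagePool(int pages) {
        this.pages = pages;
        this.pool = ByteBuffer.allocateDirect(pages * PAGE_SIZE);
    }

    // A page is a slice of the region, addressed purely by page id.
    ByteBuffer page(int pageId) {
        if (pageId < 0 || pageId >= pages)
            throw new IllegalArgumentException("Bad page id: " + pageId);

        ByteBuffer dup = pool.duplicate();
        dup.position(pageId * PAGE_SIZE);
        dup.limit((pageId + 1) * PAGE_SIZE);
        return dup.slice();
    }

    public static void main(String[] args) {
        PagePool mem = new PagePool(16); // 64 KB total, fixed
        ByteBuffer p3 = mem.page(3);
        p3.putLong(0, 42L); // write into page 3
        System.out.println(mem.page(3).getLong(0)); // prints 42
    }
}
```

Because all data lives in one sized region, extending it to an external store (e.g. writing whole pages to disk) needs no extra serialization step, which is the third benefit above.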

Let's collectively work on ensuring that it can perform as fast as Ignite
1.8.x. If after a thorough investigation we decide that PageMemory cannot
perform, then we can start thinking about other approaches.


> I encourage the community to review the code and architecture and share
> their thoughts here.
>

Completely agree. If anyone has extra cycles, please review the code and
suggest any improvements.

Re: PageMemory approach for Ignite 2.0

Vladimir Ozerov
So you guys are suggesting to accept page memory as the right approach by
default, even though it:
1) Doesn't work with half of the current cache features
2) Halves our performance
3) Goes against the whole Java ecosystem with its "offheap-first" approach?

Good try, but no :-)

Let me elaborate on point 3. Offheap-first is not the correct approach. It
is a questionable and dangerous way to go, to say the least. GC is a
central component of the whole Java ecosystem. No wonder the most
comfortable approach for users is when everything is stored in the heap.

Offheap solutions were created to mitigate the scalability issues Java has
faced in recent years due to the rapid decrease in RAM costs. However, that
doesn't mean things will be bad forever. At the moment there are at least 3
modern GCs targeting scalability: G1GC from Oracle, Shenandoah from Red
Hat, and C4 from Azul. No doubt they will solve (or at least significantly
relieve) the problem in the mid-term, with gradual improvement from year to
year and month to month.

Moreover, the GC problem is being attacked from different angles. Another
major improvement is stack-based structures, which are going to appear in
Java as part of the Valhalla project [1]. When implemented, frameworks will
be able to reduce heap allocations significantly. Instead of having several
dozen heap objects rooted at our infamous GridCacheMapEntry, we will have
only one heap object: GridCacheMapEntry itself.

Okay, okay, this is a matter of years and we need a solution now, so what
is wrong with offheap? Only one thing: it *splits server memory into two
unrelated pieces*, the Java heap and offheap. This is a terrible thing from
the user's perspective. I already went through this during Hadoop
Accelerator development:
 - Output data is stored offheap. Cool, no GC!
 - Intermediate data, such as our NIO messages, is stored in the Java heap.
Now we run an intensive load and ... OutOfMemoryError! OK, we give more
Java heap, but now ... we are out of native memory! Finally, in order to
make it work we have to give much more memory than needed to one of these
parts. Result: *poor memory utilization*.
Things would be much easier if we stored either everything in heap or
everything offheap. But as user code is executed in heap by default,
offheap is not an option for the average user.

All in all, the offheap approach is valuable for high-end deployments with
hundreds of gigabytes of memory. But on commodity hardware with a moderate
amount of memory, applications are likely to have problems due to the
heap/offheap separation, without any advantages.

So my main concern is: *what about the current heap mode*? It must stay
alive. The page-memory approach should be abstracted out and implemented in
addition to the current heap approach, not instead of it. Have a high-end
machine and suffer from GC? Pick offheap mode. Have a commodity machine?
Good old heap mode is your choice.

[1] http://openjdk.java.net/projects/valhalla/
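The abstraction proposed above could look roughly like this hypothetical sketch (`EntryStorage`, `HeapStorage`, and `OffHeapStorage` are made-up names, not Ignite's actual interfaces): a single storage interface with pluggable on-heap and off-heap implementations, so the mode becomes a configuration choice.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage SPI: the cache talks to this interface and does
// not care where the bytes live.
interface EntryStorage {
    void put(int key, byte[] value);
    byte[] get(int key);
}

// "Good old heap mode": a plain on-heap map, GC-managed.
class HeapStorage implements EntryStorage {
    private final Map<Integer, byte[]> map = new HashMap<>();
    public void put(int key, byte[] value) { map.put(key, value); }
    public byte[] get(int key) { return map.get(key); }
}

// Offheap mode: values copied into a direct buffer. Fixed-size slots
// for brevity; a real page-based layout is far more involved.
class OffHeapStorage implements EntryStorage {
    private static final int SLOT = 64;
    private final ByteBuffer region = ByteBuffer.allocateDirect(SLOT * 128);
    private final Map<Integer, Integer> lengths = new HashMap<>();

    public void put(int key, byte[] value) {
        if (value.length > SLOT)
            throw new IllegalArgumentException("Value too big for slot");
        ByteBuffer dup = region.duplicate();
        dup.position(key * SLOT);
        dup.put(value);
        lengths.put(key, value.length);
    }

    public byte[] get(int key) {
        Integer len = lengths.get(key);
        if (len == null)
            return null;
        byte[] out = new byte[len];
        ByteBuffer dup = region.duplicate();
        dup.position(key * SLOT);
        dup.get(out);
        return out;
    }
}

public class StorageDemo {
    public static void main(String[] args) {
        // Same calling code works against either implementation.
        EntryStorage s = new OffHeapStorage();
        s.put(1, new byte[] {1, 2, 3});
        System.out.println(s.get(1).length); // prints 3
    }
}
```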




Re: PageMemory approach for Ignite 2.0

dsetrakyan
On Sat, Dec 31, 2016 at 7:07 AM, Vladimir Ozerov <[hidden email]>
wrote:


Vova, I disagree. I don't see a reason to maintain the on-heap
implementation if we can make the off-heap one work fast enough. Remember,
this is only the first draft. Once we optimize it, it will get a lot
faster.

Re: PageMemory approach for Ignite 2.0

dmagda
Here we need to define what is meant by *fast enough*. Java does not let us
manage memory directly, and it's unlikely that any custom memory-management
solution like the PageMemory will ever outperform it, simply because the
Java heap will still be an intermediate layer between an application and
the PageMemory, passing objects back and forth. Also, as I understand it,
the PageMemory manages data with the JNI-based Unsafe, which brings an
additional performance hit, etc.

So, personally, I share Vladimir's point of view and would not discontinue
the on-heap implementation.

What I can't understand is why the PageMemory is so much slower than the
current *off*-heap based implementation. The latter has performance
comparable to the on-heap implementation and is certainly not two times
slower. Alex G., could you elaborate on this?
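The "intermediate layer" concern can be illustrated with a minimal sketch (illustrative names, not Ignite code): with off-heap storage, every write serializes a heap object into raw bytes, and every read copies bytes back out and materializes a fresh heap object, a round trip a plain on-heap map never pays.

```java
import java.nio.ByteBuffer;

// Illustrative only: the Java heap remains an intermediate layer in
// front of off-heap storage, so each access pays a copy.
public class OffHeapStore {
    private final ByteBuffer region = ByteBuffer.allocateDirect(1024);

    // "Serialize": copy the value's bytes into off-heap memory.
    void put(int slot, long value) {
        region.putLong(slot * Long.BYTES, value);
    }

    // "Deserialize": copy bytes back and box them into a heap object.
    // A plain on-heap map would just hand back an existing reference.
    Long get(int slot) {
        return region.getLong(slot * Long.BYTES); // copy + allocation per read
    }

    public static void main(String[] args) {
        OffHeapStore store = new OffHeapStore();
        store.put(0, 7L);
        System.out.println(store.get(0)); // prints 7
    }
}
```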


Denis



Re: PageMemory approach for Ignite 2.0

dmagda
Sorry, I just recalled that Unsafe is not JNI-based. However, my previous point of view remains the same.

> On Dec 31, 2016, at 11:39 PM, Denis Magda <[hidden email]> wrote:
>
> JNI-based Unsafe that also brings performance hit


Denis


Re: PageMemory approach for Ignite 2.0

Vladimir Ozerov
Dima,

Performance is a serious concern, but not the main one. My point is that
standard users working on commodity hardware and requiring only in-memory
mode simply do not need page memory. They need a distributed HashMap. We
already have it. It is fast, it is stable, and it has been tested
rigorously for years. It does what users need.

The PageMemory approach targets high-end deployments, which hardly
represent the majority of our users. Less than 10%, I think. Or maybe <5%,
or even <1%. Those are the users who may benefit from page memory. Others
will gain nothing except an additional layer of indirection, a drop in
performance, and risks of instability. And problems with capacity planning,
because it is much harder to plan two memory regions properly than a single
one.

I talked to Alexey Goncharuk some time ago, and he said it is not a big
deal to abstract out PageMemory. Alex, please confirm. I encourage everyone
to stop thinking of dropping the "old" before we have built the "new" and
confirmed that it is better.

Let's ensure that the new approach is well abstracted, add it to 2.0, let
it mature for 1-2 years, and then think of dropping the current approach in
3.0. This sounds much better to me.



Re: PageMemory approach for Ignite 2.0

dsetrakyan
Vova,

I would qualify the need for PageMemory as strategic for Apache Ignite.
With the addition of the SQL Grid component, Ignite can now also satisfy
in-memory database use cases, which are very space-consuming and require a
new memory management approach. A basic distributed hash map is not going
to work for such use cases.

Once PageMemory becomes stable and fast, I don't believe we can afford to
maintain two types of memory management in the project. It will be just too
time consuming. On top of that, the old approach will simply not be needed
anymore.

I am OK with maintaining two implementations, assuming we can have a good
abstraction for the memory management. However, even in that case, we
should all weigh whether it makes sense to spend time on migrating the
current on-heap implementation to use these new memory APIs.

D.


Re: PageMemory approach for Ignite 2.0

Sergi
Agree with Dmitriy. We should avoid having multiple implementations of the
same thing if possible. Let's put our efforts into fixing the issues with
PageMemory.

Sergi
