PDS suites fail with exit code 137

PDS suites fail with exit code 137

Ivan Bessonov
Hello Igniters,

I'd like to discuss the ongoing "out of memory" failures on TeamCity.
In particular, suites [1] and [2] show quite a lot of "Exit code 137"
failures.

I investigated the "PDS (Indexing)" suite under [3]; there's another
similar issue as well: [4]. I came to the conclusion that the main
problem is in the default memory allocator (malloc). Let me explain the
way I see it right now:

"malloc" is allowed to allocate (for its internal use) up to
8 * (number of cores) blocks called arenas, 64 MB each. This can happen
when a program creates and stops threads frequently and allocates a lot
of memory all the time, which is exactly what our tests do. Given that TC
agents have 32 cores, 8 * 32 * 64 MB gives 16 GB, which is roughly the
whole amount of RAM on a single agent.
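
For reference, here is a tiny self-contained sketch (mine, just an
illustration, not code from the suites) that reproduces the worst-case
arithmetic above for whatever machine it runs on:

public class ArenaFootprint {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        long arenaBytes = 64L * 1024 * 1024;      // one glibc arena reserves up to 64 MB
        long worstCase = 8L * cores * arenaBytes; // glibc's default cap on 64-bit: 8 * cores arenas

        System.out.printf("cores=%d, worst-case arena footprint=%d GB%n",
            cores, worstCase / (1024L * 1024 * 1024));
        // On a 32-core TC agent this prints 16 GB, matching the estimate above.
    }
}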

The total number of arenas can be lowered manually by setting the
MALLOC_ARENA_MAX environment variable to 4 (or another small value). I
tried it locally and in the PDS (Indexing) suite settings on TC; the
results look very promising: [5]
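
To illustrate how the variable reaches a test JVM, here is a minimal
sketch of a launcher that sets it for a forked process (the main class
and the JVM options are placeholders, not our real runner):

import java.util.Map;

public class LaunchWithArenaLimit {
    public static void main(String[] args) throws Exception {
        // Hypothetical suite entry point, for illustration only.
        ProcessBuilder pb = new ProcessBuilder(
            "java", "-Xmx2g", "-XX:+AlwaysPreTouch", "org.example.SuiteRunner");

        // The child inherits our environment; we only override the allocator knob.
        Map<String, String> env = pb.environment();
        env.put("MALLOC_ARENA_MAX", "4");

        pb.inheritIO();
        int exitCode = pb.start().waitFor();
        System.out.println("Suite exited with code " + exitCode);
    }
}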

It is said that changing this variable may lead to some performance
degradation, but it's hard to tell whether we see any, because the suite
usually failed before it completed.

So, I have two questions right now:

- can those of you who are into hardcore Linux and C confirm that this
solution can help us? Experiments show that it completely solves the
problem.
- can you please point me to someone who usually does TC maintenance? I'm
not entirely sure that I can propagate this environment variable to all
suites by myself, which is necessary to avoid occasional exit code 137
failures (caused by the same problem) in the future. I just don't know
all the details of the suite structure.

Thank you!

[1]
https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PdsIndexing&tab=buildTypeHistoryList&state=failed&branch_IgniteTests24Java8=%3Cdefault%3E
[2]
https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_Pds4&tab=buildTypeHistoryList&branch_IgniteTests24Java8=%3Cdefault%3E&state=failed
[3] https://issues.apache.org/jira/browse/IGNITE-13266
[4] https://issues.apache.org/jira/browse/IGNITE-13263
[5]
https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PdsIndexing&tab=buildTypeHistoryList&branch_IgniteTests24Java8=pull%2F8051%2Fhead

--
Sincerely yours,
Ivan Bessonov

Re: PDS suites fail with exit code 137

Ivan Daschinsky
Ivan, I think that we should use mmap/munmap to allocate huge chunks of
memory.

I've experimented with JNA, invoking mmap/munmap through it, and it works
fine. Maybe we can create a module (similar to direct-io) that uses
mmap/munmap on platforms that support them and falls back to Unsafe
otherwise?
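
Roughly, the binding looks like this (an illustrative sketch; the
constants are the Linux x86-64 values from <sys/mman.h> and may differ on
other platforms):

import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.Pointer;

public class MmapSketch {
    /** Minimal JNA binding to libc's mmap/munmap. */
    public interface CLib extends Library {
        CLib INSTANCE = Native.load("c", CLib.class);

        Pointer mmap(Pointer addr, long length, int prot, int flags, int fd, long offset);

        int munmap(Pointer addr, long length);
    }

    // Values from <sys/mman.h> on Linux x86-64.
    private static final int PROT_READ = 0x1;
    private static final int PROT_WRITE = 0x2;
    private static final int MAP_PRIVATE = 0x02;
    private static final int MAP_ANONYMOUS = 0x20;

    public static void main(String[] args) {
        long size = 64L * 1024 * 1024; // a 64 MB chunk, allocated outside malloc arenas

        Pointer region = CLib.INSTANCE.mmap(Pointer.NULL, size,
            PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (Pointer.nativeValue(region) == -1L)
            throw new OutOfMemoryError("mmap failed");

        // ... hand the region to page memory instead of Unsafe.allocateMemory ...

        CLib.INSTANCE.munmap(region, size);
    }
}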


--
Sincerely yours, Ivan Daschinskiy

Re: PDS suites fail with exit code 137

Ivan Bessonov
Hello Ivan,

It feels like the problem is more about frequently starting new threads
than about the allocation of offheap regions. Plus, I'd like to see
results soon, and your proposal is a major change to Ignite that can't be
implemented quickly enough.

Anyway, I think it makes sense, considering that one day Unsafe will be
removed. But I wouldn't think about it right now, maybe as a separate
proposal...

--
Sincerely yours,
Ivan Bessonov

Re: PDS suites fail with exit code 137

Ivan Daschinsky
AFAIK, the glibc allocator uses arenas to minimize contention between
threads when they try to allocate or free memory from the preallocated
pools. But it seems that we use -XX:+AlwaysPreTouch, so the heap is
allocated and committed at start time, and we allocate memory for durable
memory in a single thread. So I don't think there will be much contention
between threads for the native memory pools.

Also, there is another approach -- try jemalloc. That allocator shows
better results (in terms of memory consumption) than the default glibc
malloc in scenarios like ours [1].

[1] --
http://ithare.com/testing-memory-allocators-ptmalloc2-tcmalloc-hoard-jemalloc-while-trying-to-simulate-real-world-loads/
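
For example, preloading it for a forked test JVM would look roughly like
this (the library path is an assumption that depends on how jemalloc is
installed on the agent, and the main class is a placeholder):

public class LaunchWithJemalloc {
    public static void main(String[] args) throws Exception {
        // Hypothetical suite entry point, for illustration only.
        ProcessBuilder pb = new ProcessBuilder("java", "org.example.SuiteRunner");

        // LD_PRELOAD swaps the allocator for the forked JVM only.
        pb.environment().put("LD_PRELOAD",
            "/usr/lib/x86_64-linux-gnu/libjemalloc.so.2");
        pb.inheritIO();

        System.exit(pb.start().waitFor());
    }
}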




--
Sincerely yours, Ivan Daschinskiy

Re: PDS suites fail with exit code 137

Ivan Bessonov
> glibc allocator uses arenas to minimize contention between threads

I understand it the same way. I tested it by running the Indexing suite
locally and periodically executing "pmap <pid>"; it showed that the number
of 64 MB arenas grows constantly and never shrinks. By the middle of the
suite the amount of virtual memory was close to 50 GB and the used
physical memory was at least 6-7 GB, if I recall correctly. I only have
8 cores, by the way, so it should be worse on TC. It means that there is
enough contention somewhere in the tests.
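
For anyone who wants to repeat the observation without eyeballing pmap
output, here is a rough sketch (my quick illustration, not what I
actually ran) that counts "arena-like" anonymous mappings by parsing
/proc/<pid>/maps:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ArenaCounter {
    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "self";
        List<String> lines = Files.readAllLines(Paths.get("/proc/" + pid + "/maps"));

        long arenaLike = 0;
        for (String line : lines) {
            // Each line starts with "start-end perms ...", addresses in hex.
            String[] range = line.split("\\s+")[0].split("-");
            long size = Long.parseUnsignedLong(range[1], 16)
                - Long.parseUnsignedLong(range[0], 16);

            // glibc arenas reserve up to 64 MB; this crude filter also catches
            // other large mappings, so treat the count as an upper bound.
            if (size >= 32L * 1024 * 1024 && size <= 64L * 1024 * 1024)
                arenaLike++;
        }

        System.out.println("Arena-like mappings: " + arenaLike);
    }
}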

About jemalloc - it's also an option, but it also requires reconfiguring
the suites on TC, maybe in a more complicated way. It requires an
additional installation, right? Can we stick to the solution that I
already tested, or should we update the TC agents? :)


--
Sincerely yours,
Ivan Bessonov

Re: PDS suites fail with exit code 137

Ivan Daschinsky
> About jemalloc - it's also an option, but it also requires reconfiguring
> the suites on TC, maybe in a more complicated way. It requires an
> additional installation, right? Can we stick to the solution that I
> already tested, or should we update the TC agents? :)

Yes, if you want to use jemalloc, you have to install it and configure a
specific env variable. This is just an option to consider, nothing more.
I suppose your approach may be the best variant right now.



--
Sincerely yours, Ivan Daschinskiy

Re: PDS suites fail with exit code 137

Ivan Pavlukhin
Ivan B.,

I noticed that you were able to configure environment variables for
PDS (Indexing). Do field experiments show that the suggested approach
fixes the problem?

Interesting stuff with jemalloc. It might be useful to file a ticket.


--

Best regards,
Ivan Pavlukhin

Re: PDS suites fail with exit code 137

Ivan Bessonov
Hi Ivan P.,

I configured it for both PDS (Indexing) and PDS 4 (Nikita Tolstunov asked
for the latter). It totally worked: not a single 137 since then. The
occasional 130 will be fixed in [1]; it has a different problem behind it.

Now I'm trying to find someone who knows the TC configuration better and
can propagate the setting to all suites. Also, I don't have access to the
agents, so jemalloc is definitely not an option for me specifically.

[1] https://issues.apache.org/jira/browse/IGNITE-13266


--
Sincerely yours,
Ivan Bessonov

Re: PDS suites fail with exit code 137

Ivan Pavlukhin
Ivan B.,

Good news, thank you!


--

Best regards,
Ivan Pavlukhin