Re: nodes are restarting when i try to drop a table created with persistence enabled

Re: nodes are restarting when i try to drop a table created with persistence enabled

dmagda
Shiva,

Does this issue still exist? Ignite Dev, how do we debug this sort of thing?

-
Denis


On Tue, Sep 17, 2019 at 7:22 AM Shiva Kumar <[hidden email]>
wrote:

> Hi dmagda,
>
> I am trying to drop a table which has around 10 million records, and I am
> seeing "*Out of memory in data region*" error messages in the Ignite logs;
> the Ignite node [an Ignite pod on Kubernetes] is restarting.
> I have configured 3 GB for the default data region, 7 GB for the JVM, and
> 15 GB in total for the Ignite container, with native persistence enabled.
> Earlier I was under the impression that the restart was caused by
> "*SYSTEM_WORKER_BLOCKED*" errors, but now I realize that
> "*SYSTEM_WORKER_BLOCKED*" is on the ignored-failure list and the actual
> cause is "*CRITICAL_ERROR*" due to "*Out of memory in data region*".
>
> These are the error messages in the logs:
>
> ""[2019-09-17T08:25:35,054][ERROR][sys-#773][] *JVM will be halted
> immediately due to the failure: [failureCtx=FailureContext
> [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException:
> Failed to find a page for eviction* [segmentCapacity=971652,
> loaded=381157, maxDirtyPages=285868, dirtyPages=381157, cpPages=0,
> pinnedInSegment=3, failedToPrepare=381155]
> *Out of memory in data region* [name=Default_Region, initSize=500.0 MiB,
> maxSize=3.0 GiB, persistenceEnabled=true] Try the following:
>   ^-- Increase maximum off-heap memory size
> (DataRegionConfiguration.maxSize)
>   ^-- Enable Ignite persistence
> (DataRegionConfiguration.persistenceEnabled)
>   ^-- Enable eviction or expiration policies]]
>
> Could you please help me understand why the *drop table* operation causes
> "*Out of memory in data region*", and how I can avoid it?
>
> We have a use case where an application inserts records into many tables in
> Ignite simultaneously for some time period, and other applications run
> queries on that time period's data and update a dashboard. We need to delete
> the records inserted in the previous time period before inserting new
> records.
>
> Even during a *delete from table* operation, I have seen:
>
> "Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [*type=CRITICAL_ERROR*, err=class o.a.i.IgniteException: *Checkpoint read lock acquisition has been timed out*.]] class org.apache.ignite.IgniteException: Checkpoint read lock acquisition has been timed out.|
>
>
>
> On Mon, Apr 29, 2019 at 12:17 PM Denis Magda <[hidden email]> wrote:
>
>> Hi Shiva,
>>
>> That was designed to prevent global cluster performance degradation or
>> other outages. Have you tried to apply my recommendation of turning off
>> the failure handler for these system threads?
>>
>> -
>> Denis
>>
>>
>> On Sun, Apr 28, 2019 at 10:28 AM shivakumar <[hidden email]>
>> wrote:
>>
>>> HI Denis,
>>>
>>> Is there any specific reason for the blocking of a critical thread, such
>>> as the CPU or the heap being full?
>>> We are hitting this issue again and again.
>>> Is there any other way to drop tables/caches?
>>> This looks like a critical issue.
>>>
>>> regards,
>>> shiva
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>>
>>
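For reference, the log's first hint (increase DataRegionConfiguration.maxSize) and the ignored-failure-type setting discussed above can be sketched in Java roughly as follows. This is only a sketch: the region name and initial size mirror the log excerpt, while the 6 GiB limit is an illustrative value, not a tuned recommendation.

```java
import java.util.Collections;

import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.FailureType;
import org.apache.ignite.failure.StopNodeOrHaltFailureHandler;

public class NodeConfig {
    public static IgniteConfiguration build() {
        // Default region from the log excerpt, with maxSize raised from 3 GiB.
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("Default_Region")
            .setInitialSize(500L * 1024 * 1024)   // 500 MiB
            .setMaxSize(6L * 1024 * 1024 * 1024)  // 6 GiB: illustrative, not tuned
            .setPersistenceEnabled(true);

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region);

        // Matches the handler shown in the log: SYSTEM_WORKER_BLOCKED does not
        // stop the node, while CRITICAL_ERROR still does.
        StopNodeOrHaltFailureHandler handler = new StopNodeOrHaltFailureHandler();
        handler.setIgnoredFailureTypes(
            Collections.singleton(FailureType.SYSTEM_WORKER_BLOCKED));

        return new IgniteConfiguration()
            .setDataStorageConfiguration(storage)
            .setFailureHandler(handler);
    }
}
```

Note that raising maxSize only delays the problem if the dirty-page ratio is the real cause; it does not by itself explain why DROP TABLE dirties so many pages.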

Re: nodes are restarting when i try to drop a table created with persistence enabled

Denis Mekhanikov
I think the issue is that Ignite can't recover from an
IgniteOutOfMemoryException, even by removing data.
Shiva, did the IgniteOutOfMemoryException occur for the first time when you
ran DROP TABLE, or before that?

Denis

On Wed, Sep 25, 2019 at 02:30, Denis Magda <[hidden email]> wrote:

> Shiva,
>
> Does this issue still exist? Ignite Dev, how do we debug this sort of thing?

Re: nodes are restarting when i try to drop a table created with persistence enabled

Ivan Pavlukhin
Hi,

The stack trace and exception message have some valuable details:
org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Failed to
find a page for eviction [segmentCapacity=126515, loaded=49628,
maxDirtyPages=37221, dirtyPages=49627, cpPages=0, pinnedInSegment=1,
failedToPrepare=49628]

I see the following:
1. Not all of the data fits in the data region's memory.
2. The exception occurs when the underlying cache is destroyed
(the IgniteCacheOffheapManagerImpl.stopCache/removeCacheData call in the
stack trace).
3. A page to replace to disk was not found (loaded=49628,
failedToPrepare=49628); almost all pages are dirty (dirtyPages=49627).

Answering several questions can help:
1. Does the same occur if IgniteCache.destroy() is called instead of DROP TABLE?
2. Does the same occur if SQL is not enabled for the cache?
3. It would be nice to see the IgniteConfiguration and CacheConfiguration
that cause the problem.
4. We need to figure out why almost all pages are dirty. It might be a clue.
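Question 1 above could be tried with a sketch like the following. The config path and table name are placeholders; it assumes the default naming for SQL-created tables (a backing cache called SQL_PUBLIC_<TABLE_NAME>), which holds only when no CACHE_NAME was given in the CREATE TABLE statement.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class DropViaCacheApi {
    public static void main(String[] args) {
        // Start (or connect to) a node with the configuration under test;
        // "ignite-config.xml" is a placeholder path.
        try (Ignite ignite = Ignition.start("ignite-config.xml")) {
            // A table created as MYTABLE via SQL is typically backed by a
            // cache named SQL_PUBLIC_MYTABLE, unless CACHE_NAME was set in
            // the CREATE TABLE ... WITH "..." clause.
            IgniteCache<?, ?> cache = ignite.cache("SQL_PUBLIC_MYTABLE");

            if (cache != null)
                cache.destroy(); // instead of: DROP TABLE MYTABLE
        }
    }
}
```

If destroying the cache directly triggers the same IgniteOutOfMemoryException, that would point at cache destruction itself rather than the SQL layer.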