Apache Ignite Developers - Legacy Mail Archive

Ignite pod keeps crashing and failed to recover the node


radha

Ignite pod keeps crashing and failed to recover the node

Ignite has been deployed on Kubernetes with 3 replicas of the server pod. The pods were up and running fine for 9 days. We have created 180 inventory tables and 204 transactional tables. The data has been inserted with the PyIgnite client using the cache.put() method. This is a very slow operation: each insert is committed one at a time, so we are not able to do bulk-style inserts. PyIgnite was loading about 20 of the inventory tables simultaneously (20 different threads/processes).

After 9 days the cluster was no longer stable: one of the pods crashed and failed to recover. Below is the error log:

{"type":"log","host":"ignite-cluster-ignite-esoc-2","level":"ERROR","system":"ignite-service","time":"2019-08-16T17:13:34,769Z","logger":"GridCachePartitionExchangeManager","timezone":"UTC","log":"Failed to process custom exchange task: ClientCacheChangeDummyDiscoveryMessage [reqId=6b5f6c50-a8c9-4b04-a461-49bfd0112eb0, cachesToClose=null, startCaches=[BgwService]] java.lang.NullPointerException| at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processClientCachesChanges(CacheAffinitySharedManager.java:635)| at org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCustomExchangeTask(GridCacheProcessor.java:391)| at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.processCustomTask(GridCachePartitionExchangeManager.java:2475)| at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2620)| at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2539)| at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)| at java.lang.Thread.run(Thread.java:748)"}
{"type":"log","host":"ignite-cluster-ignite-esoc-2","level":"WARN","system":"ignite-service","time":"2019-08-16T17:13:36,724Z","logger":"GridCacheDatabaseSharedManager","timezone":"UTC","log":"Ignite node stopped in the middle of checkpoint. Will restore memory state and finish checkpoint on node start."}

The error report file and ignite-config.xml have been attached for your reference.

The heap memory and RAM configuration on each Ignite server container is as follows:

Heap Memory: 32GB

RAM: 64GB

Default memory region:

cpu: 4

Persistence volume

wal_storage_size: 10GB

persistence_storage_size: 10GB

Thanks

With Regards

Radha

ignite-config.xml (8K) Download Attachment

dmagda

Re: Ignite pod keeps crashing and failed to recover the node

Hello,

As far as I can see, community members have stepped in and are ready to help
with this problem in this discussion:
http://apache-ignite-users.70518.x6.nabble.com/One-of-Ignite-pod-keeps-crashing-and-not-joining-the-cluster-td29091.html#a29105

Please also check out this response, which clarifies why the method you use
for data loading is not optimal:
https://stackoverflow.com/questions/56778778/apache-ignite-inserts-extremely-slow/56795152#56795152
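
To illustrate the general idea of replacing per-entry cache.put() calls with
fewer, larger writes, here is a minimal sketch. It is not taken from the
linked answer. It assumes the pyignite thin client, whose cache objects offer
a put_all() method that accepts a dict. The helper names (batched,
load_in_batches, run_against_cluster), the cache name, the host, the batch
size of 1000, and the sample rows are all hypothetical and only there for
illustration:

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, Tuple


def batched(pairs: Iterable[Tuple[Any, Any]], batch_size: int) -> Iterator[Dict[Any, Any]]:
    """Yield dicts holding at most batch_size key/value pairs each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(pairs)
    while True:
        chunk = dict(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def load_in_batches(cache: Any, pairs: Iterable[Tuple[Any, Any]], batch_size: int = 1000) -> int:
    """Write pairs through cache.put_all() in batches; return the number of calls made."""
    calls = 0
    for chunk in batched(pairs, batch_size):
        cache.put_all(chunk)  # one request per batch instead of one per entry
        calls += 1
    return calls


def run_against_cluster() -> None:
    """Example wiring; needs the pyignite package and a reachable Ignite node."""
    from pyignite import Client

    client = Client()
    client.connect("127.0.0.1", 10800)  # assumed host; 10800 is the default thin-client port
    try:
        cache = client.get_or_create_cache("inventory_demo")  # hypothetical cache name
        rows = ((i, f"item-{i}") for i in range(10_000))  # made-up sample data
        print(load_in_batches(cache, rows, batch_size=1000))
    finally:
        client.close()
```

Whether batching alone is enough depends on the cluster and the data. The
batch size in particular is a guess and would need tuning against the
memory and persistence settings listed in the original message.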

-
Denis

On Tue, Aug 20, 2019 at 10:25 AM radha jai <[hidden email]> wrote:

> Ignite has been deployed on the kubernets , there are 3 replicas of server
> pod. The pods were up and running fine for 9 days. We have created 180
> inventory tables and 204 transactional tables. The data has been
> inserted using the PyIgnite client using the cache.put() method. This is a
> very slow operation because PyIgnite is very slow. Each insert is
> committed one at a time, so it is not able to do bulk-style inserts. The
> PyIgnite was inserting about 20 of the inventory tables simultaneously (20
> different threads/processes).
>
> The cluster was nowhere stable after 9days, one of the pod crashed and
> failed to recover. Below is the error log:
> {"type":"log","host":"ignite-cluster-ignite-esoc-2","level":"ERROR","system":"ignite-service","time":"2019-08-16T17:13:34,769Z","logger":"GridCachePartitionExchangeManager","timezone":"UTC","log":"Failed
> to process custom exchange task: ClientCacheChangeDummyDiscoveryMessage
> [reqId=6b5f6c50-a8c9-4b04-a461-49bfd0112eb0, cachesToClose=null,
> startCaches=[BgwService]] java.lang.NullPointerException| at
> org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processClientCachesChanges(CacheAffinitySharedManager.java:635)|
> at
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCustomExchangeTask(GridCacheProcessor.java:391)|
> at
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.processCustomTask(GridCachePartitionExchangeManager.java:2475)|
> at
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2620)|
> at
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2539)|
> at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)|
> at java.lang.Thread.run(Thread.java:748)"}
> {"type":"log","host":"ignite-cluster-ignite-esoc-2","level":"WARN","system":"ignite-service","time":"2019-08-16T17:13:36,724Z","logger":"GridCacheDatabaseSharedManager","timezone":"UTC","log":"Ignite
> node stopped in the middle of checkpoint. Will restore memory state and
> finish checkpoint on node start."}
>
> The error report file and ignite-config.xml has been attached for your
> info.
>
> Heap Memory and RAM Configurations are as below on each of the ignite
> server container:
>
> Heap Memory: 32gb
>
> RAM: 64GB
>
> Default memory region:
>
> cpu: 4
>
> Persistence volume
>
> wal_storage_size: 10GB
>
> persistence_storage_size: 10GB
>
>
> Thanks
>
> With Regards
>
> Radha
>