Apache Ignite Developers - Legacy Mail Archive

[jira] [Created] (IGNITE-3558) Affinity task hangs when Collision SPI produces a lot of job rejections & Failover SPI produces many attempts

Anton Vinogradov (Jira)

Taras Ledkov created IGNITE-3558:
------------------------------------

Summary: Affinity task hangs when Collision SPI produces a lot of job rejections & Failover SPI produces many attempts
Key: IGNITE-3558
URL: https://issues.apache.org/jira/browse/IGNITE-3558
Project: Ignite
Issue Type: Bug
Components: compute
Reporter: Taras Ledkov
Assignee: Taras Ledkov

The test to reproduce:
IgniteCacheLockPartitionOnAffinityRunWithCollisionSpiTest#testJobFinishing

*Root cause*
GridJobExecuteResponse isn't sent from the target node because GridJobWorker instances get mixed up in the CollisionContext.

*Suggestion*
The method GridJobProcessor.CollisionJobContext.cancel()
uses passiveJobs.remove(jobWorker.getJobId(), jobWorker).
*passiveJobs* is a ConcurrentHashMap, and GridJobWorker.equals() is implemented as a comparison of jobId.

So, when two threads try to cancel two workers with *the same jobId*, we get the following case:
- thread0: removes jobWorker0 and cancels jobWorker0.
- thread0: puts jobWorker1 (because jobWorker0 has already been removed).
- thread1: holds a reference to jobWorker0 and tries to cancel it.
- thread1: removes jobWorker1 instead of jobWorker0 (because only jobId is used to identify a worker).
- thread1: doesn't send ExecuteResponse because jobWorker0 has already been canceled.
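
The sequence above can be sketched in isolation. This is a minimal, single-threaded replay of the interleaving, not Ignite code: the Worker class is a hypothetical stand-in for GridJobWorker that only mirrors its jobId-based equals(), and the map stands in for passiveJobs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class PassiveJobsRemoveSketch {
    /** Hypothetical stand-in for GridJobWorker: equality is based on jobId only. */
    static final class Worker {
        final String jobId;
        final String name;

        Worker(String jobId, String name) {
            this.jobId = jobId;
            this.name = name;
        }

        @Override public boolean equals(Object o) {
            return o instanceof Worker && jobId.equals(((Worker) o).jobId);
        }

        @Override public int hashCode() {
            return jobId.hashCode();
        }
    }

    /** Returns true if a stale reference to worker0 removes worker1 from the map. */
    static boolean staleRemoveEvictsNewWorker() {
        ConcurrentMap<String, Worker> passiveJobs = new ConcurrentHashMap<>();
        Worker worker0 = new Worker("job-1", "worker0");
        Worker worker1 = new Worker("job-1", "worker1");

        passiveJobs.put(worker0.jobId, worker0);

        // "thread0": removes worker0, then the same job is registered again as worker1.
        passiveJobs.remove(worker0.jobId, worker0);
        passiveJobs.put(worker1.jobId, worker1);

        // "thread1": still holds worker0 and tries to remove it.
        // remove(key, value) compares values with equals(), so worker1 matches.
        boolean removed = passiveJobs.remove(worker0.jobId, worker0);

        return removed && passiveJobs.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("stale remove evicted the new worker: " + staleRemoveEvictsNewWorker());
    }
}
```

ConcurrentHashMap.remove(key, value) removes the entry only if the mapped value equals() the given one, which is why a jobId-based equals() lets the stale reference match the newer worker.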

*Proposal*
Try to use the system default equals() (reference identity) for GridJobWorker.
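
A rough sketch of the effect of this proposal, under the assumption that nothing else relies on jobId-based equality: with the default Object.equals(), a conditional remove with a stale worker reference no longer matches a newer worker registered under the same jobId. The Worker class and the map are hypothetical stand-ins, not Ignite code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class IdentityEqualsSketch {
    /** Hypothetical stand-in for GridJobWorker that keeps the default Object.equals(). */
    static final class Worker {
        final String jobId;

        Worker(String jobId) {
            this.jobId = jobId;
        }
        // No equals()/hashCode() override: two workers are equal only if they are the same object.
    }

    /** Returns true if the stale remove is rejected and the newer worker stays registered. */
    static boolean staleRemoveIsRejected() {
        ConcurrentMap<String, Worker> passiveJobs = new ConcurrentHashMap<>();
        Worker worker0 = new Worker("job-1");
        Worker worker1 = new Worker("job-1");

        passiveJobs.put(worker0.jobId, worker0);

        // "thread0": removes worker0, then the same job is registered again as worker1.
        passiveJobs.remove(worker0.jobId, worker0);
        passiveJobs.put(worker1.jobId, worker1);

        // "thread1": the stale reference no longer matches, so nothing is removed.
        boolean removed = passiveJobs.remove(worker0.jobId, worker0);

        return !removed && passiveJobs.get("job-1") == worker1;
    }

    public static void main(String[] args) {
        System.out.println("stale remove rejected: " + staleRemoveIsRejected());
    }
}
```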

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)