Ignite Usability: Deadlocks and Starvation

Ignite Usability: Deadlocks and Starvation

yzhdanov
Hello, guys!

Currently, Ignite clusters are very vulnerable to deadlocks at the
transaction and Java levels, and also to thread pool starvation.

Unfortunately, Ignite currently offers very little functionality to protect
against and resolve situations of this kind.

Here is the page with the info on this -
https://cwiki.apache.org/confluence/display/IGNITE/Deadlock+Detection+And+Cluster+Protection

Please provide comments here on the dev list or on the wiki. Then we will
file tickets to plan these changes.

--Yakov

Re: Ignite Usability: Deadlocks and Starvation

Andrey Kuznetsov
Hi folks,

Strictly speaking, this is not about improving deadlock/starvation detection
mechanisms, but how do we handle fragile concurrent code that can
(potentially) lead to issues of this kind? For example, what if a method is
assumed to be thread-safe, but its safety depends on the arguments supplied?

2017-08-23 19:44 GMT+03:00 Yakov Zhdanov <[hidden email]>:

> Hello, guys!
>
> Currently, Ignite clusters are very vulnerable to deadlocks at the
> transaction and Java levels, and also to thread pool starvation.
>
> Unfortunately, Ignite currently offers very little functionality to protect
> against and resolve situations of this kind.
>
> Here is the page with the info on this -
> https://cwiki.apache.org/confluence/display/IGNITE/Deadlock+Detection+And+Cluster+Protection
>
> Please provide comments here on the dev list or on the wiki. Then we will
> file tickets to plan these changes.
>
> --Yakov
>



--
Best regards,
  Andrey Kuznetsov.

Re: Ignite Usability: Deadlocks and Starvation

yzhdanov
Andrey, do you have an example of such a method in Ignite's public or
private API?

If it is not in the Ignite API, please provide an example.

--
Yakov Zhdanov

Re: Ignite Usability: Deadlocks and Starvation

Andrey Kuznetsov
It's not public API, just an implementation detail.

GridFutureAdapter::unregisterWaiter is not thread-safe in general. It
won't work properly if the argument is not the current thread. Also, I
couldn't clearly prove that it wouldn't drop some unrelated Node during a
concurrent operation.

2017-08-23 22:45 GMT+03:00 Yakov Zhdanov <[hidden email]>:

> Andrey, do you have an example of such a method in Ignite's public or
> private API?
>
> If it is not in the Ignite API, please provide an example.
>
> --
> Yakov Zhdanov
>



--
Best regards,
  Andrey Kuznetsov.

Re: Ignite Usability: Deadlocks and Starvation

dsetrakyan
In reply to this post by yzhdanov
Yakov,

I think as a first step, the deadlock detection should kick off after a
certain timeout, even if the transaction timeout was not set.
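
A minimal sketch of the "check after a timeout" idea, for JVM-level lock deadlocks only. It does not use or represent Ignite's transaction deadlock detection; the class name `DeadlockWatchdogSketch`, its methods, and the 500 ms timeout are assumptions made up for illustration. The only real API relied on is the JDK's `ThreadMXBean.findDeadlockedThreads()`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration, not Ignite code.
final class DeadlockWatchdogSketch {
    /**
     * Waits up to timeoutMs for the worker. Only if it is still running
     * after the timeout does the (relatively expensive) deadlock check run.
     * Returns the ids of deadlocked threads, or an empty array.
     */
    static long[] checkAfterTimeout(Thread worker, long timeoutMs) {
        try {
            worker.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new long[0];
        }
        if (!worker.isAlive())
            return new long[0]; // finished in time, nothing to detect

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
        return ids == null ? new long[0] : ids;
    }

    /** Creates a thread that takes 'first', waits for its peer, then takes 'second'. */
    private static Thread lockInOrder(ReentrantLock first, ReentrantLock second,
                                      CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            try {
                first.lockInterruptibly();
                try {
                    bothHoldFirst.countDown();
                    bothHoldFirst.await();      // both threads now hold one lock each
                    second.lockInterruptibly(); // blocks forever: opposite lock order
                    second.unlock();
                } finally {
                    first.unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // broken out of the deadlock
            }
        });
        t.setDaemon(true);
        return t;
    }

    /** Builds a two-thread lock-order deadlock and reports what the watchdog sees. */
    static long[] demo() {
        ReentrantLock a = new ReentrantLock();
        ReentrantLock b = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread t1 = lockInOrder(a, b, bothHoldFirst);
        Thread t2 = lockInOrder(b, a, bothHoldFirst);
        t1.start();
        t2.start();
        long[] ids = checkAfterTimeout(t1, 500);
        t1.interrupt(); // resolve the deadlock by interrupting the participants
        t2.interrupt();
        return ids;
    }
}
```

The point of the sketch is only the ordering: nothing is checked while the operation completes within the timeout, and detection starts once the timeout expires, whether or not the caller configured one explicitly.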

What do you think?

D.

On Wed, Aug 23, 2017 at 9:44 AM, Yakov Zhdanov <[hidden email]> wrote:

> Hello, guys!
>
> Currently, Ignite clusters are very vulnerable to deadlocks at the
> transaction and Java levels, and also to thread pool starvation.
>
> Unfortunately, Ignite currently offers very little functionality to protect
> against and resolve situations of this kind.
>
> Here is the page with the info on this -
> https://cwiki.apache.org/confluence/display/IGNITE/Deadlock+Detection+And+Cluster+Protection
>
> Please provide comments here on the dev list or on the wiki. Then we will
> file tickets to plan these changes.
>
> --Yakov
>

Re: Ignite Usability: Deadlocks and Starvation

yzhdanov
> I think as a first step, the deadlock detection should kick off after a
> certain timeout, even if the transaction timeout was not set.

> What do you think?

Dmitry, I thought that was what I suggested, no? =)

--Yakov

Re: Ignite Usability: Deadlocks and Starvation

dsetrakyan
On Thu, Aug 24, 2017 at 10:09 AM, Yakov Zhdanov <[hidden email]> wrote:

> > I think as a first step, the deadlock detection should kick off after a
> > certain timeout, even if the transaction timeout was not set.
>
> > What do you think?
>
> Dmitry, I thought that was what I suggested, no? =)
>

I am never sure with you :)

Re: Ignite Usability: Deadlocks and Starvation

yzhdanov
In reply to this post by Andrey Kuznetsov
>It's not public API, just an implementation detail.

>GridFutureAdapter::unregisterWaiter is not thread-safe in general. It
>won't work properly if the argument is not the current thread. Also, I
>couldn't clearly prove that it wouldn't drop some unrelated Node during a
>concurrent operation.

Andrey, it seems the unregisterWaiter() method is supposed to be called
only with the current thread as its parameter.

--Yakov

Re: Ignite Usability: Deadlocks and Starvation

Andrey Kuznetsov
Yakov, it would be good to get rid of the parameter altogether. Strictly
speaking, such tiny things are not bugs, but they can lead to bugs, and I'm
curious how we could improve the situation.
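
To illustrate the kind of change meant here, below is a minimal sketch. It is not the actual GridFutureAdapter implementation: the class `WaiterRegistrySketch`, its queue-based storage, and all method names are hypothetical and chosen only to show the difference between the two signatures.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical illustration, not Ignite code.
final class WaiterRegistrySketch {
    private final ConcurrentLinkedQueue<Thread> waiters = new ConcurrentLinkedQueue<>();

    /** Registers the calling thread as a waiter. */
    void registerWaiter() {
        waiters.add(Thread.currentThread());
    }

    /**
     * Fragile variant: correctness depends on the caller passing its own
     * thread. Nothing stops a caller from removing somebody else's entry.
     */
    boolean unregisterWaiterFragile(Thread waiter) {
        return waiters.remove(waiter);
    }

    /**
     * Variant without the parameter: a thread can only ever remove its own
     * entry, so the "current thread only" contract holds by construction.
     */
    boolean unregisterWaiter() {
        return waiters.remove(Thread.currentThread());
    }

    int waiterCount() {
        return waiters.size();
    }

    /**
     * The calling thread registers, another thread calls unregisterWaiter(),
     * then the calling thread unregisters itself. Returns the waiter count
     * after the foreign call and after the caller's own call.
     */
    static int[] demo() {
        WaiterRegistrySketch r = new WaiterRegistrySketch();
        r.registerWaiter();

        // The other thread is not registered, so its call removes nothing.
        Thread other = new Thread(() -> r.unregisterWaiter());
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        int afterForeignCall = r.waiterCount();

        r.unregisterWaiter();
        return new int[] { afterForeignCall, r.waiterCount() };
    }
}
```

With the parameterless variant the misuse is impossible to express, whereas `unregisterWaiterFragile(someOtherThread)` would compile and silently drop another thread's registration.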


> Andrey, it seems the unregisterWaiter() method is supposed to be called
> only with the current thread as its parameter.
>
> --Yakov
>



--
Best regards,
  Andrey Kuznetsov.

Re: Ignite Usability: Deadlocks and Starvation

yzhdanov
Andrey, feel free to file a ticket and remove the parameter =)

--Yakov