Partition Loss Policies issues

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Partition Loss Policies issues

Stanislav Lukyanov
Hi Igniters,

I've been looking into various scenarios of Partition Loss Policies usage recently,
and found a number of issues in the current implementation.

I'll start with an overview, but if you'd like to dive to a proposal I have right now then please
feel free to scroll down to TLDR.

The list of issues is below:
https://issues.apache.org/jira/browse/IGNITE-10041: Partition loss policies work incorrectly with BLT
https://issues.apache.org/jira/browse/IGNITE-10043: Lost partitions list is reset when only one server is alive in the cluster
https://issues.apache.org/jira/browse/IGNITE-9841: SQL doesn't take lost partitions into account when persistence is enabled
https://issues.apache.org/jira/browse/IGNITE-10057: SQL queries hang during rebalance if there are LOST partitions
https://issues.apache.org/jira/browse/IGNITE-9902: ScanQuery doesn't take lost partitions into account
https://issues.apache.org/jira/browse/IGNITE-10059: Local scan query against LOST partition fails
https://issues.apache.org/jira/browse/IGNITE-10044: LOST partition is marked as OWNING after the owner rejoins with existing persistent data
https://issues.apache.org/jira/browse/IGNITE-10058: resetLostPartitions() leaves an additional copy of a partition in the cluster

I'm sure this is not a complete list, but this is what I could find by tackling how queries and
persistence interact with current handling of partition loss.

It seems that the issues - from this list and some other fixed recently - can be split into three categories
- corner case bugs - there are always some, and we can fix them as they show up
- handling of lost partitions by different APIs - while JCache API handles lost partitions fine,
SQL and Scan queries have known issues; other APIs, such different types of queries, DataStreamer,
etc probably need to have more testing
- Partition Loss Policices + BLT (https://issues.apache.org/jira/browse/IGNITE-10041) - BLT seems
to be fundametally conflicting with the pre-existing semantics of partition loss

While the former two categories can be solved case-by-case, the last one needs a wider design effort.
We need to reimagine our partition loss semantics for BLT, and change behavior accordingly.
For now, most of the features don't really work for a cache with BLT, with only READ_WRITE_SAFE and
READ_ONLY_SAFE working correctly (good thing - these two are the most useful policies anyway).

TLDR: we have issues with partition loss policices, and the largest one is that BLT semantics
conflict with most partition lost policices, and we need to address this somehow.

What I suggest to do right now:
1. Deny the configurations that don't work - e.g. just throw an exception if a cache starts
with BLT and PartitionLossPolicy.IGNORE or others.
2. Change default PartitionLossPolicy to READ_WRITE_SAFE *for persistent caches only*.
This is what effectively in place for the persistent caches already (since IGNORE semantics are
not supported), so this shouldn't bring a lot of compatibility issues.

I believe doing this will at least help us to protect the users from unexpected/inconsistent behavior.
Actual design changes can be done later, e.g. as a part of IEP 4 Phase 2/3.

WDYT?

Thanks,
Stan

Reply | Threaded
Open this post in threaded view
|

Re: Partition Loss Policies issues

dmagda
Hi Stan,

Thanks for the extensive analysis. Alex G. could you please step in and
share your thoughts. Seems it's time to revisit IEP-4 and prioritize the
gaps.

--
Denis

On Wed, Oct 31, 2018 at 7:56 AM Stanislav Lukyanov <[hidden email]>
wrote:

> Hi Igniters,
>
> I've been looking into various scenarios of Partition Loss Policies usage
> recently,
> and found a number of issues in the current implementation.
>
> I'll start with an overview, but if you'd like to dive to a proposal I
> have right now then please
> feel free to scroll down to TLDR.
>
> The list of issues is below:
> https://issues.apache.org/jira/browse/IGNITE-10041: Partition loss
> policies work incorrectly with BLT
> https://issues.apache.org/jira/browse/IGNITE-10043: Lost partitions list
> is reset when only one server is alive in the cluster
> https://issues.apache.org/jira/browse/IGNITE-9841: SQL doesn't take lost
> partitions into account when persistence is enabled
> https://issues.apache.org/jira/browse/IGNITE-10057: SQL queries hang
> during rebalance if there are LOST partitions
> https://issues.apache.org/jira/browse/IGNITE-9902: ScanQuery doesn't take
> lost partitions into account
> https://issues.apache.org/jira/browse/IGNITE-10059: Local scan query
> against LOST partition fails
> https://issues.apache.org/jira/browse/IGNITE-10044: LOST partition is
> marked as OWNING after the owner rejoins with existing persistent data
> https://issues.apache.org/jira/browse/IGNITE-10058: resetLostPartitions()
> leaves an additional copy of a partition in the cluster
>
> I'm sure this is not a complete list, but this is what I could find by
> tackling how queries and
> persistence interact with current handling of partition loss.
>
> It seems that the issues - from this list and some other fixed recently -
> can be split into three categories
> - corner case bugs - there are always some, and we can fix them as they
> show up
> - handling of lost partitions by different APIs - while JCache API handles
> lost partitions fine,
> SQL and Scan queries have known issues; other APIs, such different types
> of queries, DataStreamer,
> etc probably need to have more testing
> - Partition Loss Policices + BLT (
> https://issues.apache.org/jira/browse/IGNITE-10041) - BLT seems
> to be fundametally conflicting with the pre-existing semantics of
> partition loss
>
> While the former two categories can be solved case-by-case, the last one
> needs a wider design effort.
> We need to reimagine our partition loss semantics for BLT, and change
> behavior accordingly.
> For now, most of the features don't really work for a cache with BLT, with
> only READ_WRITE_SAFE and
> READ_ONLY_SAFE working correctly (good thing - these two are the most
> useful policies anyway).
>
> TLDR: we have issues with partition loss policices, and the largest one is
> that BLT semantics
> conflict with most partition lost policices, and we need to address this
> somehow.
>
> What I suggest to do right now:
> 1. Deny the configurations that don't work - e.g. just throw an exception
> if a cache starts
> with BLT and PartitionLossPolicy.IGNORE or others.
> 2. Change default PartitionLossPolicy to READ_WRITE_SAFE *for persistent
> caches only*.
> This is what effectively in place for the persistent caches already (since
> IGNORE semantics are
> not supported), so this shouldn't bring a lot of compatibility issues.
>
> I believe doing this will at least help us to protect the users from
> unexpected/inconsistent behavior.
> Actual design changes can be done later, e.g. as a part of IEP 4 Phase 2/3.
>
> WDYT?
>
> Thanks,
> Stan
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Partition Loss Policies issues

Ivan Pavlukhin
Hi Stan,

Thank you for revealing (at least for me) one more dark corner in
Ignite. As I see from your analysis it looks like a partition loss
policy should be taken into account in many places (persistence,
different query types, different cache modes). And it seems that with
multiple options to configure it overall aspect becomes fragile. Do
you know if every policy is widely used in practice? Perhaps reducing
number of options can simplify things.

By the way, could you point out in what cases we can get LOST
partition with some data for in-memory cache?
сб, 3 нояб. 2018 г. в 00:33, Denis Magda <[hidden email]>:

>
> Hi Stan,
>
> Thanks for the extensive analysis. Alex G. could you please step in and
> share your thoughts. Seems it's time to revisit IEP-4 and prioritize the
> gaps.
>
> --
> Denis
>
> On Wed, Oct 31, 2018 at 7:56 AM Stanislav Lukyanov <[hidden email]>
> wrote:
>
> > Hi Igniters,
> >
> > I've been looking into various scenarios of Partition Loss Policies usage
> > recently,
> > and found a number of issues in the current implementation.
> >
> > I'll start with an overview, but if you'd like to dive to a proposal I
> > have right now then please
> > feel free to scroll down to TLDR.
> >
> > The list of issues is below:
> > https://issues.apache.org/jira/browse/IGNITE-10041: Partition loss
> > policies work incorrectly with BLT
> > https://issues.apache.org/jira/browse/IGNITE-10043: Lost partitions list
> > is reset when only one server is alive in the cluster
> > https://issues.apache.org/jira/browse/IGNITE-9841: SQL doesn't take lost
> > partitions into account when persistence is enabled
> > https://issues.apache.org/jira/browse/IGNITE-10057: SQL queries hang
> > during rebalance if there are LOST partitions
> > https://issues.apache.org/jira/browse/IGNITE-9902: ScanQuery doesn't take
> > lost partitions into account
> > https://issues.apache.org/jira/browse/IGNITE-10059: Local scan query
> > against LOST partition fails
> > https://issues.apache.org/jira/browse/IGNITE-10044: LOST partition is
> > marked as OWNING after the owner rejoins with existing persistent data
> > https://issues.apache.org/jira/browse/IGNITE-10058: resetLostPartitions()
> > leaves an additional copy of a partition in the cluster
> >
> > I'm sure this is not a complete list, but this is what I could find by
> > tackling how queries and
> > persistence interact with current handling of partition loss.
> >
> > It seems that the issues - from this list and some other fixed recently -
> > can be split into three categories
> > - corner case bugs - there are always some, and we can fix them as they
> > show up
> > - handling of lost partitions by different APIs - while JCache API handles
> > lost partitions fine,
> > SQL and Scan queries have known issues; other APIs, such different types
> > of queries, DataStreamer,
> > etc probably need to have more testing
> > - Partition Loss Policices + BLT (
> > https://issues.apache.org/jira/browse/IGNITE-10041) - BLT seems
> > to be fundametally conflicting with the pre-existing semantics of
> > partition loss
> >
> > While the former two categories can be solved case-by-case, the last one
> > needs a wider design effort.
> > We need to reimagine our partition loss semantics for BLT, and change
> > behavior accordingly.
> > For now, most of the features don't really work for a cache with BLT, with
> > only READ_WRITE_SAFE and
> > READ_ONLY_SAFE working correctly (good thing - these two are the most
> > useful policies anyway).
> >
> > TLDR: we have issues with partition loss policices, and the largest one is
> > that BLT semantics
> > conflict with most partition lost policices, and we need to address this
> > somehow.
> >
> > What I suggest to do right now:
> > 1. Deny the configurations that don't work - e.g. just throw an exception
> > if a cache starts
> > with BLT and PartitionLossPolicy.IGNORE or others.
> > 2. Change default PartitionLossPolicy to READ_WRITE_SAFE *for persistent
> > caches only*.
> > This is what effectively in place for the persistent caches already (since
> > IGNORE semantics are
> > not supported), so this shouldn't bring a lot of compatibility issues.
> >
> > I believe doing this will at least help us to protect the users from
> > unexpected/inconsistent behavior.
> > Actual design changes can be done later, e.g. as a part of IEP 4 Phase 2/3.
> >
> > WDYT?
> >
> > Thanks,
> > Stan
> >
> >



--
Best regards,
Ivan Pavlukhin