lost partition recovery with native persistence

lost partition recovery with native persistence

novicr
I was going over failure recovery scenarios, trying to understand the logic
behind the lost partitions functionality.  With native persistence, Ignite
fully manages data persistence and availability.  If enough nodes in the
cluster become unavailable that partitions are marked lost, Ignite keeps
track of those partitions.  When the nodes rejoin the cluster, the partitions
are automatically discovered and loaded from disk.  I can see this because
the data actually becomes available again and can be retrieved using the
normal get/query APIs.  However, the lostPartitions() list still contains
some of the partitions that were previously lost (this seems like a bug),
and Ignite expects the user to manually mark the partitions available by
calling the Ignite.resetLostPartitions() API.

I found some discussion about issues with topology version handling in
resetLostPartitions() in this ticket:  IGNITE-7832
<https://issues.apache.org/jira/browse/IGNITE-7832>, but it does not
address the question of why user involvement is required at all.

It seems there should, at least, be a configuration option that allows a
cache to self-recover once all of its partitions become available.
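
Until such an option exists, the reset has to be driven from user code.  Below
is a minimal sketch of that workaround, not an Ignite feature.  The class
LostPartitionResetter, its method names, and the "expected server node count"
check are all hypothetical and exist only for illustration.  The two callbacks
are assumed to be backed by the real calls ignite.cache(name).lostPartitions()
and ignite.resetLostPartitions(cacheNames); they are passed in as functions so
the sketch compiles without Ignite on the classpath.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Ignite): clears the "lost" state of caches
 * that still report lost partitions, but only once the cluster is back to the
 * expected number of server nodes.
 */
final class LostPartitionResetter {
    /** Assumed wiring: name -> ignite.cache(name).lostPartitions(). */
    private final Function<String, Collection<Integer>> lostPartitions;

    /** Assumed wiring: names -> ignite.resetLostPartitions(names). */
    private final Consumer<Collection<String>> reset;

    LostPartitionResetter(Function<String, Collection<Integer>> lostPartitions,
                          Consumer<Collection<String>> reset) {
        this.lostPartitions = lostPartitions;
        this.reset = reset;
    }

    /**
     * Resets every cache in cacheNames that still reports lost partitions.
     * Does nothing while fewer than expectedServerNodes servers are alive,
     * because the owners of the lost partitions may not have rejoined yet.
     *
     * @return names of the caches that were reset (empty if none).
     */
    List<String> resetIfRecovered(Collection<String> cacheNames,
                                  int liveServerNodes,
                                  int expectedServerNodes) {
        List<String> toReset = new ArrayList<>();

        if (liveServerNodes < expectedServerNodes)
            return toReset;

        for (String name : cacheNames) {
            if (!lostPartitions.apply(name).isEmpty())
                toReset.add(name);
        }

        // One call covering all affected caches.
        if (!toReset.isEmpty())
            reset.accept(toReset);

        return toReset;
    }
}
```

With a running node this could be wired as
new LostPartitionResetter(n -> ignite.cache(n).lostPartitions(),
ignite::resetLostPartitions), passing ignite.cluster().forServers().nodes().size()
as the live node count, and invoked after nodes rejoin.  Comparing node counts
is only a rough stand-in for "all partition owners are back"; a real check
would need to be stricter.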

This email was originally sent to the user group:
lost-partition-recovery-with-native-persistence
<http://apache-ignite-users.70518.x6.nabble.com/lost-partition-recovery-with-native-persistence-td24520.html>  



Thanks,
Roman



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
Re: lost partition recovery with native persistence

novicr
Resending this to bubble it up to the top of the inbox.  It would be good to
hear opinions on the suggested functionality change.

Thanks,
Roman



Re: lost partition recovery with native persistence

Ivan Pavlukhin
Hi Roman,

You could also check whether your problem is mentioned in another
discussion related to lost partitions [1].

[1] http://apache-ignite-developers.2346864.n4.nabble.com/Partition-Loss-Policies-issues-td37304.html
Wed, Nov 28, 2018 at 19:31, novicr <[hidden email]>:

>
> Resending this to bubble up to the top of inbox.  Would be good to hear
> opinions on suggested functionality change.
>
> Thanks,
> Roman



--
Best regards,
Ivan Pavlukhin
Re: lost partition recovery with native persistence should not require resetLostPartitions call

novicr
Ivan,

Thanks for the pointer to the discussion.
It doesn't actually address why resetLostPartitions() is needed in the first
place.  It does point to a ticket that would fix the method's logic when
baseline topology (BLT) is used.  My concern is that Ignite relies on the
user to call this method at all.
