Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Ivan Rakov
Igniters and especially Native Persistence experts,

We decided to change the default WAL mode from DEFAULT (FSYNC) to LOG_ONLY
in the 2.4 release. That was a difficult decision: we sacrificed power loss
/ OS crash tolerance but gained a significant performance boost. From my
perspective, LOG_ONLY is the right choice, but it still lacks some critical
features that a default mode should have.

Let's focus on the exact guarantees each mode provides. The documentation
explains them quite simply: LOG_ONLY - writes survive a process crash;
FSYNC - writes survive power loss. I have to note that the documentation
doesn't describe what exactly can happen to a node in LOG_ONLY mode after
a power loss / OS crash. Basically, there are two possible negative
outcomes: loss of the last several updates (exactly what can happen in
BACKGROUND mode after a process crash) and total storage corruption (not
only the last updates, but all data is lost). I did some quick research
and came to the conclusion that power loss in LOG_ONLY mode can lead to
storage corruption. There are several reasons for this:
1) IgniteWriteAheadLogManager#fsync is kind of broken: it doesn't
perform an actual fsync unless the current WAL mode is FSYNC. We call this
method when we write a checkpoint marker to the WAL. Since the part of the
WAL before the checkpoint marker may not be synced, the "physical" records
that are necessary for crash recovery in the "node stopped in the middle
of a checkpoint" scenario may be corrupted after power loss. If that
happens, we won't be able to recover internal data structures, which means
loss of all data.
2) We don't fsync WAL archive files unless the current WAL mode is FSYNC.
The WAL archive can contain necessary "physical" records as well, which
leads us to the case described above.
3) We do perform fsync on rollover (switch of the current WAL segment) in
all modes, but only when there's enough space to write the switch segment
record - see FileWriteHandle#close. So there's a small chance that we'll
skip the fsync and run into the same case.
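
To make the write-versus-fsync distinction behind these three points
concrete, here is a minimal sketch using plain java.nio. It is not Ignite's
code: WalSyncSketch, append and demo are hypothetical names invented for
illustration. A write() only hands data to the OS page cache, which is
enough to survive a process crash; force() asks the OS to push the data to
the storage device, which is what power loss tolerance needs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration only; this is not Ignite's FileWriteHandle. */
final class WalSyncSketch {
    /** Appends a record and returns the segment size; optionally forces it to the device. */
    static long append(Path segment, byte[] record, boolean sync) throws IOException {
        try (FileChannel ch = FileChannel.open(segment,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(record);

            while (buf.hasRemaining())
                ch.write(buf); // Data is now in the OS page cache: safe against a process crash.

            if (sync)
                ch.force(false); // fsync-like call: file content is pushed to the storage device.

            return ch.size();
        }
    }

    /** Writes an unsynced record followed by a synced "checkpoint marker" to a temp file. */
    static long demo() {
        try {
            Path segment = Files.createTempFile("wal-sketch", ".seg");

            try {
                append(segment, "logical record".getBytes(StandardCharsets.UTF_8), false);

                return append(segment, "checkpoint marker".getBytes(StandardCharsets.UTF_8), true);
            }
            finally {
                Files.deleteIfExists(segment);
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, passing sync = true only at checkpoint markers and segment
switches corresponds to the "extra fsyncs" idea, while passing it on every
append corresponds to FSYNC mode.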

Enforcing fsync in these three situations will guarantee that LOG_ONLY
survives power loss with, at worst, the loss of the last several updates.
There can still be total binary garbage in the last part of the WAL, but
since we perform a CRC check during WAL replay, we'll detect where that
garbage starts. The extra fsyncs may cause slight performance degradation:
all writes will have to wait for one fsync on every rollover and
checkpoint. That's still much faster than an fsync on every WAL write - I
expect a drop of a few percent (0-5%) compared to the current LOG_ONLY.
But degradation is degradation, and LOG_ONLY without the extra fsyncs
makes sense as well - that's why we need to introduce "LOG_ONLY + extra
fsyncs" as a separate WAL mode. I think we should make it the default: it
provides a significant durability bonus for the cost of one extra fsync
for each WAL segment written.
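
The CRC check mentioned above can be sketched as follows. The record layout
([length][payload][CRC32 of payload]) and the names WalCrcSketch, encode,
concat and countValid are assumptions made for this example, not Ignite's
actual WAL format. Replay walks the records in order and stops at the first
one whose checksum does not match, so a garbage tail costs only the last
updates.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical record format: [int length][payload][int crc32(payload)]. */
final class WalCrcSketch {
    /** Serializes one record with a trailing CRC32 of its payload. */
    static byte[] encode(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);

        return ByteBuffer.allocate(8 + payload.length)
            .putInt(payload.length)
            .put(payload)
            .putInt((int) crc.getValue())
            .array();
    }

    /** Concatenates encoded records into one log image. */
    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        for (byte[] p : parts)
            out.write(p, 0, p.length);

        return out.toByteArray();
    }

    /** Returns the number of leading records that pass the CRC check. */
    static int countValid(byte[] log) {
        ByteBuffer buf = ByteBuffer.wrap(log);
        int valid = 0;

        while (buf.remaining() >= 8) {
            int len = buf.getInt();

            if (len <= 0 || len > buf.remaining() - 4)
                break; // Torn or garbage length field: treat as end of the usable log.

            byte[] payload = new byte[len];
            buf.get(payload);

            CRC32 crc = new CRC32();
            crc.update(payload, 0, len);

            if ((int) crc.getValue() != buf.getInt())
                break; // Checksum mismatch: the garbage tail starts here, stop replay.

            valid++;
        }

        return valid;
    }
}
```

Note that the check only helps if everything before the garbage tail really
is on disk, which is exactly what the extra fsyncs are meant to ensure.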

To sum up, I propose a new set of possible WAL modes:
NONE - both process crash and power loss can lead to corruption
BACKGROUND - process crash can lead to loss of the last updates, power
loss can lead to corruption
LOG_ONLY - writes survive process crash, power loss can lead to corruption
LOG_ONLY_SAFE (default) - writes survive process crash, power loss can
lead to loss of the last updates
FSYNC - writes survive both process crash and power loss
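
For reference, the same list can be written down as a small lookup table.
This is only a sketch of the proposal: ProposedWalMode, Outcome and
weakestCorruptionSafe are hypothetical names, and this is not Ignite's
actual WALMode enum (which has no LOG_ONLY_SAFE value).

```java
/** Worst-case result of a failure, as described in the proposal. */
enum Outcome { WRITES_SURVIVE, LAST_UPDATES_LOST, CORRUPTION }

/** Hypothetical encoding of the proposed modes, declared from weakest to strongest. */
enum ProposedWalMode {
    NONE(Outcome.CORRUPTION, Outcome.CORRUPTION),
    BACKGROUND(Outcome.LAST_UPDATES_LOST, Outcome.CORRUPTION),
    LOG_ONLY(Outcome.WRITES_SURVIVE, Outcome.CORRUPTION),
    LOG_ONLY_SAFE(Outcome.WRITES_SURVIVE, Outcome.LAST_UPDATES_LOST),
    FSYNC(Outcome.WRITES_SURVIVE, Outcome.WRITES_SURVIVE);

    final Outcome onProcessCrash;
    final Outcome onPowerLoss;

    ProposedWalMode(Outcome onProcessCrash, Outcome onPowerLoss) {
        this.onProcessCrash = onProcessCrash;
        this.onPowerLoss = onPowerLoss;
    }

    /** First mode in declaration order that cannot corrupt storage in either failure case. */
    static ProposedWalMode weakestCorruptionSafe() {
        for (ProposedWalMode m : values()) {
            if (m.onProcessCrash != Outcome.CORRUPTION && m.onPowerLoss != Outcome.CORRUPTION)
                return m;
        }

        throw new IllegalStateException("No corruption-safe mode");
    }
}
```

With this table, the weakest mode that rules out corruption in both failure
cases is LOG_ONLY_SAFE, which is the argument for making it the default.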

Thoughts?


Best Regards,
Ivan Rakov

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

dsetrakyan
Ivan,

Is there a performance difference between LOG_ONLY and LOG_ONLY_SAFE?

D.

On Thu, Mar 15, 2018 at 4:23 PM, Ivan Rakov <[hidden email]> wrote:

> To sum it up, I propose a new set of possible WAL modes:
> NONE - both process crash and power loss can lead to corruption
> BACKGROUND - process crash can lead to last updates loss, power loss can
> lead to corruption
> LOG_ONLY - writes survive process crash, power loss can lead to corruption
> LOG_ONLY_SAFE (default) - writes survive process crash, power loss can
> lead to last updates loss
> FSYNC - writes survive both process crash and power loss

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Ivan Rakov
It really depends on the hardware and workload pattern. I expect
LOG_ONLY_SAFE to be either equal to LOG_ONLY or a few percent slower.
We'll be able to answer this for sure after implementing the three fixes
and benchmarking.
First of all, let's understand whether the extra durability guarantees
make sense. I think they do: power loss itself is a really unlikely
scenario, but LOG_ONLY_SAFE will make it much less risky. It will
guarantee that all partitions are present after a power loss in the whole
data center, and it will also make rebalancing after a power loss on one
node much faster.

Best Regards,
Ivan Rakov

On 16.03.2018 8:17, Dmitriy Setrakyan wrote:

> Ivan,
>
> Is there a performance difference between LOG_ONLY and LOG_ONLY_SAFE?
>
> D.

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Vladimir Ozerov
In reply to this post by dsetrakyan
Same question. It would be very difficult to explain these two modes to
users. We should do our best to fix LOG_ONLY first. Without these
guarantees there is no reason to keep LOG_ONLY at all; users could simply
use BACKGROUND with a high flush frequency. This is precisely how Cassandra
works.

p.1 - sounds like a bug
p.2 - sounds like a bug as well; hopefully it will not introduce a serious
performance hit unless we write too much data to the WAL, which would mean
that we should work on its optimization (e.g. free list update overhead, no
delta updates, etc.).
p.3 - sounds like a bug as well

On Fri, Mar 16, 2018 at 8:17 AM, Dmitriy Setrakyan <[hidden email]>
wrote:

> Ivan,
>
> Is there a performance difference between LOG_ONLY and LOG_ONLY_SAFE?
>
> D.

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Dmitriy Pavlov
Folks, I do not expect any performance degradation here under high load
because we already do fsync on rollover. So the extra fsyncs will be almost
free. We should do this fsync without holding the CP lock, of course.

(see also point 3:
3) We do perform fsync on rollover (switch of current WAL segment) in all
modes, but only when there's enough space to write switch segment record -
see FileWriteHandle#close. So there's a little chance that we'll skip
fsync and bump into the same case)

++1 from me for changing LOG_ONLY to be safe in all cases
+1 for creating a new mode, 'LOG_ONLY_SAFE'

Fri, Mar 16, 2018 at 10:31, Vladimir Ozerov <[hidden email]> wrote:

> Same question. It would be very difficult to explain these two modes to
> users. We should do our best to fix LOG_ONLY first. Without these
> guarantees there is no reason to keep LOG_ONLY at all; users could simply
> use BACKGROUND with a high flush frequency. This is precisely how
> Cassandra works.
>
> p.1 - sounds like a bug
> p.2 - sounds like a bug as well; hopefully it will not introduce a serious
> performance hit unless we write too much data to the WAL, which would mean
> that we should work on its optimization (e.g. free list update overhead,
> no delta updates, etc.).
> p.3 - sounds like a bug as well

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Ivan Rakov
Vladimir,

Unlike BACKGROUND, LOG_ONLY provides strict write guarantees unless a
power loss has happened.
It seems we need to measure the performance difference to decide whether
we need a separate WAL mode. If the difference is negligible, we'll just
fix these bugs without introducing a new mode; if it is noticeable, we'll
continue the discussion about introducing LOG_ONLY_SAFE.
Does that make sense?

Best Regards,
Ivan Rakov

On 16.03.2018 10:45, Dmitry Pavlov wrote:

> Folks, I do not expect any performance degradation here under high load
> because we already do fsync on rollover. So the extra fsyncs will be almost
> free. We should do this fsync without holding the CP lock, of course.
>
> (see also point 3:
> 3) We do perform fsync on rollover (switch of current WAL segment) in all
> modes, but only when there's enough space to write switch segment record -
> see FileWriteHandle#close. So there's a little chance that we'll skip
> fsync and bump into the same case)
>
> ++1 from me for changing LOG_ONLY to be safe in all cases
> +1 for creating a new mode, 'LOG_ONLY_SAFE'

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

dsetrakyan
On Fri, Mar 16, 2018 at 12:55 AM, Ivan Rakov <[hidden email]> wrote:

> Vladimir,
>
> Unlike BACKGROUND, LOG_ONLY provides strict write guarantees unless a
> power loss has happened.
> It seems we need to measure the performance difference to decide whether
> we need a separate WAL mode. If the difference is negligible, we'll just
> fix these bugs without introducing a new mode; if it is noticeable, we'll
> continue the discussion about introducing LOG_ONLY_SAFE.
> Does that make sense?
>

Yes, this sounds like the right approach.

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Ivan Rakov
Ticket to track changes: https://issues.apache.org/jira/browse/IGNITE-7754

Best Regards,
Ivan Rakov

On 16.03.2018 10:58, Dmitriy Setrakyan wrote:

> Yes, this sounds like the right approach.

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Valentin Kulichenko
Guys,

What do we mean by "data corruption" here? If storage is in a corrupted
state, does it mean that it needs to be completely removed and the cluster
needs to be restarted without data? If so, I'm not sure any mode that
allows corruption makes much sense to me. How am I supposed to use a
database if virtually any failure can end with complete loss of data?

In any case, this definitely should not be the default behavior. If a user
ever switches to a corruption-unsafe mode, there should be a clear warning
about this.

-Val

On Fri, Mar 16, 2018 at 1:06 AM, Ivan Rakov <[hidden email]> wrote:

> Ticket to track changes: https://issues.apache.org/jira/browse/IGNITE-7754
>
> Best Regards,
> Ivan Rakov

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Ivan Rakov
Val,

> If a storage is in
> corrupted state, does it mean that it needs to be completely removed and
> cluster needs to be restarted without data?

Yes, there's a chance that in LOG_ONLY all local data will be lost, but
only in *power loss**/ OS crash* case.
kill -9, JVM crash, death of critical system thread and all other cases
that usually take place are variations of *process crash*. All WAL modes
(except NONE, of course) ensure corruption-safety in case of process crash.

> If so, I'm not sure any mode
> that allows corruption makes much sense to me.
It depends on the performance impact of enforcing power-loss corruption
safety. The price of full protection from power loss is high: FSYNC is way
slower (2-10 times) than other WAL modes. The question is whether
ensuring weaker guarantees (corruption can't happen, but loss of the last
updates can) will affect performance as badly as the strong guarantees do. I'll
share benchmark results soon.
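
To make the three durability levels discussed here concrete, below is a minimal Java sketch. It is not Ignite's WAL code: the class ToyWal, the SyncPolicy enum and its constants are hypothetical names invented for this illustration, and FileChannel.force is assumed to be the JVM-level equivalent of fsync. Very roughly, NEVER corresponds to a mode that leaves everything in the OS page cache, ON_CHECKPOINT to the weaker guarantee (the last updates may be lost, but everything before a checkpoint marker is on disk), and EVERY_RECORD to FSYNC-like behavior.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Toy append-only log. An illustration only, NOT Ignite's WAL implementation. */
public class ToyWal implements AutoCloseable {
    /** Hypothetical sync policies, named for this sketch only. */
    public enum SyncPolicy { NEVER, ON_CHECKPOINT, EVERY_RECORD }

    private final FileChannel ch;
    private final SyncPolicy policy;
    private int forceCalls; // number of fsync-like calls, kept for illustration

    public ToyWal(Path file, SyncPolicy policy) throws IOException {
        this.ch = FileChannel.open(file, StandardOpenOption.CREATE,
            StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        this.policy = policy;
    }

    /** Appends a record. Without force() the bytes only reach the OS page cache. */
    public void append(String rec) throws IOException {
        write(rec);
        if (policy == SyncPolicy.EVERY_RECORD)
            force();
    }

    /** Appends a checkpoint marker; recovery relies on everything written before it. */
    public void checkpointMarker() throws IOException {
        write("CHECKPOINT");
        if (policy != SyncPolicy.NEVER)
            force();
    }

    private void write(String rec) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((rec + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining())
            ch.write(buf);
    }

    private void force() throws IOException {
        ch.force(false); // ask the OS to flush file content to the storage device
        forceCalls++;
    }

    public int forceCalls() {
        return forceCalls;
    }

    @Override public void close() throws IOException {
        ch.close();
    }

    public static void main(String[] args) throws IOException {
        for (SyncPolicy p : SyncPolicy.values()) {
            Path f = Files.createTempFile("toy-wal", ".log");
            try (ToyWal wal = new ToyWal(f, p)) {
                for (int i = 0; i < 100; i++)
                    wal.append("update-" + i);
                wal.checkpointMarker();
                System.out.println(p + ": force() calls = " + wal.forceCalls());
            }
            Files.delete(f);
        }
    }
}
```

In this toy model a process crash loses nothing under any policy, because written bytes already belong to the OS. A power loss under NEVER can lose or tear any part of the file, while under ON_CHECKPOINT only records written after the last marker are at risk.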

Best Regards,
Ivan Rakov

On 20.03.2018 5:09, Valentin Kulichenko wrote:

> Guys,
>
> What do we understand under "data corruption" here? If a storage is in
> corrupted state, does it mean that it needs to be completely removed and
> cluster needs to be restarted without data? If so, I'm not sure any mode
> that allows corruption makes much sense to me. How am I supposed to use a
> database, if virtually any failure can end with complete loss of data?
>
> In any case, this definitely should not be a default behavior. If user ever
> switches to corruption-unsafe mode, there should be a clear warning about
> this.
>
> -Val

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Ivan Rakov
I've attached benchmark results to the JIRA ticket.
We observe a ~7% drop in the "fair" LOG_ONLY_SAFE mode, regardless of whether
WAL compaction is enabled. That's a pretty significant drop: WAL compaction
itself costs only ~3%.

I see two options here:
1) Change LOG_ONLY behavior. That implies we are ready to release
AI 2.5 with a 7% drop.
2) Introduce LOG_ONLY_SAFE, make it the default, and add a release note to
AI 2.5 saying that we added power loss durability in the default mode, but
users may fall back to the previous LOG_ONLY in order to retain performance.

Thoughts?

Best Regards,
Ivan Rakov

On 20.03.2018 16:00, Ivan Rakov wrote:

> Val,
>
>> If a storage is in
>> corrupted state, does it mean that it needs to be completely removed and
>> cluster needs to be restarted without data?
>
> Yes, there's a chance that in LOG_ONLY all local data will be lost,
> but only in *power loss / OS crash* case.
> kill -9, JVM crash, death of critical system thread and all other
> cases that usually take place are variations of *process crash*. All
> WAL modes (except NONE, of course) ensure corruption-safety in case of
> process crash.
>
>> If so, I'm not sure any mode
>> that allows corruption makes much sense to me.
> It depends on performance impact of enforcing power-loss corruption
> safety. Price of full protection from power loss is high - FSYNC is
> way slower (2-10 times) than other WAL modes. The question is whether
> ensuring weaker guarantees (corruption can't happen, but loss of last
> updates can) will affect performance as badly as strong guarantees.
> I'll share benchmark results soon.
>
> Best Regards,
> Ivan Rakov

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Dmitriy Pavlov
Hi, I think option 1 is better. As Val said, any mode that allows corruption
does not make much sense.

Even with the drop Ivan mentioned, this is still a significant performance
boost compared to the old DEFAULT mode (now FSYNC).
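
A quick back-of-the-envelope check of this point, as a hedged Java sketch. It assumes the "2-10 times" slowdown quoted earlier in the thread for FSYNC applies to LOG_ONLY specifically, and that the ~7% drop applies uniformly; the class and method names are made up for illustration, and the real numbers are in the benchmark results attached to the JIRA ticket.

```java
import java.util.Locale;

public class WalDropEstimate {
    /**
     * Throughput of the "fixed" LOG_ONLY relative to FSYNC, given how many
     * times faster the current LOG_ONLY is and the relative drop after the fix.
     */
    public static double relativeToFsync(double speedupOverFsync, double drop) {
        return speedupOverFsync * (1.0 - drop);
    }

    public static void main(String[] args) {
        double drop = 0.07; // ~7% drop reported in the thread

        // Assumption: both ends of the "2-10 times" range apply to LOG_ONLY.
        for (double speedup : new double[] {2.0, 10.0}) {
            System.out.printf(Locale.ROOT, "%.0fx faster than FSYNC before, %.2fx after the fix%n",
                speedup, relativeToFsync(speedup, drop));
        }
    }
}
```

Under these assumptions the fixed mode would still be roughly 1.86 to 9.3 times faster than FSYNC.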

Sincerely,
Dmitriy Pavlov

On Wed, Mar 21, 2018 at 5:56 PM, Ivan Rakov <[hidden email]> wrote:

> I've attached benchmark results to the JIRA ticket.
> We observe ~7% drop in "fair" LOG_ONLY_SAFE mode, independent of WAL
> compaction enabled flag. It's pretty significant drop: WAL compaction
> itself gives only ~3% drop.
>
> I see two options here:
> 1) Change LOG_ONLY behavior. That implies that we'll be ready to release
> AI 2.5 with 7% drop.
> 2) Introduce LOG_ONLY_SAFE, make it default, add release note to AI 2.5
> that we added power loss durability in default mode, but user may
> fallback to previous LOG_ONLY in order to retain performance.
>
> Thoughts?
>
> Best Regards,
> Ivan Rakov

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Vladimir Ozerov
+1 for accepting the drop in LOG_ONLY. 7% is not that much, and it is not
really a drop at all, given that we are fixing a bug: had we implemented it
correctly in the first place, we would never have noticed any "drop".
I do not understand why anyone would want to use the current broken mode.

On Wed, Mar 21, 2018 at 6:11 PM, Dmitry Pavlov <[hidden email]>
wrote:

> Hi, I think option 1 is better. As Val said any mode that allows corruption
> does not make much sense.
>
> What Ivan mentioned here as drop, in relation to old mode DEFAULT (FSYNC
> now), is still significant performance boost.
>
> Sincerely,
> Dmitriy Pavlov

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Alexey Goncharuk
+1 for fixing LOG_ONLY to enforce corruption safety given the provided
performance results.

2018-03-21 18:20 GMT+03:00 Vladimir Ozerov <[hidden email]>:

> +1 for accepting drop in LOG_ONLY. 7% is not that much and not a drop at
> all, provided that we fixing a bug. I.e. should we implement it correctly
> in the first place we would never notice any "drop".
> I do not understand why someone would like to use current broken mode.

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

dmagda
+1 for the fix of LOG_ONLY

On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk <
[hidden email]> wrote:

> +1 for fixing LOG_ONLY to enforce corruption safety given the provided
> performance results.

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Ilya Lantukh
+1 for fixing LOG_ONLY. If the current implementation doesn't protect from data
corruption, it doesn't make sense.

On Wed, Mar 21, 2018 at 10:38 PM, Denis Magda <[hidden email]> wrote:

> +1 for the fix of LOG_ONLY

--
Best regards,
Ilya

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Ivan Rakov
Thanks all!
We seem to have reached a consensus on this issue. I'll just add the
necessary fsyncs under IGNITE-7754.

Best Regards,
Ivan Rakov

On 22.03.2018 15:13, Ilya Lantukh wrote:

> +1 for fixing LOG_ONLY. If current implementation doesn't protect from data
> corruption, it doesn't make sense.
>
> On Wed, Mar 21, 2018 at 10:38 PM, Denis Magda <[hidden email]> wrote:
>
>> +1 for the fix of LOG_ONLY
>>
>> On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk <
>> [hidden email]> wrote:
>>
>>> +1 for fixing LOG_ONLY to enforce corruption safety given the provided
>>> performance results.
>>>
>>> 2018-03-21 18:20 GMT+03:00 Vladimir Ozerov <[hidden email]>:
>>>
>>>> +1 for accepting drop in LOG_ONLY. 7% is not that much and not a drop
>> at
>>>> all, provided that we fixing a bug. I.e. should we implement it
>> correctly
>>>> in the first place we would never notice any "drop".
>>>> I do not understand why someone would like to use current broken mode.
>>>>
>>>> On Wed, Mar 21, 2018 at 6:11 PM, Dmitry Pavlov <[hidden email]>
>>>> wrote:
>>>>
>>>>> Hi, I think option 1 is better. As Val said any mode that allows
>>>> corruption
>>>>> does not make much sense.
>>>>>
>>>>> What Ivan mentioned here as drop, in relation to old mode DEFAULT
>>> (FSYNC
>>>>> now), is still significant perfromance boost.
>>>>>
>>>>> Sincerely,
>>>>> Dmitriy Pavlov
>>>>>
>>>>> Wed, Mar 21, 2018 at 17:56, Ivan Rakov <[hidden email]>:
>>>>>
>>>>>> I've attached benchmark results to the JIRA ticket.
>>>>>> We observe ~7% drop in "fair" LOG_ONLY_SAFE mode, independent of WAL
>>>>>> compaction enabled flag. It's pretty significant drop: WAL compaction
>>>>>> itself gives only ~3% drop.
>>>>>>
>>>>>> I see two options here:
>>>>>> 1) Change LOG_ONLY behavior. That implies that we'll be ready to release
>>>>>> AI 2.5 with 7% drop.
>>>>>> 2) Introduce LOG_ONLY_SAFE, make it default, add release note to AI 2.5
>>>>>> that we added power loss durability in default mode, but user may
>>>>>> fallback to previous LOG_ONLY in order to retain performance.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Best Regards,
>>>>>> Ivan Rakov
>>>>>>
>>>>>> On 20.03.2018 16:00, Ivan Rakov wrote:
>>>>>>> Val,
>>>>>>>
>>>>>>>> If a storage is in
>>>>>>>> corrupted state, does it mean that it needs to be completely removed and
>>>>>>>> cluster needs to be restarted without data?
>>>>>>> Yes, there's a chance that in LOG_ONLY all local data will be lost,
>>>>>>> but only in *power loss / OS crash* case.
>>>>>>> kill -9, JVM crash, death of critical system thread and all other
>>>>>>> cases that usually take place are variations of *process crash*. All
>>>>>>> WAL modes (except NONE, of course) ensure corruption-safety in case of
>>>>>>> process crash.
>>>>>>>
>>>>>>>> If so, I'm not sure any mode
>>>>>>>> that allows corruption makes much sense to me.
>>>>>>> It depends on performance impact of enforcing power-loss corruption
>>>>>>> safety. Price of full protection from power loss is high - FSYNC is
>>>>>>> way slower (2-10 times) than other WAL modes. The question is whether
>>>>>>> ensuring weaker guarantees (corruption can't happen, but loss of last
>>>>>>> updates can) will affect performance as badly as strong guarantees.
>>>>>>> I'll share benchmark results soon.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Ivan Rakov
>>>>>>>
>>>>>>> On 20.03.2018 5:09, Valentin Kulichenko wrote:
>>>>>>>> Guys,
>>>>>>>>
>>>>>>>> What do we understand under "data corruption" here? If a storage is in
>>>>>>>> corrupted state, does it mean that it needs to be completely removed and
>>>>>>>> cluster needs to be restarted without data? If so, I'm not sure any mode
>>>>>>>> that allows corruption makes much sense to me. How am I supposed to use a
>>>>>>>> database, if virtually any failure can end with complete loss of data?
>>>>>>>> In any case, this definitely should not be a default behavior. If user ever
>>>>>>>> switches to corruption-unsafe mode, there should be a clear warning about
>>>>>>>> this.
>>>>>>>>
>>>>>>>> -Val
>>>>>>>>
>>>>>>>> On Fri, Mar 16, 2018 at 1:06 AM, Ivan Rakov <[hidden email]> wrote:
>>>>>>>>
>>>>>>>>> Ticket to track changes:
>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-7754
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Ivan Rakov
>>>>>>>>>
>>>>>>>>> On 16.03.2018 10:58, Dmitriy Setrakyan wrote:
>>>>>>>>>
>>>>>>>>>> On Fri, Mar 16, 2018 at 12:55 AM, Ivan Rakov <[hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Vladimir,
>>>>>>>>>>> Unlike BACKGROUND, LOG_ONLY provides strict write guarantees unless power
>>>>>>>>>>> loss has happened.
>>>>>>>>>>> Seems like we need to measure performance difference to decide whether
>>>>>>>>>>> we need separate WAL mode. If it will be invisible, we'll just fix these
>>>>>>>>>>> bugs without introducing new mode; if it will be perceptible, we'll
>>>>>>>>>>> continue the discussion about introducing LOG_ONLY_SAFE.
>>>>>>>>>>> Makes sense?
>>>>>>>>>>
>>>>>>>>>> Yes, this sounds like the right approach.


Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

dmagda
Ivan,

How quickly are you going to merge the fix into master? Many
persistence-related optimizations have already stacked up. We can probably
release them sooner if the community agrees.

--
Denis

On Thu, Mar 22, 2018 at 5:22 AM, Ivan Rakov <[hidden email]> wrote:

> Thanks all!
> We seem to have reached a consensus on this issue. I'll just add necessary
> fsyncs under IGNITE-7754.
>
> Best Regards,
> Ivan Rakov
>
>
> On 22.03.2018 15:13, Ilya Lantukh wrote:
>
>> +1 for fixing LOG_ONLY. If current implementation doesn't protect from data
>> corruption, it doesn't make sense.
>>
>> On Wed, Mar 21, 2018 at 10:38 PM, Denis Magda <[hidden email]> wrote:
>>
>>> +1 for the fix of LOG_ONLY
>>>
>>> On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk <[hidden email]> wrote:
>>>
>>>> +1 for fixing LOG_ONLY to enforce corruption safety given the provided
>>>> performance results.
>>>>
>>>> 2018-03-21 18:20 GMT+03:00 Vladimir Ozerov <[hidden email]>:
>>>>
>>>>> +1 for accepting drop in LOG_ONLY. 7% is not that much and not a drop at
>>>>> all, provided that we are fixing a bug. I.e. had we implemented it correctly
>>>>> in the first place, we would never have noticed any "drop".
>>>>> I do not understand why someone would like to use current broken mode.
>>>>>
>>>>> On Wed, Mar 21, 2018 at 6:11 PM, Dmitry Pavlov <[hidden email]> wrote:
>>>>>
>>>>>> Hi, I think option 1 is better. As Val said any mode that allows corruption
>>>>>> does not make much sense.
>>>>>>
>>>>>> What Ivan mentioned here as drop, in relation to old mode DEFAULT (FSYNC
>>>>>> now), is still significant performance boost.
>>>>>>
>>>>>> Sincerely,
>>>>>> Dmitriy Pavlov

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Ivan Rakov
Fixes are quite simple.
I expect them to be merged into master within a week in the worst case.

Best Regards,
Ivan Rakov
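
For readers following the thread, here is a minimal sketch of what "adding the necessary fsyncs" means at the file level. It is not Ignite code: the class `WalFsyncSketch`, the method `append` and the record contents are hypothetical names chosen for illustration, and the only real API used is the JDK's `FileChannel`. A plain `write` hands the bytes to the OS, which is enough to survive a process crash; `force(true)` asks the OS to push them to the storage device, which is what matters on power loss.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration only; this is not how Ignite's WAL manager is written. */
class WalFsyncSketch {
    /**
     * Appends a record to a segment file and returns the file size afterwards.
     * When fsync is false the bytes may still sit in the OS page cache;
     * when it is true, force(true) requests that data and metadata reach the device.
     */
    static long append(Path segment, byte[] record, boolean fsync) throws IOException {
        try (FileChannel ch = FileChannel.open(segment,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(record);

            // A single write() call may be partial, so loop until the buffer is drained.
            while (buf.hasRemaining())
                ch.write(buf);

            if (fsync)
                ch.force(true);

            return ch.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path segment = Files.createTempFile("wal-segment", ".bin");

        // Ordinary record: written without forcing, as LOG_ONLY is described in this thread.
        append(segment, "logical-record".getBytes(StandardCharsets.UTF_8), false);

        // Checkpoint marker: forced, so everything written before it is synced too.
        long size = append(segment, "checkpoint-marker".getBytes(StandardCharsets.UTF_8), true);

        System.out.println("segment size after checkpoint marker: " + size);
        Files.delete(segment);
    }
}
```

The sketch forces only at the checkpoint marker rather than on every record, which is the trade-off discussed above: the last few updates can still be lost on power loss, but the part of the log needed for recovery is on disk.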

On 22.03.2018 17:49, Denis Magda wrote:

> Ivan,
>
> How quickly are you going to merge the fix into master? Many
> persistence-related optimizations have already stacked up. We can probably
> release them sooner if the community agrees.
>
> --
> Denis
>
> On Thu, Mar 22, 2018 at 5:22 AM, Ivan Rakov <[hidden email]> wrote:
>
>> Thanks all!
>> We seem to have reached a consensus on this issue. I'll just add necessary
>> fsyncs under IGNITE-7754.
>>
>> Best Regards,
>> Ivan Rakov


Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Ivan Rakov
Igniters, there's another important question about this matter.
Do we want to add extra fsyncs for BACKGROUND WAL mode? I think we have
to: it will cause a similar performance drop, but if we consider LOG_ONLY
broken without these fixes, then BACKGROUND is broken as well.

Best Regards,
Ivan Rakov
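
To make the question concrete, the guarantees stated in this thread can be written down as a small table in code. This is only a sketch of the discussion, not Ignite's implementation: the class `WalGuaranteeSketch`, its enums and the `extraFsyncs` flag are hypothetical names, and applying the extra fsyncs to BACKGROUND is the open proposal above, not a decision.

```java
/** Hypothetical model of the guarantees discussed in this thread; not Ignite code. */
class WalGuaranteeSketch {
    enum Failure { PROCESS_CRASH, POWER_LOSS }

    enum Mode { NONE, BACKGROUND, LOG_ONLY, FSYNC }

    /**
     * Returns true if, per the thread, the storage cannot be corrupted by the given failure.
     * extraFsyncs models the IGNITE-7754 fixes (syncing at checkpoint markers and on archiving).
     */
    static boolean corruptionSafe(Mode mode, Failure failure, boolean extraFsyncs) {
        switch (mode) {
            case NONE:
                // No WAL, so no recovery guarantee for any failure.
                return false;
            case FSYNC:
                // Every write is synced, so even power loss is survivable.
                return true;
            default:
                // BACKGROUND and LOG_ONLY: safe on process crash in all cases,
                // safe on power loss only once the extra fsyncs are in place.
                return failure == Failure.PROCESS_CRASH || extraFsyncs;
        }
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values())
            for (Failure failure : Failure.values())
                System.out.println(mode + " / " + failure
                    + ": before fix=" + corruptionSafe(mode, failure, false)
                    + ", after fix=" + corruptionSafe(mode, failure, true));
    }
}
```

The table covers corruption safety only. Loss of the last few updates is a separate, weaker failure that BACKGROUND can show on process crash and LOG_ONLY on power loss.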

On 23.03.2018 10:27, Ivan Rakov wrote:

> Fixes are quite simple.
> I expect them to be merged in master in a week in worst case.
>
> Best Regards,
> Ivan Rakov
>
> On 22.03.2018 17:49, Denis Magda wrote:
>> Ivan,
>>
>> How quickly are you going to merge the fix into master? Many
>> persistence-related optimizations have already stacked up. We can probably
>> release them sooner if the community agrees.
>>
>> --
>> Denis
>>
>> On Thu, Mar 22, 2018 at 5:22 AM, Ivan Rakov <[hidden email]>
>> wrote:
>>
>>> Thanks all!
>>> We seem to have reached a consensus on this issue. I'll just add
>>> necessary
>>> fsyncs under IGNITE-7754.
>>>
>>> Best Regards,
>>> Ivan Rakov
