Apache Ignite Developers - Legacy Mail Archive

Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

dmagda

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Ivan,

Could we run Yardstick or YCSB benchmarks to see how the fixed LOG_ONLY
mode affects performance under operational load (after the preloading
part you're referring to is over)?

--
Denis

On Thu, Apr 12, 2018 at 9:45 AM, Ivan Rakov <[hidden email]> wrote:

> Dmitriy,
>
> fsync() is a really slow operation; it's the main reason why FSYNC mode is
> way slower than LOG_ONLY.
> The fix adds extra fsyncs in the necessary parts of the code and nothing more.
> Every part is important; I described why at the beginning of the thread.
>
> A 20% slowdown in the benchmark doesn't mean that Ignite itself will become
> 20% slower. The benchmark replays only the "data loading" scenario. It signals
> that maximum throughput with WAL enabled will be 20% lower. By the way, we
> already have an option to disable WAL at runtime for the duration of data
> loading.
>
> Best Regards,
> Ivan Rakov
>
>
> On 11.04.2018 9:59, Dmitriy Setrakyan wrote:
>
>> On Tue, Apr 10, 2018 at 11:57 PM, Ilya Suntsov <[hidden email]> wrote:
>>
>>> Dmitriy,
>>>
>>> I've measured performance on the current master and haven't found any
>>> problems with in-memory mode.
>>
>> Got it. I would still say that the performance drop is too big with
>> persistence turned on. It seems like we did not just fix the bug, we also
>> introduced some additional slowdown there. I would investigate whether we
>> could optimize.
>>
>>
>

dsetrakyan

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

In reply to this post by Ivan Rakov

On Thu, Apr 12, 2018 at 9:45 AM, Ivan Rakov <[hidden email]> wrote:

Ivan, I get it, but I am sure that you can do more things in parallel. Do
we wait for the fsync call to complete? If yes, do we have to wait? Are
there other performance optimizations you can add, considering that we are
in LOG_ONLY or BACKGROUND mode and disk writes may be delayed?

D.

Alexey Goncharuk

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Dmitriy,

The point of this fsync is to order FS disk writes to prevent data
corruption, so it has to be synchronous; it cannot be made asynchronous
or delayed.

Given that we are fixing correctness, I believe the current results are
acceptable.
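
To make the ordering argument concrete, here is a minimal Java sketch of the
general pattern: write to the log, force it to disk, wait for that call to
return, and only then touch the data file. This is not Ignite code. The class
name, method names, and file names are invented for illustration; the only
real API relied on is the JDK's FileChannel.force().

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: hypothetical names, not Ignite's WAL or checkpoint code.
final class OrderedWrites {
    /** Appends a record to the log, forces it to disk, and only then overwrites the data file. */
    static void logThenApply(Path log, Path data, String record) throws IOException {
        try (FileChannel logCh = FileChannel.open(log,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            writeFully(logCh, record);
            // Blocks until the log bytes have been forced to the (local) storage device.
            // Nothing below runs before this returns, which is what provides the ordering.
            logCh.force(true);
        }

        try (FileChannel dataCh = FileChannel.open(data,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            writeFully(dataCh, record);
            dataCh.force(true);
        }
    }

    /** Writes the whole string; a single write() call may write only part of the buffer. */
    private static void writeFully(FileChannel ch, String s) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining())
            ch.write(buf);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("ordered-writes");
        logThenApply(dir.resolve("wal.log"), dir.resolve("part.bin"), "page-1\n");
        System.out.println("log and data written under " + dir);
    }
}
```

If the first force() were issued on another thread and not awaited, the data
file could reach the disk before the log record it depends on, which is the
situation the synchronous call is meant to rule out.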

Ivan Rakov

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Agree with Alex.

We now perform an extra WAL fsync() at the beginning of each checkpoint. We
*have* to wait for the call to complete before starting to write checkpoint
pages; otherwise, both the physical records in the WAL and the partition
files in storage could be left in an inconsistent state after a power loss.
User threads *don't* wait for this fsync() directly. However, the total
throughput of user threads can't exceed the total throughput of
checkpointing, which is why user-thread throughput has decreased.
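
The throughput cap described above can be illustrated with a toy model in
Java. All numbers and names here are invented; the model is not derived from
Ignite internals or from the benchmark results. It only assumes that each
checkpoint persists a fixed number of pages and pays a fixed WAL fsync cost
before writing them.

```java
// Toy model with invented numbers; not derived from Ignite internals or benchmarks.
final class CheckpointThroughput {
    /** Pages per second a checkpoint persists when it must fsync the WAL before writing pages. */
    static double checkpointRate(long pagesPerCheckpoint, double pageWriteSec, double walFsyncSec) {
        return pagesPerCheckpoint / (pageWriteSec + walFsyncSec);
    }

    /** On average, user threads cannot dirty pages faster than checkpoints persist them. */
    static double sustainedUserRate(double userDemand, double checkpointRate) {
        return Math.min(userDemand, checkpointRate);
    }

    public static void main(String[] args) {
        double before = checkpointRate(100_000, 4.0, 0.0); // no extra fsync
        double after = checkpointRate(100_000, 4.0, 1.0);  // extra WAL fsync per checkpoint

        // A heavy load is capped by the checkpoint rate in both cases.
        System.out.println("heavy load: " + sustainedUserRate(30_000, before)
            + " -> " + sustainedUserRate(30_000, after));

        // A load below the cap is unchanged in this model.
        System.out.println("light load: " + sustainedUserRate(10_000, before)
            + " -> " + sustainedUserRate(10_000, after));
    }
}
```

With these made-up inputs the cap drops from 25,000 to 20,000 pages per
second, so a workload that saturates the checkpointer slows down even though
no user thread ever calls fsync() itself, while a workload below the cap is
unaffected in the model.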

Denis, regarding this:

> Could we run Yardstick or YCSB benchmarks to see how the fixed LOG_ONLY
> affected the performance under the operational load (after the preloading
> part you're referring to is over)?

Please take a look at the benchmark results attached to the
https://issues.apache.org/jira/browse/IGNITE-7754 ticket: "put"
benchmarks represent data loading, and "put-get" benchmarks represent
operational load. As you can see, the degradation under operational load
is 4-5 times smaller than in the data-loading case.

Best Regards,
Ivan Rakov
