Ivan,
Could we run Yardstick or YCSB benchmarks to see how the fixed LOG_ONLY affected the performance under the operational load (after the preloading part you're referring to is over)?

--
Denis

On Thu, Apr 12, 2018 at 9:45 AM, Ivan Rakov <[hidden email]> wrote:

> Dmitriy,
>
> fsync() is really slow operation - it's the main reason why FSYNC mode is
> way slower than LOG_ONLY.
> Fix includes extra fsyncs in necessary parts of code and nothing more.
> Every part is important - at the beginning of the thread I described why.
>
> 20% slow in benchmark doesn't mean than Ignite itself will become 20%
> slower. Benchmark replays only "data loading" scenario. It signals that
> maximum throughput with WAL enabled will be 20% slower. By the way, we
> already have option to disable WAL in runtime for the period of data
> loading.
>
> Best Regards,
> Ivan Rakov
>
>
> On 11.04.2018 9:59, Dmitriy Setrakyan wrote:
>
>> On Tue, Apr 10, 2018 at 11:57 PM, Ilya Suntsov <[hidden email]>
>> wrote:
>>
>>> Dmitriy,
>>>
>>> I've measured performance on the current master and haven't found any
>>> problems with in-memory mode.
>>>
>> Got it. I would still say that the performance drop is too big with
>> persistence turned on. It seems like we did not just fix the bug, we also
>> introduced some additional slow down there. I would investigate if we
>> could optimize.
On Thu, Apr 12, 2018 at 9:45 AM, Ivan Rakov <[hidden email]> wrote:
> Dmitriy,
>
> fsync() is really slow operation - it's the main reason why FSYNC mode is
> way slower than LOG_ONLY.
> Fix includes extra fsyncs in necessary parts of code and nothing more.
> Every part is important - at the beginning of the thread I described why.
>
> 20% slow in benchmark doesn't mean than Ignite itself will become 20%
> slower. Benchmark replays only "data loading" scenario. It signals that
> maximum throughput with WAL enabled will be 20% slower. By the way, we
> already have option to disable WAL in runtime for the period of data
> loading.
>

Ivan, I get it, but I am sure that you can do more things in parallel. Do we wait for the fsync call to complete? If yes, do we have to wait? Are there other performance optimizations you can add, considering that we are in LOG_ONLY or BACKGROUND modes and disk writes may be delayed?

D.
Dmitriy,
The point of this fsync is to order FS disk writes to prevent data corruption, so this fsync has to be synchronous and cannot be asynchronous or delayed.

Given that we fix correctness, I believe that current results are acceptable.

2018-04-13 2:48 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:

> On Thu, Apr 12, 2018 at 9:45 AM, Ivan Rakov <[hidden email]> wrote:
>
> > Dmitriy,
> >
> > fsync() is really slow operation - it's the main reason why FSYNC mode is
> > way slower than LOG_ONLY.
> > Fix includes extra fsyncs in necessary parts of code and nothing more.
> > Every part is important - at the beginning of the thread I described why.
> >
> > 20% slow in benchmark doesn't mean than Ignite itself will become 20%
> > slower. Benchmark replays only "data loading" scenario. It signals that
> > maximum throughput with WAL enabled will be 20% slower. By the way, we
> > already have option to disable WAL in runtime for the period of data
> > loading.
> >
>
> Ivan, I get it, but I am sure that you can do more things in parallel. Do
> we wait for the fsync call to complete? If yes, do we have to wait? Are
> there other performance optimizations you can add, considering that we are
> in LOG_ONLY or BACKGROUND modes and disk writes may be delayed.
>
> D.
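The ordering constraint described above can be sketched in plain Java. This is only an illustration under stated assumptions, not Ignite's actual checkpoint code: the class, method, and file names are hypothetical, and `FileChannel.force(true)` stands in for fsync(). The point it shows is that the log write is forced to the device, and the call has returned, before the page store is touched.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: names are illustrative and do not come from Ignite.
public class WalBeforeCheckpointSketch {

    /** Appends data to a file and blocks until it has been forced to the device. */
    static void writeDurably(Path file, String data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(data.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining())
                ch.write(buf);
            // Synchronous by design: returns only after content and metadata are flushed.
            ch.force(true);
        }
    }

    /** Step 1 must complete before step 2 starts; that is the whole ordering guarantee. */
    static void checkpoint(Path wal, Path pageStore, String walRecord, String page)
            throws IOException {
        writeDurably(wal, walRecord);   // 1. log record is durable
        writeDurably(pageStore, page);  // 2. only now is the page store modified
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("wal-sketch");
        Path wal = dir.resolve("wal.log");
        Path store = dir.resolve("part-0.bin");
        checkpoint(wal, store, "physical-record-1\n", "page-1\n");
        System.out.println("WAL bytes: " + Files.size(wal)
                + ", page store bytes: " + Files.size(store));
    }
}
```

If step 1 were issued asynchronously, the operating system would be free to flush the page-store write first, which is the reordering the fix is meant to rule out.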
Agree with Alex.
We now perform an extra WAL fsync() at the beginning of each checkpoint. We *have* to wait for the call to complete before starting to write checkpoint pages - otherwise both the physical records in the WAL and the partition files in storage could be left in a mess after a power loss.
User threads *don't* wait for this fsync() directly. However, the total throughput of user threads can't exceed the total throughput of the checkpoint, which is why user-thread throughput decreases.

Denis, regarding this:

> Could we run Yardstick or YCSB benchmarks to see how the fixed LOG_ONLY
> affected the performance under the operational load (after the preloading
> part you're referring to is over)?

Please take a look at the benchmark results attached to the https://issues.apache.org/jira/browse/IGNITE-7754 ticket - "put" benchmarks represent data loading, and "put-get" benchmarks represent operational load. As you can see, the degradation under operational load is 4-5 times smaller than in the data-loading case.

Best Regards,
Ivan Rakov

On 13.04.2018 11:24, Alexey Goncharuk wrote:

> Dmitriy,
>
> The point of this fsync is to order FS disk writes to prevent data
> corruption, so this fsync has to be synchronous and cannot be asynchronous
> or delayed.
>
> Given that we fix correctness, I believe that current results are
> acceptable.
>
> 2018-04-13 2:48 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:
>
>> On Thu, Apr 12, 2018 at 9:45 AM, Ivan Rakov <[hidden email]> wrote:
>>
>>> Dmitriy,
>>>
>>> fsync() is really slow operation - it's the main reason why FSYNC mode is
>>> way slower than LOG_ONLY.
>>> Fix includes extra fsyncs in necessary parts of code and nothing more.
>>> Every part is important - at the beginning of the thread I described why.
>>>
>>> 20% slow in benchmark doesn't mean than Ignite itself will become 20%
>>> slower. Benchmark replays only "data loading" scenario. It signals that
>>> maximum throughput with WAL enabled will be 20% slower. By the way, we
>>> already have option to disable WAL in runtime for the period of data
>>> loading.
>>>
>>
>> Ivan, I get it, but I am sure that you can do more things in parallel. Do
>> we wait for the fsync call to complete? If yes, do we have to wait? Are
>> there other performance optimizations you can add, considering that we are
>> in LOG_ONLY or BACKGROUND modes and disk writes may be delayed.
>>
>> D.
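The claim that user threads are capped by checkpoint throughput, even though they never wait on the fsync() themselves, can be made concrete with a toy model. This is a sketch under simplifying assumptions and not a description of Ignite internals: the class and method names are hypothetical, and the page counts, write rate, and sync latency are made-up numbers chosen only to show the shape of the effect. They are not taken from the IGNITE-7754 benchmarks.

```java
// Hypothetical back-of-the-envelope model; all figures below are illustrative only.
public class CheckpointThroughputSketch {

    /**
     * Pages per second a checkpointer sustains when each checkpoint of {@code pages}
     * pages first pays a fixed {@code syncSeconds} for the up-front WAL sync.
     */
    static double checkpointRate(long pages, double pageWriteRate, double syncSeconds) {
        double checkpointSeconds = syncSeconds + pages / pageWriteRate;
        return pages / checkpointSeconds;
    }

    /** User threads cannot dirty pages faster, on average, than checkpoints clean them. */
    static double userThroughputCap(double offeredRate, double checkpointRate) {
        return Math.min(offeredRate, checkpointRate);
    }

    public static void main(String[] args) {
        long pages = 100_000;            // pages per checkpoint (assumed)
        double writeRate = 50_000.0;     // pages/sec the storage can absorb (assumed)
        double offered = 60_000.0;       // pages/sec user threads try to dirty (assumed)

        double withoutSync = userThroughputCap(offered, checkpointRate(pages, writeRate, 0.0));
        double withSync = userThroughputCap(offered, checkpointRate(pages, writeRate, 0.2));

        System.out.println("cap without extra sync: " + withoutSync + " pages/sec");
        System.out.println("cap with extra sync:    " + withSync + " pages/sec");
    }
}
```

In this model the fixed sync cost matters most when checkpoints are frequent and write-heavy, which is at least consistent with a data-loading benchmark showing a larger drop than a mixed put-get workload.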