Apache Ignite Developers - Legacy Mail Archive

Move WAL archive cleanup from checkpoint to rollover

Kirill Tkalenko

Move WAL archive cleanup from checkpoint to rollover

Hello to all!

At the moment, the WAL archive is cleaned up at the end of a checkpoint; it seems this should happen in FileWriteAheadLogManager instead.
I suggest moving the cleanup into FileWriteAheadLogManager#rollOver, to run when DataStorageConfiguration#maxWalArchiveSize is reached.

I created a ticket for this: https://issues.apache.org/jira/browse/IGNITE-13831

shm

Re: Move WAL archive cleanup from checkpoint to rollover

Hi Kirill Tkalenko

Is this ticket just for moving the WAL archive clean-up logic from "post
checkpoint" to FileWriteAheadLogManager#rollOver? Or is there any plan to
change the logic that selects archived WAL segments for clean-up?

I'm facing a major issue with WAL usage: it grows without bound under
heavy write load even though maxWalArchiveSize is set.

Ref:
http://apache-ignite-users.70518.x6.nabble.com/Could-not-clear-historyMap-due-to-WAL-reservation-on-cp-td34779.html

http://apache-ignite-users.70518.x6.nabble.com/connecting-to-visor-shell-permanently-stops-unwanted-WAL-clean-up-td34737.html

Shiva

--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

Kirill Tkalenko

Re: Move WAL archive cleanup from checkpoint to rollover

In reply to this post by shm (09.12.2020, 15:02)

Hi, Shiva!

Yes, this ticket only covers moving the WAL archive cleanup.
I think the WAL archive overflow problem can be solved later, but first we need to solve several other problems and work out the heuristics.

shm

Re: Move WAL archive cleanup from checkpoint to rollover

Hi Kirill Tkalenko,
Thanks for your reply!
Are you aware of any known issue related to the problem I'm facing, or any
ticket about unbounded WAL growth that could be related to it?

Regards,
Shiva

Kirill Tkalenko

Re: Move WAL archive cleanup from checkpoint to rollover

In reply to this post by shm (09.12.2020, 16:15)

Hi, Shiva!

I am not aware of any such tickets yet. I think I can look into this issue later.
For now, you can try increasing the checkpoint frequency and the maximum WAL archive size, and changing the system property IGNITE_CHECKPOINT_TRIGGER_ARCHIVE_SIZE_PERCENTAGE.
This will not solve the problem, but it may help to delay it.

Kirill Tkalenko

Re: Move WAL archive cleanup from checkpoint to rollover

In reply to this post by Kirill Tkalenko (09.12.2020, 16:28)

Also try this property: IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
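
A minimal sketch of setting the two system properties named above, assuming the Ignite node is started later in the same JVM and that both values are read as fractions of maxWalArchiveSize (an assumption, not confirmed in this thread). The class name WalArchiveTuning and the values 0.2 and 0.4 are hypothetical placeholders, not recommendations.

```java
/**
 * Sketch: set the two WAL archive system properties named in this thread
 * before an Ignite node is started in the same JVM.
 * This class is illustrative only and is not part of Ignite.
 */
final class WalArchiveTuning {
    // Property names exactly as written in the thread.
    static final String CHECKPOINT_TRIGGER = "IGNITE_CHECKPOINT_TRIGGER_ARCHIVE_SIZE_PERCENTAGE";
    static final String ARCHIVE_THRESHOLD = "IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE";

    /** Assumption: both values are fractions in the range (0, 1]. */
    static void apply(double checkpointTrigger, double archiveThreshold) {
        // Validate both values first so that a bad argument changes nothing.
        String trigger = format(checkpointTrigger);
        String threshold = format(archiveThreshold);

        System.setProperty(CHECKPOINT_TRIGGER, trigger);
        System.setProperty(ARCHIVE_THRESHOLD, threshold);
    }

    private static String format(double fraction) {
        if (!(fraction > 0.0 && fraction <= 1.0))
            throw new IllegalArgumentException("Expected a fraction in (0, 1], got " + fraction);

        return Double.toString(fraction);
    }

    public static void main(String[] args) {
        apply(0.2, 0.4); // Placeholder values.

        System.out.println(CHECKPOINT_TRIGGER + "=" + System.getProperty(CHECKPOINT_TRIGGER));
        System.out.println(ARCHIVE_THRESHOLD + "=" + System.getProperty(ARCHIVE_THRESHOLD));
    }
}
```

The same values can be passed on the command line as JVM options of the form -DNAME=value. As noted above, this may delay the archive growth problem but is not expected to solve it.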

vbm

Re: Move WAL archive cleanup from checkpoint to rollover

Hi Kirill Tkalenko,

Is there any relation to the rate of data ingestion into Ignite?

We recently saw the WAL grow without bound in our K8s cluster.
We were ingesting data at around 2Mbps.
In other clusters, where data ingestion was not as fast, this
issue was not observed.

Regards,
Vishwas

agura

Re: Move WAL archive cleanup from checkpoint to rollover

In reply to this post by vbm (Wed, Dec 9, 2020 at 5:05 PM)

Kirill,

The issue description contains the following:

> At the moment, WAL archive is cleared at the end of the checkpoint, which does not seem correct and needs to be moved

Could you please explain why the existing behavior is not correct? As
written, this does not seem to be enough motivation for the change.

Kirill Tkalenko

Re: Move WAL archive cleanup from checkpoint to rollover

In reply to this post by vbm (09.12.2020, 17:05)

Hi, Vishwas!

The data ingestion rate is directly related to the growth of the WAL archive.

Kirill Tkalenko

Re: Move WAL archive cleanup from checkpoint to rollover

In reply to this post by agura (09.12.2020, 17:24)

Hi, Andrey!

Users expect DataStorageConfiguration#maxWalArchiveSize to mean that the WAL archive will never exceed this value, but that is not the case.
It seems that the chance of exceeding the WAL archive limit will be lower if we clean the archive when switching to a new segment rather than at the end of the checkpoint.
After that, we can think about making a hard limit on the WAL archive, but that will require solving a few more problems.
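
The argument above can be illustrated with a toy model. The sketch below is not Ignite code: the class WalCleanupModel, its policies, and the rule that only segments covered by the last finished checkpoint may be deleted are all simplifying assumptions, and WAL segment reservations are ignored. It counts archived segments and compares the peak archive size when cleanup runs only at the end of a checkpoint with the peak when cleanup runs on every segment rollover.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of WAL archive growth. Hypothetical, not Ignite's implementation. */
final class WalCleanupModel {
    enum Policy { ON_CHECKPOINT_END, ON_ROLLOVER }

    /**
     * Simulates totalRollovers segment rollovers and returns the largest
     * number of segments that were in the archive at the same time.
     * A checkpoint finishes after every checkpointEvery-th rollover.
     */
    static int peakArchiveSegments(Policy policy, int maxSegments, int checkpointEvery, int totalRollovers) {
        if (maxSegments < 1 || checkpointEvery < 1)
            throw new IllegalArgumentException("maxSegments and checkpointEvery must be positive");

        Deque<Integer> archive = new ArrayDeque<>(); // Segment indexes, oldest first.
        int lastCheckpointed = 0; // Segments with index <= this value may be deleted.
        int peak = 0;

        for (int seg = 1; seg <= totalRollovers; seg++) {
            // Rollover policy: make room before the new segment is archived.
            if (policy == Policy.ON_ROLLOVER)
                trim(archive, maxSegments - 1, lastCheckpointed);

            archive.addLast(seg);
            peak = Math.max(peak, archive.size());

            if (seg % checkpointEvery == 0) {
                lastCheckpointed = seg;

                // Checkpoint policy: clean up only when a checkpoint ends.
                if (policy == Policy.ON_CHECKPOINT_END)
                    trim(archive, maxSegments, lastCheckpointed);
            }
        }

        return peak;
    }

    /** Deletes the oldest deletable segments until at most target segments remain. */
    private static void trim(Deque<Integer> archive, int target, int lastCheckpointed) {
        while (archive.size() > target && !archive.isEmpty() && archive.peekFirst() <= lastCheckpointed)
            archive.removeFirst();
    }

    public static void main(String[] args) {
        System.out.println("checkpoint end: " + peakArchiveSegments(Policy.ON_CHECKPOINT_END, 10, 4, 40));
        System.out.println("rollover:       " + peakArchiveSegments(Policy.ON_ROLLOVER, 10, 4, 40));
    }
}
```

With a limit of 10 segments and a checkpoint every 4 rollovers, the model peaks at 14 segments for cleanup at checkpoint end and at 10 for cleanup on rollover. If checkpoints are rarer than the limit (for example every 20 rollovers), the rollover policy in this model still overshoots and peaks at 20, which matches the point that moving the cleanup lowers the chance of exceeding the limit but does not make it a hard limit.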

Данилов Семён

Re: Move WAL archive cleanup from checkpoint to rollover

In reply to this post by Kirill Tkalenko (09.12.2020, 19:17)

Hello! I also have an issue in progress regarding the WAL archive: https://issues.apache.org/jira/browse/IGNITE-12892. In this ticket I tried to clarify the WAL archive size configuration and removed usage of the deprecated walHistSize property. Also, IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE is now used only when the archive is managed externally, and you can set maxWalArchiveSize to -1 to disable WAL truncation. The PR is already submitted and approved: https://github.com/apache/ignite/pull/8550

Regards,
Semyon.

agura

Re: Move WAL archive cleanup from checkpoint to rollover

In reply to this post by agura (Wed, Dec 9, 2020 at 5:24 PM)

Kirill,

Thanks for adding the motivation to the issue description.