Move WAL archive cleanup from checkpoint to rollover

Kirill Tkalenko
Hello to all!

At the moment, the WAL archive is cleaned up at the end of a checkpoint; it seems this should happen in FileWriteAheadLogManager instead.
I suggest moving the cleanup into FileWriteAheadLogManager#rollOver, so that it runs when DataStorageConfiguration#maxWalArchiveSize is reached.

I created a ticket for this: https://issues.apache.org/jira/browse/IGNITE-13831
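
The proposal above can be pictured with a small toy model. This is only a sketch, not Ignite code: the class `RolloverCleanupSketch` and its methods are hypothetical, and it ignores checkpoint history and segment reservations, which on a real node can prevent old segments from being deleted.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model (not Ignite code): the archive is trimmed each time a segment rolls over. */
final class RolloverCleanupSketch {
    /** Sizes in bytes of archived segments, oldest first. */
    private final Deque<Long> archive = new ArrayDeque<>();

    /** Plays the role of DataStorageConfiguration#maxWalArchiveSize in this model. */
    private final long maxArchiveSize;

    /** Current total size of the archive in bytes. */
    private long archiveSize;

    RolloverCleanupSketch(long maxArchiveSize) {
        this.maxArchiveSize = maxArchiveSize;
    }

    /**
     * Moves the just-closed segment into the archive, then deletes the oldest
     * segments while the archive is over the limit.
     *
     * @return number of segments deleted by this rollover
     */
    int rollOver(long closedSegmentSize) {
        archive.addLast(closedSegmentSize);
        archiveSize += closedSegmentSize;

        int deleted = 0;

        // Assumption: the newest segment is always kept, even if it alone exceeds the limit.
        while (archiveSize > maxArchiveSize && archive.size() > 1) {
            archiveSize -= archive.removeFirst();
            deleted++;
        }

        return deleted;
    }

    long archiveSize() {
        return archiveSize;
    }

    public static void main(String[] args) {
        RolloverCleanupSketch wal = new RolloverCleanupSketch(256L);

        for (int i = 0; i < 10; i++) {
            int deleted = wal.rollOver(64L);
            System.out.println("rollover " + i + ": deleted=" + deleted + ", archive=" + wal.archiveSize());
        }
    }
}
```

In this model the archive never stays above the configured limit after a rollover completes, which is the behavior the ticket is aiming for; whether that can be guaranteed in Ignite depends on the constraints the sketch leaves out.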
Re: Move WAL archive cleanup from checkpoint to rollover

shm
Hi Kirill Tkalenko

Is this ticket just about moving the WAL archive clean-up logic from "post
checkpoint" to FileWriteAheadLogManager#rollOver? Or is there any plan to
change the logic that selects archived WAL segments for clean-up?

I'm facing a major issue with WAL usage: the WAL grows without bound
under heavy write load even though maxWalArchiveSize is set.

Ref:
http://apache-ignite-users.70518.x6.nabble.com/Could-not-clear-historyMap-due-to-WAL-reservation-on-cp-td34779.html

http://apache-ignite-users.70518.x6.nabble.com/connecting-to-visor-shell-permanently-stops-unwanted-WAL-clean-up-td34737.html

Shiva



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

Re: Move WAL archive cleanup from checkpoint to rollover

Kirill Tkalenko
Hi, Shiva!

Yes, this ticket is only about moving the WAL archive cleanup.
I think the WAL archive overflow problem can be tackled later, but before that we need to solve several other problems and work out the heuristics.

Re: Move WAL archive cleanup from checkpoint to rollover

shm
Hi Kirill Tkalenko,
Thanks for your reply!
Do you know of any known issue related to the problem I'm facing, or any ticket
about unbounded WAL growth that could be correlated with my issue?

Regards,
Shiva





Re: Move WAL archive cleanup from checkpoint to rollover

Kirill Tkalenko
Hi, Shiva!

I am not aware of any such tickets yet. I think I can look into this issue later.
For now, you can try increasing the checkpoint frequency and the maximum WAL archive size, and changing the system property IGNITE_CHECKPOINT_TRIGGER_ARCHIVE_SIZE_PERCENTAGE.
This will not solve the problem, but it may help to delay it.
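
As a rough illustration of how such a setting might feed into checkpoint triggering, the sketch below reads the property as a fraction of the maximum WAL archive size and converts it into a number of WAL segments. The default of 0.25, the formula, and the class name `CheckpointTriggerSketch` are assumptions made for this example; check IgniteSystemProperties in the version you run.

```java
/** Sketch only: models a checkpoint trigger derived from the WAL archive size. */
final class CheckpointTriggerSketch {
    /** System property named in the thread. */
    static final String PROP = "IGNITE_CHECKPOINT_TRIGGER_ARCHIVE_SIZE_PERCENTAGE";

    /** Assumed default fraction; verify against your Ignite version. */
    static final double ASSUMED_DEFAULT_FRACTION = 0.25;

    /** Reads the fraction from the JVM system property, falling back to the assumed default. */
    static double fraction() {
        String raw = System.getProperty(PROP);

        return raw == null ? ASSUMED_DEFAULT_FRACTION : Double.parseDouble(raw);
    }

    /**
     * Number of WAL segments that may accumulate before a checkpoint is forced
     * in this model: (maxWalArchiveSize * fraction) / walSegmentSize.
     */
    static long segmentsBeforeCheckpoint(long maxWalArchiveSize, long walSegmentSize) {
        return (long) (maxWalArchiveSize * fraction() / walSegmentSize);
    }

    public static void main(String[] args) {
        long maxWalArchiveSize = 1024L * 1024 * 1024; // Example value: 1 GiB.
        long walSegmentSize = 64L * 1024 * 1024;      // Example value: 64 MiB.

        System.out.println("segments before forced checkpoint: "
            + segmentsBeforeCheckpoint(maxWalArchiveSize, walSegmentSize));
    }
}
```

In this model a smaller fraction makes checkpoints start earlier. The property would typically be passed as a JVM option, for example `-DIGNITE_CHECKPOINT_TRIGGER_ARCHIVE_SIZE_PERCENTAGE=0.1`, while the checkpoint frequency and archive size are set through DataStorageConfiguration.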

Re: Move WAL archive cleanup from checkpoint to rollover

Kirill Tkalenko
And also this property: IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
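
A minimal sketch of what this property might control, assuming it is a fraction of maxWalArchiveSize above which removal of old archived segments starts. The default of 0.5, the exact semantics, and the class name `ArchiveCleanupThresholdSketch` are assumptions for illustration, not a statement of how Ignite implements it.

```java
/** Sketch only: models the archive size at which old segments start being removed. */
final class ArchiveCleanupThresholdSketch {
    /** System property named in the thread. */
    static final String PROP = "IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE";

    /** Assumed default fraction; verify against your Ignite version. */
    static final double ASSUMED_DEFAULT_FRACTION = 0.5;

    /** Archive size in bytes above which cleanup starts in this model. */
    static long cleanupThreshold(long maxWalArchiveSize) {
        String raw = System.getProperty(PROP);
        double fraction = raw == null ? ASSUMED_DEFAULT_FRACTION : Double.parseDouble(raw);

        return (long) (maxWalArchiveSize * fraction);
    }

    /** Returns true when the archive has grown past the cleanup threshold. */
    static boolean shouldStartCleanup(long archiveSize, long maxWalArchiveSize) {
        return archiveSize > cleanupThreshold(maxWalArchiveSize);
    }

    public static void main(String[] args) {
        long maxWalArchiveSize = 1024L * 1024 * 1024; // Example value: 1 GiB.

        System.out.println("cleanup starts above " + cleanupThreshold(maxWalArchiveSize) + " bytes");
    }
}
```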


Re: Move WAL archive cleanup from checkpoint to rollover

vbm
Hi Kirill Tkalenko,

Is there any relation to the rate of data ingestion into Ignite?

We recently saw the WAL growing without bound in our K8s cluster,
where we were ingesting data at around 2Mbps.
In other clusters, where data ingestion was not as fast, this
issue was not observed.

Regards,
Vishwas




Re: Move WAL archive cleanup from checkpoint to rollover

agura
Kirill,

The issue description contains the following:

> At the moment, WAL archive is cleared at the end of the checkpoint, which does not seem correct and needs to be moved

Could you please explain why the existing behavior is not correct? As written,
this does not seem to be enough motivation for the change.

Re: Move WAL archive cleanup from checkpoint to rollover

Kirill Tkalenko
In reply to this post by vbm
Hi, Vishwas!

The data loading rate is directly related to the growth of the WAL archive.

Re: Move WAL archive cleanup from checkpoint to rollover

Kirill Tkalenko
In reply to this post by agura
Hi, Andrey!

Users expect DataStorageConfiguration#maxWalArchiveSize to mean that the WAL archive will never exceed this value, but that is not the case.
It seems that cleaning the archive when switching to a new segment, rather than at the end of a checkpoint, would reduce the chance of exceeding the WAL archive limit.
After that, we can think about introducing a hard limit on the WAL archive, but that will require solving a few more problems.
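
The argument can be illustrated with a small simulation that compares the peak archive size under the two cleanup timings. It is a simplified sketch under stated assumptions: segments have a fixed size, a checkpoint finishes every fixed number of rollovers, and any archived segment may be deleted, which is not true on a real node. The class `CleanupTimingSketch` and its parameters are hypothetical.

```java
/** Sketch only: compares peak archive size for cleanup at checkpoint end vs. on rollover. */
final class CleanupTimingSketch {
    /**
     * Simulates a run of segment rollovers and returns the largest archive size
     * observed after each step.
     *
     * @param maxArchive             archive size limit in bytes
     * @param segSize                size of every segment in bytes
     * @param rollovers              number of rollovers to simulate
     * @param rolloversPerCheckpoint a checkpoint finishes after this many rollovers
     * @param trimOnRollover         true to trim on every rollover, false to trim only at checkpoint end
     */
    static long peakArchiveSize(long maxArchive, long segSize, int rollovers,
        int rolloversPerCheckpoint, boolean trimOnRollover) {
        long size = 0;
        long peak = 0;

        for (int i = 1; i <= rollovers; i++) {
            // A closed segment is moved into the archive.
            size += segSize;

            boolean checkpointEnds = i % rolloversPerCheckpoint == 0;

            if (trimOnRollover || checkpointEnds) {
                // Delete the oldest segments until the archive fits the limit.
                while (size > maxArchive)
                    size -= segSize;
            }

            peak = Math.max(peak, size);
        }

        return peak;
    }

    public static void main(String[] args) {
        long atCheckpoint = peakArchiveSize(256L, 64L, 32, 8, false);
        long onRollover = peakArchiveSize(256L, 64L, 32, 8, true);

        System.out.println("peak with cleanup at checkpoint end: " + atCheckpoint);
        System.out.println("peak with cleanup on rollover: " + onRollover);
    }
}
```

With these example numbers the checkpoint-only policy lets the archive grow well past the limit between checkpoints, while trimming on rollover keeps it at the limit. A real hard limit would still need the extra work mentioned above.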

Re: Move WAL archive cleanup from checkpoint to rollover

Данилов Семён
Hello! I also have an issue in progress regarding the WAL archive: https://issues.apache.org/jira/browse/IGNITE-12892. In that ticket I tried to clarify the WAL archive size configuration and removed usage of the deprecated walHistSize property. Also, IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE is now used only when the archive is managed externally, and you can set maxWalArchiveSize to -1 to disable WAL truncation. The PR is already submitted and approved: https://github.com/apache/ignite/pull/8550

Regards,
Semyon.

Re: Move WAL archive cleanup from checkpoint to rollover

agura
In reply to this post by agura
Kirill,

Thanks for adding the motivation to the issue description.
