Hello, Igniters.
I want to start a discussion of a new feature [1]: CDC - Capture Data Change. The feature allows a consumer to receive online notifications about data record changes.

It can be used in the following scenarios:
    * Export data into a warehouse, full-text search, or distributed log system.
    * Online statistics and analytics.
    * Waiting for and responding to specific events or data changes.

I propose to implement a new IgniteCDC application as follows (a rough sketch follows below):
    * Runs on the server node host.
    * Watches for the appearance of WAL archive segments.
    * Iterates each segment using the existing WALIterator and notifies the consumer of every record in it.

IgniteCDC features:
    * Independence from the server node process (JVM) - issues and failures of the consumer will not lead to server node instability.
    * Notification guarantees and failover - i.e. CDC tracks and saves a pointer to the last consumed record, and continues notification from this pointer in case of restart.
    * Resilience for the consumer - it's not an issue when a consumer temporarily consumes slower than data appears.

WDYT?

[1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-59+CDC+-+Capture+Data+Change
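For illustration only, a minimal sketch of the watch-and-iterate loop described above, using the standard java.nio WatchService. The WalSegmentConsumer callback is a hypothetical stand-in: the real proposal iterates records inside each segment with Ignite's internal WALIterator, so treat this as a shape sketch, not the proposed implementation.

```java
import java.nio.file.*;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;

/**
 * Sketch of the proposed IgniteCDC loop: watch the WAL archive directory
 * and hand each newly archived segment to a consumer. Runs in a separate
 * process from the server node, as the proposal describes.
 */
public class CdcLoopSketch {
    /** Hypothetical callback; the real design notifies the consumer per WAL record. */
    public interface WalSegmentConsumer {
        void onSegment(Path segment) throws Exception;
    }

    public static void watch(Path walArchiveDir, WalSegmentConsumer consumer) throws Exception {
        try (WatchService ws = walArchiveDir.getFileSystem().newWatchService()) {
            walArchiveDir.register(ws, ENTRY_CREATE);

            while (!Thread.currentThread().isInterrupted()) {
                WatchKey key = ws.take(); // Block until a new segment is archived.

                for (WatchEvent<?> evt : key.pollEvents()) {
                    Path segment = walArchiveDir.resolve((Path)evt.context());

                    // In the real proposal WALIterator would walk the segment and
                    // notify the consumer of each record; here the whole segment
                    // is handed over for simplicity.
                    consumer.onSegment(segment);

                    // A production implementation would persist the pointer to the
                    // last consumed record here, to resume after restart.
                }

                key.reset();
            }
        }
    }
}
```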
Hi Nikolay,
The idea is good. But what do you think about integrating these ideas into the WAL-G project?
https://github.com/wal-g/wal-g
It's a well-known tool that is already used to stream WAL for PostgreSQL, MySQL, and MongoDB. The advantages are out-of-the-box integration with S3, GCP, and Azure, plus encryption and compression.
This tool could also be used to store snapshots in an external warehouse.
Hello, Pavel.
Thanks for the feedback.

> But what do you think about integrating these ideas into the WAL-G project?

I looked into the WAL-G description, and it looks like its use cases are more restricted than CDC itself. Its self-description: "WAL-G is an archival restoration tool for Postgres (beta for MySQL, MariaDB, and MongoDB)".

Restoration is only one application of CDC; I mentioned other use cases in the first letter. Anyway, integration with well-known tools is a good thing. I'll try to contact the WAL-G community.
Hi,
> On the segment archiving, the utility iterates it using the existing WALIterator
> Wait and respond to some specific events or data changes.

It seems like this solution will have an unpredictable synchronization delay when handling events.

Why can't we just implement a Debezium connector for Ignite, for example?
https://debezium.io/documentation/reference/1.3/index.html
It is a pretty popular product that uses Kafka underneath.

Evgenii
Hello, Evgenii.
> It seems like this solution will have an unpredictable synchronization delay when handling events.

It's true that the CDC solution doesn't have strict bounds on notification delay, because of its asynchronous nature. But I assume we will introduce a WAL rollover timeout for CDC cases - please take a look at the ticket [1]. A configuration sketch follows below.

The same approach is used by Oracle and other databases that implement CDC. Anyway, I treat the notification delay - the decoupling of an event from its consumption - as an advantage of CDC, not a downside :)

> Why can't we just implement a Debezium connector for Ignite, for example?

I think we can. But, AFAIK, the Debezium connectors developed for other databases use CDC implementations similar to the one proposed here.

[1] https://issues.apache.org/jira/browse/IGNITE-13582?src=confmacro
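To make the rollover-timeout idea concrete, a hypothetical configuration sketch. setWalSegmentSize and setWalArchivePath are existing DataStorageConfiguration options; setWalForceArchiveTimeout is an assumption of what the ticket above might introduce and was not part of the API at the time of this thread.

```java
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

/**
 * Sketch: bound the CDC notification delay by forcing WAL segment rollover
 * after a timeout, so a partially filled segment still reaches the archive
 * (and hence the CDC consumer) within a known time window.
 */
public class WalRolloverTimeoutSketch {
    public static void main(String[] args) {
        DataStorageConfiguration dsCfg = new DataStorageConfiguration()
            .setWalSegmentSize(64 * 1024 * 1024)          // Existing option: 64 MB segments.
            .setWalArchivePath("/var/ignite/wal/archive") // Existing option: archive dir watched by CDC.
            // Assumed option from IGNITE-13582: archive the current segment
            // after 5 seconds even if it is not full.
            .setWalForceArchiveTimeout(5_000);

        Ignition.start(new IgniteConfiguration().setDataStorageConfiguration(dsCfg));
    }
}
```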
Hi,
I definitely agree with Pavel's and Evgenii's ideas and comments.

From my point of view, the proposal is not about Apache Ignite features. The described functionality could be implemented outside of the Apache Ignite project. Perhaps a Debezium connector or a WAL-G module would be the best home for the proposed CDC. A CDC API will require ongoing support from the Apache Ignite community, while there is no need for any public API in the described cases. It is a really good idea to implement the CDC tool as part of the Debezium or WAL-G projects. For other cases there is WALIterator.

I also have some comments about IEP-59. I use `>` for quotes from the IEP.

> Many use-cases build on observation and processing changed records.

Yes. But there is a significant difference between Apache Ignite and an RDBMS like MySQL or PostgreSQL. Apache Ignite is a middleware-class product, so the required logic could be implemented as part of the business logic, while an RDBMS doesn't provide such a possibility (you could implement an optional module for CDC purposes, but *usually* you can't implement CDC-like functionality as a stored procedure).

> Disadvantages of the CQ in described scenarios:
> CQ requires data to be sent over the network

In the normal case, the CDC tool will also send data over the network. I doubt that hosting an Apache Ignite node on the same server as, for example, the analytical storage (the consumer) is a good idea.

> CQ parts (filter, transformer) live inside server node JVM so issues in it may affect server node stability.

May affect, or may not. It depends on the filter/transformer implementation - just implement these parts properly. We have many dangerous pitfalls in Apache Ignite involving entities like filters, transformers, listeners, etc., but we don't invent something to protect developers from blocking in all this stuff - except for failure handling and blocked-thread detection, of course :) Many concurrency models are based on similar concepts, and all of them are sensitive to blocking. (A Continuous Query example follows below for contrast.)

> Slow CQ listener leads to increasing of the memory consumption of the server node.

Yes. But CQ can't be free. Otherwise, this point mostly restates the previous one.

> Fails of the CQ listener lead to the loss of the events.

What kind of failures? The CQ listener is the developer's responsibility.

> The convenient solution should be:
> Independence from the server node process (JVM) - issues and failures of the consumer shouldn't lead to server node instability.

It's a very good point. And it is the best argument for implementing the CDC tool as an external tool.

> Notification guarantees and failover - i.e. track and save a pointer to the last consumed record. Continue notification from this pointer in case of restart.

"Notification" is a superfluous word here. There is no need for notifications. All you need is the WAL segments.
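For readers unfamiliar with the Continuous Queries being contrasted here, a minimal CQ sketch using Ignite's existing public API; the cache name and key/value types are illustrative. It shows the coupling discussed above: the local listener runs inside the application's Ignite process (and a remote filter, if set, inside server nodes), so a slow or blocking listener affects those processes directly.

```java
import javax.cache.event.CacheEntryEvent;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.ContinuousQuery;
import org.apache.ignite.cache.query.QueryCursor;

/** Minimal Continuous Query example against the public Ignite API. */
public class CqSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("data");

            ContinuousQuery<Integer, String> qry = new ContinuousQuery<>();

            // Called with batches of update events. If this callback blocks,
            // pressure builds up inside the Ignite process - the "slow
            // listener" problem the IEP cites as a CQ disadvantage.
            qry.setLocalListener(evts -> {
                for (CacheEntryEvent<? extends Integer, ? extends String> e : evts)
                    System.out.println("key=" + e.getKey() + " -> " + e.getValue());
            });

            try (QueryCursor<?> cur = cache.query(qry)) {
                cache.put(1, "hello"); // Triggers the listener above.
            }
        }
    }
}
```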
Hello Nikolay,
Thanks for the suggestion. It definitely may be a good feature; however, I do not see any significant value that it currently adds over the already existing WAL Iterator. I think the following issues should be addressed; otherwise, no regular user will be able to use the CDC reliably:

    - The interface exposes WALRecord, which is a private API.
    - There is no way to start capturing changes from a certain point (a watermark for already processed data). Users can configure a large WAL archive size to sustain long node downtime for historical rebalance. If a CDC agent is restarted, it will have to start from scratch. I see that this is presented in the IEP as a design choice, but I think it is a major usability issue.
    - If a CDC reader does not keep up with the WAL write rate (e.g. there is a short-term write burst and the WAL archive is small), the Ignite node will delete WAL segments while the consumer is still reading them. Since the consumer runs out-of-process, we need to specify some sort of synchronization protocol between the node and the consumer.
    - If an Ignite node crashes, gets restarted, and initiates full rebalance, the consumer will lose some updates.
    - Usually it makes sense for the CDC consumer to read updates only on primary nodes (otherwise, multiple agents will be doing duplicate work). In the current design, the consumer will not be able to differentiate primary/backup updates. Moreover, even if we wrote such flags to the WAL, the consumer would need to process backup records anyway, because it is unknown whether the primary consumer is alive. In other words, how would an end user organize CDC failover while minimizing duplicate work?
Alexey,
>> If a CDC agent is restarted, it will have to start from scratch
>> If a CDC reader does not keep up with the WAL write rate (e.g. there is a short-term write burst and WAL archive is small), the Ignite node will delete WAL segments while the consumer is still reading it.

I think these cases can be resolved with the following approach (a sketch follows below). PostgreSQL can be configured to execute a shell command after a WAL segment is archived. We can do the same thing for Ignite. Such a command can create a hard link for the archived WAL segment in a specified directory, so as not to lose it after deletion by Ignite, and notify the CDC (or another kind of process) about this segment. That gives us a filesystem queue: after a restart, the CDC only needs to process the segments located in this directory, so there is no need to start from scratch. When a WAL segment has been processed by the CDC, its hard link is deleted from the queue directory.
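A minimal sketch of the filesystem-queue idea, using standard java.nio hard links. The directory layout and the enqueue/drain split are illustrative assumptions, not part of any proposed Ignite code.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.stream.Stream;

/**
 * Sketch of the hard-link filesystem queue: an archived WAL segment is
 * hard-linked into a queue directory, so its data survives even after
 * Ignite deletes its own link. The CDC process drains the queue and
 * unlinks each segment once it has been fully consumed.
 */
public class HardLinkQueueSketch {
    /** Called right after a segment is archived (e.g. by an archive hook). */
    static void enqueue(Path archivedSegment, Path cdcQueueDir) throws IOException {
        Files.createDirectories(cdcQueueDir);

        // Hard link: same inode, second name. The data is freed only when
        // both Ignite's name and this one are gone.
        Files.createLink(cdcQueueDir.resolve(archivedSegment.getFileName()), archivedSegment);
    }

    /** Called by the CDC process; after a restart it simply resumes from what is queued. */
    static void drain(Path cdcQueueDir, SegmentProcessor processor) throws IOException {
        try (Stream<Path> segments = Files.list(cdcQueueDir)) {
            for (Path segment : (Iterable<Path>)segments.sorted()::iterator) {
                processor.process(segment); // Iterate records, notify the consumer...
                Files.delete(segment);      // ...then drop our link: segment is done.
            }
        }
    }

    /** Hypothetical hook standing in for the WALIterator-based processing. */
    interface SegmentProcessor {
        void process(Path segment) throws IOException;
    }
}
```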
Hello, Alexey.
Sorry for the long answer.

> - The interface exposes WALRecord which is a private API

Now it's fixed. A CDC consumer uses a public API to get notifications about data changes. This API can be found in the IEP [1] and the PR [2]:

```
@IgniteExperimental
public interface DataChangeListener<K, V> {
    /** Listener identifier. */
    String id();

    /** Called once on CDC start. */
    void start(IgniteConfiguration configuration, IgniteLogger log);

    /** Whether keys and values are passed in binary form. */
    boolean keepBinary();

    /** Notified with a batch of entry change events. */
    boolean onChange(Iterable<EntryEvent<K, V>> events);

    /** Called on CDC stop. */
    void stop();
}

@IgniteExperimental
public interface EntryEvent<K, V> {
    public K key();

    public V value();

    /** True if the change happened on the primary node. */
    public boolean primary();

    /** Entry operation type. */
    EntryEventType operation();

    long cacheId();

    long expireTime();
}
```

A sketch of a consumer implementing this API follows at the end of this message.

> There is no way to start capturing changes from a certain point
> If a CDC agent is restarted, it will have to start from scratch.

There is a way :) CDC stores the processed offset in a special file. In case of a CDC restart, changes are captured starting from the last committed offset.

> Users can configure a large size for WAL archive to sustain long node downtime for historical rebalance

To fix this issue, I propose to introduce a timeout that forces WAL segment archiving. And yes, the event-time gap and big WAL segments are trade-offs in a real-world deployment.

> - If a CDC reader does not keep up with the WAL write rate (e.g. there is a short-term write burst and WAL archive is small), the Ignite node will delete WAL segments while the consumer is still reading it.

This is fixed now. I implemented Pavel's proposal: on WAL rollover, if CDC is enabled, a hard link to the archived segment is created in a special folder. So CDC can process a segment independently of the main Ignite process and delete it when finished. Note that the segment data is removed from the disk only after both CDC and Ignite have removed their hard links to it.

> If Ignite node crashes, gets restarted and initiates full rebalance, the consumer will lose some updates

I expect that consumers will be started on each cluster node, so there is no event loss here.

> Usually, it makes sense for the CDC consumer to read updates only on primary nodes

Makes sense, thanks. I've added a `primary` flag to the DataEntry WAL record. Take a look at the PR [3].

> the consumer would need to process backup records anyway because it is unknown whether the primary consumer is alive.

If CDC on some node is down, it will deliver the updates on restart. I want to restrict the CDC scope to "deliver local WAL events to the consumer". CDC itself is not responsible for distributed consumer state; it's up to the consumer to implement some kind of failover scenario to keep all CDC instances up and running. And yes, it's expected that while CDC is down, the event lag grows. To prevent this, the user can process all events, not only the primary ones.

> In other words, how would an end-user organize the CDC failover minimizing the duplicate work?

1. To recover from a CDC application failure, a simple restart will work.
2. The user can now distinguish between primary and backup DataEntry records. This allows the user to prevent duplicate work and to recover from an Ignite node failure while the OS and the CDC application are still up.
3. If a small event gap must be kept even in case of a full server failure (OS, Ignite node, and CDC application are all down), changes for backup nodes must be processed, at the cost of some duplicate processing.

[1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-59+CDC+-+Capture+Data+Change
[2] https://github.com/apache/ignite/pull/8360
[3] https://github.com/apache/ignite/pull/8377
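To illustrate how the public API above might be used, a sketch of a consumer that exports primary-node changes to an external store. The package names for DataChangeListener/EntryEvent and the registration mechanism are not shown in the thread, so their imports are omitted; the "return true commits the offset" semantics is an assumption inferred from the offset discussion above. Read this as pseudocode against the quoted PR API, not a tested implementation.

```java
import org.apache.ignite.IgniteLogger;
import org.apache.ignite.configuration.IgniteConfiguration;

/** Sketch of a CDC consumer built on the DataChangeListener API from PR #8360. */
public class WarehouseExportListener<K, V> implements DataChangeListener<K, V> {
    private IgniteLogger log;

    @Override public String id() {
        return "warehouse-export"; // Identifies this consumer's saved offset.
    }

    @Override public void start(IgniteConfiguration configuration, IgniteLogger log) {
        this.log = log; // Open warehouse connections, etc.
    }

    @Override public boolean keepBinary() {
        return false; // Receive deserialized keys and values.
    }

    @Override public boolean onChange(Iterable<EntryEvent<K, V>> events) {
        for (EntryEvent<K, V> evt : events) {
            // Skip backup copies to avoid duplicate work (see the discussion
            // above); a failover-aware consumer might process them instead.
            if (!evt.primary())
                continue;

            log.info("Exporting change: cacheId=" + evt.cacheId() + ", key=" + evt.key());
            // ... write evt.key()/evt.value() to the external warehouse ...
        }

        return true; // Assumed semantics: commit the consumed offset.
    }

    @Override public void stop() {
        // Flush and close warehouse connections.
    }
}
```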