Hello!
We have had this feature for a few versions: you can call ignite.cluster().disableWal() to temporarily disable WAL on a specific cache, which involves a PME and a checkpoint on every node.

However, it has become apparent that you cannot enable or disable WAL on any kind of unstable topology at all:
https://issues.apache.org/jira/browse/IGNITE-13976

You cannot even disable WAL while a baseline node is offline. When the node comes back, it will not sync its WAL enabled status with the rest of the cluster, and all subsequent "WAL enable" or "WAL disable" operations will fail on that cache, with no clear way to recover the cache:

    ignite.close();
    client.cluster().disableWal(CACHE_NAME);
    nodes.add(Ignition.start(igniteCfg(false, consistentId)));
    client.cluster().enableWal(CACHE_NAME); // will fail

Even if this simple scenario is fixed, there seem to be multiple failure scenarios if you add or remove a node in the middle of a WAL state change operation. It does not seem that we have any expertise in the WAL disable/enable implementation right now, and I did not find a simple way of fixing it short of a full rewrite.

Therefore, I propose that we either *(a) disable that feature* in 2.10 or *(b) give a clear warning* when it is used, and also mention in the documentation that it may only be used on a stable topology.

We may also want to re-mark this feature's API as @IgniteExperimental.
I have raised this ticket to Blocker.

WDYT?

Regards,
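For readers who want to follow the "stable topology only" advice in the meantime, a pre-check along the following lines might help. This is only a sketch: WalToggleGuard and its methods are hypothetical names, not part of Ignite, and the check compares plain collections of consistent IDs. It is assumed that the IDs would be collected from cluster.currentBaselineTopology() and cluster.forServers().nodes() via consistentId(). The check cannot protect against a node joining or leaving while the WAL state change is in flight.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Apache Ignite. */
public final class WalToggleGuard {
    private WalToggleGuard() {
        // Static helpers only.
    }

    /**
     * Returns the baseline consistent IDs that are not currently online.
     * An empty result means every baseline node is present.
     *
     * Assumption: baselineIds would be gathered from
     * cluster.currentBaselineTopology() and onlineIds from
     * cluster.forServers().nodes(), using consistentId() on each element.
     */
    public static Set<Object> offlineBaselineNodes(Collection<?> baselineIds,
                                                   Collection<?> onlineIds) {
        Set<Object> missing = new HashSet<>(baselineIds);
        missing.removeAll(new HashSet<>(onlineIds));
        return missing;
    }

    /** True only when no baseline node is offline at the moment of the check. */
    public static boolean safeToToggleWal(Collection<?> baselineIds,
                                          Collection<?> onlineIds) {
        return offlineBaselineNodes(baselineIds, onlineIds).isEmpty();
    }
}
```

A caller would invoke disableWal() or enableWal() only when safeToToggleWal(...) returns true, and would otherwise report the missing nodes returned by offlineBaselineNodes(...).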
Hi Ilya,
WAL disable is a very powerful feature that is widely adopted by users. We certainly need to fix it, even if that means rewriting it.
The warning makes sense. In that case we can even reduce the priority of the issue, but it is still at least Critical, because it can lead to data loss (and it does).
Instead of a warning, I would suggest something more noticeable, such as a method signature change:
boolean disableWal(String cacheName, boolean iReadJavaDockAndAwareOfTheRisk);
This one will definitely be noticed.

Thanks,
Mike.

On Wed, Jan 20, 2021 at 8:28 AM Ilya Kasnacheev <[hidden email]> wrote:
> [...]

--
Thanks,
Mikhail.
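To make the idea of an explicit acknowledgement concrete, here is a minimal sketch of how such a signature could behave. It is written as a standalone wrapper so that it compiles on its own: AcknowledgedWalDisabler and the parameter name riskAcknowledged are hypothetical, and the wrapped Predicate stands in for the existing one-argument disableWal(String) call. Nothing here is an actual Ignite API.

```java
import java.util.function.Predicate;

/** Hypothetical wrapper illustrating the proposed signature; not an Ignite API. */
public final class AcknowledgedWalDisabler {
    /** Stand-in for the existing one-argument disableWal(String) call. */
    private final Predicate<String> disableWal;

    public AcknowledgedWalDisabler(Predicate<String> disableWal) {
        this.disableWal = disableWal;
    }

    /**
     * Disables WAL for the cache only if the caller explicitly confirms
     * awareness of the risk; otherwise fails before touching the cluster.
     */
    public boolean disableWal(String cacheName, boolean riskAcknowledged) {
        if (!riskAcknowledged) {
            throw new IllegalStateException(
                "Disabling WAL for cache '" + cacheName
                    + "' requires acknowledging the documented risks");
        }
        return disableWal.test(cacheName);
    }
}
```

Assuming an Ignite instance named ignite, the wrapper could presumably be built as new AcknowledgedWalDisabler(ignite.cluster()::disableWal), which leaves the existing method untouched.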
Ilya,
This issue must be fixed for sure (I don't think we should rewrite it from scratch).

Let's add a TODO and a warning comment referencing this issue to the JavaDoc, and add the same warning to the documentation pages. The reference to the issue will allow users to track the progress of the fix.

On Wed, 20 Jan 2021 at 22:39, Mikhail Cherkasov <[hidden email]> wrote:
> [...]
Hello!
I have created a separate ticket for that work:
https://issues.apache.org/jira/browse/IGNITE-14039

I have designated it as a blocker for 2.10.

I have submitted a pull request and will run the tests:
https://github.com/apache/ignite/pull/8688/files

Mikhail, your approach makes sense. However, I don't think we can change the existing API: it is too late for that, and we have to maintain compatibility.

Regards,
--
Ilya Kasnacheev

On Thu, Jan 21, 2021 at 15:33, Maxim Muzafarov <[hidden email]> wrote:
> [...]