I found a lot of using "ClusterTopologyCheckedException" exception. In most
use-case these exceptions were just ignored. It's easier to see if issue-4612 will be applied (I'm waiting for code review by my team leader) where I renamed all unused exceptions in the "ignored". It means that in some case "retryReadyFuture" can left unset. But there are code where exception is being getting from "get()" method in some "future" class and don't ignored (exception is sending to method "GridFutureAdapter#onDone()" for example). So I decided not to do a deep global analysis of code and make some simple rules. 1. If some method "X" throws ClusterTopologyCheckedException and ignores it (or converts to String, there are some variants to lose exception like object) in all methods that call the method "X", then we can don't set "retryReadyFuture" for optimization. But this should be clarified in javadoc. 2. But if exception escapes at least once like object: we must set "retryReadyFuture" in that method. A few times when call tree was simple, I went deeper. So I found only three methods which can throw "ClusterTopologyCheckedException" without setting "retryReadyFuture": 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId, IgfsCommunicationMessage msg) 2. GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage msg, byte plc) 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object topic, GridCacheMessage msg, byte plc, long timeout) In all other methods "ClusterTopologyCheckedException" escapes away too far. What do you think about this approach to make compromise between optimization and correctness? |
Alexander,
Do you have a use-case in mind which results in IgniteTopologyException thrown from public API with retryFuture unset? retryFuture makes sense only for certain cache operations and may be set only in certain places in the code where we already know what to wait for. I do not see how you can initialize this future, for example, in GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage msg, byte plc) --AG 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <[hidden email]>: > I found a lot of using "ClusterTopologyCheckedException" exception. In > most > use-case these exceptions were just ignored. It's easier to see if > issue-4612 will be applied (I'm waiting for code review by my team leader) > where I renamed all unused exceptions in the "ignored". > > It means that in some case "retryReadyFuture" can left unset. But there are > code where exception is being getting from "get()" method in some "future" > class and don't ignored (exception is sending to method > "GridFutureAdapter#onDone()" for example). > > So I decided not to do a deep global analysis of code and make some simple > rules. > 1. If some method "X" throws ClusterTopologyCheckedException and ignores > it > (or converts to String, there are some variants to lose exception like > object) in all methods that call the method "X", then we can don't set > "retryReadyFuture" for optimization. But this should be clarified in > javadoc. > 2. But if exception escapes at least once like object: we must set > "retryReadyFuture" in that method. > > A few times when call tree was simple, I went deeper. > > So I found only three methods which can throw > "ClusterTopologyCheckedException" without setting "retryReadyFuture": > > 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId, > IgfsCommunicationMessage msg) > 2. GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage msg, > byte plc) > 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object topic, > GridCacheMessage msg, byte plc, long timeout) > > In all other methods "ClusterTopologyCheckedException" escapes away too > far. > > What do you think about this approach to make compromise between > optimization and correctness? > |
Alexey,
For GridCacheIoManager#sendNoRetry it's not necessary because exception will be ignored (or will cast to String). Perhaps my message was unclear. I try to say that three methods can throw "ClusterTopologyCheckedException" without any problem. But you are right, in some methods I can't set "retryFuture". We must ensure that exception doesn't escape away. The best option is to ignore it (or cast to String). But any way, look here: IgniteTxHandler#sendReply(UUID nodeId, GridDhtTxFinishRequest req, boolean committed, GridCacheVersion nearTxId) Inside you can found next lines: __________________________ ClusterTopologyCheckedException *cause* = new ClusterTopologyCheckedException("Primary node left grid."); res.checkCommittedError(new IgniteTxRollbackCheckedException("Failed to commit transaction " + "(transaction has been rolled back on backup node): " + req.version(), *cause*)); __________________________ How do you unsure that *"cause"* can't come to GridCacheUtils#retryTopologySafe(final Callable<S> c) ? Perhaps I don't know some rules which you use to maintain it now? 2017-01-30 12:24 GMT+03:00 Alexey Goncharuk <[hidden email]>: > Alexander, > > Do you have a use-case in mind which results in IgniteTopologyException > thrown from public API with retryFuture unset? > > retryFuture makes sense only for certain cache operations and may be set > only in certain places in the code where we already know what to wait for. > I do not see how you can initialize this future, for example, in > GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage msg, > byte > plc) > > --AG > > 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <[hidden email]>: > > > I found a lot of using "ClusterTopologyCheckedException" exception. In > > most > > use-case these exceptions were just ignored. It's easier to see if > > issue-4612 will be applied (I'm waiting for code review by my team > leader) > > where I renamed all unused exceptions in the "ignored". > > > > It means that in some case "retryReadyFuture" can left unset. But there > are > > code where exception is being getting from "get()" method in some > "future" > > class and don't ignored (exception is sending to method > > "GridFutureAdapter#onDone()" for example). > > > > So I decided not to do a deep global analysis of code and make some > simple > > rules. > > 1. If some method "X" throws ClusterTopologyCheckedException and ignores > > it > > (or converts to String, there are some variants to lose exception like > > object) in all methods that call the method "X", then we can don't set > > "retryReadyFuture" for optimization. But this should be clarified in > > javadoc. > > 2. But if exception escapes at least once like object: we must set > > "retryReadyFuture" in that method. > > > > A few times when call tree was simple, I went deeper. > > > > So I found only three methods which can throw > > "ClusterTopologyCheckedException" without setting "retryReadyFuture": > > > > 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId, > > IgfsCommunicationMessage msg) > > 2. GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage > msg, > > byte plc) > > 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object topic, > > GridCacheMessage msg, byte plc, long timeout) > > > > In all other methods "ClusterTopologyCheckedException" escapes away too > > far. > > > > What do you think about this approach to make compromise between > > optimization and correctness? > > > |
In the example above there is no point of setting the retry future in the
exception because it will be serialized and sent to a remote node. I see the only possible way to ensure this: make this be verified at compile time. This looks like a big change, but the draft may look like so: 1) Introduce new exception type, like CacheTopology*Checked*Exception which *must* take the minimum topology version to wait for 2) In all cases when a cache operation fails because of topology change, provide the appropriate exception 3) When CacheTopologyException is being constructed, create a corresponding local future based on wait version and throw the exception. Even though this still does not give us 100% guarantee that we did not miss anything, it should cover a lot more cases than now. 2017-01-30 14:20 GMT+03:00 Александр Меньшиков <[hidden email]>: > Alexey, > > For GridCacheIoManager#sendNoRetry it's not necessary because exception > will be ignored (or will cast to String). Perhaps my message was unclear. > I try to say that three methods can throw "ClusterTopologyCheckedException" > without any problem. > > But you are right, in some methods I can't set "retryFuture". We must > ensure that exception doesn't escape away. The best option is to ignore it > (or cast to String). > > But any way, look here: > > IgniteTxHandler#sendReply(UUID nodeId, GridDhtTxFinishRequest req, boolean > committed, GridCacheVersion nearTxId) > > Inside you can found next lines: > __________________________ > ClusterTopologyCheckedException *cause* = > new ClusterTopologyCheckedException("Primary node left grid."); > > res.checkCommittedError(new IgniteTxRollbackCheckedException("Failed to > commit transaction " + > "(transaction has been rolled back on backup node): " + req.version(), > *cause*)); > __________________________ > > How do you unsure that *"cause"* can't come to GridCacheUtils#retryTopologySafe(final > Callable<S> c) ? > > > Perhaps I don't know some rules which you use to maintain it now? > > > 2017-01-30 12:24 GMT+03:00 Alexey Goncharuk <[hidden email]>: > >> Alexander, >> >> Do you have a use-case in mind which results in IgniteTopologyException >> thrown from public API with retryFuture unset? >> >> retryFuture makes sense only for certain cache operations and may be set >> only in certain places in the code where we already know what to wait for. >> I do not see how you can initialize this future, for example, in >> GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage msg, >> byte >> plc) >> >> --AG >> >> 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <[hidden email]>: >> >> > I found a lot of using "ClusterTopologyCheckedException" exception. In >> > most >> > use-case these exceptions were just ignored. It's easier to see if >> > issue-4612 will be applied (I'm waiting for code review by my team >> leader) >> > where I renamed all unused exceptions in the "ignored". >> > >> > It means that in some case "retryReadyFuture" can left unset. But there >> are >> > code where exception is being getting from "get()" method in some >> "future" >> > class and don't ignored (exception is sending to method >> > "GridFutureAdapter#onDone()" for example). >> > >> > So I decided not to do a deep global analysis of code and make some >> simple >> > rules. >> > 1. If some method "X" throws ClusterTopologyCheckedException and >> ignores >> > it >> > (or converts to String, there are some variants to lose exception like >> > object) in all methods that call the method "X", then we can don't set >> > "retryReadyFuture" for optimization. But this should be clarified in >> > javadoc. >> > 2. But if exception escapes at least once like object: we must set >> > "retryReadyFuture" in that method. >> > >> > A few times when call tree was simple, I went deeper. >> > >> > So I found only three methods which can throw >> > "ClusterTopologyCheckedException" without setting "retryReadyFuture": >> > >> > 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId, >> > IgfsCommunicationMessage msg) >> > 2. GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage >> msg, >> > byte plc) >> > 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object >> topic, >> > GridCacheMessage msg, byte plc, long timeout) >> > >> > In all other methods "ClusterTopologyCheckedException" escapes away too >> > far. >> > >> > What do you think about this approach to make compromise between >> > optimization and correctness? >> > >> > > |
Okay, it seems good. So you want to remove field *readyFut* from
*ClusterTopologyCheckedException* and add *CacheTopologyCheckedException* like *ClusterTopologyCheckedException* but initialize *readyFut* in constructor, yes? 2017-01-30 16:54 GMT+03:00 Alexey Goncharuk <[hidden email]>: > In the example above there is no point of setting the retry future in the > exception because it will be serialized and sent to a remote node. > > I see the only possible way to ensure this: make this be verified at > compile time. This looks like a big change, but the draft may look like so: > 1) Introduce new exception type, like CacheTopology*Checked*Exception > which > *must* take the minimum topology version to wait for > 2) In all cases when a cache operation fails because of topology change, > provide the appropriate exception > 3) When CacheTopologyException is being constructed, create a corresponding > local future based on wait version and throw the exception. > > Even though this still does not give us 100% guarantee that we did not miss > anything, it should cover a lot more cases than now. > > 2017-01-30 14:20 GMT+03:00 Александр Меньшиков <[hidden email]>: > > > Alexey, > > > > For GridCacheIoManager#sendNoRetry it's not necessary because exception > > will be ignored (or will cast to String). Perhaps my message was unclear. > > I try to say that three methods can throw " > ClusterTopologyCheckedException" > > without any problem. > > > > But you are right, in some methods I can't set "retryFuture". We must > > ensure that exception doesn't escape away. The best option is to ignore > it > > (or cast to String). > > > > But any way, look here: > > > > IgniteTxHandler#sendReply(UUID nodeId, GridDhtTxFinishRequest req, > boolean > > committed, GridCacheVersion nearTxId) > > > > Inside you can found next lines: > > __________________________ > > ClusterTopologyCheckedException *cause* = > > new ClusterTopologyCheckedException("Primary node left grid."); > > > > res.checkCommittedError(new IgniteTxRollbackCheckedException("Failed to > > commit transaction " + > > "(transaction has been rolled back on backup node): " + > req.version(), > > *cause*)); > > __________________________ > > > > How do you unsure that *"cause"* can't come to GridCacheUtils# > retryTopologySafe(final > > Callable<S> c) ? > > > > > > Perhaps I don't know some rules which you use to maintain it now? > > > > > > 2017-01-30 12:24 GMT+03:00 Alexey Goncharuk <[hidden email] > >: > > > >> Alexander, > >> > >> Do you have a use-case in mind which results in IgniteTopologyException > >> thrown from public API with retryFuture unset? > >> > >> retryFuture makes sense only for certain cache operations and may be set > >> only in certain places in the code where we already know what to wait > for. > >> I do not see how you can initialize this future, for example, in > >> GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage msg, > >> byte > >> plc) > >> > >> --AG > >> > >> 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <[hidden email]>: > >> > >> > I found a lot of using "ClusterTopologyCheckedException" exception. > In > >> > most > >> > use-case these exceptions were just ignored. It's easier to see if > >> > issue-4612 will be applied (I'm waiting for code review by my team > >> leader) > >> > where I renamed all unused exceptions in the "ignored". > >> > > >> > It means that in some case "retryReadyFuture" can left unset. But > there > >> are > >> > code where exception is being getting from "get()" method in some > >> "future" > >> > class and don't ignored (exception is sending to method > >> > "GridFutureAdapter#onDone()" for example). > >> > > >> > So I decided not to do a deep global analysis of code and make some > >> simple > >> > rules. > >> > 1. If some method "X" throws ClusterTopologyCheckedException and > >> ignores > >> > it > >> > (or converts to String, there are some variants to lose exception like > >> > object) in all methods that call the method "X", then we can don't set > >> > "retryReadyFuture" for optimization. But this should be clarified in > >> > javadoc. > >> > 2. But if exception escapes at least once like object: we must set > >> > "retryReadyFuture" in that method. > >> > > >> > A few times when call tree was simple, I went deeper. > >> > > >> > So I found only three methods which can throw > >> > "ClusterTopologyCheckedException" without setting "retryReadyFuture": > >> > > >> > 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId, > >> > IgfsCommunicationMessage msg) > >> > 2. GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage > >> msg, > >> > byte plc) > >> > 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object > >> topic, > >> > GridCacheMessage msg, byte plc, long timeout) > >> > > >> > In all other methods "ClusterTopologyCheckedException" escapes away > too > >> > far. > >> > > >> > What do you think about this approach to make compromise between > >> > optimization and correctness? > >> > > >> > > > > > |
In reply to this post by Alexey Goncharuk
Alexey, I ran into some difficulties.
Look at the method: GridNearTxFinishFuture.CheckBackupMiniFuture#onDhtFinishResponse(GridDhtTxFinishResponse res) *Throwable err* = res.checkCommittedError(); if (*err* != null) { if (*err* instanceof IgniteCheckedException) { *ClusterTopologyCheckedException cause* = ((IgniteCheckedException)*err*). *getCause(ClusterTopologyCheckedException.class)*; if (*cause* != null) *cause.retryReadyFuture(* cctx.nextAffinityReadyFuture(tx.topologyVersion())); * //^_____Here update the readyFut in some first "cause". * } onDone(*err*); } So I can't rewrite "cause" field in exception without reflection. It means if I try to use two exception: one with "readyFut" and second without "readyFut", I see no way to do it in this place. Perhaps I misunderstood your original idea. Or maybe this code some kind of a crutch and should be removed. What do you think? 2017-01-30 16:54 GMT+03:00 Alexey Goncharuk <[hidden email]>: > In the example above there is no point of setting the retry future in the > exception because it will be serialized and sent to a remote node. > > I see the only possible way to ensure this: make this be verified at > compile time. This looks like a big change, but the draft may look like so: > 1) Introduce new exception type, like CacheTopology*Checked*Exception > which > *must* take the minimum topology version to wait for > 2) In all cases when a cache operation fails because of topology change, > provide the appropriate exception > 3) When CacheTopologyException is being constructed, create a corresponding > local future based on wait version and throw the exception. > > Even though this still does not give us 100% guarantee that we did not miss > anything, it should cover a lot more cases than now. > > 2017-01-30 14:20 GMT+03:00 Александр Меньшиков <[hidden email]>: > > > Alexey, > > > > For GridCacheIoManager#sendNoRetry it's not necessary because exception > > will be ignored (or will cast to String). Perhaps my message was unclear. > > I try to say that three methods can throw " > ClusterTopologyCheckedException" > > without any problem. > > > > But you are right, in some methods I can't set "retryFuture". We must > > ensure that exception doesn't escape away. The best option is to ignore > it > > (or cast to String). > > > > But any way, look here: > > > > IgniteTxHandler#sendReply(UUID nodeId, GridDhtTxFinishRequest req, > boolean > > committed, GridCacheVersion nearTxId) > > > > Inside you can found next lines: > > __________________________ > > ClusterTopologyCheckedException *cause* = > > new ClusterTopologyCheckedException("Primary node left grid."); > > > > res.checkCommittedError(new IgniteTxRollbackCheckedException("Failed to > > commit transaction " + > > "(transaction has been rolled back on backup node): " + > req.version(), > > *cause*)); > > __________________________ > > > > How do you unsure that *"cause"* can't come to GridCacheUtils# > retryTopologySafe(final > > Callable<S> c) ? > > > > > > Perhaps I don't know some rules which you use to maintain it now? > > > > > > 2017-01-30 12:24 GMT+03:00 Alexey Goncharuk <[hidden email] > >: > > > >> Alexander, > >> > >> Do you have a use-case in mind which results in IgniteTopologyException > >> thrown from public API with retryFuture unset? > >> > >> retryFuture makes sense only for certain cache operations and may be set > >> only in certain places in the code where we already know what to wait > for. > >> I do not see how you can initialize this future, for example, in > >> GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage msg, > >> byte > >> plc) > >> > >> --AG > >> > >> 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <[hidden email]>: > >> > >> > I found a lot of using "ClusterTopologyCheckedException" exception. > In > >> > most > >> > use-case these exceptions were just ignored. It's easier to see if > >> > issue-4612 will be applied (I'm waiting for code review by my team > >> leader) > >> > where I renamed all unused exceptions in the "ignored". > >> > > >> > It means that in some case "retryReadyFuture" can left unset. But > there > >> are > >> > code where exception is being getting from "get()" method in some > >> "future" > >> > class and don't ignored (exception is sending to method > >> > "GridFutureAdapter#onDone()" for example). > >> > > >> > So I decided not to do a deep global analysis of code and make some > >> simple > >> > rules. > >> > 1. If some method "X" throws ClusterTopologyCheckedException and > >> ignores > >> > it > >> > (or converts to String, there are some variants to lose exception like > >> > object) in all methods that call the method "X", then we can don't set > >> > "retryReadyFuture" for optimization. But this should be clarified in > >> > javadoc. > >> > 2. But if exception escapes at least once like object: we must set > >> > "retryReadyFuture" in that method. > >> > > >> > A few times when call tree was simple, I went deeper. > >> > > >> > So I found only three methods which can throw > >> > "ClusterTopologyCheckedException" without setting "retryReadyFuture": > >> > > >> > 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId, > >> > IgfsCommunicationMessage msg) > >> > 2. GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage > >> msg, > >> > byte plc) > >> > 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object > >> topic, > >> > GridCacheMessage msg, byte plc, long timeout) > >> > > >> > In all other methods "ClusterTopologyCheckedException" escapes away > too > >> > far. > >> > > >> > What do you think about this approach to make compromise between > >> > optimization and correctness? > >> > > >> > > > > > |
Alexander,
I do not see why you would need to overwrite the cause field. What I meant in the previous email is that instead of setting a retry ready future in the checked exception, you would set a correct affinity topology version here. Then, during a construction of CacheException (unchecked, which is guaranteed to be thrown locally and will not be serialized), you would create the retry ready future based on the topology version set. Hope this helps, AG 2017-02-07 17:18 GMT+03:00 Александр Меньшиков <[hidden email]>: > Alexey, I ran into some difficulties. > > Look at the method: GridNearTxFinishFuture.CheckBackupMiniFuture# > onDhtFinishResponse(GridDhtTxFinishResponse res) > > *Throwable err* = res.checkCommittedError(); > > if (*err* != null) { > if (*err* instanceof IgniteCheckedException) { > *ClusterTopologyCheckedException cause* = > ((IgniteCheckedException)*err*). > *getCause(ClusterTopologyCheckedException.class)*; > > if (*cause* != null) > *cause.retryReadyFuture(*cctx. > nextAffinityReadyFuture(tx.topologyVersion())); > * //^_____Here update the readyFut in > some first "cause". * > } > > onDone(*err*); > } > > > So I can't rewrite "cause" field in exception without reflection. It means > if I try to use two exception: one with "readyFut" and second without > "readyFut", I see no way to do it in this place. > > Perhaps I misunderstood your original idea. Or maybe this code some kind > of a crutch and should be removed. What do you think? > > 2017-01-30 16:54 GMT+03:00 Alexey Goncharuk <[hidden email]>: > >> In the example above there is no point of setting the retry future in the >> exception because it will be serialized and sent to a remote node. >> >> I see the only possible way to ensure this: make this be verified at >> compile time. This looks like a big change, but the draft may look like >> so: >> 1) Introduce new exception type, like CacheTopology*Checked*Exception >> which >> *must* take the minimum topology version to wait for >> 2) In all cases when a cache operation fails because of topology change, >> provide the appropriate exception >> 3) When CacheTopologyException is being constructed, create a >> corresponding >> local future based on wait version and throw the exception. >> >> Even though this still does not give us 100% guarantee that we did not >> miss >> anything, it should cover a lot more cases than now. >> >> 2017-01-30 14:20 GMT+03:00 Александр Меньшиков <[hidden email]>: >> >> > Alexey, >> > >> > For GridCacheIoManager#sendNoRetry it's not necessary because exception >> > will be ignored (or will cast to String). Perhaps my message was >> unclear. >> > I try to say that three methods can throw "ClusterTopologyCheckedExcepti >> on" >> > without any problem. >> > >> > But you are right, in some methods I can't set "retryFuture". We must >> > ensure that exception doesn't escape away. The best option is to ignore >> it >> > (or cast to String). >> > >> > But any way, look here: >> > >> > IgniteTxHandler#sendReply(UUID nodeId, GridDhtTxFinishRequest req, >> boolean >> > committed, GridCacheVersion nearTxId) >> > >> > Inside you can found next lines: >> > __________________________ >> > ClusterTopologyCheckedException *cause* = >> > new ClusterTopologyCheckedException("Primary node left grid."); >> > >> > res.checkCommittedError(new IgniteTxRollbackCheckedException("Failed to >> > commit transaction " + >> > "(transaction has been rolled back on backup node): " + >> req.version(), >> > *cause*)); >> > __________________________ >> > >> > How do you unsure that *"cause"* can't come to >> GridCacheUtils#retryTopologySafe(final >> >> > Callable<S> c) ? >> > >> > >> > Perhaps I don't know some rules which you use to maintain it now? >> > >> > >> > 2017-01-30 12:24 GMT+03:00 Alexey Goncharuk <[hidden email] >> >: >> > >> >> Alexander, >> >> >> >> Do you have a use-case in mind which results in IgniteTopologyException >> >> thrown from public API with retryFuture unset? >> >> >> >> retryFuture makes sense only for certain cache operations and may be >> set >> >> only in certain places in the code where we already know what to wait >> for. >> >> I do not see how you can initialize this future, for example, in >> >> GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage >> msg, >> >> byte >> >> plc) >> >> >> >> --AG >> >> >> >> 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <[hidden email]>: >> >> >> >> > I found a lot of using "ClusterTopologyCheckedException" exception. >> In >> >> > most >> >> > use-case these exceptions were just ignored. It's easier to see if >> >> > issue-4612 will be applied (I'm waiting for code review by my team >> >> leader) >> >> > where I renamed all unused exceptions in the "ignored". >> >> > >> >> > It means that in some case "retryReadyFuture" can left unset. But >> there >> >> are >> >> > code where exception is being getting from "get()" method in some >> >> "future" >> >> > class and don't ignored (exception is sending to method >> >> > "GridFutureAdapter#onDone()" for example). >> >> > >> >> > So I decided not to do a deep global analysis of code and make some >> >> simple >> >> > rules. >> >> > 1. If some method "X" throws ClusterTopologyCheckedException and >> >> ignores >> >> > it >> >> > (or converts to String, there are some variants to lose exception >> like >> >> > object) in all methods that call the method "X", then we can don't >> set >> >> > "retryReadyFuture" for optimization. But this should be clarified in >> >> > javadoc. >> >> > 2. But if exception escapes at least once like object: we must set >> >> > "retryReadyFuture" in that method. >> >> > >> >> > A few times when call tree was simple, I went deeper. >> >> > >> >> > So I found only three methods which can throw >> >> > "ClusterTopologyCheckedException" without setting >> "retryReadyFuture": >> >> > >> >> > 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId, >> >> > IgfsCommunicationMessage msg) >> >> > 2. GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage >> >> msg, >> >> > byte plc) >> >> > 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object >> >> topic, >> >> > GridCacheMessage msg, byte plc, long timeout) >> >> > >> >> > In all other methods "ClusterTopologyCheckedException" escapes away >> too >> >> > far. >> >> > >> >> > What do you think about this approach to make compromise between >> >> > optimization and correctness? >> >> > >> >> >> > >> > >> > > |
Alexey,
I'm ready to make some conclusions. You can see result immediately here: PR: https://github.com/apache/ignite/pull/1537/files CI Tests: http://ci.ignite.apache.org/project.html?projectId=IgniteTests&tab=projectOverview&branch_IgniteTests=pull/1537/head I fixed only "ClusterTopologyCheckedException", and didn't touched "ClusterTopologyException". "ClusterTopologyException" has a same problem like "ClusterTopologyCheckedException", but even now changes is huge (80 files). So if you think fixing of "ClusterTopologyException" is necessary, you could add different issue in JIRA (or I can do that). And I will fix it in future. I can't implement your idea about using Runtime exception, because right now a lot of logic tied to the fact that this is "IgniteCheckedException". I real tried to mix it but I couldn't make it work. So before my changes we have 3 affected classes: 1. "ClusterTopologyCheckedException". 2. "ClusterTopologyServerNotFoundException". 3. "ClusterGroupEmptyCheckedException". Now we there are 5 affected classes: "ClusterTopologyCheckedException" split into 2 classes: 1. "ClusterTopologyCheckedException" (with future). 2. "ClusterTopologyLocalException" (without future, and it's parent for "ClusterTopologyCheckedException"). "ClusterTopologyServerNotFoundException" rename to 3. "ClusterTopologyServerNotFoundLocalException" (For consistency of names, I didn't find any cases where "ClusterTopologyServerNotFoundException" need his future). "ClusterGroupEmptyCheckedException" split into 2 classes: 4. "ClusterGroupEmptyCheckedException". 5. "ClusterGroupEmptyLocalException". Also in code "ready future" is using for waiting, and don't using for get real value (just look at code, all values just ignored). In fact "ready future" has type "IgniteFuture<?>". It suggests that result doesn't matter. 2017-02-10 11:06 GMT+03:00 Alexey Goncharuk <[hidden email]>: > Alexander, > > I do not see why you would need to overwrite the cause field. > > What I meant in the previous email is that instead of setting a retry > ready future in the checked exception, you would set a correct affinity > topology version here. Then, during a construction of CacheException > (unchecked, which is guaranteed to be thrown locally and will not be > serialized), you would create the retry ready future based on the topology > version set. > > Hope this helps, > AG > > 2017-02-07 17:18 GMT+03:00 Александр Меньшиков <[hidden email]>: > >> Alexey, I ran into some difficulties. >> >> Look at the method: GridNearTxFinishFuture.CheckBa >> ckupMiniFuture#onDhtFinishResponse(GridDhtTxFinishResponse res) >> >> *Throwable err* = res.checkCommittedError(); >> >> if (*err* != null) { >> if (*err* instanceof IgniteCheckedException) { >> *ClusterTopologyCheckedException cause* = >> ((IgniteCheckedException)*err*). >> *getCause(ClusterTopologyCheckedException.class)*; >> >> if (*cause* != null) >> *cause.retryReadyFuture(*cctx.ne >> xtAffinityReadyFuture(tx.topologyVersion())); >> * //^_____Here update the readyFut in >> some first "cause". * >> } >> >> onDone(*err*); >> } >> >> >> So I can't rewrite "cause" field in exception without reflection. It >> means if I try to use two exception: one with "readyFut" and second without >> "readyFut", I see no way to do it in this place. >> >> Perhaps I misunderstood your original idea. Or maybe this code some kind >> of a crutch and should be removed. What do you think? >> >> 2017-01-30 16:54 GMT+03:00 Alexey Goncharuk <[hidden email]>: >> >>> In the example above there is no point of setting the retry future in the >>> exception because it will be serialized and sent to a remote node. >>> >>> I see the only possible way to ensure this: make this be verified at >>> compile time. This looks like a big change, but the draft may look like >>> so: >>> 1) Introduce new exception type, like CacheTopology*Checked*Exception >>> which >>> *must* take the minimum topology version to wait for >>> 2) In all cases when a cache operation fails because of topology change, >>> provide the appropriate exception >>> 3) When CacheTopologyException is being constructed, create a >>> corresponding >>> local future based on wait version and throw the exception. >>> >>> Even though this still does not give us 100% guarantee that we did not >>> miss >>> anything, it should cover a lot more cases than now. >>> >>> 2017-01-30 14:20 GMT+03:00 Александр Меньшиков <[hidden email]>: >>> >>> > Alexey, >>> > >>> > For GridCacheIoManager#sendNoRetry it's not necessary because exception >>> > will be ignored (or will cast to String). Perhaps my message was >>> unclear. >>> > I try to say that three methods can throw >>> "ClusterTopologyCheckedException" >>> > without any problem. >>> > >>> > But you are right, in some methods I can't set "retryFuture". We must >>> > ensure that exception doesn't escape away. The best option is to >>> ignore it >>> > (or cast to String). >>> > >>> > But any way, look here: >>> > >>> > IgniteTxHandler#sendReply(UUID nodeId, GridDhtTxFinishRequest req, >>> boolean >>> > committed, GridCacheVersion nearTxId) >>> > >>> > Inside you can found next lines: >>> > __________________________ >>> > ClusterTopologyCheckedException *cause* = >>> > new ClusterTopologyCheckedException("Primary node left grid."); >>> > >>> > res.checkCommittedError(new IgniteTxRollbackCheckedException("Failed >>> to >>> > commit transaction " + >>> > "(transaction has been rolled back on backup node): " + >>> req.version(), >>> > *cause*)); >>> > __________________________ >>> > >>> > How do you unsure that *"cause"* can't come to >>> GridCacheUtils#retryTopologySafe(final >>> >>> > Callable<S> c) ? >>> > >>> > >>> > Perhaps I don't know some rules which you use to maintain it now? >>> > >>> > >>> > 2017-01-30 12:24 GMT+03:00 Alexey Goncharuk < >>> [hidden email]>: >>> > >>> >> Alexander, >>> >> >>> >> Do you have a use-case in mind which results in >>> IgniteTopologyException >>> >> thrown from public API with retryFuture unset? >>> >> >>> >> retryFuture makes sense only for certain cache operations and may be >>> set >>> >> only in certain places in the code where we already know what to wait >>> for. >>> >> I do not see how you can initialize this future, for example, in >>> >> GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage >>> msg, >>> >> byte >>> >> plc) >>> >> >>> >> --AG >>> >> >>> >> 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <[hidden email] >>> >: >>> >> >>> >> > I found a lot of using "ClusterTopologyCheckedException" >>> exception. In >>> >> > most >>> >> > use-case these exceptions were just ignored. It's easier to see if >>> >> > issue-4612 will be applied (I'm waiting for code review by my team >>> >> leader) >>> >> > where I renamed all unused exceptions in the "ignored". >>> >> > >>> >> > It means that in some case "retryReadyFuture" can left unset. But >>> there >>> >> are >>> >> > code where exception is being getting from "get()" method in some >>> >> "future" >>> >> > class and don't ignored (exception is sending to method >>> >> > "GridFutureAdapter#onDone()" for example). >>> >> > >>> >> > So I decided not to do a deep global analysis of code and make some >>> >> simple >>> >> > rules. >>> >> > 1. If some method "X" throws ClusterTopologyCheckedException and >>> >> ignores >>> >> > it >>> >> > (or converts to String, there are some variants to lose exception >>> like >>> >> > object) in all methods that call the method "X", then we can don't >>> set >>> >> > "retryReadyFuture" for optimization. But this should be clarified in >>> >> > javadoc. >>> >> > 2. But if exception escapes at least once like object: we must set >>> >> > "retryReadyFuture" in that method. >>> >> > >>> >> > A few times when call tree was simple, I went deeper. >>> >> > >>> >> > So I found only three methods which can throw >>> >> > "ClusterTopologyCheckedException" without setting >>> "retryReadyFuture": >>> >> > >>> >> > 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId, >>> >> > IgfsCommunicationMessage msg) >>> >> > 2. GridCacheIoManager#sendNoRetry(ClusterNode node, >>> GridCacheMessage >>> >> msg, >>> >> > byte plc) >>> >> > 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object >>> >> topic, >>> >> > GridCacheMessage msg, byte plc, long timeout) >>> >> > >>> >> > In all other methods "ClusterTopologyCheckedException" escapes >>> away too >>> >> > far. >>> >> > >>> >> > What do you think about this approach to make compromise between >>> >> > optimization and correctness? >>> >> > >>> >> >>> > >>> > >>> >> >> > |
Alexey, I just remind you about issue.
2017-02-20 16:42 GMT+03:00 Александр Меньшиков <[hidden email]>: > Alexey, > > I'm ready to make some conclusions. You can see result immediately here: > PR: https://github.com/apache/ignite/pull/1537/files > CI Tests: http://ci.ignite.apache.org/project.html?projectId= > IgniteTests&tab=projectOverview&branch_IgniteTests=pull/1537/head > > I fixed only "ClusterTopologyCheckedException", and didn't touched > "ClusterTopologyException". "ClusterTopologyException" has a same problem > like "ClusterTopologyCheckedException", but even now changes is huge (80 > files). So if you think fixing of "ClusterTopologyException" is necessary, > you could add different issue in JIRA (or I can do that). And I will fix it > in future. I can't implement your idea about using Runtime exception, > because right now a lot of logic tied to the fact that this is > "IgniteCheckedException". I real tried to mix it but I couldn't make it > work. > > So before my changes we have 3 affected classes: > 1. "ClusterTopologyCheckedException". > 2. "ClusterTopologyServerNotFoundException". > 3. "ClusterGroupEmptyCheckedException". > > Now we there are 5 affected classes: > > "ClusterTopologyCheckedException" split into 2 classes: > 1. "ClusterTopologyCheckedException" (with future). > 2. "ClusterTopologyLocalException" (without future, and it's parent for " > ClusterTopologyCheckedException"). > > "ClusterTopologyServerNotFoundException" rename to > 3. "ClusterTopologyServerNotFoundLocalException" (For consistency of > names, I didn't find any cases where "ClusterTopologyServerNotFoundException" > need his future). > > "ClusterGroupEmptyCheckedException" split into 2 classes: > 4. "ClusterGroupEmptyCheckedException". > 5. "ClusterGroupEmptyLocalException". > > Also in code "ready future" is using for waiting, and don't using for get > real value (just look at code, all values just ignored). In fact "ready > future" has type "IgniteFuture<?>". It suggests that result doesn't matter. > > 2017-02-10 11:06 GMT+03:00 Alexey Goncharuk <[hidden email]>: > >> Alexander, >> >> I do not see why you would need to overwrite the cause field. >> >> What I meant in the previous email is that instead of setting a retry >> ready future in the checked exception, you would set a correct affinity >> topology version here. Then, during a construction of CacheException >> (unchecked, which is guaranteed to be thrown locally and will not be >> serialized), you would create the retry ready future based on the topology >> version set. >> >> Hope this helps, >> AG >> >> 2017-02-07 17:18 GMT+03:00 Александр Меньшиков <[hidden email]>: >> >>> Alexey, I ran into some difficulties. >>> >>> Look at the method: GridNearTxFinishFuture.CheckBa >>> ckupMiniFuture#onDhtFinishResponse(GridDhtTxFinishResponse res) >>> >>> *Throwable err* = res.checkCommittedError(); >>> >>> if (*err* != null) { >>> if (*err* instanceof IgniteCheckedException) { >>> *ClusterTopologyCheckedException cause* = >>> ((IgniteCheckedException)*err*). >>> *getCause(ClusterTopologyCheckedException.class)*; >>> >>> if (*cause* != null) >>> *cause.retryReadyFuture(*cctx.ne >>> xtAffinityReadyFuture(tx.topologyVersion())); >>> * //^_____Here update the readyFut in >>> some first "cause". * >>> } >>> >>> onDone(*err*); >>> } >>> >>> >>> So I can't rewrite "cause" field in exception without reflection. It >>> means if I try to use two exception: one with "readyFut" and second without >>> "readyFut", I see no way to do it in this place. >>> >>> Perhaps I misunderstood your original idea. Or maybe this code some kind >>> of a crutch and should be removed. What do you think? >>> >>> 2017-01-30 16:54 GMT+03:00 Alexey Goncharuk <[hidden email]> >>> : >>> >>>> In the example above there is no point of setting the retry future in >>>> the >>>> exception because it will be serialized and sent to a remote node. >>>> >>>> I see the only possible way to ensure this: make this be verified at >>>> compile time. This looks like a big change, but the draft may look like >>>> so: >>>> 1) Introduce new exception type, like CacheTopology*Checked*Exception >>>> which >>>> *must* take the minimum topology version to wait for >>>> 2) In all cases when a cache operation fails because of topology change, >>>> provide the appropriate exception >>>> 3) When CacheTopologyException is being constructed, create a >>>> corresponding >>>> local future based on wait version and throw the exception. >>>> >>>> Even though this still does not give us 100% guarantee that we did not >>>> miss >>>> anything, it should cover a lot more cases than now. >>>> >>>> 2017-01-30 14:20 GMT+03:00 Александр Меньшиков <[hidden email]>: >>>> >>>> > Alexey, >>>> > >>>> > For GridCacheIoManager#sendNoRetry it's not necessary because >>>> exception >>>> > will be ignored (or will cast to String). Perhaps my message was >>>> unclear. >>>> > I try to say that three methods can throw >>>> "ClusterTopologyCheckedException" >>>> > without any problem. >>>> > >>>> > But you are right, in some methods I can't set "retryFuture". We must >>>> > ensure that exception doesn't escape away. The best option is to >>>> ignore it >>>> > (or cast to String). >>>> > >>>> > But any way, look here: >>>> > >>>> > IgniteTxHandler#sendReply(UUID nodeId, GridDhtTxFinishRequest req, >>>> boolean >>>> > committed, GridCacheVersion nearTxId) >>>> > >>>> > Inside you can found next lines: >>>> > __________________________ >>>> > ClusterTopologyCheckedException *cause* = >>>> > new ClusterTopologyCheckedException("Primary node left grid."); >>>> > >>>> > res.checkCommittedError(new IgniteTxRollbackCheckedException("Failed >>>> to >>>> > commit transaction " + >>>> > "(transaction has been rolled back on backup node): " + >>>> req.version(), >>>> > *cause*)); >>>> > __________________________ >>>> > >>>> > How do you unsure that *"cause"* can't come to >>>> GridCacheUtils#retryTopologySafe(final >>>> >>>> > Callable<S> c) ? >>>> > >>>> > >>>> > Perhaps I don't know some rules which you use to maintain it now? >>>> > >>>> > >>>> > 2017-01-30 12:24 GMT+03:00 Alexey Goncharuk < >>>> [hidden email]>: >>>> > >>>> >> Alexander, >>>> >> >>>> >> Do you have a use-case in mind which results in >>>> IgniteTopologyException >>>> >> thrown from public API with retryFuture unset? >>>> >> >>>> >> retryFuture makes sense only for certain cache operations and may be >>>> set >>>> >> only in certain places in the code where we already know what to >>>> wait for. >>>> >> I do not see how you can initialize this future, for example, in >>>> >> GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage >>>> msg, >>>> >> byte >>>> >> plc) >>>> >> >>>> >> --AG >>>> >> >>>> >> 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <[hidden email] >>>> >: >>>> >> >>>> >> > I found a lot of using "ClusterTopologyCheckedException" >>>> exception. In >>>> >> > most >>>> >> > use-case these exceptions were just ignored. It's easier to see if >>>> >> > issue-4612 will be applied (I'm waiting for code review by my team >>>> >> leader) >>>> >> > where I renamed all unused exceptions in the "ignored". >>>> >> > >>>> >> > It means that in some case "retryReadyFuture" can left unset. But >>>> there >>>> >> are >>>> >> > code where exception is being getting from "get()" method in some >>>> >> "future" >>>> >> > class and don't ignored (exception is sending to method >>>> >> > "GridFutureAdapter#onDone()" for example). >>>> >> > >>>> >> > So I decided not to do a deep global analysis of code and make some >>>> >> simple >>>> >> > rules. >>>> >> > 1. If some method "X" throws ClusterTopologyCheckedException and >>>> >> ignores >>>> >> > it >>>> >> > (or converts to String, there are some variants to lose exception >>>> like >>>> >> > object) in all methods that call the method "X", then we can don't >>>> set >>>> >> > "retryReadyFuture" for optimization. But this should be clarified >>>> in >>>> >> > javadoc. >>>> >> > 2. But if exception escapes at least once like object: we must set >>>> >> > "retryReadyFuture" in that method. >>>> >> > >>>> >> > A few times when call tree was simple, I went deeper. >>>> >> > >>>> >> > So I found only three methods which can throw >>>> >> > "ClusterTopologyCheckedException" without setting >>>> "retryReadyFuture": >>>> >> > >>>> >> > 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId, >>>> >> > IgfsCommunicationMessage msg) >>>> >> > 2. GridCacheIoManager#sendNoRetry(ClusterNode node, >>>> GridCacheMessage >>>> >> msg, >>>> >> > byte plc) >>>> >> > 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object >>>> >> topic, >>>> >> > GridCacheMessage msg, byte plc, long timeout) >>>> >> > >>>> >> > In all other methods "ClusterTopologyCheckedException" escapes >>>> away too >>>> >> > far. >>>> >> > >>>> >> > What do you think about this approach to make compromise between >>>> >> > optimization and correctness? >>>> >> > >>>> >> >>>> > >>>> > >>>> >>> >>> >> > |
Free forum by Nabble | Edit this page |