Apache Ignite Developers - Legacy Mail Archive

IGNITE-1948 ClusterTopologyCheckedException can return null for retryReadyFuture()

Classic

List

Threaded

9 messages Options

Александр Меньшиков

IGNITE-1948 ClusterTopologyCheckedException can return null for retryReadyFuture()

I found a lot of using "ClusterTopologyCheckedException" exception. In most
use-case these exceptions were just ignored. It's easier to see if
issue-4612 will be applied (I'm waiting for code review by my team leader)
where I renamed all unused exceptions in the "ignored".

It means that in some case "retryReadyFuture" can left unset. But there are
code where exception is being getting from "get()" method in some "future"
class and don't ignored (exception is sending to method
"GridFutureAdapter#onDone()" for example).

So I decided not to do a deep global analysis of code and make some simple
rules.
1. If some method "X" throws ClusterTopologyCheckedException and ignores it
(or converts to String, there are some variants to lose exception like
object) in all methods that call the method "X", then we can don't set
"retryReadyFuture" for optimization. But this should be clarified in
javadoc.
2. But if exception escapes at least once like object: we must set
"retryReadyFuture" in that method.

A few times when call tree was simple, I went deeper.

So I found only three methods which can throw
"ClusterTopologyCheckedException" without setting "retryReadyFuture":

1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId,
IgfsCommunicationMessage msg)
2. GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage msg,
byte plc)
3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object topic,
GridCacheMessage msg, byte plc, long timeout)

In all other methods "ClusterTopologyCheckedException" escapes away too far.

What do you think about this approach to make compromise between
optimization and correctness?

Alexey Goncharuk

Re: IGNITE-1948 ClusterTopologyCheckedException can return null for retryReadyFuture()

Alexander,

Do you have a use-case in mind which results in IgniteTopologyException
thrown from public API with retryFuture unset?

retryFuture makes sense only for certain cache operations and may be set
only in certain places in the code where we already know what to wait for.
I do not see how you can initialize this future, for example, in
GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage msg, byte
plc)

--AG

2017-01-27 15:45 GMT+03:00 Александр Меньшиков <[hidden email]>:

> I found a lot of using "ClusterTopologyCheckedException" exception. In
> most
> use-case these exceptions were just ignored. It's easier to see if
> issue-4612 will be applied (I'm waiting for code review by my team leader)
> where I renamed all unused exceptions in the "ignored".
>
> It means that in some case "retryReadyFuture" can left unset. But there are
> code where exception is being getting from "get()" method in some "future"
> class and don't ignored (exception is sending to method
> "GridFutureAdapter#onDone()" for example).
>
> So I decided not to do a deep global analysis of code and make some simple
> rules.
> 1. If some method "X" throws ClusterTopologyCheckedException and ignores
> it
> (or converts to String, there are some variants to lose exception like
> object) in all methods that call the method "X", then we can don't set
> "retryReadyFuture" for optimization. But this should be clarified in
> javadoc.
> 2. But if exception escapes at least once like object: we must set
> "retryReadyFuture" in that method.
>
> A few times when call tree was simple, I went deeper.
>
> So I found only three methods which can throw
> "ClusterTopologyCheckedException" without setting "retryReadyFuture":
>
> 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId,
> IgfsCommunicationMessage msg)
> 2. GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage msg,
> byte plc)
> 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object topic,
> GridCacheMessage msg, byte plc, long timeout)
>
> In all other methods "ClusterTopologyCheckedException" escapes away too
> far.
>
> What do you think about this approach to make compromise between
> optimization and correctness?
>

Александр Меньшиков

Re: IGNITE-1948 ClusterTopologyCheckedException can return null for retryReadyFuture()

Alexey,

For GridCacheIoManager#sendNoRetry it's not necessary because exception
will be ignored (or will cast to String). Perhaps my message was unclear.
I try to say that three methods can throw "ClusterTopologyCheckedException"
without any problem.

But you are right, in some methods I can't set "retryFuture". We must
ensure that exception doesn't escape away. The best option is to ignore it
(or cast to String).

But any way, look here:

IgniteTxHandler#sendReply(UUID nodeId, GridDhtTxFinishRequest req, boolean
committed, GridCacheVersion nearTxId)

Inside you can found next lines:
__________________________
ClusterTopologyCheckedException *cause* =
new ClusterTopologyCheckedException("Primary node left grid.");

res.checkCommittedError(new IgniteTxRollbackCheckedException("Failed to
commit transaction " +
"(transaction has been rolled back on backup node): " + req.version(),
*cause*));
__________________________

How do you unsure that *"cause"* can't come to
GridCacheUtils#retryTopologySafe(final Callable<S> c) ?

Perhaps I don't know some rules which you use to maintain it now?

2017-01-30 12:24 GMT+03:00 Alexey Goncharuk <[hidden email]>:

> Alexander,
>
> Do you have a use-case in mind which results in IgniteTopologyException
> thrown from public API with retryFuture unset?
>
> retryFuture makes sense only for certain cache operations and may be set
> only in certain places in the code where we already know what to wait for.
> I do not see how you can initialize this future, for example, in
> GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage msg,
> byte
> plc)
>
> --AG
>
> 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <[hidden email]>:
>
> > I found a lot of using "ClusterTopologyCheckedException" exception. In
> > most
> > use-case these exceptions were just ignored. It's easier to see if
> > issue-4612 will be applied (I'm waiting for code review by my team
> leader)
> > where I renamed all unused exceptions in the "ignored".
> >
> > It means that in some case "retryReadyFuture" can left unset. But there
> are
> > code where exception is being getting from "get()" method in some
> "future"
> > class and don't ignored (exception is sending to method
> > "GridFutureAdapter#onDone()" for example).
> >
> > So I decided not to do a deep global analysis of code and make some
> simple
> > rules.
> > 1. If some method "X" throws ClusterTopologyCheckedException and ignores
> > it
> > (or converts to String, there are some variants to lose exception like
> > object) in all methods that call the method "X", then we can don't set
> > "retryReadyFuture" for optimization. But this should be clarified in
> > javadoc.
> > 2. But if exception escapes at least once like object: we must set
> > "retryReadyFuture" in that method.
> >
> > A few times when call tree was simple, I went deeper.
> >
> > So I found only three methods which can throw
> > "ClusterTopologyCheckedException" without setting "retryReadyFuture":
> >
> > 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId,
> > IgfsCommunicationMessage msg)
> > 2. GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage
> msg,
> > byte plc)
> > 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object topic,
> > GridCacheMessage msg, byte plc, long timeout)
> >
> > In all other methods "ClusterTopologyCheckedException" escapes away too
> > far.
> >
> > What do you think about this approach to make compromise between
> > optimization and correctness?
> >
>

Alexey Goncharuk

Re: IGNITE-1948 ClusterTopologyCheckedException can return null for retryReadyFuture()

In the example above there is no point of setting the retry future in the
exception because it will be serialized and sent to a remote node.

I see the only possible way to ensure this: make this be verified at
compile time. This looks like a big change, but the draft may look like so:
1) Introduce new exception type, like CacheTopology*Checked*Exception which
*must* take the minimum topology version to wait for
2) In all cases when a cache operation fails because of topology change,
provide the appropriate exception
3) When CacheTopologyException is being constructed, create a corresponding
local future based on wait version and throw the exception.

Even though this still does not give us 100% guarantee that we did not miss
anything, it should cover a lot more cases than now.

2017-01-30 14:20 GMT+03:00 Александр Меньшиков <[hidden email]>:

> Alexey,
>
> For GridCacheIoManager#sendNoRetry it's not necessary because exception
> will be ignored (or will cast to String). Perhaps my message was unclear.
> I try to say that three methods can throw "ClusterTopologyCheckedException"
> without any problem.
>
> But you are right, in some methods I can't set "retryFuture". We must
> ensure that exception doesn't escape away. The best option is to ignore it
> (or cast to String).
>
> But any way, look here:
>
> IgniteTxHandler#sendReply(UUID nodeId, GridDhtTxFinishRequest req, boolean
> committed, GridCacheVersion nearTxId)
>
> Inside you can found next lines:
> __________________________
> ClusterTopologyCheckedException *cause* =
> new ClusterTopologyCheckedException("Primary node left grid.");
>
> res.checkCommittedError(new IgniteTxRollbackCheckedException("Failed to
> commit transaction " +
> "(transaction has been rolled back on backup node): " + req.version(),
> *cause*));
> __________________________
>
> How do you unsure that *"cause"* can't come to GridCacheUtils#retryTopologySafe(final
> Callable<S> c) ?
>
>
> Perhaps I don't know some rules which you use to maintain it now?
>
>
> 2017-01-30 12:24 GMT+03:00 Alexey Goncharuk <[hidden email]>:
>
>> Alexander,
>>
>> Do you have a use-case in mind which results in IgniteTopologyException
>> thrown from public API with retryFuture unset?
>>
>> retryFuture makes sense only for certain cache operations and may be set
>> only in certain places in the code where we already know what to wait for.
>> I do not see how you can initialize this future, for example, in
>> GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage msg,
>> byte
>> plc)
>>
>> --AG
>>
>> 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <[hidden email]>:
>>
>> > I found a lot of using "ClusterTopologyCheckedException" exception. In
>> > most
>> > use-case these exceptions were just ignored. It's easier to see if
>> > issue-4612 will be applied (I'm waiting for code review by my team
>> leader)
>> > where I renamed all unused exceptions in the "ignored".
>> >
>> > It means that in some case "retryReadyFuture" can left unset. But there
>> are
>> > code where exception is being getting from "get()" method in some
>> "future"
>> > class and don't ignored (exception is sending to method
>> > "GridFutureAdapter#onDone()" for example).
>> >
>> > So I decided not to do a deep global analysis of code and make some
>> simple
>> > rules.
>> > 1. If some method "X" throws ClusterTopologyCheckedException and
>> ignores
>> > it
>> > (or converts to String, there are some variants to lose exception like
>> > object) in all methods that call the method "X", then we can don't set
>> > "retryReadyFuture" for optimization. But this should be clarified in
>> > javadoc.
>> > 2. But if exception escapes at least once like object: we must set
>> > "retryReadyFuture" in that method.
>> >
>> > A few times when call tree was simple, I went deeper.
>> >
>> > So I found only three methods which can throw
>> > "ClusterTopologyCheckedException" without setting "retryReadyFuture":
>> >
>> > 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId,
>> > IgfsCommunicationMessage msg)
>> > 2. GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage
>> msg,
>> > byte plc)
>> > 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object
>> topic,
>> > GridCacheMessage msg, byte plc, long timeout)
>> >
>> > In all other methods "ClusterTopologyCheckedException" escapes away too
>> > far.
>> >
>> > What do you think about this approach to make compromise between
>> > optimization and correctness?
>> >
>>
>
>

Александр Меньшиков

Re: IGNITE-1948 ClusterTopologyCheckedException can return null for retryReadyFuture()

Okay, it seems good. So you want to remove field *readyFut* from
*ClusterTopologyCheckedException* and add *CacheTopologyCheckedException*
like *ClusterTopologyCheckedException* but initialize *readyFut* in
constructor, yes?

2017-01-30 16:54 GMT+03:00 Alexey Goncharuk <[hidden email]>:

> In the example above there is no point of setting the retry future in the
> exception because it will be serialized and sent to a remote node.
>
> I see the only possible way to ensure this: make this be verified at
> compile time. This looks like a big change, but the draft may look like so:
> 1) Introduce new exception type, like CacheTopology*Checked*Exception
> which
> *must* take the minimum topology version to wait for
> 2) In all cases when a cache operation fails because of topology change,
> provide the appropriate exception
> 3) When CacheTopologyException is being constructed, create a corresponding
> local future based on wait version and throw the exception.
>
> Even though this still does not give us 100% guarantee that we did not miss
> anything, it should cover a lot more cases than now.
>
> 2017-01-30 14:20 GMT+03:00 Александр Меньшиков <[hidden email]>:
>
> > Alexey,
> >
> > For GridCacheIoManager#sendNoRetry it's not necessary because exception
> > will be ignored (or will cast to String). Perhaps my message was unclear.
> > I try to say that three methods can throw "
> ClusterTopologyCheckedException"
> > without any problem.
> >
> > But you are right, in some methods I can't set "retryFuture". We must
> > ensure that exception doesn't escape away. The best option is to ignore
> it
> > (or cast to String).
> >
> > But any way, look here:
> >
> > IgniteTxHandler#sendReply(UUID nodeId, GridDhtTxFinishRequest req,
> boolean
> > committed, GridCacheVersion nearTxId)
> >
> > Inside you can found next lines:
> > __________________________
> > ClusterTopologyCheckedException *cause* =
> > new ClusterTopologyCheckedException("Primary node left grid.");
> >
> > res.checkCommittedError(new IgniteTxRollbackCheckedException("Failed to
> > commit transaction " +
> > "(transaction has been rolled back on backup node): " +
> req.version(),
> > *cause*));
> > __________________________
> >
> > How do you unsure that *"cause"* can't come to GridCacheUtils#
> retryTopologySafe(final
> > Callable<S> c) ?
> >
> >
> > Perhaps I don't know some rules which you use to maintain it now?
> >
> >
> > 2017-01-30 12:24 GMT+03:00 Alexey Goncharuk <[hidden email]
> >:
> >
> >> Alexander,
> >>
> >> Do you have a use-case in mind which results in IgniteTopologyException
> >> thrown from public API with retryFuture unset?
> >>
> >> retryFuture makes sense only for certain cache operations and may be set
> >> only in certain places in the code where we already know what to wait
> for.
> >> I do not see how you can initialize this future, for example, in
> >> GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage msg,
> >> byte
> >> plc)
> >>
> >> --AG
> >>
> >> 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <[hidden email]>:
> >>
> >> > I found a lot of using "ClusterTopologyCheckedException" exception.
> In
> >> > most
> >> > use-case these exceptions were just ignored. It's easier to see if
> >> > issue-4612 will be applied (I'm waiting for code review by my team
> >> leader)
> >> > where I renamed all unused exceptions in the "ignored".
> >> >
> >> > It means that in some case "retryReadyFuture" can left unset. But
> there
> >> are
> >> > code where exception is being getting from "get()" method in some
> >> "future"
> >> > class and don't ignored (exception is sending to method
> >> > "GridFutureAdapter#onDone()" for example).
> >> >
> >> > So I decided not to do a deep global analysis of code and make some
> >> simple
> >> > rules.
> >> > 1. If some method "X" throws ClusterTopologyCheckedException and
> >> ignores
> >> > it
> >> > (or converts to String, there are some variants to lose exception like
> >> > object) in all methods that call the method "X", then we can don't set
> >> > "retryReadyFuture" for optimization. But this should be clarified in
> >> > javadoc.
> >> > 2. But if exception escapes at least once like object: we must set
> >> > "retryReadyFuture" in that method.
> >> >
> >> > A few times when call tree was simple, I went deeper.
> >> >
> >> > So I found only three methods which can throw
> >> > "ClusterTopologyCheckedException" without setting "retryReadyFuture":
> >> >
> >> > 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId,
> >> > IgfsCommunicationMessage msg)
> >> > 2. GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage
> >> msg,
> >> > byte plc)
> >> > 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object
> >> topic,
> >> > GridCacheMessage msg, byte plc, long timeout)
> >> >
> >> > In all other methods "ClusterTopologyCheckedException" escapes away
> too
> >> > far.
> >> >
> >> > What do you think about this approach to make compromise between
> >> > optimization and correctness?
> >> >
> >>
> >
> >
>

Александр Меньшиков

Re: IGNITE-1948 ClusterTopologyCheckedException can return null for retryReadyFuture()

In reply to this post by Alexey Goncharuk

Alexey, I ran into some difficulties.

Look at the method:
GridNearTxFinishFuture.CheckBackupMiniFuture#onDhtFinishResponse(GridDhtTxFinishResponse
res)

*Throwable err* = res.checkCommittedError();

if (*err* != null) {
if (*err* instanceof IgniteCheckedException) {
*ClusterTopologyCheckedException cause* =
((IgniteCheckedException)*err*).
*getCause(ClusterTopologyCheckedException.class)*;

if (*cause* != null)
*cause.retryReadyFuture(*
cctx.nextAffinityReadyFuture(tx.topologyVersion()));
* //^_____Here update the readyFut in
some first "cause". *
}

onDone(*err*);
}

So I can't rewrite "cause" field in exception without reflection. It means
if I try to use two exception: one with "readyFut" and second without
"readyFut", I see no way to do it in this place.

Perhaps I misunderstood your original idea. Or maybe this code some kind of
a crutch and should be removed. What do you think?

2017-01-30 16:54 GMT+03:00 Alexey Goncharuk <[hidden email]>:

Alexey Goncharuk

Re: IGNITE-1948 ClusterTopologyCheckedException can return null for retryReadyFuture()

Alexander,

I do not see why you would need to overwrite the cause field.

What I meant in the previous email is that instead of setting a retry ready
future in the checked exception, you would set a correct affinity topology
version here. Then, during a construction of CacheException (unchecked,
which is guaranteed to be thrown locally and will not be serialized), you
would create the retry ready future based on the topology version set.

Hope this helps,
AG

2017-02-07 17:18 GMT+03:00 Александр Меньшиков <[hidden email]>:

> Alexey, I ran into some difficulties.
>
> Look at the method: GridNearTxFinishFuture.CheckBackupMiniFuture#
> onDhtFinishResponse(GridDhtTxFinishResponse res)
>
> *Throwable err* = res.checkCommittedError();
>
> if (*err* != null) {
> if (*err* instanceof IgniteCheckedException) {
> *ClusterTopologyCheckedException cause* =
> ((IgniteCheckedException)*err*).
> *getCause(ClusterTopologyCheckedException.class)*;
>
> if (*cause* != null)
> *cause.retryReadyFuture(*cctx.
> nextAffinityReadyFuture(tx.topologyVersion()));
> * //^_____Here update the readyFut in
> some first "cause". *
> }
>
> onDone(*err*);
> }
>
>
> So I can't rewrite "cause" field in exception without reflection. It means
> if I try to use two exception: one with "readyFut" and second without
> "readyFut", I see no way to do it in this place.
>
> Perhaps I misunderstood your original idea. Or maybe this code some kind
> of a crutch and should be removed. What do you think?
>
> 2017-01-30 16:54 GMT+03:00 Alexey Goncharuk <[hidden email]>:
>
>> In the example above there is no point of setting the retry future in the
>> exception because it will be serialized and sent to a remote node.
>>
>> I see the only possible way to ensure this: make this be verified at
>> compile time. This looks like a big change, but the draft may look like
>> so:
>> 1) Introduce new exception type, like CacheTopology*Checked*Exception
>> which
>> *must* take the minimum topology version to wait for
>> 2) In all cases when a cache operation fails because of topology change,
>> provide the appropriate exception
>> 3) When CacheTopologyException is being constructed, create a
>> corresponding
>> local future based on wait version and throw the exception.
>>
>> Even though this still does not give us 100% guarantee that we did not
>> miss
>> anything, it should cover a lot more cases than now.
>>
>> 2017-01-30 14:20 GMT+03:00 Александр Меньшиков <[hidden email]>:
>>
>> > Alexey,
>> >
>> > For GridCacheIoManager#sendNoRetry it's not necessary because exception
>> > will be ignored (or will cast to String). Perhaps my message was
>> unclear.
>> > I try to say that three methods can throw "ClusterTopologyCheckedExcepti
>> on"
>> > without any problem.
>> >
>> > But you are right, in some methods I can't set "retryFuture". We must
>> > ensure that exception doesn't escape away. The best option is to ignore
>> it
>> > (or cast to String).
>> >
>> > But any way, look here:
>> >
>> > IgniteTxHandler#sendReply(UUID nodeId, GridDhtTxFinishRequest req,
>> boolean
>> > committed, GridCacheVersion nearTxId)
>> >
>> > Inside you can found next lines:
>> > __________________________
>> > ClusterTopologyCheckedException *cause* =
>> > new ClusterTopologyCheckedException("Primary node left grid.");
>> >
>> > res.checkCommittedError(new IgniteTxRollbackCheckedException("Failed to
>> > commit transaction " +
>> > "(transaction has been rolled back on backup node): " +
>> req.version(),
>> > *cause*));
>> > __________________________
>> >
>> > How do you unsure that *"cause"* can't come to
>> GridCacheUtils#retryTopologySafe(final
>>
>> > Callable<S> c) ?
>> >
>> >
>> > Perhaps I don't know some rules which you use to maintain it now?
>> >
>> >
>> > 2017-01-30 12:24 GMT+03:00 Alexey Goncharuk <[hidden email]
>> >:
>> >
>> >> Alexander,
>> >>
>> >> Do you have a use-case in mind which results in IgniteTopologyException
>> >> thrown from public API with retryFuture unset?
>> >>
>> >> retryFuture makes sense only for certain cache operations and may be
>> set
>> >> only in certain places in the code where we already know what to wait
>> for.
>> >> I do not see how you can initialize this future, for example, in
>> >> GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage
>> msg,
>> >> byte
>> >> plc)
>> >>
>> >> --AG
>> >>
>> >> 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <[hidden email]>:
>> >>
>> >> > I found a lot of using "ClusterTopologyCheckedException" exception.
>> In
>> >> > most
>> >> > use-case these exceptions were just ignored. It's easier to see if
>> >> > issue-4612 will be applied (I'm waiting for code review by my team
>> >> leader)
>> >> > where I renamed all unused exceptions in the "ignored".
>> >> >
>> >> > It means that in some case "retryReadyFuture" can left unset. But
>> there
>> >> are
>> >> > code where exception is being getting from "get()" method in some
>> >> "future"
>> >> > class and don't ignored (exception is sending to method
>> >> > "GridFutureAdapter#onDone()" for example).
>> >> >
>> >> > So I decided not to do a deep global analysis of code and make some
>> >> simple
>> >> > rules.
>> >> > 1. If some method "X" throws ClusterTopologyCheckedException and
>> >> ignores
>> >> > it
>> >> > (or converts to String, there are some variants to lose exception
>> like
>> >> > object) in all methods that call the method "X", then we can don't
>> set
>> >> > "retryReadyFuture" for optimization. But this should be clarified in
>> >> > javadoc.
>> >> > 2. But if exception escapes at least once like object: we must set
>> >> > "retryReadyFuture" in that method.
>> >> >
>> >> > A few times when call tree was simple, I went deeper.
>> >> >
>> >> > So I found only three methods which can throw
>> >> > "ClusterTopologyCheckedException" without setting
>> "retryReadyFuture":
>> >> >
>> >> > 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId,
>> >> > IgfsCommunicationMessage msg)
>> >> > 2. GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage
>> >> msg,
>> >> > byte plc)
>> >> > 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object
>> >> topic,
>> >> > GridCacheMessage msg, byte plc, long timeout)
>> >> >
>> >> > In all other methods "ClusterTopologyCheckedException" escapes away
>> too
>> >> > far.
>> >> >
>> >> > What do you think about this approach to make compromise between
>> >> > optimization and correctness?
>> >> >
>> >>
>> >
>> >
>>
>
>

Александр Меньшиков

Re: IGNITE-1948 ClusterTopologyCheckedException can return null for retryReadyFuture()

Alexey,

I'm ready to make some conclusions. You can see result immediately here:
PR: https://github.com/apache/ignite/pull/1537/files
CI Tests:
http://ci.ignite.apache.org/project.html?projectId=IgniteTests&tab=projectOverview&branch_IgniteTests=pull/1537/head

I fixed only "ClusterTopologyCheckedException", and didn't touched
"ClusterTopologyException". "ClusterTopologyException" has a same problem
like "ClusterTopologyCheckedException", but even now changes is huge (80
files). So if you think fixing of "ClusterTopologyException" is necessary,
you could add different issue in JIRA (or I can do that). And I will fix it
in future. I can't implement your idea about using Runtime exception,
because right now a lot of logic tied to the fact that this is
"IgniteCheckedException". I real tried to mix it but I couldn't make it
work.

So before my changes we have 3 affected classes:
1. "ClusterTopologyCheckedException".
2. "ClusterTopologyServerNotFoundException".
3. "ClusterGroupEmptyCheckedException".

Now we there are 5 affected classes:

"ClusterTopologyCheckedException" split into 2 classes:
1. "ClusterTopologyCheckedException" (with future).
2. "ClusterTopologyLocalException" (without future, and it's parent for
"ClusterTopologyCheckedException").

"ClusterTopologyServerNotFoundException" rename to
3. "ClusterTopologyServerNotFoundLocalException" (For consistency of names,
I didn't find any cases where "ClusterTopologyServerNotFoundException" need
his future).

"ClusterGroupEmptyCheckedException" split into 2 classes:
4. "ClusterGroupEmptyCheckedException".
5. "ClusterGroupEmptyLocalException".

Also in code "ready future" is using for waiting, and don't using for get
real value (just look at code, all values just ignored). In fact "ready
future" has type "IgniteFuture<?>". It suggests that result doesn't matter.

2017-02-10 11:06 GMT+03:00 Alexey Goncharuk <[hidden email]>:

> Alexander,
>
> I do not see why you would need to overwrite the cause field.
>
> What I meant in the previous email is that instead of setting a retry
> ready future in the checked exception, you would set a correct affinity
> topology version here. Then, during a construction of CacheException
> (unchecked, which is guaranteed to be thrown locally and will not be
> serialized), you would create the retry ready future based on the topology
> version set.
>
> Hope this helps,
> AG
>
> 2017-02-07 17:18 GMT+03:00 Александр Меньшиков <[hidden email]>:
>
>> Alexey, I ran into some difficulties.
>>
>> Look at the method: GridNearTxFinishFuture.CheckBa
>> ckupMiniFuture#onDhtFinishResponse(GridDhtTxFinishResponse res)
>>
>> *Throwable err* = res.checkCommittedError();
>>
>> if (*err* != null) {
>> if (*err* instanceof IgniteCheckedException) {
>> *ClusterTopologyCheckedException cause* =
>> ((IgniteCheckedException)*err*).
>> *getCause(ClusterTopologyCheckedException.class)*;
>>
>> if (*cause* != null)
>> *cause.retryReadyFuture(*cctx.ne
>> xtAffinityReadyFuture(tx.topologyVersion()));
>> * //^_____Here update the readyFut in
>> some first "cause". *
>> }
>>
>> onDone(*err*);
>> }
>>
>>
>> So I can't rewrite "cause" field in exception without reflection. It
>> means if I try to use two exception: one with "readyFut" and second without
>> "readyFut", I see no way to do it in this place.
>>
>> Perhaps I misunderstood your original idea. Or maybe this code some kind
>> of a crutch and should be removed. What do you think?
>>
>> 2017-01-30 16:54 GMT+03:00 Alexey Goncharuk <[hidden email]>:
>>
>>> In the example above there is no point of setting the retry future in the
>>> exception because it will be serialized and sent to a remote node.
>>>
>>> I see the only possible way to ensure this: make this be verified at
>>> compile time. This looks like a big change, but the draft may look like
>>> so:
>>> 1) Introduce new exception type, like CacheTopology*Checked*Exception
>>> which
>>> *must* take the minimum topology version to wait for
>>> 2) In all cases when a cache operation fails because of topology change,
>>> provide the appropriate exception
>>> 3) When CacheTopologyException is being constructed, create a
>>> corresponding
>>> local future based on wait version and throw the exception.
>>>
>>> Even though this still does not give us 100% guarantee that we did not
>>> miss
>>> anything, it should cover a lot more cases than now.
>>>
>>> 2017-01-30 14:20 GMT+03:00 Александр Меньшиков <[hidden email]>:
>>>
>>> > Alexey,
>>> >
>>> > For GridCacheIoManager#sendNoRetry it's not necessary because exception
>>> > will be ignored (or will cast to String). Perhaps my message was
>>> unclear.
>>> > I try to say that three methods can throw
>>> "ClusterTopologyCheckedException"
>>> > without any problem.
>>> >
>>> > But you are right, in some methods I can't set "retryFuture". We must
>>> > ensure that exception doesn't escape away. The best option is to
>>> ignore it
>>> > (or cast to String).
>>> >
>>> > But any way, look here:
>>> >
>>> > IgniteTxHandler#sendReply(UUID nodeId, GridDhtTxFinishRequest req,
>>> boolean
>>> > committed, GridCacheVersion nearTxId)
>>> >
>>> > Inside you can found next lines:
>>> > __________________________
>>> > ClusterTopologyCheckedException *cause* =
>>> > new ClusterTopologyCheckedException("Primary node left grid.");
>>> >
>>> > res.checkCommittedError(new IgniteTxRollbackCheckedException("Failed
>>> to
>>> > commit transaction " +
>>> > "(transaction has been rolled back on backup node): " +
>>> req.version(),
>>> > *cause*));
>>> > __________________________
>>> >
>>> > How do you unsure that *"cause"* can't come to
>>> GridCacheUtils#retryTopologySafe(final
>>>
>>> > Callable<S> c) ?
>>> >
>>> >
>>> > Perhaps I don't know some rules which you use to maintain it now?
>>> >
>>> >
>>> > 2017-01-30 12:24 GMT+03:00 Alexey Goncharuk <
>>> [hidden email]>:
>>> >
>>> >> Alexander,
>>> >>
>>> >> Do you have a use-case in mind which results in
>>> IgniteTopologyException
>>> >> thrown from public API with retryFuture unset?
>>> >>
>>> >> retryFuture makes sense only for certain cache operations and may be
>>> set
>>> >> only in certain places in the code where we already know what to wait
>>> for.
>>> >> I do not see how you can initialize this future, for example, in
>>> >> GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage
>>> msg,
>>> >> byte
>>> >> plc)
>>> >>
>>> >> --AG
>>> >>
>>> >> 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <[hidden email]
>>> >:
>>> >>
>>> >> > I found a lot of using "ClusterTopologyCheckedException"
>>> exception. In
>>> >> > most
>>> >> > use-case these exceptions were just ignored. It's easier to see if
>>> >> > issue-4612 will be applied (I'm waiting for code review by my team
>>> >> leader)
>>> >> > where I renamed all unused exceptions in the "ignored".
>>> >> >
>>> >> > It means that in some case "retryReadyFuture" can left unset. But
>>> there
>>> >> are
>>> >> > code where exception is being getting from "get()" method in some
>>> >> "future"
>>> >> > class and don't ignored (exception is sending to method
>>> >> > "GridFutureAdapter#onDone()" for example).
>>> >> >
>>> >> > So I decided not to do a deep global analysis of code and make some
>>> >> simple
>>> >> > rules.
>>> >> > 1. If some method "X" throws ClusterTopologyCheckedException and
>>> >> ignores
>>> >> > it
>>> >> > (or converts to String, there are some variants to lose exception
>>> like
>>> >> > object) in all methods that call the method "X", then we can don't
>>> set
>>> >> > "retryReadyFuture" for optimization. But this should be clarified in
>>> >> > javadoc.
>>> >> > 2. But if exception escapes at least once like object: we must set
>>> >> > "retryReadyFuture" in that method.
>>> >> >
>>> >> > A few times when call tree was simple, I went deeper.
>>> >> >
>>> >> > So I found only three methods which can throw
>>> >> > "ClusterTopologyCheckedException" without setting
>>> "retryReadyFuture":
>>> >> >
>>> >> > 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId,
>>> >> > IgfsCommunicationMessage msg)
>>> >> > 2. GridCacheIoManager#sendNoRetry(ClusterNode node,
>>> GridCacheMessage
>>> >> msg,
>>> >> > byte plc)
>>> >> > 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object
>>> >> topic,
>>> >> > GridCacheMessage msg, byte plc, long timeout)
>>> >> >
>>> >> > In all other methods "ClusterTopologyCheckedException" escapes
>>> away too
>>> >> > far.
>>> >> >
>>> >> > What do you think about this approach to make compromise between
>>> >> > optimization and correctness?
>>> >> >
>>> >>
>>> >
>>> >
>>>
>>
>>
>

Александр Меньшиков

Re: IGNITE-1948 ClusterTopologyCheckedException can return null for retryReadyFuture()

Alexey, I just remind you about issue.

2017-02-20 16:42 GMT+03:00 Александр Меньшиков <[hidden email]>:

> Alexey,
>
> I'm ready to make some conclusions. You can see result immediately here:
> PR: https://github.com/apache/ignite/pull/1537/files
> CI Tests: http://ci.ignite.apache.org/project.html?projectId=
> IgniteTests&tab=projectOverview&branch_IgniteTests=pull/1537/head
>
> I fixed only "ClusterTopologyCheckedException", and didn't touched
> "ClusterTopologyException". "ClusterTopologyException" has a same problem
> like "ClusterTopologyCheckedException", but even now changes is huge (80
> files). So if you think fixing of "ClusterTopologyException" is necessary,
> you could add different issue in JIRA (or I can do that). And I will fix it
> in future. I can't implement your idea about using Runtime exception,
> because right now a lot of logic tied to the fact that this is
> "IgniteCheckedException". I real tried to mix it but I couldn't make it
> work.
>
> So before my changes we have 3 affected classes:
> 1. "ClusterTopologyCheckedException".
> 2. "ClusterTopologyServerNotFoundException".
> 3. "ClusterGroupEmptyCheckedException".
>
> Now we there are 5 affected classes:
>
> "ClusterTopologyCheckedException" split into 2 classes:
> 1. "ClusterTopologyCheckedException" (with future).
> 2. "ClusterTopologyLocalException" (without future, and it's parent for "
> ClusterTopologyCheckedException").
>
> "ClusterTopologyServerNotFoundException" rename to
> 3. "ClusterTopologyServerNotFoundLocalException" (For consistency of
> names, I didn't find any cases where "ClusterTopologyServerNotFoundException"
> need his future).
>
> "ClusterGroupEmptyCheckedException" split into 2 classes:
> 4. "ClusterGroupEmptyCheckedException".
> 5. "ClusterGroupEmptyLocalException".
>
> Also in code "ready future" is using for waiting, and don't using for get
> real value (just look at code, all values just ignored). In fact "ready
> future" has type "IgniteFuture<?>". It suggests that result doesn't matter.
>
> 2017-02-10 11:06 GMT+03:00 Alexey Goncharuk <[hidden email]>:
>
>> Alexander,
>>
>> I do not see why you would need to overwrite the cause field.
>>
>> What I meant in the previous email is that instead of setting a retry
>> ready future in the checked exception, you would set a correct affinity
>> topology version here. Then, during a construction of CacheException
>> (unchecked, which is guaranteed to be thrown locally and will not be
>> serialized), you would create the retry ready future based on the topology
>> version set.
>>
>> Hope this helps,
>> AG
>>
>> 2017-02-07 17:18 GMT+03:00 Александр Меньшиков <[hidden email]>:
>>
>>> Alexey, I ran into some difficulties.
>>>
>>> Look at the method: GridNearTxFinishFuture.CheckBa
>>> ckupMiniFuture#onDhtFinishResponse(GridDhtTxFinishResponse res)
>>>
>>> *Throwable err* = res.checkCommittedError();
>>>
>>> if (*err* != null) {
>>> if (*err* instanceof IgniteCheckedException) {
>>> *ClusterTopologyCheckedException cause* =
>>> ((IgniteCheckedException)*err*).
>>> *getCause(ClusterTopologyCheckedException.class)*;
>>>
>>> if (*cause* != null)
>>> *cause.retryReadyFuture(*cctx.ne
>>> xtAffinityReadyFuture(tx.topologyVersion()));
>>> * //^_____Here update the readyFut in
>>> some first "cause". *
>>> }
>>>
>>> onDone(*err*);
>>> }
>>>
>>>
>>> So I can't rewrite "cause" field in exception without reflection. It
>>> means if I try to use two exception: one with "readyFut" and second without
>>> "readyFut", I see no way to do it in this place.
>>>
>>> Perhaps I misunderstood your original idea. Or maybe this code some kind
>>> of a crutch and should be removed. What do you think?
>>>
>>> 2017-01-30 16:54 GMT+03:00 Alexey Goncharuk <[hidden email]>
>>> :
>>>
>>>> In the example above there is no point of setting the retry future in
>>>> the
>>>> exception because it will be serialized and sent to a remote node.
>>>>
>>>> I see the only possible way to ensure this: make this be verified at
>>>> compile time. This looks like a big change, but the draft may look like
>>>> so:
>>>> 1) Introduce new exception type, like CacheTopology*Checked*Exception
>>>> which
>>>> *must* take the minimum topology version to wait for
>>>> 2) In all cases when a cache operation fails because of topology change,
>>>> provide the appropriate exception
>>>> 3) When CacheTopologyException is being constructed, create a
>>>> corresponding
>>>> local future based on wait version and throw the exception.
>>>>
>>>> Even though this still does not give us 100% guarantee that we did not
>>>> miss
>>>> anything, it should cover a lot more cases than now.
>>>>
>>>> 2017-01-30 14:20 GMT+03:00 Александр Меньшиков <[hidden email]>:
>>>>
>>>> > Alexey,
>>>> >
>>>> > For GridCacheIoManager#sendNoRetry it's not necessary because
>>>> exception
>>>> > will be ignored (or will cast to String). Perhaps my message was
>>>> unclear.
>>>> > I try to say that three methods can throw
>>>> "ClusterTopologyCheckedException"
>>>> > without any problem.
>>>> >
>>>> > But you are right, in some methods I can't set "retryFuture". We must
>>>> > ensure that exception doesn't escape away. The best option is to
>>>> ignore it
>>>> > (or cast to String).
>>>> >
>>>> > But any way, look here:
>>>> >
>>>> > IgniteTxHandler#sendReply(UUID nodeId, GridDhtTxFinishRequest req,
>>>> boolean
>>>> > committed, GridCacheVersion nearTxId)
>>>> >
>>>> > Inside you can found next lines:
>>>> > __________________________
>>>> > ClusterTopologyCheckedException *cause* =
>>>> > new ClusterTopologyCheckedException("Primary node left grid.");
>>>> >
>>>> > res.checkCommittedError(new IgniteTxRollbackCheckedException("Failed
>>>> to
>>>> > commit transaction " +
>>>> > "(transaction has been rolled back on backup node): " +
>>>> req.version(),
>>>> > *cause*));
>>>> > __________________________
>>>> >
>>>> > How do you unsure that *"cause"* can't come to
>>>> GridCacheUtils#retryTopologySafe(final
>>>>
>>>> > Callable<S> c) ?
>>>> >
>>>> >
>>>> > Perhaps I don't know some rules which you use to maintain it now?
>>>> >
>>>> >
>>>> > 2017-01-30 12:24 GMT+03:00 Alexey Goncharuk <
>>>> [hidden email]>:
>>>> >
>>>> >> Alexander,
>>>> >>
>>>> >> Do you have a use-case in mind which results in
>>>> IgniteTopologyException
>>>> >> thrown from public API with retryFuture unset?
>>>> >>
>>>> >> retryFuture makes sense only for certain cache operations and may be
>>>> set
>>>> >> only in certain places in the code where we already know what to
>>>> wait for.
>>>> >> I do not see how you can initialize this future, for example, in
>>>> >> GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage
>>>> msg,
>>>> >> byte
>>>> >> plc)
>>>> >>
>>>> >> --AG
>>>> >>
>>>> >> 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <[hidden email]
>>>> >:
>>>> >>
>>>> >> > I found a lot of using "ClusterTopologyCheckedException"
>>>> exception. In
>>>> >> > most
>>>> >> > use-case these exceptions were just ignored. It's easier to see if
>>>> >> > issue-4612 will be applied (I'm waiting for code review by my team
>>>> >> leader)
>>>> >> > where I renamed all unused exceptions in the "ignored".
>>>> >> >
>>>> >> > It means that in some case "retryReadyFuture" can left unset. But
>>>> there
>>>> >> are
>>>> >> > code where exception is being getting from "get()" method in some
>>>> >> "future"
>>>> >> > class and don't ignored (exception is sending to method
>>>> >> > "GridFutureAdapter#onDone()" for example).
>>>> >> >
>>>> >> > So I decided not to do a deep global analysis of code and make some
>>>> >> simple
>>>> >> > rules.
>>>> >> > 1. If some method "X" throws ClusterTopologyCheckedException and
>>>> >> ignores
>>>> >> > it
>>>> >> > (or converts to String, there are some variants to lose exception
>>>> like
>>>> >> > object) in all methods that call the method "X", then we can don't
>>>> set
>>>> >> > "retryReadyFuture" for optimization. But this should be clarified
>>>> in
>>>> >> > javadoc.
>>>> >> > 2. But if exception escapes at least once like object: we must set
>>>> >> > "retryReadyFuture" in that method.
>>>> >> >
>>>> >> > A few times when call tree was simple, I went deeper.
>>>> >> >
>>>> >> > So I found only three methods which can throw
>>>> >> > "ClusterTopologyCheckedException" without setting
>>>> "retryReadyFuture":
>>>> >> >
>>>> >> > 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId,
>>>> >> > IgfsCommunicationMessage msg)
>>>> >> > 2. GridCacheIoManager#sendNoRetry(ClusterNode node,
>>>> GridCacheMessage
>>>> >> msg,
>>>> >> > byte plc)
>>>> >> > 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object
>>>> >> topic,
>>>> >> > GridCacheMessage msg, byte plc, long timeout)
>>>> >> >
>>>> >> > In all other methods "ClusterTopologyCheckedException" escapes
>>>> away too
>>>> >> > far.
>>>> >> >
>>>> >> > What do you think about this approach to make compromise between
>>>> >> > optimization and correctness?
>>>> >> >
>>>> >>
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>