ignite 1.4 status

Re: ignite 1.4 status

Alexey Goncharuk
Yakov,

I think I fixed the remaining issues in the branch. There was one issue
with the pending queue: my original ordering for messages was not correct.
The other was the NodeAddFinished message processing that I consulted
you about over Skype. TC looks green(ish); I cleaned up the code, merged
it to the ignite-1171 (non-debug) branch, and triggered TC one more time.

It would be great if you guys could trigger TC a couple more times and
monitor its state, because we changed what is, I guess, the most sensitive
part of Ignite, but it feels like we're pretty close to getting this issue
fixed :)

2015-09-22 9:43 GMT-07:00 Yakov Zhdanov <[hidden email]>:

> Alex, I spent some time debugging this today.
>
> I noticed that we do not verify that the topology version of the custom
> message is identical to the current ring version. After I added this
> condition, the test started passing. However, it hangs from time to time,
> since the custom message gets discarded before it gets processed (the new
> condition works here), which means that the topology version has somehow
> changed, but the custom message has not been processed by that time.
>
> My changes are in ignite-1171-debug. Can you please take a further look?
>
> --Yakov
>
> 2015-09-22 5:50 GMT+03:00 Alexey Goncharuk <[hidden email]>:
>
> > Folks,
> >
> > I was debugging issues with discovery today; my findings are below:
> >
> >    - The issue with the assertion "topology version has not been updated"
> >    was caused by sending a discard message for custom messages. Since we
> >    now re-arrange custom messages, discardId gets repositioned, and
> >    messages that should have been discarded were not discarded.
> >    - Fixed the issue above by introducing a separate pending queue for
> >    custom messages, which gets discarded independently of other
> >    discovery messages.
> >    - Did not get to the bottom of the "joining nodes" assertion. From the
> >    debug output I see that the coordinator always fires custom messages
> >    at the right moment, when joiningNodes is empty. However, despite the
> >    fix (above) for the custom message discard, processed custom messages
> >    get re-sent, which leads to this assertion.
> >
> > I committed my pending debug code to the ignite-1171-debug branch. If any
> > of you guys is up to debugging this issue while I'm asleep, great; if not,
> > I'll continue digging into it tomorrow.
> >
> > 2015-09-21 10:55 GMT-07:00 Yakov Zhdanov <[hidden email]>:
> >
> > > Igniters,
> > >
> > > We are not ready to release today.
> > >
> > > Alexey Goncharuk is still working on ignite-1171. Alex please provide
> > > updates by the end of the day.
> > >
> > > https://issues.apache.org/jira/browse/IGNITE-1516 - performance offheap
> > > query benchmark is not fully recovered. Semyon will be fixing it. Sergi,
> > > can you please assist?
> > >
> > > https://issues.apache.org/jira/browse/IGNITE-973 - Semyon has fixed race
> > > in cache logic, but issue is still reproducible due to possible issues in
> > > indexing logic. Sergi, this is on you. Can you please take a look?
> > >
> > > --Yakov
> > >
> >
>
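
Editorial sketch (not from the thread): the two ideas discussed in the
quoted messages above can be illustrated roughly as follows. This is a
minimal sketch under stated assumptions, not the actual Ignite discovery
code. Every name here (DiscoverySketch, Msg, register, discard,
canProcess) is hypothetical. It shows a separate pending queue for custom
messages that is discarded independently of the regular queue, and a guard
that lets a custom message through only when its topology version equals
the current ring version.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative only: none of these names come from the Ignite code base. */
final class DiscoverySketch {
    /** Hypothetical message: an id, the topology version it was sent at, and its kind. */
    static final class Msg {
        final long id;
        final long topVer;
        final boolean custom;

        Msg(long id, long topVer, boolean custom) {
            this.id = id;
            this.topVer = topVer;
            this.custom = custom;
        }
    }

    /** Pending regular discovery messages. */
    private final Deque<Msg> pending = new ArrayDeque<>();

    /** Pending custom messages, kept apart from the regular ones. */
    private final Deque<Msg> pendingCustom = new ArrayDeque<>();

    /** Current ring topology version (fixed here to keep the sketch small). */
    private final long ringTopVer;

    DiscoverySketch(long ringTopVer) {
        this.ringTopVer = ringTopVer;
    }

    /** Each message goes to the queue that matches its kind. */
    void register(Msg m) {
        (m.custom ? pendingCustom : pending).addLast(m);
    }

    /**
     * Drops messages up to and including discardId from one queue only,
     * so re-arranging custom messages cannot move the discard point
     * of the regular queue (and vice versa).
     */
    void discard(long discardId, boolean custom) {
        Deque<Msg> q = custom ? pendingCustom : pending;

        boolean known = false;
        for (Msg m : q) {
            if (m.id == discardId) {
                known = true;
                break;
            }
        }

        if (!known)
            return; // Unknown id: leave the queue untouched.

        // Remove from the head until the message with discardId has been removed.
        while (!q.isEmpty() && q.pollFirst().id != discardId) {
            // Keep dropping.
        }
    }

    /** Guard: a custom message is processed only at the current ring version. */
    boolean canProcess(Msg m) {
        return m.custom && m.topVer == ringTopVer;
    }

    int pendingSize(boolean custom) {
        return (custom ? pendingCustom : pending).size();
    }

    public static void main(String[] args) {
        DiscoverySketch d = new DiscoverySketch(5);
        d.register(new Msg(1, 5, false));
        d.register(new Msg(2, 5, true));
        d.register(new Msg(3, 5, true));
        d.discard(2, true); // Affects the custom queue only.
        System.out.println(d.pendingSize(false) + " regular, "
            + d.pendingSize(true) + " custom pending");
    }
}
```

What should happen to a custom message that fails the guard is deliberately
left open here. The thread reports that simply discarding such a message
can cause a hang, so a real implementation would need more than this check.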

Re: ignite 1.4 status

yzhdanov
Alex, I merged 1171 to the 1.4 branch.

We have very few tickets left.

Guys, who is in charge of checking this ticket?
https://issues.apache.org/jira/browse/IGNITE-973
Please provide an update.

Hopefully, we will release tomorrow.

Thanks!

--Yakov

2015-09-23 8:37 GMT+03:00 Alexey Goncharuk <[hidden email]>:

> Yakov,
>
> I think I fixed the remaining issues in the branch. There was one issue
> with the pending queue: my original ordering for messages was not correct.
> The other was the NodeAddFinished message processing that I consulted
> you about over Skype. TC looks green(ish); I cleaned up the code, merged
> it to the ignite-1171 (non-debug) branch, and triggered TC one more time.
>
> It would be great if you guys could trigger TC a couple more times and
> monitor its state, because we changed what is, I guess, the most sensitive
> part of Ignite, but it feels like we're pretty close to getting this issue
> fixed :)

Re: ignite 1.4 status

Andrey Gura
In reply to this post by Alexey Goncharuk
Alexey,

I have been running the Vert.x cluster manager tests today. IGNITE-1171 no
longer reproduces.

But a new problem was found: https://issues.apache.org/jira/browse/IGNITE-1534

I'll try to create an Ignite test for this problem. I hope you have some
ideas about how to reproduce it reliably and fix it.

On Wed, Sep 23, 2015 at 8:37 AM, Alexey Goncharuk <
[hidden email]> wrote:

> Yakov,
>
> I think I fixed the remaining issues in the branch. There was one issue
> with the pending queue: my original ordering for messages was not correct.
> The other was the NodeAddFinished message processing that I consulted
> you about over Skype. TC looks green(ish); I cleaned up the code, merged
> it to the ignite-1171 (non-debug) branch, and triggered TC one more time.
>
> It would be great if you guys could trigger TC a couple more times and
> monitor its state, because we changed what is, I guess, the most sensitive
> part of Ignite, but it feels like we're pretty close to getting this issue
> fixed :)



--
Andrey Gura
GridGain Systems, Inc.
www.gridgain.com

Re: ignite 1.4 status

Sergey Kozlov
In reply to this post by yzhdanov
I'm working on reproducing IGNITE-973.

On Wed, Sep 23, 2015 at 7:35 PM, Yakov Zhdanov <[hidden email]> wrote:

> Alex, I merged 1171 to the 1.4 branch.
>
> We have very few tickets left.
>
> Guys, who is in charge of checking this ticket?
> https://issues.apache.org/jira/browse/IGNITE-973
> Please provide an update.
>
> Hopefully, we will release tomorrow.
>
> Thanks!
>
> --Yakov



--
Sergey Kozlov
GridGain Systems
www.gridgain.com

Re: ignite 1.4 status

Sergey Kozlov
I couldn't reproduce IGNITE-973, but I would like to get a comment from the
reporter. I may have missed some aspects of testing this issue.

On Wed, Sep 23, 2015 at 8:48 PM, Sergey Kozlov <[hidden email]> wrote:

> I'm working on reproducing IGNITE-973.



--
Sergey Kozlov

Re: ignite 1.4 status

Alexey Goncharuk
In reply to this post by Andrey Gura
Andrey,

I checked out the Vert.x integration myself and ran the test; it passed
for me. I saw some sporadic printouts from Vert.x saying that threads were
being blocked for longer than allowed; however, I did not see the
exception you mentioned in IGNITE-1534. I also tried adding a separate unit
test for this case, and it passes as well. So it would be great if you
could reproduce this in a standalone test.

2015-09-23 10:38 GMT-07:00 Andrey Gura <[hidden email]>:

> Alexey,
>
> I have been running the Vert.x cluster manager tests today. IGNITE-1171 no
> longer reproduces.
>
> But a new problem was found:
> https://issues.apache.org/jira/browse/IGNITE-1534
>
> I'll try to create an Ignite test for this problem. I hope you have some
> ideas about how to reproduce it reliably and fix it.