Yakov,
I think I fixed the remaining issues in the branch. There was one issue with the pending queue: my original ordering of messages was not correct. The other was the NodeAddFinished message processing that I consulted you about over Skype. TC looks green(ish). I cleaned up the code, merged it to the ignite-1171 (non-debug) branch, and triggered TC one more time.

It would be great if you guys could trigger TC a couple more times and monitor its state, because we changed what I guess is the most sensitive part of Ignite, but it feels like we're pretty close to getting this issue fixed :)

2015-09-22 9:43 GMT-07:00 Yakov Zhdanov <[hidden email]>:

> Alex, I spent some time debugging this today.
>
> I noticed that we do not verify that the topology version of the custom
> message is identical to the current ring version. After I added this
> condition, the test started passing. However, it hangs from time to time
> since the custom message gets discarded before it gets processed (the new
> condition works here), which means that the topology version has somehow
> been changed, but the custom message has not been processed yet by that
> time.
>
> My changes are in ignite-1171-debug. Can you please take a further look?
>
> --Yakov
>
> 2015-09-22 5:50 GMT+03:00 Alexey Goncharuk <[hidden email]>:
>
> > Folks,
> >
> > I was debugging issues with discovery today, my findings are below:
> >
> > - The issue with the assertion "topology version has not been updated"
> >   was caused by sending a discard message for custom messages. Since we
> >   now re-arrange custom messages, discardId gets repositioned, and
> >   messages that should have been discarded were not discarded.
> > - Fixed the issue above by introducing a separate pending queue for
> >   custom messages, which gets discarded independently from other
> >   discovery messages.
> > - Did not get to the bottom of the "joining nodes" assertion. From the
> >   debug output I see that the coordinator always fires custom messages
> >   at the right moment, when joiningNodes is empty. However, despite the
> >   fix (above) for the custom message discard, already processed custom
> >   messages get re-sent, which leads to this assertion.
> >
> > I committed my pending debug code to the ignite-1171-debug branch. If
> > any of you guys is up to debugging this issue while I'm asleep - great,
> > if not - I'll continue digging into it tomorrow.
> >
> > 2015-09-21 10:55 GMT-07:00 Yakov Zhdanov <[hidden email]>:
> >
> > > Igniters,
> > >
> > > We are not ready to release today.
> > >
> > > Alexey Goncharuk is still working on ignite-1171. Alex, please
> > > provide updates by the end of the day.
> > >
> > > https://issues.apache.org/jira/browse/IGNITE-1516 - the offheap query
> > > performance benchmark is not fully recovered. Semyon will be fixing
> > > it. Sergi, can you please assist?
> > >
> > > https://issues.apache.org/jira/browse/IGNITE-973 - Semyon has fixed a
> > > race in the cache logic, but the issue is still reproducible due to
> > > possible issues in the indexing logic. Sergi, this is on you. Can you
> > > please take a look?
> > >
> > > --Yakov
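For readers following the thread, the two ideas discussed above (a separate pending queue for custom messages that is discarded independently, and a check that a custom message's topology version matches the current ring version) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Ignite discovery code: the class `PendingQueuesSketch`, the `Msg` type, and every method name here are hypothetical, and the real implementation in the ignite-1171 branch may differ substantially.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Illustrative sketch only; these are NOT Ignite's real discovery classes. */
class PendingQueuesSketch {
    /** Hypothetical stand-in for a discovery message. */
    static final class Msg {
        final long id;        // message id, used as the discard marker
        final boolean custom; // true for custom messages
        final long topVer;    // topology version the message was created for

        Msg(long id, boolean custom, long topVer) {
            this.id = id;
            this.custom = custom;
            this.topVer = topVer;
        }
    }

    /** Pending regular discovery messages, in send order. */
    private final Deque<Msg> pending = new ArrayDeque<>();

    /** Pending custom messages, kept apart so re-ordering them cannot move the regular discard position. */
    private final Deque<Msg> pendingCustom = new ArrayDeque<>();

    /** Appends a message to the queue that matches its kind. */
    void add(Msg msg) {
        (msg.custom ? pendingCustom : pending).addLast(msg);
    }

    /** Drops everything up to and including discardId, but only in the selected queue. */
    void discard(long discardId, boolean custom) {
        Deque<Msg> queue = custom ? pendingCustom : pending;

        // Ignore ids that are not in this queue, so an unknown id cannot wipe it.
        boolean known = false;
        for (Msg m : queue) {
            if (m.id == discardId) {
                known = true;
                break;
            }
        }
        if (!known)
            return;

        while (!queue.isEmpty() && queue.pollFirst().id != discardId) {
            // Keep removing acknowledged messages until the discard id itself is removed.
        }
    }

    /** A custom message is processed only if it was created for the current ring version. */
    static boolean canProcessCustom(Msg msg, long ringTopVer) {
        return msg.custom && msg.topVer == ringTopVer;
    }

    /** Returns the ids still pending in the selected queue, oldest first. */
    List<Long> pendingIds(boolean custom) {
        List<Long> ids = new ArrayList<>();
        for (Msg m : custom ? pendingCustom : pending)
            ids.add(m.id);
        return ids;
    }
}
```

In this sketch, discarding custom message 2 leaves the regular queue untouched, which is the independence property described in the findings. The version check mirrors the condition Yakov describes, including its drawback: a custom message whose version no longer matches is simply not processed.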
Alex, I merged 1171 to the 1.4 branch.

We have very few tickets left.

Guys, who is in charge of checking this - https://issues.apache.org/jira/browse/IGNITE-973? Please provide an update.

Hopefully, we will release tomorrow.

Thanks!

--Yakov

2015-09-23 8:37 GMT+03:00 Alexey Goncharuk <[hidden email]>:

> I think I fixed the remaining issues in the branch.
In reply to this post by Alexey Goncharuk
Alexey,
I have been running Vert.x cluster manager tests today. IGNITE-1171 no longer reproduces.

But a new problem was found: https://issues.apache.org/jira/browse/IGNITE-1534

I'll try to create an Ignite test for this problem. I hope you have some ideas about how to reproduce it reliably and fix it.

On Wed, Sep 23, 2015 at 8:37 AM, Alexey Goncharuk <[hidden email]> wrote:

> I think I fixed the remaining issues in the branch.

--
Andrey Gura
GridGain Systems, Inc.
www.gridgain.com
In reply to this post by yzhdanov
I'm working to reproduce IGNITE-973

On Wed, Sep 23, 2015 at 7:35 PM, Yakov Zhdanov <[hidden email]> wrote:

> Guys, who is in charge of checking this -
> https://issues.apache.org/jira/browse/IGNITE-973?

--
Sergey Kozlov
GridGain Systems
www.gridgain.com
I couldn't reproduce IGNITE-973, but I would like to get a comment from the reporter. Maybe I missed some details needed to test the issue...

On Wed, Sep 23, 2015 at 8:48 PM, Sergey Kozlov <[hidden email]> wrote:

> I'm working to reproduce IGNITE-973

--
Sergey Kozlov
In reply to this post by Andrey Gura
Andrey,
I tried checking out the Vert.x integration myself and ran the test; it passed for me. I saw some sporadic printouts from Vert.x saying that threads were being blocked for longer than allowed, but I did not see the exception you mentioned in IGNITE-1534. I also tried adding a separate unit test for this case, and it passes as well. So it would be great if you managed to reproduce this in a standalone test.

2015-09-23 10:38 GMT-07:00 Andrey Gura <[hidden email]>:

> But a new problem was found:
> https://issues.apache.org/jira/browse/IGNITE-1534