Denis,
It looks like the failing test is related to existing ATOMIC caches, but it was broken by the MVCC commit, so it is a regression. Let's wait for Vladimir Ozerov or Igor Seliverstov to comment.

Thu, Nov 29, 2018 at 19:32, Nikolay Izhikov <[hidden email]>:
> Hello, Denis.
>
> Nothing blocks now.
> I am preparing the vote artifacts right now.
> There are some issues with TC tasks.
> I think I will resolve them in a couple of hours.
>
> Thu, Nov 29, 2018, 19:30 Denis Magda [hidden email]:
>
> > I think that it's not a blocker since MVCC is in the beta state and some of
> > the APIs might not work well with it yet.
> >
> > Apart from that, are we done with the stabilization and ready to start the
> > vote? What blocks us from that?
> >
> > --
> > Denis
> >
> > On Thu, Nov 29, 2018 at 7:43 AM Yakov Zhdanov <[hidden email]> wrote:
> >
> > > Vladimir, can you please take a look at
> > > https://issues.apache.org/jira/browse/IGNITE-10376?
> > >
> > > --Yakov
In reply to this post by yzhdanov
Ivan,
Could you provide a bit more detail?

I don't see any NPE among all available logs.

I don't think the issue is caused by changes in the scope of IGNITE-7953.
The test fails both before
<https://ci.ignite.apache.org/viewLog.html?buildId=2318582&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_ContinuousQuery4#testNameId3300126853696550025>
and after
<https://ci.ignite.apache.org/viewLog.html?buildId=2345403&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_ContinuousQuery4#testNameId3300126853696550025>
the commit was merged to master, with almost the same stack trace.

Regards,
Igor

Thu, Nov 29, 2018 at 18:43, Yakov Zhdanov <[hidden email]>:
> Vladimir, can you please take a look at
> https://issues.apache.org/jira/browse/IGNITE-10376?
>
> --Yakov
Igor,
The NPE is visible in the full log; I have now also attached it to the ticket.

IGNITE-7953
<https://github.com/apache/ignite/commit/51a202a4c48220fa919f47147bd4889033cd35a8>
was committed on 15 October. I could not check testAtomicOnheapTwoBackupAsyncFullSync on TC before that date, because the oldest run in the TC history dates from 12 November.

So I tested it locally and could not reproduce the mentioned error.

Thu, Nov 29, 2018 at 20:07, Seliverstov Igor <[hidden email]>:
> Ivan,
>
> Could you provide a bit more detail?
>
> I don't see any NPE among all available logs.
>
> I don't think the issue is caused by changes in the scope of IGNITE-7953.
> The test fails both before
> <https://ci.ignite.apache.org/viewLog.html?buildId=2318582&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_ContinuousQuery4#testNameId3300126853696550025>
> and after
> <https://ci.ignite.apache.org/viewLog.html?buildId=2345403&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_ContinuousQuery4#testNameId3300126853696550025>
> the commit was merged to master, with almost the same stack trace.
>
> Regards,
> Igor

--
Ivan Fedotov.
[hidden email]
Ivan, please provide a link to the ticket with the NPE stack trace attached.
I've looked at IGNITE-10376 and can't see any attachments.

Fri, Nov 30, 2018, 10:14 Ivan Fedotov [hidden email]:
> Igor,
> The NPE is visible in the full log; I have now also attached it to the ticket.
>
> IGNITE-7953
> <https://github.com/apache/ignite/commit/51a202a4c48220fa919f47147bd4889033cd35a8>
> was committed on 15 October. I could not check
> testAtomicOnheapTwoBackupAsyncFullSync on TC before that date, because the
> oldest run in the TC history dates from 12 November.
>
> So I tested it locally and could not reproduce the mentioned error.
The null pointer there is due to cache stop. Look at GridCacheContext#cleanup
(GridCacheContext.java:2050), which is called by GridCacheProcessor#stopCache (GridCacheProcessor.java:1372).

That's why there is no eviction manager at the time GridCacheMapEntry#touch (GridCacheMapEntry.java:5063) is invoked.

This is a result of the "normal" flow, because message processing doesn't enter the cache gate like the user API does.

Fri, Nov 30, 2018 at 10:26, Nikolay Izhikov <[hidden email]>:
> Ivan, please provide a link to the ticket with the NPE stack trace attached.
>
> I've looked at IGNITE-10376 and can't see any attachments.
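The gate behaviour described above can be illustrated with a minimal, standalone sketch. This is not Ignite code: `Cache`, `EvictionManager`, `userPut` and `onMessage` are hypothetical names, and a plain `ReentrantReadWriteLock` stands in for the cache gate. It only shows why a path that skips the gate can hit a null field after stop, while a gated path is rejected cleanly.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CacheGateSketch {
    /** Hypothetical stand-in for the eviction manager that cleanup removes. */
    static final class EvictionManager {
        int touched;

        void touch() {
            touched++;
        }
    }

    /** Hypothetical cache with a gate; not the real Ignite classes. */
    static final class Cache {
        private final ReentrantReadWriteLock gate = new ReentrantReadWriteLock();
        private volatile boolean stopped;
        private volatile EvictionManager evictMgr = new EvictionManager();

        /** User API path: enters the gate, so it cannot overlap with stop(). */
        boolean userPut() {
            gate.readLock().lock();
            try {
                if (stopped)
                    return false; // Rejected cleanly, nothing is dereferenced.

                evictMgr.touch();
                return true;
            }
            finally {
                gate.readLock().unlock();
            }
        }

        /** Message path: does not enter the gate and dereferences directly. */
        void onMessage() {
            evictMgr.touch(); // Throws NullPointerException if stop() already ran.
        }

        /** Stop path: takes the gate exclusively, then cleans up. */
        void stop() {
            gate.writeLock().lock();
            try {
                stopped = true;
                evictMgr = null;
            }
            finally {
                gate.writeLock().unlock();
            }
        }
    }

    /** Stops the cache, then exercises both paths and reports what happened. */
    static String demo() {
        Cache cache = new Cache();
        cache.stop();

        boolean userOk = cache.userPut();
        boolean npe = false;

        try {
            cache.onMessage();
        }
        catch (NullPointerException e) {
            npe = true;
        }

        return "user=" + userOk + ", npe=" + npe;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "user=false, npe=true"
    }
}
```

In this sketch the NPE is a symptom of the stop order rather than a bug in its own right, which matches the point above that it appears in the "normal" flow.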
Hi all!
I've reproduced this problem locally and attached the log to the ticket in my comment [1].

As Igor noted, the NPE there is caused by the node stop at the end of the test. The real problem here seems to be in the binary metadata registration flow.

[1] https://issues.apache.org/jira/browse/IGNITE-10376?focusedCommentId=16704510&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16704510

--
Kind Regards
Roman Kondakov

On 30.11.2018 11:56, Seliverstov Igor wrote:
> The null pointer there is due to cache stop. Look at GridCacheContext#cleanup
> (GridCacheContext.java:2050), which is called by GridCacheProcessor#stopCache
> (GridCacheProcessor.java:1372).
>
> That's why there is no eviction manager at the time GridCacheMapEntry#touch
> (GridCacheMapEntry.java:5063) is invoked.
>
> This is a result of the "normal" flow, because message processing doesn't
> enter the cache gate like the user API does.
Hello, Roman.
Does this issue block the 2.7 release?

Fri, Nov 30, 2018, 13:19 Roman Kondakov [hidden email]:
> Hi all!
>
> I've reproduced this problem locally and attached the log to the ticket
> in my comment [1].
>
> As Igor noted, the NPE there is caused by the node stop at the end of the test.
> The real problem here seems to be in the binary metadata registration flow.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-10376?focusedCommentId=16704510&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16704510
>
> --
> Kind Regards
> Roman Kondakov
Nikolay,
I couldn't quickly find the root cause of this problem because I'm not an expert in the binary metadata flow. I think the community should decide whether this is a release blocker or not.

--
Kind Regards
Roman Kondakov

On 30.11.2018 13:23, Nikolay Izhikov wrote:
> Hello, Roman.
>
> Does this issue block the 2.7 release?
Igor, thank you for the explanation.
It now seems that while one thread tries to invoke GridCacheMapEntry#touch, another one is executing GridCacheProcessor#stopCache. If I am wrong, please feel free to correct me.

But it is still not clear to me why this failure appears after the commit
<https://github.com/apache/ignite/commit/51a202a4c48220fa919f47147bd4889033cd35a8>,
which is about MVCC. Moreover, the NPE appears only together with BinaryObjectException, and when the test is green, I cannot find the NPE in the log.

I have now run the test locally 1000 times on the version before MVCC and could not find an error in this particular case (but there is another one
<https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/query/continuous/CacheContinuousQueryOrderingEventTest.java#L426>,
which is about an assertion on received events).

Fri, Nov 30, 2018 at 13:37, Roman Kondakov <[hidden email]>:
> Nikolay,
>
> I couldn't quickly find the root cause of this problem because I'm not
> an expert in the binary metadata flow. I think the community should decide
> whether this is a release blocker or not.
>
> --
> Kind Regards
> Roman Kondakov

--
Ivan Fedotov.
[hidden email]
I've created the PR <https://github.com/apache/ignite/pull/5550>, which
includes the changes <https://github.com/1vanan/ignite/commits/before-MVCC> just before the integration of MVCC with Continuous Query, and from TeamCity
<https://ci.ignite.apache.org/viewLog.html?buildId=2434057&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_ContinuousQuery1>
it is clear that before these changes the test testAtomicOnheapTwoBackupAsyncFullSync is green.

Roman Kondakov also gave his view on this problem in the comments <https://issues.apache.org/jira/browse/IGNITE-10376>. The problem is now more understandable, but the root cause is still unclear.

Maybe some of you have suggestions as to why threads hang on the binary metadata registration future?

Fri, Nov 30, 2018 at 13:48, Ivan Fedotov <[hidden email]>:
> Igor, thank you for the explanation.
>
> It now seems that while one thread tries to invoke
> GridCacheMapEntry#touch, another one is executing
> GridCacheProcessor#stopCache. If I am wrong, please feel free to correct me.
>
> But it is still not clear to me why this failure appears after the commit
> <https://github.com/apache/ignite/commit/51a202a4c48220fa919f47147bd4889033cd35a8>,
> which is about MVCC. Moreover, the NPE appears only together with
> BinaryObjectException, and when the test is green, I cannot find the NPE in the log.
>
> I have now run the test locally 1000 times on the version before MVCC and
> could not find an error in this particular case (but there is another one
> <https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/query/continuous/CacheContinuousQueryOrderingEventTest.java#L426>,
> which is about an assertion on received events).

--
Ivan Fedotov.
[hidden email]
Ivan, please clarify.
How is your investigation related to the 2.7 release? Do you think it's a release blocker? If yes, please describe the impact on users and how users can reproduce this issue.

Mon, Dec 3, 2018, 9:30 Ivan Fedotov [hidden email]:
> I've created the PR <https://github.com/apache/ignite/pull/5550>, which
> includes the changes <https://github.com/1vanan/ignite/commits/before-MVCC>
> just before the integration of MVCC with Continuous Query, and from TeamCity
> <https://ci.ignite.apache.org/viewLog.html?buildId=2434057&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_ContinuousQuery1>
> it is clear that before these changes the
> test testAtomicOnheapTwoBackupAsyncFullSync is green.
>
> Roman Kondakov also gave his view on this problem in the comments
> <https://issues.apache.org/jira/browse/IGNITE-10376>. The problem is now
> more understandable, but the root cause is still unclear.
>
> Maybe some of you have suggestions as to why threads hang on the binary
> metadata registration future?
Nikolay,
I think an end user may hit this problem when calling IgniteCache#invoke on a cache with a registered continuous query, if the cache is configured as in the failing test: [PARTITIONED, ATOMIC, FULL_SYNC, 2 backups].

I've found that the failure was introduced by the MVCC commit [1]. As I understand it, the issue relates to the metadata update process: the binary metadata registration future hangs for a reason that is still unclear.

I don't know whether the issue is a blocker, but it looks like a regression, because the test passed on Ignite 2.6.

What do you think?

[1] https://github.com/apache/ignite/commit/51a202a4c48220fa919f47147bd4889033cd35a8

--
Ivan Fedotov.
[hidden email] |
Guys, I checked: `testAtomicOnheapTwoBackupAsyncFullSync` fails on master (as Ivan described), but it passes in the ignite-2.7 branch (tag 2.7.0-rc2), so this shouldn't block the release.

Ivan, were you able to reproduce this issue in the ignite-2.7 branch?

--
Best Regards, Vyacheslav D. |
Confirming. The test has never failed in AI 2.7, even though that branch contains the mentioned MVCC commit. |
Vyacheslav, thank you for the remark. I ran the test on the 2.7 version and it passes.

I changed the ticket's priority from "Blocker" to "Major" and set the fix version to 2.8.

--
Ivan Fedotov.
[hidden email] |