Hi, Igniters!
I'd like to propose a new feature: the ability to query and create indexes through the public API.

It would help in cases where:
1. SQL is not applicable because of the design of the user application;
2. IndexScan is preferable to ScanQuery for performance reasons;
3. Functional indexes are required.

It would also be great to have transactional support for such queries, like the "select for update" query provides, but I haven't dug into that much yet. It would be a next step once this API is implemented.

I've prepared IEP-71 for that [1] with more details. Please share your thoughts.

[1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
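To make motivation 2 concrete, here is a minimal sketch of the difference between a full scan with a filter and a traversal of a sorted secondary index. It assumes a plain `TreeMap` as a stand-in for the index; `IndexRangeSketch` and all of its methods are hypothetical illustration names, not Ignite APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in: a "cache" of id -> age plus a sorted secondary index on age.
// None of these names are Ignite APIs.
class IndexRangeSketch {
    /** Sample data: person id -> age, iterated in id order. */
    static Map<Integer, Integer> sampleCache() {
        Map<Integer, Integer> cache = new TreeMap<>();
        cache.put(1, 25);
        cache.put(2, 40);
        cache.put(3, 31);
        cache.put(4, 18);
        cache.put(5, 40);
        return cache;
    }

    /** Full scan: every entry is visited and tested by the filter. */
    static List<Integer> scanWithFilter(Map<Integer, Integer> cache, int minAge) {
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : cache.entrySet()) {
            if (e.getValue() > minAge)
                ids.add(e.getKey());
        }
        return ids;
    }

    /** Builds the secondary index: age -> ids of the entries with that age. */
    static NavigableMap<Integer, List<Integer>> buildIndex(Map<Integer, Integer> cache) {
        NavigableMap<Integer, List<Integer>> idx = new TreeMap<>();
        cache.forEach((id, age) -> idx.computeIfAbsent(age, a -> new ArrayList<>()).add(id));
        return idx;
    }

    /** Index traversal: jump to the boundary and read only the matching part of the tree. */
    static List<Integer> rangeViaIndex(NavigableMap<Integer, List<Integer>> ageIdx, int minAge) {
        List<Integer> ids = new ArrayList<>();
        for (List<Integer> bucket : ageIdx.tailMap(minAge, false).values())
            ids.addAll(bucket);
        return ids;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> cache = sampleCache();
        System.out.println("scan:  " + scanWithFilter(cache, 30));
        System.out.println("index: " + rangeViaIndex(buildIndex(cache), 30));
    }
}
```

Both calls return the same set of ids, but the index traversal returns them ordered by the indexed field and never touches entries below the boundary.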
Hi Maksim,
Nice idea, I'd like to see this feature in Ignite.
The motivation is clear to me: it would be nice to have fast scans and avoid the SQL overhead of parsing, planning, etc. in some simple use cases.

I've left a few minor comments on the IEP, but I have the following questions whose answers I failed to find in the IEP.

1. Is it possible to extend ScanQuery functionality to pass an index condition as a hint/parameter rather than create a separate query type?
This would allow a user to run a query over a particular table (for the multi-table-per-cache case) and use an index for some types of conditions.

2. Functional indices, as you wrote, should use functions distributed via the peerClassLoading mechanism.
This means there will be no class with the function on the server side, and such classes are not persistent. It seems they can't survive a grid restart.
This task looks like a huge one, because the lifecycle of such classes should be described first.
Possible pitfalls are:
* Durability. Function code MUST be persistent to survive a node restart, as there is no guarantee that the classes are available on the server side.
* Consistency. Server (and maybe client) nodes MUST have the same class code at any given time.
* Code ownership. Would class code be shared or per-cache? If shared, you can't just change class code by loading a new one, because other caches may use this function. If per-cache, different caches may have different code/behavior, which may be non-obvious to the end user.

3. IndexScan by the predicate is questionable.
Maybe it will be faster if there are multiple tables in a cache, but it looks similar to ScanQuery with a filter.

Also, I believe we can have a common API (configuring, creating, using) for all types of indices, but some types (e.g. functional) will be ignored in SQL due to limited support on the H2 side, while other types will be shared and could be used by the ScanQuery engine as well as by the SQL engine.

On Tue, Apr 6, 2021 at 4:14 PM Maksim Timonin <[hidden email]> wrote:
> [...]

--
Best regards,
Andrey V. Mashenkov
Hi, Andrey!
Thanks for the review and your comments!

>> Is it possible to extend ScanQuery functionality to pass index condition
I investigated this approach and see some issues:
1. Querying an index is not actually a scan. It's a tree traversal (the predicate operation is an exception; other operations like gt, lt, min, max have explicit boundaries). An index query consists of conditions that match the index structure. In general, for a multi-key index there can be multiple conditions. The ScanQuery API accepts a filter as a parameter, which for an index query would have to be split into such conditions. It looks like a non-trivial task.
2. Querying an index requires a sorted result, while ScanQuery doesn't care about ordering. So the iterator would behave differently for scanning a cache and for querying indexes. It's not much to implement, I think, but it can make ScanQuery unclear to a user.

Maybe it makes sense to separate the traverse (gt, lt, in, etc.) and scan (predicate) index operations into different APIs. So there would still be a new query type for traversal.

But we would introduce some inheritors of ScanQuery, like TableScanQuery and IndexScanQuery, for scan and filter. Then the question is about ordering: cache and table scans aren't ordered, but an index scan is. Then we can introduce an optional "order" parameter for ScanQuery too.

WDYT?

>> Functional indices
>> This task looks like a huge one because the lifecycle of such classes should be described first
I agree with you that this part should be investigated more deeply than I have done. So let's postpone the discussion about functional indexes for a while. IEP-71 declares several phases; functional indexes are part of the 2nd phase, but users will get new functionality already from the 1st phase. Then I'll dig into the things you mentioned. Thanks for pointing them out.

>> IndexScan by the predicate is questionable
Also, in your comments on the IEP in Confluence you mentioned the deserialization that is required to get an object for the predicate function. Now I see it like this:
1. The predicate should operate only on indexed fields;
2. A user benefits from the predicate only if the index is inlined properly (even if some rows aren't inlined due to varlen, it can still be faster than running a ScanQuery);
3. Ignite creates a proxy object that is filled with the values that are inlined. If a user tries to access a field that isn't inlined or isn't indexed, then deserialization will start and Ignite will log.warn() about that.

So, I think it's a valid use case. Is there something I'm missing?

On Tue, Apr 6, 2021 at 6:21 PM Andrey Mashenkov <[hidden email]> wrote:
> [...]
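The proxy idea from point 3 of the message above can be sketched roughly as follows. This is only an illustration under stated assumptions: `InlinedRowProxy`, its `field` method and the `Supplier` standing in for deserialization of the full cache entry are all hypothetical names, and the warning is printed to stderr instead of going through Ignite's logger.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a row proxy; these are not Ignite classes.
class InlinedRowProxy {
    /** Values that are available directly from the index row. */
    private final Map<String, Object> inlined;

    /** Expensive fallback: stands in for deserializing the whole cache entry. */
    private final Supplier<Map<String, Object>> fullRow;

    /** Deserialized row, filled on the first fallback. */
    private Map<String, Object> materialized;

    /** Number of times the fallback actually ran. */
    int deserializations;

    InlinedRowProxy(Map<String, Object> inlined, Supplier<Map<String, Object>> fullRow) {
        this.inlined = inlined;
        this.fullRow = fullRow;
    }

    /** Returns a field value, deserializing the row only if the field is not inlined. */
    Object field(String name) {
        if (inlined.containsKey(name))
            return inlined.get(name);

        if (materialized == null) {
            System.err.println("WARN: field '" + name + "' is not inlined, deserializing the row");
            materialized = fullRow.get();
            deserializations++;
        }

        return materialized.get(name);
    }

    public static void main(String[] args) {
        Map<String, Object> inlined = new HashMap<>();
        inlined.put("age", 31);

        Map<String, Object> full = new HashMap<>(inlined);
        full.put("name", "Ann");

        InlinedRowProxy row = new InlinedRowProxy(inlined, () -> full);

        System.out.println(row.field("age"));   // served from the inlined values
        System.out.println(row.field("name"));  // triggers the fallback and the warning
    }
}
```

A predicate that touches only inlined fields never pays for deserialization; one that reads any other field pays for it once per row.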
Maksim,
> The ScanQuery API provides a filter as param that for case of index query should be split on such conditions.
> It looks like a non-trivial task.

ScanQuery, TextQuery and, partially, SQL queries share the same infrastructure.
I thought we could extend, improve and reuse some ScanQuery code that already works fine: mapping the query onto the topology, IO, batching.
Add an IndexCondition alongside the Filter, and abstract the query executor from the source (primary and secondary indexes).
Add a sorted merge algorithm to the query merge stage. It would also be very useful for TextQueries, which suffer from the absence of a sorted merge, so the "limit" condition works incorrectly.

If you think this would be harder than creating it from scratch, I'm OK with that.

> 3. Ignite creates a proxy object that is filled with objects that are inlined. If a user tries to access a field that isn't inlined or not indexed, then deserialization will start and Ignite will log.warn() about that.

Agree, this can be faster.
I don't like the idea that user code will be executed inside a BTree operation: any exception can trigger the FailureHandler and stop the node.

There is one more thing that could be improved.
ScanQuery now iterates over per-partition PK Hash index trees and has performance issues on a small grid with a large number of partitions, because there are many partitions on every node and many trees have to be scanned.
In this case a scan over a secondary index gives a significant boost even if every row is materialized, because we need to traverse only a single tree per node.
Having the ability to run a ScanQuery over a secondary index (if one exists) instead of PK Hash would be great.

On Wed, Apr 7, 2021 at 11:18 AM Maksim Timonin <[hidden email]> wrote:
> [...]

--
Best regards,
Andrey V. Mashenkov
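The sorted merge mentioned in the message above can be illustrated with a standard k-way merge over a priority queue. This is a generic sketch, not Ignite's reducer code: it assumes each node returns its results already sorted, and `SortedMergeSketch` and `mergeWithLimit` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Generic k-way merge sketch; not Ignite's reducer implementation.
class SortedMergeSketch {
    /** Current head of one node's sorted stream plus the rest of that stream. */
    private static final class Head implements Comparable<Head> {
        final int val;
        final Iterator<Integer> rest;

        Head(int val, Iterator<Integer> rest) {
            this.val = val;
            this.rest = rest;
        }

        @Override public int compareTo(Head o) {
            return Integer.compare(val, o.val);
        }
    }

    /** Merges per-node sorted results and stops as soon as {@code limit} rows are produced. */
    static List<Integer> mergeWithLimit(List<List<Integer>> perNode, int limit) {
        PriorityQueue<Head> heap = new PriorityQueue<>();

        for (List<Integer> nodeRes : perNode) {
            Iterator<Integer> it = nodeRes.iterator();
            if (it.hasNext())
                heap.add(new Head(it.next(), it));
        }

        List<Integer> res = new ArrayList<>();

        while (!heap.isEmpty() && res.size() < limit) {
            Head h = heap.poll();          // smallest head among all nodes
            res.add(h.val);

            if (h.rest.hasNext())          // advance only the stream we consumed from
                heap.add(new Head(h.rest.next(), h.rest));
        }

        return res;
    }

    public static void main(String[] args) {
        List<List<Integer>> perNode = Arrays.asList(
            Arrays.asList(1, 4, 9),
            Arrays.asList(2, 3, 10),
            Arrays.asList(5));

        System.out.println(mergeWithLimit(perNode, 4));
    }
}
```

Without the merge, taking the first `limit` rows that happen to arrive from any node gives an arbitrary subset, which is the kind of incorrect "limit" behavior described above.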
Hi, Andrey!
>> ScanQuery, TextQuery and partially SQL query share the same infrastructure
I think I understand what you mean. I debugged query processing and now agree that it's a nice idea to try to reuse the infrastructure of scan and text queries. Also, as far as I can see, Reducer functionality already exists, so I hope we can use that. I'm not absolutely confident yet that it will work fine, but I'm going to start there. Thanks for pointing me in this direction!

>> I don't like the idea a user code will be executed inside BTree operation
On the Confluence page I showed the predicate being passed as a TreeRowClosure. In that case you're right: any exception in the predicate will lead to a CorruptedTreeException. But I see another legitimate way to implement the predicate operation. BPlusTree.find accepts the X param that is passed to IO.getRow(). As I understand it, this param helps to control how much of the returned row is filled. So we can use it to return an object that contains only basic info: link, pageAddr, offset. Then the predicate operation will be applied at a higher level, on the cursor returned by the tree (like H2TreeIndex does). It's safe to run user code there, since we can handle exceptions there.

On Wed, Apr 7, 2021 at 4:46 PM Andrey Mashenkov <[hidden email]> wrote:
> [...]
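The idea of applying the predicate on the cursor rather than inside the tree operation can be sketched as a filtering wrapper around an iterator. This is a simplified illustration that assumes the tree cursor can be modelled as a plain `Iterator`; `FilteringCursorSketch`, `filter` and `QueryFailedException` are hypothetical names, not Ignite classes.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Simplified model: the tree cursor is a plain Iterator; these are not Ignite classes.
class FilteringCursorSketch {
    /** Query-level failure that is reported to the caller instead of failing the node. */
    static final class QueryFailedException extends RuntimeException {
        QueryFailedException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    /** Wraps a tree cursor so that user code runs outside of the tree traversal itself. */
    static <T> Iterator<T> filter(Iterator<T> treeCursor, Predicate<T> userPredicate) {
        return new Iterator<T>() {
            private T next;
            private boolean ready;

            @Override public boolean hasNext() {
                while (!ready && treeCursor.hasNext()) {
                    T row = treeCursor.next();   // tree access only, no user code here

                    boolean ok;
                    try {
                        ok = userPredicate.test(row);   // user code, isolated by try/catch
                    }
                    catch (RuntimeException e) {
                        throw new QueryFailedException("User predicate failed", e);
                    }

                    if (ok) {
                        next = row;
                        ready = true;
                    }
                }

                return ready;
            }

            @Override public T next() {
                if (!hasNext())
                    throw new NoSuchElementException();

                ready = false;
                return next;
            }
        };
    }

    public static void main(String[] args) {
        Iterator<Integer> it = filter(Arrays.asList(1, 2, 3, 4).iterator(), x -> x % 2 == 0);

        while (it.hasNext())
            System.out.println(it.next());
    }
}
```

A failing predicate surfaces as an ordinary query exception at the cursor level, so the traversal code never sees it.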
Hi, Andrey!
Am I right that you meant this ticket [1], *IGNITE-12291 Create controllable paged query requests / responses for TextQuery similar to current SQL result processing*, when you talked about the limit working incorrectly for TextQueries?

[1] https://issues.apache.org/jira/browse/IGNITE-12291

On Thu, Apr 8, 2021 at 4:32 PM Maksim Timonin <[hidden email]> wrote:
> [...]
In reply to this post by Maksim Timonin
How does this fit with the current IndexingSpi? Superficially they appear to do very similar things?
Regards, Stephen |
Hi Stephen!
Please have a look at the QueryProcessing paragraph [1]. I've described there why IndexingSpi doesn't fit us well.

[1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search#IEP71PublicAPIforsecondaryindexsearch-2)QueryProcessing |
Is this a replacement for IndexingSpi? Put bluntly, do we deprecate (and remove) it?
Or do you see them as complementary? |
In reply to this post by Maksim Timonin
Maksim,
> Am I right that you mean this ticket [1], *IGNITE-12291 Create controllable
> paged query requests / responses for TextQuery similar to current SQL
> result processing*, when you talked about limit working incorrectly for
> TextQueries?

Yes, sure, that's it.

[1] https://issues.apache.org/jira/browse/IGNITE-12291

-- Best regards, Andrey V. Mashenkov |
In reply to this post by sdarlington
Stephen,
I don't see a reason to replace or deprecate IndexingSpi. I'm not sure how people use it, but it works now. |
Andrey,
Thanks! I picked it up. |
Andrey, hi!
Some updates here. I've submitted a PR for IndexQuery [1].

There is an issue with lazy page loading that is also related to the TextQuery ticket IGNITE-12291. CacheQueries already have pending-pages functionality; it is implemented by sending multiple GridCacheQueryRequest messages. There was an issue with TextQuery and limit: after the limit was exceeded we still sent requests, so I submitted a patch to fix this [2].

But currently TextQuery, like SqlFieldsQuery, prepares the whole data set on the query request, holds it, and provides a cursor over this collection. If I understand you correctly, you propose to run TextQuery over the index on every page request. We can do this with Lucene's IndexSearcher.searchAfter. On the one hand, this will save resources. On the other hand, no query (neither TextQuery nor SqlFieldsQuery) locks the index while querying, so there can be data inconsistency, because concurrent operations can modify the index while a user iterates over the cursor. This can already happen for queries now, since there is no index lock, but the time window for such inconsistency is much shorter.

I have the same dilemma for IndexQuery. In my patch [1] I provide lazy iteration over BPlusTree. There is no lock on the index while querying there either. I want to discuss the right way to go. I have the following in mind:

1. Indexes currently don't support transactions, and SQL queries don't lock an index while querying, so Ignite doesn't guarantee data consistency;
2. As I understand it, preparing the whole data set for SQL queries is required because of relations between tables. The more complex the query and the relations are, the more consistency issues we get in the result in case of parallel operations;
3. Querying only a single index (by TextQuery or IndexQuery) doesn't involve any relations, so we can allow concurrent updates: they can affect a query result, but they do no harm.

Following these thoughts, it seems right to implement lazy iteration over indexes. What do you think?
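To make the trade-off above concrete, here is a minimal sketch of page-by-page iteration that resumes from the last returned key, in the spirit of a "search after" request. It is only an illustration: a JDK ConcurrentSkipListMap stands in for the sorted index, and the class LazyIndexPaging and the method nextPage are hypothetical names, not code from the PR or from Ignite. No lock is held between page requests, so an update made between two pages can show up in (or disappear from) the later page.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical stand-in for a sorted index; this is NOT Ignite's BPlusTree. */
class LazyIndexPaging {
    /**
     * Returns up to pageSize entries whose keys are strictly greater than lastKey,
     * or the first entries when lastKey is null. Nothing is locked between calls.
     */
    static List<Map.Entry<Integer, String>> nextPage(
        ConcurrentNavigableMap<Integer, String> idx, Integer lastKey, int pageSize) {
        ConcurrentNavigableMap<Integer, String> tail =
            lastKey == null ? idx : idx.tailMap(lastKey, false);

        List<Map.Entry<Integer, String>> page = new ArrayList<>(pageSize);

        for (Map.Entry<Integer, String> e : tail.entrySet()) {
            if (page.size() == pageSize)
                break;

            // Copy the entry so the page does not depend on later index changes.
            page.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }

        return page;
    }

    public static void main(String[] args) {
        ConcurrentNavigableMap<Integer, String> idx = new ConcurrentSkipListMap<>();

        for (int i = 1; i <= 5; i++)
            idx.put(i * 10, "row" + i);

        // First page: keys 10 and 20.
        List<Map.Entry<Integer, String>> first = nextPage(idx, null, 2);

        // A concurrent writer changes the index between two page requests.
        idx.put(25, "late");
        idx.remove(30);

        // Second page resumes after key 20 and sees the updated index: keys 25 and 40.
        Integer last = first.get(first.size() - 1).getKey();
        List<Map.Entry<Integer, String>> second = nextPage(idx, last, 2);

        System.out.println(first + " " + second);
    }
}
```

The second page reflects the insert of key 25 and the removal of key 30, which is the kind of inconsistency discussed above: the result is affected by concurrent updates, but the iteration itself stays well-defined and ordered.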
Also, there is a second topic to discuss. BPlusTree indexes support query parallelism, but CacheQueries don't. The infrastructure needs to be changed to support query parallelism, so in this patch [1] I handle multiple segments in a single thread. This works OK: with lazy querying it is very fast to initialize a cursor, so there is not much overhead from multiple segments. I ran performance tests and found that in some cases IndexQuery beats SqlFieldsQuery even with queryParallelism enabled (which helps SqlFieldsQuery a lot). So whether IndexQuery needs queryParallelism support has to be tested well. As IndexQuery can already help users speed up some queries, I propose to look into queryParallelism a little later. WDYT?

These two things affect the Apache Ignite release that IndexQuery will be delivered with, so please let me know your thoughts. Any thoughts from the community are welcome too.

[1] https://github.com/apache/ignite/pull/9118
[2] https://github.com/apache/ignite/pull/9086 |
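As an illustration of handling several index segments in a single thread, the sketch below lazily merges per-segment sorted cursors with a priority queue. This is an assumption about one possible approach, not a description of the patch: the thread does not say how segment results are combined, and SegmentMergeCursor is a hypothetical name. Plain Iterator<Integer> objects stand in for segment cursors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Hypothetical single-threaded cursor that merges sorted segment cursors lazily. */
class SegmentMergeCursor implements Iterator<Integer> {
    /** Current head value of one segment plus the rest of that segment's cursor. */
    private static final class Head {
        final int val;
        final Iterator<Integer> rest;

        Head(int val, Iterator<Integer> rest) {
            this.val = val;
            this.rest = rest;
        }
    }

    /** Heads of all non-exhausted segments, smallest value first. */
    private final PriorityQueue<Head> heads =
        new PriorityQueue<>((a, b) -> Integer.compare(a.val, b.val));

    SegmentMergeCursor(List<Iterator<Integer>> segments) {
        // Only one element per segment is read up front, so initialization is cheap.
        for (Iterator<Integer> seg : segments) {
            if (seg.hasNext())
                heads.add(new Head(seg.next(), seg));
        }
    }

    @Override public boolean hasNext() {
        return !heads.isEmpty();
    }

    @Override public Integer next() {
        Head h = heads.poll();

        if (h == null)
            throw new NoSuchElementException();

        // Advance only the segment whose value was just returned.
        if (h.rest.hasNext())
            heads.add(new Head(h.rest.next(), h.rest));

        return h.val;
    }

    public static void main(String[] args) {
        List<Iterator<Integer>> segs = Arrays.asList(
            Arrays.asList(1, 4, 9).iterator(),
            Arrays.asList(2, 3, 10).iterator(),
            Arrays.asList(5).iterator());

        SegmentMergeCursor cur = new SegmentMergeCursor(segs);
        List<Integer> out = new ArrayList<>();

        while (cur.hasNext())
            out.add(cur.next());

        System.out.println(out); // [1, 2, 3, 4, 5, 9, 10]
    }
}
```

Because each segment contributes only its current head to the queue, the cost of opening the merged cursor grows with the number of segments rather than with the amount of data, which matches the observation that lazy cursors keep the multi-segment overhead small.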