Hi Andrey,
I've checked this ticket's comments, and there is a TC Bot visa (with no blockers).

Do you have any concerns related to this patch?

Sincerely,
Dmitriy Pavlov

On Thu, 17 Oct 2019 at 16:43, Yuriy Shuliga <[hidden email]> wrote:
> Andrey,
>
> Per your request, I created ticket https://issues.apache.org/jira/browse/IGNITE-12291 linked to https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
>
> Could you please proceed with the PR merge?
>
> BR,
> Yuriy Shuliha
>
> On Wed, 9 Oct 2019 at 12:52, Andrey Mashenkov <[hidden email]> wrote:
>> Hi Yuri,
>>
>> To get access to the TC Bot, you should register as a TeamCity user [1] if you haven't done so already.
>> Then you will be able to log in to the Ignite TC Bot page with the same credentials.
>>
>> [1] https://ci.ignite.apache.org/registerUser.html
>>
>> On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <[hidden email]> wrote:
>>> Andrew,
>>>
>>> I have corrected the PR according to your notes. Please review.
>>> What are the next steps to get it merged?
>>>
>>> Y.
>>>
>>> On Thu, 3 Oct 2019 at 17:47, Andrey Mashenkov <[hidden email]> wrote:
>>>> Yuri,
>>>>
>>>> I'm done with the review.
>>>> No crime found, except a trivial compatibility bug.
>>>>
>>>> On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <[hidden email]> wrote:
>>>>> Denis,
>>>>>
>>>>> Thank you for your attention to this.
>>>>> As of now, the https://issues.apache.org/jira/browse/IGNITE-12189 ticket is still pending review.
>>>>> Do we have a chance to move it forward somehow?
>>>>>
>>>>> BR,
>>>>> Yuriy Shuliha
>>>>>
>>>>> On Mon, 30 Sep 2019 at 23:35, Denis Magda <[hidden email]> wrote:
>>>>>> Yuriy,
>>>>>>
>>>>>> I've seen you open a pull request with the first changes:
>>>>>> https://issues.apache.org/jira/browse/IGNITE-12189
>>>>>>
>>>>>> Alex Scherbakov and Ivan, are you the right people to do the review?
>>>>>>
>>>>>> -
>>>>>> Denis
>>>>>>
>>>>>> On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <[hidden email]> wrote:
>>>>>>> Yuriy,
>>>>>>>
>>>>>>> Thank you for providing the details! Quite interesting.
>>>>>>>
>>>>>>> Yes, we already support distributed limit and merging of sorted subresults for SQL queries. E.g., ReduceIndexSorted and MergeStreamIterator are used for merging sorted streams.
>>>>>>>
>>>>>>> Could you please also clarify score/relevance? Is it provided by the Lucene engine for each query result? I am thinking about how to do a sorted merge properly in this case.
>>>>>>>
>>>>>>> On Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga <[hidden email]> wrote:
>>>>>>>> Ivan,
>>>>>>>>
>>>>>>>> Thank you for the interesting question!
>>>>>>>>
>>>>>>>> Text searches (or full-text searches) are mostly human-oriented, and the user's interest is in the topmost part of the response. The user can then read it, evaluate it, and use the returned records for further purposes.
>>>>>>>>
>>>>>>>> In our case in particular, we use Ignite for operations with financial data, and there is a lot of text such as asset names, financial instruments, companies, etc.
>>>>>>>> To work with this quickly and reliably, users are used to working with text search, type-ahead completions, and suggestions.
>>>>>>>>
>>>>>>>> For these purposes we index particular string data in separate caches.
>>>>>>>>
>>>>>>>> Sorting capabilities and response size limits are very important there, as our API has to provide the most relevant information within a limited size.
>>>>>>>>
>>>>>>>> Now let me comment on the Ignite/Lucene perspective.
>>>>>>>> Actually, Ignite queries Lucene, and Lucene returns *TopDocs.scoreDocs* already sorted by *score* (relevance), so the most relevant documents are at the top.
>>>>>>>> However, in distributed queries the responses from different nodes are currently merged into the final query cursor queue in arbitrary order.
>>>>>>>> So in fact the score order is already lost here. Also, Ignite requests all possible documents from Lucene, which is redundant and bad for performance.
>>>>>>>>
>>>>>>>> I'm implementing a *limit* parameter as part of *TextQuery*, and I have to note that we still need to add sorting to text query processing in order to get usable results.
>>>>>>>>
>>>>>>>> The *limit* parameter by itself should solve part of the issues above, but sorting by document score, at the very least, should definitely be implemented along with the limit.
>>>>>>>>
>>>>>>>> This is a pretty short commentary; if you still have any questions, please don't hesitate to ask.
>>>>>>>>
>>>>>>>> BR,
>>>>>>>> Yuriy Shuliha
>>>>>>>>
>>>>>>>> On Thu, 19 Sep 2019 at 11:38, Павлухин Иван <[hidden email]> wrote:
>>>>>>>>> Yuriy,
>>>>>>>>>
>>>>>>>>> Greatly appreciate your interest.
>>>>>>>>>
>>>>>>>>> Could you please elaborate a little on sorting? What tasks does it help to solve, and how? It would be great to see an example.
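Yuriy's point that each node's Lucene hits already arrive sorted by score, while the pages are queued in arbitrary order, suggests a k-way merge on the originating node, similar in spirit to the sorted-stream merging Ivan mentions for SQL. The following is a minimal sketch in plain Java; ScoreMerge, Hit and mergeByScore are hypothetical names, not Ignite's MergeStreamIterator or ReduceIndexSorted. It assumes every node returns its hits in descending score order and that scores from different nodes are comparable, which is itself an assumption, since each node scores against its own local index statistics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch: k-way merge of per-node hit lists, each already sorted
 * by descending score, into one globally sorted list capped at a limit.
 */
public class ScoreMerge {

    /** One search hit returned by a data node (hypothetical type). */
    public static final class Hit {
        final String key;
        final float score;

        public Hit(String key, float score) {
            this.key = key;
            this.score = score;
        }

        @Override
        public String toString() {
            return key + "(" + score + ")";
        }
    }

    /** Current head of one node's stream plus the rest of that stream. */
    private static final class Head {
        final Hit hit;
        final Iterator<Hit> rest;

        Head(Hit hit, Iterator<Hit> rest) {
            this.hit = hit;
            this.rest = rest;
        }
    }

    /** Returns at most limit hits in descending score order. */
    public static List<Hit> mergeByScore(List<List<Hit>> perNode, int limit) {
        // Max-heap on score: the best remaining head across all nodes is on top.
        PriorityQueue<Head> heads =
            new PriorityQueue<>((a, b) -> Float.compare(b.hit.score, a.hit.score));

        for (List<Hit> nodeHits : perNode) {
            Iterator<Hit> it = nodeHits.iterator();
            if (it.hasNext())
                heads.add(new Head(it.next(), it));
        }

        List<Hit> out = new ArrayList<>();
        while (!heads.isEmpty() && out.size() < limit) {
            Head top = heads.poll();
            out.add(top.hit);
            // Pull the next hit from the same node so it competes with the other heads.
            if (top.rest.hasNext())
                heads.add(new Head(top.rest.next(), top.rest));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Hit> node1 = Arrays.asList(new Hit("a", 9.0f), new Hit("b", 4.0f));
        List<Hit> node2 = Arrays.asList(new Hit("c", 7.5f), new Hit("d", 6.0f), new Hit("e", 1.0f));
        System.out.println(mergeByScore(Arrays.asList(node1, node2), 3)); // [a(9.0), c(7.5), d(6.0)]
    }
}
```

The merge stops as soon as the limit is reached, so it consumes at most limit hits plus one head per node, and each step costs O(log k) for k nodes.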
>>>>>>>>>
>>>>>>>>> On Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <[hidden email]> wrote:
>>>>>>>>>> Denis,
>>>>>>>>>>
>>>>>>>>>> I like the idea of throwing an exception when text queries are enabled on persistent caches.
>>>>>>>>>>
>>>>>>>>>> I'm also fine with the proposed limit for unsorted searches.
>>>>>>>>>>
>>>>>>>>>> Yury, please proceed with ticket creation.
>>>>>>>>>>
>>>>>>>>>> On Tue, 17 Sep 2019 at 22:06, Denis Magda <[hidden email]> wrote:
>>>>>>>>>>> Igniters,
>>>>>>>>>>>
>>>>>>>>>>> I see nothing wrong with Yury's proposal regarding full-text search API evolution, as long as Yury is ready to push it forward.
>>>>>>>>>>>
>>>>>>>>>>> As for the in-memory-only mode, it makes total sense for in-memory data grid deployments where Ignite caches data of an underlying DB like Postgres.
>>>>>>>>>>> As part of the changes, I would simply throw an exception (by default) if someone attempts to use text indices with native persistence enabled.
>>>>>>>>>>> If a user is ready to live with that limitation, an explicit configuration change is needed to get around the exception.
>>>>>>>>>>>
>>>>>>>>>>> Thoughts?
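Denis's proposed default, rejecting text indexes when native persistence is enabled unless the user explicitly opts in, reduces to a configuration check at startup. This is only a sketch with hypothetical names throughout (TextIndexGuard, Config, allowTextIndexWithPersistence); none of them are existing Ignite configuration properties.

```java
/**
 * Hypothetical sketch of the proposed default: refuse text (Lucene) indexes
 * when native persistence is enabled, unless the user explicitly opts in.
 * None of these names are existing Ignite configuration properties.
 */
public class TextIndexGuard {

    /** Minimal stand-in for the configuration values the check needs. */
    public static final class Config {
        final boolean persistenceEnabled;
        final boolean hasTextIndexes;
        // Hypothetical explicit opt-in; persistent setups pass only when it is true.
        final boolean allowTextIndexWithPersistence;

        public Config(boolean persistenceEnabled, boolean hasTextIndexes,
            boolean allowTextIndexWithPersistence) {
            this.persistenceEnabled = persistenceEnabled;
            this.hasTextIndexes = hasTextIndexes;
            this.allowTextIndexWithPersistence = allowTextIndexWithPersistence;
        }
    }

    /** Throws if text indexes are combined with persistence without the opt-in. */
    public static void validate(Config cfg) {
        if (cfg.persistenceEnabled && cfg.hasTextIndexes && !cfg.allowTextIndexWithPersistence) {
            throw new IllegalStateException(
                "Text indexes are not persistent and are rejected with native persistence by default; "
                    + "set allowTextIndexWithPersistence=true to accept this limitation.");
        }
    }

    public static void main(String[] args) {
        validate(new Config(false, true, false)); // in-memory only: accepted
        validate(new Config(true, true, true));   // persistence plus explicit opt-in: accepted

        try {
            validate(new Config(true, true, false)); // persistence without opt-in: rejected
        }
        catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```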
>>>>>>>>>>>
>>>>>>>>>>> -
>>>>>>>>>>> Denis
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga <[hidden email]> wrote:
>>>>>>>>>>>> Hello to all again,
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for the important comments and notes below!
>>>>>>>>>>>>
>>>>>>>>>>>> Let me answer and continue the discussion.
>>>>>>>>>>>>
>>>>>>>>>>>> (I) Overall need for Lucene indexing
>>>>>>>>>>>>
>>>>>>>>>>>> Alexei referred to https://issues.apache.org/jira/browse/IGNITE-5371, where the absence of index persistence was declared an obstacle to further development.
>>>>>>>>>>>>
>>>>>>>>>>>> a) This ticket is already closed as not valid.
>>>>>>>>>>>> b) There are definite needs (in our project as well) for purely in-memory indexing of selected data.
>>>>>>>>>>>> We intend to use search capabilities to fetch a limited number of records for type-ahead search / suggestions.
>>>>>>>>>>>> Not all of the data will be indexed, and there is no need for the Lucene index to be persistent. Hopefully this is a common pattern of text-search usage.
>>>>>>>>>>>>
>>>>>>>>>>>> (II) Necessary fixes in the current implementation
>>>>>>>>>>>>
>>>>>>>>>>>> a) Implementation of a correct *limit* (*offset* does not seem to be required for text-search tasks for now)
>>>>>>>>>>>> I have investigated the data flow for distributed text queries, using a simple prefix test query like name='ene*'.
>>>>>>>>>>>> For now, each server node returns all response records to the client node, which may be thousands or even hundreds of thousands of records, even if we need only the first 10-100. Again, all the results are added page by page to the queue in GridCacheQueryFutureAdapter in arbitrary order.
>>>>>>>>>>>> I did not find any means here to deliver a deterministic result.
>>>>>>>>>>>> So implementing the limit as part of the query (and of GridCacheQueryRequest) will not change the nature of the response, but it will limit the load on nodes and the network.
>>>>>>>>>>>>
>>>>>>>>>>>> Can we consider opening a ticket for this?
>>>>>>>>>>>>
>>>>>>>>>>>> (III) Further exposure of the Lucene API in Ignite
>>>>>>>>>>>>
>>>>>>>>>>>> a) Sorting
>>>>>>>>>>>> The solution for this could be:
>>>>>>>>>>>> - Make entities comparable
>>>>>>>>>>>> - Add a custom comparator to the entity
>>>>>>>>>>>> - Add annotations to mark sorted fields for Lucene indexing
>>>>>>>>>>>> - Use comparators when merging responses or reducing to the desired limit on the client node.
>>>>>>>>>>>> This will require the full result set to be loaded into memory, though it can be used for relatively small limits.
>>>>>>>>>>>> BR,
>>>>>>>>>>>> Yuriy Shuliha
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, 30 Aug 2019 at 10:37, Alexei Scherbakov <[hidden email]> wrote:
>>>>>>>>>>>>> Yuriy,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Note that one of the major blockers for text queries is [1], which makes Lucene indexes unusable with persistence and is the main reason for discontinuation.
>>>>>>>>>>>>> It should probably be addressed first to make text queries a valid product feature.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Distributed sorting and advanced querying are indeed not a trivial task.
>>>>>>>>>>>>> Some kind of merging must be implemented on the query-originating node.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-5371
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, 29 Aug 2019 at 23:38, Denis Magda <[hidden email]> wrote:
>>>>>>>>>>>>>> Yuriy,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you are ready to take over the full-text search indexes, then please go ahead. The primary reasons why the community wants to discontinue them first (and probably resurrect them later) are the limitations listed by Andrey and minimal support on the community's end.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>> Denis
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <[hidden email]> wrote:
>>>>>>>>>>>>>>> Hi Yuriy,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Unfortunately, there is a plan to discontinue TextQueries in Ignite [1].
>>>>>>>>>>>>>>> The motivation is that text indexes are not persistent, not transactional, and can't be used together with SQL or inside SQL, and there is a lack of interest from the community side.
>>>>>>>>>>>>>>> You are welcome to take on these issues and make TextQueries great.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. PageSize can't be used to limit the result set.
>>>>>>>>>>>>>>> Query results are returned from a data node to the client-side cursor page by page, and this parameter is designed to control the page size. The query is supposed to execute lazily on the server side, and the full result set is not expected to be loaded into memory on the server side at once, but page by page.
>>>>>>>>>>>>>>> Do you mean you found that Lucene loads the entire result set into memory before the first page is sent to the client?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think a new parameter should be added to limit the result. The best solution is to use query language commands for this, e.g. "LIMIT/OFFSET" in SQL.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This task doesn't look trivial. A query is a distributed operation: the same user query is executed on the data nodes, and then the results from all nodes should be correctly merged before being returned via the client cursor.
>>>>>>>>>>>>>>> So LIMIT should be applied on every node and then again in the merge phase.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, and this may be non-obvious, limiting results makes no sense without sorting, as there is no guarantee that every subsequent query run will return the same data, because of page reordering.
>>>>>>>>>>>>>>> Basically, the merge phase receives results from data nodes asynchronously, and messages from different nodes can't be ordered.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2.
>>>>>>>>>>>>>>> a. The "tokenize" param name (for @QueryTextField) looks more verbose, doesn't it?
>>>>>>>>>>>>>>> b, c. What about distributed queries? How will partial results from nodes be merged?
>>>>>>>>>>>>>>> Does Lucene allow configuring a comparator for data sorting?
>>>>>>>>>>>>>>> Which comparator should Ignite choose to sort the result in the merge phase?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 3. For now, the Lucene engine is not configurable at all. E.g., it is impossible to configure the Tokenizer.
>>>>>>>>>>>>>>> I'd think about possible ways to configure the engine first, and only then go further to discuss/implement complex features that may depend on the engine config.
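Andrey's two observations, that LIMIT must be applied on every node and again in the merge phase, and that a limit without sorting is not deterministic, can be shown with a toy example. In this hypothetical sketch (TwoPhaseLimit and its methods are invented for illustration; plain doubles stand in for relevance scores), the same two node pages arrive in two different orders. Taking the first N rows received depends on the arrival order; sorting the union before the final limit does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch of a two-phase limit. Each node sorts and limits its own
 * hits; the originating node then limits again. Doubles stand in for scores.
 */
public class TwoPhaseLimit {

    /** Node-side phase: sort by descending score and keep at most limit hits. */
    static List<Double> localLimit(List<Double> scores, int limit) {
        List<Double> sorted = new ArrayList<>(scores);
        sorted.sort(Comparator.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    /** Naive merge phase: keep the first limit hits in page arrival order. */
    static List<Double> firstArrived(List<List<Double>> pages, int limit) {
        List<Double> out = new ArrayList<>();
        for (List<Double> page : pages) {
            for (Double s : page) {
                if (out.size() == limit)
                    return out;
                out.add(s);
            }
        }
        return out;
    }

    /** Sorted merge phase: union all pages, sort, then apply the limit again. */
    static List<Double> sortedReduce(List<List<Double>> pages, int limit) {
        List<Double> all = new ArrayList<>();
        for (List<Double> page : pages)
            all.addAll(page);
        return localLimit(all, limit);
    }

    public static void main(String[] args) {
        int limit = 2;
        List<Double> nodeA = localLimit(Arrays.asList(0.2, 0.9, 0.5), limit); // [0.9, 0.5]
        List<Double> nodeB = localLimit(Arrays.asList(0.7, 0.1), limit);      // [0.7, 0.1]

        // The same two pages, received in two different orders.
        List<List<Double>> order1 = Arrays.asList(nodeA, nodeB);
        List<List<Double>> order2 = Arrays.asList(nodeB, nodeA);

        System.out.println(firstArrived(order1, limit) + " vs " + firstArrived(order2, limit));
        // [0.9, 0.5] vs [0.7, 0.1]
        System.out.println(sortedReduce(order1, limit) + " vs " + sortedReduce(order2, limit));
        // [0.9, 0.7] vs [0.9, 0.7]
    }
}
```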
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <[hidden email]> wrote:
>>>>>>>>>>>>>>>> Dear community,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> By starting this thread, I'd like to open a discussion that would lead to contributions in the subject area.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ignite has indexing capabilities backed by different mechanisms, including Lucene.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Currently, Lucene 7.5.0 is used (released last year).
>>>>>>>>>>>>>>>> This is a widespread and mature technology that covers the text search area and beyond (e.g. spatial data indexing).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My goal is to *expose more Lucene functionality to Ignite indexing and query mechanisms for text data*.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It's quite a simple request at the current stage. It comes from our project's needs, but I believe it will be useful for a lot more people.
>>>>>>>>>>>>>>>> Let's walk through the items and vote on or discuss Jira tickets for them.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. [trivial] Use dataQuery.getPageSize() to limit search response items inside GridLuceneIndex.query(). Currently it calls IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored matches are returned, which we do not need in most cases.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2. [simple] Add sorting. Then a more capable search call can be executed: *IndexSearcher.search(query, count, sort)*
>>>>>>>>>>>>>>>> Implementation steps:
>>>>>>>>>>>>>>>> a) Introduce a boolean *sortField* parameter in the *@QueryTextField* annotation. If *true*, the field will be indexed but not tokenized. Number types are preferred here.
>>>>>>>>>>>>>>>> b) Add a *sort* collection to the *TextQuery* constructor. It should define the desired sort fields used for querying.
>>>>>>>>>>>>>>>> c) Implement Lucene sort usage in GridLuceneIndex.query().
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 3. [moderate] Build complex queries with *TextQuery*, including terms/queries boosting.
>>>>>>>>>>>>>>>> *This section is for voting only, as it requires more detailed work. It should be extended if the community is interested in it.*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Looking forward to your comments!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>> Yuriy Shuliha
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>> Andrey V. Mashenkov
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Alexei Scherbakov
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best regards,
>>>>>>>>> Ivan Pavlukhin
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>> Ivan Pavlukhin
>>>>
>>>> --
>>>> Best regards,
>>>> Andrey V. Mashenkov
>>
>> --
>> Best regards,
>> Andrey V. Mashenkov
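Items 1 and 2 of the original proposal amount to asking for the best N hits, optionally under an explicit sort, instead of every scored match via Integer.MAX_VALUE. Lucene's IndexSearcher.search(query, n) and search(query, n, sort) already perform this selection internally; the sketch below only illustrates the general bounded-heap technique in plain Java, with hypothetical names (BoundedTopN, Doc, BY_SCORE, BY_FIELD_THEN_SCORE). Memory stays proportional to N rather than to the number of matches, and the ordering is a parameter, so the same routine covers relevance order and a sort field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch: select the best n matches in one pass with a bounded
 * heap, so memory is O(n) instead of O(number of matches).
 */
public class BoundedTopN {

    /** A matched document (hypothetical type). */
    public static final class Doc {
        final String name;
        final long sortValue; // e.g. a numeric field marked for sorting
        final float score;    // relevance score from the text engine

        public Doc(String name, long sortValue, float score) {
            this.name = name;
            this.sortValue = sortValue;
            this.score = score;
        }
    }

    /** Relevance order: higher score first. */
    static final Comparator<Doc> BY_SCORE = (a, b) -> Float.compare(b.score, a.score);

    /** Sort-field order: ascending sortValue, ties broken by relevance. */
    static final Comparator<Doc> BY_FIELD_THEN_SCORE =
        Comparator.<Doc>comparingLong(d -> d.sortValue).thenComparing(BY_SCORE);

    /** Returns at most n docs, best first according to order. */
    public static List<Doc> topN(Iterable<Doc> matches, int n, Comparator<Doc> order) {
        if (n <= 0)
            return new ArrayList<>();

        // Worst-first heap: its head is the weakest doc kept so far.
        PriorityQueue<Doc> heap = new PriorityQueue<>(n, order.reversed());
        for (Doc d : matches) {
            if (heap.size() < n)
                heap.add(d);
            else if (order.compare(d, heap.peek()) < 0) { // d ranks ahead of the weakest kept doc
                heap.poll();
                heap.add(d);
            }
        }

        List<Doc> out = new ArrayList<>(heap);
        out.sort(order);
        return out;
    }

    public static void main(String[] args) {
        List<Doc> matches = new ArrayList<>();
        matches.add(new Doc("energy", 3, 0.4f));
        matches.add(new Doc("enel", 1, 0.9f));
        matches.add(new Doc("enerco", 2, 0.7f));
        matches.add(new Doc("enex", 1, 0.2f));

        for (Doc d : topN(matches, 2, BY_SCORE))
            System.out.println(d.name); // enel, then enerco
        for (Doc d : topN(matches, 2, BY_FIELD_THEN_SCORE))
            System.out.println(d.name); // enel, then enex
    }
}
```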
Hi Dmitry, Yuriy.
I've found GridCacheQueryFutureAdapter has newly added AtomicInteger 'total' field and 'limit; field as primitive int. Both fields are used inside synchronized block only. So, we can make both private and downgrade AtomicInteger to primitive int. Most likely, these fields can be replaced with one field. On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <[hidden email]> wrote: > Hi Andrey, > > I've checked this ticket comments, and there is a TC Bot visa (with no > blockers). > > Do you have any concerns related to this patch? > > Sincerely, > Dmitriy Pavlov > > чт, 17 окт. 2019 г. в 16:43, Yuriy Shuliga <[hidden email]>: > >> Andrey, >> >> Per you request, I created ticket >> https://issues.apache.org/jira/browse/IGNITE-12291 linked to >> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189 >> >> Could you please proceed with PR merge ? >> >> BR, >> Yuriy Shuliha >> >> ср, 9 жовт. 2019 о 12:52 Andrey Mashenkov <[hidden email]> >> пише: >> >> > Hi Yuri, >> > >> > To get access to TC Bot you should register as TeamCity user [1], if you >> > didn't do this already. >> > Then you will be able to authorize on Ignite TC Bot page with same >> > credentials. >> > >> > [1] https://ci.ignite.apache.org/registerUser.html >> > >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <[hidden email]> wrote: >> > >> >> Andrew, >> >> >> >> I have corrected PR according to your notes. Please review. >> >> What will be the next steps in order to merge in? >> >> >> >> Y. >> >> >> >> чт, 3 жовт. 2019 о 17:47 Andrey Mashenkov <[hidden email]> >> >> пише: >> >> >> >> > Yuri, >> >> > >> >> > I've done with review. >> >> > No crime found, but trivial compatibility bug. >> >> > >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <[hidden email]> >> wrote: >> >> > >> >> > > Denis, >> >> > > >> >> > > Thank you for your attention to this. >> >> > > as for now, the https://issues.apache.org/jira/browse/IGNITE-12189 >> >> > ticket >> >> > > is still pending review. 
>> >> > > Do we have a chance to move it forward somehow? >> >> > > >> >> > > BR, >> >> > > Yuriy Shuliha >> >> > > >> >> > > пн, 30 вер. 2019 о 23:35 Denis Magda <[hidden email]> пише: >> >> > > >> >> > > > Yuriy, >> >> > > > >> >> > > > I've seen you opening a pull-request with the first changes: >> >> > > > https://issues.apache.org/jira/browse/IGNITE-12189 >> >> > > > >> >> > > > Alex Scherbakov and Ivan are you the right guys to do the review? >> >> > > > >> >> > > > - >> >> > > > Denis >> >> > > > >> >> > > > >> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван < >> [hidden email]> >> >> > > wrote: >> >> > > > >> >> > > > > Yuriy, >> >> > > > > >> >> > > > > Thank you for providing details! Quite interesting. >> >> > > > > >> >> > > > > Yes, we already have support of distributed limit and merging >> >> sorted >> >> > > > > subresults for SQL queries. E.g. ReduceIndexSorted and >> >> > > > > MergeStreamIterator are used for merging sorted streams. >> >> > > > > >> >> > > > > Could you please also clarify about score/relevance? Is it >> >> provided >> >> > by >> >> > > > > Lucene engine for each query result? I am thinking how to do >> >> sorted >> >> > > > > merge properly in this case. >> >> > > > > >> >> > > > > ср, 25 сент. 2019 г. в 18:56, Yuriy Shuliga <[hidden email] >> >: >> >> > > > > > >> >> > > > > > Ivan, >> >> > > > > > >> >> > > > > > Thank you for interesting question! >> >> > > > > > >> >> > > > > > Text searches (or full text searches) are mostly >> human-oriented. >> >> > And >> >> > > > the >> >> > > > > > point of user's interest is topmost part of response. >> >> > > > > > Then user can read it, evaluate and use the given records for >> >> > further >> >> > > > > > purposes. >> >> > > > > > >> >> > > > > > Particularly in our case, we use Ignite for operations with >> >> > financial >> >> > > > > data, >> >> > > > > > and there lots of text stuff like assets names, fin. >> >> instruments, >> >> > > > > companies >> >> > > > > > etc. 
>> >> > > > > > In order to operate with this quickly and reliably, users >> used >> >> to >> >> > > work >> >> > > > > with >> >> > > > > > text search, type-ahead completions, suggestions. >> >> > > > > > >> >> > > > > > For this purposes we are indexing particular string data in >> >> > separate >> >> > > > > caches. >> >> > > > > > >> >> > > > > > Sorting capabilities and response size limitations are very >> >> > important >> >> > > > > > there. As our API have to provide most relevant information >> in >> >> view >> >> > > of >> >> > > > > > limited size. >> >> > > > > > >> >> > > > > > Now let me comment some Ignite/Lucene perspective. >> >> > > > > > Actually Ignite queries and Lucene returns >> *TopDocs.scoresDocs >> >> > > *already >> >> > > > > > sorted by *score *(relevance). So most relevant documents >> are on >> >> > the >> >> > > > top. >> >> > > > > > And currently distributed queries responses from different >> nodes >> >> > are >> >> > > > > merged >> >> > > > > > into final query cursor queue in arbitrary way. >> >> > > > > > So in fact we already have the score order ruined here. Also >> >> Ignite >> >> > > > > > requests all possible documents from Lucene that is redundant >> >> and >> >> > not >> >> > > > > good >> >> > > > > > for performance. >> >> > > > > > >> >> > > > > > I'm implementing *limit* parameter to be part of *TextQuery >> *and >> >> > have >> >> > > > to >> >> > > > > > notice that we still have to add sorting for text queries >> >> > processing >> >> > > in >> >> > > > > > order to have applicable results. >> >> > > > > > >> >> > > > > > *Limit* parameter itself should improve the part of issues >> from >> >> > > above, >> >> > > > > but >> >> > > > > > definitely, sorting by document score at least should be >> >> > implemented >> >> > > > > along >> >> > > > > > with limit. 
>> >> > > > > > >> >> > > > > > This is a pretty short commentary if you still have any >> >> questions, >> >> > > > please >> >> > > > > > ask, do not hesitate) >> >> > > > > > >> >> > > > > > BR, >> >> > > > > > Yuriy Shuliha >> >> > > > > > >> >> > > > > > чт, 19 вер. 2019 о 11:38 Павлухин Иван <[hidden email]> >> >> пише: >> >> > > > > > >> >> > > > > > > Yuriy, >> >> > > > > > > >> >> > > > > > > Greatly appreciate your interest. >> >> > > > > > > >> >> > > > > > > Could you please elaborate a little bit about sorting? What >> >> tasks >> >> > > > does >> >> > > > > > > it help to solve and how? It would be great to provide an >> >> > example. >> >> > > > > > > >> >> > > > > > > ср, 18 сент. 2019 г. в 09:39, Alexei Scherbakov < >> >> > > > > > > [hidden email]>: >> >> > > > > > > > >> >> > > > > > > > Denis, >> >> > > > > > > > >> >> > > > > > > > I like the idea of throwing an exception for enabled text >> >> > queries >> >> > > > on >> >> > > > > > > > persistent caches. >> >> > > > > > > > >> >> > > > > > > > Also I'm fine with proposed limit for unsorted searches. >> >> > > > > > > > >> >> > > > > > > > Yury, please proceed with ticket creation. >> >> > > > > > > > >> >> > > > > > > > вт, 17 сент. 2019 г., 22:06 Denis Magda < >> [hidden email] >> >> >: >> >> > > > > > > > >> >> > > > > > > > > Igniters, >> >> > > > > > > > > >> >> > > > > > > > > I see nothing wrong with Yury's proposal in regards >> >> full-text >> >> > > > > search >> >> > > > > > > API >> >> > > > > > > > > evolution as long as Yury is ready to push it forward. >> >> > > > > > > > > >> >> > > > > > > > > As for the in-memory mode only, it makes total sense >> for >> >> > > > in-memory >> >> > > > > data >> >> > > > > > > > > grid deployments when Ignite caches data of an >> underlying >> >> DB >> >> > > like >> >> > > > > > > Postgres. 
>> >> > > > > > > > > As part of the changes, I would simply throw an >> exception >> >> (by >> >> > > > > default) >> >> > > > > > > if >> >> > > > > > > > > the one attempts to use text indices with the native >> >> > > persistence >> >> > > > > > > enabled. >> >> > > > > > > > > If the person is ready to live with that limitation >> that >> >> an >> >> > > > > explicit >> >> > > > > > > > > configuration change is needed to come around the >> >> exception. >> >> > > > > > > > > >> >> > > > > > > > > Thoughts? >> >> > > > > > > > > >> >> > > > > > > > > >> >> > > > > > > > > - >> >> > > > > > > > > Denis >> >> > > > > > > > > >> >> > > > > > > > > >> >> > > > > > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga < >> >> > > [hidden email] >> >> > > > > >> >> > > > > > > wrote: >> >> > > > > > > > > >> >> > > > > > > > > > Hello to all again, >> >> > > > > > > > > > >> >> > > > > > > > > > Thank you for important comments and notes given >> below! >> >> > > > > > > > > > >> >> > > > > > > > > > Let me answer and continue the discussion. >> >> > > > > > > > > > >> >> > > > > > > > > > (I) Overall needs in Lucene indexing >> >> > > > > > > > > > >> >> > > > > > > > > > Alexei has referenced to >> >> > > > > > > > > > https://issues.apache.org/jira/browse/IGNITE-5371 >> where >> >> > > > > > > > > > absence of index persistence was declared as an >> >> obstacle to >> >> > > > > further >> >> > > > > > > > > > development. >> >> > > > > > > > > > >> >> > > > > > > > > > a) This ticket is already closed as not valid.b) >> There >> >> are >> >> > > > > definite >> >> > > > > > > needs >> >> > > > > > > > > > (and in our project as well) in just in-memory >> indexing >> >> of >> >> > > > > selected >> >> > > > > > > data. >> >> > > > > > > > > > We intend to use search capabilities for fetching >> >> limited >> >> > > > amount >> >> > > > > of >> >> > > > > > > > > records >> >> > > > > > > > > > that should be used in type-ahead search / >> suggestions. 
>>>>>>>>>>>>> Not all of the data will be indexed, and there is no need for the Lucene index to be persistent. I hope this is a widespread pattern of text-search usage.
>>>>>>>>>>>>>
>>>>>>>>>>>>> (II) Necessary fixes in the current implementation
>>>>>>>>>>>>>
>>>>>>>>>>>>> a) Implementation of a correct *limit* (*offset* does not seem to be required for text-search tasks for now).
>>>>>>>>>>>>> I have investigated the data flow for distributed text queries. It was a simple test prefix query, like 'name'*='ene*'*.
>>>>>>>>>>>>> For now, each server node returns all response records to the client node, and this may be thousands or even hundreds of thousands of records, even if we need only the first 10-100. Moreover, all the results are added page by page to a queue in GridCacheQueryFutureAdapter in arbitrary order.
>>>>>>>>>>>>> I did not find any means here to deliver a deterministic result.
>>>>>>>>>>>>> So implementing the limit as part of the query (and of GridCacheQueryRequest) will not change the nature of the response, but it will limit the load on nodes and the network.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can we consider opening a ticket for this?
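A minimal sketch of the point above, in plain Java with no Ignite classes (`LimitPushdownSketch` and its methods are hypothetical): pushing the limit into the request bounds how many rows each node ships, but because the reducer consumes pages in arrival order, which rows survive still depends on which node answers first.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of a distributed text query; nothing here is Ignite API.
 * Each data node answers with its matches, and the originating node appends
 * the pages to one result list in whatever order they arrive.
 */
final class LimitPushdownSketch {

    /** Rows one node ships: all matches, or at most `limit` rows when the limit is pushed down (limit <= 0 means no limit). */
    static List<String> nodeResponse(List<String> nodeMatches, int limit) {
        int n = limit > 0 ? Math.min(limit, nodeMatches.size()) : nodeMatches.size();
        return new ArrayList<>(nodeMatches.subList(0, n));
    }

    /** Total number of rows sent to the originating node. */
    static int shipped(List<List<String>> responses) {
        int total = 0;
        for (List<String> r : responses) {
            total += r.size();
        }
        return total;
    }

    /** Reduce step without sorting: take rows in arrival order until the limit is hit. */
    static List<String> reduceInArrivalOrder(List<List<String>> arrivals, int limit) {
        List<String> out = new ArrayList<>();
        for (List<String> page : arrivals) {
            for (String row : page) {
                if (limit > 0 && out.size() == limit) {
                    return out;
                }
                out.add(row);
            }
        }
        return out;
    }
}
```

With two nodes holding five matches each and a limit of 2, `shipped` drops from 10 rows to 4, while `reduceInArrivalOrder` returns `[a1, a2]` or `[b1, b2]` depending on the arrival order, which is why the thread pairs the limit with sorting.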
>>>>>>>>>>>>>
>>>>>>>>>>>>> (III) Further exposure of the Lucene API in Ignite
>>>>>>>>>>>>>
>>>>>>>>>>>>> a) Sorting
>>>>>>>>>>>>> The solution for this could be:
>>>>>>>>>>>>> - Make entities comparable
>>>>>>>>>>>>> - Add a custom comparator to the entity
>>>>>>>>>>>>> - Add annotations to mark sorted fields for Lucene indexing
>>>>>>>>>>>>> - Use comparators when merging responses or reducing to the desired limit on the client node.
>>>>>>>>>>>>> This will require the full result set to be loaded into memory, though it can be used for relatively small limits.
>>>>>>>>>>>>>
>>>>>>>>>>>>> BR,
>>>>>>>>>>>>> Yuriy Shuliha
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, 30 Aug 2019 at 10:37, Alexei Scherbakov <[hidden email]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yuriy,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Note that one of the major blockers for text queries is [1], which makes Lucene indexes unusable with persistence and is the main reason for discontinuation. It should probably be addressed first to make text queries a valid product feature.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Distributed sorting and advanced querying is indeed not a trivial task. Some kind of merging must be implemented on the query-originating node.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-5371
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, 29 Aug 2019 at 23:38, Denis Magda <[hidden email]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yuriy,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If you are ready to take over the full-text search indexes, then please go ahead. The primary reasons why the community wants to discontinue them first (and probably resurrect them later) are the limitations listed by Andrey and minimal support from the community's end.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -
>>>>>>>>>>>>>>> Denis
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <[hidden email]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Yuriy,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Unfortunately, there is a plan to discontinue TextQueries in Ignite [1].
>>>>>>>>>>>>>>>> The motivation is that text indexes are not persistent, not transactional, and can't be used together with SQL or inside SQL, and there is a lack of interest from the community.
>>>>>>>>>>>>>>>> You are welcome to take on these issues and make TextQueries great.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. PageSize can't be used to limit the result set.
>>>>>>>>>>>>>>>> Query results are returned from a data node to the client-side cursor page by page, and this parameter is designed to control the page size. The query is supposed to execute lazily on the server side, and the full result set is not expected to be loaded into server memory at once, but page by page.
>>>>>>>>>>>>>>>> Do you mean you found that Lucene loads the entire result set into memory before the first page is sent to the client?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think a new parameter should be added to limit the result. The best solution is to use query language commands for this, e.g. "LIMIT/OFFSET" in SQL.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This task doesn't look trivial. A query is a distributed operation: the same user query is executed on the data nodes, and then the results from all nodes must be correctly merged before being returned via the client cursor.
>>>>>>>>>>>>>>>> So, LIMIT should be applied on every node and then again in the merge phase.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, and this may be non-obvious, limiting results makes no sense without sorting, as there is no guarantee that every subsequent query run will return the same data because of page reordering.
>>>>>>>>>>>>>>>> Basically, the merge phase receives results from data nodes asynchronously, and messages from different nodes can't be ordered.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2.
>>>>>>>>>>>>>>>> a. The "tokenize" param name (for @QueryTextField) looks more self-explanatory, doesn't it?
>>>>>>>>>>>>>>>> b, c. What about distributed queries? How will partial results from the nodes be merged?
>>>>>>>>>>>>>>>> Does Lucene allow configuring a comparator for data sorting?
>>>>>>>>>>>>>>>> Which comparator should Ignite choose to sort results in the merge phase?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 3. For now, the Lucene engine is not configurable at all; e.g., it is impossible to configure the Tokenizer.
>>>>>>>>>>>>>>>> I'd think about possible ways to configure the engine first, and only then go further to discuss/implement complex features that may depend on the engine config.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <[hidden email]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Dear community,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> By starting this thread, I'd like to open a discussion that would lead to contributions in the subject area.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ignite has indexing capabilities backed by different mechanisms, including Lucene.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Currently, Lucene 7.5.0 is used (released last year).
>>>>>>>>>>>>>>>>> This is a widespread and mature technology that covers the text search area and beyond (e.g. spatial data indexing).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My goal is to *expose more Lucene functionality to Ignite indexing and query mechanisms for text data*.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It's quite a simple request at the current stage.
>>>>>>>>>>>>>>>>> It is coming from our project's needs, but I believe it will be useful for many more people.
>>>>>>>>>>>>>>>>> Let's walk through the items and vote on or discuss Jira tickets for them.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. [trivial] Use dataQuery.getPageSize() to limit the search response items inside GridLuceneIndex.query(). Currently it calls IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored matches are returned, which we do not need in most cases.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2. [simple] Add sorting. Then a more capable search call can be executed: *IndexSearcher.search(query, count, sort)*
>>>>>>>>>>>>>>>>> Implementation steps:
>>>>>>>>>>>>>>>>> a) Introduce a boolean *sortField* parameter in the *@QueryTextField* annotation. If *true*, the field will be indexed but not tokenized. Number types are preferred here.
>>>>>>>>>>>>>>>>> b) Add a *sort* collection to the *TextQuery* constructor. It should define the desired sort fields used for querying.
>>>>>>>>>>>>>>>>> c) Implement Lucene sort usage in GridLuceneIndex.query().
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 3. [moderate] Build complex queries with *TextQuery*, including term/query boosting.
>>>>>>>>>>>>>>>>> *This section is for voting only, as it requires more detailed work. It should be extended if the community is interested in it.*
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Looking forward to your comments!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>>> Yuriy Shuliha
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>> Andrey V. Mashenkov
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>> Alexei Scherbakov
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best regards,
>>>>>>>>>> Ivan Pavlukhin
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best regards,
>>>>>>>> Ivan Pavlukhin
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Andrey V. Mashenkov
>>>
>>> --
>>> Best regards,
>>> Andrey V. Mashenkov

--
Best regards,
Andrey V. Mashenkov
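The merge questions raised in this thread (Andrey's points 2b/2c, Alexei's note that merging must happen on the query-originating node, and item 1 of the proposal) can be sketched as two steps: each node keeps only its top `n` hits by score, and the originating node performs a k-way merge of the per-node sorted lists until the global limit is reached. This is plain Java, not Lucene or Ignite code; `ScoredMergeSketch`, `Hit` and the key-based tie-breaker are assumptions. Lucene's `IndexSearcher.search(query, n)` already returns `TopDocs.scoreDocs` ordered by decreasing score, so the node-side step here only stands in for that, and the merge step assumes each hit carries its score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class ScoredMergeSketch {

    /** Hypothetical hit: a key plus the relevance score computed on the data node. */
    static final class Hit {
        final String key;
        final float score;

        Hit(String key, float score) {
            this.key = key;
            this.score = score;
        }

        @Override
        public String toString() {
            return key + ":" + score;
        }
    }

    /** Best first: higher score wins; ties broken by key so the order is deterministic. */
    static final Comparator<Hit> BEST_FIRST =
        Comparator.comparingDouble((Hit h) -> h.score).reversed().thenComparing(h -> h.key);

    /** Node side: keep only the top n hits with a bounded heap instead of returning every match. */
    static List<Hit> topN(List<Hit> matches, int n) {
        // Heap ordered worst-first, so its root is always the hit to evict.
        PriorityQueue<Hit> heap = new PriorityQueue<>(BEST_FIRST.reversed());
        for (Hit h : matches) {
            heap.add(h);
            if (heap.size() > n) {
                heap.poll();
            }
        }
        List<Hit> out = new ArrayList<>(heap);
        out.sort(BEST_FIRST);
        return out;
    }

    /** Read position inside one node's already-sorted response. */
    private static final class Cursor {
        final List<Hit> hits;
        int pos;

        Cursor(List<Hit> hits) {
            this.hits = hits;
        }

        Hit head() {
            return hits.get(pos);
        }
    }

    /** Originating node: k-way merge of per-node sorted lists, stopping at the global limit. */
    static List<Hit> merge(List<List<Hit>> nodeResults, int limit) {
        PriorityQueue<Cursor> pq = new PriorityQueue<>((x, y) -> BEST_FIRST.compare(x.head(), y.head()));
        for (List<Hit> r : nodeResults) {
            if (!r.isEmpty()) {
                pq.add(new Cursor(r));
            }
        }
        List<Hit> out = new ArrayList<>();
        while (!pq.isEmpty() && out.size() < limit) {
            Cursor c = pq.poll();
            out.add(c.head());
            c.pos++;
            if (c.pos < c.hits.size()) {
                pq.add(c); // re-insert with its next head
            }
        }
        return out;
    }
}
```

For node results `[a:0.9, b:0.2, c:0.5]` and `[d:0.7, e:0.1]` with `n = 2` and a global limit of 3, the merge yields `a, d, c` regardless of the order in which the node lists arrive.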
Dear Igniters,
The first part of TextQuery improvement - a result limit - was developed and merged. Now we have to develop most important functionality here - proper sorting of Lucene index response and correct reducing of them for distributed queries. *There are two Lucene based aspects* 1. In case of using no sorting fields, the documents in response are still ordered by relevance. Actually this is ScoreDoc.score value. In order to reduce the distributed results correctly, the score should be passed with response. 2. When sorting by conventional fields, then Lucene should have these fields properly indexed and corresponding Sort object should be applied to Lucene's search call. In order to mark those fields a new annotation like '@SortField' may be introduced. *Reducing on Ignite * The obvious point of distributed response reduction is class GridCacheDistributedQueryFuture. Though, @Ivan Pavlukhin mentioned class with similar functionality: ReduceIndexSorted What I see here, that it is tangled with H2 related classes ( org.h2.result.Row) and might not be unified with TextQuery reduction. Still need a support here. Overall, the goal of this letter is to initiate discussion on TextQuery Sorting implementation and come closer to ticket creation. BR, Yuriy Shuliha вт, 22 жовт. 2019 о 13:31 Andrey Mashenkov <[hidden email]> пише: > Hi Dmitry, Yuriy. > > I've found GridCacheQueryFutureAdapter has newly added AtomicInteger > 'total' field and 'limit; field as primitive int. > > Both fields are used inside synchronized block only. > So, we can make both private and downgrade AtomicInteger to primitive int. > > Most likely, these fields can be replaced with one field. > > > > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <[hidden email]> > wrote: > > > Hi Andrey, > > > > I've checked this ticket comments, and there is a TC Bot visa (with no > > blockers). > > > > Do you have any concerns related to this patch? > > > > Sincerely, > > Dmitriy Pavlov > > > > чт, 17 окт. 2019 г. 
в 16:43, Yuriy Shuliga <[hidden email]>: > > > >> Andrey, > >> > >> Per you request, I created ticket > >> https://issues.apache.org/jira/browse/IGNITE-12291 linked to > >> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189 > >> > >> Could you please proceed with PR merge ? > >> > >> BR, > >> Yuriy Shuliha > >> > >> ср, 9 жовт. 2019 о 12:52 Andrey Mashenkov <[hidden email]> > >> пише: > >> > >> > Hi Yuri, > >> > > >> > To get access to TC Bot you should register as TeamCity user [1], if > you > >> > didn't do this already. > >> > Then you will be able to authorize on Ignite TC Bot page with same > >> > credentials. > >> > > >> > [1] https://ci.ignite.apache.org/registerUser.html > >> > > >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <[hidden email]> > wrote: > >> > > >> >> Andrew, > >> >> > >> >> I have corrected PR according to your notes. Please review. > >> >> What will be the next steps in order to merge in? > >> >> > >> >> Y. > >> >> > >> >> чт, 3 жовт. 2019 о 17:47 Andrey Mashenkov < > [hidden email]> > >> >> пише: > >> >> > >> >> > Yuri, > >> >> > > >> >> > I've done with review. > >> >> > No crime found, but trivial compatibility bug. > >> >> > > >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <[hidden email]> > >> wrote: > >> >> > > >> >> > > Denis, > >> >> > > > >> >> > > Thank you for your attention to this. > >> >> > > as for now, the > https://issues.apache.org/jira/browse/IGNITE-12189 > >> >> > ticket > >> >> > > is still pending review. > >> >> > > Do we have a chance to move it forward somehow? > >> >> > > > >> >> > > BR, > >> >> > > Yuriy Shuliha > >> >> > > > >> >> > > пн, 30 вер. 2019 о 23:35 Denis Magda <[hidden email]> пише: > >> >> > > > >> >> > > > Yuriy, > >> >> > > > > >> >> > > > I've seen you opening a pull-request with the first changes: > >> >> > > > https://issues.apache.org/jira/browse/IGNITE-12189 > >> >> > > > > >> >> > > > Alex Scherbakov and Ivan are you the right guys to do the > review? 
> >> >> > > > > >> >> > > > - > >> >> > > > Denis > >> >> > > > > >> >> > > > > >> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван < > >> [hidden email]> > >> >> > > wrote: > >> >> > > > > >> >> > > > > Yuriy, > >> >> > > > > > >> >> > > > > Thank you for providing details! Quite interesting. > >> >> > > > > > >> >> > > > > Yes, we already have support of distributed limit and merging > >> >> sorted > >> >> > > > > subresults for SQL queries. E.g. ReduceIndexSorted and > >> >> > > > > MergeStreamIterator are used for merging sorted streams. > >> >> > > > > > >> >> > > > > Could you please also clarify about score/relevance? Is it > >> >> provided > >> >> > by > >> >> > > > > Lucene engine for each query result? I am thinking how to do > >> >> sorted > >> >> > > > > merge properly in this case. > >> >> > > > > > >> >> > > > > ср, 25 сент. 2019 г. в 18:56, Yuriy Shuliga < > [hidden email] > >> >: > >> >> > > > > > > >> >> > > > > > Ivan, > >> >> > > > > > > >> >> > > > > > Thank you for interesting question! > >> >> > > > > > > >> >> > > > > > Text searches (or full text searches) are mostly > >> human-oriented. > >> >> > And > >> >> > > > the > >> >> > > > > > point of user's interest is topmost part of response. > >> >> > > > > > Then user can read it, evaluate and use the given records > for > >> >> > further > >> >> > > > > > purposes. > >> >> > > > > > > >> >> > > > > > Particularly in our case, we use Ignite for operations with > >> >> > financial > >> >> > > > > data, > >> >> > > > > > and there lots of text stuff like assets names, fin. > >> >> instruments, > >> >> > > > > companies > >> >> > > > > > etc. > >> >> > > > > > In order to operate with this quickly and reliably, users > >> used > >> >> to > >> >> > > work > >> >> > > > > with > >> >> > > > > > text search, type-ahead completions, suggestions. > >> >> > > > > > > >> >> > > > > > For this purposes we are indexing particular string data in > >> >> > separate > >> >> > > > > caches. 
> >> >> > > > > > > >> >> > > > > > Sorting capabilities and response size limitations are very > >> >> > important > >> >> > > > > > there. As our API have to provide most relevant information > >> in > >> >> view > >> >> > > of > >> >> > > > > > limited size. > >> >> > > > > > > >> >> > > > > > Now let me comment some Ignite/Lucene perspective. > >> >> > > > > > Actually Ignite queries and Lucene returns > >> *TopDocs.scoresDocs > >> >> > > *already > >> >> > > > > > sorted by *score *(relevance). So most relevant documents > >> are on > >> >> > the > >> >> > > > top. > >> >> > > > > > And currently distributed queries responses from different > >> nodes > >> >> > are > >> >> > > > > merged > >> >> > > > > > into final query cursor queue in arbitrary way. > >> >> > > > > > So in fact we already have the score order ruined here. > Also > >> >> Ignite > >> >> > > > > > requests all possible documents from Lucene that is > redundant > >> >> and > >> >> > not > >> >> > > > > good > >> >> > > > > > for performance. > >> >> > > > > > > >> >> > > > > > I'm implementing *limit* parameter to be part of *TextQuery > >> *and > >> >> > have > >> >> > > > to > >> >> > > > > > notice that we still have to add sorting for text queries > >> >> > processing > >> >> > > in > >> >> > > > > > order to have applicable results. > >> >> > > > > > > >> >> > > > > > *Limit* parameter itself should improve the part of issues > >> from > >> >> > > above, > >> >> > > > > but > >> >> > > > > > definitely, sorting by document score at least should be > >> >> > implemented > >> >> > > > > along > >> >> > > > > > with limit. > >> >> > > > > > > >> >> > > > > > This is a pretty short commentary if you still have any > >> >> questions, > >> >> > > > please > >> >> > > > > > ask, do not hesitate) > >> >> > > > > > > >> >> > > > > > BR, > >> >> > > > > > Yuriy Shuliha > >> >> > > > > > > >> >> > > > > > чт, 19 вер. 
2019 о 11:38 Павлухин Иван < > [hidden email]> > >> >> пише: > >> >> > > > > > > >> >> > > > > > > Yuriy, > >> >> > > > > > > > >> >> > > > > > > Greatly appreciate your interest. > >> >> > > > > > > > >> >> > > > > > > Could you please elaborate a little bit about sorting? > What > >> >> tasks > >> >> > > > does > >> >> > > > > > > it help to solve and how? It would be great to provide an > >> >> > example. > >> >> > > > > > > > >> >> > > > > > > ср, 18 сент. 2019 г. в 09:39, Alexei Scherbakov < > >> >> > > > > > > [hidden email]>: > >> >> > > > > > > > > >> >> > > > > > > > Denis, > >> >> > > > > > > > > >> >> > > > > > > > I like the idea of throwing an exception for enabled > text > >> >> > queries > >> >> > > > on > >> >> > > > > > > > persistent caches. > >> >> > > > > > > > > >> >> > > > > > > > Also I'm fine with proposed limit for unsorted > searches. > >> >> > > > > > > > > >> >> > > > > > > > Yury, please proceed with ticket creation. > >> >> > > > > > > > > >> >> > > > > > > > вт, 17 сент. 2019 г., 22:06 Denis Magda < > >> [hidden email] > >> >> >: > >> >> > > > > > > > > >> >> > > > > > > > > Igniters, > >> >> > > > > > > > > > >> >> > > > > > > > > I see nothing wrong with Yury's proposal in regards > >> >> full-text > >> >> > > > > search > >> >> > > > > > > API > >> >> > > > > > > > > evolution as long as Yury is ready to push it > forward. > >> >> > > > > > > > > > >> >> > > > > > > > > As for the in-memory mode only, it makes total sense > >> for > >> >> > > > in-memory > >> >> > > > > data > >> >> > > > > > > > > grid deployments when Ignite caches data of an > >> underlying > >> >> DB > >> >> > > like > >> >> > > > > > > Postgres. > >> >> > > > > > > > > As part of the changes, I would simply throw an > >> exception > >> >> (by > >> >> > > > > default) > >> >> > > > > > > if > >> >> > > > > > > > > the one attempts to use text indices with the native > >> >> > > persistence > >> >> > > > > > > enabled. 
> >> >> > > > > > > > > If the person is ready to live with that limitation > >> that > >> >> an > >> >> > > > > explicit > >> >> > > > > > > > > configuration change is needed to come around the > >> >> exception. > >> >> > > > > > > > > > >> >> > > > > > > > > Thoughts? > >> >> > > > > > > > > > >> >> > > > > > > > > > >> >> > > > > > > > > - > >> >> > > > > > > > > Denis > >> >> > > > > > > > > > >> >> > > > > > > > > > >> >> > > > > > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga < > >> >> > > [hidden email] > >> >> > > > > > >> >> > > > > > > wrote: > >> >> > > > > > > > > > >> >> > > > > > > > > > Hello to all again, > >> >> > > > > > > > > > > >> >> > > > > > > > > > Thank you for important comments and notes given > >> below! > >> >> > > > > > > > > > > >> >> > > > > > > > > > Let me answer and continue the discussion. > >> >> > > > > > > > > > > >> >> > > > > > > > > > (I) Overall needs in Lucene indexing > >> >> > > > > > > > > > > >> >> > > > > > > > > > Alexei has referenced to > >> >> > > > > > > > > > https://issues.apache.org/jira/browse/IGNITE-5371 > >> where > >> >> > > > > > > > > > absence of index persistence was declared as an > >> >> obstacle to > >> >> > > > > further > >> >> > > > > > > > > > development. > >> >> > > > > > > > > > > >> >> > > > > > > > > > a) This ticket is already closed as not valid.b) > >> There > >> >> are > >> >> > > > > definite > >> >> > > > > > > needs > >> >> > > > > > > > > > (and in our project as well) in just in-memory > >> indexing > >> >> of > >> >> > > > > selected > >> >> > > > > > > data. > >> >> > > > > > > > > > We intend to use search capabilities for fetching > >> >> limited > >> >> > > > amount > >> >> > > > > of > >> >> > > > > > > > > records > >> >> > > > > > > > > > that should be used in type-ahead search / > >> suggestions. 
> >> >> > > > > > > > > > Not all of the data will be indexed and the are no > >> need > >> >> in > >> >> > > > Lucene > >> >> > > > > > > index > >> >> > > > > > > > > to > >> >> > > > > > > > > > be persistence. Hope this is a wide pattern of > >> >> text-search > >> >> > > > usage. > >> >> > > > > > > > > > > >> >> > > > > > > > > > (II) Necessary fixes in current implementation. > >> >> > > > > > > > > > > >> >> > > > > > > > > > a) Implementation of correct *limit *(*offset* > seems > >> to > >> >> be > >> >> > > not > >> >> > > > > > > required > >> >> > > > > > > > > in > >> >> > > > > > > > > > text-search tasks for now) > >> >> > > > > > > > > > I have investigated the data flow for distributed > >> text > >> >> > > queries. > >> >> > > > > it > >> >> > > > > > > was > >> >> > > > > > > > > > simple test prefix query, like 'name'*='ene*'* > >> >> > > > > > > > > > For now each server-node returns all response > >> records to > >> >> > the > >> >> > > > > > > client-node > >> >> > > > > > > > > > and it may contain ~thousands, ~hundred thousands > >> >> records. > >> >> > > > > > > > > > Event if we need only first 10-100. Again, all the > >> >> results > >> >> > > are > >> >> > > > > added > >> >> > > > > > > to > >> >> > > > > > > > > > queue in GridCacheQueryFutureAdapter in arbitrary > >> order > >> >> by > >> >> > > > pages. > >> >> > > > > > > > > > I did not find here any means to deliver > >> deterministic > >> >> > > result. > >> >> > > > > > > > > > So implementing limit as part of query and > >> >> > > > > (GridCacheQueryRequest) > >> >> > > > > > > will > >> >> > > > > > > > > not > >> >> > > > > > > > > > change the nature of response but will limit load > on > >> >> nodes > >> >> > > and > >> >> > > > > > > > > networking. > >> >> > > > > > > > > > > >> >> > > > > > > > > > Can we consider to open a ticket for this? 
> >> >> > > > > > > > > > > >> >> > > > > > > > > > (III) Further extension of Lucene API exposition to > >> >> Ignite > >> >> > > > > > > > > > > >> >> > > > > > > > > > a) Sorting > >> >> > > > > > > > > > The solution for this could be: > >> >> > > > > > > > > > - Make entities comparable > >> >> > > > > > > > > > - Add custom comparator to entity > >> >> > > > > > > > > > - Add annotations to mark sorted fields for Lucene > >> >> indexing > >> >> > > > > > > > > > - Use comparators when merging responses or > reducing > >> to > >> >> > > desired > >> >> > > > > > > limit on > >> >> > > > > > > > > > client node. > >> >> > > > > > > > > > Will require full result set to be loaded into > >> memory. > >> >> > Though > >> >> > > > > can be > >> >> > > > > > > used > >> >> > > > > > > > > > for relatively small limits. > >> >> > > > > > > > > > BR, > >> >> > > > > > > > > > Yuriy Shuliha > >> >> > > > > > > > > > > >> >> > > > > > > > > > пт, 30 серп. 2019 о 10:37 Alexei Scherbakov < > >> >> > > > > > > > > [hidden email]> > >> >> > > > > > > > > > пише: > >> >> > > > > > > > > > > >> >> > > > > > > > > > > Yuriy, > >> >> > > > > > > > > > > > >> >> > > > > > > > > > > Note what one of major blockers for text queries > is > >> >> [1] > >> >> > > which > >> >> > > > > makes > >> >> > > > > > > > > > lucene > >> >> > > > > > > > > > > indexes unusable with persistence and main reason > >> for > >> >> > > > > > > discontinuation. > >> >> > > > > > > > > > > Probably it's should be addressed first to make > >> text > >> >> > > queries > >> >> > > > a > >> >> > > > > > > valid > >> >> > > > > > > > > > > product feature. > >> >> > > > > > > > > > > > >> >> > > > > > > > > > > Distributed sorting and advanved querying is > indeed > >> >> not a > >> >> > > > > trivial > >> >> > > > > > > task. > >> >> > > > > > > > > > > Some kind of merging must be implemented on query > >> >> > > originating > >> >> > > > > node. 
> >> >> > > > > > > > > > > > >> >> > > > > > > > > > > [1] > >> https://issues.apache.org/jira/browse/IGNITE-5371 > >> >> > > > > > > > > > > > >> >> > > > > > > > > > > чт, 29 авг. 2019 г. в 23:38, Denis Magda < > >> >> > > [hidden email] > >> >> > > > >: > >> >> > > > > > > > > > > > >> >> > > > > > > > > > > > Yuriy, > >> >> > > > > > > > > > > > > >> >> > > > > > > > > > > > If you are ready to take over the full-text > >> search > >> >> > > indexes > >> >> > > > > then > >> >> > > > > > > > > please > >> >> > > > > > > > > > go > >> >> > > > > > > > > > > > ahead. The primary reason why the community > >> wants to > >> >> > > > > discontinue > >> >> > > > > > > them > >> >> > > > > > > > > > > first > >> >> > > > > > > > > > > > (and, probable, resurrect later) are the > >> limitations > >> >> > > listed > >> >> > > > > by > >> >> > > > > > > Andrey > >> >> > > > > > > > > > and > >> >> > > > > > > > > > > > minimal support from the community end. > >> >> > > > > > > > > > > > > >> >> > > > > > > > > > > > - > >> >> > > > > > > > > > > > Denis > >> >> > > > > > > > > > > > > >> >> > > > > > > > > > > > > >> >> > > > > > > > > > > > On Thu, Aug 29, 2019 at 1:29 PM Andrey > Mashenkov > >> < > >> >> > > > > > > > > > > > [hidden email]> > >> >> > > > > > > > > > > > wrote: > >> >> > > > > > > > > > > > > >> >> > > > > > > > > > > > > Hi Yuriy, > >> >> > > > > > > > > > > > > > >> >> > > > > > > > > > > > > Unfortunatelly, there is a plan to > discontinue > >> >> > > > TextQueries > >> >> > > > > in > >> >> > > > > > > > > Ignite > >> >> > > > > > > > > > > [1]. > >> >> > > > > > > > > > > > > Motivation here is text indexes are not > >> >> persistent, > >> >> > not > >> >> > > > > > > > > transactional > >> >> > > > > > > > > > > and > >> >> > > > > > > > > > > > > can't be user together with SQL or inside > SQL. > >> >> > > > > > > > > > > > > and there is a lack of interest from > community > >> >> side. 
You are welcome to take on these issues and make TextQueries great.

1. PageSize can't be used to limit the result set. Query results are returned from a data node to the client-side cursor page by page, and this parameter is designed to control the page size. The query is supposed to execute lazily on the server side. The full result set is not expected to be loaded into server memory at once; it is loaded page by page. Do you mean you found that Lucene loads the entire result set into memory before the first page is sent to the client?

I think a new parameter should be added to limit the result. The best solution is to use query language commands for this, e.g. "LIMIT/OFFSET" in SQL.

This task doesn't look trivial.
A query is a distributed operation. The same user query is executed on the data nodes, and the results from all nodes must then be correctly merged before being returned via the client cursor. So LIMIT should be applied on every node and then again in the merge phase.

Also, and this may be non-obvious, limiting results makes no sense without sorting. There is no guarantee that each subsequent query run will return the same data, because pages can be reordered. Basically, the merge phase receives results from data nodes asynchronously, and messages from different nodes can't be ordered.

2.
a. A "tokenize" param name (for @QueryTextField) would be more descriptive, wouldn't it?
b, c. What about distributed queries? How will partial results from the nodes be merged? Does Lucene allow configuring a comparator for data sorting?
Which comparator should Ignite choose to sort results in the merge phase?

3. For now, the Lucene engine is not configurable at all; e.g., it is impossible to configure the Tokenizer. I would first think about possible ways to configure the engine, and only then go further to discuss/implement complex features that may depend on the engine config.

On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <[hidden email]> wrote:

Dear community,

By starting this thread I'd like to open a discussion that would lead to contributions in the subject area.

Ignite has indexing capabilities backed by different mechanisms, including Lucene.

Currently, Lucene 7.5.0 is used (released last year).
It is a widespread and mature technology that covers the text search area and beyond (e.g. spatial data indexing).

My goal is to *expose more Lucene functionality to Ignite indexing and query mechanisms for text data*.

It's quite a simple request at the current stage. It comes from our project's needs, but I believe it will be useful for many more people. Let's walk through these items and vote on, or discuss, Jira tickets for them.

1. [trivial] Use dataQuery.getPageSize() to limit the search response items inside GridLuceneIndex.query(). Currently it calls IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored matches are returned, which we do not need in most cases.

2. [simple] Add sorting.
Then a more capable search call can be executed: *IndexSearcher.search(query, count, sort)*
Implementation steps:
a) Introduce a boolean *sortField* parameter in the *@QueryTextField* annotation. If *true*, the field will be indexed but not tokenized. Number types are preferred here.
b) Add a *sort* collection to the *TextQuery* constructor. It should define the desired sort fields used for querying.
c) Implement Lucene sort usage in GridLuceneIndex.query().

3. [moderate] Build complex queries with *TextQuery*, including term/query boosting.
*This section is for voting only, as it requires more detailed work. It should be extended if the community is interested in it.*

Looking forward to your comments!

BR,
Yuriy Shuliha

--
Best regards,
Andrey V. Mashenkov

--
Best regards,
Alexei Scherbakov

--
Best regards,
Ivan Pavlukhin

--
Best regards,
Ivan Pavlukhin

--
Best regards,
Andrey V. Mashenkov

--
Best regards,
Andrey V. Mashenkov

--
Best regards,
Andrey V. Mashenkov
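A k-way merge can illustrate two points from this thread: Andrey's point that LIMIT must be applied on every node and then again in the merge phase, and Alexei's note that merging has to happen on the query-originating node. This is a hedged sketch, not Ignite code. It assumes each node returns its hits already sorted by descending Lucene score and ships the score with each hit. `ScoredHit` and `ScoreMerge.mergeTopN` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical shape of one hit returned by a data node: cache key plus Lucene score. */
final class ScoredHit {
    final String key;
    final float score;

    ScoredHit(String key, float score) {
        this.key = key;
        this.score = score;
    }
}

/** Hypothetical reducer, not an Ignite class: merges per-node score-sorted streams up to a global limit. */
final class ScoreMerge {
    /** Most relevant first, the same order in which Lucene returns TopDocs.scoreDocs. */
    static final Comparator<ScoredHit> BY_SCORE_DESC =
        Comparator.comparingDouble((ScoredHit h) -> h.score).reversed();

    private ScoreMerge() {
    }

    /** The current head hit of one node's stream, plus the rest of that stream. */
    private static final class Head {
        final ScoredHit hit;
        final Iterator<ScoredHit> rest;

        Head(ScoredHit hit, Iterator<ScoredHit> rest) {
            this.hit = hit;
            this.rest = rest;
        }
    }

    /** Each inner list must already be sorted by BY_SCORE_DESC (and may already be limited per node). */
    static List<ScoredHit> mergeTopN(List<List<ScoredHit>> perNode, int limit) {
        PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> BY_SCORE_DESC.compare(a.hit, b.hit));
        for (List<ScoredHit> nodeHits : perNode) {
            Iterator<ScoredHit> it = nodeHits.iterator();
            if (it.hasNext())
                heap.add(new Head(it.next(), it));
        }
        List<ScoredHit> out = new ArrayList<>();
        while (!heap.isEmpty() && out.size() < limit) {
            Head top = heap.poll();
            out.add(top.hit);
            // Refill from the node that just supplied a hit, so the heap holds one head per node.
            if (top.rest.hasNext())
                heap.add(new Head(top.rest.next(), top.rest));
        }
        return out;
    }
}
```

Because each node's stream is already sorted, the merge only compares one pending hit per node. It can stop as soon as `limit` hits are emitted, which is why a per-node limit of the same size is safe.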
Hello!
I have a hunch that we are trying to build Apache Solr (or SolrCloud) into Apache Ignite. I think that's a lot of effort that is not well justified. I don't think we should try to implement sorting in Apache Ignite, because it is a lot of work and a lot of code in our code base that we don't really want.

Regards,
--
Ilya Kasnacheev

On Fri, 22 Nov 2019 at 20:59, Yuriy Shuliga <[hidden email]> wrote:

Dear Igniters,

The first part of the TextQuery improvement, a result limit, was developed and merged.
Now we have to develop the most important functionality here: proper sorting of the Lucene index response and correct reduction of the results for distributed queries.

*There are two Lucene-based aspects*

1. When no sorting fields are used, the documents in the response are still ordered by relevance, which is actually the ScoreDoc.score value. To reduce the distributed results correctly, the score should be passed with the response.

2. When sorting by conventional fields, Lucene should have these fields properly indexed, and a corresponding Sort object should be applied to Lucene's search call. A new annotation like '@SortField' may be introduced to mark those fields.

*Reducing on Ignite*

The obvious point of distributed response reduction is the GridCacheDistributedQueryFuture class. However, @Ivan Pavlukhin mentioned a class with similar functionality: ReduceIndexSorted. What I see there is that it is tangled with H2-related classes (org.h2.result.Row) and might not be unifiable with TextQuery reduction.

Support is still needed here.

Overall, the goal of this letter is to start a discussion on the TextQuery sorting implementation and get closer to creating a ticket.

BR,
Yuriy Shuliha

On Tue, 22 Oct 2019 at 13:31, Andrey Mashenkov <[hidden email]> wrote:

Hi Dmitry, Yuriy.

I've found that GridCacheQueryFutureAdapter has a newly added AtomicInteger 'total' field and a 'limit' field as a primitive int.

Both fields are used inside a synchronized block only. So we can make both private and downgrade the AtomicInteger to a primitive int.

Most likely, these fields can be replaced with a single field.
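Andrey's suggestion might look roughly like this. He proposes replacing the AtomicInteger 'total' and the int 'limit' with one plain field guarded by the lock that already protects them. The class below is a hypothetical stand-in written for illustration, not the actual GridCacheQueryFutureAdapter code. It also assumes that a non-positive limit means "no limit". Because the counter is only read and written under `synchronized`, a plain `int` is enough: the monitor provides both mutual exclusion and visibility.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical stand-in for the limit bookkeeping discussed above; not Ignite's real adapter. */
final class LimitedResultBuffer<T> {
    private final Object mux = new Object();
    private final List<T> buf = new ArrayList<>();

    /** Rows still allowed; one private int replaces the separate 'total' and 'limit' fields. */
    private int remaining;

    LimitedResultBuffer(int limit) {
        // Assumption: a non-positive limit means "no limit".
        this.remaining = limit > 0 ? limit : Integer.MAX_VALUE;
    }

    /** Adds a page of results; returns true once the limit has been reached. */
    boolean onPage(Collection<? extends T> page) {
        synchronized (mux) {
            for (T item : page) {
                if (remaining == 0)
                    break; // drop the rest of the page
                buf.add(item);
                remaining--;
            }
            return remaining == 0;
        }
    }

    List<T> snapshot() {
        synchronized (mux) {
            return new ArrayList<>(buf);
        }
    }
}
```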
> > >> >> > > > >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <[hidden email]> > > >> wrote: > > >> >> > > > >> >> > > Denis, > > >> >> > > > > >> >> > > Thank you for your attention to this. > > >> >> > > as for now, the > > https://issues.apache.org/jira/browse/IGNITE-12189 > > >> >> > ticket > > >> >> > > is still pending review. > > >> >> > > Do we have a chance to move it forward somehow? > > >> >> > > > > >> >> > > BR, > > >> >> > > Yuriy Shuliha > > >> >> > > > > >> >> > > пн, 30 вер. 2019 о 23:35 Denis Magda <[hidden email]> пише: > > >> >> > > > > >> >> > > > Yuriy, > > >> >> > > > > > >> >> > > > I've seen you opening a pull-request with the first changes: > > >> >> > > > https://issues.apache.org/jira/browse/IGNITE-12189 > > >> >> > > > > > >> >> > > > Alex Scherbakov and Ivan are you the right guys to do the > > review? > > >> >> > > > > > >> >> > > > - > > >> >> > > > Denis > > >> >> > > > > > >> >> > > > > > >> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван < > > >> [hidden email]> > > >> >> > > wrote: > > >> >> > > > > > >> >> > > > > Yuriy, > > >> >> > > > > > > >> >> > > > > Thank you for providing details! Quite interesting. > > >> >> > > > > > > >> >> > > > > Yes, we already have support of distributed limit and > merging > > >> >> sorted > > >> >> > > > > subresults for SQL queries. E.g. ReduceIndexSorted and > > >> >> > > > > MergeStreamIterator are used for merging sorted streams. > > >> >> > > > > > > >> >> > > > > Could you please also clarify about score/relevance? Is it > > >> >> provided > > >> >> > by > > >> >> > > > > Lucene engine for each query result? I am thinking how to > do > > >> >> sorted > > >> >> > > > > merge properly in this case. > > >> >> > > > > > > >> >> > > > > ср, 25 сент. 2019 г. в 18:56, Yuriy Shuliga < > > [hidden email] > > >> >: > > >> >> > > > > > > > >> >> > > > > > Ivan, > > >> >> > > > > > > > >> >> > > > > > Thank you for interesting question! 
> > >> >> > > > > > > > >> >> > > > > > Text searches (or full text searches) are mostly > > >> human-oriented. > > >> >> > And > > >> >> > > > the > > >> >> > > > > > point of user's interest is topmost part of response. > > >> >> > > > > > Then user can read it, evaluate and use the given records > > for > > >> >> > further > > >> >> > > > > > purposes. > > >> >> > > > > > > > >> >> > > > > > Particularly in our case, we use Ignite for operations > with > > >> >> > financial > > >> >> > > > > data, > > >> >> > > > > > and there lots of text stuff like assets names, fin. > > >> >> instruments, > > >> >> > > > > companies > > >> >> > > > > > etc. > > >> >> > > > > > In order to operate with this quickly and reliably, users > > >> used > > >> >> to > > >> >> > > work > > >> >> > > > > with > > >> >> > > > > > text search, type-ahead completions, suggestions. > > >> >> > > > > > > > >> >> > > > > > For this purposes we are indexing particular string data > in > > >> >> > separate > > >> >> > > > > caches. > > >> >> > > > > > > > >> >> > > > > > Sorting capabilities and response size limitations are > very > > >> >> > important > > >> >> > > > > > there. As our API have to provide most relevant > information > > >> in > > >> >> view > > >> >> > > of > > >> >> > > > > > limited size. > > >> >> > > > > > > > >> >> > > > > > Now let me comment some Ignite/Lucene perspective. > > >> >> > > > > > Actually Ignite queries and Lucene returns > > >> *TopDocs.scoresDocs > > >> >> > > *already > > >> >> > > > > > sorted by *score *(relevance). So most relevant documents > > >> are on > > >> >> > the > > >> >> > > > top. > > >> >> > > > > > And currently distributed queries responses from > different > > >> nodes > > >> >> > are > > >> >> > > > > merged > > >> >> > > > > > into final query cursor queue in arbitrary way. > > >> >> > > > > > So in fact we already have the score order ruined here. 
> > Also > > >> >> Ignite > > >> >> > > > > > requests all possible documents from Lucene that is > > redundant > > >> >> and > > >> >> > not > > >> >> > > > > good > > >> >> > > > > > for performance. > > >> >> > > > > > > > >> >> > > > > > I'm implementing *limit* parameter to be part of > *TextQuery > > >> *and > > >> >> > have > > >> >> > > > to > > >> >> > > > > > notice that we still have to add sorting for text queries > > >> >> > processing > > >> >> > > in > > >> >> > > > > > order to have applicable results. > > >> >> > > > > > > > >> >> > > > > > *Limit* parameter itself should improve the part of > issues > > >> from > > >> >> > > above, > > >> >> > > > > but > > >> >> > > > > > definitely, sorting by document score at least should be > > >> >> > implemented > > >> >> > > > > along > > >> >> > > > > > with limit. > > >> >> > > > > > > > >> >> > > > > > This is a pretty short commentary if you still have any > > >> >> questions, > > >> >> > > > please > > >> >> > > > > > ask, do not hesitate) > > >> >> > > > > > > > >> >> > > > > > BR, > > >> >> > > > > > Yuriy Shuliha > > >> >> > > > > > > > >> >> > > > > > чт, 19 вер. 2019 о 11:38 Павлухин Иван < > > [hidden email]> > > >> >> пише: > > >> >> > > > > > > > >> >> > > > > > > Yuriy, > > >> >> > > > > > > > > >> >> > > > > > > Greatly appreciate your interest. > > >> >> > > > > > > > > >> >> > > > > > > Could you please elaborate a little bit about sorting? > > What > > >> >> tasks > > >> >> > > > does > > >> >> > > > > > > it help to solve and how? It would be great to provide > an > > >> >> > example. > > >> >> > > > > > > > > >> >> > > > > > > ср, 18 сент. 2019 г. в 09:39, Alexei Scherbakov < > > >> >> > > > > > > [hidden email]>: > > >> >> > > > > > > > > > >> >> > > > > > > > Denis, > > >> >> > > > > > > > > > >> >> > > > > > > > I like the idea of throwing an exception for enabled > > text > > >> >> > queries > > >> >> > > > on > > >> >> > > > > > > > persistent caches. 
> > >> >> > > > > > > > > > >> >> > > > > > > > Also I'm fine with proposed limit for unsorted > > searches. > > >> >> > > > > > > > > > >> >> > > > > > > > Yury, please proceed with ticket creation. > > >> >> > > > > > > > > > >> >> > > > > > > > вт, 17 сент. 2019 г., 22:06 Denis Magda < > > >> [hidden email] > > >> >> >: > > >> >> > > > > > > > > > >> >> > > > > > > > > Igniters, > > >> >> > > > > > > > > > > >> >> > > > > > > > > I see nothing wrong with Yury's proposal in regards > > >> >> full-text > > >> >> > > > > search > > >> >> > > > > > > API > > >> >> > > > > > > > > evolution as long as Yury is ready to push it > > forward. > > >> >> > > > > > > > > > > >> >> > > > > > > > > As for the in-memory mode only, it makes total > sense > > >> for > > >> >> > > > in-memory > > >> >> > > > > data > > >> >> > > > > > > > > grid deployments when Ignite caches data of an > > >> underlying > > >> >> DB > > >> >> > > like > > >> >> > > > > > > Postgres. > > >> >> > > > > > > > > As part of the changes, I would simply throw an > > >> exception > > >> >> (by > > >> >> > > > > default) > > >> >> > > > > > > if > > >> >> > > > > > > > > the one attempts to use text indices with the > native > > >> >> > > persistence > > >> >> > > > > > > enabled. > > >> >> > > > > > > > > If the person is ready to live with that limitation > > >> that > > >> >> an > > >> >> > > > > explicit > > >> >> > > > > > > > > configuration change is needed to come around the > > >> >> exception. > > >> >> > > > > > > > > > > >> >> > > > > > > > > Thoughts? 
> > >> >> > > > > > > > > > > >> >> > > > > > > > > > > >> >> > > > > > > > > - > > >> >> > > > > > > > > Denis > > >> >> > > > > > > > > > > >> >> > > > > > > > > > > >> >> > > > > > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga < > > >> >> > > [hidden email] > > >> >> > > > > > > >> >> > > > > > > wrote: > > >> >> > > > > > > > > > > >> >> > > > > > > > > > Hello to all again, > > >> >> > > > > > > > > > > > >> >> > > > > > > > > > Thank you for important comments and notes given > > >> below! > > >> >> > > > > > > > > > > > >> >> > > > > > > > > > Let me answer and continue the discussion. > > >> >> > > > > > > > > > > > >> >> > > > > > > > > > (I) Overall needs in Lucene indexing > > >> >> > > > > > > > > > > > >> >> > > > > > > > > > Alexei has referenced to > > >> >> > > > > > > > > > > https://issues.apache.org/jira/browse/IGNITE-5371 > > >> where > > >> >> > > > > > > > > > absence of index persistence was declared as an > > >> >> obstacle to > > >> >> > > > > further > > >> >> > > > > > > > > > development. > > >> >> > > > > > > > > > > > >> >> > > > > > > > > > a) This ticket is already closed as not valid.b) > > >> There > > >> >> are > > >> >> > > > > definite > > >> >> > > > > > > needs > > >> >> > > > > > > > > > (and in our project as well) in just in-memory > > >> indexing > > >> >> of > > >> >> > > > > selected > > >> >> > > > > > > data. > > >> >> > > > > > > > > > We intend to use search capabilities for fetching > > >> >> limited > > >> >> > > > amount > > >> >> > > > > of > > >> >> > > > > > > > > records > > >> >> > > > > > > > > > that should be used in type-ahead search / > > >> suggestions. > > >> >> > > > > > > > > > Not all of the data will be indexed and the are > no > > >> need > > >> >> in > > >> >> > > > Lucene > > >> >> > > > > > > index > > >> >> > > > > > > > > to > > >> >> > > > > > > > > > be persistence. Hope this is a wide pattern of > > >> >> text-search > > >> >> > > > usage. 
> > >> >> > > > > > > > > > > > >> >> > > > > > > > > > (II) Necessary fixes in current implementation. > > >> >> > > > > > > > > > > > >> >> > > > > > > > > > a) Implementation of correct *limit *(*offset* > > seems > > >> to > > >> >> be > > >> >> > > not > > >> >> > > > > > > required > > >> >> > > > > > > > > in > > >> >> > > > > > > > > > text-search tasks for now) > > >> >> > > > > > > > > > I have investigated the data flow for distributed > > >> text > > >> >> > > queries. > > >> >> > > > > it > > >> >> > > > > > > was > > >> >> > > > > > > > > > simple test prefix query, like 'name'*='ene*'* > > >> >> > > > > > > > > > For now each server-node returns all response > > >> records to > > >> >> > the > > >> >> > > > > > > client-node > > >> >> > > > > > > > > > and it may contain ~thousands, ~hundred thousands > > >> >> records. > > >> >> > > > > > > > > > Event if we need only first 10-100. Again, all > the > > >> >> results > > >> >> > > are > > >> >> > > > > added > > >> >> > > > > > > to > > >> >> > > > > > > > > > queue in GridCacheQueryFutureAdapter in arbitrary > > >> order > > >> >> by > > >> >> > > > pages. > > >> >> > > > > > > > > > I did not find here any means to deliver > > >> deterministic > > >> >> > > result. > > >> >> > > > > > > > > > So implementing limit as part of query and > > >> >> > > > > (GridCacheQueryRequest) > > >> >> > > > > > > will > > >> >> > > > > > > > > not > > >> >> > > > > > > > > > change the nature of response but will limit load > > on > > >> >> nodes > > >> >> > > and > > >> >> > > > > > > > > networking. > > >> >> > > > > > > > > > > > >> >> > > > > > > > > > Can we consider to open a ticket for this? 
> > >> >> > > > > > > > > > > > >> >> > > > > > > > > > (III) Further extension of Lucene API exposition > to > > >> >> Ignite > > >> >> > > > > > > > > > > > >> >> > > > > > > > > > a) Sorting > > >> >> > > > > > > > > > The solution for this could be: > > >> >> > > > > > > > > > - Make entities comparable > > >> >> > > > > > > > > > - Add custom comparator to entity > > >> >> > > > > > > > > > - Add annotations to mark sorted fields for > Lucene > > >> >> indexing > > >> >> > > > > > > > > > - Use comparators when merging responses or > > reducing > > >> to > > >> >> > > desired > > >> >> > > > > > > limit on > > >> >> > > > > > > > > > client node. > > >> >> > > > > > > > > > Will require full result set to be loaded into > > >> memory. > > >> >> > Though > > >> >> > > > > can be > > >> >> > > > > > > used > > >> >> > > > > > > > > > for relatively small limits. > > >> >> > > > > > > > > > BR, > > >> >> > > > > > > > > > Yuriy Shuliha > > >> >> > > > > > > > > > > > >> >> > > > > > > > > > пт, 30 серп. 2019 о 10:37 Alexei Scherbakov < > > >> >> > > > > > > > > [hidden email]> > > >> >> > > > > > > > > > пише: > > >> >> > > > > > > > > > > > >> >> > > > > > > > > > > Yuriy, > > >> >> > > > > > > > > > > > > >> >> > > > > > > > > > > Note what one of major blockers for text > queries > > is > > >> >> [1] > > >> >> > > which > > >> >> > > > > makes > > >> >> > > > > > > > > > lucene > > >> >> > > > > > > > > > > indexes unusable with persistence and main > reason > > >> for > > >> >> > > > > > > discontinuation. > > >> >> > > > > > > > > > > Probably it's should be addressed first to make > > >> text > > >> >> > > queries > > >> >> > > > a > > >> >> > > > > > > valid > > >> >> > > > > > > > > > > product feature. > > >> >> > > > > > > > > > > > > >> >> > > > > > > > > > > Distributed sorting and advanved querying is > > indeed > > >> >> not a > > >> >> > > > > trivial > > >> >> > > > > > > task. 
> > >> >> > > > > > > > > > > Some kind of merging must be implemented on > query > > >> >> > > originating > > >> >> > > > > node. > > >> >> > > > > > > > > > > > > >> >> > > > > > > > > > > [1] > > >> https://issues.apache.org/jira/browse/IGNITE-5371 > > >> >> > > > > > > > > > > > > >> >> > > > > > > > > > > чт, 29 авг. 2019 г. в 23:38, Denis Magda < > > >> >> > > [hidden email] > > >> >> > > > >: > > >> >> > > > > > > > > > > > > >> >> > > > > > > > > > > > Yuriy, > > >> >> > > > > > > > > > > > > > >> >> > > > > > > > > > > > If you are ready to take over the full-text > > >> search > > >> >> > > indexes > > >> >> > > > > then > > >> >> > > > > > > > > please > > >> >> > > > > > > > > > go > > >> >> > > > > > > > > > > > ahead. The primary reason why the community > > >> wants to > > >> >> > > > > discontinue > > >> >> > > > > > > them > > >> >> > > > > > > > > > > first > > >> >> > > > > > > > > > > > (and, probable, resurrect later) are the > > >> limitations > > >> >> > > listed > > >> >> > > > > by > > >> >> > > > > > > Andrey > > >> >> > > > > > > > > > and > > >> >> > > > > > > > > > > > minimal support from the community end. > > >> >> > > > > > > > > > > > > > >> >> > > > > > > > > > > > - > > >> >> > > > > > > > > > > > Denis > > >> >> > > > > > > > > > > > > > >> >> > > > > > > > > > > > > > >> >> > > > > > > > > > > > On Thu, Aug 29, 2019 at 1:29 PM Andrey > > Mashenkov > > >> < > > >> >> > > > > > > > > > > > [hidden email]> > > >> >> > > > > > > > > > > > wrote: > > >> >> > > > > > > > > > > > > > >> >> > > > > > > > > > > > > Hi Yuriy, > > >> >> > > > > > > > > > > > > > > >> >> > > > > > > > > > > > > Unfortunatelly, there is a plan to > > discontinue > > >> >> > > > TextQueries > > >> >> > > > > in > > >> >> > > > > > > > > Ignite > > >> >> > > > > > > > > > > [1]. 
> > >> >> > > > > > > > > > > > > Motivation here is text indexes are not > > >> >> persistent, > > >> >> > not > > >> >> > > > > > > > > transactional > > >> >> > > > > > > > > > > and > > >> >> > > > > > > > > > > > > can't be user together with SQL or inside > > SQL. > > >> >> > > > > > > > > > > > > and there is a lack of interest from > > community > > >> >> side. > > >> >> > > > > > > > > > > > > You are weclome to take on these issues and > > >> make > > >> >> > > > > TextQueries > > >> >> > > > > > > great. > > >> >> > > > > > > > > > > > > > > >> >> > > > > > > > > > > > > 1, PageSize can't be used to limit > > resultset. > > >> >> > > > > > > > > > > > > Query results return from data node to > > >> client-side > > >> >> > > cursor > > >> >> > > > > in > > >> >> > > > > > > > > > > page-by-page > > >> >> > > > > > > > > > > > > manner and > > >> >> > > > > > > > > > > > > this parameter is designed control page > size. > > >> It > > >> >> is > > >> >> > > > > supposed > > >> >> > > > > > > query > > >> >> > > > > > > > > > > > executes > > >> >> > > > > > > > > > > > > lazily on server side and > > >> >> > > > > > > > > > > > > it is not excepted full resultset be loaded > > to > > >> >> memory > > >> >> > > on > > >> >> > > > > server > > >> >> > > > > > > > > side > > >> >> > > > > > > > > > at > > >> >> > > > > > > > > > > > > once, but by pages. > > >> >> > > > > > > > > > > > > Do you mean you found Lucene load entire > > >> resultset > > >> >> > into > > >> >> > > > > memory > > >> >> > > > > > > > > before > > >> >> > > > > > > > > > > > first > > >> >> > > > > > > > > > > > > page is sent to client? > > >> >> > > > > > > > > > > > > > > >> >> > > > > > > > > > > > > I'd think a new parameter should be added > to > > >> limit > > >> >> > > > result. > > >> >> > > > > The > > >> >> > > > > > > best > > >> >> > > > > > > > > > > > > solution is to use query language commands > > for > > >> >> this, > > >> >> > > e.g. 
> > >> >> > > > > > > > > > > "LIMIT/OFFSET" > > >> >> > > > > > > > > > > > in > > >> >> > > > > > > > > > > > > SQL. > > >> >> > > > > > > > > > > > > > > >> >> > > > > > > > > > > > > This task doesn't look trivial. Query is > > >> >> distributed > > >> >> > > > > operation > > >> >> > > > > > > and > > >> >> > > > > > > > > > same > > >> >> > > > > > > > > > > > > user query will be executed on data nodes > > >> >> > > > > > > > > > > > > and then results from all nodes should be > > >> correcly > > >> >> > > merged > > >> >> > > > > > > before > > >> >> > > > > > > > > > being > > >> >> > > > > > > > > > > > > returned via client-cursor. > > >> >> > > > > > > > > > > > > So, LIMIT should be applied on every node > and > > >> >> then on > > >> >> > > > merge > > >> >> > > > > > > phase. > > >> >> > > > > > > > > > > > > > > >> >> > > > > > > > > > > > > Also, this may be non-obviuos, limiting > > results > > >> >> make > > >> >> > no > > >> >> > > > > sence > > >> >> > > > > > > > > without > > >> >> > > > > > > > > > > > > sorting, > > >> >> > > > > > > > > > > > > as there is no guarantee every next query > run > > >> will > > >> >> > > return > > >> >> > > > > same > > >> >> > > > > > > data > > >> >> > > > > > > > > > > > because > > >> >> > > > > > > > > > > > > of page reordeing. > > >> >> > > > > > > > > > > > > Basically, merge phase receive results from > > >> data > > >> >> > nodes > > >> >> > > > > > > > > asynchronously > > >> >> > > > > > > > > > > and > > >> >> > > > > > > > > > > > > messages from different nodes can't be > > ordered. > > >> >> > > > > > > > > > > > > > > >> >> > > > > > > > > > > > > 2. > > >> >> > > > > > > > > > > > > a. "tokenize" param name (for > > @QueryTextFiled) > > >> >> looks > > >> >> > > more > > >> >> > > > > > > verbose, > > >> >> > > > > > > > > > > isn't > > >> >> > > > > > > > > > > > > it. > > >> >> > > > > > > > > > > > > b,c. What about distributed query? 
How will partial results from nodes be merged? Does Lucene allow configuring a comparator for sorting data? Which comparator should Ignite use to sort the result in the merge phase?

3. For now, the Lucene engine is not configurable at all. E.g. it is impossible to configure the Tokenizer. I'd think about possible ways to configure the engine first, and only then go further to discuss/implement complex features that may depend on the engine config.

On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <[hidden email]> wrote:

Dear community,

By starting this thread I'd like to open a discussion that would lead to contributions in the subject area.
Ignite has indexing capabilities backed by different mechanisms, including Lucene.

Currently, Lucene 7.5.0 is used (released last year). This is a widespread and mature technology that covers the text search area and beyond (e.g. spatial data indexing).

My goal is to *expose more Lucene functionality to Ignite indexing and query mechanisms for text data*.

It is quite a simple request at the current stage. It comes from our project's needs, but I believe it will be useful for many more people. Let's walk through the items, and vote on or discuss Jira tickets for them.

1. [trivial] Use dataQuery.getPageSize() to limit the search response items inside GridLuceneIndex.query(). Currently it calls IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored matches are returned, which we do not need in most cases.

2. [simple] Add sorting. Then a more capable search call can be executed: *IndexSearcher.search(query, count, sort)*
Implementation steps:
a) Introduce a boolean *sortField* parameter in the *@QueryTextField* annotation. If *true*, the field will be indexed but not tokenized. Number types are preferred here.
b) Add a *sort* collection to the *TextQuery* constructor. It should define the desired sort fields used for querying.
c) Implement Lucene sort usage in GridLuceneIndex.query().

3. [moderate] Build complex queries with *TextQuery*, including term/query boosting.
*This section is for voting only, as it requires more detailed work. It should be extended if the community is interested in it.*

Looking forward to your comments!

BR,
Yuriy Shuliha

--
Best regards,
Andrey V. Mashenkov

--
Best regards,
Alexei Scherbakov

--
Best regards,
Ivan Pavlukhin

--
Best regards,
Ivan Pavlukhin

--
Best regards,
Andrey V. Mashenkov

--
Best regards,
Andrey V. Mashenkov

--
Best regards,
Andrey V. Mashenkov
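Item 1 of Yuriy's proposal quoted above replaces `IndexSearcher.search(query, Integer.MAX_VALUE)` with a bounded count. The idea behind a bounded top-N search can be sketched in plain Java: keep a min-heap of at most `n` hits, so memory grows with `n` rather than with the number of matches. This is a minimal illustration under stated assumptions, not Lucene's collector code; the `Hit` class and the tie-breaking by key are hypothetical choices made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNSelector {
    /** Hypothetical search hit: a cache key plus its relevance score. */
    static final class Hit {
        final String key;
        final float score;

        Hit(String key, float score) {
            this.key = key;
            this.score = score;
        }
    }

    /** Most relevant first; ties broken by key so the order is total and repeatable. */
    static final Comparator<Hit> BY_RELEVANCE =
        Comparator.comparingDouble((Hit h) -> h.score).reversed().thenComparing(h -> h.key);

    /** Returns the n most relevant hits, most relevant first, holding at most n hits at a time. */
    static List<Hit> topN(Iterable<Hit> matches, int n) {
        List<Hit> result = new ArrayList<>();
        if (n <= 0)
            return result;

        // Reversed order: the heap head is the least relevant hit kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>(n, BY_RELEVANCE.reversed());
        for (Hit h : matches) {
            if (heap.size() < n)
                heap.add(h);
            else if (BY_RELEVANCE.compare(h, heap.peek()) < 0) {
                heap.poll(); // evict the weakest kept hit
                heap.add(h);
            }
        }

        result.addAll(heap);
        result.sort(BY_RELEVANCE);
        return result;
    }

    public static void main(String[] args) {
        List<Hit> matches = Arrays.asList(
            new Hit("a", 0.2f), new Hit("b", 0.9f), new Hit("c", 0.5f), new Hit("d", 0.7f));
        for (Hit h : topN(matches, 2))
            System.out.println(h.key + " " + h.score);
    }
}
```

The heap keeps the per-node work at O(m log n) for m matches, which is the point of passing a bounded count instead of `Integer.MAX_VALUE`.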
Ilya Kasnacheev, what is the problem with having Solr functionality in Ignite? Thanks!

On Tuesday, 26 November 2019, 13:50 +03:00, Ilya Kasnacheev <[hidden email]> wrote:

Hello!

I have a hunch that we are trying to build Apache Solr (or Solr Cloud) into Apache Ignite. I think that's a lot of effort that is not very justified.

I don't think we should try to implement sorting in Apache Ignite, because it is a lot of work, and a lot of code in our code base which we don't really want.

Regards,
--
Ilya Kasnacheev

On Fri, Nov 22, 2019 at 20:59, Yuriy Shuliga <[hidden email]> wrote:

Dear Igniters,

The first part of the TextQuery improvement, a result limit, was developed and merged.
Now we have to develop the most important functionality here: proper sorting of the Lucene index response and correct reduction of these responses for distributed queries.

*There are two Lucene-based aspects*

1. When no sort fields are used, the documents in the response are still ordered by relevance. This is actually the ScoreDoc.score value. In order to reduce the distributed results correctly, the score should be passed with the response.

2. When sorting by conventional fields, Lucene should have these fields properly indexed, and the corresponding Sort object should be applied to Lucene's search call. In order to mark those fields, a new annotation like '@SortField' may be introduced.

*Reducing on Ignite*

The obvious point of distributed response reduction is the class GridCacheDistributedQueryFuture. However, @Ivan Pavlukhin mentioned a class with similar functionality: ReduceIndexSorted. What I see here is that it is tangled with H2-related classes (org.h2.result.Row) and might not be unifiable with TextQuery reduction.

Support is still needed here.

Overall, the goal of this letter is to initiate a discussion on the TextQuery sorting implementation and come closer to ticket creation.
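Point 1 of the message above (pass `ScoreDoc.score` along with each row so the originating node can reduce correctly) can be sketched as a k-way merge: if every node returns its rows already ordered by descending score, the reducer only needs the head of each node's stream and can stop as soon as the limit is reached. Everything here is hypothetical (`ScoredRow`, `merge`) and is not `GridCacheDistributedQueryFuture` or `ReduceIndexSorted`. The sketch also assumes scores from different nodes are comparable; Lucene computes scores from each index's own term statistics, so with per-node indexes that assumption may not hold exactly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class ScoreMergeReducer {
    /** Hypothetical row shipped from a data node: cache key plus the Lucene score it was ranked by. */
    static final class ScoredRow {
        final String key;
        final float score;

        ScoredRow(String key, float score) {
            this.key = key;
            this.score = score;
        }
    }

    /** Current head of one node's stream plus the iterator over the rest of that stream. */
    private static final class Cursor {
        final ScoredRow head;
        final Iterator<ScoredRow> rest;

        Cursor(ScoredRow head, Iterator<ScoredRow> rest) {
            this.head = head;
            this.rest = rest;
        }
    }

    /** Higher score first; key as tie-breaker so equal scores always merge in the same order. */
    static final Comparator<ScoredRow> BY_SCORE_DESC =
        Comparator.comparingDouble((ScoredRow r) -> r.score).reversed().thenComparing(r -> r.key);

    /** Merges per-node lists that are each already sorted by BY_SCORE_DESC and stops after 'limit' rows. */
    static List<ScoredRow> merge(List<List<ScoredRow>> perNode, int limit) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<Cursor>((a, b) -> BY_SCORE_DESC.compare(a.head, b.head));
        for (List<ScoredRow> nodeRows : perNode) {
            Iterator<ScoredRow> it = nodeRows.iterator();
            if (it.hasNext())
                heap.add(new Cursor(it.next(), it));
        }

        List<ScoredRow> out = new ArrayList<>();
        while (!heap.isEmpty() && out.size() < limit) {
            Cursor best = heap.poll();
            out.add(best.head);
            // Advance only the stream we just consumed from.
            if (best.rest.hasNext())
                heap.add(new Cursor(best.rest.next(), best.rest));
        }
        return out;
    }

    public static void main(String[] args) {
        List<ScoredRow> node1 = Arrays.asList(new ScoredRow("a", 0.9f), new ScoredRow("b", 0.4f));
        List<ScoredRow> node2 = Arrays.asList(new ScoredRow("c", 0.7f), new ScoredRow("d", 0.6f));
        for (ScoredRow r : merge(Arrays.asList(node1, node2), 3))
            System.out.println(r.key + " " + r.score);
    }
}
```

Because each node's stream is already sorted, applying the same limit on every node first does not change the merged top rows; it only reduces what has to be sent.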
BR,
Yuriy Shuliha

On Tue, Oct 22, 2019 at 13:31, Andrey Mashenkov <[hidden email]> wrote:

Hi Dmitry, Yuriy.

I've found that GridCacheQueryFutureAdapter has a newly added AtomicInteger 'total' field and a 'limit' field as a primitive int.

Both fields are used inside a synchronized block only. So we can make both private and downgrade the AtomicInteger to a primitive int.

Most likely, these fields can be replaced with one field.

On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <[hidden email]> wrote:

Hi Andrey,

I've checked this ticket's comments, and there is a TC Bot visa (with no blockers).

Do you have any concerns related to this patch?

Sincerely,
Dmitriy Pavlov

On Thu, Oct 17, 2019 at 16:43, Yuriy Shuliga <[hidden email]> wrote:

Andrey,

Per your request, I created ticket https://issues.apache.org/jira/browse/IGNITE-12291 linked to https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189

Could you please proceed with the PR merge?

BR,
Yuriy Shuliha

On Wed, Oct 9, 2019 at 12:52, Andrey Mashenkov <[hidden email]> wrote:

Hi Yuri,

To get access to TC Bot you should register as a TeamCity user [1], if you haven't done this already. Then you will be able to log in to the Ignite TC Bot page with the same credentials.

[1] https://ci.ignite.apache.org/registerUser.html

On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <[hidden email]> wrote:

Andrew,

I have corrected the PR according to your notes. Please review. What will be the next steps to get it merged?

Y.

On Thu, Oct 3, 2019 at 17:47, Andrey Mashenkov <[hidden email]> wrote:

Yuri,

I'm done with the review. No crime found, just a trivial compatibility bug.

On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <[hidden email]> wrote:

Denis,

Thank you for your attention to this. As of now, the https://issues.apache.org/jira/browse/IGNITE-12189 ticket is still pending review. Do we have a chance to move it forward somehow?

BR,
Yuriy Shuliha

On Mon, Sep 30, 2019 at 23:35, Denis Magda <[hidden email]> wrote:

Yuriy,

I've seen you open a pull request with the first changes: https://issues.apache.org/jira/browse/IGNITE-12189

Alex Scherbakov and Ivan, are you the right people to do the review?

-
Denis

On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <[hidden email]> wrote:

Yuriy,

Thank you for providing details! Quite interesting.

Yes, we already support distributed limit and merging of sorted subresults for SQL queries. E.g. ReduceIndexSorted and MergeStreamIterator are used for merging sorted streams.

Could you please also clarify the score/relevance? Is it provided by the Lucene engine for each query result? I am thinking about how to do the sorted merge properly in this case.
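Andrey's October 22 review note, quoted above (an `AtomicInteger total` and an `int limit` that are only touched inside a `synchronized` block, and could probably become one field), can be illustrated with a small stand-in class. This is not the actual `GridCacheQueryFutureAdapter` code; the class name, the `onPage` method, and the convention that a non-positive limit means "no limit" are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Hypothetical collector for result pages arriving from several nodes, capped at a limit. */
public class LimitedPageCollector<R> {
    /** Rows still allowed. One plain int replaces a separate int limit plus an AtomicInteger total. */
    private int remaining;

    private final List<R> rows = new ArrayList<>();

    public LimitedPageCollector(int limit) {
        // Assumption for this sketch: limit <= 0 means "no limit".
        this.remaining = limit > 0 ? limit : Integer.MAX_VALUE;
    }

    /**
     * Called for each page, possibly from different threads. Every read and write of
     * 'remaining' happens under this object's monitor, so no atomic type is needed.
     *
     * @return true once the limit has been reached and further pages can be ignored.
     */
    public synchronized boolean onPage(Collection<? extends R> page) {
        for (R row : page) {
            if (remaining == 0)
                break;
            rows.add(row);
            remaining--;
        }
        return remaining == 0;
    }

    public synchronized List<R> rows() {
        return new ArrayList<>(rows);
    }

    public static void main(String[] args) {
        LimitedPageCollector<String> collector = new LimitedPageCollector<>(3);
        System.out.println(collector.onPage(Arrays.asList("a", "b")));      // false
        System.out.println(collector.onPage(Arrays.asList("c", "d", "e"))); // true
        System.out.println(collector.rows());                               // [a, b, c]
    }
}
```

Tracking the remaining count instead of a running total plus a limit keeps one source of truth, and the existing lock already provides the visibility that the atomic type was duplicating.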
On Wed, Sep 25, 2019 at 18:56, Yuriy Shuliga <[hidden email]> wrote:

Ivan,

Thank you for the interesting question!

Text searches (or full-text searches) are mostly human-oriented, and the user's interest is in the topmost part of the response. The user can then read it, evaluate it, and use the given records for further purposes.

In our case in particular, we use Ignite for operations with financial data, and there is a lot of text such as asset names, financial instruments, companies, etc. In order to work with this quickly and reliably, users are accustomed to text search, type-ahead completions, and suggestions.

For these purposes we index particular string data in separate caches.

Sorting capabilities and response size limits are very important there, as our API has to provide the most relevant information within a limited size.

Now let me comment on the Ignite/Lucene perspective. Actually, Lucene returns *TopDocs.scoreDocs* to Ignite queries already sorted by *score* (relevance), so the most relevant documents are on top.
And currently, distributed query responses from different nodes are merged into the final query cursor queue in an arbitrary order. So in fact the score order is already ruined here. Also, Ignite requests all possible documents from Lucene, which is redundant and bad for performance.

I'm implementing a *limit* parameter as part of *TextQuery*, and have to note that we still need to add sorting to text query processing in order to get usable results.

The *limit* parameter itself should fix some of the issues above, but sorting, at least by document score, definitely has to be implemented along with the limit.

This is a pretty short commentary; if you still have any questions, please ask, don't hesitate :)

BR,
Yuriy Shuliha

On Thu, Sep 19, 2019 at 11:38, Павлухин Иван <[hidden email]> wrote:

Yuriy,

Greatly appreciate your interest.

Could you please elaborate a little bit on sorting? What tasks does it help to solve, and how? It would be great to provide an example.

On Wed, Sep 18, 2019 at 09:39, Alexei Scherbakov <[hidden email]> wrote:

Denis,

I like the idea of throwing an exception for enabled text queries on persistent caches.

Also, I'm fine with the proposed limit for unsorted searches.

Yury, please proceed with ticket creation.

On Tue, Sep 17, 2019 at 22:06, Denis Magda <[hidden email]> wrote:

Igniters,

I see nothing wrong with Yury's proposal regarding full-text search API evolution, as long as Yury is ready to push it forward.

As for the in-memory-only mode, it makes total sense for in-memory data grid deployments where Ignite caches data of an underlying DB like Postgres. As part of the changes, I would simply throw an exception (by default) if someone attempts to use text indexes with native persistence enabled. If the person is ready to live with that limitation, an explicit configuration change is needed to get around the exception.

Thoughts?
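Denis's suggestion above (fail by default when text indexes are combined with native persistence, with an explicit configuration switch to accept the limitation) could look roughly like the following start-up check. The `Settings` holder and the `allowNonPersistentTextIndexes` flag are hypothetical names for this sketch, not existing Ignite configuration properties.

```java
/** Hypothetical start-up validation for caches that declare text-indexed fields. */
public class TextIndexPersistenceCheck {
    /** Hypothetical inputs; real code would derive them from cache and storage configuration. */
    static final class Settings {
        final boolean nativePersistenceEnabled;
        final boolean hasTextIndexedFields;
        /** Explicit opt-in to accept that Lucene indexes stay memory-only; false by default. */
        final boolean allowNonPersistentTextIndexes;

        Settings(boolean persistence, boolean textFields, boolean allow) {
            this.nativePersistenceEnabled = persistence;
            this.hasTextIndexedFields = textFields;
            this.allowNonPersistentTextIndexes = allow;
        }
    }

    /** Throws unless the combination is safe or the user has explicitly accepted the limitation. */
    static void validate(String cacheName, Settings s) {
        if (s.hasTextIndexedFields && s.nativePersistenceEnabled && !s.allowNonPersistentTextIndexes) {
            throw new IllegalStateException("Cache '" + cacheName + "' declares text indexes, which are "
                + "not persisted; set allowNonPersistentTextIndexes=true to accept this limitation.");
        }
    }

    public static void main(String[] args) {
        validate("inMemoryCache", new Settings(false, true, false)); // passes: no native persistence
        validate("optedIn", new Settings(true, true, true));         // passes: explicit opt-in
        try {
            validate("persistentCache", new Settings(true, true, false));
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Failing at start-up rather than at query time surfaces the limitation before any data is loaded, which matches the "exception by default" intent.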
-
Denis

On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga <[hidden email]> wrote:

Hello to all again,

Thank you for the important comments and notes given below!

Let me answer and continue the discussion.

(I) Overall needs in Lucene indexing

Alexei has referred to https://issues.apache.org/jira/browse/IGNITE-5371, where the absence of index persistence was declared an obstacle to further development.

a) This ticket is already closed as not valid.
b) There are definite needs (in our project as well) for purely in-memory indexing of selected data. We intend to use search capabilities for fetching a limited number of records to be used in type-ahead search/suggestions. Not all of the data will be indexed, and there is no need for the Lucene index to be persistent. I hope this is a widespread pattern of text-search usage.

(II) Necessary fixes in the current implementation

a) Implementation of a correct *limit* (an *offset* does not seem to be required for text-search tasks for now)
I have investigated the data flow for distributed text queries. It was a simple prefix test query, like 'name'='ene*'.
For now, each server node returns all response records to the client node, which may amount to thousands or hundreds of thousands of records, even if we need only the first 10-100. Again, all the results are added to the queue in GridCacheQueryFutureAdapter in arbitrary order, page by page. I did not find any means here to deliver a deterministic result.
So implementing the limit as part of the query (and of GridCacheQueryRequest) will not change the nature of the response, but it will limit the load on nodes and the network.

Can we consider opening a ticket for this?
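The non-determinism described in (II) comes from truncating rows in the order their pages happen to arrive. A minimal sketch, assuming each row carries a score and a unique key (the `Row` class is hypothetical), contrasts that with sorting by a total order before truncating. The sorted reduction gives the same answer for any arrival order, at the cost of buffering all received rows; a per-node limit keeps that buffer at most `nodes * limit` rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class DeterministicReduce {
    /** Hypothetical result row: unique cache key plus relevance score. */
    static final class Row {
        final String key;
        final float score;

        Row(String key, float score) {
            this.key = key;
            this.score = score;
        }
    }

    /** Total order: score descending, then key, so equal scores cannot reorder between runs. */
    static final Comparator<Row> ORDER =
        Comparator.comparingDouble((Row r) -> r.score).reversed().thenComparing(r -> r.key);

    /** What an unsorted limit does: keep the first 'limit' rows in arrival order. */
    static List<String> firstArrived(List<List<Row>> pagesInArrivalOrder, int limit) {
        List<String> keys = new ArrayList<>();
        for (List<Row> page : pagesInArrivalOrder) {
            for (Row r : page) {
                if (keys.size() < limit)
                    keys.add(r.key);
            }
        }
        return keys;
    }

    /** Buffers every received row, sorts by the total order, then truncates. */
    static List<String> sortedTopN(List<List<Row>> pagesInArrivalOrder, int limit) {
        List<Row> all = new ArrayList<>();
        for (List<Row> page : pagesInArrivalOrder)
            all.addAll(page);
        all.sort(ORDER);

        List<String> keys = new ArrayList<>();
        for (Row r : all.subList(0, Math.max(0, Math.min(limit, all.size()))))
            keys.add(r.key);
        return keys;
    }

    public static void main(String[] args) {
        List<Row> fromNode1 = Arrays.asList(new Row("a", 0.3f), new Row("b", 0.8f));
        List<Row> fromNode2 = Arrays.asList(new Row("c", 0.9f), new Row("d", 0.1f));

        // The same two pages, received in two different orders.
        List<List<Row>> run1 = Arrays.asList(fromNode1, fromNode2);
        List<List<Row>> run2 = Arrays.asList(fromNode2, fromNode1);

        System.out.println(firstArrived(run1, 2) + " vs " + firstArrived(run2, 2)); // [a, b] vs [c, d]
        System.out.println(sortedTopN(run1, 2) + " vs " + sortedTopN(run2, 2));     // [c, b] vs [c, b]
    }
}
```

The key tie-breaker matters: sorting by score alone would still let rows with equal scores swap places between runs.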
(III) Further exposure of the Lucene API to Ignite

a) Sorting
The solution for this could be:
- Make entities comparable
- Add a custom comparator to the entity
- Add annotations to mark sorted fields for Lucene indexing
- Use comparators when merging responses or reducing to the desired limit on the client node.
This will require the full result set to be loaded into memory, though it can be used for relatively small limits.

BR,
Yuriy Shuliha

On Fri, Aug 30, 2019 at 10:37, Alexei Scherbakov <[hidden email]> wrote:

Yuriy,

Note that one of the major blockers for text queries is [1], which makes Lucene indexes unusable with persistence and is the main reason for discontinuation. It should probably be addressed first to make text queries a valid product feature.

Distributed sorting and advanced querying is indeed not a trivial task.
Some kind of merging must be implemented on the query-originating node.

[1] https://issues.apache.org/jira/browse/IGNITE-5371

On Thu, Aug 29, 2019 at 23:38, Denis Magda <[hidden email]> wrote:

Yuriy,

If you are ready to take over the full-text search indexes, then please go ahead. The primary reasons why the community wants to discontinue them first (and probably resurrect them later) are the limitations listed by Andrey and minimal support from the community's end.

-
Denis

On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <[hidden email]> wrote:

Hi Yuriy,

Unfortunately, there is a plan to discontinue TextQueries in Ignite [1].
Motivation here is that text indexes are not persistent, not transactional, and can't be used together with SQL or inside SQL, and there is a lack of interest from the community side. You are welcome to take on these issues and make TextQueries great.

1. PageSize can't be used to limit the result set. Query results are returned from the data node to the client-side cursor page by page, and this parameter is designed to control the page size. The query is supposed to execute lazily on the server side, so the full result set is not expected to be loaded into memory on the server side at once, but page by page. Do you mean you found that Lucene loads the entire result set into memory before the first page is sent to the client?

I'd think a new parameter should be added to limit the result.
The best solution is to use query language commands for this, e.g. "LIMIT/OFFSET" in SQL.

This task doesn't look trivial. A query is a distributed operation: the same user query is executed on the data nodes, and then the results from all nodes must be correctly merged before being returned via the client cursor. So LIMIT should be applied on every node and then again in the merge phase.

Also, and this may be non-obvious, limiting results makes no sense without sorting, as there is no guarantee that every subsequent query run will return the same data, because of page reordering. Basically, the merge phase receives results from data nodes asynchronously, and messages from different nodes can't be ordered.

2.
a. The "tokenize" parameter name (for @QueryTextField) looks more descriptive, doesn't it?
b, c. What about distributed queries? How will partial results from nodes be merged? Does Lucene allow configuring a comparator for sorting data? Which comparator should Ignite use to sort the result in the merge phase?

3. For now, the Lucene engine is not configurable at all. E.g. it is impossible to configure the Tokenizer. I'd think about possible ways to configure the engine first, and only then go further to discuss/implement complex features that may depend on the engine config.
On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <[hidden email]> wrote:

Dear community,

By starting this thread I'd like to open a discussion that would lead to contributions in the subject area.

Ignite has indexing capabilities backed by different mechanisms, including Lucene.

Currently, Lucene 7.5.0 is used (released last year). This is a widespread and mature technology that covers the text search area and beyond (e.g. spatial data indexing).

My goal is to *expose more Lucene functionality to Ignite indexing and query mechanisms for text data*.

It is quite a simple request at the current stage. It comes from our project's needs, but I believe it will be useful for many more people. Let's walk through the items, and vote on or discuss Jira tickets for them.

1. [trivial] Use dataQuery.getPageSize() to limit the search response items inside GridLuceneIndex.query(). Currently it calls IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored matches are returned, which we do not need in most cases.

2. [simple] Add sorting. Then a more capable search call can be executed: *IndexSearcher.search(query, count, sort)*
Implementation steps:
a) Introduce a boolean *sortField* parameter in the *@QueryTextField* annotation. If *true*, the field will be indexed but not tokenized. Number types are preferred here.
> > b) Add a *sort* collection to the *TextQuery* constructor. It should define the desired sort fields used for querying.
> > c) Implement Lucene sort usage in GridLuceneIndex.query().
> >
> > 3. [moderate] Build complex queries with *TextQuery*, including term/query boosting.
> > *This section is for voting only, as it requires more detailed work. It should be extended if the community is interested in it.*
> >
> > Looking forward to your comments!
> >
> > BR,
> > Yuriy Shuliha
>
> --
> Best regards,
> Andrey V. Mashenkov
>
> --
> Best regards,
> Alexei Scherbakov
>
> --
> Best regards,
> Ivan Pavlukhin
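Item 1 of the proposal above is about not requesting Integer.MAX_VALUE hits. In Lucene terms that means passing a real count to IndexSearcher.search(query, n), or IndexSearcher.search(query, n, sort) once item 2 is in place. The plain-Java sketch below only illustrates the underlying idea, bounded top-N selection with a heap of size n, so memory stays proportional to n rather than to the number of matches. `TopN.select` is a hypothetical helper, not Lucene's collector implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper (not a Lucene API): keep only the best n items without sorting everything. */
final class TopN {
    private TopN() {}

    /**
     * Returns the n best items according to {@code better}, best first.
     * {@code better.compare(a, b) < 0} means a ranks ahead of b.
     */
    static <T> List<T> select(Iterable<T> candidates, int n, Comparator<? super T> better) {
        if (n <= 0) return new ArrayList<>();
        // Heap head is the worst item kept so far, so it can be evicted cheaply.
        PriorityQueue<T> kept = new PriorityQueue<>(n, better.reversed());
        for (T c : candidates) {
            if (kept.size() < n) {
                kept.add(c);
            } else if (better.compare(c, kept.peek()) < 0) {
                kept.poll();
                kept.add(c);
            }
        }
        List<T> out = new ArrayList<>(kept);
        out.sort(better);
        return out;
    }
}
```

With a score-descending comparator this keeps the n most relevant hits; with a field comparator it corresponds to the sorted variant in item 2.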
Hello!

The problem here is that Solr is a multi-year effort by a lot of people. We can't match that.

Maybe we could integrate with Solr/Solr Cloud instead, by feeding our cache information into their storage for indexing and relying on their own mechanisms for distributed IR sorting?

Regards,
--
Ilya Kasnacheev

On Tue, Nov 26, 2019 at 13:59, Zhenya Stanilovsky <[hidden email]> wrote:

> Ilya Kasnacheev, what is the problem with Solr functionality in Ignite?
>
> Thanks!
>
> On Tuesday, November 26, 2019, 13:50 +03:00, Ilya Kasnacheev <[hidden email]> wrote:
>
> > Hello!
> >
> > I have a hunch that we are trying to build Apache Solr (or Solr Cloud) into Apache Ignite. I think that's a lot of effort that is not very justified.
> >
> > I don't think we should try to implement sorting in Apache Ignite, because it is a lot of work, and a lot of code in our code base which we don't really want.
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> > On Fri, Nov 22, 2019 at 20:59, Yuriy Shuliga <[hidden email]> wrote:
> >
> > > Dear Igniters,
> > >
> > > The first part of the TextQuery improvement, a result limit, has been developed and merged.
> > > Now we have to develop the most important functionality here: proper sorting of the Lucene index response and correct reduction of the responses for distributed queries.
> > >
> > > *There are two Lucene-based aspects*
> > >
> > > 1. When no sort fields are used, the documents in the response are still ordered by relevance.
> > > Actually this is the ScoreDoc.score value.
> > > In order to reduce the distributed results correctly, the score should be passed along with the response.
> > >
> > > 2. When sorting by conventional fields, Lucene should have these fields properly indexed, and the corresponding Sort object should be applied to Lucene's search call.
> > > In order to mark those fields, a new annotation like '@SortField' may be introduced.
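Aspect 1 above says the relevance score (ScoreDoc.score) has to travel with each hit so the reducer can order hits coming from different nodes. A minimal sketch, assuming a hypothetical `ScoredHit` holder (not an existing Ignite class): hits are ordered by score descending, with the cache key as a tie-breaker so equal scores from different nodes come out in the same order regardless of which node's page arrives first. The `reduce` step here simply collects, sorts and truncates, which is only reasonable for small limits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical per-hit payload (not an Ignite class): cache key plus Lucene relevance score. */
final class ScoredHit {
    final String key;
    final float score;

    ScoredHit(String key, float score) {
        this.key = key;
        this.score = score;
    }

    /** Higher score first; ties broken by key so the order does not depend on page arrival. */
    static final Comparator<ScoredHit> BY_RELEVANCE =
        Comparator.comparingDouble((ScoredHit h) -> h.score).reversed()
            .thenComparing(h -> h.key);

    /** Naive reduce: gather all pages, sort once, keep the first {@code limit} hits. */
    static List<ScoredHit> reduce(List<List<ScoredHit>> pages, int limit) {
        List<ScoredHit> all = new ArrayList<>();
        for (List<ScoredHit> page : pages) all.addAll(page);
        all.sort(BY_RELEVANCE);
        return new ArrayList<>(all.subList(0, Math.max(0, Math.min(limit, all.size()))));
    }

    @Override
    public String toString() {
        return key + ":" + score;
    }
}
```

For larger limits the same `BY_RELEVANCE` comparator could drive a streaming merge of per-node pages instead of a full sort.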
> > >
> > > *Reducing on Ignite*
> > >
> > > The obvious point of distributed response reduction is the class GridCacheDistributedQueryFuture.
> > > However, @Ivan Pavlukhin mentioned a class with similar functionality: ReduceIndexSorted.
> > > What I see here is that it is tangled with H2-related classes (org.h2.result.Row) and might not be unifiable with TextQuery reduction.
> > >
> > > Support is still needed here.
> > >
> > > Overall, the goal of this letter is to initiate a discussion on the TextQuery sorting implementation and come closer to ticket creation.
> > >
> > > BR,
> > > Yuriy Shuliha
> > >
> > > On Tue, Oct 22, 2019 at 13:31, Andrey Mashenkov <[hidden email]> wrote:
> > >
> > > > Hi Dmitry, Yuriy.
> > > >
> > > > I've found that GridCacheQueryFutureAdapter has a newly added AtomicInteger 'total' field and a 'limit' field as a primitive int.
> > > >
> > > > Both fields are used inside a synchronized block only.
> > > > So, we can make both private and downgrade AtomicInteger to a primitive int.
> > > >
> > > > Most likely, these fields can be replaced with one field.
> >> > >> >> > > > > > > > > > > > > > > >> > >> >> > > > > > > > > > > > > > BR, > >> > >> >> > > > > > > > > > > > > > Yuriy Shuliha > >> > >> >> > > > > > > > > > > > > > > >> > >> >> > > > > > > > > > > > > > >> > >> >> > > > > > > > > > > > > > >> > >> >> > > > > > > > > > > > > -- > >> > >> >> > > > > > > > > > > > > Best regards, > >> > >> >> > > > > > > > > > > > > Andrey V. Mashenkov > >> > >> >> > > > > > > > > > > > > > >> > >> >> > > > > > > > > > > > > >> > >> >> > > > > > > > > > > > >> > >> >> > > > > > > > > > > > >> > >> >> > > > > > > > > > > -- > >> > >> >> > > > > > > > > > > > >> > >> >> > > > > > > > > > > Best regards, > >> > >> >> > > > > > > > > > > Alexei Scherbakov > >> > >> >> > > > > > > > > > > > >> > >> >> > > > > > > > > > > >> > >> >> > > > > > > > > > >> > >> >> > > > > > > > >> > >> >> > > > > > > > >> > >> >> > > > > > > > >> > >> >> > > > > > > -- > >> > >> >> > > > > > > Best regards, > >> > >> >> > > > > > > Ivan Pavlukhin > >> > >> >> > > > > > > > >> > >> >> > > > > > >> > >> >> > > > > > >> > >> >> > > > > > >> > >> >> > > > > -- > >> > >> >> > > > > Best regards, > >> > >> >> > > > > Ivan Pavlukhin > >> > >> >> > > > > > >> > >> >> > > > > >> > >> >> > > > >> > >> >> > > >> > >> >> > > >> > >> >> > -- > >> > >> >> > Best regards, > >> > >> >> > Andrey V. Mashenkov > >> > >> >> > > >> > >> >> > >> > >> > > >> > >> > > >> > >> > -- > >> > >> > Best regards, > >> > >> > Andrey V. Mashenkov > >> > >> > > >> > >> > >> > > > >> > > >> > -- > >> > Best regards, > >> > Andrey V. Mashenkov > >> > > >> > > > > |
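The merge problem raised in this thread (per-node limits, results arriving asynchronously, final order depending on page arrival) can be illustrated with a small, self-contained Java sketch. It assumes each data node has already sorted its hits by descending relevance score, as Lucene's `TopDocs.scoreDocs` are, and truncated them to the limit; the reducing node then performs a k-way merge with a priority queue and applies the limit again. `Hit`, `mergeTopN`, and the tie-break by key are hypothetical illustrations, not code from Ignite's `GridCacheDistributedQueryFuture` or `ReduceIndexSorted`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch: k-way merge of per-node, score-sorted hits plus a global limit. Names are illustrative, not Ignite APIs. */
public class ScoreMergeSketch {

    /** A hit as a data node might send it: the cache key plus its relevance score (hypothetical type). */
    static final class Hit {
        final String key;
        final float score;

        Hit(String key, float score) {
            this.key = key;
            this.score = score;
        }

        @Override
        public String toString() {
            return key + "(" + score + ")";
        }
    }

    /** Orders by descending score; equal scores fall back to the key so the output is deterministic. */
    static int compareHits(Hit a, Hit b) {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : a.key.compareTo(b.key);
    }

    /** Read position inside one node's already-sorted list of hits. */
    static final class Cursor {
        final List<Hit> hits;
        int pos;

        Cursor(List<Hit> hits) {
            this.hits = hits;
        }

        Hit current() {
            return hits.get(pos);
        }
    }

    /** Merges per-node lists (each sorted by descending score) and keeps at most {@code limit} hits. */
    static List<Hit> mergeTopN(List<List<Hit>> perNode, int limit) {
        PriorityQueue<Cursor> heads =
            new PriorityQueue<>((a, b) -> compareHits(a.current(), b.current()));
        for (List<Hit> nodeHits : perNode) {
            if (!nodeHits.isEmpty()) {
                heads.add(new Cursor(nodeHits));
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < limit && !heads.isEmpty()) {
            Cursor best = heads.poll(); // node whose current hit has the highest score
            merged.add(best.current());
            best.pos++;
            if (best.pos < best.hits.size()) {
                heads.add(best); // re-queue with its next hit as the new head
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Each node has already sorted by score and applied the limit (3) locally.
        List<Hit> node1 = List.of(new Hit("a", 9.1f), new Hit("b", 4.0f), new Hit("c", 1.5f));
        List<Hit> node2 = List.of(new Hit("d", 7.3f), new Hit("e", 6.2f), new Hit("f", 0.4f));

        // Response order must not matter: both calls print the same global top 3.
        System.out.println(mergeTopN(List.of(node1, node2), 3));
        System.out.println(mergeTopN(List.of(node2, node1), 3));
    }
}
```

Both calls in `main` print `[a(9.1), d(7.3), e(6.2)]`: because the merge compares scores rather than arrival order, the result no longer depends on which node answers first. This only works if the score travels with each hit, which is why the score would have to be part of the node response.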
Ok, let's forget about Solr and go the ASF way: if Yuriy proves this functionality is helpful and opens a PR for it, why not? Don't you agree?

> On Tuesday, 26 November 2019, 14:06 +03:00, Ilya Kasnacheev <[hidden email]> wrote:
>
> Hello!
>
> The problem here is that Solr is a multi-year effort by a lot of people. We can't match that.
>
> Maybe we could integrate with Solr/Solr Cloud instead, by feeding our cache information into their storage for indexing and relying on their own mechanisms for distributed IR sorting?
>
> Regards,
> --
> Ilya Kasnacheev
>
> On Tue, 26 Nov 2019 at 13:59, Zhenya Stanilovsky <[hidden email]> wrote:
>
> Ilya Kasnacheev, what is the problem with having Solr-like functionality in Ignite?
>
> thanks!
>
> On Tuesday, 26 November 2019, 13:50 +03:00, Ilya Kasnacheev <[hidden email]> wrote:
>
> Hello!
>
> I have a hunch that we are trying to build Apache Solr (or Solr Cloud) into Apache Ignite. I think that's a lot of effort that is not very justified.
>
> I don't think we should try to implement sorting in Apache Ignite, because it is a lot of work, and a lot of code in our code base which we don't really want.
>
> Regards,
> --
> Ilya Kasnacheev
>
> On Fri, 22 Nov 2019 at 20:59, Yuriy Shuliga <[hidden email]> wrote:
>
> Dear Igniters,
>
> The first part of the TextQuery improvement, a result limit, has been developed and merged.
> Now we have to develop the most important functionality here: proper sorting of the Lucene index response and correct reduction of the results for distributed queries.
>
> *There are two Lucene-based aspects*
>
> 1. When no sort fields are used, the documents in the response are still ordered by relevance.
> Actually, this is the ScoreDoc.score value.
> In order to reduce the distributed results correctly, the score should be passed with the response.
>
> 2. When sorting by conventional fields, Lucene should have these fields properly indexed, and a corresponding Sort object should be applied to Lucene's search call.
> To mark those fields, a new annotation such as '@SortField' may be introduced.
>
> *Reducing on the Ignite side*
>
> The obvious place for distributed response reduction is the GridCacheDistributedQueryFuture class.
> However, @Ivan Pavlukhin mentioned a class with similar functionality: ReduceIndexSorted.
> What I see there is that it is tangled with H2-related classes (org.h2.result.Row) and might not be unifiable with TextQuery reduction.
>
> Support is still needed here.
>
> Overall, the goal of this letter is to initiate a discussion on the TextQuery sorting implementation and to move closer to creating a ticket.
>
> BR,
> Yuriy Shuliha
>
> On Tue, 22 Oct 2019 at 13:31, Andrey Mashenkov <[hidden email]> wrote:
>
> Hi Dmitry, Yuriy.
>
> I've found that GridCacheQueryFutureAdapter has a newly added AtomicInteger 'total' field and a 'limit' field as a primitive int.
>
> Both fields are used only inside a synchronized block.
> So we can make both private and downgrade the AtomicInteger to a primitive int.
>
> Most likely, these fields can be replaced with a single field.
>
> On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <[hidden email]> wrote:
>
> Hi Andrey,
>
> I've checked the comments on this ticket, and there is a TC Bot visa (with no blockers).
>
> Do you have any concerns related to this patch?
>
> Sincerely,
> Dmitriy Pavlov
>
> On Thu, 17 Oct 2019 at 16:43, Yuriy Shuliga <[hidden email]> wrote:
>
> Andrey,
>
> Per your request, I created ticket https://issues.apache.org/jira/browse/IGNITE-12291 linked to https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
>
> Could you please proceed with the PR merge?
>
> BR,
> Yuriy Shuliha
>
> On Wed, 9 Oct 2019 at 12:52, Andrey Mashenkov <[hidden email]> wrote:
>
> Hi Yuri,
>
> To get access to the TC Bot, you should register as a TeamCity user [1], if you haven't done so already.
> Then you will be able to log in to the Ignite TC Bot page with the same credentials.
>
> [1] https://ci.ignite.apache.org/registerUser.html
>
> On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <[hidden email]> wrote:
>
> Andrew,
>
> I have corrected the PR according to your notes. Please review.
> What will be the next steps to get it merged?
>
> Y.
>
> On Thu, 3 Oct 2019 at 17:47, Andrey Mashenkov <[hidden email]> wrote:
>
> Yuri,
>
> I've finished the review.
> No crime found, only a trivial compatibility bug.
>
> On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <[hidden email]> wrote:
>
> Denis,
>
> Thank you for your attention to this.
> As of now, the https://issues.apache.org/jira/browse/IGNITE-12189 ticket is still pending review.
> Do we have a chance to move it forward somehow?
>
> BR,
> Yuriy Shuliha
>
> On Mon, 30 Sep 2019 at 23:35, Denis Magda <[hidden email]> wrote:
>
> Yuriy,
>
> I've seen you open a pull request with the first changes: https://issues.apache.org/jira/browse/IGNITE-12189
>
> Alex Scherbakov and Ivan, are you the right people to do the review?
>
> -
> Denis
>
> On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <[hidden email]> wrote:
>
> Yuriy,
>
> Thank you for providing details! Quite interesting.
>
> Yes, we already support a distributed limit and merging of sorted sub-results for SQL queries. E.g. ReduceIndexSorted and MergeStreamIterator are used for merging sorted streams.
>
> Could you please also clarify the score/relevance? Is it provided by the Lucene engine for each query result? I am thinking about how to do a sorted merge properly in this case.
>
> On Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga <[hidden email]> wrote:
>
> Ivan,
>
> Thank you for the interesting question!
>
> Text searches (or full-text searches) are mostly human-oriented, and the user's point of interest is the topmost part of the response.
> The user can then read it, evaluate it, and use the given records for further purposes.
>
> In our particular case, we use Ignite for operations on financial data, and there is a lot of text such as asset names, financial instruments, companies, etc.
> To operate on this quickly and reliably, users are accustomed to working with text search, type-ahead completions, and suggestions.
>
> For these purposes, we index particular string data in separate caches.
>
> Sorting capabilities and response size limits are very important there, as our API has to provide the most relevant information within a limited size.
>
> Now let me comment on the Ignite/Lucene perspective.
> Actually, when Ignite queries Lucene, it gets *TopDocs.scoreDocs* already sorted by *score* (relevance), so the most relevant documents are at the top.
> However, distributed query responses from different nodes are currently merged into the final query cursor queue in an arbitrary order.
> So in fact, the score order is already ruined at this point.
> Also, Ignite requests all possible documents from Lucene, which is redundant and bad for performance.
>
> I'm implementing a *limit* parameter as part of *TextQuery*, and I have to note that we still need to add sorting to text query processing in order to get usable results.
>
> The *limit* parameter by itself should fix part of the issues above, but sorting, at least by document score, should definitely be implemented along with the limit.
>
> This is a pretty short commentary; if you still have any questions, please don't hesitate to ask :)
>
> BR,
> Yuriy Shuliha
>
> On Thu, 19 Sep 2019 at 11:38, Павлухин Иван <[hidden email]> wrote:
>
> Yuriy,
>
> I greatly appreciate your interest.
>
> Could you please elaborate a little on sorting? What tasks does it help to solve, and how? It would be great if you could provide an example.
>
> On Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <[hidden email]> wrote:
>
> Denis,
>
> I like the idea of throwing an exception when text queries are enabled on persistent caches.
>
> I'm also fine with the proposed limit for unsorted searches.
>
> Yury, please proceed with creating the ticket.
>
> On Tue, 17 Sep 2019, 22:06, Denis Magda <[hidden email]> wrote:
>
> Igniters,
>
> I see nothing wrong with Yury's proposal regarding the evolution of the full-text search API, as long as Yury is ready to push it forward.
>
> As for the in-memory-only mode, it makes total sense for in-memory data grid deployments where Ignite caches data of an underlying DB like Postgres.
> As part of the changes, I would simply throw an exception (by default) if someone attempts to use text indices with native persistence enabled.
> If the person is ready to live with that limitation, an explicit configuration change is needed to get around the exception.
>
> Thoughts?
>
> -
> Denis
>
> On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga <[hidden email]> wrote:
>
> Hello to all again,
>
> Thank you for the important comments and notes given below!
>
> Let me answer and continue the discussion.
>
> (I) Overall needs in Lucene indexing
>
> Alexei referred to https://issues.apache.org/jira/browse/IGNITE-5371, where the absence of index persistence was declared an obstacle to further development.
>
> a) This ticket is already closed as invalid.
> b) There are definite needs (in our project as well) for in-memory-only indexing of selected data.
> We intend to use the search capabilities to fetch a limited number of records for type-ahead search / suggestions.
> Not all of the data will be indexed, and there is no need for the Lucene index to be persistent. I hope this is a common pattern of text-search usage.
>
> (II) Necessary fixes in the current implementation
>
> a) Implementation of a correct *limit* (*offset* does not seem to be required for text-search tasks for now)
> I have investigated the data flow for distributed text queries using a simple test prefix query like 'name'='ene*'.
> For now, each server node returns all matching records to the client node, and that may be thousands or hundreds of thousands of records,
> even if we need only the first 10-100. Again, all the results are added to the queue in GridCacheQueryFutureAdapter page by page, in arbitrary order.
> I did not find any means here to deliver a deterministic result.
> So implementing the limit as part of the query (and of GridCacheQueryRequest) will not change the nature of the response, but it will limit the load on nodes and the network.
>
> Can we consider opening a ticket for this?
>
> (III) Further exposure of the Lucene API to Ignite
>
> a) Sorting
> The solution for this could be:
> - Make entities comparable
> - Add a custom comparator to the entity
> - Add annotations to mark sorted fields for Lucene indexing
> - Use comparators when merging responses or reducing to the desired limit on the client node.
> This will require the full result set to be loaded into memory, though it can be used for relatively small limits.
>
> BR,
> Yuriy Shuliha
>
> On Fri, 30 Aug 2019 at 10:37, Alexei Scherbakov <[hidden email]> wrote:
>
> Yuriy,
>
> Note that one of the major blockers for text queries is [1], which makes Lucene indexes unusable with persistence and is the main reason for discontinuation.
> It should probably be addressed first to make text queries a valid product feature.
>
> Distributed sorting and advanced querying are indeed not trivial tasks.
> Some kind of merging must be implemented on the query-originating node.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-5371
>
> On Thu, 29 Aug 2019 at 23:38, Denis Magda <[hidden email]> wrote:
>
> Yuriy,
>
> If you are ready to take over the full-text search indexes, then please go ahead.
> The primary reasons why the community wants to discontinue them first (and probably resurrect them later) are the limitations listed by Andrey and minimal support from the community's side.
>
> -
> Denis
>
> On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <[hidden email]> wrote:
>
> Hi Yuriy,
>
> Unfortunately, there is a plan to discontinue TextQueries in Ignite [1].
> The motivation is that text indexes are not persistent, not transactional, and can't be used together with SQL or inside SQL,
> and there is a lack of interest from the community.
> You are welcome to take on these issues and make TextQueries great.
>
> 1. PageSize can't be used to limit the result set.
> Query results are returned from a data node to the client-side cursor page by page, and this parameter is designed to control the page size.
> The query is supposed to execute lazily on the server side, and the full result set is not expected to be loaded into memory on the server side at once, but page by page.
> Do you mean you found that Lucene loads the entire result set into memory before the first page is sent to the client?
>
> I'd think a new parameter should be added to limit the result. The best solution is to use query language commands for this, e.g. "LIMIT/OFFSET" in SQL.
>
> This task doesn't look trivial. A query is a distributed operation: the same user query will be executed on the data nodes, and then the results from all nodes should be correctly merged before being returned via the client cursor.
> So LIMIT should be applied on every node and then again in the merge phase.
>
> Also, and this may be non-obvious, limiting results makes no sense without sorting, as there is no guarantee that each subsequent run of the query will return the same data, because pages may be reordered.
> Basically, the merge phase receives results from data nodes asynchronously, and messages from different nodes can't be ordered.
>
> 2.
> a. The "tokenize" parameter name (for @QueryTextField) looks more verbose, doesn't it?
> b, c. What about distributed queries? How will partial results from nodes be merged?
> Does Lucene allow configuring a comparator for data sorting?
> What comparator should Ignite choose to sort results in the merge phase?
>
> 3. For now, the Lucene engine is not configurable at all. E.g. it is impossible to configure the Tokenizer.
> I'd think about possible ways to configure the engine first, and only then go further to discuss/implement complex features that may depend on the engine configuration.
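The distinction Andrey draws between page size and a result limit can be sketched in plain Java. The sketch assumes a data node that serves an already-sorted result set page by page; `PageSource`, `LimitedPagedCursor`, and `drain` are hypothetical names, not Ignite classes. The cursor requests pages lazily and stops as soon as a separate `limit` is reached, so page size only affects how the results travel.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Sketch: page size controls transfer granularity; a separate limit controls result count. Not Ignite classes. */
public class PagedCursorSketch {

    /** Simulates a data node serving an already-sorted result set page by page (hypothetical). */
    static final class PageSource {
        private final List<Integer> results;
        int pagesServed; // how many page requests reached this "node"

        PageSource(List<Integer> results) {
            this.results = results;
        }

        /** Returns up to pageSize items starting at offset; an empty list means no more data. */
        List<Integer> fetchPage(int offset, int pageSize) {
            pagesServed++;
            int end = Math.min(results.size(), offset + pageSize);
            return offset >= end ? List.of() : results.subList(offset, end);
        }
    }

    /** Client-side cursor that pulls pages lazily and stops once {@code limit} items were returned. */
    static final class LimitedPagedCursor implements Iterator<Integer> {
        private final PageSource source;
        private final int pageSize;
        private final int limit;
        private List<Integer> page = List.of();
        private int posInPage;
        private int fetched;  // items pulled from the source so far
        private int returned; // items handed to the caller so far
        private boolean exhausted;

        LimitedPagedCursor(PageSource source, int pageSize, int limit) {
            this.source = source;
            this.pageSize = pageSize;
            this.limit = limit;
        }

        @Override
        public boolean hasNext() {
            if (returned >= limit || exhausted) {
                return false;
            }
            if (posInPage < page.size()) {
                return true;
            }
            // Current page consumed: request the next one only now (lazy paging).
            page = source.fetchPage(fetched, pageSize);
            fetched += page.size();
            posInPage = 0;
            exhausted = page.isEmpty();
            return !exhausted;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            returned++;
            return page.get(posInPage++);
        }
    }

    /** Collects everything the cursor yields. */
    static List<Integer> drain(Iterator<Integer> it) {
        List<Integer> out = new ArrayList<>();
        it.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> sorted = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            sorted.add(i);
        }
        PageSource small = new PageSource(sorted);
        PageSource large = new PageSource(sorted);

        List<Integer> withSmallPages = drain(new LimitedPagedCursor(small, 10, 25));
        List<Integer> withLargePages = drain(new LimitedPagedCursor(large, 1000, 25));

        // Same 25 results regardless of page size; only the number of page requests differs.
        System.out.println(withSmallPages.equals(withLargePages) + " " + withSmallPages.size());
        System.out.println(small.pagesServed + " vs " + large.pagesServed);
    }
}
```

Running `main` prints `true 25` and then `3 vs 1`: the same 25 results come back whether pages hold 10 or 1000 items, and only the number of page requests changes. In a distributed query, such a limit would still have to be applied on every node and again at the merge step.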
>
> On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <[hidden email]> wrote:
>
> Dear community,
>
> By starting this chain, I'd like to open a discussion that would lead to contributions in the subject area.
>
> Ignite has indexing capabilities backed by different mechanisms, including Lucene.
>
> Currently, Lucene 7.5.0 is used (released last year).
> This is a widespread and mature technology that covers the text search area and beyond (e.g. spatial data indexing).
>
> My goal is to *expose more Lucene functionality to Ignite indexing and query mechanisms for text data*.
>
> It's quite a simple request at the current stage. It comes from our project's needs, but I believe it will be useful for a lot more people.
> Let's walk through the items and vote on or discuss Jira tickets for them.
>
> 1. [trivial] Use dataQuery.getPageSize() to limit the search response items inside GridLuceneIndex.query(). Currently it calls IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored matches will be returned, which we do not need in most cases.
>
> 2. [simple] Add sorting. Then a more capable search call can be executed: *IndexSearcher.search(query, count, sort)*
> Implementation steps:
> a) Introduce a boolean *sortField* parameter in the *@QueryTextField* annotation. If *true*, the field will be indexed but not tokenized. Number types are preferred here.
> b) Add a *sort* collection to the *TextQuery* constructor. It should define the desired sort fields used for querying.
> c) Implement Lucene sort usage in GridLuceneIndex.query().
>
> 3. [moderate] Build complex queries with *TextQuery*, including term/query boosting.
> *This section is for voting only, as it requires more detailed work. It should be extended if the community is interested in it.*
>
> Looking forward to your comments!
>
> BR,
> Yuriy Shuliha
>
> -- 
> Best regards,
> Andrey V. Mashenkov
>
> -- 
> Best regards,
> Alexei Scherbakov
>
> -- 
> Best regards,
> Ivan Pavlukhin
>
> -- 
> Best regards,
> Ivan Pavlukhin
>
> -- 
> Best regards,
> Andrey V. Mashenkov
>
> -- 
> Best regards,
> Andrey V. Mashenkov
>
> -- 
> Best regards,
> Andrey V. Mashenkov
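Item 1 of the proposal is about not keeping every scored match (`Integer.MAX_VALUE`) when only the first N are needed. The thread later settled on a dedicated limit parameter rather than `pageSize`.

The sketch below shows the usual bounded top-N selection that makes such a limit cheap: a min-heap that never holds more than N hits. It is plain Java under stated assumptions. `Hit` is a hypothetical stand-in for Lucene's `ScoreDoc`, and `BoundedTopN.topN` is not a Lucene or Ignite method. In Lucene itself, passing N as the hit count to `IndexSearcher.search(query, n)` leaves this bookkeeping to its own collector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for Lucene's ScoreDoc: a document id and its relevance score. */
final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

final class BoundedTopN {
    /** Returns the n highest-scoring hits, best first, holding at most n hits at any time. */
    static List<Hit> topN(Iterable<Hit> matches, int n) {
        // Min-heap by score: its root is the weakest of the best hits kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Math.max(1, n), (a, b) -> Float.compare(a.score, b.score));
        for (Hit h : matches) {
            if (heap.size() < n)
                heap.add(h);
            else if (n > 0 && h.score > heap.peek().score) {
                heap.poll(); // evict the current weakest hit
                heap.add(h);
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort((a, b) -> Float.compare(b.score, a.score)); // best hit first
        return result;
    }
}
```

The cost is O(matches * log N) time and O(N) memory. This is why a small limit per node also reduces the load on the nodes and on the network, as the thread notes.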
Hello!
ASF way should probably start with an IEP :)

Regards,
-- 
Ilya Kasnacheev

On Tue, Nov 26, 2019 at 14:12, Zhenya Stanilovsky <[hidden email]> wrote:

> Ok, let's forget Solr and go the ASF way: if Yuriy proves this functionality is helpful and opens a PR for it, why not?
>
> Right?
>
> Tuesday, November 26, 2019, 14:06 +03:00, from Ilya Kasnacheev <[hidden email]>:
>
> Hello!
>
> The problem here is that Solr is a multi-year effort by a lot of people. We can't match that.
>
> Maybe we could integrate with Solr/Solr Cloud instead, by feeding our cache information into their storage for indexing and relying on their own mechanisms for distributed IR sorting?
>
> Regards,
> -- 
> Ilya Kasnacheev
>
> On Tue, Nov 26, 2019 at 13:59, Zhenya Stanilovsky <[hidden email]> wrote:
>
> Ilya Kasnacheev, what is the problem with having Solr-like functionality in Ignite?
>
> Thanks!
>
> Tuesday, November 26, 2019, 13:50 +03:00, from Ilya Kasnacheev <[hidden email]>:
>
> Hello!
>
> I have a hunch that we are trying to build Apache Solr (or Solr Cloud) into Apache Ignite. I think that's a lot of effort that is not very justified.
>
> I don't think we should try to implement sorting in Apache Ignite, because it is a lot of work, and a lot of code in our code base which we don't really want.
>
> Regards,
> -- 
> Ilya Kasnacheev
>
> On Fri, Nov 22, 2019 at 20:59, Yuriy Shuliga <[hidden email]> wrote:
>
> Dear Igniters,
>
> The first part of the TextQuery improvement, a result limit, was developed and merged.
> Now we have to develop the most important functionality here: proper sorting of the Lucene index response and correct reduction of the responses for distributed queries.
>
> *There are two Lucene-based aspects*
>
> 1. When no sorting fields are used, the documents in the response are still ordered by relevance.
> Actually, this is the ScoreDoc.score value.
> In order to reduce the distributed results correctly, the score should be passed along with the response.
>
> 2. When sorting by conventional fields, Lucene should have these fields properly indexed, and the corresponding Sort object should be applied to Lucene's search call.
> In order to mark those fields, a new annotation like '@SortField' may be introduced.
>
> *Reducing on Ignite*
>
> The obvious point of distributed response reduction is the class GridCacheDistributedQueryFuture.
> However, @Ivan Pavlukhin mentioned a class with similar functionality: ReduceIndexSorted.
> What I see there is that it is tangled with H2-related classes (org.h2.result.Row) and might not be unifiable with TextQuery reduction.
>
> Support is still needed here.
>
> Overall, the goal of this letter is to initiate a discussion on the TextQuery sorting implementation and come closer to ticket creation.
>
> BR,
> Yuriy Shuliha
>
> On Tue, Oct 22, 2019 at 13:31, Andrey Mashenkov <[hidden email]> wrote:
>
> Hi Dmitry, Yuriy.
>
> I've found that GridCacheQueryFutureAdapter has a newly added AtomicInteger 'total' field and a 'limit' field as a primitive int.
>
> Both fields are used inside a synchronized block only.
> So, we can make both private and downgrade the AtomicInteger to a primitive int.
> >> >> > > >> >> > > >> >> > > >> >> > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov < > [hidden email] > >> > > >> >> > wrote: > >> >> > > >> >> > > Hi Andrey, > >> >> > > > >> >> > > I've checked this ticket comments, and there is a TC Bot visa > (with > >> no > >> >> > > blockers). > >> >> > > > >> >> > > Do you have any concerns related to this patch? > >> >> > > > >> >> > > Sincerely, > >> >> > > Dmitriy Pavlov > >> >> > > > >> >> > > чт, 17 окт. 2019 г. в 16:43, Yuriy Shuliga < [hidden email] > >: > >> >> > > > >> >> > >> Andrey, > >> >> > >> > >> >> > >> Per you request, I created ticket > >> >> > >> https://issues.apache.org/jira/browse/IGNITE-12291 linked to > >> >> > >> > >> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189 > >> >> > >> > >> >> > >> Could you please proceed with PR merge ? > >> >> > >> > >> >> > >> BR, > >> >> > >> Yuriy Shuliha > >> >> > >> > >> >> > >> ср, 9 жовт. 2019 о 12:52 Andrey Mashenkov < > >> [hidden email] > >> >> > > >> >> > >> пише: > >> >> > >> > >> >> > >> > Hi Yuri, > >> >> > >> > > >> >> > >> > To get access to TC Bot you should register as TeamCity user > >> [1], if > >> >> > you > >> >> > >> > didn't do this already. > >> >> > >> > Then you will be able to authorize on Ignite TC Bot page with > >> same > >> >> > >> > credentials. > >> >> > >> > > >> >> > >> > [1] https://ci.ignite.apache.org/registerUser.html > >> >> > >> > > >> >> > >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga < > [hidden email] > >> > > >> >> > wrote: > >> >> > >> > > >> >> > >> >> Andrew, > >> >> > >> >> > >> >> > >> >> I have corrected PR according to your notes. Please review. > >> >> > >> >> What will be the next steps in order to merge in? > >> >> > >> >> > >> >> > >> >> Y. > >> >> > >> >> > >> >> > >> >> чт, 3 жовт. 2019 о 17:47 Andrey Mashenkov < > >> >> > [hidden email] > > >> >> > >> >> пише: > >> >> > >> >> > >> >> > >> >> > Yuri, > >> >> > >> >> > > >> >> > >> >> > I've done with review. 
> >> >> > >> >> > No crime found, but trivial compatibility bug. > >> >> > >> >> > > >> >> > >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga < > >> [hidden email] > > >> >> > >> wrote: > >> >> > >> >> > > >> >> > >> >> > > Denis, > >> >> > >> >> > > > >> >> > >> >> > > Thank you for your attention to this. > >> >> > >> >> > > as for now, the > >> >> > https://issues.apache.org/jira/browse/IGNITE-12189 > >> >> > >> >> > ticket > >> >> > >> >> > > is still pending review. > >> >> > >> >> > > Do we have a chance to move it forward somehow? > >> >> > >> >> > > > >> >> > >> >> > > BR, > >> >> > >> >> > > Yuriy Shuliha > >> >> > >> >> > > > >> >> > >> >> > > пн, 30 вер. 2019 о 23:35 Denis Magda < > [hidden email] > > >> пише: > >> >> > >> >> > > > >> >> > >> >> > > > Yuriy, > >> >> > >> >> > > > > >> >> > >> >> > > > I've seen you opening a pull-request with the first > >> changes: > >> >> > >> >> > > > https://issues.apache.org/jira/browse/IGNITE-12189 > >> >> > >> >> > > > > >> >> > >> >> > > > Alex Scherbakov and Ivan are you the right guys to do > the > >> >> > review? > >> >> > >> >> > > > > >> >> > >> >> > > > - > >> >> > >> >> > > > Denis > >> >> > >> >> > > > > >> >> > >> >> > > > > >> >> > >> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван < > >> >> > >> [hidden email] > > >> >> > >> >> > > wrote: > >> >> > >> >> > > > > >> >> > >> >> > > > > Yuriy, > >> >> > >> >> > > > > > >> >> > >> >> > > > > Thank you for providing details! Quite interesting. > >> >> > >> >> > > > > > >> >> > >> >> > > > > Yes, we already have support of distributed limit and > >> >> merging > >> >> > >> >> sorted > >> >> > >> >> > > > > subresults for SQL queries. E.g. ReduceIndexSorted > and > >> >> > >> >> > > > > MergeStreamIterator are used for merging sorted > streams. > >> >> > >> >> > > > > > >> >> > >> >> > > > > Could you please also clarify about score/relevance? 
> Is > >> it > >> >> > >> >> provided > >> >> > >> >> > by > >> >> > >> >> > > > > Lucene engine for each query result? I am thinking > how > >> to > >> >> do > >> >> > >> >> sorted > >> >> > >> >> > > > > merge properly in this case. > >> >> > >> >> > > > > > >> >> > >> >> > > > > ср, 25 сент. 2019 г. в 18:56, Yuriy Shuliga < > >> >> > [hidden email] > >> >> > >> >: > >> >> > >> >> > > > > > > >> >> > >> >> > > > > > Ivan, > >> >> > >> >> > > > > > > >> >> > >> >> > > > > > Thank you for interesting question! > >> >> > >> >> > > > > > > >> >> > >> >> > > > > > Text searches (or full text searches) are mostly > >> >> > >> human-oriented. > >> >> > >> >> > And > >> >> > >> >> > > > the > >> >> > >> >> > > > > > point of user's interest is topmost part of > response. > >> >> > >> >> > > > > > Then user can read it, evaluate and use the given > >> records > >> >> > for > >> >> > >> >> > further > >> >> > >> >> > > > > > purposes. > >> >> > >> >> > > > > > > >> >> > >> >> > > > > > Particularly in our case, we use Ignite for > operations > >> >> with > >> >> > >> >> > financial > >> >> > >> >> > > > > data, > >> >> > >> >> > > > > > and there lots of text stuff like assets names, > fin. > >> >> > >> >> instruments, > >> >> > >> >> > > > > companies > >> >> > >> >> > > > > > etc. > >> >> > >> >> > > > > > In order to operate with this quickly and reliably, > >> users > >> >> > >> used > >> >> > >> >> to > >> >> > >> >> > > work > >> >> > >> >> > > > > with > >> >> > >> >> > > > > > text search, type-ahead completions, suggestions. > >> >> > >> >> > > > > > > >> >> > >> >> > > > > > For this purposes we are indexing particular string > >> data > >> >> in > >> >> > >> >> > separate > >> >> > >> >> > > > > caches. > >> >> > >> >> > > > > > > >> >> > >> >> > > > > > Sorting capabilities and response size limitations > are > >> >> very > >> >> > >> >> > important > >> >> > >> >> > > > > > there. 
As our API have to provide most relevant > >> >> information > >> >> > >> in > >> >> > >> >> view > >> >> > >> >> > > of > >> >> > >> >> > > > > > limited size. > >> >> > >> >> > > > > > > >> >> > >> >> > > > > > Now let me comment some Ignite/Lucene perspective. > >> >> > >> >> > > > > > Actually Ignite queries and Lucene returns > >> >> > >> *TopDocs.scoresDocs > >> >> > >> >> > > *already > >> >> > >> >> > > > > > sorted by *score *(relevance). So most relevant > >> documents > >> >> > >> are on > >> >> > >> >> > the > >> >> > >> >> > > > top. > >> >> > >> >> > > > > > And currently distributed queries responses from > >> >> different > >> >> > >> nodes > >> >> > >> >> > are > >> >> > >> >> > > > > merged > >> >> > >> >> > > > > > into final query cursor queue in arbitrary way. > >> >> > >> >> > > > > > So in fact we already have the score order ruined > >> here. > >> >> > Also > >> >> > >> >> Ignite > >> >> > >> >> > > > > > requests all possible documents from Lucene that is > >> >> > redundant > >> >> > >> >> and > >> >> > >> >> > not > >> >> > >> >> > > > > good > >> >> > >> >> > > > > > for performance. > >> >> > >> >> > > > > > > >> >> > >> >> > > > > > I'm implementing *limit* parameter to be part of > >> >> *TextQuery > >> >> > >> *and > >> >> > >> >> > have > >> >> > >> >> > > > to > >> >> > >> >> > > > > > notice that we still have to add sorting for text > >> queries > >> >> > >> >> > processing > >> >> > >> >> > > in > >> >> > >> >> > > > > > order to have applicable results. > >> >> > >> >> > > > > > > >> >> > >> >> > > > > > *Limit* parameter itself should improve the part of > >> >> issues > >> >> > >> from > >> >> > >> >> > > above, > >> >> > >> >> > > > > but > >> >> > >> >> > > > > > definitely, sorting by document score at least > should > >> be > >> >> > >> >> > implemented > >> >> > >> >> > > > > along > >> >> > >> >> > > > > > with limit. 
> >> >> > >> >> > > > > > > >> >> > >> >> > > > > > This is a pretty short commentary if you still have > >> any > >> >> > >> >> questions, > >> >> > >> >> > > > please > >> >> > >> >> > > > > > ask, do not hesitate) > >> >> > >> >> > > > > > > >> >> > >> >> > > > > > BR, > >> >> > >> >> > > > > > Yuriy Shuliha > >> >> > >> >> > > > > > > >> >> > >> >> > > > > > чт, 19 вер. 2019 о 11:38 Павлухин Иван < > >> >> > [hidden email] > > >> >> > >> >> пише: > >> >> > >> >> > > > > > > >> >> > >> >> > > > > > > Yuriy, > >> >> > >> >> > > > > > > > >> >> > >> >> > > > > > > Greatly appreciate your interest. > >> >> > >> >> > > > > > > > >> >> > >> >> > > > > > > Could you please elaborate a little bit about > >> sorting? > >> >> > What > >> >> > >> >> tasks > >> >> > >> >> > > > does > >> >> > >> >> > > > > > > it help to solve and how? It would be great to > >> provide > >> >> an > >> >> > >> >> > example. > >> >> > >> >> > > > > > > > >> >> > >> >> > > > > > > ср, 18 сент. 2019 г. в 09:39, Alexei Scherbakov < > >> >> > >> >> > > > > > > [hidden email] >: > >> >> > >> >> > > > > > > > > >> >> > >> >> > > > > > > > Denis, > >> >> > >> >> > > > > > > > > >> >> > >> >> > > > > > > > I like the idea of throwing an exception for > >> enabled > >> >> > text > >> >> > >> >> > queries > >> >> > >> >> > > > on > >> >> > >> >> > > > > > > > persistent caches. > >> >> > >> >> > > > > > > > > >> >> > >> >> > > > > > > > Also I'm fine with proposed limit for unsorted > >> >> > searches. > >> >> > >> >> > > > > > > > > >> >> > >> >> > > > > > > > Yury, please proceed with ticket creation. > >> >> > >> >> > > > > > > > > >> >> > >> >> > > > > > > > вт, 17 сент. 
2019 г., 22:06 Denis Magda < > >> >> > >> [hidden email] > >> >> > >> >> >: > >> >> > >> >> > > > > > > > > >> >> > >> >> > > > > > > > > Igniters, > >> >> > >> >> > > > > > > > > > >> >> > >> >> > > > > > > > > I see nothing wrong with Yury's proposal in > >> regards > >> >> > >> >> full-text > >> >> > >> >> > > > > search > >> >> > >> >> > > > > > > API > >> >> > >> >> > > > > > > > > evolution as long as Yury is ready to push it > >> >> > forward. > >> >> > >> >> > > > > > > > > > >> >> > >> >> > > > > > > > > As for the in-memory mode only, it makes > total > >> >> sense > >> >> > >> for > >> >> > >> >> > > > in-memory > >> >> > >> >> > > > > data > >> >> > >> >> > > > > > > > > grid deployments when Ignite caches data of > an > >> >> > >> underlying > >> >> > >> >> DB > >> >> > >> >> > > like > >> >> > >> >> > > > > > > Postgres. > >> >> > >> >> > > > > > > > > As part of the changes, I would simply throw > an > >> >> > >> exception > >> >> > >> >> (by > >> >> > >> >> > > > > default) > >> >> > >> >> > > > > > > if > >> >> > >> >> > > > > > > > > the one attempts to use text indices with the > >> >> native > >> >> > >> >> > > persistence > >> >> > >> >> > > > > > > enabled. > >> >> > >> >> > > > > > > > > If the person is ready to live with that > >> limitation > >> >> > >> that > >> >> > >> >> an > >> >> > >> >> > > > > explicit > >> >> > >> >> > > > > > > > > configuration change is needed to come around > >> the > >> >> > >> >> exception. > >> >> > >> >> > > > > > > > > > >> >> > >> >> > > > > > > > > Thoughts? 
> >> >> > >> >> > > > > > > > > > >> >> > >> >> > > > > > > > > > >> >> > >> >> > > > > > > > > - > >> >> > >> >> > > > > > > > > Denis > >> >> > >> >> > > > > > > > > > >> >> > >> >> > > > > > > > > > >> >> > >> >> > > > > > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy > Shuliga < > >> >> > >> >> > > [hidden email] > >> >> > >> >> > > > > > >> >> > >> >> > > > > > > wrote: > >> >> > >> >> > > > > > > > > > >> >> > >> >> > > > > > > > > > Hello to all again, > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > > Thank you for important comments and notes > >> given > >> >> > >> below! > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > > Let me answer and continue the discussion. > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > > (I) Overall needs in Lucene indexing > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > > Alexei has referenced to > >> >> > >> >> > > > > > > > > > > >> >> https://issues.apache.org/jira/browse/IGNITE-5371 > >> >> > >> where > >> >> > >> >> > > > > > > > > > absence of index persistence was declared > as > >> an > >> >> > >> >> obstacle to > >> >> > >> >> > > > > further > >> >> > >> >> > > > > > > > > > development. > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > > a) This ticket is already closed as not > >> valid.b) > >> >> > >> There > >> >> > >> >> are > >> >> > >> >> > > > > definite > >> >> > >> >> > > > > > > needs > >> >> > >> >> > > > > > > > > > (and in our project as well) in just > in-memory > >> >> > >> indexing > >> >> > >> >> of > >> >> > >> >> > > > > selected > >> >> > >> >> > > > > > > data. > >> >> > >> >> > > > > > > > > > We intend to use search capabilities for > >> fetching > >> >> > >> >> limited > >> >> > >> >> > > > amount > >> >> > >> >> > > > > of > >> >> > >> >> > > > > > > > > records > >> >> > >> >> > > > > > > > > > that should be used in type-ahead search / > >> >> > >> suggestions. 
> >> >> > >> >> > > > > > > > > > Not all of the data will be indexed and the > >> are > >> >> no > >> >> > >> need > >> >> > >> >> in > >> >> > >> >> > > > Lucene > >> >> > >> >> > > > > > > index > >> >> > >> >> > > > > > > > > to > >> >> > >> >> > > > > > > > > > be persistence. Hope this is a wide > pattern of > >> >> > >> >> text-search > >> >> > >> >> > > > usage. > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > > (II) Necessary fixes in current > >> implementation. > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > > a) Implementation of correct *limit > *(*offset* > >> >> > seems > >> >> > >> to > >> >> > >> >> be > >> >> > >> >> > > not > >> >> > >> >> > > > > > > required > >> >> > >> >> > > > > > > > > in > >> >> > >> >> > > > > > > > > > text-search tasks for now) > >> >> > >> >> > > > > > > > > > I have investigated the data flow for > >> distributed > >> >> > >> text > >> >> > >> >> > > queries. > >> >> > >> >> > > > > it > >> >> > >> >> > > > > > > was > >> >> > >> >> > > > > > > > > > simple test prefix query, like > 'name'*='ene*'* > >> >> > >> >> > > > > > > > > > For now each server-node returns all > response > >> >> > >> records to > >> >> > >> >> > the > >> >> > >> >> > > > > > > client-node > >> >> > >> >> > > > > > > > > > and it may contain ~thousands, ~hundred > >> thousands > >> >> > >> >> records. > >> >> > >> >> > > > > > > > > > Event if we need only first 10-100. Again, > all > >> >> the > >> >> > >> >> results > >> >> > >> >> > > are > >> >> > >> >> > > > > added > >> >> > >> >> > > > > > > to > >> >> > >> >> > > > > > > > > > queue in GridCacheQueryFutureAdapter in > >> arbitrary > >> >> > >> order > >> >> > >> >> by > >> >> > >> >> > > > pages. > >> >> > >> >> > > > > > > > > > I did not find here any means to deliver > >> >> > >> deterministic > >> >> > >> >> > > result. 
> >> >> > >> >> > > > > > > > > > So implementing limit as part of query and > >> >> > >> >> > > > > (GridCacheQueryRequest) > >> >> > >> >> > > > > > > will > >> >> > >> >> > > > > > > > > not > >> >> > >> >> > > > > > > > > > change the nature of response but will > limit > >> load > >> >> > on > >> >> > >> >> nodes > >> >> > >> >> > > and > >> >> > >> >> > > > > > > > > networking. > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > > Can we consider to open a ticket for this? > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > > (III) Further extension of Lucene API > >> exposition > >> >> to > >> >> > >> >> Ignite > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > > a) Sorting > >> >> > >> >> > > > > > > > > > The solution for this could be: > >> >> > >> >> > > > > > > > > > - Make entities comparable > >> >> > >> >> > > > > > > > > > - Add custom comparator to entity > >> >> > >> >> > > > > > > > > > - Add annotations to mark sorted fields for > >> >> Lucene > >> >> > >> >> indexing > >> >> > >> >> > > > > > > > > > - Use comparators when merging responses or > >> >> > reducing > >> >> > >> to > >> >> > >> >> > > desired > >> >> > >> >> > > > > > > limit on > >> >> > >> >> > > > > > > > > > client node. > >> >> > >> >> > > > > > > > > > Will require full result set to be loaded > into > >> >> > >> memory. > >> >> > >> >> > Though > >> >> > >> >> > > > > can be > >> >> > >> >> > > > > > > used > >> >> > >> >> > > > > > > > > > for relatively small limits. > >> >> > >> >> > > > > > > > > > BR, > >> >> > >> >> > > > > > > > > > Yuriy Shuliha > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > > пт, 30 серп. 
2019 о 10:37 Alexei > Scherbakov < > >> >> > >> >> > > > > > > > > [hidden email] > > >> >> > >> >> > > > > > > > > > пише: > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > > > Yuriy, > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > Note what one of major blockers for text > >> >> queries > >> >> > is > >> >> > >> >> [1] > >> >> > >> >> > > which > >> >> > >> >> > > > > makes > >> >> > >> >> > > > > > > > > > lucene > >> >> > >> >> > > > > > > > > > > indexes unusable with persistence and > main > >> >> reason > >> >> > >> for > >> >> > >> >> > > > > > > discontinuation. > >> >> > >> >> > > > > > > > > > > Probably it's should be addressed first > to > >> make > >> >> > >> text > >> >> > >> >> > > queries > >> >> > >> >> > > > a > >> >> > >> >> > > > > > > valid > >> >> > >> >> > > > > > > > > > > product feature. > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > Distributed sorting and advanved > querying is > >> >> > indeed > >> >> > >> >> not a > >> >> > >> >> > > > > trivial > >> >> > >> >> > > > > > > task. > >> >> > >> >> > > > > > > > > > > Some kind of merging must be implemented > on > >> >> query > >> >> > >> >> > > originating > >> >> > >> >> > > > > node. > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > [1] > >> >> > >> https://issues.apache.org/jira/browse/IGNITE-5371 > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > чт, 29 авг. 2019 г. в 23:38, Denis Magda > < > >> >> > >> >> > > [hidden email] > >> >> > >> >> > > > >: > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > Yuriy, > >> >> > >> >> > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > If you are ready to take over the > >> full-text > >> >> > >> search > >> >> > >> >> > > indexes > >> >> > >> >> > > > > then > >> >> > >> >> > > > > > > > > please > >> >> > >> >> > > > > > > > > > go > >> >> > >> >> > > > > > > > > > > > ahead. 
The primary reason why the > >> community > >> >> > >> wants to > >> >> > >> >> > > > > discontinue > >> >> > >> >> > > > > > > them > >> >> > >> >> > > > > > > > > > > first > >> >> > >> >> > > > > > > > > > > > (and, probable, resurrect later) are > the > >> >> > >> limitations > >> >> > >> >> > > listed > >> >> > >> >> > > > > by > >> >> > >> >> > > > > > > Andrey > >> >> > >> >> > > > > > > > > > and > >> >> > >> >> > > > > > > > > > > > minimal support from the community end. > >> >> > >> >> > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > - > >> >> > >> >> > > > > > > > > > > > Denis > >> >> > >> >> > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > On Thu, Aug 29, 2019 at 1:29 PM Andrey > >> >> > Mashenkov > >> >> > >> < > >> >> > >> >> > > > > > > > > > > > [hidden email] > > >> >> > >> >> > > > > > > > > > > > wrote: > >> >> > >> >> > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > > Hi Yuriy, > >> >> > >> >> > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > > Unfortunatelly, there is a plan to > >> >> > discontinue > >> >> > >> >> > > > TextQueries > >> >> > >> >> > > > > in > >> >> > >> >> > > > > > > > > Ignite > >> >> > >> >> > > > > > > > > > > [1]. > >> >> > >> >> > > > > > > > > > > > > Motivation here is text indexes are > not > >> >> > >> >> persistent, > >> >> > >> >> > not > >> >> > >> >> > > > > > > > > transactional > >> >> > >> >> > > > > > > > > > > and > >> >> > >> >> > > > > > > > > > > > > can't be user together with SQL or > >> inside > >> >> > SQL. > >> >> > >> >> > > > > > > > > > > > > and there is a lack of interest from > >> >> > community > >> >> > >> >> side. > >> >> > >> >> > > > > > > > > > > > > You are weclome to take on these > issues > >> and > >> >> > >> make > >> >> > >> >> > > > > TextQueries > >> >> > >> >> > > > > > > great. 
1. PageSize can't be used to limit the result set.
Query results are returned from a data node to the client-side cursor page by page, and this parameter is designed to control the page size. The query is supposed to execute lazily on the server side, and the full result set is not expected to be loaded into server memory at once, but page by page.
Do you mean you found that Lucene loads the entire result set into memory before the first page is sent to the client?

I'd think a new parameter should be added to limit the result. The best solution is to use query language commands for this, e.g. "LIMIT/OFFSET" in SQL.

This task doesn't look trivial. A query is a distributed operation: the same user query is executed on the data nodes, and then the results from all nodes must be correctly merged before being returned via the client cursor.
So, LIMIT should be applied on every node and then again in the merge phase.

Also, and this may be non-obvious, limiting results makes no sense without sorting,
as there is no guarantee that every subsequent query run will return the same data, because of page reordering.
Basically, the merge phase receives results from data nodes asynchronously, and messages from different nodes can't be ordered.

2.
a. A "tokenize" param name (for @QueryTextField) would be more descriptive, wouldn't it?
b, c. What about distributed queries? How will partial results from the nodes be merged?
Does Lucene allow configuring a comparator for data sorting?
Which comparator should Ignite choose to sort the results in the merge phase?

3. For now, the Lucene engine is not configurable at all. E.g. it is impossible to configure the Tokenizer.
I'd think about possible ways to configure the engine first, and only then go further to discuss/implement complex features
that may depend on the engine config.
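Andrey's point that a LIMIT has to be applied on every node and then again in the merge phase, and that a limit only makes sense with a deterministic order, can be illustrated with a small plain-Java sketch. This is not Ignite code: the `Hit` record and the `limitLocal` / `mergeTopK` methods are hypothetical names, and the sketch assumes each node reports a relevance score with every hit and returns its hits sorted best-first (which is how Lucene returns `TopDocs.scoreDocs` for a plain relevance search). Ties are broken by key so that repeated runs give the same result. Requires Java 16+ for the record.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopKMerge {
    /** Hypothetical hit: a cache key plus the relevance score the node computed for it. */
    public record Hit(String key, float score) {}

    /** Best first: descending score, ties broken by key so every run yields the same order. */
    public static final Comparator<Hit> ORDER =
        Comparator.comparingDouble((Hit h) -> h.score()).reversed()
            .thenComparing(Hit::key);

    /** Per-node step: order the local hits and keep at most `limit` of them. */
    public static List<Hit> limitLocal(List<Hit> hits, int limit) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(ORDER);
        int keep = Math.max(0, Math.min(limit, sorted.size()));
        return new ArrayList<>(sorted.subList(0, keep));
    }

    /** Merge step: k-way merge of per-node lists (each already sorted by ORDER), stopping at `limit`. */
    public static List<Hit> mergeTopK(List<List<Hit>> nodeResults, int limit) {
        // Each queue entry is {nodeIndex, position}; entries are ordered by the hit they point at.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
            (a, b) -> ORDER.compare(nodeResults.get(a[0]).get(a[1]), nodeResults.get(b[0]).get(b[1])));
        for (int i = 0; i < nodeResults.size(); i++)
            if (!nodeResults.get(i).isEmpty())
                heads.add(new int[] {i, 0});

        List<Hit> merged = new ArrayList<>();
        while (!heads.isEmpty() && merged.size() < limit) {
            int[] head = heads.poll();
            List<Hit> source = nodeResults.get(head[0]);
            merged.add(source.get(head[1]));
            // Advance the cursor of the node whose hit was just consumed.
            if (head[1] + 1 < source.size())
                heads.add(new int[] {head[0], head[1] + 1});
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> node1 = limitLocal(List.of(new Hit("a", 0.9f), new Hit("b", 0.4f), new Hit("c", 0.7f)), 2);
        List<Hit> node2 = limitLocal(List.of(new Hit("d", 0.8f), new Hit("e", 0.95f)), 2);
        System.out.println(mergeTopK(List.of(node1, node2), 3));
    }
}
```

Because every node already cut its list to `limit` in the same order, the merge never needs more than `limit` hits per node to produce a correct global top `limit`; without a shared order (score plus tie-breaker, or a sort field), the merged page would depend on message arrival order, which is exactly the non-determinism Andrey describes.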
On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <[hidden email]> wrote:

Dear community,

By starting this chain I'd like to open a discussion that would lead to contributions in the subject area.

Ignite has indexing capabilities, backed by different mechanisms, including Lucene.

Currently, Lucene 7.5.0 is used (last year's release).
This is a widespread and mature technology that covers the text search area and beyond (e.g. spatial data indexing).

My goal is to *expose more Lucene functionality to Ignite indexing and query mechanisms for text data*.

It's quite a simple request at the current stage. It comes from our project's needs, but I believe it will be useful for a lot more people.
Let's walk through the items and vote on or discuss Jira tickets for them.

1. [trivial] Use dataQuery.getPageSize() to limit the search response items inside GridLuceneIndex.query(). Currently it calls
IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored matches are returned, which we do not need in most cases.

2. [simple] Add sorting. Then a more capable search call can be executed: *IndexSearcher.search(query, count, sort)*
Implementation steps:
a) Introduce a boolean *sortField* parameter in the *@QueryTextField* annotation. If *true*, the field will be indexed but not tokenized. Number types are preferred here.
b) Add a *sort* collection to the *TextQuery* constructor. It should define the desired sort fields used for querying.
c) Implement Lucene sort usage in GridLuceneIndex.query().

3. [moderate] Build complex queries with *TextQuery*, including terms/queries boosting.
*This section is for voting only, as it requires more detailed work. It should be extended if the community is interested in it.*

Looking forward to your comments!
BR,
Yuriy Shuliha

--
Best regards,
Andrey V. Mashenkov

--
Best regards,
Alexei Scherbakov

--
Best regards,
Ivan Pavlukhin

--
Best regards,
Ivan Pavlukhin

--
Best regards,
Andrey V. Mashenkov

--
Best regards,
Andrey V. Mashenkov

--
Best regards,
Andrey V. Mashenkov
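Items 1 and 2 of Yuriy's proposal both come down to keeping only the best N matches under some ordering, instead of asking Lucene for `Integer.MAX_VALUE` hits and discarding most of them. The idea of bounded top-N selection can be sketched in plain Java with a heap of size N. In Ignite itself the work would be delegated to Lucene through `IndexSearcher.search(query, n)` or `IndexSearcher.search(query, n, sort)`, as the proposal says; the `BoundedTopN.topN` helper below is hypothetical, and its comparator stands in either for relevance order or for a sort field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class BoundedTopN {
    /**
     * Hypothetical helper: returns the best `n` items of `matches`, best first,
     * where order.compare(a, b) < 0 means "a is better than b".
     * Retains at most n items at any time, so memory is O(n) rather than O(total matches).
     */
    public static <T> List<T> topN(Iterable<T> matches, int n, Comparator<T> order) {
        if (n <= 0)
            return new ArrayList<>();

        // Reversed comparator: the head of the heap is the *worst* retained item, cheap to evict.
        PriorityQueue<T> heap = new PriorityQueue<>(n, order.reversed());
        for (T m : matches) {
            if (heap.size() < n)
                heap.add(m);
            else if (order.compare(m, heap.peek()) < 0) {
                // m beats the worst retained item: replace it.
                heap.poll();
                heap.add(m);
            }
        }

        List<T> result = new ArrayList<>(heap);
        result.sort(order);
        return result;
    }

    public static void main(String[] args) {
        // Relevance order: higher score is better.
        System.out.println(topN(List.of(0.3f, 0.9f, 0.5f, 0.7f), 2, Comparator.<Float>reverseOrder()));
    }
}
```

For relevance order the comparator is "descending score"; for a sort field it would compare that field instead (e.g. `Comparator.comparingLong(...)` on a hypothetical numeric field). Either way, the per-node result is what the distributed merge step then combines.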
Folks,
IEP is an Ignite-specific thing. In fact, I suppose we are already doing it the ASF way by having this dev-list discussion =) As for me, implementing the "limit" feature for text queries is not big enough to warrant an IEP. But we might need to create one for the next features.

On Tue, Nov 26, 2019 at 15:06, Ilya Kasnacheev <[hidden email]> wrote:

Hello!

The ASF way should probably start with an IEP :)

Regards,
--
Ilya Kasnacheev

On Tue, Nov 26, 2019 at 14:12, Zhenya Stanilovsky <[hidden email]> wrote:

Ok, let's forget Solr and go the ASF way: if Yuriy proves this functionality is helpful and submits a PR for it, why not?

Isn't it?

On Tuesday, November 26, 2019, 14:06 +03:00, Ilya Kasnacheev <[hidden email]> wrote:

Hello!

The problem here is that Solr is a multi-year effort by a lot of people. We can't match that.

Maybe we could integrate with Solr/Solr Cloud instead, by feeding our cache information into their storage for indexing and relying on their own mechanisms for distributed IR sorting?

Regards,
--
Ilya Kasnacheev

On Tue, Nov 26, 2019 at 13:59, Zhenya Stanilovsky <[hidden email]> wrote:

Ilya Kasnacheev, what is the problem with having Solr-like functionality in Ignite?

Thanks!

On Tuesday, November 26, 2019, 13:50 +03:00, Ilya Kasnacheev <[hidden email]> wrote:

Hello!

I have a hunch that we are trying to build Apache Solr (or Solr Cloud) into Apache Ignite. I think that's a lot of effort that is not very justified.

I don't think we should try to implement sorting in Apache Ignite, because it is a lot of work, and a lot of code in our code base which we don't really want.

Regards,
--
Ilya Kasnacheev

On Fri, Nov 22, 2019 at 20:59, Yuriy Shuliga <[hidden email]> wrote:

Dear Igniters,

The first part of the TextQuery improvement, a result limit, was developed and merged.
Now we have to develop the most important functionality here: proper sorting of the Lucene index response and correct reduction of those responses for distributed queries.

*There are two Lucene-based aspects*

1. When no sort fields are used, the documents in the response are still ordered by relevance.
This is actually the ScoreDoc.score value.
In order to reduce the distributed results correctly, the score should be passed along with the response.

2. When sorting by conventional fields, Lucene must have these fields properly indexed, and a corresponding Sort object should be applied to Lucene's search call.
In order to mark those fields, a new annotation like '@SortField' may be introduced.

*Reducing on Ignite*

The obvious point of distributed response reduction is the class GridCacheDistributedQueryFuture.
However, @Ivan Pavlukhin mentioned a class with similar functionality: ReduceIndexSorted.
What I see there is that it is tangled with H2-related classes (org.h2.result.Row) and might not be unifiable with TextQuery reduction.

I still need support here.

Overall, the goal of this letter is to initiate a discussion on the TextQuery sorting implementation and come closer to ticket creation.

BR,
Yuriy Shuliha

On Tue, Oct 22, 2019 at 13:31, Andrey Mashenkov <[hidden email]> wrote:

Hi Dmitry, Yuriy.
> > >> >> > > > >> >> > I've found GridCacheQueryFutureAdapter has newly added > > AtomicInteger > > >> >> > 'total' field and 'limit; field as primitive int. > > >> >> > > > >> >> > Both fields are used inside synchronized block only. > > >> >> > So, we can make both private and downgrade AtomicInteger to > > primitive > > >> >> int. > > >> >> > > > >> >> > Most likely, these fields can be replaced with one field. > > >> >> > > > >> >> > > > >> >> > > > >> >> > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov < > > [hidden email] > > >> > > > >> >> > wrote: > > >> >> > > > >> >> > > Hi Andrey, > > >> >> > > > > >> >> > > I've checked this ticket comments, and there is a TC Bot visa > > (with > > >> no > > >> >> > > blockers). > > >> >> > > > > >> >> > > Do you have any concerns related to this patch? > > >> >> > > > > >> >> > > Sincerely, > > >> >> > > Dmitriy Pavlov > > >> >> > > > > >> >> > > чт, 17 окт. 2019 г. в 16:43, Yuriy Shuliga < [hidden email] > > >: > > >> >> > > > > >> >> > >> Andrey, > > >> >> > >> > > >> >> > >> Per you request, I created ticket > > >> >> > >> https://issues.apache.org/jira/browse/IGNITE-12291 linked to > > >> >> > >> > > >> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189 > > >> >> > >> > > >> >> > >> Could you please proceed with PR merge ? > > >> >> > >> > > >> >> > >> BR, > > >> >> > >> Yuriy Shuliha > > >> >> > >> > > >> >> > >> ср, 9 жовт. 2019 о 12:52 Andrey Mashenkov < > > >> [hidden email] > > >> >> > > > >> >> > >> пише: > > >> >> > >> > > >> >> > >> > Hi Yuri, > > >> >> > >> > > > >> >> > >> > To get access to TC Bot you should register as TeamCity user > > >> [1], if > > >> >> > you > > >> >> > >> > didn't do this already. > > >> >> > >> > Then you will be able to authorize on Ignite TC Bot page with > > >> same > > >> >> > >> > credentials. 
> > >> >> > >> > > > >> >> > >> > [1] https://ci.ignite.apache.org/registerUser.html > > >> >> > >> > > > >> >> > >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga < > > [hidden email] > > >> > > > >> >> > wrote: > > >> >> > >> > > > >> >> > >> >> Andrew, > > >> >> > >> >> > > >> >> > >> >> I have corrected PR according to your notes. Please review. > > >> >> > >> >> What will be the next steps in order to merge in? > > >> >> > >> >> > > >> >> > >> >> Y. > > >> >> > >> >> > > >> >> > >> >> чт, 3 жовт. 2019 о 17:47 Andrey Mashenkov < > > >> >> > [hidden email] > > > >> >> > >> >> пише: > > >> >> > >> >> > > >> >> > >> >> > Yuri, > > >> >> > >> >> > > > >> >> > >> >> > I've done with review. > > >> >> > >> >> > No crime found, but trivial compatibility bug. > > >> >> > >> >> > > > >> >> > >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga < > > >> [hidden email] > > > >> >> > >> wrote: > > >> >> > >> >> > > > >> >> > >> >> > > Denis, > > >> >> > >> >> > > > > >> >> > >> >> > > Thank you for your attention to this. > > >> >> > >> >> > > as for now, the > > >> >> > https://issues.apache.org/jira/browse/IGNITE-12189 > > >> >> > >> >> > ticket > > >> >> > >> >> > > is still pending review. > > >> >> > >> >> > > Do we have a chance to move it forward somehow? > > >> >> > >> >> > > > > >> >> > >> >> > > BR, > > >> >> > >> >> > > Yuriy Shuliha > > >> >> > >> >> > > > > >> >> > >> >> > > пн, 30 вер. 2019 о 23:35 Denis Magda < > > [hidden email] > > > >> пише: > > >> >> > >> >> > > > > >> >> > >> >> > > > Yuriy, > > >> >> > >> >> > > > > > >> >> > >> >> > > > I've seen you opening a pull-request with the first > > >> changes: > > >> >> > >> >> > > > https://issues.apache.org/jira/browse/IGNITE-12189 > > >> >> > >> >> > > > > > >> >> > >> >> > > > Alex Scherbakov and Ivan are you the right guys to do > > the > > >> >> > review? 
> > >> >> > >> >> > > > > > >> >> > >> >> > > > - > > >> >> > >> >> > > > Denis > > >> >> > >> >> > > > > > >> >> > >> >> > > > > > >> >> > >> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван < > > >> >> > >> [hidden email] > > > >> >> > >> >> > > wrote: > > >> >> > >> >> > > > > > >> >> > >> >> > > > > Yuriy, > > >> >> > >> >> > > > > > > >> >> > >> >> > > > > Thank you for providing details! Quite interesting. > > >> >> > >> >> > > > > > > >> >> > >> >> > > > > Yes, we already have support of distributed limit and > > >> >> merging > > >> >> > >> >> sorted > > >> >> > >> >> > > > > subresults for SQL queries. E.g. ReduceIndexSorted > > and > > >> >> > >> >> > > > > MergeStreamIterator are used for merging sorted > > streams. > > >> >> > >> >> > > > > > > >> >> > >> >> > > > > Could you please also clarify about score/relevance? > > Is > > >> it > > >> >> > >> >> provided > > >> >> > >> >> > by > > >> >> > >> >> > > > > Lucene engine for each query result? I am thinking > > how > > >> to > > >> >> do > > >> >> > >> >> sorted > > >> >> > >> >> > > > > merge properly in this case. > > >> >> > >> >> > > > > > > >> >> > >> >> > > > > ср, 25 сент. 2019 г. в 18:56, Yuriy Shuliga < > > >> >> > [hidden email] > > >> >> > >> >: > > >> >> > >> >> > > > > > > > >> >> > >> >> > > > > > Ivan, > > >> >> > >> >> > > > > > > > >> >> > >> >> > > > > > Thank you for interesting question! > > >> >> > >> >> > > > > > > > >> >> > >> >> > > > > > Text searches (or full text searches) are mostly > > >> >> > >> human-oriented. > > >> >> > >> >> > And > > >> >> > >> >> > > > the > > >> >> > >> >> > > > > > point of user's interest is topmost part of > > response. > > >> >> > >> >> > > > > > Then user can read it, evaluate and use the given > > >> records > > >> >> > for > > >> >> > >> >> > further > > >> >> > >> >> > > > > > purposes. 
> > >> >> > >> >> > > > > > > > >> >> > >> >> > > > > > Particularly in our case, we use Ignite for > > operations > > >> >> with > > >> >> > >> >> > financial > > >> >> > >> >> > > > > data, > > >> >> > >> >> > > > > > and there lots of text stuff like assets names, > > fin. > > >> >> > >> >> instruments, > > >> >> > >> >> > > > > companies > > >> >> > >> >> > > > > > etc. > > >> >> > >> >> > > > > > In order to operate with this quickly and reliably, > > >> users > > >> >> > >> used > > >> >> > >> >> to > > >> >> > >> >> > > work > > >> >> > >> >> > > > > with > > >> >> > >> >> > > > > > text search, type-ahead completions, suggestions. > > >> >> > >> >> > > > > > > > >> >> > >> >> > > > > > For this purposes we are indexing particular string > > >> data > > >> >> in > > >> >> > >> >> > separate > > >> >> > >> >> > > > > caches. > > >> >> > >> >> > > > > > > > >> >> > >> >> > > > > > Sorting capabilities and response size limitations > > are > > >> >> very > > >> >> > >> >> > important > > >> >> > >> >> > > > > > there. As our API have to provide most relevant > > >> >> information > > >> >> > >> in > > >> >> > >> >> view > > >> >> > >> >> > > of > > >> >> > >> >> > > > > > limited size. > > >> >> > >> >> > > > > > > > >> >> > >> >> > > > > > Now let me comment some Ignite/Lucene perspective. > > >> >> > >> >> > > > > > Actually Ignite queries and Lucene returns > > >> >> > >> *TopDocs.scoresDocs > > >> >> > >> >> > > *already > > >> >> > >> >> > > > > > sorted by *score *(relevance). So most relevant > > >> documents > > >> >> > >> are on > > >> >> > >> >> > the > > >> >> > >> >> > > > top. > > >> >> > >> >> > > > > > And currently distributed queries responses from > > >> >> different > > >> >> > >> nodes > > >> >> > >> >> > are > > >> >> > >> >> > > > > merged > > >> >> > >> >> > > > > > into final query cursor queue in arbitrary way. > > >> >> > >> >> > > > > > So in fact we already have the score order ruined > > >> here. 
> > >> >> > Also > > >> >> > >> >> Ignite > > >> >> > >> >> > > > > > requests all possible documents from Lucene that is > > >> >> > redundant > > >> >> > >> >> and > > >> >> > >> >> > not > > >> >> > >> >> > > > > good > > >> >> > >> >> > > > > > for performance. > > >> >> > >> >> > > > > > > > >> >> > >> >> > > > > > I'm implementing *limit* parameter to be part of > > >> >> *TextQuery > > >> >> > >> *and > > >> >> > >> >> > have > > >> >> > >> >> > > > to > > >> >> > >> >> > > > > > notice that we still have to add sorting for text > > >> queries > > >> >> > >> >> > processing > > >> >> > >> >> > > in > > >> >> > >> >> > > > > > order to have applicable results. > > >> >> > >> >> > > > > > > > >> >> > >> >> > > > > > *Limit* parameter itself should improve the part of > > >> >> issues > > >> >> > >> from > > >> >> > >> >> > > above, > > >> >> > >> >> > > > > but > > >> >> > >> >> > > > > > definitely, sorting by document score at least > > should > > >> be > > >> >> > >> >> > implemented > > >> >> > >> >> > > > > along > > >> >> > >> >> > > > > > with limit. > > >> >> > >> >> > > > > > > > >> >> > >> >> > > > > > This is a pretty short commentary if you still have > > >> any > > >> >> > >> >> questions, > > >> >> > >> >> > > > please > > >> >> > >> >> > > > > > ask, do not hesitate) > > >> >> > >> >> > > > > > > > >> >> > >> >> > > > > > BR, > > >> >> > >> >> > > > > > Yuriy Shuliha > > >> >> > >> >> > > > > > > > >> >> > >> >> > > > > > чт, 19 вер. 2019 о 11:38 Павлухин Иван < > > >> >> > [hidden email] > > > >> >> > >> >> пише: > > >> >> > >> >> > > > > > > > >> >> > >> >> > > > > > > Yuriy, > > >> >> > >> >> > > > > > > > > >> >> > >> >> > > > > > > Greatly appreciate your interest. > > >> >> > >> >> > > > > > > > > >> >> > >> >> > > > > > > Could you please elaborate a little bit about > > >> sorting? > > >> >> > What > > >> >> > >> >> tasks > > >> >> > >> >> > > > does > > >> >> > >> >> > > > > > > it help to solve and how? 
It would be great to > > >> provide > > >> >> an > > >> >> > >> >> > example. > > >> >> > >> >> > > > > > > > > >> >> > >> >> > > > > > > ср, 18 сент. 2019 г. в 09:39, Alexei Scherbakov < > > >> >> > >> >> > > > > > > [hidden email] >: > > >> >> > >> >> > > > > > > > > > >> >> > >> >> > > > > > > > Denis, > > >> >> > >> >> > > > > > > > > > >> >> > >> >> > > > > > > > I like the idea of throwing an exception for > > >> enabled > > >> >> > text > > >> >> > >> >> > queries > > >> >> > >> >> > > > on > > >> >> > >> >> > > > > > > > persistent caches. > > >> >> > >> >> > > > > > > > > > >> >> > >> >> > > > > > > > Also I'm fine with proposed limit for unsorted > > >> >> > searches. > > >> >> > >> >> > > > > > > > > > >> >> > >> >> > > > > > > > Yury, please proceed with ticket creation. > > >> >> > >> >> > > > > > > > > > >> >> > >> >> > > > > > > > вт, 17 сент. 2019 г., 22:06 Denis Magda < > > >> >> > >> [hidden email] > > >> >> > >> >> >: > > >> >> > >> >> > > > > > > > > > >> >> > >> >> > > > > > > > > Igniters, > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > I see nothing wrong with Yury's proposal in > > >> regards > > >> >> > >> >> full-text > > >> >> > >> >> > > > > search > > >> >> > >> >> > > > > > > API > > >> >> > >> >> > > > > > > > > evolution as long as Yury is ready to push it > > >> >> > forward. > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > As for the in-memory mode only, it makes > > total > > >> >> sense > > >> >> > >> for > > >> >> > >> >> > > > in-memory > > >> >> > >> >> > > > > data > > >> >> > >> >> > > > > > > > > grid deployments when Ignite caches data of > > an > > >> >> > >> underlying > > >> >> > >> >> DB > > >> >> > >> >> > > like > > >> >> > >> >> > > > > > > Postgres. 
> > >> >> > >> >> > > > > > > > > As part of the changes, I would simply throw > > an > > >> >> > >> exception > > >> >> > >> >> (by > > >> >> > >> >> > > > > default) > > >> >> > >> >> > > > > > > if > > >> >> > >> >> > > > > > > > > the one attempts to use text indices with the > > >> >> native > > >> >> > >> >> > > persistence > > >> >> > >> >> > > > > > > enabled. > > >> >> > >> >> > > > > > > > > If the person is ready to live with that > > >> limitation > > >> >> > >> that > > >> >> > >> >> an > > >> >> > >> >> > > > > explicit > > >> >> > >> >> > > > > > > > > configuration change is needed to come around > > >> the > > >> >> > >> >> exception. > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > Thoughts? > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > - > > >> >> > >> >> > > > > > > > > Denis > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy > > Shuliga < > > >> >> > >> >> > > [hidden email] > > >> >> > >> >> > > > > > > >> >> > >> >> > > > > > > wrote: > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > > Hello to all again, > > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > > > > Thank you for important comments and notes > > >> given > > >> >> > >> below! > > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > > > > Let me answer and continue the discussion. 
> > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > > > > (I) Overall needs in Lucene indexing > > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > > > > Alexei has referenced to > > >> >> > >> >> > > > > > > > > > > > >> >> https://issues.apache.org/jira/browse/IGNITE-5371 > > >> >> > >> where > > >> >> > >> >> > > > > > > > > > absence of index persistence was declared > > as > > >> an > > >> >> > >> >> obstacle to > > >> >> > >> >> > > > > further > > >> >> > >> >> > > > > > > > > > development. > > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > > > > a) This ticket is already closed as not > > >> valid.b) > > >> >> > >> There > > >> >> > >> >> are > > >> >> > >> >> > > > > definite > > >> >> > >> >> > > > > > > needs > > >> >> > >> >> > > > > > > > > > (and in our project as well) in just > > in-memory > > >> >> > >> indexing > > >> >> > >> >> of > > >> >> > >> >> > > > > selected > > >> >> > >> >> > > > > > > data. > > >> >> > >> >> > > > > > > > > > We intend to use search capabilities for > > >> fetching > > >> >> > >> >> limited > > >> >> > >> >> > > > amount > > >> >> > >> >> > > > > of > > >> >> > >> >> > > > > > > > > records > > >> >> > >> >> > > > > > > > > > that should be used in type-ahead search / > > >> >> > >> suggestions. > > >> >> > >> >> > > > > > > > > > Not all of the data will be indexed and the > > >> are > > >> >> no > > >> >> > >> need > > >> >> > >> >> in > > >> >> > >> >> > > > Lucene > > >> >> > >> >> > > > > > > index > > >> >> > >> >> > > > > > > > > to > > >> >> > >> >> > > > > > > > > > be persistence. Hope this is a wide > > pattern of > > >> >> > >> >> text-search > > >> >> > >> >> > > > usage. > > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > > > > (II) Necessary fixes in current > > >> implementation. 
> > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > > > > a) Implementation of correct *limit > > *(*offset* > > >> >> > seems > > >> >> > >> to > > >> >> > >> >> be > > >> >> > >> >> > > not > > >> >> > >> >> > > > > > > required > > >> >> > >> >> > > > > > > > > in > > >> >> > >> >> > > > > > > > > > text-search tasks for now) > > >> >> > >> >> > > > > > > > > > I have investigated the data flow for > > >> distributed > > >> >> > >> text > > >> >> > >> >> > > queries. > > >> >> > >> >> > > > > it > > >> >> > >> >> > > > > > > was > > >> >> > >> >> > > > > > > > > > simple test prefix query, like > > 'name'*='ene*'* > > >> >> > >> >> > > > > > > > > > For now each server-node returns all > > response > > >> >> > >> records to > > >> >> > >> >> > the > > >> >> > >> >> > > > > > > client-node > > >> >> > >> >> > > > > > > > > > and it may contain ~thousands, ~hundred > > >> thousands > > >> >> > >> >> records. > > >> >> > >> >> > > > > > > > > > Event if we need only first 10-100. Again, > > all > > >> >> the > > >> >> > >> >> results > > >> >> > >> >> > > are > > >> >> > >> >> > > > > added > > >> >> > >> >> > > > > > > to > > >> >> > >> >> > > > > > > > > > queue in GridCacheQueryFutureAdapter in > > >> arbitrary > > >> >> > >> order > > >> >> > >> >> by > > >> >> > >> >> > > > pages. > > >> >> > >> >> > > > > > > > > > I did not find here any means to deliver > > >> >> > >> deterministic > > >> >> > >> >> > > result. > > >> >> > >> >> > > > > > > > > > So implementing limit as part of query and > > >> >> > >> >> > > > > (GridCacheQueryRequest) > > >> >> > >> >> > > > > > > will > > >> >> > >> >> > > > > > > > > not > > >> >> > >> >> > > > > > > > > > change the nature of response but will > > limit > > >> load > > >> >> > on > > >> >> > >> >> nodes > > >> >> > >> >> > > and > > >> >> > >> >> > > > > > > > > networking. > > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > > > > Can we consider to open a ticket for this? 
> > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > > > > (III) Further extension of Lucene API > > >> exposition > > >> >> to > > >> >> > >> >> Ignite > > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > > > > a) Sorting > > >> >> > >> >> > > > > > > > > > The solution for this could be: > > >> >> > >> >> > > > > > > > > > - Make entities comparable > > >> >> > >> >> > > > > > > > > > - Add custom comparator to entity > > >> >> > >> >> > > > > > > > > > - Add annotations to mark sorted fields for > > >> >> Lucene > > >> >> > >> >> indexing > > >> >> > >> >> > > > > > > > > > - Use comparators when merging responses or > > >> >> > reducing > > >> >> > >> to > > >> >> > >> >> > > desired > > >> >> > >> >> > > > > > > limit on > > >> >> > >> >> > > > > > > > > > client node. > > >> >> > >> >> > > > > > > > > > Will require full result set to be loaded > > into > > >> >> > >> memory. > > >> >> > >> >> > Though > > >> >> > >> >> > > > > can be > > >> >> > >> >> > > > > > > used > > >> >> > >> >> > > > > > > > > > for relatively small limits. > > >> >> > >> >> > > > > > > > > > BR, > > >> >> > >> >> > > > > > > > > > Yuriy Shuliha > > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > > > > пт, 30 серп. 2019 о 10:37 Alexei > > Scherbakov < > > >> >> > >> >> > > > > > > > > [hidden email] > > > >> >> > >> >> > > > > > > > > > пише: > > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > Yuriy, > > >> >> > >> >> > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > Note what one of major blockers for text > > >> >> queries > > >> >> > is > > >> >> > >> >> [1] > > >> >> > >> >> > > which > > >> >> > >> >> > > > > makes > > >> >> > >> >> > > > > > > > > > lucene > > >> >> > >> >> > > > > > > > > > > indexes unusable with persistence and > > main > > >> >> reason > > >> >> > >> for > > >> >> > >> >> > > > > > > discontinuation. 
It should probably be addressed first to make text queries a valid product feature.

Distributed sorting and advanced querying is indeed not a trivial task.
Some kind of merging must be implemented on the query-originating node.

[1] https://issues.apache.org/jira/browse/IGNITE-5371

On Thu, 29 Aug 2019 at 23:38, Denis Magda <[hidden email]> wrote:

Yuriy,

If you are ready to take over the full-text search indexes, then please go ahead. The primary reasons why the community wants to discontinue them first (and probably resurrect them later) are the limitations listed by Andrey and the minimal support from the community's end.
-
Denis

On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <[hidden email]> wrote:

Hi Yuriy,

Unfortunately, there is a plan to discontinue TextQueries in Ignite [1].
The motivation here is that text indexes are not persistent, not transactional, and can't be used together with SQL or inside SQL,
and there is a lack of interest from the community side.
You are welcome to take on these issues and make TextQueries great.

1. PageSize can't be used to limit the result set.
Query results are returned from the data node to the client-side cursor page by page, and this parameter is designed to control the page size.
The query is supposed to execute lazily on the server side, and
the full result set is not expected to be loaded into memory on the server side at once, but page by page.
Do you mean you found that Lucene loads the entire result set into memory before the first page is sent to the client?

I'd think a new parameter should be added to limit the result. The best solution is to use query language commands for this, e.g. "LIMIT/OFFSET" in SQL.

This task doesn't look trivial.
A query is a distributed operation: the same user query will be executed on the data nodes,
and then the results from all nodes should be correctly merged before being returned via the client cursor.
So, LIMIT should be applied on every node and then again in the merge phase.

Also, and this may be non-obvious, limiting results makes no sense without sorting,
as there is no guarantee that every subsequent run of the query will return the same data, because pages can be reordered.
Basically, the merge phase receives results from data nodes asynchronously, and messages from different nodes can't be ordered.

2.
a. The "tokenize" param name (for @QueryTextField) looks more verbose, doesn't it?
b, c. What about distributed queries? How will partial results from nodes be merged?
Does Lucene allow configuring a comparator for data sorting?
What comparator should Ignite choose to sort the result in the merge phase?

3. For now the Lucene engine is not configurable at all. E.g. it is impossible to configure the Tokenizer.
I'd think about possible ways to configure the engine first, and only then go further to discuss/implement complex features
that may depend on the engine config.
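Andrey's point that LIMIT has to be applied on every node and again in the merge phase, together with Alexei's note that merging must happen on the originating node, can be illustrated with a k-way merge. This is a minimal plain-Java sketch, assuming every node has already sorted its partial result with the same comparator and cut it to `limit`; `SortedMergeSketch` and `mergeWithLimit` are hypothetical names, and this is not Ignite's ReduceIndexSorted or MergeStreamIterator. The per-node cut is safe because any element of the global top N is necessarily in the top N of its own node, and unlike the "load everything and sort" option in (III), the reducer holds only one head element per node plus the output.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical reducer sketch; not Ignite code.
public class SortedMergeSketch {

    /** Current head element of one node's sorted stream plus the rest of that stream. */
    private static final class Head<T> {
        final T value;
        final Iterator<T> rest;

        Head(T value, Iterator<T> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    /** K-way merges per-node lists that are already sorted by {@code cmp}, stopping at {@code limit}. */
    static <T> List<T> mergeWithLimit(List<List<T>> sortedPerNode, Comparator<? super T> cmp, int limit) {
        Comparator<Head<T>> byHead = (a, b) -> cmp.compare(a.value, b.value);
        PriorityQueue<Head<T>> heads = new PriorityQueue<>(byHead);
        for (List<T> nodeResult : sortedPerNode) {
            Iterator<T> it = nodeResult.iterator();
            if (it.hasNext()) {
                heads.add(new Head<>(it.next(), it));
            }
        }
        List<T> out = new ArrayList<>();
        while (out.size() < limit && !heads.isEmpty()) {
            Head<T> best = heads.poll();            // smallest head across all nodes
            out.add(best.value);
            if (best.rest.hasNext()) {              // refill from the same node
                heads.add(new Head<>(best.rest.next(), best.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Each list is one node's result, already sorted and capped at limit = 3.
        List<List<Integer>> perNode = List.of(List.of(1, 4, 9), List.of(2, 3, 10), List.of(5, 6, 7));
        System.out.println(mergeWithLimit(perNode, Comparator.<Integer>naturalOrder(), 3)); // [1, 2, 3]
    }
}
```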
On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <[hidden email]> wrote:

Dear community,

By starting this thread I'd like to open a discussion that would lead to contributions in the subject area.

Ignite has indexing capabilities backed by different mechanisms, including Lucene.

Currently, Lucene 7.5.0 is used (released last year).
This is a widespread and mature technology that covers the text search area and beyond (e.g. spatial data indexing).

My goal is to *expose more Lucene functionality to Ignite indexing and query mechanisms for text data*.

It's quite a simple request at the current stage. It comes from our project's needs, but I believe it will be useful for many more people.
Let's walk through them, and vote on or discuss Jira tickets for them.

1. [trivial] Use dataQuery.getPageSize() to limit search response items inside GridLuceneIndex.query(). Currently it calls
IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored matches will be returned, which we do not need in most cases.

2. [simple] Add sorting. Then a more capable search call can be executed: *IndexSearcher.search(query, count, sort)*
Implementation steps:
a) Introduce a boolean *sortField* parameter in the *@QueryTextField* annotation. If *true*, the field will be indexed but not tokenized. Number types are preferred here.
b) Add a *sort* collection to the *TextQuery* constructor. It should define the desired sort fields used for querying.
c) Implement Lucene sort usage in GridLuceneIndex.query().

3. [moderate] Build complex queries with *TextQuery*, including terms/queries boosting.
*This section is for voting only, as it requires more detailed work. It should be extended if the community is interested in it.*

Looking forward to your comments!

BR,
Yuriy Shuliha

-- Best regards, Andrey V. Mashenkov
-- Best regards, Alexei Scherbakov
-- Best regards, Ivan Pavlukhin
-- Best regards, Ivan Pavlukhin
-- Best regards, Andrey V. Mashenkov
-- Best regards, Andrey V. Mashenkov
-- Best regards, Andrey V. Mashenkov
-- Best regards, Ivan Pavlukhin
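Items 1 and 2 in Yuriy's original proposal replace IndexSearcher.search(query, Integer.MAX_VALUE) with a bounded call that takes a count (and, for item 2, a Sort). The sketch below does not use Lucene; it only illustrates in plain Java what bounding buys on a single node: a heap that never holds more than n hits (O(m log n) time and O(n) memory for m matches), with the same code serving relevance order or a field comparator. `TopNSketch` and `Hit` are hypothetical names, and the key comparator merely stands in for a sort field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Plain-Java illustration of bounded top-N selection; it does not call Lucene.
public class TopNSketch {

    /** A hypothetical hit: cache key plus relevance score. */
    static final class Hit {
        final String key;
        final float score;

        Hit(String key, float score) {
            this.key = key;
            this.score = score;
        }

        @Override
        public String toString() {
            return key + ":" + score;
        }
    }

    /** Relevance order: higher score first. */
    static final Comparator<Hit> BY_SCORE_DESC =
        Comparator.comparingDouble((Hit h) -> h.score).reversed();

    /**
     * Returns the n best hits under {@code order}, best first. The heap uses the reversed
     * comparator, so its head is the worst hit kept so far and is the one evicted.
     */
    static List<Hit> topN(Iterable<Hit> matches, int n, Comparator<Hit> order) {
        PriorityQueue<Hit> heap = new PriorityQueue<>(order.reversed());
        for (Hit h : matches) {
            heap.offer(h);
            if (heap.size() > n) {
                heap.poll();
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(order);
        return result;
    }

    public static void main(String[] args) {
        List<Hit> matches = List.of(
            new Hit("a", 0.2f), new Hit("b", 0.9f), new Hit("c", 0.5f), new Hit("d", 0.7f));

        // Relevance order: the 2 best-scored hits.
        System.out.println(topN(matches, 2, BY_SCORE_DESC)); // [b:0.9, d:0.7]

        // Field order on the key, standing in for a hypothetical sort field.
        System.out.println(topN(matches, 2, Comparator.comparing((Hit h) -> h.key))); // [a:0.2, b:0.9]
    }
}
```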
I don't see anything wrong if Yuriy is willing to carry on and keep enhancing our full-text search support, which lacks basic capabilities. The basics should be available. If anybody needs an advanced feature, they can introduce Solr or Elasticsearch into the final architecture of the app.

Folks, who of us can help Yuriy with the questions asked? Most likely the SQL experts are the best candidates here.

-
Denis

On Tue, Nov 26, 2019 at 8:52 AM Ivan Pavlukhin <[hidden email]> wrote:

Folks,

An IEP is an Ignite-specific thing. In fact, I suppose that we are already doing it the ASF way by having this dev-list discussion =)

As for me, implementing the "limit" feature for text queries is not big enough to warrant an IEP. But we might need to create one for the next features.

On Tue, 26 Nov 2019 at 15:06, Ilya Kasnacheev <[hidden email]> wrote:

Hello!

The ASF way should probably start with an IEP :)

Regards,
-- 
Ilya Kasnacheev

On Tue, 26 Nov 2019 at 14:12, Zhenya Stanilovsky <[hidden email]> wrote:

OK, let's forget Solr and go the ASF way: if Yuriy proves this functionality is helpful and submits a PR, why not?

Isn't it?

Tuesday, 26 November 2019, 14:06 +03:00, from Ilya Kasnacheev <[hidden email]>:

Hello!

The problem here is that Solr is a multi-year effort by a lot of people. We can't match that.

Maybe we could integrate with Solr/SolrCloud instead, by feeding our cache information into their storage for indexing and relying on their own mechanisms for distributed IR sorting?

Regards,
-- 
Ilya Kasnacheev

On Tue, 26 Nov 2019 at 13:59, Zhenya Stanilovsky <[hidden email]> wrote:

Ilya Kasnacheev, what is the problem with having Solr-like functionality in Ignite?

Thanks!

Tuesday, 26 November 2019, 13:50 +03:00, from Ilya Kasnacheev <[hidden email]>:

Hello!
I have a hunch that we are trying to build Apache Solr (or SolrCloud) into Apache Ignite. I think that's a lot of effort that is not very justified.

I don't think we should try to implement sorting in Apache Ignite, because it is a lot of work, and a lot of code in our code base which we don't really want.

Regards,
-- 
Ilya Kasnacheev

On Fri, 22 Nov 2019 at 20:59, Yuriy Shuliga <[hidden email]> wrote:

Dear Igniters,

The first part of the TextQuery improvement (a result limit) was developed and merged.
Now we have to develop the most important functionality here: proper sorting of the Lucene index response and correct reduction of the responses for distributed queries.

*There are two Lucene-based aspects*

1. When no sorting fields are used, the documents in the response are still ordered by relevance.
Actually, this is the ScoreDoc.score value.
In order to reduce the distributed results correctly, the score should be passed with the response.

2. When sorting by conventional fields, Lucene should have these fields properly indexed, and
the corresponding Sort object should be applied to Lucene's search call.
In order to mark those fields, a new annotation like '@SortField' may be introduced.

*Reducing on Ignite*

The obvious point of distributed response reduction is the class GridCacheDistributedQueryFuture.
Though, @Ivan Pavlukhin mentioned a class with similar functionality: ReduceIndexSorted.
What I see here is that it is tangled with H2-related classes (org.h2.result.Row) and might not be unified with TextQuery reduction.

Support is still needed here.

Overall, the goal of this letter is to initiate a discussion on the TextQuery sorting implementation and come closer to ticket creation.

BR,
Yuriy Shuliha

On Tue, 22 Oct 2019 at 13:31, Andrey Mashenkov <[hidden email]> wrote:

Hi Dmitry, Yuriy.

I've found that GridCacheQueryFutureAdapter has a newly added AtomicInteger 'total' field and a 'limit' field as a primitive int.

Both fields are used inside a synchronized block only.
So, we can make both private and downgrade the AtomicInteger to a primitive int.

Most likely, these fields can be replaced with one field.

On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <[hidden email]> wrote:

Hi Andrey,

I've checked this ticket's comments, and there is a TC Bot visa (with no blockers).

Do you have any concerns related to this patch?

Sincerely,
Dmitriy Pavlov

On Thu, 17 Oct 2019 at 16:43, Yuriy Shuliga <[hidden email]> wrote:

Andrey,

Per your request, I created ticket https://issues.apache.org/jira/browse/IGNITE-12291 linked to https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189

Could you please proceed with the PR merge?

BR,
Yuriy Shuliha

On Wed, 9 Oct 2019 at 12:52, Andrey Mashenkov <[hidden email]> wrote:

Hi Yuri,

To get access to TC Bot you should register as a TeamCity user [1], if you haven't done this already.
Then you will be able to log in to the Ignite TC Bot page with the same credentials.

[1] https://ci.ignite.apache.org/registerUser.html

On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <[hidden email]> wrote:

Andrew,

I have corrected the PR according to your notes. Please review.
What are the next steps in order to merge it?

Y.

On Thu, 3 Oct 2019 at 17:47, Andrey Mashenkov <[hidden email]> wrote:

Yuri,

I'm done with the review.
No crime found, only a trivial compatibility bug.
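Andrey's 22 October review note on GridCacheQueryFutureAdapter (an AtomicInteger `total` and an int `limit`, both touched only inside a synchronized block) suggests a single private primitive counter. Below is a sketch of that shape with hypothetical names (`LimitedCollectorSketch`, `onPage`, `remaining`); it is not the actual patch, and treating a non-positive limit as "no limit" is an assumption of the sketch. Because every access happens under the same lock, an atomic type adds nothing, and one `remaining` counter replaces the pair.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the review suggestion; not GridCacheQueryFutureAdapter's code.
public class LimitedCollectorSketch<T> {
    private final Object mux = new Object();

    /** Remaining capacity, guarded by mux; replaces a separate 'total' counter plus 'limit'. */
    private int remaining;

    private final List<T> collected = new ArrayList<>();

    /** A non-positive limit means "unlimited" in this sketch. */
    public LimitedCollectorSketch(int limit) {
        this.remaining = limit > 0 ? limit : Integer.MAX_VALUE;
    }

    /** Adds items from one incoming page; returns true once the limit has been reached. */
    public boolean onPage(List<T> page) {
        synchronized (mux) {
            for (T item : page) {
                if (remaining == 0) {
                    break;
                }
                collected.add(item);
                remaining--;
            }
            return remaining == 0;
        }
    }

    /** Snapshot of what has been collected so far. */
    public List<T> results() {
        synchronized (mux) {
            return new ArrayList<>(collected);
        }
    }

    public static void main(String[] args) {
        LimitedCollectorSketch<String> c = new LimitedCollectorSketch<>(3);
        c.onPage(List.of("a", "b"));
        c.onPage(List.of("c", "d"));
        System.out.println(c.results()); // [a, b, c]
    }
}
```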
On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <[hidden email]> wrote:

Denis,

Thank you for your attention to this.
As of now, the https://issues.apache.org/jira/browse/IGNITE-12189 ticket is still pending review.
Do we have a chance to move it forward somehow?

BR,
Yuriy Shuliha

On Mon, 30 Sep 2019 at 23:35, Denis Magda <[hidden email]> wrote:

Yuriy,

I've seen you open a pull request with the first changes:
https://issues.apache.org/jira/browse/IGNITE-12189

Alex Scherbakov and Ivan, are you the right guys to do the review?

-
Denis

On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <[hidden email]> wrote:

Yuriy,

Thank you for providing details! Quite interesting.

Yes, we already have support for distributed limit and merging sorted subresults for SQL queries. E.g. ReduceIndexSorted and MergeStreamIterator are used for merging sorted streams.
Could you please also clarify the score/relevance? Is it provided by the Lucene engine for each query result? I am thinking about how to do a sorted merge properly in this case.

On Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga <[hidden email]> wrote:

Ivan,

Thank you for the interesting question!

Text searches (or full-text searches) are mostly human-oriented, and the user is interested in the topmost part of the response.
Then the user can read it, evaluate it and use the given records for further purposes.

Particularly in our case, we use Ignite for operations with financial data, and there is a lot of text there, like asset names, financial instruments, companies, etc.
In order to work with this quickly and reliably, users are used to text search, type-ahead completions and suggestions.
For these purposes we index particular string data in separate caches.

Sorting capabilities and response size limitations are very important there, as our API has to provide the most relevant information within a limited size.

Now let me comment from the Ignite/Lucene perspective.
Ignite queries Lucene, and Lucene returns *TopDocs.scoreDocs* already sorted by *score* (relevance), so the most relevant documents are on top.
But currently, distributed query responses from different nodes are merged into the final query cursor queue in an arbitrary way,
so in fact the score order is already ruined here. Also, Ignite requests all possible documents from Lucene, which is redundant and not good for performance.
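Yuriy's observation here (each node's hits are already sorted by score, but pages are concatenated in arbitrary order) is also why his later message asks for ScoreDoc.score to travel with each hit. A minimal plain-Java sketch with hypothetical names (`ScoreReduceSketch`, `ScoredHit`, `firstArrived`, `byScore`): reducing in arrival order gives different answers for different arrival orders, while sorting by the carried score, with node id and doc id as tie-breakers, gives the same answer every time. The sketch assumes scores from different nodes are comparable; since Lucene scores depend on per-index statistics, that holds only approximately when each node has its own index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical hit payload and reducers; not Ignite or Lucene code.
public class ScoreReduceSketch {

    /** One hit as it might travel from a data node: origin, local doc id, relevance score. */
    static final class ScoredHit {
        final String nodeId;
        final int docId;
        final float score;

        ScoredHit(String nodeId, int docId, float score) {
            this.nodeId = nodeId;
            this.docId = docId;
            this.score = score;
        }

        @Override
        public String toString() {
            return nodeId + "/" + docId;
        }
    }

    /** Score descending; node id and doc id break ties so the order is total and repeatable. */
    static final Comparator<ScoredHit> RELEVANCE =
        Comparator.comparingDouble((ScoredHit h) -> h.score).reversed()
            .thenComparing((ScoredHit h) -> h.nodeId)
            .thenComparingInt((ScoredHit h) -> h.docId);

    /** Naive reduce: the first {@code limit} hits in the order the pages arrived. */
    static List<ScoredHit> firstArrived(List<List<ScoredHit>> pagesInArrivalOrder, int limit) {
        List<ScoredHit> all = new ArrayList<>();
        pagesInArrivalOrder.forEach(all::addAll);
        return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
    }

    /** Score-aware reduce: order by the carried score, then cut to {@code limit}. */
    static List<ScoredHit> byScore(List<List<ScoredHit>> pagesInArrivalOrder, int limit) {
        List<ScoredHit> all = new ArrayList<>();
        pagesInArrivalOrder.forEach(all::addAll);
        all.sort(RELEVANCE);
        return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
    }

    public static void main(String[] args) {
        // Each node's page is already sorted by its local score.
        List<ScoredHit> nodeA = List.of(new ScoredHit("A", 1, 0.4f), new ScoredHit("A", 2, 0.1f));
        List<ScoredHit> nodeB = List.of(new ScoredHit("B", 7, 0.9f), new ScoredHit("B", 8, 0.3f));

        System.out.println(firstArrived(List.of(nodeA, nodeB), 2)); // [A/1, A/2]
        System.out.println(firstArrived(List.of(nodeB, nodeA), 2)); // [B/7, B/8]
        System.out.println(byScore(List.of(nodeA, nodeB), 2));      // [B/7, A/1]
        System.out.println(byScore(List.of(nodeB, nodeA), 2));      // [B/7, A/1]
    }
}
```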
I'm implementing the *limit* parameter as part of *TextQuery*, and I have to note that we still need to add sorting to text query processing in order to get applicable results.

The *limit* parameter itself should solve part of the issues above, but sorting, by document score at least, should definitely be implemented along with the limit.

This is a pretty short commentary; if you still have any questions, please ask, do not hesitate :)

BR,
Yuriy Shuliha

On Thu, 19 Sep 2019 at 11:38, Павлухин Иван <[hidden email]> wrote:

Yuriy,

Greatly appreciate your interest.

Could you please elaborate a little bit on sorting? What tasks does it help to solve, and how? It would be great to provide an example.

On Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <[hidden email]> wrote:

Denis,

I like the idea of throwing an exception for enabled text queries on persistent caches.

Also, I'm fine with the proposed limit for unsorted searches.

Yury, please proceed with ticket creation.

On Tue, 17 Sep 2019 at 22:06, Denis Magda <[hidden email]> wrote:

Igniters,

I see nothing wrong with Yury's proposal regarding the full-text search API evolution, as long as Yury is ready to push it forward.

As for the in-memory-only mode, it makes total sense for in-memory data grid deployments where Ignite caches the data of an underlying DB like Postgres.
As part of the changes, I would simply throw an exception (by default) if someone attempts to use text indexes with native persistence enabled. If a person is ready to live with that limitation, an explicit configuration change would be needed to get around the exception.

Thoughts?

-
Denis

On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga <[hidden email]> wrote:

Hello to all again,

Thank you for the important comments and notes given below!

Let me answer and continue the discussion.
(I) Overall needs in Lucene indexing

Alexei has referenced https://issues.apache.org/jira/browse/IGNITE-5371, where the absence of index persistence was declared an obstacle to further development.

a) This ticket is already closed as not valid.
b) There are definite needs (in our project as well) for purely in-memory indexing of selected data. We intend to use search capabilities to fetch a limited number of records to be used in type-ahead search / suggestions. Not all of the data will be indexed, and there is no need for the Lucene index to be persistent. Hopefully this is a widespread pattern of text-search usage.

(II) Necessary fixes in the current implementation
a) Implementation of a correct *limit* (*offset* seems not to be required in text-search tasks for now)

I have investigated the data flow for distributed text queries. It was a simple test prefix query, like 'name'='ene*'.
For now, each server node returns all response records to the client node, and there may be thousands or hundreds of thousands of them, even if we need only the first 10-100. Again, all the results are added to the queue in GridCacheQueryFutureAdapter page by page, in arbitrary order. I did not find any means here to deliver a deterministic result.
So implementing the limit as part of the query (GridCacheQueryRequest) will not change the nature of the response, but it will limit the load on nodes and the network.
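A minimal plain-Java model may make the point above concrete. It is a sketch under stated assumptions, not Ignite code: `UnorderedLimitDemo`, `nodeResult` and `clientReduce` are hypothetical names, each node simply caps its own matches before replying, and the client appends responses in whatever order they arrive. The cap reduces traffic, but which records survive depends on arrival order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of an unordered, limited distributed text query (not Ignite code). */
public class UnorderedLimitDemo {

    /** Node side: cap the local result before sending it, without any ordering guarantee. */
    static List<String> nodeResult(List<String> localMatches, int limit) {
        return new ArrayList<>(localMatches.subList(0, Math.min(limit, localMatches.size())));
    }

    /** Client side: append node responses in arrival order and stop once the limit is reached. */
    static List<String> clientReduce(List<List<String>> responsesInArrivalOrder, int limit) {
        List<String> out = new ArrayList<>();
        for (List<String> resp : responsesInArrivalOrder) {
            for (String rec : resp) {
                if (out.size() == limit)
                    return out;
                out.add(rec);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int limit = 3;
        List<String> nodeA = nodeResult(List.of("a1", "a2", "a3", "a4"), limit);
        List<String> nodeB = nodeResult(List.of("b1", "b2"), limit);

        // Same query, two possible arrival orders, two different answers.
        System.out.println(clientReduce(List.of(nodeA, nodeB), limit)); // [a1, a2, a3]
        System.out.println(clientReduce(List.of(nodeB, nodeA), limit)); // [b1, b2, a1]
    }
}
```

Running `main` prints two different three-element lists for the same logical query, which is exactly the non-determinism that a limit alone cannot remove.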
Can we consider opening a ticket for this?

(III) Further exposure of the Lucene API in Ignite

a) Sorting
The solution for this could be:
- Make entities comparable
- Add a custom comparator to the entity
- Add annotations to mark sorted fields for Lucene indexing
- Use comparators when merging responses or reducing to the desired limit on the client node.
This will require the full result set to be loaded into memory, though it can be used for relatively small limits.

BR,
Yuriy Shuliha

On Fri, Aug 30, 2019 at 10:37 Alexei Scherbakov <[hidden email]> wrote:

Yuriy,

Note that one of the major blockers for text queries is [1], which makes Lucene indexes unusable with persistence and is the main reason for discontinuation.
Probably it should be addressed first to make text queries a valid product feature.

Distributed sorting and advanced querying is indeed not a trivial task.
Some kind of merging must be implemented on the query-originating node.

[1] https://issues.apache.org/jira/browse/IGNITE-5371

On Thu, Aug 29, 2019 at 23:38, Denis Magda <[hidden email]> wrote:

Yuriy,

If you are ready to take over the full-text search indexes, then please go ahead. The primary reasons why the community wants to discontinue them first (and probably resurrect them later) are the limitations listed by Andrey and minimal support from the community's end.

-
Denis

On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <[hidden email]> wrote:

Hi Yuriy,

Unfortunately, there is a plan to discontinue TextQueries in Ignite [1].
The motivation here is that text indexes are not persistent, not transactional, and can't be used together with SQL or inside SQL, and there is a lack of interest from the community side.
You are welcome to take on these issues and make TextQueries great.

1. PageSize can't be used to limit the result set.
Query results are returned from a data node to the client-side cursor page by page, and this parameter is designed to control the page size. The query is supposed to execute lazily on the server side, and the full result set is not expected to be loaded into memory on the server side at once, but page by page.
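The distinction between page size and a result limit can be sketched as a lazy cursor. This is a hypothetical plain-Java model rather than Ignite's cursor API: `PagedCursor` and `fetchPage` are illustrative names, `fetchPage` stands in for a page request to a data node, `pageSize` only controls how many rows travel per request, and a separate `limit` decides when iteration stops.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

/** Hypothetical lazy cursor: pageSize sets transfer granularity, limit sets result size. */
public class PagedCursor<T> implements Iterator<T> {
    /** Stand-in for a page request to a data node: (offset, pageSize) -> rows. */
    private final BiFunction<Integer, Integer, List<T>> fetchPage;
    private final int pageSize;
    private final int limit;

    private List<T> page = List.of();
    private int posInPage;
    private int offset;
    private int returned;
    private boolean exhausted;

    /** Number of page requests issued so far (exposed for illustration only). */
    int pagesFetched;

    public PagedCursor(BiFunction<Integer, Integer, List<T>> fetchPage, int pageSize, int limit) {
        this.fetchPage = fetchPage;
        this.pageSize = pageSize;
        this.limit = limit;
    }

    @Override public boolean hasNext() {
        if (returned >= limit || exhausted)
            return false;
        if (posInPage < page.size())
            return true;

        // Current page consumed: lazily request the next one, never more than one page ahead.
        page = fetchPage.apply(offset, pageSize);
        pagesFetched++;
        offset += page.size();
        posInPage = 0;

        if (page.isEmpty())
            exhausted = true;

        return !exhausted;
    }

    @Override public T next() {
        if (!hasNext())
            throw new NoSuchElementException();
        returned++;
        return page.get(posInPage++);
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1_000; i++)
            data.add(i);

        // Page size 10, limit 25: only rows 0..29 are ever requested, and 25 are returned.
        PagedCursor<Integer> cur = new PagedCursor<>(
            (off, size) -> data.subList(Math.min(off, data.size()), Math.min(off + size, data.size())), 10, 25);

        int n = 0;
        while (cur.hasNext()) {
            cur.next();
            n++;
        }
        System.out.println(n + " rows in " + cur.pagesFetched + " pages"); // 25 rows in 3 pages
    }
}
```

Keeping the two knobs separate lets the transport stay lazy (one page in flight at a time) while the limit alone defines the size of the answer.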
Do you mean you found that Lucene loads the entire result set into memory before the first page is sent to the client?

I'd think a new parameter should be added to limit the result. The best solution is to use query language commands for this, e.g. "LIMIT/OFFSET" in SQL.

This task doesn't look trivial. A query is a distributed operation: the same user query will be executed on the data nodes, and then the results from all nodes should be correctly merged before being returned via the client cursor.
So, LIMIT should be applied on every node and then again in the merge phase.
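The two-phase scheme described above (limit on every node, then again while merging) can be sketched in plain Java, assuming each node orders its matches by a relevance score. Lucene's `ScoreDoc.score` would be a natural source for that score, but the `Hit` class, the tie-break on key and all method names here are illustrative assumptions, not Ignite's implementation. Each node returns its sorted top-N, and the originating node runs a k-way merge over those lists with a priority queue, stopping after `limit` rows.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.stream.Collectors;

/** Hypothetical two-phase LIMIT: per-node top-N, then a k-way merge on the originating node. */
public class TwoPhaseLimit {

    /** A search hit; the relevance score would come from the full-text engine (assumption). */
    static final class Hit {
        final String key;
        final float score;

        Hit(String key, float score) {
            this.key = key;
            this.score = score;
        }

        @Override public String toString() {
            return key + "=" + score;
        }
    }

    /** Higher score first; ties broken by key so the merged result is deterministic. */
    static final Comparator<Hit> ORDER =
        Comparator.comparingDouble((Hit h) -> h.score).reversed().thenComparing(h -> h.key);

    /** Phase 1, on each data node: sort local matches and keep at most 'limit' of them. */
    static List<Hit> nodeTopN(List<Hit> localMatches, int limit) {
        return localMatches.stream().sorted(ORDER).limit(limit).collect(Collectors.toList());
    }

    /** Phase 2, on the originating node: k-way merge of the sorted node results, stopping at 'limit'. */
    static List<Hit> merge(List<List<Hit>> nodeResults, int limit) {
        // Heap entry = {node index, position in that node's list}, ordered by the hit it points to.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
            (x, y) -> ORDER.compare(nodeResults.get(x[0]).get(x[1]), nodeResults.get(y[0]).get(y[1])));

        for (int i = 0; i < nodeResults.size(); i++) {
            if (!nodeResults.get(i).isEmpty())
                heads.add(new int[] {i, 0});
        }

        List<Hit> out = new ArrayList<>();
        while (out.size() < limit && !heads.isEmpty()) {
            int[] head = heads.poll();
            List<Hit> src = nodeResults.get(head[0]);
            out.add(src.get(head[1]));
            // Advance within the node list that supplied the best remaining hit.
            if (head[1] + 1 < src.size())
                heads.add(new int[] {head[0], head[1] + 1});
        }
        return out;
    }

    public static void main(String[] args) {
        int limit = 3;
        List<Hit> nodeA = nodeTopN(List.of(new Hit("a1", 0.9f), new Hit("a2", 0.2f), new Hit("a3", 0.7f)), limit);
        List<Hit> nodeB = nodeTopN(List.of(new Hit("b1", 0.8f), new Hit("b2", 0.95f)), limit);

        // Unlike the unordered case, arrival order no longer changes the answer.
        System.out.println(merge(List.of(nodeA, nodeB), limit));
        System.out.println(merge(List.of(nodeB, nodeA), limit));
    }
}
```

Because every node's list is already sorted and capped, the merge polls the heap at most `limit` times, and the result no longer depends on which node replies first.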
Also, and this may be non-obvious, limiting results makes no sense without sorting, as there is no guarantee that each subsequent query run will return the same data because of page reordering.
Basically, the merge phase receives results from data nodes asynchronously, and messages from different nodes can't be ordered.

2.
a. The "tokenize" parameter name (for @QueryTextField) looks more explicit, doesn't it?
b, c. What about distributed queries? How will partial results from nodes be merged?
Does Lucene allow configuring a comparator for data sorting?
Which comparator should Ignite choose to sort the result in the merge phase?

3. For now, the Lucene engine is not configurable at all. E.g., it is impossible to configure the Tokenizer.
I'd think about possible ways to configure the engine first, and only then go further to discuss/implement complex features that may depend on the engine config.

On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <[hidden email]> wrote:

Dear community,

By starting this thread I'd like to open a discussion that would lead to contributions in the subject area.
Ignite has indexing capabilities backed by different mechanisms, including Lucene.

Currently, Lucene 7.5.0 is used (released last year).
This is a widespread and mature technology that covers the text search area and beyond (e.g. spatial data indexing).

My goal is to *expose more Lucene functionality to Ignite indexing and query mechanisms for text data*.

It's quite a simple request at the current stage. It comes from our project's needs, but I believe it will be useful for a lot more people.
Let's walk through them and vote on or discuss Jira tickets for them.

1. [trivial] Use dataQuery.getPageSize() to limit search response items inside GridLuceneIndex.query(). Currently it calls IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored matches will be returned, which we do not need in most cases.

2. [simple] Add sorting. Then a more capable search call can be executed: *IndexSearcher.search(query, count, sort)*
Implementation steps:
a) Introduce a boolean *sortField* parameter in the *@QueryTextField* annotation. If *true*, the field will be indexed but not tokenized.
Number types are preferred here.
b) Add a *sort* collection to the *TextQuery* constructor. It should define the desired sort fields used for querying.
c) Implement Lucene sort usage in GridLuceneIndex.query().

3. [moderate] Build complex queries with *TextQuery*, including term/query boosting.
*This section is for voting only, as it requires more detailed work. It should be extended if the community is interested in it.*

Looking forward to your comments!

BR,
Yuriy Shuliha

--
Best regards,
Andrey V. Mashenkov

--
Best regards,
Alexei Scherbakov

--
Best regards,
Ivan Pavlukhin
Folks, Yuriy,
I suppose that we are going to proceed with:

> Reducing on Ignite
> The obvious point of distributed response reduction is class GridCacheDistributedQueryFuture.
> Though, @Ivan Pavlukhin mentioned a class with similar functionality: ReduceIndexSorted.
> What I see here is that it is tangled with H2-related classes (org.h2.result.Row) and might not be unifiable with TextQuery reduction.

From my side, there is no strict opinion that we should unify the reduction. Having a separate reduction implementation for text queries sounds like a reasonable option to me as well.

Are there still any open questions?

On Wed, Nov 27, 2019 at 02:27, Denis Magda <[hidden email]> wrote:

I don't see anything wrong if Yuriy is willing to carry on and keep enhancing our full-text search support, which lacks basic capabilities.

The basics should be available. If anybody needs an advanced feature, they can introduce Solr or Elasticsearch into the final architecture of the app.

Folks, who of us can help Yuriy with the questions asked? Most likely the SQL experts are the best candidates here.

-
Denis

On Tue, Nov 26, 2019 at 8:52 AM Ivan Pavlukhin <[hidden email]> wrote:

Folks,

IEP is an Ignite-specific thing. In fact, I suppose that we are already doing it the ASF way by having this dev-list discussion =)

As for me, implementing the "limit" feature for text queries is not big enough to warrant an IEP. But we might need to create one for the next features.

On Tue, Nov 26, 2019 at 15:06, Ilya Kasnacheev <[hidden email]> wrote:

Hello!

The ASF way should probably start with an IEP :)

Regards,
--
Ilya Kasnacheev

On Tue, Nov 26, 2019 at 14:12, Zhenya Stanilovsky <[hidden email]> wrote:

OK, let's forget Solr and go the ASF way: if Yuriy proves this functionality is helpful and submits a PR for it, why not?

Right?
On Tuesday, November 26, 2019, 14:06 +03:00, Ilya Kasnacheev <[hidden email]> wrote:

Hello!

The problem here is that Solr is a multi-year effort by a lot of people. We can't match that.

Maybe we could integrate with Solr/Solr Cloud instead, by feeding our cache information into their storage for indexing and relying on their own mechanisms for distributed IR sorting?

Regards,
--
Ilya Kasnacheev

On Tue, Nov 26, 2019 at 13:59, Zhenya Stanilovsky <[hidden email]> wrote:

Ilya Kasnacheev, what is the problem with bringing Solr-like functionality into Ignite?

Thanks!

On Tuesday, November 26, 2019, 13:50 +03:00, Ilya Kasnacheev <[hidden email]> wrote:

Hello!

I have a hunch that we are trying to build Apache Solr (or Solr Cloud) into Apache Ignite. I think that's a lot of effort that is not very justified.

I don't think we should try to implement sorting in Apache Ignite, because it is a lot of work, and a lot of code in our code base which we don't really want.

Regards,
--
Ilya Kasnacheev

On Fri, Nov 22, 2019 at 20:59, Yuriy Shuliga <[hidden email]> wrote:

Dear Igniters,

The first part of the TextQuery improvement, a result limit, was developed and merged.
Now we have to develop the most important functionality here: proper sorting of the Lucene index response and correct reduction of the responses for distributed queries.
*There are two Lucene-based aspects*

1. When no sorting fields are used, the documents in the response are still ordered by relevance.
Actually, this is the ScoreDoc.score value.
In order to reduce the distributed results correctly, the score should be passed along with the response.

2. When sorting by conventional fields, Lucene should have these fields properly indexed, and a corresponding Sort object should be applied to Lucene's search call.
To mark those fields, a new annotation like '@SortField' may be introduced.

*Reducing on Ignite*

The obvious point of distributed response reduction is class GridCacheDistributedQueryFuture.
Though, @Ivan Pavlukhin mentioned a class with similar functionality: ReduceIndexSorted.
What I see here is that it is tangled with H2-related classes (org.h2.result.Row) and might not be unifiable with TextQuery reduction.

I still need support here.

Overall, the goal of this letter is to initiate a discussion on the TextQuery sorting implementation and come closer to ticket creation.

BR,
Yuriy Shuliha

On Tue, Oct 22, 2019 at 13:31 Andrey Mashenkov <[hidden email]> wrote:

Hi Dmitry, Yuriy.

I've found that GridCacheQueryFutureAdapter has a newly added AtomicInteger 'total' field and a 'limit' field as a primitive int.

Both fields are used inside a synchronized block only.
So, we can make both private and downgrade the AtomicInteger to a primitive int.

Most likely, these fields can be replaced with one field.

On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <[hidden email]> wrote:

Hi Andrey,

I've checked this ticket's comments, and there is a TC Bot visa (with no blockers).

Do you have any concerns related to this patch?

Sincerely,
Dmitriy Pavlov

On Thu, Oct 17, 2019 at 16:43, Yuriy Shuliga <[hidden email]> wrote:

Andrey,

Per your request, I created ticket https://issues.apache.org/jira/browse/IGNITE-12291, linked to https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189

Could you please proceed with the PR merge?

BR,
Yuriy Shuliha

On Wed, Oct 9, 2019 at 12:52 Andrey Mashenkov <[hidden email]> wrote:

Hi Yuri,

To get access to the TC Bot, you should register as a TeamCity user [1], if you haven't done so already.
Then you will be able to log in to the Ignite TC Bot page with the same credentials.
[1] https://ci.ignite.apache.org/registerUser.html

On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <[hidden email]> wrote:

Andrew,

I have corrected the PR according to your notes. Please review.
What are the next steps to get it merged?

Y.

On Thu, Oct 3, 2019 at 17:47, Andrey Mashenkov <[hidden email]> wrote:

Yuri,

I've finished the review.
No crime found, only a trivial compatibility bug.

On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <[hidden email]> wrote:

Denis,

Thank you for your attention to this.
As of now, the https://issues.apache.org/jira/browse/IGNITE-12189 ticket is still pending review.
Is there a chance to move it forward somehow?

BR,
Yuriy Shuliha

On Mon, Sep 30, 2019 at 23:35, Denis Magda <[hidden email]> wrote:

Yuriy,

I've seen you open a pull request with the first changes:
https://issues.apache.org/jira/browse/IGNITE-12189

Alex Scherbakov and Ivan, are you the right people to do the review?

-
Denis

On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван (Ivan Pavlukhin) <[hidden email]> wrote:

Yuriy,

Thank you for providing the details! Quite interesting.

Yes, we already support distributed limit and merging of sorted sub-results for SQL queries.
E.g. ReduceIndexSorted and MergeStreamIterator are used for merging sorted streams.

Could you please also clarify score/relevance? Is it provided by the Lucene engine for each query result?
I am thinking about how to do a sorted merge properly in this case.

On Wed, Sep 25, 2019 at 18:56, Yuriy Shuliga <[hidden email]> wrote:

Ivan,

Thank you for the interesting question!

Text searches (or full-text searches) are mostly human-oriented, and the user's point of interest is the topmost part of the response.
The user then reads it, evaluates it, and uses the returned records for further purposes.

In our case in particular, we use Ignite for operations with financial data, and there is a lot of text there, such as asset names, financial instruments, companies, etc.
To work with this quickly and reliably, users rely on text search, type-ahead completions, and suggestions.

For these purposes, we index particular string data in separate caches.

Sorting capabilities and response size limits are very important there.
Our API has to provide the most relevant information within a limited response size.

Now let me comment from the Ignite/Lucene perspective.
When Ignite queries Lucene, Lucene returns *TopDocs.scoreDocs* already sorted by *score* (relevance), so the most relevant documents are at the top.
However, responses of a distributed query from different nodes are currently merged into the final query cursor queue in an arbitrary order,
so the score order is already lost at this point. Also, Ignite requests all matching documents from Lucene, which is redundant and bad for performance.

I'm implementing a *limit* parameter as part of *TextQuery*, and I have to note that we still need to add sorting to text query processing
in order to get usable results.
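Because Lucene already returns each node's hits sorted by score, the reduction Yuriy describes can in principle be a score-ordered k-way merge on the query originating node, applying the limit once per node and once more while merging. The following is a minimal, hypothetical Java sketch, not Ignite code: `ScoreMerge`, `ScoredHit`, and `mergeTopN` are illustrative names, and it assumes every node sends the ScoreDoc.score with each hit and returns at most `limit` hits already sorted by descending score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch (not an Ignite class): reduce per-node full-text results
 * on the query originating node. Assumes each node's list is already sorted by
 * Lucene score, descending, and truncated to the same per-node limit.
 */
final class ScoreMerge {

    /** Hypothetical hit: a cache key plus the Lucene score that was shipped with it. */
    static final class ScoredHit {
        final String key;
        final float score;

        ScoredHit(String key, float score) {
            this.key = key;
            this.score = score;
        }

        @Override
        public String toString() {
            return key + "=" + score;
        }
    }

    /** Read position inside one node's sorted result list. */
    private static final class Cursor {
        final List<ScoredHit> hits;
        final int node;
        int pos;

        Cursor(List<ScoredHit> hits, int node) {
            this.hits = hits;
            this.node = node;
        }

        ScoredHit head() {
            return hits.get(pos);
        }
    }

    /** K-way merge of score-sorted per-node lists, keeping only the global top {@code limit} hits. */
    static List<ScoredHit> mergeTopN(List<List<ScoredHit>> perNode, int limit) {
        // Highest score first; equal scores are ordered by node index for a repeatable result.
        Comparator<Cursor> order = Comparator
            .comparingDouble((Cursor c) -> c.head().score).reversed()
            .thenComparingInt(c -> c.node);

        PriorityQueue<Cursor> heap = new PriorityQueue<>(order);
        for (int i = 0; i < perNode.size(); i++) {
            if (!perNode.get(i).isEmpty())
                heap.add(new Cursor(perNode.get(i), i));
        }

        List<ScoredHit> result = new ArrayList<>(Math.max(limit, 0));
        while (result.size() < limit && !heap.isEmpty()) {
            Cursor best = heap.poll();   // node whose next hit has the highest score
            result.add(best.head());
            best.pos++;
            if (best.pos < best.hits.size())
                heap.add(best);          // re-insert only if that node still has hits
        }
        return result;
    }

    public static void main(String[] args) {
        List<ScoredHit> node0 = List.of(new ScoredHit("a", 0.9f), new ScoredHit("c", 0.5f));
        List<ScoredHit> node1 = List.of(new ScoredHit("b", 0.7f), new ScoredHit("d", 0.1f));
        System.out.println(mergeTopN(List.of(node0, node1), 3)); // prints [a=0.9, b=0.7, c=0.5]
    }
}
```

With at most `limit` hits per node, the merge performs at most `limit` heap polls and keeps one cursor per node, so the client never has to buffer every node's full result set. Breaking score ties by node index keeps the output order repeatable for the same per-node results, as long as the nodes are passed in a stable order, unlike the arbitrary page order described above.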
The *limit* parameter itself should solve part of the issues above, but sorting, at least by document score,
should definitely be implemented along with the limit.

This is a pretty short commentary; if you still have any questions, please ask, don't hesitate :)

BR,
Yuriy Shuliha

On Thu, Sep 19, 2019 at 11:38, Павлухин Иван (Ivan Pavlukhin) <[hidden email]> wrote:

Yuriy,

Greatly appreciate your interest.

Could you please elaborate a little on sorting? What tasks does it help to solve, and how?
It would be great to see an example.

On Wed, Sep 18, 2019 at 09:39, Alexei Scherbakov <[hidden email]> wrote:

Denis,

I like the idea of throwing an exception for enabled text queries on persistent caches.

I'm also fine with the proposed limit for unsorted searches.

Yury, please proceed with ticket creation.

On Tue, Sep 17, 2019 at 22:06, Denis Magda <[hidden email]> wrote:

Igniters,

I see nothing wrong with Yury's proposal regarding the full-text search API evolution, as long as Yury is ready to push it forward.

As for the in-memory-only mode, it makes total sense for in-memory data grid deployments where Ignite caches the data of an underlying DB such as Postgres.
As part of the changes, I would simply throw an exception (by default) if someone attempts to use text indexes with native persistence enabled.
A user who is ready to live with that limitation would need an explicit configuration change to get around the exception.

Thoughts?

-
Denis

On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga <[hidden email]> wrote:

Hello to all again,

Thank you for the important comments and notes given below!

Let me answer and continue the discussion.
(I) Overall needs in Lucene indexing

Alexei has referred to https://issues.apache.org/jira/browse/IGNITE-5371, where the absence of index persistence
was declared an obstacle to further development.

a) This ticket is already closed as not valid.
b) There are definite needs (in our project as well) for purely in-memory indexing of selected data.
We intend to use search capabilities to fetch a limited number of records for type-ahead search / suggestions.
Not all of the data will be indexed, and there is no need for the Lucene index to be persistent.
I hope this is a common text-search usage pattern.
(II) Necessary fixes in the current implementation

a) Implementation of a correct *limit* (*offset* does not seem to be required for text-search tasks for now)
I have investigated the data flow for distributed text queries using a simple prefix query, like 'name'='ene*'.
For now, each server node returns all response records to the client node, which may be thousands or even hundreds of thousands of records,
even if we need only the first 10-100. Again, all the results are added page by page to the queue in GridCacheQueryFutureAdapter in arbitrary order.
I did not find any means here to deliver a deterministic result.
So implementing the limit as part of the query (and of GridCacheQueryRequest) will not change the nature of the response,
but it will limit the load on nodes and the network.

Can we consider opening a ticket for this?

(III) Further exposure of the Lucene API to Ignite

a) Sorting
The solution for this could be:
- Make entities comparable
- Add a custom comparator to the entity
- Add annotations to mark sorted fields for Lucene indexing
- Use comparators when merging responses or reducing to the desired limit on the client node.
This will require the full result set to be loaded into memory, though it can be used for relatively small limits.

BR,
Yuriy Shuliha

On Fri, Aug 30, 2019 at 10:37, Alexei Scherbakov <[hidden email]> wrote:

Yuriy,

Note that one of the major blockers for text queries is [1], which makes Lucene indexes unusable with persistence
and is the main reason for discontinuation.
Probably it should be addressed first to make text queries a valid product feature.

Distributed sorting and advanced querying is indeed not a trivial task.
Some kind of merging must be implemented on the query originating node.

[1] https://issues.apache.org/jira/browse/IGNITE-5371

On Thu, Aug 29, 2019 at 23:38, Denis Magda <[hidden email]> wrote:

Yuriy,

If you are ready to take over the full-text search indexes, then please go ahead.
The primary reasons why the community wants to discontinue them first (and probably resurrect them later)
are the limitations listed by Andrey and minimal support from the community side.
-
Denis

On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <[hidden email]> wrote:

Hi Yuriy,

Unfortunately, there is a plan to discontinue TextQueries in Ignite [1].
The motivation is that text indexes are not persistent, not transactional, and can't be used together with SQL or inside SQL,
and there is a lack of interest from the community side.
You are welcome to take on these issues and make TextQueries great.

1. PageSize can't be used to limit the result set.
Query results are returned from the data node to the client-side cursor page by page,
and this parameter is designed to control the page size. The query is supposed to execute lazily on the server side,
and the full result set is not expected to be loaded into server-side memory at once, but page by page.
Do you mean you found that Lucene loads the entire result set into memory before the first page is sent to the client?

I'd think a new parameter should be added to limit the result. The best solution is to use query language commands for this,
e.g. "LIMIT/OFFSET" in SQL.

This task doesn't look trivial. A query is a distributed operation: the same user query is executed on the data nodes,
and then the results from all nodes must be merged correctly before being returned via the client cursor.
So LIMIT should be applied on every node and then again in the merge phase.

Also, and this may be non-obvious, limiting results makes no sense without sorting,
as there is no guarantee that each subsequent run of the query will return the same data, because of page reordering.
Basically, the merge phase receives results from data nodes asynchronously, and messages from different nodes can't be ordered.

2.
a. The "tokenize" parameter name (for @QueryTextField) looks more descriptive, doesn't it?
b, c. What about distributed queries? How will partial results from nodes be merged?
Does Lucene allow configuring a comparator for data sorting?
Which comparator should Ignite choose to sort the result in the merge phase?

3. For now, the Lucene engine is not configurable at all. E.g. it is impossible to configure the Tokenizer.
I'd think about possible ways to configure the engine first, and only then go further to discuss/implement complex features
that may depend on the engine config.

On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <[hidden email]> wrote:

Dear community,

By starting this chain, I'd like to open a discussion that would lead to contributions in the subject area.

Ignite has indexing capabilities backed by different mechanisms, including Lucene.

Currently, Lucene 7.5.0 is used (released last year).
It is a widespread and mature technology that covers the text search area and beyond (e.g. spatial data indexing).

My goal is to *expose more Lucene functionality to Ignite indexing and query mechanisms for text data*.

It's quite a simple request at the current stage. It comes from our project's needs, but I believe it will be useful for many more people.
Let's walk through the items and vote on or discuss Jira tickets for them.

1. [trivial] Use dataQuery.getPageSize() to limit search response items inside GridLuceneIndex.query().
> > > > >> Currently > > > > >> >> > it > > > > >> >> > >> is > > > > >> >> > >> >> > > calling > > > > >> >> > >> >> > > > > > > > > > > > > > IndexSearcher.search(query, > > > > >> >> > >> >> *Integer.MAX_VALUE*) - > > > > >> >> > >> >> > so > > > > >> >> > >> >> > > > > > > basically > > > > >> >> > >> >> > > > > > > > > all > > > > >> >> > >> >> > > > > > > > > > > > > scored > > > > >> >> > >> >> > > > > > > > > > > > > > matches will me returned, what > > we > > > > do > > > > >> not > > > > >> >> > >> need in > > > > >> >> > >> >> > most > > > > >> >> > >> >> > > > > cases. > > > > >> >> > >> >> > > > > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > > > 2.[simple] Add sorting. Then > > more > > > > >> >> capable > > > > >> >> > >> >> search > > > > >> >> > >> >> > > call > > > > >> >> > >> >> > > > > can be > > > > >> >> > >> >> > > > > > > > > > > > > > executed: > > > > *IndexSearcher.search(query, > > > > >> >> > count, > > > > >> >> > >> >> > > > > > > > > > > > > > sort) * > > > > >> >> > >> >> > > > > > > > > > > > > > Implementation steps: > > > > >> >> > >> >> > > > > > > > > > > > > > a) Introduce boolean > > *sortField* > > > > >> >> parameter > > > > >> >> > in > > > > >> >> > >> >> > > > > > > *@QueryTextFiled * > > > > >> >> > >> >> > > > > > > > > > > > > > annotation. If > > > > >> >> > >> >> > > > > > > > > > > > > > *true *the filed will be > > indexed > > > > but > > > > >> not > > > > >> >> > >> >> tokenized. > > > > >> >> > >> >> > > > > Number > > > > >> >> > >> >> > > > > > > types > > > > >> >> > >> >> > > > > > > > > > are > > > > >> >> > >> >> > > > > > > > > > > > > > preferred here. > > > > >> >> > >> >> > > > > > > > > > > > > > b) Add *sort* collection to > > > > >> *TextQuery* > > > > >> >> > >> >> > constructor. 
> > > > >> >> > >> >> > > It > > > > >> >> > >> >> > > > > > > should > > > > >> >> > >> >> > > > > > > > > > define > > > > >> >> > >> >> > > > > > > > > > > > > > desired sort fields used for > > > > querying. > > > > >> >> > >> >> > > > > > > > > > > > > > c) Implement Lucene sort usage > > in > > > > >> >> > >> >> > > > > GridLuceneIndex.query(). > > > > >> >> > >> >> > > > > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > > > 3.[moderate] Build complex > > queries > > > > >> with > > > > >> >> > >> >> > *TextQuery*, > > > > >> >> > >> >> > > > > > > including > > > > >> >> > >> >> > > > > > > > > > > > > > terms/queries boosting. > > > > >> >> > >> >> > > > > > > > > > > > > > *This section for voting only, > > as > > > > >> >> requires > > > > >> >> > >> more > > > > >> >> > >> >> > > > detailed > > > > >> >> > >> >> > > > > > > work. > > > > >> >> > >> >> > > > > > > > > > Should > > > > >> >> > >> >> > > > > > > > > > > > be > > > > >> >> > >> >> > > > > > > > > > > > > > extended if community is > > > > interested in > > > > >> >> it.* > > > > >> >> > >> >> > > > > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > > > Looking forward to your > > comments! > > > > >> >> > >> >> > > > > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > > > BR, > > > > >> >> > >> >> > > > > > > > > > > > > > Yuriy Shuliha > > > > >> >> > >> >> > > > > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > > -- > > > > >> >> > >> >> > > > > > > > > > > > > Best regards, > > > > >> >> > >> >> > > > > > > > > > > > > Andrey V. 
Mashenkov > > > > >> >> > >> >> > > > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > -- > > > > >> >> > >> >> > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > Best regards, > > > > >> >> > >> >> > > > > > > > > > > Alexei Scherbakov > > > > >> >> > >> >> > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > -- > > > > >> >> > >> >> > > > > > > Best regards, > > > > >> >> > >> >> > > > > > > Ivan Pavlukhin > > > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > > > >> >> > >> >> > > > > > > > > >> >> > >> >> > > > > > > > > >> >> > >> >> > > > > -- > > > > >> >> > >> >> > > > > Best regards, > > > > >> >> > >> >> > > > > Ivan Pavlukhin > > > > >> >> > >> >> > > > > > > > > >> >> > >> >> > > > > > > > >> >> > >> >> > > > > > > >> >> > >> >> > > > > > >> >> > >> >> > > > > > >> >> > >> >> > -- > > > > >> >> > >> >> > Best regards, > > > > >> >> > >> >> > Andrey V. Mashenkov > > > > >> >> > >> >> > > > > > >> >> > >> >> > > > > >> >> > >> > > > > > >> >> > >> > > > > > >> >> > >> > -- > > > > >> >> > >> > Best regards, > > > > >> >> > >> > Andrey V. Mashenkov > > > > >> >> > >> > > > > > >> >> > >> > > > > >> >> > > > > > > >> >> > > > > > >> >> > -- > > > > >> >> > Best regards, > > > > >> >> > Andrey V. Mashenkov > > > > >> >> > > > > > >> >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > Best regards, > > Ivan Pavlukhin > > > > -- Best regards, Ivan Pavlukhin |
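Items 1 and 2 of the proposal quoted above come down to collecting only the best N hits per node (by relevance score, or by a sort field) instead of every match. In Lucene that is what passing a real `n` to `IndexSearcher.search(query, n)` or `IndexSearcher.search(query, n, sort)` achieves, because its top-docs collectors keep a bounded priority queue of size `n`. The plain-Java sketch below only illustrates that idea without Lucene or Ignite; `TopNSketch`, `Hit` and `topN` are hypothetical names and not part of either API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopNSketch {
    /** Hypothetical search hit: cache key, Lucene-style relevance score, numeric sort field. */
    record Hit(String key, float score, long sortValue) {}

    /** "Best first" orders: by relevance (highest score first) or by a numeric field (ascending). */
    static final Comparator<Hit> BY_SCORE_DESC = Comparator.comparingDouble(Hit::score).reversed();
    static final Comparator<Hit> BY_SORT_VALUE_ASC = Comparator.comparingLong(Hit::sortValue);

    /**
     * Returns at most n hits, best first according to cmp, while holding no more than
     * n candidates in memory (unlike collecting everything with an Integer.MAX_VALUE limit).
     */
    static List<Hit> topN(Iterable<Hit> hits, int n, Comparator<Hit> cmp) {
        if (n <= 0)
            return List.of();
        // Heap ordered worst-first, so peek() is the weakest hit kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>(cmp.reversed());
        for (Hit h : hits) {
            if (heap.size() < n)
                heap.add(h);
            else if (cmp.compare(h, heap.peek()) < 0) { // h ranks strictly better than the weakest kept hit
                heap.poll();
                heap.add(h);
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(cmp); // heap iteration order is arbitrary, so sort the survivors
        return result;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit("EUR", 0.4f, 3), new Hit("ENEL", 2.1f, 1),
            new Hit("ENI", 1.7f, 5), new Hit("ENGIE", 0.9f, 2));
        System.out.println(topN(hits, 2, BY_SCORE_DESC));     // the two most relevant hits
        System.out.println(topN(hits, 2, BY_SORT_VALUE_ASC)); // the two smallest sort values
    }
}
```

The same `topN` serves both the unsorted (relevance) case and the sort-field case, because only the comparator changes. This mirrors how the proposal treats sorting as an extra argument to the same search call.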
Nice to hear, Ivan
It's good practice to make existing functionality extension to be proper presented; as we expect if from Text Queries. Lets make it work correctly at first. I'm ok to prepare ticket for adding reduction for sorted responses to GridCacheDistributedQueryFuture or nearby. Also theTextQuery response entity will be extended to carry Lucene's 'docScore' per record. No open question has left then. BR, Yuriy Shuliha чт, 28 лист. 2019 о 15:23 Ivan Pavlukhin <[hidden email]> пише: > Folks, Yuriy, > > I suppose that we are going to proceed with > > >>> > Reducing on Ignite > > The obvious point of distributed response reduction is class > GridCacheDistributedQueryFuture. > Though, @Ivan Pavlukhin mentioned class with similar functionality: > ReduceIndexSorted > What I see here, that it is tangled with H2 related classes > (org.h2.result.Row) and might not be unified with TextQuery reduction. > >> > > From my side there is no strict opinion that we should unify > reduction. Having a separate reduction implementation for text queries > sounds for me as not bad option as well. > > Are there still any open questions? > > ср, 27 нояб. 2019 г. в 02:27, Denis Magda <[hidden email]>: > > > > I don't see anything wrong if Yuriy is willing to carry on and keep > > enhancing our full-text search support that lacks basic capabilities. > > > > The basics should be available. If anybody needs an advanced feature they > > can introduce Solr or ElastiSearch into the final architecture of the > app. > > > > Folks, who of us can help Yuriy with the questions asked? Most like the > SQL > > experts are the best candidates here. > > > > > > - > > Denis > > > > > > On Tue, Nov 26, 2019 at 8:52 AM Ivan Pavlukhin <[hidden email]> > wrote: > > > > > Folks, > > > > > > IEP is an Ignite-specific thing. 
In fact, I suppose that we are > > > already doing it in ASF way by having this dev-list discussion =) > > > > > > As for me, implementing "limit" feature for text queries is not so big > > > to make an IEP. But we might need to create one for next features. > > > > > > вт, 26 нояб. 2019 г. в 15:06, Ilya Kasnacheev < > [hidden email]>: > > > > > > > > Hello! > > > > > > > > ASF way should probably start with an IEP :) > > > > > > > > Regards, > > > > -- > > > > Ilya Kasnacheev > > > > > > > > > > > > вт, 26 нояб. 2019 г. в 14:12, Zhenya Stanilovsky > > > <[hidden email] > > > > >: > > > > > > > > > > > > > > Ok, lets forgot Solr and go through ASF way, if Yuriy prove this > > > > > functionality is helpful and PR it, why not ? > > > > > > > > > > isn`t it ? > > > > > > > > > > >Вторник, 26 ноября 2019, 14:06 +03:00 от Ilya Kasnacheev < > > > > > [hidden email]>: > > > > > > > > > > > >Hello! > > > > > > > > > > > >The problem here is that Solr is a multi-year effort by a lot of > > > people. > > > > > We > > > > > >can't match that. > > > > > > > > > > > >Maybe we could integrate with Solr/Solr Cloud instead, by feeding > our > > > > > cache > > > > > >information into their storage for indexing and relying on their > own > > > > > >mechanisms for distributed IR sorting? > > > > > > > > > > > >Regards, > > > > > >-- > > > > > >Ilya Kasnacheev > > > > > > > > > > > > > > > > > >вт, 26 нояб. 2019 г. в 13:59, Zhenya Stanilovsky < > > > > > [hidden email] > > > > > >>: > > > > > > > > > > > >> > > > > > >> Ilya Kasnacheev, what a problem in Solr with Ignite > functionality ? > > > > > >> > > > > > >> thanks ! > > > > > >> > > > > > >> >Вторник, 26 ноября 2019, 13:50 +03:00 от Ilya Kasnacheev < > > > > > >> [hidden email] >: > > > > > >> > > > > > > >> >Hello! > > > > > >> > > > > > > >> >I have a hunch that we are trying to build Apache Solr (or Solr > > > Cloud) > > > > > >> into > > > > > >> >Apache Ignite. 
I think that's a lot of effort that is not very > > > > > justified. > > > > > >> > > > > > > >> >I don't think we should try to implement sorting in Apache > Ignite, > > > > > because > > > > > >> >it is a lot of work, and a lot of code in our code base which > we > > > don't > > > > > >> >really want. > > > > > >> > > > > > > >> >Regards, > > > > > >> >-- > > > > > >> >Ilya Kasnacheev > > > > > >> > > > > > > >> > > > > > > >> >пт, 22 нояб. 2019 г. в 20:59, Yuriy Shuliga < > [hidden email] > > > >: > > > > > >> > > > > > > >> >> Dear Igniters, > > > > > >> >> > > > > > >> >> The first part of TextQuery improvement - a result limit - > was > > > > > developed > > > > > >> >> and merged. > > > > > >> >> Now we have to develop most important functionality here - > proper > > > > > >> sorting > > > > > >> >> of Lucene index response and correct reducing of them for > > > distributed > > > > > >> >> queries. > > > > > >> >> > > > > > >> >> *There are two Lucene based aspects* > > > > > >> >> > > > > > >> >> 1. In case of using no sorting fields, the documents in > response > > > are > > > > > >> still > > > > > >> >> ordered by relevance. > > > > > >> >> Actually this is ScoreDoc.score value. > > > > > >> >> In order to reduce the distributed results correctly, the > score > > > > > should > > > > > >> be > > > > > >> >> passed with response. > > > > > >> >> > > > > > >> >> 2. When sorting by conventional fields, then Lucene should > have > > > these > > > > > >> >> fields properly indexed and > > > > > >> >> corresponding Sort object should be applied to Lucene's > search > > > call. > > > > > >> >> In order to mark those fields a new annotation like > '@SortField' > > > may > > > > > be > > > > > >> >> introduced. > > > > > >> >> > > > > > >> >> *Reducing on Ignite * > > > > > >> >> > > > > > >> >> The obvious point of distributed response reduction is class > > > > > >> >> GridCacheDistributedQueryFuture. 
> > > > > >> >> Though, @Ivan Pavlukhin mentioned class with similar > > > functionality: > > > > > >> >> ReduceIndexSorted > > > > > >> >> What I see here, that it is tangled with H2 related classes ( > > > > > >> >> org.h2.result.Row) and might not be unified with TextQuery > > > reduction. > > > > > >> >> > > > > > >> >> Still need a support here. > > > > > >> >> > > > > > >> >> Overall, the goal of this letter is to initiate discussion on > > > > > TextQuery > > > > > >> >> Sorting implementation and come closer to ticket creation. > > > > > >> >> > > > > > >> >> BR, > > > > > >> >> Yuriy Shuliha > > > > > >> >> > > > > > >> >> вт, 22 жовт. 2019 о 13:31 Andrey Mashenkov < > > > > > [hidden email] > > > > > >> > > > > > > >> >> пише: > > > > > >> >> > > > > > >> >> > Hi Dmitry, Yuriy. > > > > > >> >> > > > > > > >> >> > I've found GridCacheQueryFutureAdapter has newly added > > > > > AtomicInteger > > > > > >> >> > 'total' field and 'limit; field as primitive int. > > > > > >> >> > > > > > > >> >> > Both fields are used inside synchronized block only. > > > > > >> >> > So, we can make both private and downgrade AtomicInteger to > > > > > primitive > > > > > >> >> int. > > > > > >> >> > > > > > > >> >> > Most likely, these fields can be replaced with one field. > > > > > >> >> > > > > > > >> >> > > > > > > >> >> > > > > > > >> >> > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov < > > > > > [hidden email] > > > > > >> > > > > > > >> >> > wrote: > > > > > >> >> > > > > > > >> >> > > Hi Andrey, > > > > > >> >> > > > > > > > >> >> > > I've checked this ticket comments, and there is a TC Bot > visa > > > > > (with > > > > > >> no > > > > > >> >> > > blockers). > > > > > >> >> > > > > > > > >> >> > > Do you have any concerns related to this patch? > > > > > >> >> > > > > > > > >> >> > > Sincerely, > > > > > >> >> > > Dmitriy Pavlov > > > > > >> >> > > > > > > > >> >> > > чт, 17 окт. 2019 г. 
в 16:43, Yuriy Shuliga < > > > [hidden email] > > > > > >: > > > > > >> >> > > > > > > > >> >> > >> Andrey, > > > > > >> >> > >> > > > > > >> >> > >> Per you request, I created ticket > > > > > >> >> > >> https://issues.apache.org/jira/browse/IGNITE-12291 > linked > > > to > > > > > >> >> > >> > > > > > >> > https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189 > > > > > >> >> > >> > > > > > >> >> > >> Could you please proceed with PR merge ? > > > > > >> >> > >> > > > > > >> >> > >> BR, > > > > > >> >> > >> Yuriy Shuliha > > > > > >> >> > >> > > > > > >> >> > >> ср, 9 жовт. 2019 о 12:52 Andrey Mashenkov < > > > > > >> [hidden email] > > > > > >> >> > > > > > > >> >> > >> пише: > > > > > >> >> > >> > > > > > >> >> > >> > Hi Yuri, > > > > > >> >> > >> > > > > > > >> >> > >> > To get access to TC Bot you should register as > TeamCity > > > user > > > > > >> [1], if > > > > > >> >> > you > > > > > >> >> > >> > didn't do this already. > > > > > >> >> > >> > Then you will be able to authorize on Ignite TC Bot > page > > > with > > > > > >> same > > > > > >> >> > >> > credentials. > > > > > >> >> > >> > > > > > > >> >> > >> > [1] https://ci.ignite.apache.org/registerUser.html > > > > > >> >> > >> > > > > > > >> >> > >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga < > > > > > [hidden email] > > > > > >> > > > > > > >> >> > wrote: > > > > > >> >> > >> > > > > > > >> >> > >> >> Andrew, > > > > > >> >> > >> >> > > > > > >> >> > >> >> I have corrected PR according to your notes. Please > > > review. > > > > > >> >> > >> >> What will be the next steps in order to merge in? > > > > > >> >> > >> >> > > > > > >> >> > >> >> Y. > > > > > >> >> > >> >> > > > > > >> >> > >> >> чт, 3 жовт. 2019 о 17:47 Andrey Mashenkov < > > > > > >> >> > [hidden email] > > > > > > >> >> > >> >> пише: > > > > > >> >> > >> >> > > > > > >> >> > >> >> > Yuri, > > > > > >> >> > >> >> > > > > > > >> >> > >> >> > I've done with review. 
> > > > > >> >> > >> >> > No crime found, but trivial compatibility bug. > > > > > >> >> > >> >> > > > > > > >> >> > >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga < > > > > > >> [hidden email] > > > > > > >> >> > >> wrote: > > > > > >> >> > >> >> > > > > > > >> >> > >> >> > > Denis, > > > > > >> >> > >> >> > > > > > > > >> >> > >> >> > > Thank you for your attention to this. > > > > > >> >> > >> >> > > as for now, the > > > > > >> >> > https://issues.apache.org/jira/browse/IGNITE-12189 > > > > > >> >> > >> >> > ticket > > > > > >> >> > >> >> > > is still pending review. > > > > > >> >> > >> >> > > Do we have a chance to move it forward somehow? > > > > > >> >> > >> >> > > > > > > > >> >> > >> >> > > BR, > > > > > >> >> > >> >> > > Yuriy Shuliha > > > > > >> >> > >> >> > > > > > > > >> >> > >> >> > > пн, 30 вер. 2019 о 23:35 Denis Magda < > > > > > [hidden email] > > > > > > >> пише: > > > > > >> >> > >> >> > > > > > > > >> >> > >> >> > > > Yuriy, > > > > > >> >> > >> >> > > > > > > > > >> >> > >> >> > > > I've seen you opening a pull-request with the > first > > > > > >> changes: > > > > > >> >> > >> >> > > > > > > https://issues.apache.org/jira/browse/IGNITE-12189 > > > > > >> >> > >> >> > > > > > > > > >> >> > >> >> > > > Alex Scherbakov and Ivan are you the right > guys to > > > do > > > > > the > > > > > >> >> > review? > > > > > >> >> > >> >> > > > > > > > > >> >> > >> >> > > > - > > > > > >> >> > >> >> > > > Denis > > > > > >> >> > >> >> > > > > > > > > >> >> > >> >> > > > > > > > > >> >> > >> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван < > > > > > >> >> > >> [hidden email] > > > > > > >> >> > >> >> > > wrote: > > > > > >> >> > >> >> > > > > > > > > >> >> > >> >> > > > > Yuriy, > > > > > >> >> > >> >> > > > > > > > > > >> >> > >> >> > > > > Thank you for providing details! Quite > > > interesting. 
> > > > > >> >> > >> >> > > > > > > > > > >> >> > >> >> > > > > Yes, we already have support of distributed > > > limit and > > > > > >> >> merging > > > > > >> >> > >> >> sorted > > > > > >> >> > >> >> > > > > subresults for SQL queries. E.g. > > > ReduceIndexSorted > > > > > and > > > > > >> >> > >> >> > > > > MergeStreamIterator are used for merging > sorted > > > > > streams. > > > > > >> >> > >> >> > > > > > > > > > >> >> > >> >> > > > > Could you please also clarify about > > > score/relevance? > > > > > Is > > > > > >> it > > > > > >> >> > >> >> provided > > > > > >> >> > >> >> > by > > > > > >> >> > >> >> > > > > Lucene engine for each query result? I am > > > thinking > > > > > how > > > > > >> to > > > > > >> >> do > > > > > >> >> > >> >> sorted > > > > > >> >> > >> >> > > > > merge properly in this case. > > > > > >> >> > >> >> > > > > > > > > > >> >> > >> >> > > > > ср, 25 сент. 2019 г. в 18:56, Yuriy Shuliga < > > > > > >> >> > [hidden email] > > > > > >> >> > >> >: > > > > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > Ivan, > > > > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > Thank you for interesting question! > > > > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > Text searches (or full text searches) are > > > mostly > > > > > >> >> > >> human-oriented. > > > > > >> >> > >> >> > And > > > > > >> >> > >> >> > > > the > > > > > >> >> > >> >> > > > > > point of user's interest is topmost part of > > > > > response. > > > > > >> >> > >> >> > > > > > Then user can read it, evaluate and use the > > > given > > > > > >> records > > > > > >> >> > for > > > > > >> >> > >> >> > further > > > > > >> >> > >> >> > > > > > purposes. 
> > > > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > Particularly in our case, we use Ignite for > > > > > operations > > > > > >> >> with > > > > > >> >> > >> >> > financial > > > > > >> >> > >> >> > > > > data, > > > > > >> >> > >> >> > > > > > and there lots of text stuff like assets > names, > > > > > fin. > > > > > >> >> > >> >> instruments, > > > > > >> >> > >> >> > > > > companies > > > > > >> >> > >> >> > > > > > etc. > > > > > >> >> > >> >> > > > > > In order to operate with this quickly and > > > reliably, > > > > > >> users > > > > > >> >> > >> used > > > > > >> >> > >> >> to > > > > > >> >> > >> >> > > work > > > > > >> >> > >> >> > > > > with > > > > > >> >> > >> >> > > > > > text search, type-ahead completions, > > > suggestions. > > > > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > For this purposes we are indexing > particular > > > string > > > > > >> data > > > > > >> >> in > > > > > >> >> > >> >> > separate > > > > > >> >> > >> >> > > > > caches. > > > > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > Sorting capabilities and response size > > > limitations > > > > > are > > > > > >> >> very > > > > > >> >> > >> >> > important > > > > > >> >> > >> >> > > > > > there. As our API have to provide most > relevant > > > > > >> >> information > > > > > >> >> > >> in > > > > > >> >> > >> >> view > > > > > >> >> > >> >> > > of > > > > > >> >> > >> >> > > > > > limited size. > > > > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > Now let me comment some Ignite/Lucene > > > perspective. > > > > > >> >> > >> >> > > > > > Actually Ignite queries and Lucene returns > > > > > >> >> > >> *TopDocs.scoresDocs > > > > > >> >> > >> >> > > *already > > > > > >> >> > >> >> > > > > > sorted by *score *(relevance). So most > relevant > > > > > >> documents > > > > > >> >> > >> are on > > > > > >> >> > >> >> > the > > > > > >> >> > >> >> > > > top. 
> > > > > >> >> > >> >> > > > > > And currently distributed queries responses > > > from > > > > > >> >> different > > > > > >> >> > >> nodes > > > > > >> >> > >> >> > are > > > > > >> >> > >> >> > > > > merged > > > > > >> >> > >> >> > > > > > into final query cursor queue in arbitrary > way. > > > > > >> >> > >> >> > > > > > So in fact we already have the score order > > > ruined > > > > > >> here. > > > > > >> >> > Also > > > > > >> >> > >> >> Ignite > > > > > >> >> > >> >> > > > > > requests all possible documents from Lucene > > > that is > > > > > >> >> > redundant > > > > > >> >> > >> >> and > > > > > >> >> > >> >> > not > > > > > >> >> > >> >> > > > > good > > > > > >> >> > >> >> > > > > > for performance. > > > > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > I'm implementing *limit* parameter to be > part > > > of > > > > > >> >> *TextQuery > > > > > >> >> > >> *and > > > > > >> >> > >> >> > have > > > > > >> >> > >> >> > > > to > > > > > >> >> > >> >> > > > > > notice that we still have to add sorting > for > > > text > > > > > >> queries > > > > > >> >> > >> >> > processing > > > > > >> >> > >> >> > > in > > > > > >> >> > >> >> > > > > > order to have applicable results. > > > > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > *Limit* parameter itself should improve the > > > part of > > > > > >> >> issues > > > > > >> >> > >> from > > > > > >> >> > >> >> > > above, > > > > > >> >> > >> >> > > > > but > > > > > >> >> > >> >> > > > > > definitely, sorting by document score at > least > > > > > should > > > > > >> be > > > > > >> >> > >> >> > implemented > > > > > >> >> > >> >> > > > > along > > > > > >> >> > >> >> > > > > > with limit. 
> > > > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > This is a pretty short commentary if you > still > > > have > > > > > >> any > > > > > >> >> > >> >> questions, > > > > > >> >> > >> >> > > > please > > > > > >> >> > >> >> > > > > > ask, do not hesitate) > > > > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > BR, > > > > > >> >> > >> >> > > > > > Yuriy Shuliha > > > > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > чт, 19 вер. 2019 о 11:38 Павлухин Иван < > > > > > >> >> > [hidden email] > > > > > > >> >> > >> >> пише: > > > > > >> >> > >> >> > > > > > > > > > > >> >> > >> >> > > > > > > Yuriy, > > > > > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > Greatly appreciate your interest. > > > > > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > Could you please elaborate a little bit > about > > > > > >> sorting? > > > > > >> >> > What > > > > > >> >> > >> >> tasks > > > > > >> >> > >> >> > > > does > > > > > >> >> > >> >> > > > > > > it help to solve and how? It would be > great > > > to > > > > > >> provide > > > > > >> >> an > > > > > >> >> > >> >> > example. > > > > > >> >> > >> >> > > > > > > > > > > > >> >> > >> >> > > > > > > ср, 18 сент. 2019 г. в 09:39, Alexei > > > Scherbakov < > > > > > >> >> > >> >> > > > > > > [hidden email] >: > > > > > >> >> > >> >> > > > > > > > > > > > > >> >> > >> >> > > > > > > > Denis, > > > > > >> >> > >> >> > > > > > > > > > > > > >> >> > >> >> > > > > > > > I like the idea of throwing an > exception > > > for > > > > > >> enabled > > > > > >> >> > text > > > > > >> >> > >> >> > queries > > > > > >> >> > >> >> > > > on > > > > > >> >> > >> >> > > > > > > > persistent caches. > > > > > >> >> > >> >> > > > > > > > > > > > > >> >> > >> >> > > > > > > > Also I'm fine with proposed limit for > > > unsorted > > > > > >> >> > searches. > > > > > >> >> > >> >> > > > > > > > > > > > > >> >> > >> >> > > > > > > > Yury, please proceed with ticket > creation. 
> > > > > >> >> > >> >> > > > > > > > > > > > > >> >> > >> >> > > > > > > > вт, 17 сент. 2019 г., 22:06 Denis > Magda < > > > > > >> >> > >> [hidden email] > > > > > >> >> > >> >> >: > > > > > >> >> > >> >> > > > > > > > > > > > > >> >> > >> >> > > > > > > > > Igniters, > > > > > >> >> > >> >> > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > I see nothing wrong with Yury's > proposal > > > in > > > > > >> regards > > > > > >> >> > >> >> full-text > > > > > >> >> > >> >> > > > > search > > > > > >> >> > >> >> > > > > > > API > > > > > >> >> > >> >> > > > > > > > > evolution as long as Yury is ready to > > > push it > > > > > >> >> > forward. > > > > > >> >> > >> >> > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > As for the in-memory mode only, it > makes > > > > > total > > > > > >> >> sense > > > > > >> >> > >> for > > > > > >> >> > >> >> > > > in-memory > > > > > >> >> > >> >> > > > > data > > > > > >> >> > >> >> > > > > > > > > grid deployments when Ignite caches > data > > > of > > > > > an > > > > > >> >> > >> underlying > > > > > >> >> > >> >> DB > > > > > >> >> > >> >> > > like > > > > > >> >> > >> >> > > > > > > Postgres. > > > > > >> >> > >> >> > > > > > > > > As part of the changes, I would > simply > > > throw > > > > > an > > > > > >> >> > >> exception > > > > > >> >> > >> >> (by > > > > > >> >> > >> >> > > > > default) > > > > > >> >> > >> >> > > > > > > if > > > > > >> >> > >> >> > > > > > > > > the one attempts to use text indices > > > with the > > > > > >> >> native > > > > > >> >> > >> >> > > persistence > > > > > >> >> > >> >> > > > > > > enabled. > > > > > >> >> > >> >> > > > > > > > > If the person is ready to live with > that > > > > > >> limitation > > > > > >> >> > >> that > > > > > >> >> > >> >> an > > > > > >> >> > >> >> > > > > explicit > > > > > >> >> > >> >> > > > > > > > > configuration change is needed to > come > > > around > > > > > >> the > > > > > >> >> > >> >> exception. 
> > > > > >> >> > >> >> > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > Thoughts? > > > > > >> >> > >> >> > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > - > > > > > >> >> > >> >> > > > > > > > > Denis > > > > > >> >> > >> >> > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy > > > > > Shuliga < > > > > > >> >> > >> >> > > [hidden email] > > > > > >> >> > >> >> > > > > > > > > > >> >> > >> >> > > > > > > wrote: > > > > > >> >> > >> >> > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > Hello to all again, > > > > > >> >> > >> >> > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > Thank you for important comments > and > > > notes > > > > > >> given > > > > > >> >> > >> below! > > > > > >> >> > >> >> > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > Let me answer and continue the > > > discussion. > > > > > >> >> > >> >> > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > (I) Overall needs in Lucene > indexing > > > > > >> >> > >> >> > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > Alexei has referenced to > > > > > >> >> > >> >> > > > > > > > > > > > > > > >> >> https://issues.apache.org/jira/browse/IGNITE-5371 > > > > > >> >> > >> where > > > > > >> >> > >> >> > > > > > > > > > absence of index persistence was > > > declared > > > > > as > > > > > >> an > > > > > >> >> > >> >> obstacle to > > > > > >> >> > >> >> > > > > further > > > > > >> >> > >> >> > > > > > > > > > development. 
> > > > > >> >> > >> >> > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > a) This ticket is already closed > as not > > > > > >> valid.b) > > > > > >> >> > >> There > > > > > >> >> > >> >> are > > > > > >> >> > >> >> > > > > definite > > > > > >> >> > >> >> > > > > > > needs > > > > > >> >> > >> >> > > > > > > > > > (and in our project as well) in > just > > > > > in-memory > > > > > >> >> > >> indexing > > > > > >> >> > >> >> of > > > > > >> >> > >> >> > > > > selected > > > > > >> >> > >> >> > > > > > > data. > > > > > >> >> > >> >> > > > > > > > > > We intend to use search > capabilities > > > for > > > > > >> fetching > > > > > >> >> > >> >> limited > > > > > >> >> > >> >> > > > amount > > > > > >> >> > >> >> > > > > of > > > > > >> >> > >> >> > > > > > > > > records > > > > > >> >> > >> >> > > > > > > > > > that should be used in type-ahead > > > search / > > > > > >> >> > >> suggestions. > > > > > >> >> > >> >> > > > > > > > > > Not all of the data will be indexed > > > and the > > > > > >> are > > > > > >> >> no > > > > > >> >> > >> need > > > > > >> >> > >> >> in > > > > > >> >> > >> >> > > > Lucene > > > > > >> >> > >> >> > > > > > > index > > > > > >> >> > >> >> > > > > > > > > to > > > > > >> >> > >> >> > > > > > > > > > be persistence. Hope this is a wide > > > > > pattern of > > > > > >> >> > >> >> text-search > > > > > >> >> > >> >> > > > usage. > > > > > >> >> > >> >> > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > (II) Necessary fixes in current > > > > > >> implementation. 
> > > > > >> >> > >> >> > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > a) Implementation of correct *limit > > > > > *(*offset* > > > > > >> >> > seems > > > > > >> >> > >> to > > > > > >> >> > >> >> be > > > > > >> >> > >> >> > > not > > > > > >> >> > >> >> > > > > > > required > > > > > >> >> > >> >> > > > > > > > > in > > > > > >> >> > >> >> > > > > > > > > > text-search tasks for now) > > > > > >> >> > >> >> > > > > > > > > > I have investigated the data flow > for > > > > > >> distributed > > > > > >> >> > >> text > > > > > >> >> > >> >> > > queries. > > > > > >> >> > >> >> > > > > it > > > > > >> >> > >> >> > > > > > > was > > > > > >> >> > >> >> > > > > > > > > > simple test prefix query, like > > > > > 'name'*='ene*'* > > > > > >> >> > >> >> > > > > > > > > > For now each server-node returns > all > > > > > response > > > > > >> >> > >> records to > > > > > >> >> > >> >> > the > > > > > >> >> > >> >> > > > > > > client-node > > > > > >> >> > >> >> > > > > > > > > > and it may contain ~thousands, > ~hundred > > > > > >> thousands > > > > > >> >> > >> >> records. > > > > > >> >> > >> >> > > > > > > > > > Event if we need only first 10-100. > > > Again, > > > > > all > > > > > >> >> the > > > > > >> >> > >> >> results > > > > > >> >> > >> >> > > are > > > > > >> >> > >> >> > > > > added > > > > > >> >> > >> >> > > > > > > to > > > > > >> >> > >> >> > > > > > > > > > queue in > GridCacheQueryFutureAdapter in > > > > > >> arbitrary > > > > > >> >> > >> order > > > > > >> >> > >> >> by > > > > > >> >> > >> >> > > > pages. > > > > > >> >> > >> >> > > > > > > > > > I did not find here any means to > > > deliver > > > > > >> >> > >> deterministic > > > > > >> >> > >> >> > > result. 
> > > > > >> >> > >> >> > > > > > > > > > So implementing limit as part of > query > > > and > > > > > >> >> > >> >> > > > > (GridCacheQueryRequest) > > > > > >> >> > >> >> > > > > > > will > > > > > >> >> > >> >> > > > > > > > > not > > > > > >> >> > >> >> > > > > > > > > > change the nature of response but > will > > > > > limit > > > > > >> load > > > > > >> >> > on > > > > > >> >> > >> >> nodes > > > > > >> >> > >> >> > > and > > > > > >> >> > >> >> > > > > > > > > networking. > > > > > >> >> > >> >> > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > Can we consider to open a ticket > for > > > this? > > > > > >> >> > >> >> > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > (III) Further extension of Lucene > API > > > > > >> exposition > > > > > >> >> to > > > > > >> >> > >> >> Ignite > > > > > >> >> > >> >> > > > > > > > > > > > > > > >> >> > >> >> > > > > > > > > > a) Sorting > > > > > >> >> > >> >> > > > > > > > > > The solution for this could be: > > > > > >> >> > >> >> > > > > > > > > > - Make entities comparable > > > > > >> >> > >> >> > > > > > > > > > - Add custom comparator to entity > > > > > >> >> > >> >> > > > > > > > > > - Add annotations to mark sorted > > > fields for > > > > > >> >> Lucene > > > > > >> >> > >> >> indexing > > > > > >> >> > >> >> > > > > > > > > > - Use comparators when merging > > > responses or > > > > > >> >> > reducing > > > > > >> >> > >> to > > > > > >> >> > >> >> > > desired > > > > > >> >> > >> >> > > > > > > limit on > > > > > >> >> > >> >> > > > > > > > > > client node. > > > > > >> >> > >> >> > > > > > > > > > Will require full result set to be > > > loaded > > > > > into > > > > > >> >> > >> memory. > > > > > >> >> > >> >> > Though > > > > > >> >> > >> >> > > > > can be > > > > > >> >> > >> >> > > > > > > used > > > > > >> >> > >> >> > > > > > > > > > for relatively small limits. 
> BR,
> Yuriy Shuliha
>
> On Fri, Aug 30, 2019 at 10:37, Alexei Scherbakov <[hidden email]> wrote:
> Yuriy,
>
> Note that one of the major blockers for text queries is [1], which makes Lucene indexes unusable with persistence and is the main reason for the discontinuation.
> It should probably be addressed first to make text queries a valid product feature.
>
> Distributed sorting and advanced querying are indeed not trivial tasks.
> Some kind of merging must be implemented on the query-originating node.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-5371
>
> On Thu, Aug 29, 2019 at 23:38, Denis Magda <[hidden email]> wrote:
> Yuriy,
>
> If you are ready to take over the full-text search indexes, then please go ahead. The primary reasons why the community wants to discontinue them first (and probably resurrect them later) are the limitations listed by Andrey and minimal support from the community's end.
> - Denis
>
> On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <[hidden email]> wrote:
> Hi Yuriy,
>
> Unfortunately, there is a plan to discontinue TextQueries in Ignite [1].
> The motivation here is that text indexes are not persistent, not transactional, and can't be used together with SQL or inside SQL,
> and there is a lack of interest from the community side.
> You are welcome to take on these issues and make TextQueries great.
>
> 1. PageSize can't be used to limit the result set.
> Query results are returned from a data node to the client-side cursor page by page, and this parameter is designed to control the page size.
> The query is supposed to execute lazily on the server side, and the full result set is not expected to be loaded into memory on the server side at once, but page by page.
> Do you mean you found that Lucene loads the entire result set into memory before the first page is sent to the client?
>
> I'd think a new parameter should be added to limit the result. The best solution is to use query language commands for this, e.g. "LIMIT/OFFSET" in SQL.
>
> This task doesn't look trivial. A query is a distributed operation: the same user query is executed on the data nodes,
> and then the results from all nodes must be correctly merged before being returned via the client cursor.
> So LIMIT should be applied on every node and then again in the merge phase.
>
> Also, and this may be non-obvious, limiting results makes no sense without sorting,
> as there is no guarantee that each subsequent query run will return the same data, because of page reordering.
> Basically, the merge phase receives results from data nodes asynchronously, and messages from different nodes can't be ordered.
>
> 2.
> a. The "tokenize" parameter name (for @QueryTextField) looks more descriptive, doesn't it?
> b, c. What about distributed queries? How will partial results from the nodes be merged?
> Does Lucene allow configuring a comparator for data sorting?
> What comparator should Ignite choose to sort the results in the merge phase?
>
> 3. For now, the Lucene engine is not configurable at all; e.g., it is impossible to configure the Tokenizer.
> I'd think about possible ways to configure the engine first, and only then go further to discuss/implement complex features
> that may depend on the engine config.
>
> On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <[hidden email]> wrote:
> Dear community,
>
> By starting this chain, I'd like to open a discussion that would lead to contribution results in the subject area.
> Ignite has indexing capabilities backed by different mechanisms, including Lucene.
>
> Currently, Lucene 7.5.0 is used (a release from last year).
> This is a widespread and mature technology that covers the text search area and beyond (e.g. spatial data indexing).
>
> My goal is to *expose more Lucene functionality to Ignite indexing and query mechanisms for text data*.
>
> It's quite a simple request at the current stage. It comes from our project's needs, but I believe it will be useful for many more people.
> Let's walk through them and vote on or discuss Jira tickets for them.
>
> 1. [trivial] Use dataQuery.getPageSize() to limit the search response items inside GridLuceneIndex.query(). Currently it calls
> IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored matches will be returned, which we do not need in most cases.
>
> 2. [simple] Add sorting. Then a more capable search call can be executed: *IndexSearcher.search(query, count, sort)*
> Implementation steps:
> a) Introduce a boolean *sortField* parameter in the *@QueryTextField* annotation. If *true*, the field will be indexed but not tokenized. Number types are preferred here.
> b) Add a *sort* collection to the *TextQuery* constructor. It should define the desired sort fields used for querying.
> c) Implement Lucene sort usage in GridLuceneIndex.query().
>
> 3. [moderate] Build complex queries with *TextQuery*, including terms/queries boosting.
> *This section is for voting only, as it requires more detailed work. It should be extended if the community is interested in it.*
>
> Looking forward to your comments!
>
> BR,
> Yuriy Shuliha
>
> -- Best regards, Andrey V. Mashenkov
> -- Best regards, Alexei Scherbakov
> -- Best regards, Ivan Pavlukhin
> -- Best regards, Ivan Pavlukhin
> -- Best regards, Andrey V. Mashenkov
> -- Best regards, Andrey V. Mashenkov
> -- Best regards, Andrey V. Mashenkov
> -- Best regards, Ivan Pavlukhin
> -- Best regards, Ivan Pavlukhin |
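The approach discussed in this thread (apply the limit on every data node, then order the partial results with a comparator and apply the limit again on the query-originating node) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Ignite's implementation: `Hit`, `nodeTopK` and `mergeTopK` are hypothetical names, each node's response is modelled as an in-memory list, and a relevance score (with the key as a tie-breaker) is assumed as the sort order. Because the merge follows the comparator rather than the order in which node responses arrive, the top N is the same whichever node answers first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model of "limit on every node, then merge with a comparator"; not Ignite code.
public class TopKMerge {
    /** A search hit: the cache key plus the relevance score used for ordering. */
    public static final class Hit {
        final String key;
        final float score;

        public Hit(String key, float score) {
            this.key = key;
            this.score = score;
        }

        @Override public String toString() {
            return key;
        }
    }

    /** Descending by score; ties broken by key so the order is total and repeatable. */
    static final Comparator<Hit> ORDER =
        Comparator.comparingDouble((Hit h) -> h.score).reversed()
            .thenComparing(Comparator.comparing((Hit h) -> h.key));

    /** Step 1, on each data node: sort local hits and keep at most `limit` of them. */
    public static List<Hit> nodeTopK(List<Hit> localHits, int limit) {
        List<Hit> sorted = new ArrayList<>(localHits);
        sorted.sort(ORDER);
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    /** Step 2, on the originating node: k-way merge of the sorted node results, cut at `limit`. */
    public static List<Hit> mergeTopK(List<List<Hit>> nodeResults, int limit) {
        // Each heap entry is {node index, position in that node's list}; the heap yields the best head.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
            (a, b) -> ORDER.compare(nodeResults.get(a[0]).get(a[1]), nodeResults.get(b[0]).get(b[1])));
        for (int n = 0; n < nodeResults.size(); n++) {
            if (!nodeResults.get(n).isEmpty())
                heads.add(new int[] {n, 0});
        }
        List<Hit> merged = new ArrayList<>();
        while (!heads.isEmpty() && merged.size() < limit) {
            int[] head = heads.poll();
            List<Hit> src = nodeResults.get(head[0]);
            merged.add(src.get(head[1]));
            if (head[1] + 1 < src.size())
                heads.add(new int[] {head[0], head[1] + 1});
        }
        return merged;
    }

    public static void main(String[] args) {
        int limit = 3;
        List<Hit> node1 = nodeTopK(List.of(new Hit("a", 0.9f), new Hit("b", 0.4f), new Hit("c", 0.7f)), limit);
        List<Hit> node2 = nodeTopK(List.of(new Hit("d", 0.8f), new Hit("e", 0.95f)), limit);
        // Same answer regardless of which node's response arrives first.
        System.out.println(mergeTopK(List.of(node1, node2), limit)); // [e, a, d]
        System.out.println(mergeTopK(List.of(node2, node1), limit)); // [e, a, d]
    }
}
```

The k-way merge is only correct if every node's list is already sorted by the same comparator; without such a total order, as Andrey pointed out, a limited result is not reproducible from run to run.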
In reply to this post by Ivan Pavlukhin
Hello!
I have just found what I consider a major regression in Text Queries: it seems to me that text queries with limits will return the same key-value entries multiple times.

Please check the issue https://issues.apache.org/jira/browse/IGNITE-12401 and the corresponding build https://ci.ignite.apache.org/viewQueued.html?itemId=4799634

Regards,
--
Ilya Kasnacheev |
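To make the reported symptom (the same key-value entry returned more than once) explicit in a test, one option is to count the keys in the collected result. The sketch below is a hypothetical helper, not the reproducer attached to IGNITE-12401; it works on a plain list of keys, so a caller would first extract the keys from whatever entries the query cursor returns.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical test helper; not part of Ignite or of the IGNITE-12401 reproducer.
public class DuplicateKeyCheck {
    /** Returns each key that occurs more than once in a query result, mapped to its count. */
    public static <K extends Comparable<K>> Map<K, Integer> duplicateKeys(List<K> resultKeys) {
        Map<K, Integer> counts = new TreeMap<>();
        for (K key : resultKeys)
            counts.merge(key, 1, Integer::sum);
        // Keep only keys seen more than once.
        counts.values().removeIf(c -> c == 1);
        return counts;
    }

    public static void main(String[] args) {
        // Keys extracted from a made-up text query result with limit 3.
        List<Integer> keys = List.of(7, 42, 7);
        System.out.println(duplicateKeys(keys)); // {7=2}
    }
}
```

An empty map means the result is free of duplicates, which makes the helper easy to use in a single assertion.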
Yuriy,
Thank you for investigating the problem [1]. I still cannot see how the problem relates to the introduced "limit". Is it right that there were no duplicates before "limit" support was added? Now that it has been introduced, do only limited queries contain duplicates, or unlimited ones, or both?

[1] https://issues.apache.org/jira/browse/IGNITE-12401

On Thu, Nov 28, 2019 at 18:30, Ilya Kasnacheev <[hidden email]> wrote:
> Hello!
>
> I have just found what I consider a major regression in Text Queries: it
> seems to me that text queries with limits will return the same key-value
> entries multiple times.
>
> Please check the issue https://issues.apache.org/jira/browse/IGNITE-12401
> and the corresponding build
> https://ci.ignite.apache.org/viewQueued.html?itemId=4799634

--
Best regards,
Ivan Pavlukhin |
And is the problem specific to range queries or not?
On Mon, Dec 2, 2019 at 11:12, Ivan Pavlukhin <[hidden email]> wrote:
> Yuriy,
>
> Thank you for investigating the problem [1]. I still cannot see how
> the problem relates to the introduced "limit". Is it right that there were
> no duplicates before "limit" support was added? Now that it has been introduced,
> do only limited queries contain duplicates, or unlimited ones, or both?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-12401

--
Best regards,
Ivan Pavlukhin |
Hello!
The problem is NOT specific to range queries. Range queries were broken previously and they are still broken now, but now even a simple "token in field with limit" query returns duplicates.

Before limits were introduced, none of the tested use cases were affected by duplicates, but now they are.

Regards,
--
Ilya Kasnacheev

On Mon, Dec 2, 2019 at 12:23, Ivan Pavlukhin <[hidden email]> wrote:
> And is the problem specific to range queries or not? |
Ilya,
I checked your test on a revision before "limit" and it fails there as well. Could you please double-check?

On Mon, Dec 2, 2019 at 13:21, Ilya Kasnacheev <[hidden email]> wrote:
> Hello!
>
> The problem is NOT specific to range queries. Range queries were broken
> previously and they are still broken now, but now even a simple "token in field
> with limit" query returns duplicates.
>
> Before limits were introduced, none of the tested use cases were affected by
> duplicates, but now they are.

--
Best regards,
Ivan Pavlukhin |
Ilya, Yuriy,
It seems that text queries can return incorrect results on tologies with MOVING partitions. I left a comment in JIRA [1].

[1] https://issues.apache.org/jira/browse/IGNITE-12401

On Mon, Dec 2, 2019 at 15:00, Ivan Pavlukhin <[hidden email]> wrote:
> Ilya,
>
> I checked your test on a revision before "limit" and it fails there as
> well. Could you please double-check?

--
Best regards,
Ivan Pavlukhin |
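One plausible way for MOVING partitions to produce duplicates is easy to see in a toy model: while a partition is being rebalanced, its data can exist on two nodes at once, so if every node contributes results from every local copy, the same entries come back twice. This is a hypothetical illustration of that idea only; `PartitionState`, `Copy`, `queryAllCopies` and `queryOwnersOnly` are made-up names, the model assumes exactly one OWNING copy per partition (no backups), and the actual analysis is in the comment on IGNITE-12401, not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: names and states are hypothetical, loosely echoing Ignite's OWNING/MOVING terms.
public class MovingPartitionsToy {
    public enum PartitionState { OWNING, MOVING }

    /** One copy of a partition held by one node, with the keys it stores. */
    public static final class Copy {
        final String node;
        final int partition;
        final PartitionState state;
        final List<Integer> keys;

        public Copy(String node, int partition, PartitionState state, List<Integer> keys) {
            this.node = node;
            this.partition = partition;
            this.state = state;
            this.keys = keys;
        }
    }

    /** Every node returns matches from every local copy, so a partition in transit is read twice. */
    public static List<Integer> queryAllCopies(List<Copy> copies) {
        List<Integer> out = new ArrayList<>();
        for (Copy c : copies)
            out.addAll(c.keys);
        return out;
    }

    /** Only OWNING copies are read; with one owner per partition, each entry appears once. */
    public static List<Integer> queryOwnersOnly(List<Copy> copies) {
        List<Integer> out = new ArrayList<>();
        for (Copy c : copies) {
            if (c.state == PartitionState.OWNING)
                out.addAll(c.keys);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Copy> topology = List.of(
            new Copy("node1", 0, PartitionState.OWNING, List.of(1, 2)),
            new Copy("node2", 1, PartitionState.OWNING, List.of(3)),
            // Partition 0 is being rebalanced to node2 and already holds the same entries there.
            new Copy("node2", 0, PartitionState.MOVING, List.of(1, 2)));
        System.out.println(queryAllCopies(topology));  // [1, 2, 3, 1, 2]
        System.out.println(queryOwnersOnly(topology)); // [1, 2, 3]
    }
}
```

Filtering by partition state is just one way to express "read each partition once" in this model; it is not a claim about how the issue was or should be fixed in Ignite.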
*on topologies
On Tue, Dec 3, 2019 at 17:15, Ivan Pavlukhin <[hidden email]> wrote:
> Ilya, Yuriy,
>
> It seems that text queries can return incorrect results on tologies
> with MOVING partitions. I left a comment in JIRA [1].
>
> [1] https://issues.apache.org/jira/browse/IGNITE-12401

--
Best regards,
Ivan Pavlukhin |
In reply to this post by Ivan Pavlukhin
Hello!
Yes, I guess you are right :( I can certainly fix the range issue; it's just that it was so broken that I could not figure out the correct behavior for this case.

Regards,
--
Ilya Kasnacheev

On Mon, Dec 2, 2019 at 15:01, Ivan Pavlukhin <[hidden email]> wrote:
> Ilya,
>
> I checked your test on a revision before "limit" and it fails there as
> well. Could you please double-check? |