Default Lucene index for String cache values

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Default Lucene index for String cache values

Ilya Korol
Hi, All.

According to https://issues.apache.org/jira/browse/IGNITE-14805 there is
default index creation for caches with String values:


if (type().valueClass() == String.class) {
     try {
         luceneIdx = new GridLuceneIndex(idx.kernalContext(),
tbl.cacheName(), type);
     }
     catch (IgniteCheckedException e1) {
         throw new IgniteException(e1);
     }
}


Does this really necessary?  What about disabling this feature by
default and enabling it only by demand (to reduce unnecessary
performance hit even if its not very huge)? I guess additional option
could be introduced to do so.

Any ideas?

Reply | Threaded
Open this post in threaded view
|

Re: Default Lucene index for String cache values

Maksim Timonin
Hi, Ilya! AFAIK, to create LuceneIndex it's required to do this:

CacheConfiguration.setIndexedTypes(Long.class, String.class);

It's pretty straightforward, a user wants the value class to be indexed. If
you just create a simple cache (without entities, indexed types) with
String.class as value it won't be indexed, as indexes created per table,
not per cache.

Do I miss something?

On Wed, Jun 2, 2021 at 12:56 PM Ilya Korol <[hidden email]> wrote:

> Hi, All.
>
> According to https://issues.apache.org/jira/browse/IGNITE-14805 there is
> default index creation for caches with String values:
>
>
> if (type().valueClass() == String.class) {
>      try {
>          luceneIdx = new GridLuceneIndex(idx.kernalContext(),
> tbl.cacheName(), type);
>      }
>      catch (IgniteCheckedException e1) {
>          throw new IgniteException(e1);
>      }
> }
>
>
> Does this really necessary?  What about disabling this feature by
> default and enabling it only by demand (to reduce unnecessary
> performance hit even if its not very huge)? I guess additional option
> could be introduced to do so.
>
> Any ideas?
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Default Lucene index for String cache values

Ilya Korol
Thanks for feedback Maksim, but let me disagree with you.

As far as i understand

CacheConfiguration.setIndexedTypes(Long.class, String.class);

Is just the hint for Ignite about data that should be indexed, but what
kind of index should be created depends on our configuration. For example:

public class StringValue implements Serializable {

     @QuerySqlField(index = true)
     private final long id;

     @QueryTextField
     private final String name;

}

In this case i explicitly define that i need text/lucene index for name
field via marking it with @QueryTextField annotation. Doing so i'm able
to perform text queries:

cache.query(new TextQuery(StringValue.class, "value"))
             .getAll()
             .forEach(e -> System.out.println(e.toString()));

Without @QueryTextField annotation query will return nothing.

So setIndexedTypes(..) is just not enough for proper lucene index
initiation. But in case of plane strings as cache values (instead pf
pojo class from my example) lucene index would be created no matter if
we ask for it or not:

// IgniteH2Indexing#queryLocalText(..)

if (tbl != null && tbl.luceneIndex() != null) {
     Long qryId = runningQueryManager().register(qry, TEXT, schemaName,
true, null);

     try {
         // We will reach this line if we use String as cache values or
if we used @QueryTextField annotation
         return tbl.luceneIndex().query(qry.toUpperCase(), filters, limit);

     } finally {
         runningQueryManager().unregister(qryId, null);
     }
}



02.06.2021 23:40, Maksim Timonin пишет:

> Hi, Ilya! AFAIK, to create LuceneIndex it's required to do this:
>
> CacheConfiguration.setIndexedTypes(Long.class, String.class);
>
> It's pretty straightforward, a user wants the value class to be indexed. If
> you just create a simple cache (without entities, indexed types) with
> String.class as value it won't be indexed, as indexes created per table,
> not per cache.
>
> Do I miss something?
>
> On Wed, Jun 2, 2021 at 12:56 PM Ilya Korol <[hidden email]> wrote:
>
>> Hi, All.
>>
>> According to https://issues.apache.org/jira/browse/IGNITE-14805 there is
>> default index creation for caches with String values:
>>
>>
>> if (type().valueClass() == String.class) {
>>       try {
>>           luceneIdx = new GridLuceneIndex(idx.kernalContext(),
>> tbl.cacheName(), type);
>>       }
>>       catch (IgniteCheckedException e1) {
>>           throw new IgniteException(e1);
>>       }
>> }
>>
>>
>> Does this really necessary?  What about disabling this feature by
>> default and enabling it only by demand (to reduce unnecessary
>> performance hit even if its not very huge)? I guess additional option
>> could be introduced to do so.
>>
>> Any ideas?
>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: Default Lucene index for String cache values

Maksim Timonin
Hi, Ilya.

Could you please provide a reproducer of wrong behavior? Looks like I'm
missing smth in your idea. There is my code [1] with String class as value,
and TextQuery works correctly without the annotation QueryTextField. And I
expect this behavior.

> case of plane strings as cache values (instead of pojo class from my
example) lucene index would be created no matter if we ask for it or not
It's not correct. If you create cache CacheConfiguration<Integer, String>
it will not create LuceneIndex, as there is no table with String as value
class. You can check it with my code just replace Long and String
classes in cache definition.

[1] https://gist.github.com/timoninmaxim/e477ddfcbe56ec9892c7ba6ad44bfadb

On Thu, Jun 3, 2021 at 10:56 AM Ilya Korol <[hidden email]> wrote:

> Thanks for feedback Maksim, but let me disagree with you.
>
> As far as i understand
>
> CacheConfiguration.setIndexedTypes(Long.class, String.class);
>
> Is just the hint for Ignite about data that should be indexed, but what
> kind of index should be created depends on our configuration. For example:
>
> public class StringValue implements Serializable {
>
>      @QuerySqlField(index = true)
>      private final long id;
>
>      @QueryTextField
>      private final String name;
>
> }
>
> In this case i explicitly define that i need text/lucene index for name
> field via marking it with @QueryTextField annotation. Doing so i'm able
> to perform text queries:
>
> cache.query(new TextQuery(StringValue.class, "value"))
>              .getAll()
>              .forEach(e -> System.out.println(e.toString()));
>
> Without @QueryTextField annotation query will return nothing.
>
> So setIndexedTypes(..) is just not enough for proper lucene index
> initiation. But in case of plane strings as cache values (instead pf
> pojo class from my example) lucene index would be created no matter if
> we ask for it or not:
>
> // IgniteH2Indexing#queryLocalText(..)
>
> if (tbl != null && tbl.luceneIndex() != null) {
>      Long qryId = runningQueryManager().register(qry, TEXT, schemaName,
> true, null);
>
>      try {
>          // We will reach this line if we use String as cache values or
> if we used @QueryTextField annotation
>          return tbl.luceneIndex().query(qry.toUpperCase(), filters, limit);
>
>      } finally {
>          runningQueryManager().unregister(qryId, null);
>      }
> }
>
>
>
> 02.06.2021 23:40, Maksim Timonin пишет:
> > Hi, Ilya! AFAIK, to create LuceneIndex it's required to do this:
> >
> > CacheConfiguration.setIndexedTypes(Long.class, String.class);
> >
> > It's pretty straightforward, a user wants the value class to be indexed.
> If
> > you just create a simple cache (without entities, indexed types) with
> > String.class as value it won't be indexed, as indexes created per table,
> > not per cache.
> >
> > Do I miss something?
> >
> > On Wed, Jun 2, 2021 at 12:56 PM Ilya Korol <[hidden email]> wrote:
> >
> >> Hi, All.
> >>
> >> According to https://issues.apache.org/jira/browse/IGNITE-14805 there
> is
> >> default index creation for caches with String values:
> >>
> >>
> >> if (type().valueClass() == String.class) {
> >>       try {
> >>           luceneIdx = new GridLuceneIndex(idx.kernalContext(),
> >> tbl.cacheName(), type);
> >>       }
> >>       catch (IgniteCheckedException e1) {
> >>           throw new IgniteException(e1);
> >>       }
> >> }
> >>
> >>
> >> Does this really necessary?  What about disabling this feature by
> >> default and enabling it only by demand (to reduce unnecessary
> >> performance hit even if its not very huge)? I guess additional option
> >> could be introduced to do so.
> >>
> >> Any ideas?
> >>
> >>
>
Reply | Threaded
Open this post in threaded view
|

Re: Default Lucene index for String cache values

Valentin Kulichenko
Folks,

This is a usability issue of the current API, which I don't think we can
easily fix right now without breaking compatibility.

There are two separate cases.

First - the value class is a user's POJO. This one is straightforward - we
analyze annotations and create indexes based on that. The user has full
control.

Second - the value class is a "first-class citizen", i.e. primitive,
String, UUID, or similar. In this case, there are no annotations
(obviously), so we have to implicitly create an index for the value itself.
String type is a special case, as a user might want a full-text index for
it as well. So we *additionally* create one.

We can't just remove the aforementioned code, because there is no explicit
way to create a full-text index for the String value type. I guess such a
way can be provided by introducing additional configuration property (or
properties), but frankly, it will only complicate things even more. The use
case is not that popular, and the overhead of a single index is not that
big, so I don't think it worth it.

I would not change anything here.

-Val

On Thu, Jun 3, 2021 at 1:38 AM Maksim Timonin <[hidden email]>
wrote:

> Hi, Ilya.
>
> Could you please provide a reproducer of wrong behavior? Looks like I'm
> missing smth in your idea. There is my code [1] with String class as value,
> and TextQuery works correctly without the annotation QueryTextField. And I
> expect this behavior.
>
> > case of plane strings as cache values (instead of pojo class from my
> example) lucene index would be created no matter if we ask for it or not
> It's not correct. If you create cache CacheConfiguration<Integer, String>
> it will not create LuceneIndex, as there is no table with String as value
> class. You can check it with my code just replace Long and String
> classes in cache definition.
>
> [1] https://gist.github.com/timoninmaxim/e477ddfcbe56ec9892c7ba6ad44bfadb
>
> On Thu, Jun 3, 2021 at 10:56 AM Ilya Korol <[hidden email]> wrote:
>
> > Thanks for feedback Maksim, but let me disagree with you.
> >
> > As far as i understand
> >
> > CacheConfiguration.setIndexedTypes(Long.class, String.class);
> >
> > Is just the hint for Ignite about data that should be indexed, but what
> > kind of index should be created depends on our configuration. For
> example:
> >
> > public class StringValue implements Serializable {
> >
> >      @QuerySqlField(index = true)
> >      private final long id;
> >
> >      @QueryTextField
> >      private final String name;
> >
> > }
> >
> > In this case i explicitly define that i need text/lucene index for name
> > field via marking it with @QueryTextField annotation. Doing so i'm able
> > to perform text queries:
> >
> > cache.query(new TextQuery(StringValue.class, "value"))
> >              .getAll()
> >              .forEach(e -> System.out.println(e.toString()));
> >
> > Without @QueryTextField annotation query will return nothing.
> >
> > So setIndexedTypes(..) is just not enough for proper lucene index
> > initiation. But in case of plane strings as cache values (instead pf
> > pojo class from my example) lucene index would be created no matter if
> > we ask for it or not:
> >
> > // IgniteH2Indexing#queryLocalText(..)
> >
> > if (tbl != null && tbl.luceneIndex() != null) {
> >      Long qryId = runningQueryManager().register(qry, TEXT, schemaName,
> > true, null);
> >
> >      try {
> >          // We will reach this line if we use String as cache values or
> > if we used @QueryTextField annotation
> >          return tbl.luceneIndex().query(qry.toUpperCase(), filters,
> limit);
> >
> >      } finally {
> >          runningQueryManager().unregister(qryId, null);
> >      }
> > }
> >
> >
> >
> > 02.06.2021 23:40, Maksim Timonin пишет:
> > > Hi, Ilya! AFAIK, to create LuceneIndex it's required to do this:
> > >
> > > CacheConfiguration.setIndexedTypes(Long.class, String.class);
> > >
> > > It's pretty straightforward, a user wants the value class to be
> indexed.
> > If
> > > you just create a simple cache (without entities, indexed types) with
> > > String.class as value it won't be indexed, as indexes created per
> table,
> > > not per cache.
> > >
> > > Do I miss something?
> > >
> > > On Wed, Jun 2, 2021 at 12:56 PM Ilya Korol <[hidden email]>
> wrote:
> > >
> > >> Hi, All.
> > >>
> > >> According to https://issues.apache.org/jira/browse/IGNITE-14805 there
> > is
> > >> default index creation for caches with String values:
> > >>
> > >>
> > >> if (type().valueClass() == String.class) {
> > >>       try {
> > >>           luceneIdx = new GridLuceneIndex(idx.kernalContext(),
> > >> tbl.cacheName(), type);
> > >>       }
> > >>       catch (IgniteCheckedException e1) {
> > >>           throw new IgniteException(e1);
> > >>       }
> > >> }
> > >>
> > >>
> > >> Does this really necessary?  What about disabling this feature by
> > >> default and enabling it only by demand (to reduce unnecessary
> > >> performance hit even if its not very huge)? I guess additional option
> > >> could be introduced to do so.
> > >>
> > >> Any ideas?
> > >>
> > >>
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Default Lucene index for String cache values

Ilya Korol
I've got your point Val, thanks. At least we can consider this issue
during Ignite 3 development.

03.06.2021 19:10, Valentin Kulichenko пишет:

> Folks,
>
> This is a usability issue of the current API, which I don't think we can
> easily fix right now without breaking compatibility.
>
> There are two separate cases.
>
> First - the value class is a user's POJO. This one is straightforward - we
> analyze annotations and create indexes based on that. The user has full
> control.
>
> Second - the value class is a "first-class citizen", i.e. primitive,
> String, UUID, or similar. In this case, there are no annotations
> (obviously), so we have to implicitly create an index for the value itself.
> String type is a special case, as a user might want a full-text index for
> it as well. So we *additionally* create one.
>
> We can't just remove the aforementioned code, because there is no explicit
> way to create a full-text index for the String value type. I guess such a
> way can be provided by introducing additional configuration property (or
> properties), but frankly, it will only complicate things even more. The use
> case is not that popular, and the overhead of a single index is not that
> big, so I don't think it worth it.
>
> I would not change anything here.
>
> -Val
>
> On Thu, Jun 3, 2021 at 1:38 AM Maksim Timonin <[hidden email]>
> wrote:
>
>> Hi, Ilya.
>>
>> Could you please provide a reproducer of wrong behavior? Looks like I'm
>> missing smth in your idea. There is my code [1] with String class as value,
>> and TextQuery works correctly without the annotation QueryTextField. And I
>> expect this behavior.
>>
>>> case of plane strings as cache values (instead of pojo class from my
>> example) lucene index would be created no matter if we ask for it or not
>> It's not correct. If you create cache CacheConfiguration<Integer, String>
>> it will not create LuceneIndex, as there is no table with String as value
>> class. You can check it with my code just replace Long and String
>> classes in cache definition.
>>
>> [1] https://gist.github.com/timoninmaxim/e477ddfcbe56ec9892c7ba6ad44bfadb
>>
>> On Thu, Jun 3, 2021 at 10:56 AM Ilya Korol <[hidden email]> wrote:
>>
>>> Thanks for feedback Maksim, but let me disagree with you.
>>>
>>> As far as i understand
>>>
>>> CacheConfiguration.setIndexedTypes(Long.class, String.class);
>>>
>>> Is just the hint for Ignite about data that should be indexed, but what
>>> kind of index should be created depends on our configuration. For
>> example:
>>> public class StringValue implements Serializable {
>>>
>>>       @QuerySqlField(index = true)
>>>       private final long id;
>>>
>>>       @QueryTextField
>>>       private final String name;
>>>
>>> }
>>>
>>> In this case i explicitly define that i need text/lucene index for name
>>> field via marking it with @QueryTextField annotation. Doing so i'm able
>>> to perform text queries:
>>>
>>> cache.query(new TextQuery(StringValue.class, "value"))
>>>               .getAll()
>>>               .forEach(e -> System.out.println(e.toString()));
>>>
>>> Without @QueryTextField annotation query will return nothing.
>>>
>>> So setIndexedTypes(..) is just not enough for proper lucene index
>>> initiation. But in case of plane strings as cache values (instead pf
>>> pojo class from my example) lucene index would be created no matter if
>>> we ask for it or not:
>>>
>>> // IgniteH2Indexing#queryLocalText(..)
>>>
>>> if (tbl != null && tbl.luceneIndex() != null) {
>>>       Long qryId = runningQueryManager().register(qry, TEXT, schemaName,
>>> true, null);
>>>
>>>       try {
>>>           // We will reach this line if we use String as cache values or
>>> if we used @QueryTextField annotation
>>>           return tbl.luceneIndex().query(qry.toUpperCase(), filters,
>> limit);
>>>       } finally {
>>>           runningQueryManager().unregister(qryId, null);
>>>       }
>>> }
>>>
>>>
>>>
>>> 02.06.2021 23:40, Maksim Timonin пишет:
>>>> Hi, Ilya! AFAIK, to create LuceneIndex it's required to do this:
>>>>
>>>> CacheConfiguration.setIndexedTypes(Long.class, String.class);
>>>>
>>>> It's pretty straightforward, a user wants the value class to be
>> indexed.
>>> If
>>>> you just create a simple cache (without entities, indexed types) with
>>>> String.class as value it won't be indexed, as indexes created per
>> table,
>>>> not per cache.
>>>>
>>>> Do I miss something?
>>>>
>>>> On Wed, Jun 2, 2021 at 12:56 PM Ilya Korol <[hidden email]>
>> wrote:
>>>>> Hi, All.
>>>>>
>>>>> According to https://issues.apache.org/jira/browse/IGNITE-14805 there
>>> is
>>>>> default index creation for caches with String values:
>>>>>
>>>>>
>>>>> if (type().valueClass() == String.class) {
>>>>>        try {
>>>>>            luceneIdx = new GridLuceneIndex(idx.kernalContext(),
>>>>> tbl.cacheName(), type);
>>>>>        }
>>>>>        catch (IgniteCheckedException e1) {
>>>>>            throw new IgniteException(e1);
>>>>>        }
>>>>> }
>>>>>
>>>>>
>>>>> Does this really necessary?  What about disabling this feature by
>>>>> default and enabling it only by demand (to reduce unnecessary
>>>>> performance hit even if its not very huge)? I guess additional option
>>>>> could be introduced to do so.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>>
Reply | Threaded
Open this post in threaded view
|

Re: Default Lucene index for String cache values

Vladimir Pligin
Hi guys,

I've checked that a text index is being created only if a user specifies
indexedTypes (or something similar).
It would be nice to give users an option to avoid this behaviour as it's
technically a workaround because it's not possible to somehow specify the
@QueryTextField annotation on a bare String value. However it's seems pretty
complicated to fix that if someone has already started leveraging that
without a compatibility break. Potentially we could use a system property or
just leave it be until we release Ignite 3. There we'll get rid of it
entirely.



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/