SQL Hash indexes

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

SQL Hash indexes

Sergi
Guys,

It seems that for simple equality queries like

SELECT * FROM x WHERE a = ?

it is more effective to use hash indexes which do not even need to be
snapshotable.
I think it will be easy to implement one based on ConcurrentHashMap.

Thoughts?

Sergi
Reply | Threaded
Open this post in threaded view
|

Re: SQL Hash indexes

dsetrakyan
Sergi,

Does it mean that field "a" will now have 2 indexes, hash and sorted?

D.

On Fri, Sep 18, 2015 at 4:01 PM, Sergi Vladykin <[hidden email]>
wrote:

> Guys,
>
> It seems that for simple equality queries like
>
> SELECT * FROM x WHERE a = ?
>
> it is more effective to use hash indexes which do not even need to be
> snapshotable.
> I think it will be easy to implement one based on ConcurrentHashMap.
>
> Thoughts?
>
> Sergi
>
Reply | Threaded
Open this post in threaded view
|

Re: SQL Hash indexes

Sergi
Not necessary, you can configure to have either sorted index or hash index
or both.
In the last case as far as I understand optimizer just will pick up hash
index for
equality conditions because it will have lower cost.

The only thing I'm currently not sure of is how to add this to configuration
(our indexed types config already looks like piece of crap, don't want to
complicate it even more).

Sergi

2015-09-18 19:05 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:

> Sergi,
>
> Does it mean that field "a" will now have 2 indexes, hash and sorted?
>
> D.
>
> On Fri, Sep 18, 2015 at 4:01 PM, Sergi Vladykin <[hidden email]>
> wrote:
>
> > Guys,
> >
> > It seems that for simple equality queries like
> >
> > SELECT * FROM x WHERE a = ?
> >
> > it is more effective to use hash indexes which do not even need to be
> > snapshotable.
> > I think it will be easy to implement one based on ConcurrentHashMap.
> >
> > Thoughts?
> >
> > Sergi
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: SQL Hash indexes

dsetrakyan
On Sat, Sep 19, 2015 at 2:04 PM, Sergi Vladykin <[hidden email]>
wrote:

> Not necessary, you can configure to have either sorted index or hash index
> or both.
> In the last case as far as I understand optimizer just will pick up hash
> index for
> equality conditions because it will have lower cost.
>
> The only thing I'm currently not sure of is how to add this to
> configuration
> (our indexed types config already looks like piece of crap, don't want to
> complicate it even more).
>

I think index type should be specified at the annotation level. As far as
configuring query metadata in XML, I agree with you, we should clean up the
design.


>
> Sergi
>
> 2015-09-18 19:05 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:
>
> > Sergi,
> >
> > Does it mean that field "a" will now have 2 indexes, hash and sorted?
> >
> > D.
> >
> > On Fri, Sep 18, 2015 at 4:01 PM, Sergi Vladykin <
> [hidden email]>
> > wrote:
> >
> > > Guys,
> > >
> > > It seems that for simple equality queries like
> > >
> > > SELECT * FROM x WHERE a = ?
> > >
> > > it is more effective to use hash indexes which do not even need to be
> > > snapshotable.
> > > I think it will be easy to implement one based on ConcurrentHashMap.
> > >
> > > Thoughts?
> > >
> > > Sergi
> > >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: SQL Hash indexes

Sergi
As part of this discussion I want to talk about non-snapshotable indexes in
general.

There are few things we can do to improve performance of queries and
indexing.

1. We can implement SkipList based index. Probably it will not be much
faster than current SnapTree, but GC and Optimizer will feel better.
2. We can implement ConcurrentMap based index as I wrote earlier. For joins
on big caches it must give us a good boost.

3. Because these indexes are not snapshotable we can eliminate taking
writeLock and for taking snapshots. This writeLock stops concurrent updates
of indexes, so on mixed query/update workloads it will be noticeable
improvement.

4. Implement off-heap SkipList. I already did that for Hadoop Accelerator
(though it was append only), so this code can be taken as a base.
5. Implement offheap ConcurrentMap. Probably we can use something from our
current GridOffheapProcessor.

The benefit of this is improved performance.
The drawback is that all the updates (even partial) will be immediately
visible and index search can occur when index is in inconsistent state like
when on update the old value was already removed from index but the new one
was not added yet.

I think items 1-2 can be done in a few days, item 3 can take couple of days
as well. Items 4-5 are probably the toughest ones, I think we should
postpone them until we will be production ready with 1-3.

Sergi



2015-09-19 17:49 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:

> On Sat, Sep 19, 2015 at 2:04 PM, Sergi Vladykin <[hidden email]>
> wrote:
>
> > Not necessary, you can configure to have either sorted index or hash
> index
> > or both.
> > In the last case as far as I understand optimizer just will pick up hash
> > index for
> > equality conditions because it will have lower cost.
> >
> > The only thing I'm currently not sure of is how to add this to
> > configuration
> > (our indexed types config already looks like piece of crap, don't want to
> > complicate it even more).
> >
>
> I think index type should be specified at the annotation level. As far as
> configuring query metadata in XML, I agree with you, we should clean up the
> design.
>
>
> >
> > Sergi
> >
> > 2015-09-18 19:05 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:
> >
> > > Sergi,
> > >
> > > Does it mean that field "a" will now have 2 indexes, hash and sorted?
> > >
> > > D.
> > >
> > > On Fri, Sep 18, 2015 at 4:01 PM, Sergi Vladykin <
> > [hidden email]>
> > > wrote:
> > >
> > > > Guys,
> > > >
> > > > It seems that for simple equality queries like
> > > >
> > > > SELECT * FROM x WHERE a = ?
> > > >
> > > > it is more effective to use hash indexes which do not even need to be
> > > > snapshotable.
> > > > I think it will be easy to implement one based on ConcurrentHashMap.
> > > >
> > > > Thoughts?
> > > >
> > > > Sergi
> > > >
> > >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: SQL Hash indexes

dsetrakyan
Sergey,

Thanks for detailed explanation! I agree with your suggestion, let's do
on-heap (points 1 to 3) first and then switch to off-heap.

Can you please create a parent ticket with necessary subtasks, so we can
properly track it in Jira?

D.

On Wed, Sep 23, 2015 at 11:10 AM, Sergi Vladykin <[hidden email]>
wrote:

> As part of this discussion I want to talk about non-snapshotable indexes in
> general.
>
> There are few things we can do to improve performance of queries and
> indexing.
>
> 1. We can implement SkipList based index. Probably it will not be much
> faster than current SnapTree, but GC and Optimizer will feel better.
> 2. We can implement ConcurrentMap based index as I wrote earlier. For joins
> on big caches it must give us a good boost.
>
> 3. Because these indexes are not snapshotable we can eliminate taking
> writeLock and for taking snapshots. This writeLock stops concurrent updates
> of indexes, so on mixed query/update workloads it will be noticeable
> improvement.
>
> 4. Implement off-heap SkipList. I already did that for Hadoop Accelerator
> (though it was append only), so this code can be taken as a base.
> 5. Implement offheap ConcurrentMap. Probably we can use something from our
> current GridOffheapProcessor.
>
> The benefit of this is improved performance.
> The drawback is that all the updates (even partial) will be immediately
> visible and index search can occur when index is in inconsistent state like
> when on update the old value was already removed from index but the new one
> was not added yet.
>
> I think items 1-2 can be done in a few days, item 3 can take couple of days
> as well. Items 4-5 are probably the toughest ones, I think we should
> postpone them until we will be production ready with 1-3.
>
> Sergi
>
>
>
> 2015-09-19 17:49 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:
>
> > On Sat, Sep 19, 2015 at 2:04 PM, Sergi Vladykin <
> [hidden email]>
> > wrote:
> >
> > > Not necessary, you can configure to have either sorted index or hash
> > index
> > > or both.
> > > In the last case as far as I understand optimizer just will pick up
> hash
> > > index for
> > > equality conditions because it will have lower cost.
> > >
> > > The only thing I'm currently not sure of is how to add this to
> > > configuration
> > > (our indexed types config already looks like piece of crap, don't want
> to
> > > complicate it even more).
> > >
> >
> > I think index type should be specified at the annotation level. As far as
> > configuring query metadata in XML, I agree with you, we should clean up
> the
> > design.
> >
> >
> > >
> > > Sergi
> > >
> > > 2015-09-18 19:05 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:
> > >
> > > > Sergi,
> > > >
> > > > Does it mean that field "a" will now have 2 indexes, hash and sorted?
> > > >
> > > > D.
> > > >
> > > > On Fri, Sep 18, 2015 at 4:01 PM, Sergi Vladykin <
> > > [hidden email]>
> > > > wrote:
> > > >
> > > > > Guys,
> > > > >
> > > > > It seems that for simple equality queries like
> > > > >
> > > > > SELECT * FROM x WHERE a = ?
> > > > >
> > > > > it is more effective to use hash indexes which do not even need to
> be
> > > > > snapshotable.
> > > > > I think it will be easy to implement one based on
> ConcurrentHashMap.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > Sergi
> > > > >
> > > >
> > >
> >
>