Optimize integer sets.

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Optimize integer sets.

Andrew Mashenkov
Hi Guys

Alexei Scherbakov report a ticket few time ago [1]. The solution look
promissing.

Alexei, you wrote that this can save some memory. More over replacing
linked Set structure to array based bit-set
can give a speed-up due to array based structures are cache friendly.

But one thing is not clear for me how we will handle sparsed bit-sets? For
example, if we have 1024 partiotions (as it is by default)
and have much nodes, e.g. 512. In this case, bit-set will occupy 256 bytes
that seem to be more than Set<Integer>.

What do you mean exactly to use bit-set as more compact structure then
Set<Integer> or bit-set with some additional compression?

I would thought, we can use hash-set with open addressing in some cases
like that to get gain of array bases structures over linked structures and
save memory?
For example, we could use such hash-set for small data (64bytes as cache
line size) and use bit-sets for bigger data, if it's possible of course.


Thoughts?

[1] https://issues.apache.org/jira/browse/IGNITE-4554
Reply | Threaded
Open this post in threaded view
|

Re: Optimize integer sets.

Sergi
I'd suggest to have a single abstract class `Partitions` with protected
constructor and static factory method. This will allow to add different
optimized for any particular case implementations transparently.

Sergi

2017-01-21 15:26 GMT+03:00 Andrey Mashenkov <[hidden email]>:

> Hi Guys
>
> Alexei Scherbakov report a ticket few time ago [1]. The solution look
> promissing.
>
> Alexei, you wrote that this can save some memory. More over replacing
> linked Set structure to array based bit-set
> can give a speed-up due to array based structures are cache friendly.
>
> But one thing is not clear for me how we will handle sparsed bit-sets? For
> example, if we have 1024 partiotions (as it is by default)
> and have much nodes, e.g. 512. In this case, bit-set will occupy 256 bytes
> that seem to be more than Set<Integer>.
>
> What do you mean exactly to use bit-set as more compact structure then
> Set<Integer> or bit-set with some additional compression?
>
> I would thought, we can use hash-set with open addressing in some cases
> like that to get gain of array bases structures over linked structures and
> save memory?
> For example, we could use such hash-set for small data (64bytes as cache
> line size) and use bit-sets for bigger data, if it's possible of course.
>
>
> Thoughts?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-4554
>
Reply | Threaded
Open this post in threaded view
|

Re: Optimize integer sets.

Alexei Scherbakov
Andrey,

IGNITE-4554 is about compressed bit sets, not standart bit sets.

Currently I'm implementing data structure based on [1]

It should efficiently handle sparse bit sets.

[1] http://roaringbitmap.org/

2017-01-21 16:28 GMT+03:00 Sergi Vladykin <[hidden email]>:

> I'd suggest to have a single abstract class `Partitions` with protected
> constructor and static factory method. This will allow to add different
> optimized for any particular case implementations transparently.
>
> Sergi
>
> 2017-01-21 15:26 GMT+03:00 Andrey Mashenkov <[hidden email]>:
>
> > Hi Guys
> >
> > Alexei Scherbakov report a ticket few time ago [1]. The solution look
> > promissing.
> >
> > Alexei, you wrote that this can save some memory. More over replacing
> > linked Set structure to array based bit-set
> > can give a speed-up due to array based structures are cache friendly.
> >
> > But one thing is not clear for me how we will handle sparsed bit-sets?
> For
> > example, if we have 1024 partiotions (as it is by default)
> > and have much nodes, e.g. 512. In this case, bit-set will occupy 256
> bytes
> > that seem to be more than Set<Integer>.
> >
> > What do you mean exactly to use bit-set as more compact structure then
> > Set<Integer> or bit-set with some additional compression?
> >
> > I would thought, we can use hash-set with open addressing in some cases
> > like that to get gain of array bases structures over linked structures
> and
> > save memory?
> > For example, we could use such hash-set for small data (64bytes as cache
> > line size) and use bit-sets for bigger data, if it's possible of course.
> >
> >
> > Thoughts?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-4554
> >
>



--

Best regards,
Alexei Scherbakov
Reply | Threaded
Open this post in threaded view
|

答复: Optimize integer sets.

Shawn Du
In reply to this post by Sergi
Try this https://github.com/RoaringBitmap/RoaringBitmap


-----邮件原件-----
发件人: Sergi Vladykin [mailto:[hidden email]]
发送时间: 2017年1月21日 21:29
收件人: [hidden email]
主题: Re: Optimize integer sets.

I'd suggest to have a single abstract class `Partitions` with protected constructor and static factory method. This will allow to add different optimized for any particular case implementations transparently.

Sergi

2017-01-21 15:26 GMT+03:00 Andrey Mashenkov <[hidden email]>:

> Hi Guys
>
> Alexei Scherbakov report a ticket few time ago [1]. The solution look
> promissing.
>
> Alexei, you wrote that this can save some memory. More over replacing
> linked Set structure to array based bit-set can give a speed-up due to
> array based structures are cache friendly.
>
> But one thing is not clear for me how we will handle sparsed bit-sets?
> For example, if we have 1024 partiotions (as it is by default) and
> have much nodes, e.g. 512. In this case, bit-set will occupy 256 bytes
> that seem to be more than Set<Integer>.
>
> What do you mean exactly to use bit-set as more compact structure then
> Set<Integer> or bit-set with some additional compression?
>
> I would thought, we can use hash-set with open addressing in some
> cases like that to get gain of array bases structures over linked
> structures and save memory?
> For example, we could use such hash-set for small data (64bytes as
> cache line size) and use bit-sets for bigger data, if it's possible of course.
>
>
> Thoughts?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-4554
>