Distribution of keys to partitions

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Distribution of keys to partitions

michael.griggs
Hi Igniters,

Last week I was working with a group of Ignite users.  They are inserting
several million string keys in to a cache.  Each string key was
approximately 22-characters in length.  When I exported the partition
counts (via GG Visor) I was able to see an unusual periodicity in the
number of keys allocated to partitions.  I charted this in Excel [1].
After further investigation, it appears that there is a relationship
between the number of keys being inserted, the number of partitions
assigned to the cache and amount of apparent periodicity: a small number of
partitions will cause periodicity to appear with lower numbers of keys.

The RendezvousAffinityFunction#partition function performs a simple
calculation of key hashcode modulo partition-count:

U.safeAbs(key.hashCode() % parts)


Digging further I was led to the fact that this is how the Java HashMap
*used* to behave [2], but was upgraded around Java 1.4 to perform the
following:

key.hashCode() & (parts - 1)

which performs more efficiently.  It was then updated further to do the
following:

(h = key.hashCode()) ^ (h >>> 16);

with the bit-shift performed to

incorporate impact of the highest bits that would otherwise
never be used in index calculations because of table bounds


When using this function, rather than our
RendezvousAffinityFunction#partition implementation, I also saw a
significant decrease in the periodicity and a better distribution of keys
amongst partitions [3].

I would like to suggest that we adopt this modified hash function inside
RendezvousAffinityFunction.

Regards
Mike


[1]: https://i.imgur.com/0FtCZ2A.png
[2]:
https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings
[3]: https://i.imgur.com/8ZuCSA3.png
Reply | Threaded
Open this post in threaded view
|

Re: Distribution of keys to partitions

agura
Michael,

it makes sense only for cases when partitions count is power of two.
Affinity function doesn't have this limitation.

Bu, of course, we can check, that partitions count is power of two and
use optimized hash code calculation.


On Wed, Mar 15, 2017 at 4:09 PM, Michael Griggs
<[hidden email]> wrote:

> Hi Igniters,
>
> Last week I was working with a group of Ignite users.  They are inserting
> several million string keys in to a cache.  Each string key was
> approximately 22-characters in length.  When I exported the partition
> counts (via GG Visor) I was able to see an unusual periodicity in the
> number of keys allocated to partitions.  I charted this in Excel [1].
> After further investigation, it appears that there is a relationship
> between the number of keys being inserted, the number of partitions
> assigned to the cache and amount of apparent periodicity: a small number of
> partitions will cause periodicity to appear with lower numbers of keys.
>
> The RendezvousAffinityFunction#partition function performs a simple
> calculation of key hashcode modulo partition-count:
>
> U.safeAbs(key.hashCode() % parts)
>
>
> Digging further I was led to the fact that this is how the Java HashMap
> *used* to behave [2], but was upgraded around Java 1.4 to perform the
> following:
>
> key.hashCode() & (parts - 1)
>
> which performs more efficiently.  It was then updated further to do the
> following:
>
> (h = key.hashCode()) ^ (h >>> 16);
>
> with the bit-shift performed to
>
> incorporate impact of the highest bits that would otherwise
> never be used in index calculations because of table bounds
>
>
> When using this function, rather than our
> RendezvousAffinityFunction#partition implementation, I also saw a
> significant decrease in the periodicity and a better distribution of keys
> amongst partitions [3].
>
> I would like to suggest that we adopt this modified hash function inside
> RendezvousAffinityFunction.
>
> Regards
> Mike
>
>
> [1]: https://i.imgur.com/0FtCZ2A.png
> [2]:
> https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings
> [3]: https://i.imgur.com/8ZuCSA3.png
Reply | Threaded
Open this post in threaded view
|

Re: Distribution of keys to partitions

Valentin Kulichenko
In 99% of cases number of partition is a power of two, because it's the
default value. Almost no one changes it. If this change actually provides
better distribution, it absolutely makes sense to do it.

Michael, can you create a Jira ticket and put you findings there?

-Val

On Wed, Mar 15, 2017 at 3:58 PM, Andrey Gura <[hidden email]> wrote:

> Michael,
>
> it makes sense only for cases when partitions count is power of two.
> Affinity function doesn't have this limitation.
>
> Bu, of course, we can check, that partitions count is power of two and
> use optimized hash code calculation.
>
>
> On Wed, Mar 15, 2017 at 4:09 PM, Michael Griggs
> <[hidden email]> wrote:
> > Hi Igniters,
> >
> > Last week I was working with a group of Ignite users.  They are inserting
> > several million string keys in to a cache.  Each string key was
> > approximately 22-characters in length.  When I exported the partition
> > counts (via GG Visor) I was able to see an unusual periodicity in the
> > number of keys allocated to partitions.  I charted this in Excel [1].
> > After further investigation, it appears that there is a relationship
> > between the number of keys being inserted, the number of partitions
> > assigned to the cache and amount of apparent periodicity: a small number
> of
> > partitions will cause periodicity to appear with lower numbers of keys.
> >
> > The RendezvousAffinityFunction#partition function performs a simple
> > calculation of key hashcode modulo partition-count:
> >
> > U.safeAbs(key.hashCode() % parts)
> >
> >
> > Digging further I was led to the fact that this is how the Java HashMap
> > *used* to behave [2], but was upgraded around Java 1.4 to perform the
> > following:
> >
> > key.hashCode() & (parts - 1)
> >
> > which performs more efficiently.  It was then updated further to do the
> > following:
> >
> > (h = key.hashCode()) ^ (h >>> 16);
> >
> > with the bit-shift performed to
> >
> > incorporate impact of the highest bits that would otherwise
> > never be used in index calculations because of table bounds
> >
> >
> > When using this function, rather than our
> > RendezvousAffinityFunction#partition implementation, I also saw a
> > significant decrease in the periodicity and a better distribution of keys
> > amongst partitions [3].
> >
> > I would like to suggest that we adopt this modified hash function inside
> > RendezvousAffinityFunction.
> >
> > Regards
> > Mike
> >
> >
> > [1]: https://i.imgur.com/0FtCZ2A.png
> > [2]:
> > https://www.quora.com/Why-does-Java-use-a-mediocre-
> hashCode-implementation-for-strings
> > [3]: https://i.imgur.com/8ZuCSA3.png
>
Reply | Threaded
Open this post in threaded view
|

Re: Distribution of keys to partitions

agura
Anyway, we can't always use this optimization because it will not work
for non power of two values.

On Wed, Mar 15, 2017 at 6:48 PM, Valentin Kulichenko
<[hidden email]> wrote:

> In 99% of cases number of partition is a power of two, because it's the
> default value. Almost no one changes it. If this change actually provides
> better distribution, it absolutely makes sense to do it.
>
> Michael, can you create a Jira ticket and put you findings there?
>
> -Val
>
> On Wed, Mar 15, 2017 at 3:58 PM, Andrey Gura <[hidden email]> wrote:
>
>> Michael,
>>
>> it makes sense only for cases when partitions count is power of two.
>> Affinity function doesn't have this limitation.
>>
>> Bu, of course, we can check, that partitions count is power of two and
>> use optimized hash code calculation.
>>
>>
>> On Wed, Mar 15, 2017 at 4:09 PM, Michael Griggs
>> <[hidden email]> wrote:
>> > Hi Igniters,
>> >
>> > Last week I was working with a group of Ignite users.  They are inserting
>> > several million string keys in to a cache.  Each string key was
>> > approximately 22-characters in length.  When I exported the partition
>> > counts (via GG Visor) I was able to see an unusual periodicity in the
>> > number of keys allocated to partitions.  I charted this in Excel [1].
>> > After further investigation, it appears that there is a relationship
>> > between the number of keys being inserted, the number of partitions
>> > assigned to the cache and amount of apparent periodicity: a small number
>> of
>> > partitions will cause periodicity to appear with lower numbers of keys.
>> >
>> > The RendezvousAffinityFunction#partition function performs a simple
>> > calculation of key hashcode modulo partition-count:
>> >
>> > U.safeAbs(key.hashCode() % parts)
>> >
>> >
>> > Digging further I was led to the fact that this is how the Java HashMap
>> > *used* to behave [2], but was upgraded around Java 1.4 to perform the
>> > following:
>> >
>> > key.hashCode() & (parts - 1)
>> >
>> > which performs more efficiently.  It was then updated further to do the
>> > following:
>> >
>> > (h = key.hashCode()) ^ (h >>> 16);
>> >
>> > with the bit-shift performed to
>> >
>> > incorporate impact of the highest bits that would otherwise
>> > never be used in index calculations because of table bounds
>> >
>> >
>> > When using this function, rather than our
>> > RendezvousAffinityFunction#partition implementation, I also saw a
>> > significant decrease in the periodicity and a better distribution of keys
>> > amongst partitions [3].
>> >
>> > I would like to suggest that we adopt this modified hash function inside
>> > RendezvousAffinityFunction.
>> >
>> > Regards
>> > Mike
>> >
>> >
>> > [1]: https://i.imgur.com/0FtCZ2A.png
>> > [2]:
>> > https://www.quora.com/Why-does-Java-use-a-mediocre-
>> hashCode-implementation-for-strings
>> > [3]: https://i.imgur.com/8ZuCSA3.png
>>
Reply | Threaded
Open this post in threaded view
|

Re: Distribution of keys to partitions

michael.griggs
In reply to this post by Valentin Kulichenko
Valentin Kulichenko wrote
In 99% of cases number of partition is a power of two, because it's the
default value. Almost no one changes it. If this change actually provides
better distribution, it absolutely makes sense to do it.

Michael, can you create a Jira ticket and put you findings there?

-Val
Done - https://issues.apache.org/jira/browse/IGNITE-4828

Andrey Gura wrote
On Wed, Mar 15, 2017 at 3:58 PM, Andrey Gura <[hidden email]> wrote:

> it makes sense only for cases when partitions count is power of two.
> Affinity function doesn't have this limitation.
The Replicated cache, which is where this issue was found, has 512 partitions by default.  The default value for Partitioned caches is 1024.  Presumably we can provide a custom implementation of the hash function when the number of partitions is modified to a non-power-of-two value?
Reply | Threaded
Open this post in threaded view
|

Re: Distribution of keys to partitions

Valentin Kulichenko
In reply to this post by agura
Andrey,

Absolutely, your point is correct. I'm talking about default behavior which
must be as effective as possible. In case we do this optimization, I would
also show a warning if number of partitions is not a power of two.

-Val

On Wed, Mar 15, 2017 at 5:09 PM, Andrey Gura <[hidden email]> wrote:

> Anyway, we can't always use this optimization because it will not work
> for non power of two values.
>
> On Wed, Mar 15, 2017 at 6:48 PM, Valentin Kulichenko
> <[hidden email]> wrote:
> > In 99% of cases number of partition is a power of two, because it's the
> > default value. Almost no one changes it. If this change actually provides
> > better distribution, it absolutely makes sense to do it.
> >
> > Michael, can you create a Jira ticket and put you findings there?
> >
> > -Val
> >
> > On Wed, Mar 15, 2017 at 3:58 PM, Andrey Gura <[hidden email]> wrote:
> >
> >> Michael,
> >>
> >> it makes sense only for cases when partitions count is power of two.
> >> Affinity function doesn't have this limitation.
> >>
> >> Bu, of course, we can check, that partitions count is power of two and
> >> use optimized hash code calculation.
> >>
> >>
> >> On Wed, Mar 15, 2017 at 4:09 PM, Michael Griggs
> >> <[hidden email]> wrote:
> >> > Hi Igniters,
> >> >
> >> > Last week I was working with a group of Ignite users.  They are
> inserting
> >> > several million string keys in to a cache.  Each string key was
> >> > approximately 22-characters in length.  When I exported the partition
> >> > counts (via GG Visor) I was able to see an unusual periodicity in the
> >> > number of keys allocated to partitions.  I charted this in Excel [1].
> >> > After further investigation, it appears that there is a relationship
> >> > between the number of keys being inserted, the number of partitions
> >> > assigned to the cache and amount of apparent periodicity: a small
> number
> >> of
> >> > partitions will cause periodicity to appear with lower numbers of
> keys.
> >> >
> >> > The RendezvousAffinityFunction#partition function performs a simple
> >> > calculation of key hashcode modulo partition-count:
> >> >
> >> > U.safeAbs(key.hashCode() % parts)
> >> >
> >> >
> >> > Digging further I was led to the fact that this is how the Java
> HashMap
> >> > *used* to behave [2], but was upgraded around Java 1.4 to perform the
> >> > following:
> >> >
> >> > key.hashCode() & (parts - 1)
> >> >
> >> > which performs more efficiently.  It was then updated further to do
> the
> >> > following:
> >> >
> >> > (h = key.hashCode()) ^ (h >>> 16);
> >> >
> >> > with the bit-shift performed to
> >> >
> >> > incorporate impact of the highest bits that would otherwise
> >> > never be used in index calculations because of table bounds
> >> >
> >> >
> >> > When using this function, rather than our
> >> > RendezvousAffinityFunction#partition implementation, I also saw a
> >> > significant decrease in the periodicity and a better distribution of
> keys
> >> > amongst partitions [3].
> >> >
> >> > I would like to suggest that we adopt this modified hash function
> inside
> >> > RendezvousAffinityFunction.
> >> >
> >> > Regards
> >> > Mike
> >> >
> >> >
> >> > [1]: https://i.imgur.com/0FtCZ2A.png
> >> > [2]:
> >> > https://www.quora.com/Why-does-Java-use-a-mediocre-
> >> hashCode-implementation-for-strings
> >> > [3]: https://i.imgur.com/8ZuCSA3.png
> >>
>
Reply | Threaded
Open this post in threaded view
|

RE: Distribution of keys to partitions

michael.griggs
Have we ever heard of somebody needing to set the partition count to a non-power-of-two number?  Perhaps we could restrict the method so that it will only accept a power of two as the partition count?

-----Original Message-----
From: Valentin Kulichenko [mailto:[hidden email]]
Sent: 15 March 2017 16:22
To: [hidden email]
Subject: Re: Distribution of keys to partitions

Andrey,

Absolutely, your point is correct. I'm talking about default behavior which must be as effective as possible. In case we do this optimization, I would also show a warning if number of partitions is not a power of two.

-Val

On Wed, Mar 15, 2017 at 5:09 PM, Andrey Gura <[hidden email]> wrote:

> Anyway, we can't always use this optimization because it will not work
> for non power of two values.
>
> On Wed, Mar 15, 2017 at 6:48 PM, Valentin Kulichenko
> <[hidden email]> wrote:
> > In 99% of cases number of partition is a power of two, because it's
> > the default value. Almost no one changes it. If this change actually
> > provides better distribution, it absolutely makes sense to do it.
> >
> > Michael, can you create a Jira ticket and put you findings there?
> >
> > -Val
> >
> > On Wed, Mar 15, 2017 at 3:58 PM, Andrey Gura <[hidden email]> wrote:
> >
> >> Michael,
> >>
> >> it makes sense only for cases when partitions count is power of two.
> >> Affinity function doesn't have this limitation.
> >>
> >> Bu, of course, we can check, that partitions count is power of two
> >> and use optimized hash code calculation.
> >>
> >>
> >> On Wed, Mar 15, 2017 at 4:09 PM, Michael Griggs
> >> <[hidden email]> wrote:
> >> > Hi Igniters,
> >> >
> >> > Last week I was working with a group of Ignite users.  They are
> inserting
> >> > several million string keys in to a cache.  Each string key was
> >> > approximately 22-characters in length.  When I exported the
> >> > partition counts (via GG Visor) I was able to see an unusual
> >> > periodicity in the number of keys allocated to partitions.  I charted this in Excel [1].
> >> > After further investigation, it appears that there is a
> >> > relationship between the number of keys being inserted, the
> >> > number of partitions assigned to the cache and amount of apparent
> >> > periodicity: a small
> number
> >> of
> >> > partitions will cause periodicity to appear with lower numbers of
> keys.
> >> >
> >> > The RendezvousAffinityFunction#partition function performs a
> >> > simple calculation of key hashcode modulo partition-count:
> >> >
> >> > U.safeAbs(key.hashCode() % parts)
> >> >
> >> >
> >> > Digging further I was led to the fact that this is how the Java
> HashMap
> >> > *used* to behave [2], but was upgraded around Java 1.4 to perform
> >> > the
> >> > following:
> >> >
> >> > key.hashCode() & (parts - 1)
> >> >
> >> > which performs more efficiently.  It was then updated further to
> >> > do
> the
> >> > following:
> >> >
> >> > (h = key.hashCode()) ^ (h >>> 16);
> >> >
> >> > with the bit-shift performed to
> >> >
> >> > incorporate impact of the highest bits that would otherwise never
> >> > be used in index calculations because of table bounds
> >> >
> >> >
> >> > When using this function, rather than our
> >> > RendezvousAffinityFunction#partition implementation, I also saw a
> >> > significant decrease in the periodicity and a better distribution
> >> > of
> keys
> >> > amongst partitions [3].
> >> >
> >> > I would like to suggest that we adopt this modified hash function
> inside
> >> > RendezvousAffinityFunction.
> >> >
> >> > Regards
> >> > Mike
> >> >
> >> >
> >> > [1]: https://i.imgur.com/0FtCZ2A.png
> >> > [2]:
> >> > https://www.quora.com/Why-does-Java-use-a-mediocre-
> >> hashCode-implementation-for-strings
> >> > [3]: https://i.imgur.com/8ZuCSA3.png
> >>
>

Reply | Threaded
Open this post in threaded view
|

Re: Distribution of keys to partitions

dsetrakyan
On Wed, Mar 15, 2017 at 9:51 AM, Michael Griggs <[hidden email]
> wrote:

> Have we ever heard of somebody needing to set the partition count to a
> non-power-of-two number?  Perhaps we could restrict the method so that it
> will only accept a power of two as the partition count?
>

As Valentin suggested, we should not restrict it, but should print out a
warning.
Reply | Threaded
Open this post in threaded view
|

Re: Distribution of keys to partitions

dmagda
Excellent discovery, thanks Michael!

I would suggest doing the following. If we see that a number of partitions is a power of two then the new algorithm will be applied, otherwise the warning will be printed out and the *old* one approach will be used. Does this resolver all the concerns?

Michael, could you complete the contribution performing all the required changes? I think we should squeeze the update into AI 2.0 where it’s safe to break the compatibility.

Before merging the changes we can ask Ilya to run a set of benchmarks to confirm that there is no performance hit.


Denis

> On Mar 15, 2017, at 12:31 PM, Dmitriy Setrakyan <[hidden email]> wrote:
>
> On Wed, Mar 15, 2017 at 9:51 AM, Michael Griggs <[hidden email]
>> wrote:
>
>> Have we ever heard of somebody needing to set the partition count to a
>> non-power-of-two number?  Perhaps we could restrict the method so that it
>> will only accept a power of two as the partition count?
>>
>
> As Valentin suggested, we should not restrict it, but should print out a
> warning.

Reply | Threaded
Open this post in threaded view
|

Re: Distribution of keys to partitions

michael.griggs
I created a PR to implement this.  I ran the TC tests, but there are a lot of errors.  However, the errors seem unrelated to the change.  

I see that other PRs are suffering with similar test failures.  Have some tests been broken by new 2.0 functionality and not fixed yet?

http://ci.ignite.apache.org/project.html?projectId=IgniteTests&branch_IgniteTests=pull/1645/head

Regards
Mike

Reply | Threaded
Open this post in threaded view
|

Re: Distribution of keys to partitions

michael.griggs
The change is now ready for review:

https://github.com/apache/ignite/pull/1707
Reply | Threaded
Open this post in threaded view
|

Re: Distribution of keys to partitions

Vladimir Ozerov
In order to overcome this problem, Hazelcast guys set non-power-of-two
partition count by default - 257.

On Fri, Mar 31, 2017 at 9:59 AM, michael.griggs <[hidden email]
> wrote:

> The change is now ready for review:
>
> https://github.com/apache/ignite/pull/1707
>
>
>
> --
> View this message in context: http://apache-ignite-
> developers.2346864.n4.nabble.com/Distribution-of-keys-to-
> partitions-tp15455p16005.html
> Sent from the Apache Ignite Developers mailing list archive at Nabble.com.
>
Reply | Threaded
Open this post in threaded view
|

Re: Distribution of keys to partitions

dsetrakyan
Vova,

Not sure I understand what you mean. I believe that we just agreed that if
the partition count is set to the power of 2, then we can improve the
performance with a better hashing algorithm.

D.

On Fri, Mar 31, 2017 at 1:46 AM, Vladimir Ozerov <[hidden email]>
wrote:

> In order to overcome this problem, Hazelcast guys set non-power-of-two
> partition count by default - 257.
>
> On Fri, Mar 31, 2017 at 9:59 AM, michael.griggs <
> [hidden email]
> > wrote:
>
> > The change is now ready for review:
> >
> > https://github.com/apache/ignite/pull/1707
> >
> >
> >
> > --
> > View this message in context: http://apache-ignite-
> > developers.2346864.n4.nabble.com/Distribution-of-keys-to-
> > partitions-tp15455p16005.html
> > Sent from the Apache Ignite Developers mailing list archive at
> Nabble.com.
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Distribution of keys to partitions

dmagda
In reply to this post by michael.griggs
Michael, thanks.

I did minor improvements and merged them to IGNITE-4828 branch and triggered TeamCity tests:
http://ci.ignite.apache.org/viewQueued.html?itemId=526101&tab=queuedBuildOverviewTab

*Michael*, please check the tests results lately. I won’t be available in the nearest 4 days.

*Ilya*, please benchmark IGNITE-4828 branch against master branch in FULL_SYNC mode.


Denis

> On Mar 31, 2017, at 2:59 AM, michael.griggs <[hidden email]> wrote:
>
> The change is now ready for review:
>
> https://github.com/apache/ignite/pull/1707
>
>
>
> --
> View this message in context: http://apache-ignite-developers.2346864.n4.nabble.com/Distribution-of-keys-to-partitions-tp15455p16005.html
> Sent from the Apache Ignite Developers mailing list archive at Nabble.com.

Reply | Threaded
Open this post in threaded view
|

Re: Distribution of keys to partitions

yzhdanov
Guys, I replied in ticket. Overall I liked the changes but I think this
needs additional elaboration. Please see
https://issues.apache.org/jira/browse/IGNITE-4828

--Yakov

2017-03-31 20:09 GMT+03:00 Denis Magda <[hidden email]>:

> Michael, thanks.
>
> I did minor improvements and merged them to IGNITE-4828 branch and
> triggered TeamCity tests:
> http://ci.ignite.apache.org/viewQueued.html?itemId=526101&
> tab=queuedBuildOverviewTab
>
> *Michael*, please check the tests results lately. I won’t be available in
> the nearest 4 days.
>
> *Ilya*, please benchmark IGNITE-4828 branch against master branch in
> FULL_SYNC mode.
>
> —
> Denis
>
> > On Mar 31, 2017, at 2:59 AM, michael.griggs <[hidden email]>
> wrote:
> >
> > The change is now ready for review:
> >
> > https://github.com/apache/ignite/pull/1707
> >
> >
> >
> > --
> > View this message in context: http://apache-ignite-
> developers.2346864.n4.nabble.com/Distribution-of-keys-to-
> partitions-tp15455p16005.html
> > Sent from the Apache Ignite Developers mailing list archive at
> Nabble.com.
>
>