Storing short/empty strings in Ignite

Valentin Kulichenko
Hey folks,

While working with Ignite users, I keep seeing data models where a single
object (row) might contain many fields (100, 200, or more), and most of
them are strings.

Correct me if I'm wrong, but as I understand it, for every such field we
store a 4-byte integer to represent its length. This is significant
overhead: with 200 fields we spend 800 bytes per row on lengths alone.

Now here is the catch: the vast majority of those strings are actually empty
or very short (several chars), so we don't really need 4 bytes to encode
their length.

My suggestion is to introduce another data type, e.g. STRING_SHORT, use it
for all strings whose encoded (UTF-8) form is 255 bytes or less, and
therefore use a single byte to encode the length. We can go even further and
also introduce STRING_EMPTY, which obviously doesn't need any length
information at all.
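To make the proposal concrete, here is a minimal sketch of a string codec that picks one of three type codes: STRING_EMPTY (no length), STRING_SHORT (1-byte length) or STRING (4-byte length). It assumes every string field is written as a 1-byte type code followed by an optional length and the UTF-8 bytes; the numeric type codes, the big-endian length and the class name `ShortStringCodec` are hypothetical and are not Ignite's actual binary format or API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ShortStringCodec {
    // Hypothetical type codes for this sketch only; they are NOT Ignite's constants.
    static final byte STRING = 9;
    static final byte STRING_SHORT = 40;
    static final byte STRING_EMPTY = 41;

    /** Encodes a non-null string as [type code][optional length][UTF-8 bytes]. */
    static byte[] encode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int len = utf8.length;
        if (len == 0) {
            out.write(STRING_EMPTY);            // no length needed at all
        } else if (len <= 255) {
            out.write(STRING_SHORT);
            out.write(len);                     // one unsigned byte of length
        } else {
            out.write(STRING);
            out.write(len >>> 24);              // 4-byte big-endian length
            out.write(len >>> 16);
            out.write(len >>> 8);
            out.write(len);
        }
        out.write(utf8, 0, len);
        return out.toByteArray();
    }

    /** Decodes a buffer produced by encode(). */
    static String decode(byte[] buf) {
        int pos;
        int len;
        switch (buf[0]) {
            case STRING_EMPTY:
                return "";
            case STRING_SHORT:
                len = buf[1] & 0xFF;            // read the length back as unsigned
                pos = 2;
                break;
            case STRING:
                len = ((buf[1] & 0xFF) << 24) | ((buf[2] & 0xFF) << 16)
                    | ((buf[3] & 0xFF) << 8) | (buf[4] & 0xFF);
                pos = 5;
                break;
            default:
                throw new IllegalArgumentException("Unknown type code: " + buf[0]);
        }
        return new String(buf, pos, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] samples = {"", "abc", "x".repeat(300)};
        for (String s : samples) {
            byte[] enc = encode(s);
            System.out.println(s.length() + " chars -> " + enc.length
                + " bytes, round trip ok: " + decode(enc).equals(s));
        }
    }
}
```

Under these assumptions the three samples take 1, 5 and 305 bytes, compared with 5, 8 and 305 bytes if every string carried a fixed 4-byte length. Note that the short form is bounded by encoded bytes, not characters, since a single non-ASCII character can take several UTF-8 bytes.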

What do you guys think?

-Val

Re: Storing short/empty strings in Ignite

Vladimir Ozerov
Hi Val,

I would say that we do not need the string length at all, because it can be
derived from the object footer (next field offset minus current field offset).
It is not a good idea to implement the proposed change in Apache Ignite 2.x,
because it is a breaking change and would add unnecessary complexity to the
already very complex binary infrastructure. Instead, it is better to revisit
the binary format in 3.0 and remove lengths not only from strings, but from
other variable-length data types as well (arrays, decimals).

(In reply to Valentin Kulichenko's message of Tue, Mar 5, 2019, 10:12 AM)

Re: Storing short/empty strings in Ignite

Ilya Kasnacheev
In reply to this post by Valentin Kulichenko
Hello!

If you can modify your code to store nulls instead of empty strings, nulls
seem to be much more compact.
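A rough size comparison, assuming (not confirmed in this thread) that a null field is written as a single type byte while a string field is a type byte, a 4-byte length and the UTF-8 bytes. Any per-field footer or schema entries are left out of the count, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class NullVsEmpty {
    // Assumed layout for illustration only: a null is a single type byte,
    // a string is a type byte + 4-byte length + UTF-8 bytes.
    // Footer/schema entries per field are not counted here.
    static final int TYPE_BYTE = 1;
    static final int LENGTH_PREFIX = 4;

    static int encodedSize(String s) {
        if (s == null) {
            return TYPE_BYTE;
        }
        return TYPE_BYTE + LENGTH_PREFIX + s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int fields = 200;
        System.out.println("200 empty strings: " + fields * encodedSize("") + " bytes"); // 1000
        System.out.println("200 nulls:         " + fields * encodedSize(null) + " bytes"); // 200
    }
}
```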

Regards,
--
Ilya Kasnacheev

Re: Storing short/empty strings in Ignite

yzhdanov
We still need to differentiate between nulls and empty strings.

--Yakov