Apache Ignite Developers - Legacy Mail Archive

Storing short/empty strings in Ignite

Valentin Kulichenko

Storing short/empty strings in Ignite

Hey folks,

While working with Ignite users, I keep seeing data models where a single
object (row) might contain many fields (100, 200, more...), and most of
them are strings.

Correct me if I'm wrong, but as I understand it, for every such field we
store an integer value representing its length. This is a significant
overhead: with 200 fields we spend 800 bytes on lengths alone.
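To make the arithmetic concrete, here is a minimal Java sketch that compares
the bytes spent purely on length prefixes with a fixed 4-byte length versus
the proposed 1-byte length. It assumes every field is a string and counts only
the prefixes, not type codes, headers or the footer. The class and method
names are made up for illustration.

```java
public class LengthPrefixOverhead {
    // Bytes spent on length prefixes alone, assuming every string field
    // carries a fixed-width length of `prefixBytes` bytes.
    static int prefixOverhead(int stringFields, int prefixBytes) {
        return stringFields * prefixBytes;
    }

    public static void main(String[] args) {
        for (int fields : new int[] {100, 200}) {
            System.out.printf("%d fields: %d bytes with 4-byte lengths, %d bytes with 1-byte lengths%n",
                fields, prefixOverhead(fields, 4), prefixOverhead(fields, 1));
        }
    }
}
```

For 200 string fields this gives 800 bytes with 4-byte lengths versus 200
bytes with 1-byte lengths.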

Now here is the catch: the vast majority of those strings are actually empty
or very short (several chars), so we don't really need 4 bytes to store their
length.

My suggestion is to introduce another data type, e.g. STRING_SHORT, and use
it for all strings that are 255 chars or less, so that a single byte is
enough to encode the length. We can go even further and also introduce
STRING_EMPTY, which obviously doesn't need any length information at all.
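A minimal Java sketch of how such an encoding could look. The type codes
STRING_SHORT and STRING_EMPTY and all numeric values are hypothetical, not
Ignite constants. Strings are assumed to be UTF-8 encoded with the length
counted in bytes, so the 1-byte form applies when the encoded form is 255
bytes or less (not the same as 255 chars for non-ASCII text). The 4-byte
fallback is assumed to be little-endian.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ShortStringEncoder {
    // Hypothetical type codes for illustration only; not Ignite's real constants.
    static final byte STRING = 9;
    static final byte STRING_SHORT = 40;
    static final byte STRING_EMPTY = 41;

    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length == 0) {
            out.write(STRING_EMPTY);          // type code only, no length at all
        } else if (utf8.length <= 255) {
            out.write(STRING_SHORT);
            out.write(utf8.length);           // single unsigned length byte
            out.write(utf8, 0, utf8.length);
        } else {
            out.write(STRING);
            int len = utf8.length;            // assumed 4-byte little-endian length
            out.write(len);                   // write(int) keeps only the low 8 bits
            out.write(len >>> 8);
            out.write(len >>> 16);
            out.write(len >>> 24);
            out.write(utf8, 0, utf8.length);
        }
    }

    static int encodedSize(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, s);
        return out.size();
    }

    public static void main(String[] args) {
        System.out.println(encodedSize(""));               // 1
        System.out.println(encodedSize("abc"));            // 5: type + length + 3 data bytes
        System.out.println(encodedSize("x".repeat(300)));  // 305: type + 4-byte length + 300
    }
}
```

Under these assumptions an empty string costs 1 byte and "abc" costs 5 bytes,
versus 5 and 8 bytes with a type code plus a 4-byte length. `String.repeat`
requires Java 11 or later.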

What do you guys think?

-Val

Vladimir Ozerov

Re: Storing short/empty strings in Ignite

Hi Val,

I would say that we do not need the string length at all, because it can be
derived from the object footer (next field offset MINUS current field offset).
It is not a very good idea to implement the proposed change in Apache Ignite
2.x because it is a breaking change and would add unnecessary complexity to
the already very complex binary infrastructure. Instead, it is better to
review the binary format in 3.0 and remove lengths not only from Strings, but
from other variable-length data types as well (arrays, decimals).

On Tue, Mar 5, 2019 at 10:12 AM Valentin Kulichenko <
[hidden email]> wrote:

Ilya Kasnacheev

Re: Storing short/empty strings in Ignite

In reply to this post by Valentin Kulichenko

Hello!

If you can modify your code to store nulls instead of empty strings, nulls
seem to be much more compact.

Regards,
--
Ilya Kasnacheev

Tue, Mar 5, 2019 at 10:12, Valentin Kulichenko <
[hidden email]>:

yzhdanov

Re: Storing short/empty strings in Ignite

We still need to differentiate between nulls and empty strings.
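Keeping nulls and empty strings distinct does not by itself require a length:
a dedicated empty-string type code is enough. The Java sketch below assumes
hypothetical one-byte markers for null, empty and short strings (the values
are illustrative, not Ignite's), so both null and "" cost a single byte but
still decode to different values.

```java
import java.nio.charset.StandardCharsets;

public class NullVsEmpty {
    // Hypothetical one-byte markers; the values are illustrative, not Ignite's.
    static final byte NULL = 101;
    static final byte STRING_SHORT = 40;
    static final byte STRING_EMPTY = 41;

    static byte[] encode(String s) {
        if (s == null) return new byte[] {NULL};
        if (s.isEmpty()) return new byte[] {STRING_EMPTY};
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 255) throw new IllegalArgumentException("sketch handles short strings only");
        byte[] out = new byte[2 + utf8.length];
        out[0] = STRING_SHORT;
        out[1] = (byte) utf8.length;          // stored as a signed byte, read back unsigned
        System.arraycopy(utf8, 0, out, 2, utf8.length);
        return out;
    }

    static String decode(byte[] in) {
        switch (in[0]) {
            case NULL:
                return null;
            case STRING_EMPTY:
                return "";
            case STRING_SHORT:
                int len = in[1] & 0xFF;       // treat the length byte as unsigned
                return new String(in, 2, len, StandardCharsets.UTF_8);
            default:
                throw new IllegalArgumentException("unknown type code " + in[0]);
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {null, "", "abc"}) {
            byte[] bytes = encode(s);
            String back = decode(bytes);
            System.out.println(bytes.length + " byte(s) -> " + (back == null ? "null" : "\"" + back + "\""));
        }
    }
}
```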

--Yakov