[IMPORTANT] Future of Binary Objects

Vladimir Ozerov
Igniters,

It is very likely that Apache Ignite 3.0 will be released next year. So we
need to start thinking about major product improvements. I'd like to start
with binary objects.

Currently they are one of the main limiting factors for the product. They
are fat: 30+ bytes of overhead on average, which makes the TCO of Apache
Ignite high compared to other vendors. They are slow: not suitable for SQL
at all.

I would like to ask all of you who have worked with binary objects to share
your feedback and ideas, so that we understand what they should look like in
AI 3.0. This is a brainstorm: let's accumulate ideas first and minimize
criticism. Then we will work on the ideas in separate topics.

1) Historical background

BO were implemented around 2014 (Apache Ignite 1.5), when we started working
on the .NET and CPP clients. During design we had several ideas in mind:
- ability to read object fields in O(1) without deserialization
- interoperability between Java, .NET and CPP.

Since then a number of other concepts have been mixed into the cocktail:
- Affinity key fields
- Strict typing for existing fields (aka metadata)
- Binary Object as storage format

2) My proposals

2.1) Introduce "Data Row Format" interface
Binary Objects are terrible candidates for storage: too fat, too slow.
Efficient storage typically has <10 bytes of overhead per row (no metadata,
no length, no hash code, etc.), allows super-fast field access, supports
different string formats (ASCII, UTF-8, etc.) and different temporal types
(date, time, timestamp, timestamp with timezone, etc.), and stores these
types as efficiently as possible.

What we need is an interface that converts a key-value pair into a row.
This row will be used both to store data and to read fields from it. If you
care about memory consumption and need SQL and a strict schema, use one
format. If you need flexibility and prefer key-value access, use another
format, which stores binary objects unchanged (the current behavior).

interface DataRowFormat {
    DataRow create(Object key, Object value); // primitives or binary objects
    DataRowMetadata metadata();
}
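
A minimal sketch of how two formats could coexist behind the proposed interface. Only the names DataRowFormat, DataRow and DataRowMetadata come from the proposal; their methods, FlexibleRowFormat, StrictRowFormat, the "_key"/"_val" names and the Map-based value are hypothetical choices for illustration. The strict variant only shows schema enforcement and positional storage, not the compact byte encoding the proposal is ultimately after.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical row: positional access to stored fields.
interface DataRow {
    Object field(int idx);
}

// Hypothetical metadata: ordered names of the row's fields.
interface DataRowMetadata {
    List<String> fieldNames();
}

interface DataRowFormat {
    DataRow create(Object key, Object value); // primitives or binary objects
    DataRowMetadata metadata();
}

// "Flexible" format: keeps key and value unchanged (today's behavior).
final class FlexibleRowFormat implements DataRowFormat {
    public DataRow create(Object key, Object value) {
        Object[] kv = {key, value};
        return idx -> kv[idx];
    }

    public DataRowMetadata metadata() {
        return () -> List.of("_key", "_val");
    }
}

// "Strict" format: the value must supply exactly the declared columns,
// which are flattened into a positional array right after the key.
final class StrictRowFormat implements DataRowFormat {
    private final List<String> columns;

    StrictRowFormat(List<String> columns) {
        this.columns = List.copyOf(columns);
    }

    public DataRow create(Object key, Object value) {
        if (!(value instanceof Map))
            throw new IllegalArgumentException("strict format expects a Map of column values");
        Map<?, ?> m = (Map<?, ?>) value;
        if (m.size() != columns.size() || !m.keySet().containsAll(columns))
            throw new IllegalArgumentException("value does not match schema " + columns);
        Object[] vals = new Object[columns.size() + 1];
        vals[0] = key;
        for (int i = 0; i < columns.size(); i++)
            vals[i + 1] = m.get(columns.get(i));
        return idx -> vals[idx];
    }

    public DataRowMetadata metadata() {
        List<String> names = new ArrayList<>();
        names.add("_key");
        names.addAll(columns);
        List<String> frozen = List.copyOf(names);
        return () -> frozen;
    }
}
```

The cache would pick one implementation at configuration time, so key-value users keep today's behavior while SQL-heavy caches get a schema-checked row.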

2.2) Remove affinity field from metadata
Affinity rules are governed by the cache, not the type. We should remove
"affinityFieldName" from metadata.

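A sketch of what "governed by the cache, not the type" could look like, assuming a hypothetical per-cache CacheAffinity object (not an existing Ignite API): the cache supplies the function that extracts the affinity key, so the same key type can be collocated differently in different caches without any affinity field in its binary metadata. OrderKey is likewise a made-up example type.

```java
import java.util.function.Function;

// Hypothetical per-cache affinity setting: the cache, not the key type,
// decides which part of the key determines data placement.
final class CacheAffinity<K> {
    private final Function<K, Object> affinityKeyOf;
    private final int partitions;

    CacheAffinity(Function<K, Object> affinityKeyOf, int partitions) {
        if (partitions <= 0) throw new IllegalArgumentException("partitions must be positive");
        this.affinityKeyOf = affinityKeyOf;
        this.partitions = partitions;
    }

    // hashCode() keeps the sketch short; a real cluster needs a hash that is
    // identical on every node and every client platform.
    int partition(K key) {
        return Math.floorMod(affinityKeyOf.apply(key).hashCode(), partitions);
    }
}

// A plain key class: nothing about affinity is recorded in its type metadata.
final class OrderKey {
    final long orderId;
    final long customerId;

    OrderKey(long orderId, long customerId) {
        this.orderId = orderId;
        this.customerId = customerId;
    }
}
```

One cache could use `k -> k.customerId` to keep all orders of a customer in one partition, while another cache storing the same OrderKey type uses `k -> k.orderId` to spread them.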
2.3) Remove restrictions on changing field type
I do not know why we did that in the first place. This restriction prevents
type evolution and confuses users.
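
One possible way to lift the restriction, sketched under assumptions: each stored value keeps the type tag it was written with, and the reader widens it to the column's current type on access, rejecting only lossy conversions. ColType, StoredField and Widening are hypothetical names, and only the INT to BIGINT pair is modelled.

```java
// Hypothetical column types; only one widening pair is modelled.
enum ColType { INT, BIGINT }

// A stored value together with the type it was written with.
final class StoredField {
    final ColType type;
    final Object value;

    StoredField(ColType type, Object value) {
        this.type = type;
        this.value = value;
    }
}

final class Widening {
    // Returns the stored value as the column's current type. Rows written
    // before the column became BIGINT still hold an INT and are widened on
    // read, so existing data does not have to be rewritten.
    static Object read(StoredField f, ColType current) {
        if (f.value == null || f.type == current) return f.value;
        if (f.type == ColType.INT && current == ColType.BIGINT)
            return ((Integer) f.value).longValue();
        // Narrowing (BIGINT -> INT) could lose data, so it is rejected here.
        throw new IllegalStateException("unsupported conversion " + f.type + " -> " + current);
    }
}
```

The design choice here is lazy conversion: the type change is a metadata-only operation, and the cost moves to a cheap check on read.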

2.4) Use bitmaps for "null" and default values and for fixed-length fields;
put fixed-length fields before variable-length ones.
Motivation: to save space.
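
A sketch of the layout from 2.4, assuming for simplicity that every fixed-length column is a 64-bit integer and every variable-length column is a UTF-8 string; CompactRow and its methods are hypothetical. Nulls cost one bit each, every fixed-length field sits at an offset derivable from the schema alone, and each variable-length field needs one end offset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical layout:
// [null bitmap][8-byte slot per fixed column][4-byte end offset per varlen column][varlen bytes]
final class CompactRow {
    private static int bitmapLen(int nFixed, int nVar) {
        return (nFixed + nVar + 7) / 8;
    }

    private static void setNull(byte[] row, int col) {
        row[col >>> 3] |= (byte) (1 << (col & 7));
    }

    static boolean isNull(byte[] row, int col) {
        return (row[col >>> 3] & (1 << (col & 7))) != 0;
    }

    // fixed: values of the long columns; var: values of the string columns.
    static byte[] encode(Long[] fixed, String[] var) {
        int bm = bitmapLen(fixed.length, var.length);
        byte[][] varBytes = new byte[var.length][];
        int varTotal = 0;
        for (int i = 0; i < var.length; i++) {
            varBytes[i] = var[i] == null ? new byte[0] : var[i].getBytes(StandardCharsets.UTF_8);
            varTotal += varBytes[i].length;
        }
        byte[] row = new byte[bm + 8 * fixed.length + 4 * var.length + varTotal];
        ByteBuffer buf = ByteBuffer.wrap(row);
        buf.position(bm);
        for (int i = 0; i < fixed.length; i++) {
            if (fixed[i] == null) setNull(row, i);
            buf.putLong(fixed[i] == null ? 0L : fixed[i]); // null slots stay zeroed
        }
        int end = 0;
        for (int i = 0; i < var.length; i++) {
            if (var[i] == null) setNull(row, fixed.length + i);
            end += varBytes[i].length;
            buf.putInt(end); // cumulative end offset within the varlen area
        }
        for (byte[] b : varBytes) buf.put(b);
        return row;
    }

    static Long readFixed(byte[] row, int nFixed, int nVar, int i) {
        if (isNull(row, i)) return null;
        return ByteBuffer.wrap(row).getLong(bitmapLen(nFixed, nVar) + 8 * i);
    }

    static String readVar(byte[] row, int nFixed, int nVar, int j) {
        if (isNull(row, nFixed + j)) return null;
        int offsets = bitmapLen(nFixed, nVar) + 8 * nFixed;
        int data = offsets + 4 * nVar;
        ByteBuffer buf = ByteBuffer.wrap(row);
        int start = j == 0 ? 0 : buf.getInt(offsets + 4 * (j - 1));
        int end = buf.getInt(offsets + 4 * j);
        return new String(row, data + start, end - start, StandardCharsets.UTF_8);
    }
}
```

For fixed values (1, null) and strings ("ab", null) the row is 27 bytes: 1 bitmap byte, 16 bytes of fixed slots, 8 bytes of offsets and 2 bytes of string data. Null fixed-length fields still occupy their slot here, trading space for O(1) access; a variant could skip them and compute offsets from the bitmap instead.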

What else? Please share your ideas.

Vladimir.

Re: [IMPORTANT] Future of Binary Objects

Alexey Zinoviev
Are we discussing only core features here, or the roadmap for all components?

On Tue, Nov 20, 2018 at 10:05 Vladimir Ozerov <[hidden email]> wrote:

Re: [IMPORTANT] Future of Binary Objects

Vladimir Ozerov
Hi Alexey,

Binary Objects only.

On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev <[hidden email]> wrote:

Re: [IMPORTANT] Future of Binary Objects

daradurvs
I think one possible way to reduce overhead and TCO is an SQL schema approach.

It assumes that metadata is stored separately from the serialized data to
reduce size. In this case, most of the advantages of Binary Objects, such as
O(1) field access and access without deserialization, can still be achieved.
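
A minimal sketch of the separate-metadata idea, assuming a hypothetical SchemaRegistry shared by all nodes: column names are stored once per schema, each row carries only a schema ID and positional values, and a lookup by name is resolved to an index through the registry before the value is read directly.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry: column names live here once, not in every row.
final class SchemaRegistry {
    private final Map<Integer, Map<String, Integer>> columnIndex = new HashMap<>();

    void register(int schemaId, List<String> columns) {
        Map<String, Integer> idx = new HashMap<>();
        for (int i = 0; i < columns.size(); i++) idx.put(columns.get(i), i);
        columnIndex.put(schemaId, idx);
    }

    int indexOf(int schemaId, String column) {
        Map<String, Integer> idx = columnIndex.get(schemaId);
        if (idx == null) throw new IllegalArgumentException("unknown schema " + schemaId);
        Integer i = idx.get(column);
        if (i == null) throw new IllegalArgumentException("unknown column " + column);
        return i;
    }
}

// A row carries only its schema ID and positional values.
final class SchemaRow {
    final int schemaId;
    final Object[] values;

    SchemaRow(int schemaId, Object... values) {
        this.schemaId = schemaId;
        this.values = values;
    }

    Object get(SchemaRegistry reg, String column) {
        return values[reg.indexOf(schemaId, column)]; // O(1) after the name lookup
    }
}
```
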
On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov <[hidden email]> wrote:

>
> Hi Alexey,
>
> Binary Objects only.
>
> On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev <[hidden email]>
> wrote:
>
> > Do we discuss here Core features only or the roadmap for all components?
> >
> > вт, 20 нояб. 2018 г. в 10:05, Vladimir Ozerov <[hidden email]>:
> >
> > > Igniters,
> > >
> > > It is very likely that Apache Ignite 3.0 will be released next year. So
> > we
> > > need to start thinking about major product improvements. I'd like to
> > start
> > > with binary objects.
> > >
> > > Currently they are one of the main limiting factors for the product. They
> > > are fat - 30+ bytes overhead on average, high TCO of Apache Ignite
> > > comparing to other vendors. They are slow - not suitable for SQL at all.
> > >
> > > I would like to ask all of you who worked with binary objects to share
> > your
> > > feedback and ideas, so that we understand how they should look like in AI
> > > 3.0. This is a brain storm - let's accumulate ideas first and minimize
> > > critics. Then we will work on ideas in separate topics.
> > >
> > > 1) Historical background
> > >
> > > BO were implemented around 2014 (Apache Ignite 1.5) when we started
> > working
> > > on .NET and CPP clients. During design we had several ideas in mind:
> > > - ability to read object fields in O(1) without deserialization
> > > - interoperabillty between Java, .NET and CPP.
> > >
> > > Since then a number of other concepts were mixed to the cocktail:
> > > - Affinity key fields
> > > - Strict typing for existing fields (aka metadata)
> > > - Binary Object as storage format
> > >
> > > 2) My proposals
> > >
> > > 2.1) Introduce "Data Row Format" interface
> > > Binary Objects are terrible candidates for storage. Too fat, too slow.
> > > Efficient storage typically has <10 bytes overhead per row (no metadata,
> > no
> > > length, no hash code, etc), allow supper-fast field access, support
> > > different string formats (ASCII, UTF-8, etc), support different temporal
> > > types (date, time, timestamp, timestamp with timezone, etc), and store
> > > these types as efficiently as possible.
> > >
> > > What we need is to introduce an interface which will convert a pair of
> > > key-value objects into a row. This row will be used to store data and to
> > > get fields from it. Care about memory consumption, need SQL and strict
> > > schema - use one format. Need flexibility and prefer key-value access -
> > use
> > > another format which will store binary objects unchanged (current
> > > behavior).
> > >
> > > interface DataRowFormat {
> > >     DataRow create(Object key, Object value); // primitives or binary
> > > objects
> > >     DataRowMetadata metadata();
> > > }
> > >
> > > 2.2) Remove affinity field from metadata
> > > Affinity rules are governed by cache, not type. We should remove
> > > "affintiyFieldName" from metadata.
> > >
> > > 2.3) Remove restrictions on changing field type
> > > I do not know why we did that in the first place. This restriction
> > prevents
> > > type evolution and confuses users.
> > >
> > > 2.4) Use bitmaps for "null" and default values and for fixed-length
> > fields,
> > > put fixed-length fields before variable-length.
> > > Motivation: to save space.
> > >
> > > What else? Please share your ideas.
> > >
> > > Vladimir.
> > >
> >



--
Best Regards, Vyacheslav D.

Re: [IMPORTANT] Future of Binary Objects

Sergi
I really like the Protobuf format. It is probably not what we need for O(1)
field access, but for compact data representation we can borrow a lot from
it.
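
One of the main compactness tricks Protobuf uses for integers is the base-128 varint: 7 payload bits per byte, with the high bit signalling that more bytes follow, plus ZigZag encoding so that small negative numbers also stay short. A self-contained Java sketch (the Varint class name is ours):

```java
import java.io.ByteArrayOutputStream;

final class Varint {
    // Base-128 varint: low 7 bits first, high bit set on every byte but the last.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    static long decode(byte[] buf) {
        long result = 0;
        int shift = 0;
        for (byte b : buf) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
        throw new IllegalArgumentException("truncated varint");
    }

    // ZigZag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so small negatives stay short.
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    static long unZigZag(long n) {
        return (n >>> 1) ^ -(n & 1);
    }
}
```

For example, 300 encodes to the two bytes 0xAC 0x02 and 1 fits in a single byte, whereas a fixed-width long always takes 8. The price is that field positions depend on the values, which is why this does not give O(1) field access.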

Also, IMO, restricting field type changes is an absolutely sane idea. The
correct way to evolve a schema in the common case is to add new fields and
gradually deprecate the old ones. If the binary format can skip default/null
fields, this approach will not introduce any noticeable performance or size
overhead.
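
A sketch of why skipping defaults makes add-and-deprecate evolution cheap, using a Protobuf-like tagged encoding restricted to varint fields (wire type 0, key = field number << 3): fields equal to the default are not written at all, and a reader built for an older schema skips field numbers it does not know. TaggedRecord is a hypothetical name, and the other Protobuf wire types are deliberately left out.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

final class TaggedRecord {
    static void putVarint(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    static long getVarint(byte[] buf, int[] pos) {
        long r = 0;
        for (int shift = 0; ; shift += 7) {
            byte b = buf[pos[0]++];
            r |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return r;
        }
    }

    // Writes (key, value) pairs; values equal to the default 0 are omitted.
    // Map values must be non-null.
    static byte[] write(Map<Integer, Long> fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<Integer, Long> e : fields.entrySet()) {
            if (e.getValue() == 0L) continue;
            putVarint(out, e.getKey().longValue() << 3); // field number, wire type 0
            putVarint(out, e.getValue());
        }
        return out.toByteArray();
    }

    // Keeps only the known field numbers; unknown ones are skipped and
    // missing ones read as the default 0.
    static Map<Integer, Long> read(byte[] buf, int... known) {
        Map<Integer, Long> res = new HashMap<>();
        for (int k : known) res.put(k, 0L);
        int[] pos = {0};
        while (pos[0] < buf.length) {
            int field = (int) (getVarint(buf, pos) >>> 3);
            long value = getVarint(buf, pos);
            if (res.containsKey(field)) res.put(field, value);
        }
        return res;
    }
}
```

A writer on the newer schema can add field 3 without breaking readers that only know fields 1 and 2, and a deprecated field left at its default costs zero bytes.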

Sergi

On Tue, Nov 20, 2018 at 11:12 Vyacheslav Daradur <[hidden email]> wrote:

Re: [IMPORTANT] Future of Binary Objects

Alexey Zinoviev
I like @Vyacheslav Daradur's approach.

Maybe somebody could take a look at UnsafeRow in Spark:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
UnsafeRow is a concrete InternalRow that represents a mutable internal
raw-memory (and hence unsafe) binary row format.

P.S. If somebody is interested in this approach, I can share more
information.
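
A simplified Java model of the UnsafeRow layout as described in Spark's Javadoc, not Spark's actual code: an 8-byte-aligned null bit set with one bit per field, one 8-byte word per field holding either a fixed-length value or, for variable-length data, the offset in the upper 32 bits and the size in the lower 32 bits, followed by a word-aligned variable-length region. WordRow is a hypothetical name, only Long and String values are supported, and the byte order is our choice.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class WordRow {
    // Null bit set: one bit per field, rounded up to whole 8-byte words.
    static int bitSetBytes(int n) {
        return ((n + 63) / 64) * 8;
    }

    private static ByteBuffer view(byte[] row) {
        return ByteBuffer.wrap(row).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Long values are stored inline; String values as (offset << 32 | size).
    static byte[] write(Object... values) {
        int n = values.length;
        int fixedEnd = bitSetBytes(n) + 8 * n;
        byte[][] var = new byte[n][];
        int varLen = 0;
        for (int i = 0; i < n; i++) {
            if (values[i] instanceof String) {
                var[i] = ((String) values[i]).getBytes(StandardCharsets.UTF_8);
                varLen += (var[i].length + 7) & ~7; // round up to a word boundary
            }
        }
        byte[] row = new byte[fixedEnd + varLen];
        ByteBuffer buf = view(row);
        int cursor = fixedEnd;
        for (int i = 0; i < n; i++) {
            int word = bitSetBytes(n) + 8 * i;
            if (values[i] == null) {
                int bits = (i / 64) * 8;
                buf.putLong(bits, buf.getLong(bits) | (1L << (i % 64)));
            } else if (values[i] instanceof Long) {
                buf.putLong(word, (Long) values[i]);
            } else if (values[i] instanceof String) {
                buf.putLong(word, ((long) cursor << 32) | var[i].length);
                System.arraycopy(var[i], 0, row, cursor, var[i].length);
                cursor += (var[i].length + 7) & ~7;
            } else {
                throw new IllegalArgumentException("unsupported type: " + values[i].getClass());
            }
        }
        return row;
    }

    static boolean isNull(byte[] row, int i) {
        return (view(row).getLong((i / 64) * 8) & (1L << (i % 64))) != 0;
    }

    static long getLong(byte[] row, int n, int i) {
        return view(row).getLong(bitSetBytes(n) + 8 * i);
    }

    static String getString(byte[] row, int n, int i) {
        long offsetAndSize = getLong(row, n, i);
        int offset = (int) (offsetAndSize >>> 32);
        int size = (int) offsetAndSize;
        return new String(row, offset, size, StandardCharsets.UTF_8);
    }
}
```

Every field lookup is a single word read at a position computed from the field index, which is what makes this kind of layout attractive for SQL. The cost is 8 bytes per field even for small types, plus alignment padding: the row (7, "hello", null) takes 40 bytes.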

On Tue, Nov 20, 2018 at 11:33 Sergi Vladykin <[hidden email]> wrote:

Re: [IMPORTANT] Future of Binary Objects

Vladimir Ozerov
Vyacheslav,

Metadata is already stored separately. An object contains only a 4-byte
reference to that metadata (aka "schema ID") and offsets that make it
possible to find fields quickly. But if we separate the row format from the
binary format, we may be able to reduce the overhead even further to some
extent.
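
A simplified model of the per-object overhead described here: a 4-byte schema ID followed by one offset per field, so any field can be located in O(1) without deserializing the others. OffsetTableObject is hypothetical and uses 4-byte offsets and UTF-8 string fields only; real Ignite binary objects carry additional header fields and use their own encoding, which this sketch does not reproduce.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class OffsetTableObject {
    // Layout: [schemaId:int][offset_0 .. offset_{n-1}:int][field bytes...]
    // Offsets are absolute positions; n (the field count) comes from the schema.
    static byte[] write(int schemaId, String... fields) {
        byte[][] enc = new byte[fields.length][];
        int dataLen = 0;
        for (int i = 0; i < fields.length; i++) {
            enc[i] = fields[i].getBytes(StandardCharsets.UTF_8);
            dataLen += enc[i].length;
        }
        int header = 4 + 4 * fields.length;
        ByteBuffer buf = ByteBuffer.allocate(header + dataLen);
        buf.putInt(schemaId);
        int pos = header;
        for (byte[] e : enc) {
            buf.putInt(pos);
            pos += e.length;
        }
        for (byte[] e : enc) buf.put(e);
        return buf.array();
    }

    static int schemaId(byte[] obj) {
        return ByteBuffer.wrap(obj).getInt(0);
    }

    // O(1): jump straight to field i; it ends at the next offset or the array end.
    static String read(byte[] obj, int fieldCount, int i) {
        ByteBuffer b = ByteBuffer.wrap(obj);
        int start = b.getInt(4 + 4 * i);
        int end = (i + 1 < fieldCount) ? b.getInt(4 + 4 * (i + 1)) : obj.length;
        return new String(obj, start, end - start, StandardCharsets.UTF_8);
    }
}
```

With two fields the header is already 4 + 2 * 4 = 12 bytes before any data. A row format bound to a fixed schema could drop the offsets for fixed-length fields entirely, which is where the further reduction would come from.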

On Tue, Nov 20, 2018 at 11:12 AM Vyacheslav Daradur <[hidden email]> wrote:

Re: [IMPORTANT] Future of Binary Objects

Vladimir Ozerov
Hi Alexey,

Yes, this looks really similar to the Postgres format as well: a bitset,
fixed-length fields, variable-length fields. Most probably we need something
similar.

On Wed, Nov 21, 2018 at 10:10 AM Alexey Zinoviev <[hidden email]> wrote:

Re: [IMPORTANT] Future of Binary Objects

Vladimir Ozerov
In reply to this post by Sergi
Sergi,

Changing a field's name in order to change its type is not a user-friendly
approach, because it rules out operations that are perfectly normal from
the user's perspective, such as:
ALTER TABLE my_table MODIFY COLUMN x BIGINT; -- Was INT previously.

The command above is much simpler than:
1. ALTER TABLE my_table DROP COLUMN x;
2. ALTER TABLE my_table ADD COLUMN x1 BIGINT;
3. Changing the application code in multiple places to use the new field.

A binary object is essentially a collection of key-value pairs, nothing
more, so there is no need to restrict field types. All the confusion goes
away if we introduce a "RowFormat" interface at the cache level, as I
briefly explained in previous emails. We could then have a "flexible" row
format that allows any type for the same field as long as the user
application tolerates it, and a "strict" row format with concrete fields,
concrete types and concrete constraints on them (NOT NULL, CHECK, etc.).
The user can still create a binary object with a field of any type, but it
may be rejected at the storage level by the "RowFormat" implementation.
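A minimal sketch of the "flexible" vs. "strict" split described above, assuming a binary object can be modeled as a plain `Map` of field names to values, with `Long` standing in for BIGINT and `Integer` for INT. The `RowFormat` name comes from the proposal, but its `validate` method and the `FlexibleRowFormat`/`StrictRowFormat` classes are hypothetical illustrations, not an existing Ignite API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical: a binary object is modeled as a Map of field name -> value.
interface RowFormat {
    /** Returns normally if the object may be stored; throws otherwise. */
    void validate(Map<String, Object> fields);
}

/** "Flexible" format: any field may hold any type, as with key-value access today. */
final class FlexibleRowFormat implements RowFormat {
    @Override
    public void validate(Map<String, Object> fields) {
        // Nothing to check: the application decides which types it tolerates.
    }
}

/** "Strict" format: declared columns with fixed types and an optional NOT NULL. */
final class StrictRowFormat implements RowFormat {
    private final Map<String, Class<?>> types = new LinkedHashMap<>();
    private final Map<String, Boolean> notNull = new LinkedHashMap<>();

    StrictRowFormat column(String name, Class<?> type, boolean isNotNull) {
        types.put(name, type);
        notNull.put(name, isNotNull);
        return this;
    }

    @Override
    public void validate(Map<String, Object> fields) {
        // Reject fields that the schema does not declare.
        for (String name : fields.keySet()) {
            if (!types.containsKey(name))
                throw new IllegalArgumentException("Unknown column: " + name);
        }
        // Check NOT NULL and the declared type of every column.
        for (Map.Entry<String, Class<?>> col : types.entrySet()) {
            Object value = fields.get(col.getKey()); // an absent field counts as null
            if (value == null) {
                if (notNull.get(col.getKey()))
                    throw new IllegalArgumentException("NOT NULL violated: " + col.getKey());
            }
            else if (!col.getValue().isInstance(value)) {
                throw new IllegalArgumentException("Column " + col.getKey() + " expects "
                    + col.getValue().getSimpleName() + ", got " + value.getClass().getSimpleName());
            }
        }
    }
}
```

In this sketch an object carrying an `Integer` in a column declared as `Long` passes the flexible format but is rejected by the strict one, which mirrors the "rejected at the storage level" behavior described above while leaving object creation unrestricted.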


On Tue, Nov 20, 2018 at 11:33 AM Sergi Vladykin <[hidden email]>
wrote:

> I really like the Protobuf format. It is probably not what we need for
> O(1) field access, but for compact data representation we can borrow a
> lot from it.
>
> Also, IMO, restricting field type changes is a perfectly sane idea. The
> correct way to evolve a schema in the common case is to add new fields and
> gradually deprecate the old ones. If you can skip default/null fields in
> the binary format, this approach will not introduce any noticeable
> performance/size overhead.
>
> Sergi

Re: [IMPORTANT] Future of Binary Objects

Denis Mekhanikov
In reply to this post by Alexey Zinoviev
People often ask whether they can store data in the same format they use
in their applications. If you use Avro everywhere in your application, why
not store the data in the same format in Ignite?
So how about defining an interface that lists all the operations we need,
and using this interface everywhere without relying on any specific
implementation? *BinaryObject* looks like a suitable interface, but the
only implementation you can get from Ignite is *BinaryObjectImpl*.
I think we should make Ignite extensible and let users plug in their own
data format by implementing the corresponding interfaces.
So, if you like JSONB, Protobuf or anything else, you could enable a module
for that format and use it to store the data.
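One way to picture this proposal: a minimal sketch, assuming a hypothetical `DataFormat` interface that lists the operations Ignite would need (encode an object, read a single field) and a registry through which a format module is enabled. `DataFormat`, `DataFormats` and the toy `KeyValueTextFormat` are illustrative names only; field values are simplified to strings, and a real Avro, Protobuf or JSONB module would implement the same interface with its own encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical SPI: every operation Ignite needs from a data format. */
interface DataFormat {
    String name();
    byte[] encode(Map<String, String> fields);
    String field(byte[] data, String fieldName); // read one field without building an object
}

/** Hypothetical registry through which a format module would be enabled. */
final class DataFormats {
    private static final Map<String, DataFormat> FORMATS = new HashMap<>();

    static void register(DataFormat format) {
        FORMATS.put(format.name(), format);
    }

    static DataFormat get(String name) {
        DataFormat format = FORMATS.get(name);
        if (format == null)
            throw new IllegalArgumentException("No data format module: " + name);
        return format;
    }
}

/** Toy stand-in for a real module: "key=value" lines; assumes no '=' or '\n' in keys or values. */
final class KeyValueTextFormat implements DataFormat {
    @Override
    public String name() {
        return "kv-text";
    }

    @Override
    public byte[] encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet())
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String field(byte[] data, String fieldName) {
        for (String line : new String(data, StandardCharsets.UTF_8).split("\n")) {
            int eq = line.indexOf('=');
            if (eq > 0 && line.substring(0, eq).equals(fieldName))
                return line.substring(eq + 1);
        }
        return null; // field not present
    }
}
```

Callers only depend on `DataFormat`, so swapping the toy text encoding for a real one would not change the code that reads fields.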

Denis

On Wed, Nov 21, 2018 at 10:10 Alexey Zinoviev <[hidden email]> wrote:

> I like @Vyacheslav Daradur's approach.
>
> Maybe somebody could have a look at UnsafeRow in Spark:
>
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
> UnsafeRow is a concrete InternalRow that represents a mutable internal
> raw-memory (and hence unsafe) binary row format.
>
> P.S. If somebody is interested in this approach, I could share more
> information.

Re: [IMPORTANT] Future of Binary Objects

Igor Sapego-2
I want to propose several optimizations:

1. If we store field metadata anyway, and are going to store bitmasks for
null fields, should we also drop the per-field "header" byte that encodes
the field type? We can get the field type info from the metadata.

2. If we have consecutive fixed-length fields, we can avoid storing offsets
to these fields, as the offsets are easy to calculate. We could even store
the precomputed offsets in the metadata to improve performance.

3. If these two optimizations are adopted, it makes sense to state in the
docs that it is highly recommended to write fixed-size fields at the
beginning of the object.
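A minimal sketch of optimizations 2 and 3 under an assumed layout (not Ignite's actual format): `[null bitmap][fixed-length fields in schema order][variable-length part]`. Because field types and sizes come from metadata, a row needs neither a per-field type byte nor stored offsets for the fixed-length fields; each offset is the bitmap size plus the sizes of the preceding fixed fields, computed once per schema. `FixedFieldLayout` is a hypothetical name, and keeping a slot for null fixed fields is a design assumption made so that offsets stay static.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical row layout, assumed for illustration only:
 *   [null bitmap][fixed-length fields in schema order][variable-length part]
 * Field types and sizes live in metadata, so rows carry no per-field type
 * bytes and no stored offsets for fixed-length fields.
 */
final class FixedFieldLayout {
    private final int bitmapBytes;
    private final int[] fixedOffsets; // computed once per schema; could be cached in metadata
    private final int varPartStart;   // first byte after the fixed-length fields

    /**
     * @param fixedSizes  size in bytes of each fixed-length field, e.g. 4 for INT, 8 for BIGINT
     * @param totalFields number of fields (fixed + variable) tracked by the null bitmap
     */
    FixedFieldLayout(int[] fixedSizes, int totalFields) {
        bitmapBytes = (totalFields + 7) / 8;
        fixedOffsets = new int[fixedSizes.length];
        int off = bitmapBytes;
        for (int i = 0; i < fixedSizes.length; i++) {
            fixedOffsets[i] = off; // bitmap size + sum of the preceding fixed sizes
            off += fixedSizes[i];
        }
        varPartStart = off;
    }

    int bitmapBytes() { return bitmapBytes; }

    int offset(int fixedField) { return fixedOffsets[fixedField]; }

    int varPartStart() { return varPartStart; }

    /** Bit i set means field i is null; a null fixed field still keeps its slot. */
    static boolean isNull(ByteBuffer row, int field) {
        return (row.get(field >>> 3) & (1 << (field & 7))) != 0;
    }

    static void setNull(ByteBuffer row, int field) {
        int idx = field >>> 3;
        row.put(idx, (byte) (row.get(idx) | (1 << (field & 7))));
    }
}
```

For a schema `(INT, BIGINT, INT)` followed by two variable-length fields, the bitmap takes 1 byte and the fixed fields start at offsets 1, 5 and 13, so the variable-length part starts at 17. A variable-length field placed before them would make every following offset row-dependent, which is the reason behind item 3.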

Best Regards,
Igor


On Wed, Nov 21, 2018 at 11:37 AM Denis Mekhanikov <[hidden email]>
wrote:


Re: [IMPORTANT] Future of Binary Objects

Vladimir Ozerov
In reply to this post by Denis Mekhanikov
Denis,

Could you please clarify: are you talking about storage, i.e. how objects
are stored in Ignite, or about serialization as a whole? I'd like to better
understand whether the use case you described is relevant to my idea of
separating binary objects from the underlying storage format.
My vision was that we keep using the current BinaryObject protocol (with
whatever optimizations are needed) as the common format for communication
between nodes and as a common serialization protocol. This is very handy
because all participants (Java, C++, .NET, all sorts of thin clients) are
able to work with it. So if I have a "Person" class in Java, I can read it
on any other platform without any additional configuration. But when it
comes to *storage*, we may introduce a pluggable row format interface that
applies any necessary transformations. So if someone wants to store objects
in Avro/Protobuf and is ready to configure and implement it (generate
classes, implement field extraction logic, etc.), they just implement that
interface. The key point is that this implementation will only be needed in
Java, not on each of the dozen platforms we support.
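A minimal sketch of where such a storage-only hook could sit, assuming the wire object is modeled as a `Map` and a storage row as an `Object[]`. `StorageRowFormat` and `SchemaProjectedFormat` are hypothetical names. The wire object stays self-describing (field names travel with it, so any platform can read it), while this Java-only converter keeps the column list once per table and stores values by position.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical server-side hook: only Java nodes implement it; clients never see it. */
interface StorageRowFormat {
    Object[] toRow(Map<String, Object> wireObject);
    Object field(Object[] row, String name);
}

/** Stores values by column position; field names are kept once, not in every row. */
final class SchemaProjectedFormat implements StorageRowFormat {
    private final List<String> columns;

    SchemaProjectedFormat(String... columns) {
        this.columns = List.of(columns); // requires Java 9+
    }

    @Override
    public Object[] toRow(Map<String, Object> wireObject) {
        Object[] row = new Object[columns.size()];
        for (int i = 0; i < row.length; i++)
            row[i] = wireObject.get(columns.get(i)); // a missing field becomes null
        return row;
    }

    @Override
    public Object field(Object[] row, String name) {
        int idx = columns.indexOf(name);
        if (idx < 0)
            throw new IllegalArgumentException("Unknown column: " + name);
        return row[idx];
    }
}
```

Silently ignoring wire fields that are not in the column list is a simplification; a strict format would more likely reject such objects.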

But when it comes to how to store object in a cache

On Wed, Nov 21, 2018 at 11:37 AM Denis Mekhanikov <[hidden email]>
wrote:


Re: [IMPORTANT] Future of Binary Objects

Denis Mekhanikov
Vladimir,

Thank you for the clarification. I didn't see this distinction at first.

I meant using customizable formats for all serialization, not only for
storage. The idea behind my proposal is to avoid data conversion when
loading data into Ignite. It would complicate the use of thin clients,
though, so I'm not sure it would make users happier.

Anyway, the same approach could be applied to storage only.

Denis

On Wed, Nov 21, 2018 at 12:57 Vladimir Ozerov <[hidden email]> wrote:

Re: [IMPORTANT] Future of Binary Objects

Vladimir Ozerov
Denis,

In theory, data conversion could be avoided in certain cases. For example,
consider loading data through a data streamer. We know the cache, and we
know its metadata and row format. So instead of doing "user object" ->
"binary object" -> "row", we can do "user object" -> "row" directly.
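A minimal sketch of the streamer shortcut, assuming the row layout is fixed in advance by cache metadata as [int id][int name length][UTF-8 name bytes]. `Person`, `RowLayout`, and the `Map` standing in for a binary object are hypothetical illustrations, not Ignite APIs. Both paths produce identical bytes. The direct path simply skips allocating and filling the intermediate object.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical user class loaded through the streamer.
final class Person {
    final int id;
    final String name;

    Person(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical row layout taken from cache metadata:
// [int id][int name length][UTF-8 name bytes]
final class RowLayout {
    // Path 1, step 1: user object -> "binary object" (a field map stands in for it).
    static Map<String, Object> toBinary(Person p) {
        Map<String, Object> bo = new LinkedHashMap<>();
        bo.put("id", p.id);
        bo.put("name", p.name);
        return bo;
    }

    // Path 1, step 2: "binary object" -> row, reading the fields back by name.
    static byte[] binaryToRow(Map<String, Object> bo) {
        return encode((Integer) bo.get("id"), (String) bo.get("name"));
    }

    // Path 2: user object -> row directly, with no intermediate object.
    static byte[] toRow(Person p) {
        return encode(p.id, p.name);
    }

    private static byte[] encode(int id, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES * 2 + nameBytes.length);
        buf.putInt(id).putInt(nameBytes.length).put(nameBytes);
        return buf.array();
    }
}

class StreamerDemo {
    public static void main(String[] args) {
        Person p = new Person(7, "Ann");
        byte[] viaBinary = RowLayout.binaryToRow(RowLayout.toBinary(p));
        byte[] direct = RowLayout.toRow(p);
        System.out.println(Arrays.equals(viaBinary, direct)); // prints true
    }
}
```

The saving only exists when the target cache, and therefore its row format, is known before serialization, which is the streamer case described above.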

On Wed, Nov 21, 2018 at 1:31 PM Denis Mekhanikov <[hidden email]>
wrote:

Re: [IMPORTANT] Future of Binary Objects

Pavel Tupitsyn
Vladimir,

IMO the issue is that we allow any type of data in the cache (put a Person,
then put an int into the same cache).
Are we going to address this in 3.0 and enforce key/value types according
to the cache configuration?
This would leave more room for optimizations.
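One way to picture the enforcement asked about here, as a sketch only: `CacheTypeConfig` and `StrictCache` are hypothetical names, and the check runs eagerly on `put`. A cache pinned to `Integer` keys and `String` values accepts a matching entry and rejects an `int` value. This is the Person-then-int case in miniature.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-cache configuration that pins key and value types.
final class CacheTypeConfig {
    final Class<?> keyType;
    final Class<?> valueType;

    CacheTypeConfig(Class<?> keyType, Class<?> valueType) {
        this.keyType = keyType;
        this.valueType = valueType;
    }
}

// Hypothetical cache that rejects entries whose runtime types do not match its configuration.
final class StrictCache {
    private final CacheTypeConfig cfg;
    private final Map<Object, Object> data = new HashMap<>();

    StrictCache(CacheTypeConfig cfg) {
        this.cfg = cfg;
    }

    void put(Object key, Object value) {
        check("Key", key, cfg.keyType);
        check("Value", value, cfg.valueType);
        data.put(key, value);
    }

    Object get(Object key) {
        return data.get(key);
    }

    // isInstance() is false for null, so null keys and values are rejected as well.
    private static void check(String what, Object obj, Class<?> expected) {
        if (!expected.isInstance(obj)) {
            String actual = obj == null ? "null" : obj.getClass().getName();
            throw new IllegalArgumentException(
                what + " type " + actual + " does not match configured type " + expected.getName());
        }
    }
}

class StrictCacheDemo {
    public static void main(String[] args) {
        StrictCache cache = new StrictCache(new CacheTypeConfig(Integer.class, String.class));
        cache.put(1, "Ann");
        try {
            cache.put(2, 42); // an int value in a String-valued cache
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Pinning types per cache would presumably let the storage layer assume a single row layout per cache instead of carrying type information in every entry.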

On Wed, Nov 21, 2018 at 3:14 PM Vladimir Ozerov <[hidden email]>
wrote:

Re: [IMPORTANT] Future of Binary Objects

Vladimir Ozerov
Pavel,

This could be solved with the aforementioned "RowFormat". We will be able to
configure a cache as follows: "this is a cache with strict type checks; the
first type is A, with fields A1, A2, A3, and the second is B, with fields B1,
B2". So it will still be possible to serialize anything into a binary
object, but when it comes to the actual store, an exception will be thrown.

Makes sense?
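A rough sketch of such a strict "RowFormat". It assumes (this is not stated above) that "first" means the key type and "second" means the value type. `BinObj`, `TypeSchema`, and `StrictRowFormat` are hypothetical. `BinObj` is only a stand-in for a binary object: a type name plus named fields. Building a `BinObj` of any shape always succeeds. Validation happens only when `create` turns the key/value pair into a row.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a binary object: a type name plus named fields.
final class BinObj {
    final String typeName;
    final Map<String, ?> fields;

    BinObj(String typeName, Map<String, ?> fields) {
        this.typeName = typeName;
        this.fields = fields;
    }
}

// Hypothetical expected shape of one side (key or value) of a strict cache.
final class TypeSchema {
    final String typeName;
    final List<String> fieldNames;

    TypeSchema(String typeName, List<String> fieldNames) {
        this.typeName = typeName;
        this.fieldNames = fieldNames;
    }

    // Checks the object against the schema and returns its field values in schema order.
    List<Object> extract(BinObj obj) {
        if (!typeName.equals(obj.typeName))
            throw new IllegalArgumentException("Expected type " + typeName + ", got " + obj.typeName);
        if (!obj.fields.keySet().equals(new HashSet<>(fieldNames)))
            throw new IllegalArgumentException("Fields " + obj.fields.keySet()
                + " do not match schema " + fieldNames + " of type " + typeName);
        List<Object> values = new ArrayList<>();
        for (String name : fieldNames)
            values.add(obj.fields.get(name));
        return values;
    }
}

// Hypothetical strict row format: anything can be serialized, but only matching objects are stored.
final class StrictRowFormat {
    private final TypeSchema keySchema;
    private final TypeSchema valueSchema;

    StrictRowFormat(TypeSchema keySchema, TypeSchema valueSchema) {
        this.keySchema = keySchema;
        this.valueSchema = valueSchema;
    }

    // Builds a row as key fields followed by value fields; throws on any schema mismatch.
    List<Object> create(BinObj key, BinObj value) {
        List<Object> row = new ArrayList<>(keySchema.extract(key));
        row.addAll(valueSchema.extract(value));
        return row;
    }
}

class StrictRowFormatDemo {
    public static void main(String[] args) {
        StrictRowFormat fmt = new StrictRowFormat(
            new TypeSchema("A", List.of("A1", "A2", "A3")),
            new TypeSchema("B", List.of("B1", "B2")));

        BinObj a = new BinObj("A", Map.of("A1", 1, "A2", 2, "A3", 3));
        BinObj b = new BinObj("B", Map.of("B1", "x", "B2", "y"));
        System.out.println(fmt.create(a, b)); // prints [1, 2, 3, x, y]

        // Serializing a differently shaped B succeeds; storing it does not.
        BinObj badB = new BinObj("B", Map.of("B1", "x", "B3", "z"));
        try {
            fmt.create(a, badB);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Keeping the check in the row format rather than in serialization matches the split proposed earlier in the thread. The cross-platform binary protocol stays permissive, and strictness lives only in the Java-side storage layer.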

On Wed, Nov 21, 2018 at 3:21 PM Pavel Tupitsyn <[hidden email]> wrote:

> > > > > > > > In this case, the most advantages of Binary Objects like
> access
> > > in
> > > > > > > > O(1) and access without deserialization may be achieved.
> > > > > > > > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov <
> > > > > [hidden email]
> > > > > > >
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi Alexey,
> > > > > > > > >
> > > > > > > > > Binary Objects only.
> > > > > > > > >
> > > > > > > > > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev <
> > > > > > > [hidden email]
> > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Do we discuss here Core features only or the roadmap for
> > all
> > > > > > > > components?
> > > > > > > > > >
> > > > > > > > > > вт, 20 нояб. 2018 г. в 10:05, Vladimir Ozerov <
> > > > > > [hidden email]
> > > > > > > >:
> > > > > > > > > >
> > > > > > > > > > > Igniters,
> > > > > > > > > > >
> > > > > > > > > > > It is very likely that Apache Ignite 3.0 will be
> released
> > > > next
> > > > > > > year.
> > > > > > > > So
> > > > > > > > > > we
> > > > > > > > > > > need to start thinking about major product
> improvements.
> > > I'd
> > > > > like
> > > > > > > to
> > > > > > > > > > start
> > > > > > > > > > > with binary objects.
> > > > > > > > > > >
> > > > > > > > > > > Currently they are one of the main limiting factors for
> > the
> > > > > > > product.
> > > > > > > > They
> > > > > > > > > > > are fat - 30+ bytes overhead on average, high TCO of
> > Apache
> > > > > > Ignite
> > > > > > > > > > > comparing to other vendors. They are slow - not
> suitable
> > > for
> > > > > SQL
> > > > > > at
> > > > > > > > all.
> > > > > > > > > > >
> > > > > > > > > > > I would like to ask all of you who worked with binary
> > > objects
> > > > > to
> > > > > > > > share
> > > > > > > > > > your
> > > > > > > > > > > feedback and ideas, so that we understand how they
> should
> > > > look
> > > > > > like
> > > > > > > > in AI
> > > > > > > > > > > 3.0. This is a brain storm - let's accumulate ideas
> first
> > > and
> > > > > > > > minimize
> > > > > > > > > > > critics. Then we will work on ideas in separate topics.
> > > > > > > > > > >
> > > > > > > > > > > 1) Historical background
> > > > > > > > > > >
> > > > > > > > > > > BO were implemented around 2014 (Apache Ignite 1.5)
> when
> > we
> > > > > > started
> > > > > > > > > > working
> > > > > > > > > > > on .NET and CPP clients. During design we had several
> > ideas
> > > > in
> > > > > > > mind:
> > > > > > > > > > > - ability to read object fields in O(1) without
> > > > deserialization
> > > > > > > > > > > - interoperabillty between Java, .NET and CPP.
> > > > > > > > > > >
> > > > > > > > > > > Since then a number of other concepts were mixed to the
> > > > > cocktail:
> > > > > > > > > > > - Affinity key fields
> > > > > > > > > > > - Strict typing for existing fields (aka metadata)
> > > > > > > > > > > - Binary Object as storage format
> > > > > > > > > > >
> > > > > > > > > > > 2) My proposals
> > > > > > > > > > >
> > > > > > > > > > > 2.1) Introduce "Data Row Format" interface
> > > > > > > > > > > Binary Objects are terrible candidates for storage. Too
> > > fat,
> > > > > too
> > > > > > > > slow.
> > > > > > > > > > > Efficient storage typically has <10 bytes overhead per
> > row
> > > > (no
> > > > > > > > metadata,
> > > > > > > > > > no
> > > > > > > > > > > length, no hash code, etc), allow supper-fast field
> > access,
> > > > > > support
> > > > > > > > > > > different string formats (ASCII, UTF-8, etc), support
> > > > different
> > > > > > > > temporal
> > > > > > > > > > > types (date, time, timestamp, timestamp with timezone,
> > > etc),
> > > > > and
> > > > > > > > store
> > > > > > > > > > > these types as efficiently as possible.
> > > > > > > > > > >
> > > > > > > > > > > What we need is to introduce an interface which will
> > > convert
> > > > a
> > > > > > pair
> > > > > > > > of
> > > > > > > > > > > key-value objects into a row. This row will be used to
> > > store
> > > > > data
> > > > > > > > and to
> > > > > > > > > > > get fields from it. Care about memory consumption, need
> > SQL
> > > > and
> > > > > > > > strict
> > > > > > > > > > > schema - use one format. Need flexibility and prefer
> > > > key-value
> > > > > > > > access -
> > > > > > > > > > use
> > > > > > > > > > > another format which will store binary objects
> unchanged
> > > > > (current
> > > > > > > > > > > behavior).
> > > > > > > > > > >
> > > > > > > > > > > interface DataRowFormat {
> > > > > > > > > > >     DataRow create(Object key, Object value); //
> > primitives
> > > > or
> > > > > > > binary
> > > > > > > > > > > objects
> > > > > > > > > > >     DataRowMetadata metadata();
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > 2.2) Remove affinity field from metadata
> > > > > > > > > > > Affinity rules are governed by cache, not type. We
> should
> > > > > remove
> > > > > > > > > > > "affintiyFieldName" from metadata.
> > > > > > > > > > >
> > > > > > > > > > > 2.3) Remove restrictions on changing field type
> > > > > > > > > > > I do not know why we did that in the first place. This
> > > > > > restriction
> > > > > > > > > > prevents
> > > > > > > > > > > type evolution and confuses users.
> > > > > > > > > > >
> > > > > > > > > > > 2.4) Use bitmaps for "null" and default values and for
> > > > > > fixed-length
> > > > > > > > > > fields,
> > > > > > > > > > > put fixed-length fields before variable-length.
> > > > > > > > > > > Motivation: to save space.
> > > > > > > > > > >
> > > > > > > > > > > What else? Please share your ideas.
> > > > > > > > > > >
> > > > > > > > > > > Vladimir.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards, Vyacheslav D.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [IMPORTANT] Future of Binary Objects

Pavel Tupitsyn
Makes sense.

I'm trying to grasp this from a usability POV.
Having two ways of storing data with different behavior can be confusing.
In .NET we already have this issue with DateTime: if you want SQL, you
get subtly different behavior.

So IMO we should enable strict type checks for all caches, even non-SQL
ones.
Users will still be able to evolve types by adding/removing fields, but at
least the type id will be fixed.
And for SQL caches you'll get a clear exception like "Field does not exist
in SQL schema: foobar".
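A minimal Java sketch of such a strict per-cache check, assuming a hypothetical `StrictCacheSchema` that maps each allowed type name to its declared fields. Nothing like this class exists in Ignite's API. The types A and B with fields A1..B2 are taken from Vladimir's example quoted below. The point is that unknown types and unknown fields are rejected before anything reaches storage:

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical per-cache schema: which types a cache accepts and which fields each may carry. */
public class StrictCacheSchema {
    private final Map<String, Set<String>> fieldsByType;

    public StrictCacheSchema(Map<String, Set<String>> fieldsByType) {
        this.fieldsByType = fieldsByType;
    }

    /** Rejects unknown types and unknown fields; a subset of declared fields is accepted. */
    public void validate(String typeName, Map<String, Object> fields) {
        Set<String> allowed = fieldsByType.get(typeName);
        if (allowed == null)
            throw new IllegalArgumentException("Type is not allowed in this cache: " + typeName);
        for (String field : fields.keySet()) {
            if (!allowed.contains(field))
                throw new IllegalArgumentException("Field does not exist in SQL schema: " + field);
        }
    }

    public static void main(String[] args) {
        StrictCacheSchema schema = new StrictCacheSchema(Map.of(
            "A", Set.of("A1", "A2", "A3"),
            "B", Set.of("B1", "B2")));

        schema.validate("A", Map.of("A1", 1, "A2", "x")); // accepted
        try {
            schema.validate("A", Map.of("foobar", 42));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Field does not exist in SQL schema: foobar
        }
    }
}
```

Type evolution still works in this sketch by adding a field name to the declared set, while the set of types a cache accepts stays fixed.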

On Wed, Nov 21, 2018 at 4:19 PM Vladimir Ozerov <[hidden email]>
wrote:

> Pavel,
>
> This could be solved with the aforementioned "RowFormat". We will be able
> to configure a cache as follows: "this is a cache with strict type checks;
> the first type is A, with fields A1, A2, A3; the second is B, with fields
> B1, B2". So it will be possible to serialize anything into a binary object,
> but when it comes to the actual store, an exception will be thrown.
>
> Makes sense?

Re: [IMPORTANT] Future of Binary Objects

sdarlington
In reply to this post by Vladimir Ozerov
Possibly heading into wishlist rather than practical territory here, but you did ask...

> What we need is to introduce an interface which will convert a pair of
> key-value objects into a row. This row will be used to store data and to
> get fields from it.

Rather than mapping objects to a row, how about mapping to a more general “internal storage” interface? Assuming that all the data for a row is stored together makes it difficult to implement any optimisations that span multiple rows. Think of a string state field where there are only five known values… we currently repeat the text over and over. Or consider a full column-store backend.
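To make the cross-row idea concrete, here is a minimal Java sketch of dictionary encoding for a low-cardinality string column. `DictionaryColumn` is a hypothetical name, and the 1-byte code (at most 256 distinct values) is an assumption for illustration. Each distinct string is stored once per column, and every row keeps only a small code, which a strictly per-row format cannot do:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical column-level dictionary: each distinct string is stored once, rows keep a 1-byte code. */
public class DictionaryColumn {
    private final Map<String, Integer> codes = new HashMap<>();
    private final List<String> dictionary = new ArrayList<>();
    // Boxed for brevity; a real store would use a packed byte array.
    private final List<Byte> rows = new ArrayList<>();

    public void append(String value) {
        Integer code = codes.get(value);
        if (code == null) {
            if (dictionary.size() == 256)
                throw new IllegalStateException("Too many distinct values for a 1-byte code");
            code = dictionary.size();
            dictionary.add(value);
            codes.put(value, code);
        }
        rows.add(code.byteValue()); // codes 128..255 wrap to negative bytes; masked on read
    }

    public String get(int row) {
        return dictionary.get(rows.get(row) & 0xFF);
    }

    public int distinctValues() { return dictionary.size(); }

    public int rowCount() { return rows.size(); }
}
```

Appending "NEW", "ACTIVE", "ACTIVE", "CLOSED", "ACTIVE" stores three strings and five one-byte codes. The dictionary belongs to the column, not to any single row, which is why a row-at-a-time interface makes this hard.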

Regards,
Stephen



Re: [IMPORTANT] Future of Binary Objects

Ilya Kasnacheev
In reply to this post by Vladimir Ozerov
Hello!

I would like to propose the following changes:

- Let's allow multiple BinaryTypes per Class. Make typeId = cksum(list of
class types + fields) as opposed to cksum(class name) as we have it
currently. Note that we only have to compute that once per class loaded in
the JVM.
- BinaryType has a list of fixed-length fields (numbers, datetimes, flags)
and a list of variable-length fields. We can put all fixed-length fields at
the start of the BinaryObject so that we can access them by an offset
determined by the typeId.
- Likewise, we don't need to encode field ids in the BinaryObject anymore,
saving 4 bytes per field. We already know their order from the BinaryType.
- This means that when you ALTER TABLE, we add a BinaryType to the existing
Class (or pseudo-Class type name), use it for new data, and eventually
update existing data to have this field.
- On top of BinaryTypes we can run checks against the SQL table's column
list to detect any mismatches.
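A minimal Java sketch of a schema-derived typeId, assuming CRC32 as a stand-in for "cksum" and a hypothetical `Name|field:TYPE|...` canonical form. Neither choice comes from Ignite. Because the field list and field types are part of the hashed input, adding a field (as ALTER TABLE would) yields a new typeId for the same class name:

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.CRC32;

/** Hypothetical schema-derived type id: changes whenever the field list or a field type changes. */
public final class SchemaTypeId {
    private SchemaTypeId() {}

    /** Fields must be in layout order; the order is part of the layout, so it is part of the id. */
    public static int typeId(String typeName, LinkedHashMap<String, String> fields) {
        StringBuilder sb = new StringBuilder(typeName);
        for (Map.Entry<String, String> f : fields.entrySet())
            sb.append('|').append(f.getKey()).append(':').append(f.getValue());
        CRC32 crc = new CRC32();
        crc.update(sb.toString().getBytes(StandardCharsets.UTF_8));
        return (int) crc.getValue(); // keep the low 32 bits as a signed int
    }
}
```

As the proposal notes, this would be computed once per loaded class (per schema version) and cached, so the hashing cost does not matter at runtime.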

To illustrate, previously we had it like:
[ Type id | String field id | String field value | Long field id | Long field value | Datetime field id | Datetime field value ]
But now it will be:
[ Type id | Long field value | Datetime field value | String field value ]
            ^------------------^---- can be accessed by offset
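A minimal Java sketch of the second layout, assuming a 4-byte type id, a long field, a datetime stored as epoch milliseconds in a long, and a single length-prefixed UTF-8 string. The class name and offsets are hypothetical. The fixed-length fields are read with absolute `ByteBuffer` reads at offsets that follow from the type alone, with no field ids to scan:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical layout: [typeId:int][long:long][datetimeMillis:long][strLen:int][str bytes]. */
public final class FixedOffsetRow {
    static final int TYPE_ID_OFF = 0;
    static final int LONG_OFF = 4;       // right after the 4-byte type id
    static final int DATETIME_OFF = 12;  // LONG_OFF + 8
    static final int VARLEN_OFF = 20;    // first byte after the fixed-length section

    public static byte[] write(int typeId, long longVal, long datetimeMillis, String str) {
        byte[] s = str.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(VARLEN_OFF + 4 + s.length);
        buf.putInt(typeId).putLong(longVal).putLong(datetimeMillis).putInt(s.length).put(s);
        return buf.array();
    }

    public static int readTypeId(byte[] row) {
        return ByteBuffer.wrap(row).getInt(TYPE_ID_OFF);
    }

    public static long readLong(byte[] row) {
        return ByteBuffer.wrap(row).getLong(LONG_OFF); // absolute read by known offset
    }

    public static long readDatetime(byte[] row) {
        return ByteBuffer.wrap(row).getLong(DATETIME_OFF);
    }

    public static String readString(byte[] row) {
        int len = ByteBuffer.wrap(row).getInt(VARLEN_OFF);
        return new String(row, VARLEN_OFF + 4, len, StandardCharsets.UTF_8);
    }
}
```

With only one variable-length field, its position is also known. Once there are several, some offset table is needed, which is exactly the header-vs-footer question raised in this thread.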

Regards,
Ilya.

--
Ilya Kasnacheev



Re: [IMPORTANT] Future of Binary Objects

Andrew Mashenkov
Hi,

Vladimir, Ilya,

What about variable-length fields? Do you suggest storing their offsets in
a footer or in a header?

For large objects, a header allows us to retrieve a field faster and to
detect null immediately, but we have to reserve space for the offsets of
all var-len fields and update the header after serialization.
However, a footer looks more compact (we can omit nulls) and allows us to
use a streaming approach during serialization.
Am I missing something?
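A minimal Java sketch of the footer variant, assuming a hypothetical layout `[field0][field1]...[offset0:int][offset1:int]...[count:int]`. Values are streamed out first and each start offset is recorded as it is written, so nothing has to be reserved or patched. Null handling (for example, a presence bitmap) is deliberately left out of this sketch:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical footer layout: [field bytes...][offset0:int]...[offsetN-1:int][count:int]. */
public final class FooterVarLen {
    public static byte[] write(List<String> values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Integer> offsets = new ArrayList<>();
        for (String v : values) {
            offsets.add(out.size()); // start of this field, known only while streaming
            byte[] b = v.getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
        }
        ByteBuffer footer = ByteBuffer.allocate(4 * offsets.size() + 4);
        for (int off : offsets)
            footer.putInt(off);
        footer.putInt(offsets.size()); // field count is always the last 4 bytes
        out.write(footer.array(), 0, footer.capacity());
        return out.toByteArray();
    }

    public static String read(byte[] row, int idx) {
        ByteBuffer buf = ByteBuffer.wrap(row);
        int count = buf.getInt(row.length - 4);
        int footerStart = row.length - 4 - 4 * count;
        int start = buf.getInt(footerStart + 4 * idx);
        // A field ends where the next one starts; the last field ends at the footer.
        int end = idx + 1 < count ? buf.getInt(footerStart + 4 * (idx + 1)) : footerStart;
        return new String(row, start, end - start, StandardCharsets.UTF_8);
    }
}
```

The trade-off matches the one described above: the writer never goes back, but a reader must first locate the footer from the end of the object before it can reach any variable-length field.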




--
Best regards,
Andrey V. Mashenkov