What is the purpose of a binary schema and schema registry?

John Wilson
Hi,

When objects are marshaled, Ignite adds a schema (BinarySchema) to the
BinarySchemaRegistry. Moreover, the documentation says that an object can
have a few different schemas.

My questions:

   1. What does it mean for an object to have multiple schemas? (e.g. for a
   simple person object Person obj = new Person())
   2. What is the purpose of the binary schema registry?

Thanks,

Re: What is the purpose of a binary schema and schema registry?

Pavel Tupitsyn
Hi John,

BinarySchema is an optimization to make serialized objects more compact.
A schema is basically an int[] containing field ids. Schemas are stored by
id in the cluster.
Serialized objects themselves contain only a set of field offsets
(basically an int[], short[] or byte[]).

So to get a value of a field with a given name:
1) Retrieve schema by id from cluster
2) Convert field name to field id (hash function)
3) Look up the index (idx) of that field id in the schema
4) Get the field position in the serialized object: fieldOffsets[idx]
5) Deserialize the data from that position
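
The lookup steps above can be sketched in plain Java. This is only a toy model, not Ignite's actual code: the REGISTRY map, the fieldId hash (lower-cased name hashCode), the schema id and the offsets are all assumptions made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class SchemaLookupSketch {
    // Hypothetical stand-in for the cluster-wide registry: schema id -> ordered field ids.
    static final Map<Integer, int[]> REGISTRY = new HashMap<>();

    // Assumed id mapping for this sketch: hash of the lower-cased field name.
    static int fieldId(String name) {
        return name.toLowerCase().hashCode();
    }

    // Returns the field's position inside the serialized object, or -1 if the schema lacks it.
    static int fieldPosition(int schemaId, int[] fieldOffsets, String fieldName) {
        int[] schema = REGISTRY.get(schemaId);            // 1) retrieve schema by id
        int id = fieldId(fieldName);                      // 2) field name -> field id
        for (int idx = 0; idx < schema.length; idx++) {   // 3) find the index of that id
            if (schema[idx] == id)
                return fieldOffsets[idx];                 // 4) offset stored in the object
        }
        return -1;
    }

    public static void main(String[] args) {
        int schemaId = 1; // arbitrary id for the demo
        REGISTRY.put(schemaId, new int[] {fieldId("firstName"), fieldId("lastName")});

        int[] fieldOffsets = {24, 31}; // made-up offsets carried by one serialized object

        // 5) a real reader would deserialize the value starting at this position
        System.out.println(fieldPosition(schemaId, fieldOffsets, "lastName"));
        System.out.println(fieldPosition(schemaId, fieldOffsets, "middleName"));
    }
}
```

Note that the object itself carries only the offsets array; the field ids live once in the registry.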

In previous versions of the binary protocol we had the schema embedded in
the objects (e.g. each object contained a map from field id to offset).

But most objects always have their fields serialized in the same order, so
it makes sense to avoid duplicating the same data in every object.

To answer your questions:
1) Multiple schemas occur when an object is serialized in multiple different
   ways. For example, a Binarylizable.writeBinary implementation may have
   some conditional logic that produces a different set of fields in
   different cases.
2) The binary schema registry helps reduce serialized object size by storing
   the common information in one place.
Pavel

On Mon, Feb 5, 2018 at 10:44 PM, John Wilson <[hidden email]>
wrote:

> [...]

Re: What is the purpose of a binary schema and schema registry?

Valentin Kulichenko
Hi John,

There are multiple ways to get several schemas for the same type. As Pavel
mentioned, one of the examples is when Binarylizable generates different
sets of fields under different circumstances.

However, a more common use case is for two client nodes to have different
versions of the same class. For example, you start with a Person class that
has firstName and lastName, and then one of the clients adds a middleName
field. To support this in Ignite, you don't have to restart the whole
cluster or update all the applications in one go. Other clients will
continue working with the older version for as long as needed, and this
happens transparently. The only requirement is that the class is not
deployed on the server nodes' classpath.

So from the user's perspective, it's basically the ability to change the
schema dynamically without restarting the cluster. And BinarySchemaRegistry
is one of the implementation pieces that makes this possible.
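
A rough sketch of the two-versions scenario, again as a toy model rather than Ignite's implementation: BinaryPerson, write and REGISTRY are hypothetical names, and the schema here holds field names instead of field ids purely for readability.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class SchemaEvolutionSketch {
    // Hypothetical cluster-wide registry: schema id -> ordered field names.
    static final Map<Integer, String[]> REGISTRY = new HashMap<>();

    // Toy "binary object": a schema id plus the values in schema order.
    static class BinaryPerson {
        final int schemaId;
        final String[] values;

        BinaryPerson(int schemaId, String[] values) {
            this.schemaId = schemaId;
            this.values = values;
        }

        // Returns null when this object's schema has no such field.
        String field(String name) {
            String[] schema = REGISTRY.get(schemaId);
            int idx = Arrays.asList(schema).indexOf(name);
            return idx < 0 ? null : values[idx];
        }
    }

    // What a client does on write in this model: register its field layout
    // if it is new, then tag the object with the layout's id.
    static BinaryPerson write(String[] fieldNames, String... values) {
        int schemaId = Arrays.hashCode(fieldNames); // assumption: id derived from the layout
        REGISTRY.putIfAbsent(schemaId, fieldNames);
        return new BinaryPerson(schemaId, values);
    }

    public static void main(String[] args) {
        String[] v1 = {"firstName", "lastName"};               // old client's Person
        String[] v2 = {"firstName", "middleName", "lastName"}; // updated client's Person

        BinaryPerson old = write(v1, "Ann", "Lee");
        BinaryPerson upd = write(v2, "Bob", "J", "Ray");

        System.out.println(old.field("lastName"));
        System.out.println(old.field("middleName")); // field absent from the old layout
        System.out.println(upd.field("middleName"));
        System.out.println(REGISTRY.size());         // both layouts coexist
    }
}
```

Objects written by either client stay readable because each one points at the layout it was written with, so nothing has to be restarted or rewritten when the new layout appears.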

-Val

On Tue, Feb 6, 2018 at 12:28 AM, Pavel Tupitsyn <[hidden email]>
wrote:

> [...]