Apache Ignite Developers - Legacy Mail Archive

Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

Andrey Kuznetsov

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

We discussed this with Pavel and Anton just a moment ago. Summary follows.

- A new byte "flag" is to be added (ENCODED_STRING)
- An 'Encoding' property is to be added at
-- global level (BinaryConfiguration)
-- per-class level (BinaryTypeConfiguration)
-- per-field level (BinaryTypeConfiguration)
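
To make the flag idea concrete, here is a minimal sketch of what a self-describing encoded string could look like on the wire. Everything in it is an assumption for illustration: the flag value, the one-byte charset codes, and the layout [flag][charset code][length][payload] are hypothetical and are not taken from the Ignite code base.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedStringSketch {
    // Hypothetical type flag; the real value would be chosen in the marshaller.
    static final byte ENCODED_STRING = 0x7F;

    // Hypothetical one-byte charset codes.
    static final byte CS_UTF_8 = 0;
    static final byte CS_WINDOWS_1251 = 1;

    static Charset charsetFor(byte code) {
        switch (code) {
            case CS_UTF_8:
                return StandardCharsets.UTF_8;
            case CS_WINDOWS_1251:
                // Assumes the JVM ships this charset (common JDKs do).
                return Charset.forName("windows-1251");
            default:
                throw new IllegalArgumentException("Unknown charset code: " + code);
        }
    }

    /** Assumed layout: [flag:1][charset code:1][payload length:4][payload]. */
    static byte[] write(String s, byte charsetCode) {
        byte[] payload = s.getBytes(charsetFor(charsetCode));
        ByteBuffer buf = ByteBuffer.allocate(1 + 1 + 4 + payload.length);
        buf.put(ENCODED_STRING).put(charsetCode).putInt(payload.length).put(payload);
        return buf.array();
    }

    /** Reads the string back using only the information stored in the bytes. */
    static String read(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        if (buf.get() != ENCODED_STRING)
            throw new IllegalArgumentException("Not an ENCODED_STRING");
        Charset cs = charsetFor(buf.get());
        byte[] payload = new byte[buf.getInt()];
        buf.get(payload);
        return new String(payload, cs);
    }

    public static void main(String[] args) {
        // "Privet" in Cyrillic, written with escapes to avoid source-encoding issues.
        String s = "\u041f\u0440\u0438\u0432\u0435\u0442";
        byte[] bytes = write(s, CS_WINDOWS_1251);
        System.out.println(bytes.length + " bytes, round trip ok: " + s.equals(read(bytes)));
    }
}
```

Because the charset code travels with the value, the reader needs no cache or type configuration to decode it.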

2017-07-28 14:15 GMT+03:00 Vladimir Ozerov [via Apache Ignite Developers] <[hidden email]>:

As Pavel mentioned, Marshaller should not be tied to cache, BinaryObject
should be self-explanatory, i.e. containing all information necessary for
unmarshalling. This is an absolute requirement.

We will have one extra byte in serialized form, meaning that the advantage
of custom encoding will become evident for all strings with length >= 1,
which is perfectly fine. I do not quite understand what we are arguing
about.
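
The size trade-off can be checked with a few lines of plain Java. This is only a sketch: it assumes the extra byte is a charset marker added on top of the payload, and it uses windows-1251 as the example charset. For Cyrillic text, where UTF-8 needs two bytes per character, the two forms tie at one character and the single-byte charset is smaller from two characters onward; for characters that need three bytes in UTF-8 the saving would start at one character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingOverheadSketch {
    /** Payload size when the string is stored as UTF-8 with no extra marker. */
    static int utf8Size(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Payload size in a custom charset plus the one extra byte that identifies it. */
    static int customSize(String s, Charset cs) {
        return 1 + s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // Assumes the JVM ships windows-1251 (common JDKs do).
        Charset cp1251 = Charset.forName("windows-1251");
        // "Andrey" in Cyrillic, written with escapes to avoid source-encoding issues.
        String name = "\u0410\u043d\u0434\u0440\u0435\u0439";
        for (int len = 1; len <= name.length(); len++) {
            String prefix = name.substring(0, len);
            System.out.println(len + " chars: UTF-8=" + utf8Size(prefix)
                + ", windows-1251+1=" + customSize(prefix, cp1251));
        }
    }
}
```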

As for configuration, we can do it as follows:

1) Add global encoding, UTF8 by default.
2) Add per-cache encoding.
3) Add encoding to JDBC and ODBC driver properties.

This should be enough.

Best regards,

Andrey Kuznetsov.

Vladimir Ozerov

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

Encoding *must not* be added at the per-class or per-field level; this is wrong.

It should be added at the per-cache level, and at the per-cache-column level in
the future.

Pavel Tupitsyn

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

> As Pavel mentioned, Marshaller should not be tied to cache
> should be added to per-cache level
Not sure if I follow.
Marshalling and caching are two separate mechanisms.
Defining the binary format in CacheConfiguration violates separation of
concerns.

> Encoding *must not* be added to per-class or per-field level, this is wrong
What is wrong with this? BinaryTypeConfiguration looks like the right place
for such a setting.
Are we talking from the SQL standpoint here, so that you want this to be
defined somehow via DDL in the future?

Vladimir Ozerov

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

String encoding is a concept similar to "collation" in an RDBMS. You can
define it either globally or on a per-table basis. The same should be done
for Ignite. We do not define the behavior of a type. We define the behavior
of a *storage*.

Two cases where the proposed per-type and per-type-field approach
doesn't work:
1) I have a class Person with a field "name". I have two caches/tables: one
for US persons, where names are in Latin, and another for RU persons with
Cyrillic names. How can I achieve optimal encoding formats for both tables?
2) I have an empty grid. Now I want to create a cache/table with a custom
encoding. How can I do that without a cluster restart? There is no way, because
BinaryTypeConfiguration is configured statically, while caches/tables can be
created at runtime.

Andrey Kuznetsov

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

Currently, the marshaller determines the type of a field (BYTE, INT, STRING, etc.)
only by the Class of the data being serialized. It seems rather non-trivial to
manage marshalling parameters at the point of cache creation. Alternatively, there
is a simple and flexible way: introduce a new Java type, say,
StringWithEncoding, but that looks ugly to me.
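
For reference, the wrapper-type alternative could look roughly like the sketch below. StringWithEncoding is only the name floated above; its fields and the marshalField helper are hypothetical, and the dispatch on the runtime class merely imitates a type-driven marshaller rather than reproducing Ignite's.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

class StringWithEncodingSketch {
    /** Hypothetical wrapper: a string plus the charset it should be serialized with. */
    static final class StringWithEncoding {
        final String value;
        final Charset charset;

        StringWithEncoding(String value, Charset charset) {
            this.value = Objects.requireNonNull(value);
            this.charset = Objects.requireNonNull(charset);
        }
    }

    /** Picks the serialized form purely from the runtime class of the field value. */
    static byte[] marshalField(Object field) {
        if (field instanceof StringWithEncoding) {
            StringWithEncoding s = (StringWithEncoding) field;
            return s.value.getBytes(s.charset);
        }
        if (field instanceof String)
            return ((String) field).getBytes(StandardCharsets.UTF_8);
        throw new IllegalArgumentException("Unsupported field type: " + field.getClass());
    }

    public static void main(String[] args) {
        // Three Cyrillic letters, written with escapes to avoid source-encoding issues.
        String name = "\u0418\u043c\u044f";
        // Assumes the JVM ships windows-1251 (common JDKs do).
        Charset cp1251 = Charset.forName("windows-1251");
        System.out.println("plain String: " + marshalField(name).length + " bytes");
        System.out.println("wrapped:      "
            + marshalField(new StringWithEncoding(name, cp1251)).length + " bytes");
    }
}
```

The cost is visible in the usage: every model class would have to declare the wrapper instead of String, which is presumably why it looks ugly.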

Artem Schitow

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

In reply to this post by Vladimir Ozerov

> String encoding is a concept similar to "collation" in RDBMS. You can
> define it either globally, or on per-table basis.

Or on a per-column (per-field) basis. Though Oracle does not have a per-column charset, some other databases provide this option.

MySQL:
- https://dev.mysql.com/doc/refman/5.7/en/create-table.html

    | CHAR[(length)] [BINARY]
        [CHARACTER SET charset_name] [COLLATE collation_name]
    | VARCHAR(length) [BINARY]
        [CHARACTER SET charset_name] [COLLATE collation_name]
    | TEXT [BINARY]
        [CHARACTER SET charset_name] [COLLATE collation_name]

SQL Server:
- https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql

    <column_definition> ::=
    column_name <data_type>
        [ FILESTREAM ]
        [ COLLATE collation_name ]

Postgres:
- https://www.postgresql.org/docs/9.6/static/sql-createtable.html

    CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name
    ( [
      {
        column_name data_type [ COLLATE collation ]

> 1) I have a class Person with a field "name". I have two caches/tables: one
> for US persons, where names are in Latin, and another for RU persons with
> Cyrillic names. How can I achieve optimal encoding formats for both tables?

You have to have two classes in this case, maybe with a common parent. Or you have to select a common denominator and settle on one encoding for both of them, like Java did with UTF-16 for java.lang.String.

—
Artem Schitow
[hidden email]

Vladimir Ozerov

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

Managing encoding at the per-cache level is not that complex.
Essentially, when any cache message is prepared on the initiating node, we
perform an Object -> BinaryObject transition. These places have a reference to
the cache context ([1], [2]). This is where we should define the proper string
encoding: either take the global encoding or the cache-specific encoding.

As for per-column encoding, let's put this fine-grained case aside for a
while. It is not as widely used as the global or per-cache/per-table scenario.

[1] org.apache.ignite.internal.processors.cache.GridCacheContext#toCacheKeyObject(java.lang.Object)
[2] org.apache.ignite.internal.processors.cache.GridCacheContext#toCacheObject
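
The lookup described above (cache-specific encoding if present, otherwise the global one) can be sketched in isolation. The class and method names below are hypothetical and stand in for whatever the cache context would expose; nothing here is Ignite API. The map is mutable on purpose, to reflect that caches can be created at runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class CacheEncodingResolverSketch {
    private final Charset globalEncoding;
    private final Map<String, Charset> perCache = new ConcurrentHashMap<>();

    CacheEncodingResolverSketch(Charset globalEncoding) {
        this.globalEncoding = globalEncoding;
    }

    /** Registers a cache-specific encoding; may be called when a cache is created at runtime. */
    void registerCache(String cacheName, Charset encoding) {
        perCache.put(cacheName, encoding);
    }

    /** Returns the cache-specific encoding if configured, otherwise the global one. */
    Charset resolve(String cacheName) {
        return perCache.getOrDefault(cacheName, globalEncoding);
    }

    public static void main(String[] args) {
        CacheEncodingResolverSketch resolver =
            new CacheEncodingResolverSketch(StandardCharsets.UTF_8);
        // Assumes the JVM ships windows-1251 (common JDKs do).
        resolver.registerCache("ru-persons", Charset.forName("windows-1251"));

        System.out.println("us-persons -> " + resolver.resolve("us-persons"));
        System.out.println("ru-persons -> " + resolver.resolve("ru-persons"));
    }
}
```

With this shape the same Person class can be written with different encodings depending on the cache it is put into, which is the US/RU case raised earlier in the thread.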

Pavel Tupitsyn

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

Vladimir, what about binary mode (IgniteCache.withKeepBinary)?
Two caches may have different encoding settings:

BinaryObject obj = cache1.get(key); // Got fields in UTF-8
cache2.put(key, obj); // Fields are expected to be in Windows-1251

What do we do here? Re-build the binary object?

Also, what about BinaryRawWriter - do we need encoding support there?
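
To show what "re-build" would mean for a single string field, here is a minimal transcoding sketch in plain Java. It is an assumption about the shape of the work, not a proposal for the actual implementation: the helper name is hypothetical and it operates on a bare payload rather than on a BinaryObject.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TranscodeSketch {
    /** Decodes the payload with the source charset and re-encodes it with the target one. */
    static byte[] transcode(byte[] payload, Charset from, Charset to) {
        if (from.equals(to))
            return payload; // Same encoding on both sides, nothing to rebuild.
        return new String(payload, from).getBytes(to);
    }

    public static void main(String[] args) {
        // Assumes the JVM ships windows-1251 (common JDKs do).
        Charset cp1251 = Charset.forName("windows-1251");
        // Three Cyrillic letters, written with escapes to avoid source-encoding issues.
        String name = "\u0418\u043c\u044f";

        byte[] fromCache1 = name.getBytes(StandardCharsets.UTF_8);
        byte[] forCache2 = transcode(fromCache1, StandardCharsets.UTF_8, cp1251);

        System.out.println("UTF-8 payload: " + fromCache1.length
            + " bytes, windows-1251 payload: " + forCache2.length + " bytes");
    }
}
```

Every string field of the object would need this decode and re-encode pass, so the put is no longer a plain byte copy whenever the two caches disagree on encoding.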

Pavel
