Apache Ignite Developers - Legacy Mail Archive

C++ marshalling.

Vladimir Ozerov

C++ marshalling.

Igniters,

C++ doesn't have reflection/introspection. For this reason we have to map
user structs/classes to their marshal/unmarshal handlers (functions)
somehow.

Various approaches for this are available:
1) A predefined map [type ID -> marshal/unmarshal functions] which is
configured at runtime before the Grid is started.
2) Provide serializers at runtime. E.g. the following will set specific
serializers on a cache projection:
ICache* cache = grid.cache(KeySerializer* k, ValSerializer* v).

I think we should start with p.1 as it is flexible and will not require
users to change their existing types. The drawback is that the user will have
to write marshalling logic by hand. But we can introduce some
code-generation facility later (e.g. like GigaSpaces does).
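
A minimal sketch of what p.1 could look like, assuming a plain byte buffer and type-erased handlers. Every name in it (Buffer, Handlers, MarshallerRegistry, RegisterPoint, RoundTrip, kPointTypeId) is hypothetical and is not an existing Ignite API; it only illustrates a map from type ID to hand-written marshal/unmarshal functions that leaves the user type untouched.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// All names here are hypothetical; this is not an existing Ignite API.
using Buffer = std::vector<std::uint8_t>;

// A pair of handlers for one user type. Objects are passed as void*
// because the registry itself knows nothing about the concrete type.
struct Handlers {
    std::function<void(const void* obj, Buffer& out)> marshal;
    std::function<void(const Buffer& in, void* obj)> unmarshal;
};

// Map [type ID -> handlers], filled once before the grid is started.
class MarshallerRegistry {
public:
    void Register(std::int32_t typeId, Handlers handlers) {
        handlers_[typeId] = std::move(handlers);
    }

    const Handlers& Find(std::int32_t typeId) const {
        auto it = handlers_.find(typeId);
        if (it == handlers_.end())
            throw std::runtime_error("no handlers registered for type ID");
        return it->second;
    }

private:
    std::map<std::int32_t, Handlers> handlers_;
};

// An existing user type; it is not modified in any way.
struct Point {
    std::int32_t x;
    std::int32_t y;
};

const std::int32_t kPointTypeId = 1;

// Hand-written marshalling logic (host byte order, for brevity only).
void RegisterPoint(MarshallerRegistry& registry) {
    Handlers h;
    h.marshal = [](const void* obj, Buffer& out) {
        const Point* p = static_cast<const Point*>(obj);
        out.resize(2 * sizeof(std::int32_t));
        std::memcpy(out.data(), &p->x, sizeof(p->x));
        std::memcpy(out.data() + sizeof(p->x), &p->y, sizeof(p->y));
    };
    h.unmarshal = [](const Buffer& in, void* obj) {
        if (in.size() != 2 * sizeof(std::int32_t))
            throw std::runtime_error("unexpected buffer size");
        Point* p = static_cast<Point*>(obj);
        std::memcpy(&p->x, in.data(), sizeof(p->x));
        std::memcpy(&p->y, in.data() + sizeof(p->x), sizeof(p->y));
    };
    registry.Register(kPointTypeId, std::move(h));
}

// Marshals a Point and reads it back, looking handlers up by type ID.
Point RoundTrip(const MarshallerRegistry& registry, const Point& src) {
    const Handlers& h = registry.Find(kPointTypeId);
    Buffer bytes;
    h.marshal(&src, bytes);
    Point dst{0, 0};
    h.unmarshal(bytes, &dst);
    return dst;
}
```

Calling RegisterPoint(registry) once at startup and then RoundTrip(registry, point) exercises both handlers. The void* casts are the price of keeping the registry type-agnostic; a real design would likely hide them behind a typed wrapper.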

Thoughts and ideas are welcome.

Vladimir.

Denis Magda

Re: C++ marshalling.

Yep, it's not so easy to marshal/unmarshal data in C++.

Take a look at these slides from NVIDIA:
http://on-demand.gputechconf.com/gtc/2012/presentations/S0377-C++-Data-Marshalling-Best-Practices.pdf

The slides are quite high-level, but they may point to a solution
for p.2.

--
Denis

On 5/26/2015 12:09 PM, Vladimir Ozerov wrote:

> Igniters,
>
> C++ doesn't have reflection/introspection. For this reason we have to map
> user structs/classes to their marshal/unmarshal handlers (functions)
> somehow.
>
> Various approaches for this are available:
> 1) A predefined map [type ID -> marshal/unmarshal functions] which is
> configured at runtime before the Grid is started.
> 2) Provide serializers at runtime. E.g. the following will set specific
> serializers on a cache projection:
> ICache* cache = grid.cache(KeySerializer* k, ValSerializer* v).
>
> I think we should start with p.1 as it is flexible and will not require
> users to change their existing types. The drawback is that the user will have
> to write marshalling logic by hand. But we can introduce some
> code-generation facility later (e.g. like GigaSpaces does).
>
> Thoughts and ideas are welcome.
>
> Vladimir.
>

Vladimir Ozerov

Re: C++ marshalling.

SFINAE could be a way to perform compile-time introspection:
http://en.wikipedia.org/wiki/Substitution_failure_is_not_an_error
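
As an illustration of the SFINAE idea, here is a minimal C++11 sketch that detects at compile time whether a type offers a Write(Writer&) member and dispatches on the result. Writer, HasWrite, TryWrite, Person and Opaque are all hypothetical names invented for this example; nothing here is an existing Ignite API.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical writer type, used only for this illustration.
struct Writer {
    int fieldsWritten = 0;
};

// Primary template: by default a type is assumed to have no Write method.
template <typename T, typename = void>
struct HasWrite : std::false_type {};

// This specialization is viable only when `t.Write(writer)` is a valid
// expression for a const T. If it is not, substitution fails silently
// and the primary template above is used instead.
template <typename T>
struct HasWrite<T, decltype(std::declval<const T&>().Write(std::declval<Writer&>()), void())>
    : std::true_type {};

// Overload selected for types that provide Write.
template <typename T>
typename std::enable_if<HasWrite<T>::value, bool>::type
TryWrite(const T& obj, Writer& writer) {
    obj.Write(writer);
    return true;
}

// Overload selected for every other type: nothing is written.
template <typename T>
typename std::enable_if<!HasWrite<T>::value, bool>::type
TryWrite(const T&, Writer&) {
    return false;
}

// A type that opts in by providing Write.
struct Person {
    int age;
    void Write(Writer& writer) const { ++writer.fieldsWritten; }
};

// A type with no serialization support at all.
struct Opaque {
    int value;
};

static_assert(HasWrite<Person>::value, "Person provides Write");
static_assert(!HasWrite<Opaque>::value, "Opaque does not provide Write");
```

The trait only answers whether a method with a given shape exists; it cannot enumerate fields, so the marshalling logic itself still has to be written by hand.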

On Tue, May 26, 2015 at 5:40 PM, Denis Magda <[hidden email]> wrote:

> Yep, it's not so easy to marshal/unmarshal data in C++.
>
> Take a look at these slides from NVIDIA:
>
> http://on-demand.gputechconf.com/gtc/2012/presentations/S0377-C++-Data-Marshalling-Best-Practices.pdf
>
> The slides are quite high-level, but they may point to a solution
> for p.2.
>
> --
> Denis

Branko Čibej

Re: C++ marshalling.

Why don't you just use an existing IDL? Something like Thrift or
Protobufs or ... there are quite a few of them out there. Inventing your
own marshalling is a waste of time.

-- Brane

On 26.05.2015 22:12, Vladimir Ozerov wrote:

> SFINAE could be a way to perform compile-time introspection:
> http://en.wikipedia.org/wiki/Substitution_failure_is_not_an_error

Vladimir Ozerov

Re: C++ marshalling.

Brane,

There are two key features we are planning to address with the platform
integration effort, apart from the trivial cache/compute APIs:
1) Portability, so that objects can freely travel between Java, C++, etc.
Portability is also mandatory to support queries, because our query engine
is tightly coupled to Java.
2) Zero changes to existing models in user apps, so that Ignite can be
integrated into legacy apps with minimal effort. On the Java side our
OptimizedMarshaller is already in very good shape to handle this. This is
why I would prefer to develop a marshaller from scratch instead of using
existing solutions.

Vladimir.

On Tue, May 26, 2015 at 11:43 PM, Branko Čibej <[hidden email]> wrote:

> Why don't you just use an existing IDL? Something like Thrift or
> Protobufs or ... there are quite a few of them out there. Inventing your
> own marshalling is a waste of time.
>
> -- Brane

Branko Čibej

Re: C++ marshalling.

On 26.05.2015 23:04, Vladimir Ozerov wrote:

> Brane,
>
> There are two key features we are planning to address with the platform
> integration effort, apart from the trivial cache/compute APIs:
> 1) Portability, so that objects can freely travel between Java, C++, etc.
> Portability is also mandatory to support queries, because our query engine
> is tightly coupled to Java.
> 2) Zero changes to existing models in user apps, so that Ignite can be
> integrated into legacy apps with minimal effort. On the Java side our
> OptimizedMarshaller is already in very good shape to handle this. This is
> why I would prefer to develop a marshaller from scratch instead of using
> existing solutions.

Hmmm ... so, in this case I really see only two possible approaches:

1. Use an IDL-like approach, where users would describe their C++
types, e.g., in Java, and you could use the existing introspection
code to generate a specialized C++ marshaller.
2. Use a C++ parser (e.g., from LLVM) to generate a machine-readable
structure description, and generate the marshalling code from that.

Option 2 is by far the most user-friendly, because users could just
point the generator to their existing class definitions. But it's
probably a huge amount of work.

Both approaches have one nasty problem: In C++, unlike in Java with
reflection, there's no universal way to construct an object without
using the available constructors, nor is there a universal way to set
values to class attributes. It's likely that users would have to modify
their existing classes by adding marshaller-friendly constructors or
factory methods (where such don't yet exist).
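
To make the construction problem concrete, here is a small sketch with a hypothetical Account class. Its public constructor enforces a business rule, so a stored object whose state no longer satisfies that rule cannot be rebuilt through it; the added static factory FromMarshalledFields (an assumed name, not part of any real API) is the kind of marshaller-friendly entry point users would have to add.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

class Account {
public:
    // Existing public constructor: enforces a rule that only applies
    // when an account is opened.
    Account(std::string owner, long openingDeposit)
        : owner_(std::move(owner)), balance_(openingDeposit) {
        if (openingDeposit < 100)
            throw std::invalid_argument("minimum opening deposit is 100");
    }

    void Withdraw(long amount) { balance_ -= amount; }

    const std::string& Owner() const { return owner_; }
    long Balance() const { return balance_; }

    // Added for the marshaller: rebuilds an object directly from stored
    // field values, bypassing the opening-deposit check.
    static Account FromMarshalledFields(std::string owner, long balance) {
        Account account;
        account.owner_ = std::move(owner);
        account.balance_ = balance;
        return account;
    }

private:
    // Private default constructor, reachable only from the factory.
    Account() : balance_(0) {}

    std::string owner_;
    long balance_;
};
```

An account opened with 100 and reduced to 40 by Withdraw can be restored with Account::FromMarshalledFields(owner, 40), whereas Account(owner, 40) throws. A generated marshaller has no way to discover such a factory unless the user supplies or annotates it.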

-- Brane

Denis Magda

Re: C++ marshalling.

On 5/27/2015 12:14 AM, Branko Čibej wrote:
> 2. Use a C++ parser (e.g., from LLVM) to generate a machine-readable
> structure description, and generate the marshalling code from that.
>
> Option 2 is by far the most user-friendly, because users could just
> point the generator to their existing class definitions. But it's
> probably a huge amount of work.

Branko,

This is really an awesome solution if we want to perform marshalling
automatically.

Apple extensively uses libclang for code analysis and generation in its
Objective-C runtime.
We can go the same way. Here is a good introductory article on clang:
http://szelei.me/code-generator/

--
Denis

Vladimir Ozerov

Re: C++ marshalling.

Code generation in any form (either compile-time or runtime) is a nice idea
and we certainly should pay attention to it. But it appears to be too
complicated for the initial release.
I would stick to explicit "serializers" for now.

On Wed, May 27, 2015 at 9:28 AM, Denis Magda <[hidden email]> wrote:

> On 5/27/2015 12:14 AM, Branko Čibej wrote:
>
>> 2. Use a C++ parser (e.g., from LLVM) to generate a machine-readable
>> structure description, and generate the marshalling code from that.
>>
>> Option 2 is by far the most user-friendly, because users could just
>> point the generator to their existing class definitions. But it's
>> probably a huge amount of work.
>>
> Branko,
>
> This is really an awesome solution if we want to perform marshalling
> automatically.
>
> Apple extensively uses libclang for code analysis and generation in its
> Objective-C runtime.
> We can go the same way. Here is a good introductory article on clang:
> http://szelei.me/code-generator/
>
> --
> Denis
>
>

Branko Čibej

Re: C++ marshalling.

On 27.05.2015 09:06, Vladimir Ozerov wrote:
> Code generation in any form (either compile-time or runtime) is a nice idea
> and we certainly should pay attention to it. But it appears to be too
> complicated for the initial release.
> I would stick to explicit "serializers" for now.

Sure; I'm taking the long view here.

-- Brane


Vladimir Ozerov

Re: C++ marshalling.

Igniters,

After some meditation on the matter I came to the following rough design.
We define 2 serialization modes - intrusive and non-intrusive.

1) Intrusive
Require the user to define three static methods in their class:
static int TypeId(); // Type ID is used to distinguish different types.
static void Write(IgniteWriter& writer, T* obj);
static T* Read(IgniteReader& reader);

Alternatively, the Write/Read methods can be moved to a custom serializer that
is registered in the Ignite configuration. This allows for non-static
reads/writes.

Pros: easy to use; fits new development well; serialization logic is
bound to the Ignite instance.
Cons: not available for types which cannot be changed; works only with raw
pointers.
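
A rough sketch of how the intrusive mode might look from the user's side. The message does not define the IgniteWriter/IgniteReader interfaces, so the classes below are toy stand-ins with assumed method names (WriteInt32, WriteString, ReadInt32, ReadString); Stream and RoundTrip are likewise hypothetical helpers for the illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <memory>
#include <string>
#include <utility>

// Toy in-memory stream shared by the stand-in writer and reader.
struct Stream {
    std::deque<std::int32_t> ints;
    std::deque<std::string> strings;
};

// Stand-in for IgniteWriter; the method names are assumptions.
class IgniteWriter {
public:
    explicit IgniteWriter(Stream& stream) : stream_(stream) {}
    void WriteInt32(std::int32_t v) { stream_.ints.push_back(v); }
    void WriteString(const std::string& v) { stream_.strings.push_back(v); }
private:
    Stream& stream_;
};

// Stand-in for IgniteReader; the method names are assumptions.
class IgniteReader {
public:
    explicit IgniteReader(Stream& stream) : stream_(stream) {}
    std::int32_t ReadInt32() {
        std::int32_t v = stream_.ints.front();
        stream_.ints.pop_front();
        return v;
    }
    std::string ReadString() {
        std::string v = stream_.strings.front();
        stream_.strings.pop_front();
        return v;
    }
private:
    Stream& stream_;
};

// User class carrying the three static methods of the intrusive mode.
class Person {
public:
    Person(std::string name, std::int32_t age)
        : name_(std::move(name)), age_(age) {}

    static int TypeId() { return 100; }

    static void Write(IgniteWriter& writer, Person* obj) {
        writer.WriteString(obj->name_);
        writer.WriteInt32(obj->age_);
    }

    // Returns a raw pointer owned by the caller, as in the proposal.
    static Person* Read(IgniteReader& reader) {
        std::string name = reader.ReadString();
        std::int32_t age = reader.ReadInt32();
        return new Person(name, age);
    }

    const std::string& Name() const { return name_; }
    std::int32_t Age() const { return age_; }

private:
    std::string name_;
    std::int32_t age_;
};

// Generic helper that relies only on the three static methods.
template <typename T>
T* RoundTrip(T* obj) {
    Stream stream;
    IgniteWriter writer(stream);
    writer.WriteInt32(T::TypeId());
    T::Write(writer, obj);

    IgniteReader reader(stream);
    if (reader.ReadInt32() != T::TypeId())
        return nullptr;
    return T::Read(reader);
}
```

Because Read returns a raw pointer, the caller is responsible for ownership, for example by wrapping the result of RoundTrip(&person) in a std::unique_ptr<Person>. This is the "works only with raw pointers" limitation listed in the cons.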

2) Non-intrusive
The user defines Read/Write functions for a particular type and registers
them with our macro as follows:
IGNITE_SERIALIZATION_REGISTER_TYPE(
    Person,                  // Type to register.
    my_proj::TypeIdPerson,   // Type ID function.
    my_proj::WritePerson,    // Write function.
    my_proj::ReadPerson      // Read function.
)

This macro will expand into several overloads of well-known functions, thus
generating the necessary serialization code at compile time.

Pros: no changes to the existing code base; can work with anything: values,
raw pointers, smart pointers, etc.
Cons: requires more work from the user; the expanded functions have global
visibility.

We will use this approach internally to serialize primitive types,
collections, smart pointers, etc. E.g. once a user has registered a type T,
they will automatically be able to serialize std::vector<T>,
std::shared_ptr<T>, etc.
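
One possible shape for the non-intrusive mode, sketched under stated assumptions. Instead of the function overloads mentioned above, this version has the macro expand into a specialization of a class template (Serializer), which is a substitute technique chosen to keep the example short. The ignite_sketch namespace, Stream, Serializer and the function signatures are all hypothetical; only the macro name and its four arguments come from the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

namespace ignite_sketch {

// Toy in-memory stream standing in for the real reader/writer.
struct Stream {
    std::deque<std::int32_t> ints;
    std::deque<std::string> strings;
};

// Declared but not defined: using an unregistered type fails to compile.
template <typename T>
struct Serializer;

// Library-provided support: any registered T makes std::vector<T> work.
template <typename T>
struct Serializer<std::vector<T>> {
    static void Write(Stream& s, const std::vector<T>& values) {
        s.ints.push_back(static_cast<std::int32_t>(values.size()));
        for (const T& item : values)
            Serializer<T>::Write(s, item);
    }

    static std::vector<T> Read(Stream& s) {
        std::int32_t count = s.ints.front();
        s.ints.pop_front();
        std::vector<T> values;
        values.reserve(static_cast<std::size_t>(count));
        for (std::int32_t i = 0; i < count; ++i)
            values.push_back(Serializer<T>::Read(s));
        return values;
    }
};

}  // namespace ignite_sketch

// Must be used at global namespace scope. Expands into a Serializer
// specialization that forwards to the user's free functions.
#define IGNITE_SERIALIZATION_REGISTER_TYPE(Type, TypeIdFn, WriteFn, ReadFn) \
    namespace ignite_sketch {                                               \
    template <>                                                             \
    struct Serializer<Type> {                                               \
        static int TypeId() { return TypeIdFn(); }                          \
        static void Write(Stream& s, const Type& v) { WriteFn(s, v); }      \
        static Type Read(Stream& s) { return ReadFn(s); }                   \
    };                                                                      \
    }

// Existing user code: Person itself is not changed.
namespace my_proj {

struct Person {
    std::string name;
    std::int32_t age;
};

int TypeIdPerson() { return 100; }

void WritePerson(ignite_sketch::Stream& s, const Person& p) {
    s.strings.push_back(p.name);
    s.ints.push_back(p.age);
}

Person ReadPerson(ignite_sketch::Stream& s) {
    Person p;
    p.name = s.strings.front();
    s.strings.pop_front();
    p.age = s.ints.front();
    s.ints.pop_front();
    return p;
}

}  // namespace my_proj

IGNITE_SERIALIZATION_REGISTER_TYPE(
    my_proj::Person,         // Type to register.
    my_proj::TypeIdPerson,   // Type ID function.
    my_proj::WritePerson,    // Write function.
    my_proj::ReadPerson      // Read function.
)
```

After registration, ignite_sketch::Serializer<std::vector<my_proj::Person>>::Write and Read work without any extra user code. Note that the generated specialization lives in a shared namespace, which mirrors the global-visibility drawback listed in the cons.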

Any thoughts?

Vladimir.

On Wed, May 27, 2015 at 11:57 AM, Branko Čibej <[hidden email]> wrote:

> On 27.05.2015 09:06, Vladimir Ozerov wrote:
> > Code generation in any form (either compile-time or runtime) is a nice idea
> > and we certainly should pay attention to it. But it appears to be too
> > complicated for the initial release.
> > I would stick to explicit "serializers" for now.
>
> Sure; I'm taking the long view here.
>
> -- Brane