Apache Ignite Developers - Legacy Mail Archive

EntryProcessor execution semantics

Andrey Kornev

EntryProcessor execution semantics

Hello,

I'd like to ask the community members to share their thoughts/opinions on the following subject.

JCache provides a way to atomically execute one or more actions against a cache entry using the Entry Processor mechanism. The Cache interface exposes an invoke() method that takes the key of the cache entry to be acted upon and an instance of the EntryProcessor interface as its parameters. When invoked, the entry processor is passed an instance of JCache's MutableEntry, which gives the processor exclusive access to the cache entry for the duration of the EntryProcessor.process() call. Great feature for delta updates, in-place compute, coordination/agreement (à la ZooKeeper), and so on!
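To make the contract concrete, here is a minimal single-JVM sketch. The MutableEntry and EntryProcessor interfaces below are local stand-ins that mirror the shape of javax.cache.processor.MutableEntry and javax.cache.processor.EntryProcessor, so the example runs without the JCache jar. LocalCache is a hypothetical class, not Ignite's or any JCache provider's implementation; it holds a per-key lock for the whole process() call, which is the exclusive-access guarantee described above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Local stand-in mirroring a subset of javax.cache.processor.MutableEntry.
interface MutableEntry<K, V> {
    K getKey();
    V getValue();
    boolean exists();
    void setValue(V value);
    void remove();
}

// Local stand-in mirroring javax.cache.processor.EntryProcessor.
interface EntryProcessor<K, V, T> {
    T process(MutableEntry<K, V> entry, Object... arguments);
}

// Hypothetical single-JVM cache: invoke() holds a per-key lock for the whole process() call.
class LocalCache<K, V> {
    private final Map<K, V> data = new ConcurrentHashMap<>(); // null values not supported
    private final Map<K, ReentrantLock> locks = new ConcurrentHashMap<>();

    <T> T invoke(K key, EntryProcessor<K, V, T> processor, Object... arguments) {
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
            // Write-through view of the entry; other invoke() calls on this key wait on the lock.
            MutableEntry<K, V> entry = new MutableEntry<K, V>() {
                public K getKey() { return key; }
                public V getValue() { return data.get(key); }
                public boolean exists() { return data.containsKey(key); }
                public void setValue(V value) { data.put(key, value); }
                public void remove() { data.remove(key); }
            };
            return processor.process(entry, arguments);
        } finally {
            lock.unlock();
        }
    }

    V get(K key) {
        return data.get(key);
    }
}
```

With a real JCache provider the equivalent call is cache.invoke(key, processor, arguments) on a javax.cache.Cache. For simplicity this sketch writes changes through immediately, whereas a real provider typically applies the entry's mutations atomically once process() returns.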

Now, if you were to put your Ignite user hat on, what execution semantics would you expect? Specifically:
1) the EntryProcessor is executed on the key's primary node as well as all backup nodes.
2) the EntryProcessor is executed only on the key's primary node.
3) something else.
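The difference between options 1 and 2 can be simulated in a few lines. ReplicatedKey is a hypothetical model of one key's copies (index 0 as the primary, the rest as backups), not how Ignite actually stores data. For a deterministic processor both strategies end in the same state; what differs is how many times the processor body runs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical model of one key's copies: index 0 is the primary, the rest are backups.
class ReplicatedKey {
    final List<Map<String, Integer>> copies = new ArrayList<>();
    int invocations; // how many times the processor body ran

    ReplicatedKey(int backups) {
        for (int i = 0; i <= backups; i++) copies.add(new HashMap<>());
    }

    // Option 1: run the processor on the primary and on every backup.
    void invokeEverywhere(String key, UnaryOperator<Integer> processor) {
        for (Map<String, Integer> copy : copies) {
            invocations++;
            copy.put(key, processor.apply(copy.get(key)));
        }
    }

    // Option 2: run the processor once on the primary, then copy the result to every replica.
    void invokeOnPrimary(String key, UnaryOperator<Integer> processor) {
        invocations++;
        Integer result = processor.apply(copies.get(0).get(key));
        for (Map<String, Integer> copy : copies) copy.put(key, result);
    }
}
```

Any side effect in the processor body would therefore happen 1 + B times under option 1 (B being the number of backups) and exactly once under option 2.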

Unfortunately, the JCache spec doesn't provide much detail on this feature. The Ignite documentation is silent, too.

Thanks
Andrey

Vladimir Ershov

Re: EntryProcessor execution semantics

Hi Andrey!

Could you please clarify which case you are interested in:

1. backup copies of cache entries, as configured by CacheConfiguration#getBackups
2. just remote nodes holding some other keys

Thanks!

On Wed, Nov 25, 2015 at 8:43 PM, Andrey Kornev wrote:

Andrey Kornev

RE: EntryProcessor execution semantics

Hi Vladimir,

My question was about your expectations as a user of the JCache API. Specifically, if you were to use JCache's entry processor feature, where and when would you expect the EntryProcessor to be executed once you call the Cache.invoke() method?

Has anyone here used this API?

The reason I'm asking is that the current implementation of this feature in Ignite strikes me as strange, to say the least. And given the lack of detail/guidance in the JCache spec as to how this feature should be implemented, the only thing left is to ask the community for their opinion/experience.

Based on my previous experience with Coherence's implementation of this feature, I expected the same behavior from Ignite's. But alas, Ignite has its own (different) opinion on how it should be done. :)

Regards
Andrey

> Date: Thu, 26 Nov 2015 20:59:48 +0300

Alexey Goncharuk

Re: EntryProcessor execution semantics

Andrey,

If I leave aside my knowledge of Ignite internals, my expectation would
be that an EntryProcessor is invoked on all affinity nodes (both primary
and backup) in the grid. The main reason behind this expectation is that
a serialized EntryProcessor instance is usually smaller than the resulting
object being stored in the cache, so sending a serialized EntryProcessor
should be cheaper. Is there a specific reason you expect an EntryProcessor
to be called only once across all the nodes?
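Alexey's size argument can be checked with plain Java serialization. The sketch assumes a large value (a long[10_000]) and a tiny delta-style processor; AddAtIndex and WireSize are hypothetical names, and Ignite uses its own marshallers rather than java.io serialization, so the absolute byte counts are only indicative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical delta-style processor payload: "add delta to the element at index".
class AddAtIndex implements Serializable {
    private static final long serialVersionUID = 1L;
    final int index;
    final long delta;

    AddAtIndex(int index, long delta) {
        this.index = index;
        this.delta = delta;
    }

    // Returns a new value; the input array is left untouched.
    long[] apply(long[] value) {
        long[] updated = value.clone();
        updated[index] += delta;
        return updated;
    }
}

class WireSize {
    // Number of bytes java.io serialization produces for obj.
    static int of(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

The comparison only favors shipping the processor when the value is large and the computation is cheap, which is exactly the assumption questioned later in the thread.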

I would not assume any restrictions on how many times an EntryProcessor
is called during a cache update. For example, in the case of an explicit
optimistic READ_COMMITTED transaction, it may be called more than once,
because Ignite needs to calculate a return value for the first invoke()
and then call it a second time during commit, when transactional locks
are held.

The current requirement is that an EntryProcessor should be a stateless
function, and it may be called more than once (but of course it will
receive the same cache value every time). I agree that this should be
properly articulated in the documentation; I will make sure that it is
reflected in the forthcoming 1.5 release javadocs.
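The double call described above, and why it forces the "stateless function" requirement, can be sketched as follows. DoubleCallSim is a hypothetical model, not Ignite's transaction code: it calls the processor once to produce invoke()'s return value and once more "at commit", passing the same cached value both times. A stateless processor gives identical results; a processor with a side effect performs it twice, and one with internal state can even return different values.

```java
import java.util.function.UnaryOperator;

// Hypothetical model of an optimistic READ_COMMITTED invoke(): the processor runs once to
// produce the caller's return value and once more at commit, both times on the same input.
class DoubleCallSim {
    // Returns {valueReturnedToCaller, valueWrittenAtCommit}.
    static int[] invoke(int cachedValue, UnaryOperator<Integer> processor) {
        int returned = processor.apply(cachedValue);  // first call: computes invoke()'s result
        int committed = processor.apply(cachedValue); // second call: under transactional locks
        return new int[] {returned, committed};
    }
}
```

For example, DoubleCallSim.invoke(10, v -> v + 1) yields 11 for both calls, while a processor that sends a notification on each call would send two.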

Andrey Kornev

RE: EntryProcessor execution semantics

Thank you, Alexey!

By stating that "sending a serialized EntryProcessor should be cheaper", you implicitly assume that the cache entry is big and the computation done by the processor is cheap. But what if that's not the case? What if the computation itself is quite expensive and depends on external data (which may be constantly changing, like stock tickers), or is done for a side effect? What is the EP feature good for, after all, given the constraints you laid out? Incrementing an integer counter, as the example in the Ignite documentation does? :)
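The external-data case can be illustrated with a hypothetical quote feed that moves between reads. If every copy of the entry runs the processor itself, each copy samples a different quote and the copies diverge; running the processor once and shipping the result keeps them identical. ExternalDataDemo and the LongSupplier feed are illustrative assumptions, not part of any cache API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical illustration: the processor adds an external, changing quote to the entry value.
class ExternalDataDemo {
    // Every copy (primary + backups) runs the processor and samples the quote separately.
    static List<Long> applyOnEachCopy(int copies, long initial, LongSupplier externalQuote) {
        List<Long> results = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            results.add(initial + externalQuote.getAsLong());
        }
        return results;
    }

    // The quote is sampled once (on the primary) and the result is copied to every replica.
    static List<Long> applyOnceAndShip(int copies, long initial, LongSupplier externalQuote) {
        long result = initial + externalQuote.getAsLong();
        List<Long> results = new ArrayList<>();
        for (int i = 0; i < copies; i++) results.add(result);
        return results;
    }
}
```

With a feed that ticks on every read, applyOnEachCopy leaves three copies holding three different values, while applyOnceAndShip leaves them identical.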

Of course, the JCache specification is open to interpretation, and one might argue that the EntryProcessor is a performance feature, but my reading of the spec makes me think (and it looks like both Coherence and Hazelcast agree with me) that it's first and foremost a way to atomically mutate a cache entry without incurring the overhead of locking.

Let's see now. A single call to Cache.invoke() produces:
- a single EP invocation on the key's primary node in Coherence. Period.
- a single EP invocation on the key's primary node in Hazelcast, though Hazelcast also offers the non-JCache BackupAwareEntryProcessor class that allows the user "to create or pass another EntryProcessor to run on backup partitions and apply delta changes to the backup entries".
- in Ignite:
  - a single invocation on the key's primary node if the cache is ATOMIC (both REPLICATED and PARTITIONED);
  - N+1 invocations (where N is the number of nodes the cache is started on) if the cache is REPLICATED and TRANSACTIONAL;
  - B+2 invocations (where B is the number of replicas) if the cache is PARTITIONED and TRANSACTIONAL.
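The Ignite counts listed above can be written down as a small function, which makes the comparison easy to play with. This only encodes the numbers reported in this thread for Ignite at the time (late 2015); it is not derived from Ignite's source, and the Mode and Atomicity enums are local stand-ins for Ignite's cache mode and atomicity settings.

```java
// Local stand-ins for the cache mode and atomicity settings discussed in the thread.
enum Mode { REPLICATED, PARTITIONED }
enum Atomicity { ATOMIC, TRANSACTIONAL }

class ReportedInvocations {
    // EntryProcessor calls per Cache.invoke(), as reported in this thread.
    // nodes = N (nodes the cache is started on); replicas = B (number of replicas).
    static int count(Mode mode, Atomicity atomicity, int nodes, int replicas) {
        if (atomicity == Atomicity.ATOMIC) return 1; // primary only, either mode
        if (mode == Mode.REPLICATED) return nodes + 1; // N + 1
        return replicas + 2; // B + 2
    }
}
```

For example, a REPLICATED TRANSACTIONAL cache on 4 nodes gives 5 invocations per invoke(), versus a single one in Coherence.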

Go figure! Alexey, are you suggesting that a user without deep knowledge of Ignite internals would find such behavior expected and natural? Even with deep knowledge of Ignite internals, it's hard to understand the logic.

Neither Coherence nor Hazelcast requires the EP to be stateless and side-effect free. Even better, Hazelcast makes the choice explicit by providing the backup-aware processor API, and it's then up to the user to ensure statelessness etc. But Ignite is just too clever.

I'd really like to ask the brains behind the current design to reconsider.

Regards
Andrey

> Date: Mon, 30 Nov 2015 13:11:13 +0300

dsetrakyan

Re: EntryProcessor execution semantics

On Mon, Nov 30, 2015 at 9:02 AM, Andrey Kornev wrote:

>
> Neither Coherence nor Hazelcast require the EP to be stateless and
> side-effect free. Even better Hazelcast makes the choice explicit by
> providing the backup aware processor API and it's then up to the user to
> ensure statelessness etc. But Ignite is just too clever.
>

Andrey, a stateful EP seems a bit utopian to me, since the state would not
survive between executions anyway. Can you elaborate?
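The point about state not surviving can be shown with a processor that is shipped to the executing node by serialization. Below, plain Java serialization stands in for whatever marshaller the cache actually uses, and CountingProcessor and Wire are hypothetical names. A field mutated on the deserialized copy is never seen by the caller's instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical processor with a mutable field, standing in for a "stateful" EP.
class CountingProcessor implements Serializable {
    private static final long serialVersionUID = 1L;
    int timesRun;

    int process(int value) {
        timesRun++; // mutates only this instance
        return value + 1;
    }
}

class Wire {
    // Round-trips an object through Java serialization, as a stand-in for shipping it to a node.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Each execution works on its own fresh copy, so any "state" lives only for that one call unless it is written into the cache entry itself.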

Andrey Kornev

RE: EntryProcessor execution semantics

Dmitriy,

Here, by "stateless" I meant whatever Alexey meant in his previous post in this thread. But what I'm really talking about is being able to have EPs with side effects, and therefore the execution semantics should be "exactly-once" by default. Besides, maybe it's just me, but intuitively the expectation of Cache.invoke() is that the EP will be executed only once, because *logically* there can only be one entry with the given key in the cache to which the EP is applied. Having the EP executed many times for the same entry comes as a big surprise, at least to me.

Maybe it's worth considering an API similar to Hazelcast's to make it possible to explicitly control the EP's execution semantics.
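One possible shape for such an API is sketched below. It is loosely modeled on the idea behind Hazelcast's BackupAwareEntryProcessor mentioned earlier in the thread, but ControlledProcessor, backupProcessor() and ControlledCache are invented names, not Hazelcast's or Ignite's actual API. By default the user's logic runs exactly once on the primary and the result is copied to the backups; a processor can opt into re-running a separate delta on each backup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical API: the main logic runs exactly once, on the primary copy.
interface ControlledProcessor<V> {
    V process(V current);

    // Optional delta to re-run on each backup; null means "copy the primary's result".
    default UnaryOperator<V> backupProcessor() {
        return null;
    }
}

// Hypothetical cache holding one key's copies: index 0 is the primary, the rest are backups.
class ControlledCache<V> {
    final List<V> copies = new ArrayList<>();

    ControlledCache(int backups, V initial) {
        for (int i = 0; i <= backups; i++) copies.add(initial);
    }

    void invoke(ControlledProcessor<V> processor) {
        V result = processor.process(copies.get(0)); // single run of the user logic
        copies.set(0, result);
        UnaryOperator<V> backup = processor.backupProcessor();
        for (int i = 1; i < copies.size(); i++) {
            // Either apply the user's explicit backup delta or ship the primary's result.
            copies.set(i, backup == null ? result : backup.apply(copies.get(i)));
        }
    }
}
```

Making the backup behavior an explicit opt-in keeps exactly-once as the default while still letting users who want delta shipping choose it.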

Regards
Andrey

> Date: Mon, 30 Nov 2015 23:16:58 -0800

dsetrakyan

Re: EntryProcessor execution semantics

On Tue, Dec 1, 2015 at 8:34 AM, Andrey Kornev wrote:

Andrey, can you create a ticket and propose a design? We could continue
this discussion there.
