IgniteSet implementation: changes required

IgniteSet implementation: changes required

Andrey Kuznetsov
Hi, Igniters!

The current implementation of IgniteSet is fragile with respect to cluster
recovery from a checkpoint. We have an issue (IGNITE-5553) that addresses
the set's size() behavior, but the problem is slightly broader. The text
below is my comment from the Jira issue; I encourage you to discuss it.

We could put the current set size into the set header cache entry. This
would fix size(), but the iterator() implementation is broken as well.

Currently, the set implementation maintains plain Java sets on every node
(see CacheDataStructuresManager.setDataMap). These sets duplicate
backing-cache entries, both primary and backup. size() and iterator() calls
issue distributed queries to collect/filter data from all setDataMaps, and
these maps remain empty after the cluster is recovered from a checkpoint.
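To make the failure mode concrete, here is a rough, self-contained model of
that design (plain Java, not Ignite's actual code; names like
localMirrorOnNode are invented for illustration): size() sums per-node
on-heap mirrors, so once the mirrors are lost, the reported size is wrong
even though the cache data survived the checkpoint.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of the current design (illustrative, not Ignite code):
// every node mirrors its backing-cache entries in a plain on-heap set,
// as CacheDataStructuresManager.setDataMap does.
class SetDataMapModel {
    // Build one node's local mirror for the set named "mySet".
    static Map<String, Set<Object>> localMirrorOnNode(Collection<?> cacheEntries) {
        Map<String, Set<Object>> setDataMap = new ConcurrentHashMap<>();
        Set<Object> mirror = ConcurrentHashMap.newKeySet();
        mirror.addAll(cacheEntries);
        setDataMap.put("mySet", mirror);
        return setDataMap;
    }

    // size() issues a distributed query that sums the mirrors on all nodes.
    // After recovery from a checkpoint the mirrors are empty, so the sum is
    // zero even though the backing cache itself was restored.
    static int distributedSize(List<Map<String, Set<Object>>> nodes) {
        return nodes.stream()
            .mapToInt(n -> n.getOrDefault("mySet", Set.of()).size())
            .sum();
    }
}
```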

Now I see the following options to fix the issue.

#1 - Naive. Iterate over all entries of the datastructure-backing caches
during the recovery-from-checkpoint procedure, filter out set-related
entries and refill the setDataMaps.
Pros: easy to implement.
Cons: unpredictable time/memory overhead.
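A sketch of option #1 under assumed names (SetItemKey and its fields are
hypothetical; real Ignite keys differ): after the cache is restored, scan
every key, keep the set-related ones, and rebuild the mirror. The cost is
one full cache scan plus the rebuilt on-heap copy, which is the
unpredictable overhead mentioned above.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Option #1 sketch (illustrative types, not Ignite's actual API).
class CheckpointRefill {
    // Hypothetical key type: a real set-entry key carries the owning set's id.
    record SetItemKey(String setId, Object item) {}

    // Walk all keys of the restored datastructure-backing cache, filter the
    // set-related ones, and refill the node-local setDataMap.
    static Map<String, Set<Object>> refill(Set<Object> cacheKeys) {
        Map<String, Set<Object>> setDataMap = new ConcurrentHashMap<>();
        for (Object k : cacheKeys) {
            if (k instanceof SetItemKey key)    // skip unrelated cache entries
                setDataMap.computeIfAbsent(key.setId(),
                        id -> ConcurrentHashMap.newKeySet())
                    .add(key.item());
        }
        return setDataMap;
    }
}
```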

#2 - More realistic. Avoid node-local copies of cache data. Maintain a
linked list in the datastructure-backing cache: the key is a set item, the
value is the next set item. The list head is stored in the set header cache
entry (this item is the youngest one). Iterators built on top of this
structure are fail-fast.
Pros: less memory overhead; no need to maintain node-local mirrors of cache
data.
Cons: iterators are not fail-safe.
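A minimal single-threaded model of the layout option #2 assumes (again
illustrative, with a plain HashMap standing in for the backing cache): each
entry maps an item to the next one, new items are prepended so the header
always points at the youngest item, and iteration just follows the chain.
A concurrent remove would break the chain under a live iterator, which is
why such iterators can only be fail-fast.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Option #2 sketch: the backing cache stores item -> next item, and the
// set header entry stores the head of the chain (the youngest item).
class LinkedSetModel {
    private final Map<Object, Object> cache = new HashMap<>(); // item -> next
    private Object head;                                       // set header entry

    // New items are prepended: the head is always the youngest item.
    void add(Object item) {
        if (!cache.containsKey(item)) {
            cache.put(item, head);
            head = item;
        }
    }

    // An iterator simply follows the chain from the header; if another
    // operation unlinked a node mid-walk, the walk would break (fail-fast).
    List<Object> items() {
        List<Object> out = new ArrayList<>();
        for (Object cur = head; cur != null; cur = cache.get(cur))
            out.add(cur);
        return out;
    }
}
```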

#3 - Option #2, modified. We can store a reference counter and a 'removed'
flag along with the next-item reference. This allows making iterators
fail-safe.
Pros: iterators are fail-safe.
Cons: slightly more complicated implementation that may affect performance;
also, I see no way to handle active iterators when remote nodes fail.
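Option #3 could shape each cache value roughly like this (the field layout
is an assumption, not a committed design): besides the next-item link, the
value carries a count of iterators currently parked on the node and a
'removed' tombstone. A remove only unlinks the node physically once no
iterator references it, which is what keeps iterators fail-safe.

```java
// Option #3 sketch (illustrative field layout, not a committed design).
class SetNodeValue {
    Object next;        // link to the next set item
    int refCnt;         // iterators currently positioned at this node
    boolean removed;    // logical tombstone

    // remove() marks the tombstone immediately but reports whether the node
    // may also be unlinked physically right now (no iterator parked here).
    boolean markRemoved() {
        removed = true;
        return refCnt == 0;
    }
}
```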


Best regards,

Andrey.

Re: IgniteSet implementation: changes required

dsetrakyan
Hi Andrey,

Thanks for a detailed email. I think your suggestions do make sense. Ignite
cannot afford to have a distributed set that is not fail-safe. Can you
please focus only on solutions that provide consistent behavior in case of
topology changes and failures and document them in the ticket?

https://issues.apache.org/jira/browse/IGNITE-5553

D.


Re: IgniteSet implementation: changes required

Pavel Pereslegin
Hello, Igniters!

We have some issues with current IgniteSet implementation ([1], [2], [3], [4]).

As already described in this conversation, the main problem is that the
current IgniteSet implementation maintains plain Java sets on every node
(see CacheDataStructuresManager.setDataMap). These sets duplicate
backing-cache entries, both primary and backup, and size() and iterator()
calls issue distributed queries to collect/filter data from all
setDataMaps.

I believe we can solve these issues if each instance of IgniteSet gets a
separate internal cache that is destroyed on close.

What do you think about such a major change? Do you have any thoughts or
objections?

[1] https://issues.apache.org/jira/browse/IGNITE-7565
[2] https://issues.apache.org/jira/browse/IGNITE-5370
[3] https://issues.apache.org/jira/browse/IGNITE-5553
[4] https://issues.apache.org/jira/browse/IGNITE-6474



Re: IgniteSet implementation: changes required

Valentin Kulichenko
Pavel,

I don't like the idea of creating a separate cache for each data
structure, especially for collocated ones. Actually, I'm not sure I
understand how that would help. It sounds like we just need to properly
persist the data structures cache and then reload it on restart.

-Val


Re: IgniteSet implementation: changes required

Pavel Pereslegin
Hello, Valentin.

Thank you for the reply.

As mentioned in this conversation, for now we have at least two issues
with IgniteSet:
1. Incorrect behavior after recovery from PDS [1].
2. The cache data is duplicated on-heap [2], which is not documented and
leads to heap/GC overhead when using large sets.

Without significant changes, it is possible to solve [1] with the
workaround proposed by Andrey Kuznetsov: iterate over all entries of the
datastructure-backing caches during the recovery-from-checkpoint procedure,
filter out set-related entries and refill the setDataMaps.
As a workaround for [2], we could add a configuration option that chooses
which data structure to use for "local caching" (on-heap or off-heap).
If we go this way, cache data duplication will remain, and some kind of
off-heap ConcurrentHashMap would have to be implemented in Ignite (it
probably already exists in some form; this topic needs proper
investigation).

On the other hand, if we use a separate cache for each IgniteSet instance:
1. It will no longer be necessary to maintain redundant data stored
anywhere other than the cache.
2. It will no longer be necessary to implement a workaround for recovery
from PDS.
For the collocated mode we could, for example, enforce REPLICATED cache mode.
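A rough sketch of what the cache-per-set lifecycle could look like against
Ignite's public API (the cache name, value type, and helper names are
illustrative; this is a design sketch, not the proposed patch):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

class SetCachePerInstance {
    // Each IgniteSet instance gets its own backing cache; REPLICATED mode
    // could stand in for the collocated case, as suggested above.
    static IgniteCache<Object, Boolean> createSetCache(Ignite ignite,
                                                       String setName,
                                                       boolean collocated) {
        CacheConfiguration<Object, Boolean> ccfg =
            new CacheConfiguration<>("datastructures-set-" + setName); // illustrative name
        ccfg.setCacheMode(collocated ? CacheMode.REPLICATED : CacheMode.PARTITIONED);
        return ignite.getOrCreateCache(ccfg);
    }

    // close() on the set destroys the whole cache, so no node-local mirrors
    // or recovery workarounds are needed: the cache *is* the set.
    static void closeSet(IgniteCache<Object, Boolean> setCache) {
        setCache.destroy();
    }
}
```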

Why don't you like the idea of a separate cache?

[1] https://issues.apache.org/jira/browse/IGNITE-7565
[2] https://issues.apache.org/jira/browse/IGNITE-5553



Re: IgniteSet implementation: changes required

dsetrakyan
Hi Pavel,

We have two types of data structures, collocated and non-collocated. The
difference between them is that a collocated set is generally smaller and
will always end up on the same node; users will generally have many
collocated sets. A non-collocated set, on the other hand, can span multiple
nodes and is therefore able to store a lot more data.

I can see how cache-per-set strategy can be applied to the non-collocated
set. As a matter of fact, I would be surprised if it is not implemented
that way already.

However, I do not see this strategy working for collocated sets. Users can
have thousands of collocated sets or more. Are you suggesting that this
would translate into thousands of caches?

D.


Re: IgniteSet implementation: changes required

Valentin Kulichenko
Pavel,

I'm a bit confused. In my understanding, the issue exists because we have
local in-memory maps which are used as the main source of truth about which
structures currently exist. During restart we lose all this data, even if
the data structures cache(s) are persisted. Once we fix this, the issue
goes away, regardless of whether we store one data structure per cache or
everything in a single cache. Am I missing something?

I also agree with Dmitriy. While the cache-per-set approach can make sense
for non-collocated sets, for collocated ones it definitely doesn't. So I
would fix the original issue first, and then change the architecture if
it's really needed.

-Val


Re: IgniteSet implementation: changes required

Andrey Kuznetsov
Hi all,

The current set implementation has a significant flaw: all set data are
duplicated in on-heap maps on _every_ node in order to support iterator() and
size(). To me it looks like a simple yet inefficient implementation.
Currently, these maps are broken by checkpointing/recovery, and we could
patch them somehow, but another future change to Ignite caches could break them
again. This looks fragile as long as the data structure is not entirely backed by
caches. Pavel's proposal seems to be a reliable solution for non-collocated
sets.

On Feb 9, 2018 at 22:46, "Valentin Kulichenko" <
[hidden email]> wrote:

Pavel,

I'm a bit confused. In my understanding, the issue exists because we have local
in-memory maps which are used as the main source of truth about which
structures currently exist. During restart, we lose all this data even if
the data structure cache(s) are persisted. Once we fix this, the issue goes away,
regardless of whether we store one data structure per cache or everything in a
single cache. Am I missing something?

I also agree with Dmitry. While a cache-per-set approach can make sense for
non-collocated sets, for collocated ones it definitely doesn't. So I would
fix the original issue first, and then change the architecture if it's
really needed.

-Val

Re: IgniteSet implementation: changes required

dsetrakyan
On Fri, Feb 9, 2018 at 6:26 PM, Andrey Kuznetsov <[hidden email]> wrote:

> Hi all,
>
> The current set implementation has a significant flaw: all set data are
> duplicated in on-heap maps on _every_ node in order to support iterator() and
> size(). To me it looks like a simple yet inefficient implementation.
> Currently, these maps are broken by checkpointing/recovery, and we could
> patch them somehow, but another future change to Ignite caches could break them
> again. This looks fragile as long as the data structure is not entirely backed by
> caches. Pavel's proposal seems to be a reliable solution for non-collocated
> sets.
>

I would agree, but I was under an impression that non-collocated sets are
already implemented this way. Am I wrong?

Re: IgniteSet implementation: changes required

Andrey Kuznetsov
Indeed, all sets, regardless of whether they are collocated or not, share a
single cache, and also use on-heap data structures that do not survive
checkpointing/recovery.

2018-02-13 2:14 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:

>
> I would agree, but I was under an impression that non-collocated sets are
> already implemented this way. Am I wrong?
>



--
Best regards,
  Andrey Kuznetsov.

Re: IgniteSet implementation: changes required

Pavel Pereslegin
Hello, Igniters!

I agree that the solution with separate caches is not acceptable for a
large number of sets.

So, I want to suggest one more way to implement IgniteSet that will
introduce element indexes (similar to IgniteQueue). To implement this
we can add head/tail indexes to IgniteSet header and for each
IgniteSet element store two key-value pairs:
 setKey, index
 index, setKey

Indexes are required to support iterator and they should be continuous.

Please, see detailed description in JIRA comment [1].

With such an approach, add/remove/contains operations will have O(1) time
complexity, the iterator will work similarly to the current IgniteQueue
iterator, and issues [2], [3] will be resolved, because PDS recovery will
work "out of the box" and we will not use the JVM heap for duplicated
values.

Btw, we can use this implementation only for collocated mode (map
keys/indexes to IgniteSet name) and use separate caches for
non-collocated mode.

What do you think about this?

[1] https://issues.apache.org/jira/browse/IGNITE-5553#comment-16364043
[2] https://issues.apache.org/jira/browse/IGNITE-5553
[3] https://issues.apache.org/jira/browse/IGNITE-7565
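The dual-mapping layout above can be sketched as follows. This is only a rough sketch: plain JVM maps stand in for the backing cache and the header, and the "move the last element into the freed slot" trick used to keep indexes continuous is my assumption; the JIRA comment may specify different details.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of the proposed dual-mapping layout: every element is stored twice,
// as (element -> index) and (index -> element), so add/remove/contains are
// O(1) map lookups and iteration can simply walk the continuous index range
// 0..size-1, much like the IgniteQueue iterator pages through index keys.
class IndexedSet<E> {
    private final Map<E, Long> elemToIdx = new ConcurrentHashMap<>();
    private final Map<Long, E> idxToElem = new ConcurrentHashMap<>();
    private long tail; // Next free index; also the current size.

    public synchronized boolean add(E e) {
        if (elemToIdx.containsKey(e))
            return false;

        elemToIdx.put(e, tail);
        idxToElem.put(tail, e);
        tail++;

        return true;
    }

    public synchronized boolean remove(E e) {
        Long idx = elemToIdx.remove(e);
        if (idx == null)
            return false;

        long last = --tail;
        E moved = idxToElem.remove(last);

        if (last != idx) { // Fill the hole so that indexes stay continuous.
            idxToElem.put(idx, moved);
            elemToIdx.put(moved, idx);
        }

        return true;
    }

    public boolean contains(E e) { return elemToIdx.containsKey(e); }

    public synchronized int size() { return (int)tail; }

    // Iterator analogue: scan the continuous index range.
    public synchronized List<E> snapshot() {
        List<E> res = new ArrayList<>();
        for (long i = 0; i < tail; i++)
            res.add(idxToElem.get(i));
        return res;
    }
}
```

With the two maps replaced by cache entries keyed per set name, both mappings live in the persistent cache, which is why PDS recovery would need no extra bookkeeping.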


2018-02-13 9:33 GMT+03:00 Andrey Kuznetsov <[hidden email]>:

> Indeed, all sets, regardless of whether they are collocated or not, share a
> single cache, and also use on-heap data structures that do not survive
> checkpointing/recovery.
>
>
>
>
> --
> Best regards,
>   Andrey Kuznetsov.

Re: IgniteSet implementation: changes required

Vladimir Ozerov
I do not think indexes are the right approach - sets do not have indexes, and
you will have to maintain an additional counter in order to know when
to stop.

From what I see there are two distinct problems:
1) Broken recovery - this is just a bug which needs to be fixed. As soon as
data is stored in a real persistent cache, recovery of the data structure state
should be a trivial task.
2) Heap items - this should not be a problem in the common case when a set
contains a moderate number of elements. If a set is excessively large, then
this is not the right structure for your use case and you should use the
standard IgniteCache API instead. What we can do is to optionally disable
on-heap caching for a specific set at the cost of lower performance if the user
wants so.

On Wed, Feb 14, 2018 at 4:51 PM, Pavel Pereslegin <[hidden email]> wrote:

> Hello, Igniters!
>
> I agree that the solution with separate caches is not acceptable for a
> large number of sets.
>
> So, I want to suggest one more way to implement IgniteSet that will
> introduce element indexes (similar to IgniteQueue). To implement this
> we can add head/tail indexes to IgniteSet header and for each
> IgniteSet element store two key-value pairs:
>  setKey, index
>  index, setKey
>
> Indexes are required to support iterator and they should be continuous.
>
> Please, see detailed description in JIRA comment [1].
>
> With such an approach, add/remove/contains operations will have O(1) time
> complexity, the iterator will work similarly to the current IgniteQueue
> iterator, and issues [2], [3] will be resolved, because PDS recovery will
> work "out of the box" and we will not use the JVM heap for duplicated
> values.
>
> Btw, we can use this implementation only for collocated mode (map
> keys/indexes to IgniteSet name) and use separate caches for
> non-collocated mode.
>
> What do you think about this?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-5553#comment-16364043
> [2] https://issues.apache.org/jira/browse/IGNITE-5553
> [3] https://issues.apache.org/jira/browse/IGNITE-7565
>
>
>

Re: IgniteSet implementation: changes required

dsetrakyan
On Thu, Feb 15, 2018 at 6:08 AM, Vladimir Ozerov <[hidden email]>
wrote:

> I do not think indexes are the right approach - sets do not have indexes, and
> you will have to maintain an additional counter in order to know when
> to stop.
>
> From what I see there are two distinct problems:
> 1) Broken recovery - this is just a bug which needs to be fixed. As soon as
> data is stored in a real persistent cache, recovery of the data structure state
> should be a trivial task.
> 2) Heap items - this should not be a problem in the common case when a set
> contains a moderate number of elements. If a set is excessively large, then
> this is not the right structure for your use case and you should use the
> standard IgniteCache API instead. What we can do is to optionally disable
> on-heap caching for a specific set at the cost of lower performance if the user
> wants so.
>

Vladimir, I am not sure I agree. In my view, a set should be similar to a
cache, just with a different API. I am not sure why we should make the
assumption that set data should be smaller than cache data, especially given
that it is a trivial task to implement a set based on the Ignite cache API (we
could just store key-key mappings in the cache instead of key-value mappings
internally).

Can you clarify why you believe that IgniteSet should need to have on-heap
entries?

D.

Re: IgniteSet implementation: changes required

Pavel Pereslegin
Hello Vladimir,

> What we can do is to optionally disable on-heap caching for specific set at the cost of lower performance if user wants so.

I want to make sure that we are speaking about the same thing. By
"on-heap caching" I mean the custom data structure and not the standard on-heap
cache; we can't disable it now, because it is part of the current
IgniteSet implementation. Is it acceptable to optionally use GridOffHeapMap
instead of ConcurrentHashMap for an "off-heap mode"?


2018-02-15 23:20 GMT+03:00 Dmitriy Setrakyan <[hidden email]>:

>
> Vladimir, I am not sure I agree. In my view, a set should be similar to a
> cache, just with a different API. I am not sure why we should make the
> assumption that set data should be smaller than cache data, especially given
> that it is a trivial task to implement a set based on the Ignite cache API (we
> could just store key-key mappings in the cache instead of key-value mappings
> internally).
>
> Can you clarify why you believe that IgniteSet should need to have on-heap
> entries?
>
> D.

Re: IgniteSet implementation: changes required

Vladimir Ozerov
I think the root issue is that we are trying to mix different cases in a
single solution. What are the common usage patterns of sets?
1) Small mostly-read sets - the current implementation is ideal for them -
everything is available locally, on-heap and in deserialized form.
2) Big data sets - the IgniteCache API is the best candidate here; we can
simply add a method "Set<K> IgniteCache.asSet()" which will return a thin
wrapper implementing the Set interface around the cache. And you will have recovery and
heap-resistance with almost no additional effort.
There is no need to choose between one or another solution. We can support
both.

As for the current IgniteSet implementation - all the issues described above
will go away if you use the approach suggested by Dmitry. I do not see a need
for any major changes.

Vladimir.
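The thin wrapper described in this message might look roughly like the sketch below. This is only an illustration: a ConcurrentMap stands in for the real IgniteCache, and the asSet() method that would return such a view is the hypothetical API discussed in the thread, not an existing one.

```java
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.concurrent.ConcurrentMap;

// Thin Set view over a cache, storing key -> key entries as suggested
// earlier in the thread. A ConcurrentMap stands in for the real cache;
// with an actual cache behind it, size() would delegate to the cache and
// iterator() to a scan query, so persistence and recovery come from the
// cache itself and no node-local mirror map is needed.
class CacheSetView<K> extends AbstractSet<K> {
    private final ConcurrentMap<K, K> cache;

    CacheSetView(ConcurrentMap<K, K> cache) {
        this.cache = cache;
    }

    @Override public boolean add(K key) {
        return cache.putIfAbsent(key, key) == null;
    }

    @Override public boolean remove(Object key) {
        return cache.remove(key) != null;
    }

    @Override public boolean contains(Object key) {
        return cache.containsKey(key);
    }

    @Override public int size() {
        return cache.size();
    }

    @Override public Iterator<K> iterator() {
        return cache.keySet().iterator();
    }
}
```

Extending AbstractSet means the rest of the java.util.Set contract (addAll, isEmpty, equals, and so on) falls out of these five methods for free.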

On Mon, Feb 19, 2018 at 10:44 AM, Pavel Pereslegin <[hidden email]> wrote:

> Hello Vladimir,
>
> > What we can do is to optionally disable on-heap caching for specific set
> at the cost of lower performance if user wants so.
>
> I want to make sure that we are speaking about the same thing. By
> "on-heap caching" I mean the custom data structure and not the standard on-heap
> cache; we can't disable it now, because it is part of the current
> IgniteSet implementation. Is it acceptable to optionally use GridOffHeapMap
> instead of ConcurrentHashMap for an "off-heap mode"?
>
>
>

Re: IgniteSet implementation: changes required

Andrey Kuznetsov
As far as I know, Pavel P. is currently working on fixing the existing sets.

As for the {{asSet}} cache adapter, I filed a ticket [1].

[1] https://issues.apache.org/jira/browse/IGNITE-7823

2018-02-27 11:20 GMT+03:00 Vladimir Ozerov <[hidden email]>:

> I think the root issue is that we are trying to mix different cases in a
> single solution. What are the common usage patterns of sets?
> 1) Small mostly-read sets - the current implementation is ideal for them -
> everything is available locally, on-heap and in deserialized form.
> 2) Big data sets - the IgniteCache API is the best candidate here; we can
> simply add a method "Set<K> IgniteCache.asSet()" which will return a thin
> wrapper implementing the Set interface around the cache. And you will have recovery and
> heap-resistance with almost no additional effort.
> There is no need to choose between one or another solution. We can support
> both.
>
> As for the current IgniteSet implementation - all the issues described above
> will go away if you use the approach suggested by Dmitry. I do not see a need
> for any major changes.
>
> Vladimir.
>

--
Best regards,
  Andrey Kuznetsov.

Re: IgniteSet implementation: changes required

Pavel Pereslegin
Hello Igniters.

I'm working on the implementation of the IgniteCache#asSet method [1],
and I think it should return Set (not IgniteSet), because IgniteSet
was introduced mainly to add methods specific to the collocated version
of the set.

Any thoughts?

[1] https://issues.apache.org/jira/browse/IGNITE-7823


2018-02-27 14:59 GMT+03:00 Andrey Kuznetsov <[hidden email]>:

> As far as I know, Pavel P. is currently working on fixing the existing sets.
>
> As for the {{asSet}} cache adapter, I filed a ticket [1].
>
> [1] https://issues.apache.org/jira/browse/IGNITE-7823
>
>
> --
> Best regards,
>   Andrey Kuznetsov.

Re: IgniteSet implementation: changes required

Dmitriy Pavlov
Hi Pavel,

I did not quite understand the purpose of the new collection and the
purpose of the newly introduced set.

Does the ticket description state that it is for better documentation?

Can we improve the task description
https://issues.apache.org/jira/browse/IGNITE-7823 ?

Sincerely,
Dmitriy Pavlov

On Wed, Mar 14, 2018 at 12:31, Pavel Pereslegin <[hidden email]> wrote:

> Hello Igniters.
>
> I'm working on the implementation of the IgniteCache#asSet method [1],
> and I think it should return Set (not IgniteSet), because IgniteSet
> was introduced mainly to add methods specific to the collocated version
> of the set.
>
> Any thoughts?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-7823
>
>
>

Re: IgniteSet implementation: changes required

Andrey Kuznetsov
Hi, Dmitry.

The primary goal of the ticket is to implement the {{IgniteCache::asSet}} view.
The rationale was given earlier in this thread by Vladimir O. I've also
mentioned the need to document the new method properly: it should be used
to create large sets. The existing {{IgniteSet}} is good for representing small
sets.

2018-03-14 20:05 GMT+03:00 Dmitry Pavlov <[hidden email]>:

> Hi Pavel,
>
> I did not quite understand the purpose of the new collection and the
> purpose of the newly introduced set.
>
> Does the ticket description state that it is for better documentation?
>
> Can we improve the task description
> https://issues.apache.org/jira/browse/IGNITE-7823 ?
>
> Sincerely,
> Dmitriy Pavlov
>
>



--
Best regards,
  Andrey Kuznetsov.

Re: IgniteSet implementation: changes required

dsetrakyan
I am not sure I like the "asSet()" method. We already have the Ignite.set(...)
method, and now we are introducing yet another one. The new design should fix the
existing implementation. We cannot keep the broken implementation around
and introduce yet another one. To be consistent, we should also stick to
the IgniteSet abstraction.

Just to be clear, I am in agreement that a non-collocated set should be
based on a cache, but I do not see why the existing API cannot be re-used.

D.


On Wed, Mar 14, 2018 at 1:26 PM, Andrey Kuznetsov <[hidden email]> wrote:

> Hi, Dmitry.
>
> The primary goal of the ticket is to implement the {{IgniteCache::asSet}} view.
> The rationale was given earlier in this thread by Vladimir O. I've also
> mentioned the need to document the new method properly: it should be used
> to create large sets. The existing {{IgniteSet}} is good for representing small
> sets.
>
>
>
>
> --
> Best regards,
>   Andrey Kuznetsov.
>