Hi, Igniters,
I want to raise a topic about our affinity node definition.

After adding baseline (affinity) topology (BL(A)T), things have become complicated. Plenty of bugs have appeared, for example:

IGNITE-8173
ignite.getOrCreateCache(cacheConfig).iterator() works incorrectly for a replicated cache if some data node is not in the baseline.

IGNITE-7628
SqlQuery hangs indefinitely when an additional node that is not registered in the baseline is present.

This happens because everything relies on the concept of an "affinity node". Until now the definition was simple: a server node which passes the node filter; in other words, any server node which is not filtered out by the node filter.

But a node which is not in BL(A)T and which passes the node filter is still treated as an affinity node, and that is definitely wrong. At the very least, it is a source of many bugs (I believe there are many more than the two I have already mentioned).

It is clear that this definition should be changed. Let's start with a new definition of "affinity topology": the set of nodes which could potentially keep data. Given the current implementation, that means:
1. for in-memory cache groups, all server nodes;
2. for persistent cache groups, BL(A)T.

I will further use Dynamic Affinity Topology (DAT) for case 1 (in-memory cache groups) and Static Affinity Topology (SAT), instead of BL(A)T, for case 2.

Denote the node filter as f(X), where X is an affinity topology. Then node A is an affinity node if A ∈ AT', where AT' = f(AT) and AT is either DAT or SAT.

It is worth mentioning that AT' is what should be passed to the affinity function of a cache group. Also, AT and AT' can change over time (BL(A)T changes, nodes join or disconnect).

I also don't like the fact that the choice between DAT and SAT depends on the persistence settings (should we make it configurable per cache group?).

I have created a ticket to implement these changes and will start working on it:
https://issues.apache.org/jira/browse/IGNITE-8380 (Affinity node calculation doesn't take into account BLT).

Also, I want to use these definitions (Affinity Topology, Affinity Node, DAT, SAT) in the documentation and javadocs. Maybe we should also consider replacing BL(A)T with SAT.

Thank you for your attention.
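P.S. To make the definition concrete, here is a minimal Java sketch of the proposed check. All the types and helpers below (Node, Cluster, CacheGroup, serverNodes(), baseline()) are made-up placeholders for illustration, not the real Ignite API:

import java.util.Collection;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class AffinityTopologySketch {
    /** AT: DAT (all server nodes) for in-memory groups, SAT (BL(A)T) for persistent ones. */
    static Collection<Node> affinityTopology(CacheGroup grp, Cluster cluster) {
        return grp.persistent() ? cluster.baseline() : cluster.serverNodes();
    }

    /** AT' = f(AT): the set that should be passed to the group's affinity function. */
    static Set<Node> filteredAffinityTopology(CacheGroup grp, Cluster cluster, Predicate<Node> nodeFilter) {
        return affinityTopology(grp, cluster).stream().filter(nodeFilter).collect(Collectors.toSet());
    }

    /** Node A is an affinity node iff A ∈ AT'. */
    static boolean isAffinityNode(Node a, CacheGroup grp, Cluster cluster, Predicate<Node> nodeFilter) {
        return filteredAffinityTopology(grp, cluster, nodeFilter).contains(a);
    }

    // Placeholder abstractions standing in for the real cluster classes.
    interface Node {}
    interface Cluster { Collection<Node> serverNodes(); Collection<Node> baseline(); }
    interface CacheGroup { boolean persistent(); }
}
|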
Eduard,
Can you please summarize the code changes that you are proposing?

I agree that BLT is a bit misleading as a term and that DAT/SAT make more sense. However, establishing a consensus on the v2.4 Baseline Topology terminology took a long time, and it seems you are going to cause a bit more perturbation. I still don't understand what should be changed and how. Please provide a summary of the upcoming class renamings and of the changes to existing parts of the system.

Best Regards,
Ivan Rakov
|
Guys,
As a user, I definitely do not want to think about BLATs, SATs, DATs, or whatever else. I want to query data, iterate over data, and send compute tasks to data. If a certain node is outside of BLAT and does not have data, then it is not an affinity node. Can we just fix the affinity logic to take BLAT into account appropriately?
|
Vladimir,
It will be fixed. But this is not the user list; we (the developers) should decide among ourselves how to move forward with these concepts. And I think that our old way of describing BLAT is convoluted and unclear (maybe even error-prone).
|
Ed,
Agreed. Can we see the proposed API changes?
|
In reply to this post by Vladimir Ozerov
+1 for Vladimir's point: adding more complexity may (and likely will) be even more misleading.

Can we take a step back and discuss why we need different behavior for persistent and in-memory caches? Can we make in-memory caches honor the baseline instead of special-casing them?

Thanks,
Stan
|
Stan,
I believe this was discussed in the design proposal thread:
http://apache-ignite-developers.2346864.n4.nabble.com/Cluster-auto-activation-design-proposal-td20295.html

The short answer: the backup factor decreases when a node leaves. In non-persistent mode we have to rebalance the data ASAP; otherwise the last node that owns a partition may fail and the data will be lost forever. This is not necessary if the data is persisted to disk storage, and that is the reason for the Baseline Topology concept.

Best Regards,
Ivan Rakov
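P.S. A tiny sketch of the distinction (illustrative names only, not the real API):

import java.util.Collection;
import java.util.Set;

final class DataLossSketch {
    // A partition is irrecoverably lost only in the in-memory case: once no alive
    // node owns it, there is no disk copy to restore from. With persistence the
    // data survives on disk until one of the owners rejoins.
    static <N> boolean partitionLost(Collection<N> owners, Set<N> aliveNodes, boolean persistent) {
        boolean noAliveOwner = owners.stream().noneMatch(aliveNodes::contains);
        return noAliveOwner && !persistent;
    }
}
|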
Ivan,
This reasoning sounds questionable to me. First, separate logic for in-memory and persistent regions means that we lose collocation between persistent and non-persistent caches. Second, the "data is still on disk" assumption may not hold if the node left due to a disk crash, or when the data has been updated on the remaining nodes.
|
Vladimir,
Automatic cluster membership changes may be implemented to grow the topology, but auto-shrinking the topology is usually not possible, because a process cannot distinguish between a node shutdown and network partitioning. If we want to deal with split-brain scenarios as a grown-up system, we should change the replication strategy within partitions to a consensus algorithm (I really hope we will). None of the consensus algorithms (at least those known to me: Paxos, Raft, ZAB) do automatic cluster adjustments based on an internally detected process failure. I consider baseline topology a step towards this model.

Addressing your second concern: if a node was down for a short period of time, we should (and we do) rebalance only the deltas, which is faster than erasing the whole node and moving all the data from scratch.
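P.S. A rough sketch of the delta idea, assuming per-partition update counters (all names are made up for illustration):

import java.util.Map;

final class DeltaRebalanceSketch {
    // If the rejoining node's counter for a partition lags behind the alive owner's
    // counter, only the updates in between need to be shipped, not the whole partition.
    static long updatesToShip(int part, Map<Integer, Long> rejoinedCntrs, Map<Integer, Long> ownerCntrs) {
        long from = rejoinedCntrs.getOrDefault(part, 0L);
        long to = ownerCntrs.getOrDefault(part, 0L);
        return Math.max(0, to - from);
    }
}
|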
Igniters,
I have introduced DAT in opposition to BLAT (SAT) because they reflect how Ignite works. But I actually have concerns about the necessity of such a separation. DAT exists only because we don't want to lose any data in in-memory caches.

But there are alternatives. Besides BLAT auto-change policies, I would pay attention to the following approach (a rough sketch follows below):
- for in-memory caches, affinity would be calculated using SAT/BLAT on the first step, and because of this collocation would work between in-memory and persistent caches;
- on the next step, if there are offline nodes, we would spread their partitions among the alive nodes. This would save us from data loss.

I don't want to propose any changes until we have consensus.
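A minimal sketch of the two steps (everything here is illustrative: the types, the AffinityFunction shape, and the fallback choice are made up, not a concrete design):

import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class TwoStepAssignmentSketch {
    interface Node {}
    interface AffinityFunction { List<List<Node>> assign(int parts, List<Node> topology); }

    static List<List<Node>> assign(int parts, List<Node> baseline, Set<Node> alive, AffinityFunction aff) {
        // Step 1: ideal assignment on the stable baseline (SAT/BLAT),
        // so in-memory and persistent groups stay collocated.
        List<List<Node>> ideal = aff.assign(parts, baseline);

        List<List<Node>> actual = new ArrayList<>(parts);
        for (int p = 0; p < parts; p++) {
            // Step 2: keep only alive owners; if none is left, re-home the
            // partition onto some alive node so in-memory data is not lost.
            List<Node> owners = ideal.get(p).stream()
                .filter(alive::contains)
                .collect(Collectors.toCollection(ArrayList::new));

            if (owners.isEmpty() && !alive.isEmpty())
                owners.add(pick(p, alive)); // e.g. hash the partition id onto the alive set

            actual.add(owners);
        }
        return actual;
    }

    private static Node pick(int part, Set<Node> alive) {
        List<Node> list = new ArrayList<>(alive);
        return list.get(Math.floorMod(part, list.size()));
    }
}
|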
> - for in-memory caches, affinity would be calculated using SAT/BLAT on the
> first step, and because of this collocation would work between in-memory
> and persistent caches;
> - on the next step, if there are offline nodes, we would spread their
> partitions among the alive nodes. This would save us from data loss.

+1 to this approach. I can't estimate how hard it is to implement, but it seems like it solves both the collocation and the data-loss issues.

Best Regards,
Ivan Rakov
|
In reply to this post by Alexey Goncharuk
Alex,
CockroachDB is based on Raft and is able to repair itself automatically [1] [2]. Their approach looks reasonable to me and is pretty much similar to MongoDB and Cassandra. In short, you distinguish between short-term and long-term failures.
1) First, you wait for a small time window in the hope that it was a network glitch or a restart. Even if this was a segmentation, with a true consensus algorithm this is not an issue - your partitions or the whole cluster are unavailable during this window.
2) Then, if the majority is still there and the cluster is operational, you trigger automatic rebalance.
3) Last, if you need fine-grained control, you can tune or disable auto-rebalance and do some manual magic.

This is a very nice approach: it is simple for simple use cases and complex for complex use cases. Ideally, this is how Ignite should work. Want to play and write a hello-world app? Just learn what a cache is. Started developing a moderately complex application? Learn about affinity, cache modes, etc. Going to enterprise scale? Learn about BLAT, activation, etc.

It seems that the old behavior without BLAT and even without manual activation would be enough for the majority of our users. At the very least, it is enough for the order-of-magnitude more popular Cassandra and MongoDB.

[1] https://www.cockroachlabs.com/docs/stable/frequently-asked-questions.html#how-does-cockroachdb-survive-failures
[2] https://www.cockroachlabs.com/docs/stable/training/fault-tolerance-and-automated-repair.html

On Tue, Apr 24, 2018 at 7:55 PM, Alexey Goncharuk <[hidden email]> wrote:

> Vladimir,
>
> Automatic cluster membership changes may be implemented to grow the
> topology, but auto-shrinking the topology is usually not possible, because
> a process cannot distinguish between a node shutdown and a network
> partitioning. If we want to deal with split-brain scenarios as a grown-up
> system, we should change the replication strategy within partitions to a
> consensus algorithm (I really hope we will). None of the consensus
> algorithms known to me (Paxos, Raft, ZAB) does automatic cluster
> adjustment based on an internally detected process failure. I consider
> baseline topology a step towards this model.
>
> Addressing your second concern: if a node was down for a short period of
> time, we should (and we do) rebalance only deltas, which is faster than
> erasing the whole node and moving all data from scratch.
>
> 2018-04-24 19:42 GMT+03:00 Vladimir Ozerov <[hidden email]>:
>
> > Ivan,
> >
> > This reasoning sounds questionable to me. First, separate logic for
> > in-memory and persistent regions means that we lose collocation between
> > persistent and non-persistent caches. Second, the "data is still on
> > disk" assumption might not be valid if a node has left due to a disk
> > crash, or when data is updated on the remaining nodes.
> >
> > Tue, Apr 24, 2018 at 19:21, Ivan Rakov <[hidden email]>:
> >
> > > Stan,
> > >
> > > I believe it was discussed in the design proposal thread:
> > > http://apache-ignite-developers.2346864.n4.nabble.com/Cluster-auto-activation-design-proposal-td20295.html
> > >
> > > The short answer: the backup factor decreases if a node leaves. In
> > > non-persistent mode we have to rebalance data ASAP - otherwise the
> > > last node that owns a partition may fail, and the data will be lost
> > > forever. This is not necessary if data is persisted to disk storage;
> > > that's the reason for the Baseline Topology concept.
> > >
> > > Best Regards,
> > > Ivan Rakov
> > >
> > > On 24.04.2018 18:48, Stanislav Lukyanov wrote:
> > > > + for Vladimir's point - adding more complexity may (and likely
> > > > will) be even more misleading.
> > > >
> > > > Can we take a step back and discuss why we need different behavior
> > > > for persistent and in-memory caches? Can we make in-memory caches
> > > > honor baseline instead of special-casing them?
> > > >
> > > > Thanks,
> > > > Stan
> > > >
> > > > Tue, Apr 24, 2018 at 18:28, Vladimir Ozerov <[hidden email]>:
> > > >
> > > >> Guys,
> > > >>
> > > >> As a user, I definitely do not want to think about BLATs, SATs,
> > > >> DATs, whatsoever. I want to query data, iterate over data, and send
> > > >> compute tasks to data. If a certain node is outside of BLAT and
> > > >> does not have data, then it is not an affinity node. Can we just
> > > >> fix the affinity logic to take BLAT into account appropriately?
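As a concrete illustration of the three-step policy above, here is a minimal Java sketch of a failure handler that waits out short-term failures and escalates long-term ones to a rebalance. All class and method names are hypothetical; this is not Ignite or CockroachDB code, just the shape of the idea:

import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical failure handler illustrating the three steps above. */
public class AutoRepairPolicy {
    private final Duration window;       // step 1: grace period for glitches/restarts
    private final boolean autoRebalance; // step 3: can be disabled for manual control
    private final Map<UUID, Instant> suspects = new ConcurrentHashMap<>();

    public AutoRepairPolicy(Duration window, boolean autoRebalance) {
        this.window = window;
        this.autoRebalance = autoRebalance;
    }

    /** A node disappeared: do not repair yet, just remember when. */
    public void onNodeLeft(UUID nodeId) {
        suspects.put(nodeId, Instant.now());
    }

    /** The node came back within the window: a short-term failure, no data movement. */
    public void onNodeReturned(UUID nodeId) {
        suspects.remove(nodeId);
    }

    /** Called periodically: escalate long-term failures to a rebalance. */
    public void onTick(int aliveNodes, int totalNodes) {
        // Step 2: act only while a majority is operational; otherwise the
        // cluster stays unavailable and we keep waiting.
        if (aliveNodes * 2 <= totalNodes)
            return;

        Instant deadline = Instant.now().minus(window);

        suspects.entrySet().removeIf(e -> {
            if (e.getValue().isAfter(deadline))
                return false; // still inside the grace window (step 1)

            if (autoRebalance) // step 3: may be turned off for manual magic
                triggerRebalance(e.getKey());

            return true;
        });
    }

    private void triggerRebalance(UUID failedNode) {
        System.out.println("Re-replicating partitions owned by " + failedNode);
    }
}

The important property is that a node returning within the window cancels the repair entirely, so restarts and short glitches cost nothing.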
Well, this means that the concept of baseline is still needed, because we must not reassign partitions immediately (note that this is not identical to rebalance delay!). The approach you describe is identical to baseline change policies, and I have nothing against it; their implementation was planned for phase II of the baseline changes.

2018-04-24 21:31 GMT+03:00 Vladimir Ozerov <[hidden email]>:

> 1) First, you wait for a small time window in the hope that it was a
> network glitch or a restart.
> 2) Then, if the majority is still there and the cluster is operational,
> you trigger automatic rebalance.
> 3) Last, if you need fine-grained control, you can tune or disable
> auto-rebalance and do some manual magic.
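For what it's worth, such a change policy could surface to users as a couple of cluster-level knobs. The sketch below uses the baseline auto-adjust methods that were added to IgniteCluster only in later releases; at the time of this discussion, treat the names as an assumed shape of the feature rather than an existing API:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class BaselineChangePolicyExample {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        // Let the baseline follow topology changes automatically
        // instead of requiring a manual baseline update.
        ignite.cluster().baselineAutoAdjustEnabled(true);

        // Grace window in milliseconds: a node that leaves and returns
        // within this window causes no baseline change and therefore
        // no partition reassignment or data movement.
        ignite.cluster().baselineAutoAdjustTimeout(60_000);
    }
}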
Right, as far as I understand, we are not arguing about whether BLT is needed or not. The main questions are how to properly deliver this feature to users and how to deal with co-location issues between persistent and non-persistent caches. Looks like change policies are the way to go for the first question.

As for co-location, it is important to note that different affinity distributions for in-memory and persistent caches automatically mean that we lose SQL joins and the predictable behavior of any affinity-based operations. It means that if we calculated the same affinity for persistent and in-memory caches at some point, we cannot re-distribute in-memory caches differently if some nodes go down without breaking co-located computations, am I right?

On Tue, Apr 24, 2018 at 10:19 PM, Alexey Goncharuk <[hidden email]> wrote:

> Well, this means that the concept of baseline is still needed, because we
> must not reassign partitions immediately (note that this is not identical
> to rebalance delay!). The approach you describe is identical to baseline
> change policies, and I have nothing against it; their implementation was
> planned for phase II of the baseline changes.
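The co-location concern can be checked directly with the public affinity API (the cache names below are made up for illustration): a co-located SQL join or affinityRun is only correct while both caches map the same key to the same node.

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterNode;

public class ColocationCheck {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        int key = 42;

        // Where does each cache place the same affinity key right now?
        ClusterNode persistentOwner = ignite.affinity("persistentCache").mapKeyToNode(key);
        ClusterNode inMemoryOwner = ignite.affinity("inMemoryCache").mapKeyToNode(key);

        // If the in-memory cache were re-distributed over all server nodes
        // while the persistent cache stayed on the baseline, these owners
        // would diverge and co-located operations would silently break.
        System.out.println("Co-located: " + persistentOwner.id().equals(inMemoryOwner.id()));
    }
}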
On Wed, Apr 25, 2018 at 4:13 AM, Vladimir Ozerov <[hidden email]> wrote:

> It means that if we calculated the same affinity for persistent and
> in-memory caches at some point, we cannot re-distribute in-memory caches
> differently if some nodes go down without breaking co-located
> computations, am I right?

Vova, you are right, but this is rather an edge case. I doubt there are many users out there who will need to join memory-only with persistent caches. What you are describing would be nice to support, but I would not make it a hard requirement. However, if we choose not to support it, we should have a very good explanation for why not.
Guys,
I have started a new topic to address the issue with DAT [1].

[1] http://apache-ignite-developers.2346864.n4.nabble.com/IEP-4-Phase-2-Using-BL-A-T-for-in-memory-caches-td29942.html

On Tue, Apr 24, 2018 at 11:43 PM, Dmitriy Setrakyan <[hidden email]> wrote:

> Vova, you are right, but this is rather an edge case. I doubt there are
> many users out there who will need to join memory-only with persistent
> caches. What you are describing would be nice to support, but I would not
> make it a hard requirement. However, if we choose not to support it, we
> should have a very good explanation for why not.