Hello, Igniters!
I would like to discuss the implementation of ticket IGNITE-6871.

In our Ignite instance there are more than 1000 caches and about 10 cache groups. To minimize the probability of data loss, we need an alert when a critical redundancy level is reached in the cluster. So we need a metric that reports the minimal partition redundancy level for a cache group.

Currently there are no MXBeans for cache groups. And since cache groups were introduced, some metrics from CacheMetricsMXBean actually show information about the cache group, not about the cache.

I can implement the new metric (minimal partition redundancy level for a cache group) in CacheMetricsMXBean, the same way it was done before. In that case we would either need to monitor this metric for all caches, or somehow obtain the cache-to-cache-group mapping and monitor the metric for only one cache per cache group. But it is not transparent to an administrator which cache groups exist and which caches belong to which cache group.

Alternatively, I can implement a new type of MXBean for cache groups and add the new metric to it. Later it may also be useful to move some other cache-group-related metrics, which are now implemented in CacheMetricsMXBean, to this MXBean.

So, should I extend the existing CacheMetricsMXBean or create a new type of MXBean for cache groups?
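For illustration, a minimal sketch of what such an MXBean for cache groups might look like (the interface and method names below are just placeholders for the discussion, not an existing API):

    // Hypothetical interface, sketched only to illustrate the proposal.
    public interface CacheGroupMetricsMXBean {
        /** Minimal number of copies (primary + backups) among all partitions of the group. */
        public int getMinimumNumberOfPartitionCopies();

        /** Name of the cache group. */
        public String getGroupName();
    }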
Hi Alex,
I think the proper approach would be to have a separate MBean for cache groups. It should show average metrics across all the caches in the group, and some additional metrics as well. Agree?

Also, I am not sure I understand what "partition redundancy level" is and what that metric would show. Can you explain?

D.
Hello Dmitriy,
I agree.

By "minimal partition redundancy level for cache group" I mean the minimal number of partition copies among all partitions of this cache group. For example, if in our cluster the cache group has one partition with 2 copies (1 primary and 1 backup) and all other partitions with 4 copies (1 primary and 3 backups), then the minimal partition redundancy level for this cache group will be 2.
Alex,

Such a configuration within the same group would be impossible. All caches within the same group have an identical total number of partitions and an identical number of backups. If that is not the case, they fall into different groups.

D.
It's not about caches.
Each partition has a certain number of copies, and the number of copies may differ between partitions of the same cache group. Such a configuration is possible:
1) With a custom affinity function (see the sketch below);
2) When nodes have left the cluster and rebalancing has not finished yet.
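To illustrate case 1: an affinity function returns, for each partition, the list of nodes that hold its copies, and nothing forces those lists to be the same length. A purely illustrative sketch (not a production affinity function; even partitions get 2 owners, odd ones get 4):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.UUID;

    import org.apache.ignite.cache.affinity.AffinityFunction;
    import org.apache.ignite.cache.affinity.AffinityFunctionContext;
    import org.apache.ignite.cluster.ClusterNode;

    public class UnevenAffinityFunction implements AffinityFunction {
        private static final int PARTS = 1024;

        @Override public void reset() {
            // No-op: no internal state to reset.
        }

        @Override public int partitions() {
            return PARTS;
        }

        @Override public int partition(Object key) {
            return (key.hashCode() & Integer.MAX_VALUE) % PARTS;
        }

        @Override public List<List<ClusterNode>> assignPartitions(AffinityFunctionContext ctx) {
            List<ClusterNode> nodes = ctx.currentTopologySnapshot();
            List<List<ClusterNode>> assignment = new ArrayList<>(PARTS);

            for (int part = 0; part < PARTS; part++) {
                // Even partitions: 2 copies; odd partitions: 4 copies (capped by cluster size).
                int owners = Math.min(nodes.size(), (part % 2 == 0) ? 2 : 4);

                assignment.add(new ArrayList<>(nodes.subList(0, owners)));
            }

            return assignment;
        }

        @Override public void removeNode(UUID nodeId) {
            // No-op: stateless function.
        }
    }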
Alex,

I think you are talking about the case when the cluster temporarily gets into an unbalanced state and needs to rebalance. However, I am still not sure what this metric would show. Can you provide an example?

D.
The example was in my previous letters: if in our cluster the cache group has one partition with 2 copies (1 primary and 1 backup) and all other partitions with 4 copies (1 primary and 3 backups), then the minimal partition redundancy level for this cache group will be 2.

Maybe code will be clearer than my description. I think it will be something like this:

    // Start from the highest possible value and lower it as partitions are scanned.
    int minRedundancyLevel = Integer.MAX_VALUE;

    for (int part = 0; part < partitions; part++) {
        int partRedundancyLevel = 0;

        // Count the nodes that own a copy of this partition.
        for (Map.Entry<UUID, GridDhtPartitionMap> entry : partFullMap.entrySet()) {
            if (entry.getValue().get(part) == GridDhtPartitionState.OWNING)
                partRedundancyLevel++;
        }

        if (partRedundancyLevel < minRedundancyLevel)
            minRedundancyLevel = partRedundancyLevel;
    }
Alex,
I am really confused. What do you need to know the "minimal partition redundancy" for? What will it give you?

D.
We have a target redundancy level of 4. If, for some reason, the minimal redundancy level reaches 1, then every subsequent node leaving the cluster may cause data loss or service unavailability.
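For example, an external monitor could poll such a bean over JMX and raise an alert. A sketch, assuming the bean exposes a MinimumNumberOfPartitionCopies attribute (the JMX URL and object name below are made-up examples; the real object name depends on how the bean gets registered):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class RedundancyAlert {
        public static void main(String[] args) throws Exception {
            // Connect to a node's JMX endpoint (host and port are examples).
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:49112/jmxrmi");

            try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();

                // Hypothetical object name for the cache group bean.
                ObjectName bean = new ObjectName(
                    "org.apache:group=\"Cache groups\",name=\"myGroup\"");

                int minCopies = (Integer)mbs.getAttribute(bean, "MinimumNumberOfPartitionCopies");

                // With a target redundancy of 4, alert well before a single copy remains.
                if (minCopies <= 1)
                    System.err.println("ALERT: next node failure may cause data loss!");
            }
        }
    }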
Got it, but I do not like the name of the metric; I think it is confusing.

I would provide the following metrics:
- minNumberOfCopies()
- maxNumberOfCopies()

Will this work for you?

D.
Ok, I will rename the metrics.
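A sketch of the earlier loop extended to track both values (still pseudocode against the internal partition map, so details may change in the final patch):

    int minCopies = Integer.MAX_VALUE;
    int maxCopies = 0;

    for (int part = 0; part < partitions; part++) {
        int copies = 0;

        // Count the nodes that own a copy of this partition.
        for (GridDhtPartitionMap map : partFullMap.values()) {
            if (map.get(part) == GridDhtPartitionState.OWNING)
                copies++;
        }

        minCopies = Math.min(minCopies, copies);
        maxCopies = Math.max(maxCopies, copies);
    }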