Thanks Sasha!
Resending to the dev list.

D.

On Fri, Jul 8, 2016 at 2:02 PM, Alexandre Boudnik <[hidden email]> wrote:
> Apache Ignite is a great platform, but it lacks certain capabilities
> that are common in the RDBMS world, such as:
> - Consistent on-line backup of data on the entire cluster (or for a
>   specified set of caches)
> - Hierarchical snapshots for a specified set of caches
> - Transaction log
> - Restore cluster state as of a certain point in time
> - Rolling forward from a snapshot with the ability to filter/modify
>   transactions
> - Asynchronous replication based either on log shipment or snapshot
>   shipment
>   -- Between clusters
>   -- Continuous data export to, let’s say, an RDBMS
> It is also necessary to reduce cold start time for huge clusters
> with strict SLAs.
>
> I'll put some implementation ideas in JIRA later on. I believe that
> this list is far from complete, but I want the community to
> discuss the above-mentioned use cases.
>
> --Sasha
My answers are inline…
On Sat, Jul 9, 2016 at 3:04 AM, Dmitriy Setrakyan <[hidden email]> wrote:
> On Fri, Jul 8, 2016 at 2:02 PM, Alexandre Boudnik <[hidden email]> wrote:
>> Apache Ignite is a great platform, but it lacks certain capabilities
>> that are common in the RDBMS world, such as:
>> - Consistent on-line backup of data on the entire cluster (or for a
>>   specified set of caches)

I think you mean data center replication here. It is not an easy feature to
implement, and so far has been handled by commercial vendors of Ignite,
e.g. GridGain.

>> - Hierarchical snapshots for a specified set of caches

What do you mean by hierarchical?

>> - Transaction log

Why does Ignite need it for in-memory transactions?

>> - Restore cluster state as of a certain point in time

Given that such restorability may introduce lots of memory overhead, does
it really make sense for an in-memory cache?

>> - Rolling forward from a snapshot with the ability to filter/modify
>>   transactions

Same as above.

>> - Asynchronous replication based either on log shipment or snapshot
>>   shipment
>>   -- Between clusters

This is the same as data center replication, no?

>>   -- Continuous data export to, let’s say, an RDBMS

Don’t we already support it with our write-through feature to a database?

>> It is also necessary to reduce cold start time for huge clusters
>> with strict SLAs.

What part are you trying to speed up here? Are you talking about loading
data from databases?

>> I'll put some implementation ideas in JIRA later on. I believe that
>> this list is far from complete, but I want the community to
>> discuss the above-mentioned use cases.
>>
>> --Sasha
On Mon, Jul 11, 2016 at 08:15AM, Dmitriy Setrakyan wrote:
> I think you mean data center replication here. It is not an easy feature to
> implement, and so far has been handled by commercial vendors of Ignite,
> e.g. GridGain.

Apache Geode (incubating) has added the DC replication a couple months ago, so
I don't see why Ignite shouldn't?

Cos
Dmitriy, thank you for your time and questions, which helped me to
realize what I forgot to mention!
See my answers inline; later I'll combine everything together to help
future readers :)

I put together some implementation ideas in the Apache Ignite JIRA, as
promised: https://issues.apache.org/jira/browse/IGNITE-3457. I see
this facility as another CacheStore implementation, so it wouldn't
interfere with the basic principles of the Ignite platform.

On Mon, Jul 11, 2016 at 1:15 AM, Dmitriy Setrakyan <[hidden email]> wrote:
>> - Consistent on-line backup of data on the entire cluster (or for a
>>   specified set of caches)
>
> I think you mean data center replication here. It is not an easy feature to
> implement, and so far has been handled by commercial vendors of Ignite,
> e.g. GridGain.

Actually not. Right here I meant exactly what I said: a full or
incremental backup of all/selected caches in a consistent state, so that
they can be restored in case of data loss or data corruption. One
important use case is OLAP systems (let's say for banking) that have
been built on the Apache Ignite platform.

And you are right, data center replication can be easily implemented based
on log/snapshot shipment.

>> - Hierarchical snapshots for a specified set of caches
>
> What do you mean by hierarchical?

In this particular case the notion of hierarchical snapshots is very
similar to the same notion used in SAN appliances or by VirtualBox or
VMware. Using the concept of snapshots we can do all these amazing things:
- full and incremental backup
- restore
- rollback to checkpoint
- roll forward
much more easily, with minimal memory and I/O overhead.

>> - Transaction log
>
> Why does Ignite need it for in-memory transactions?

At the very least it is required to provide roll-forward functionality, where
you restore the state of the cache from a checkpoint (the cache state
before the snapshot was made) and then reapply transactions one by
one.

>> - Restore cluster state as of a certain point in time
>
> Given that such restorability may introduce lots of memory overhead, does
> it really make sense for an in-memory cache?

Actually, it will not consume any RAM. It will use external storage,
such as HDD/SSD space, instead. And yes, I think that this
functionality makes complete sense for our users IRL, who will love
it.

>> - Rolling forward from a snapshot with the ability to filter/modify
>>   transactions
>
> Same as above

The same as above: my customers in the trenches are begging for that feature.

>> - Asynchronous replication based either on log shipment or snapshot
>>   shipment
>>   -- Between clusters
>
> This is the same as data center replication, no?

Including but not limited to that: log shipment or snapshot shipment could
also be used to implement a so-called "better-than-lambda-architecture"
for BI and OLAP, where data is replicated to a queryable data source
(let's say Oracle) as soon as it is produced by the OLTP system. We can use
an RDBMS API such as Oracle Streams (going to be discontinued - sad) or
GoldenGate to filter changes from logs/snapshots and then apply them.
That approach makes it possible to preserve tons of legacy reports and BI
dashboards.

>>   -- Continuous data export to, let’s say, an RDBMS
>
> Don’t we already support it with our write-through feature to a database?

When write-through is used for non-local caches, it may cause data
corruption in the RDBMS. I opened this issue a few weeks ago:
https://issues.apache.org/jira/browse/IGNITE-3321

>> It is also necessary to reduce cold start time for huge clusters
>> with strict SLAs.
>
> What part are you trying to speed up here? Are you talking about loading
> data from databases?

I'm talking about the initial load from the Persistent Store when the cluster
has been cold-started (like from GridGain's Local Recoverable Store).
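The roll-forward idea described above (restore the cache state from a checkpoint, then reapply logged transactions one by one, optionally filtering them) can be sketched in plain Java. This is only an illustration under stated assumptions: `RollForwardSketch`, `TxRecord`, and `rollForward` are hypothetical names that exist neither in Apache Ignite nor in the IGNITE-3457 proposal, the snapshot is modeled as a plain in-memory map, and the log is assumed to be ordered by sequence number.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these types are part of Apache Ignite. */
final class RollForwardSketch {

    /** One committed transaction: a sequence number plus the writes it made (null value = remove). */
    static final class TxRecord {
        final long seq;
        final Map<String, Integer> writes;

        TxRecord(long seq, Map<String, Integer> writes) {
            this.seq = seq;
            this.writes = writes;
        }
    }

    /**
     * Copies the snapshot, then reapplies logged transactions in order up to and
     * including upToSeq, skipping any transaction the filter rejects.
     * Assumes the log is sorted by ascending sequence number.
     */
    static Map<String, Integer> rollForward(Map<String, Integer> snapshot,
                                            List<TxRecord> log,
                                            long upToSeq,
                                            Predicate<TxRecord> filter) {
        Map<String, Integer> state = new HashMap<>(snapshot); // the snapshot itself stays untouched
        for (TxRecord tx : log) {
            if (tx.seq > upToSeq) {
                break; // reached the requested point in time
            }
            if (!filter.test(tx)) {
                continue; // transaction filtered out during roll-forward
            }
            for (Map.Entry<String, Integer> w : tx.writes.entrySet()) {
                if (w.getValue() == null) {
                    state.remove(w.getKey());
                } else {
                    state.put(w.getKey(), w.getValue());
                }
            }
        }
        return state;
    }

    public static void main(String[] args) {
        Map<String, Integer> snapshot = new HashMap<>();
        snapshot.put("a", 1);

        List<TxRecord> log = new ArrayList<>();
        log.add(new TxRecord(1, Collections.singletonMap("a", 2)));
        log.add(new TxRecord(2, Collections.singletonMap("b", 5)));

        // Restore the state as of transaction 2, applying every logged transaction.
        System.out.println(rollForward(snapshot, log, 2, tx -> true));
    }
}
```

Keeping the snapshot immutable and producing a new state map means the same checkpoint can be rolled forward to several different points in time, which is the property the point-in-time restore item in the list relies on.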
On Tue, Jul 12, 2016 at 6:44 PM, Konstantin Boudnik <[hidden email]> wrote:
> On Mon, Jul 11, 2016 at 08:15AM, Dmitriy Setrakyan wrote:
> >
> > I think you mean data center replication here. It is not an easy feature to
> > implement, and so far has been handled by commercial vendors of Ignite,
> > e.g. GridGain.
>
> Apache Geode (incubating) has added the DC replication a couple months ago, so
> I don't see why Ignite shouldn't?
>
> Cos

Thanks, Cos! I didn’t know that. I will look into it.
Hi Alex,
I believe most of your comments have to do with disk-based functionality,
especially in regard to backups, snapshots, etc. However, Ignite is
currently an in-memory system, at least for the nearest future. Let me know
if I misunderstood something.

D.

On Tue, Jul 12, 2016 at 9:44 PM, Alexandre Boudnik <[hidden email]> wrote:
> Actually not. Right here I meant exactly what I said: full or
> incremental backup of all/selected caches in consistent state so it
> can be used for the purpose of being able to restore them in case of
> data loss or data corruption.
> [...]
On Wed, Jul 13, 2016 at 05:30AM, Dmitriy Setrakyan wrote:
> Hi Alex,
>
> I believe most of your comments have to do with disk-based functionality,
> especially in regard to backups, snapshots, etc. However, Ignite is
> currently an in-memory system, at least for the nearest future. Let me know
> if I misunderstood something.

And the nearest future is defined by....? This is a collaborative project, as
you all learned during the incubation, and statements like "the X only
does bar for now" should be consensual. If there's a will to work on new
functionality which is demanded by the users, and the said functionality is
expected to expand the applicability of the technology, I don't really see
why and how it could be put on hold.

Fortunately, there are a number of ways this development could be put through,
and it doesn't really require many moving parts (in fact it is done all
the time in the same way right now): let's put the new development on a
branch, and start moving. There's JIRA and there's the CI to help
validate and coordinate the work. Sounds like an easy decision to me.

Cos
On Thu, Jul 14, 2016 at 9:07 PM, Konstantin Boudnik <[hidden email]> wrote:
> On Wed, Jul 13, 2016 at 05:30AM, Dmitriy Setrakyan wrote:
> > I believe most of your comments have to do with disk-based functionality,
> > especially in regard to backups, snapshots, etc. However, Ignite is
> > currently an in-memory system, at least for the nearest future. Let me know
> > if I misunderstood something.
>
> And the nearest future is defined by....? This is a collaborative project, as
> you all learned during the incubation, and statements like "the X only
> does bar for now" should be consensual. If there's a will to work on new
> functionality which is demanded by the users, and the said functionality is
> expected to expand the applicability of the technology, I don't really see
> why and how it could be put on hold.
>
> Fortunately, there are a number of ways this development could be put through,
> and it doesn't really require many moving parts (in fact it is done all
> the time in the same way right now): let's put the new development on a
> branch, and start moving. There's JIRA and there's the CI to help
> validate and coordinate the work. Sounds like an easy decision to me.
>
> Cos

Cos, the nearest future is defined by the community, of course. Take a look
at the Ignite 2.0 discussion which is taking place on another thread [1].

Any disk-based functionality will require a significant re-architecture of
the memory model, which is already planned for Ignite 2.0 as part of
IGNITE-3477 [2] and IGNITE-3478 [3]. I believe Alexey G. has already
started making significant progress on it. Note that in-memory snapshots
are already defined as a part of this work.

If the community decides to add disk-based features, I am all for it. We
can start a discussion on it now, but the implementation should come after
Ignite 2.0, to avoid any conflicts in architecture, design, or code.
Just my two cents.

[1] - http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-2-0-tasks-roadmap-td9585.html
[2] - https://issues.apache.org/jira/browse/IGNITE-3477
[3] - https://issues.apache.org/jira/browse/IGNITE-3478
Dmitriy,
It looks like Konstantin is talking about specific case, when you specified readThrough/writeThrough mode for your caches. In a such mode all your WRITE operations and some portion of READ operation are inevitably disk-based. Thus all the suggested enhancements are about readThrough/writeThrough mode only. Igor Rudyak On Thu, Jul 14, 2016 at 3:00 PM, Dmitriy Setrakyan <[hidden email]> wrote: > On Thu, Jul 14, 2016 at 9:07 PM, Konstantin Boudnik <[hidden email]> > wrote: > > > On Wed, Jul 13, 2016 at 05:30AM, Dmitriy Setrakyan wrote: > > > Hi Alex, > > > > > > I believe most of your comments have to do with disk-based > functionality, > > > especially in regard to backups, snapshots, etc. However, Ignite is > > > currently an in-memory system, at least for the nearest future. Let me > > know > > > if I misunderstood something. > > > > And the nearest future is defined by....? This is a collaborative > project, > > as > > you all learned during the incubation, and the statements like "the X > only > > does bar for now" should be consensual. If there's a will to work on the > > new > > functionality which is demanded by the users, and the said functionality > is > > expected to expand the applicability of the technology - I don't really > see > > why and how it could be put to hold. > > > > Fortunately, there are a number of ways this development could be put > > through, > > and it doesn't really require much of the moving parts (in fact it is > done > > all > > the time in the same way right now): let's put the new development on a > > branch, and start moving. There's JIRA and there's the CI to help to > > validate > > and coordinate the work. Sounds like an easy decision to me. > > > > Cos, the nearest future is defined by the community, of course. Take a look > at the Ignite 2.0 discussion which is taking place on another thread [1]. 
> > Any disk-based functionality will require some significant memory-model > rearchitecture, which is already planned for Ignite 2.0 as part of > IGNITE-3477 [2] and IGNITE-3478 [3]. I believe Alexey G. has already > started making significant progress on it. Note that in-memory snapshots > are already defined as a part of this work. > > If the community decides to add disk based features, I am all for it. We > can start a discussion on it now, but the implementation should come after > the Ignite 2.0, to avoid any conflicts in architecture, design, or code. > Just my 0.02 cents. > > [1] - > > http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-2-0-tasks-roadmap-td9585.html > [2] - https://issues.apache.org/jira/browse/IGNITE-3477 > [3] - https://issues.apache.org/jira/browse/IGNITE-3478 > > > > Cos > > > > > On Tue, Jul 12, 2016 at 9:44 PM, Alexandre Boudnik < > > > [hidden email]> wrote: > > > > > > > Dmitriy, thank you for your time and questions, which helped me to > > > > realize what I forget to mentioned! > > > > See my answers inline; later I'll combine everything together to help > > > > to the next readers :) > > > > > > > > I put together some implementation ideas in Apache Ignite JIRA, as > > > > promised: https://issues.apache.org/jira/browse/IGNITE-3457. I see > > > > this facility as another CacheStore implementation, so it wouldn't > > > > interfere with base principals of Ignite platform. > > > > > > > > > > > > On Mon, Jul 11, 2016 at 1:15 AM, Dmitriy Setrakyan > > > > <[hidden email]> wrote: > > > > > My answers are inline… > > > > > > > > > > On Sat, Jul 9, 2016 at 3:04 AM, Dmitriy Setrakyan < > > [hidden email] > > > > > > > > > > wrote: > > > > > > > > > >> Thanks Sasha! > > > > >> > > > > >> Resending to the dev list. > > > > >> > > > > >> D. 
> > > > >>
> > > > >> On Fri, Jul 8, 2016 at 2:02 PM, Alexandre Boudnik <[hidden email]> wrote:
> > > > >>
> > > > >>> Apache Ignite is a great platform, but it lacks certain capabilities which are common in the RDBMS world, such as:
> > > > >>> - Consistent on-line backup for data on entire cluster (or for specified set of caches)
> > > > >
> > > > > I think you mean data center replication here. It is not an easy feature to implement, and so far has been handled by commercial vendors of Ignite, e.g. GridGain.
> > > >
> > > > Actually not. Right here I meant exactly what I said: a full or incremental backup of all/selected caches in a consistent state, so it can be used to restore them in case of data loss or data corruption. One important use case is OLAP systems (let's say for banking) that have been built on the Apache Ignite platform.
> > > >
> > > > And you are right, data center replication can be easily implemented based on log/snapshot shipment.
> > > >
> > > > >>> - Hierarchical snapshots for specified set of caches
> > > > >
> > > > > What do you mean by hierarchical?
> > > >
> > > > In this particular case the notion of hierarchical snapshots is very similar to the same notion used in SAN appliances or by VirtualBox or VMware. Using the concept of snapshots we can do all these amazing things:
> > > > - full and incremental backup
> > > > - restore
> > > > - rollback to checkpoint
> > > > - roll forward
> > > > much easier, with minimal memory and I/O overhead.
> > > >
> > > > >>> - Transaction log
> > > > >
> > > > > Why does Ignite need it for in-memory transactions?
> > > >
> > > > At least it is required to provide roll-forward functionality, when you restore the state of the cache from a checkpoint (the cache state before the snapshot was made) and then reapply transactions one by one.
> > > >
> > > > >>> - Restore cluster state as of certain point in time
> > > > >
> > > > > Given that such restorability may introduce lots of memory overhead, does it really make sense for an in-memory cache?
> > > >
> > > > Actually, it will not consume any memory. It will use external storage, such as HDD/SSD space, instead. And yes, I think that this functionality makes complete sense for our users IRL, who will love it.
> > > >
> > > > >>> - Rolling forward from snapshot with ability to filter/modify transactions
> > > > >
> > > > > Same as above
> > > >
> > > > The same as above: my customers in the trenches are begging for that feature.
> > > >
> > > > >>> - Asynchronous replication based either on log shipment or snapshot shipment
> > > > >>> -- Between clusters
> > > > >
> > > > > This is the same as data center replication, no?
> > > >
> > > > Including but not limited to that: log shipment or snapshot shipment could also be used to implement a so-called "better-than-lambda-architecture" for BI and OLAP, where data is replicated to a queryable data source, let's say Oracle, as soon as it is produced by the OLTP system. We can use an RDBMS API such as Oracle Streams (going to be discontinued - sad) or GoldenGate to filter changes from logs/snapshots and then apply them. That approach allows us to save tons of legacy reports and BI dashboards.
> > > >
> > > > >>> -- Continuous data export to, let’s say, an RDBMS
> > > > >
> > > > > Don’t we already support it with our write-through feature to a database?
> > > >
> > > > When write-through is used for non-local caches it may cause data corruption in the RDBMS. I opened this issue a few weeks ago: https://issues.apache.org/jira/browse/IGNITE-3321
> > > >
> > > > >>> It is also a necessity to reduce cold start time for huge clusters with strict SLAs.
> > > > >
> > > > > What part are you trying to speed up here? Are you talking about loading data from databases?
> > > >
> > > > I'm talking about the initial load from the Persistent Store when the cluster has been cold-started (like from GridGain's Local Recoverable Store).
> > > >
> > > > >>> I'll put some implementation ideas in JIRA later on. I believe that this list is far from being complete, but I want the community to discuss these abovementioned use cases.
> > > > >>>
> > > > >>> --Sasha
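
For readers following the quoted discussion, the roll-forward idea Sasha describes (restore a cache from a checkpoint, then reapply logged transactions one by one, optionally filtering or modifying them) can be illustrated with a minimal sketch. This is not Ignite code and does not use any Ignite API: the names LoggedTx, roll_forward, up_to_tx and transform are all hypothetical, the "snapshot" is a plain dictionary, and the "log" is an in-memory list assumed to hold only committed transactions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class LoggedTx:
    """One committed transaction in a hypothetical log.

    writes maps a key to its new value; a value of None means "delete the key".
    """
    tx_id: int
    writes: Dict[str, Optional[str]]


def roll_forward(
    snapshot: Dict[str, str],
    log: Iterable[LoggedTx],
    up_to_tx: Optional[int] = None,
    transform: Callable[[LoggedTx], Optional[LoggedTx]] = lambda tx: tx,
) -> Dict[str, str]:
    """Rebuild cache state from a snapshot plus a transaction log (illustrative only).

    up_to_tx  -- stop after this transaction id (point-in-time restore);
                 None replays the whole log.
    transform -- hook that may return a modified transaction, or None to skip it.
    """
    state = dict(snapshot)  # never mutate the snapshot itself
    for tx in sorted(log, key=lambda t: t.tx_id):  # replay in commit order
        if up_to_tx is not None and tx.tx_id > up_to_tx:
            break
        replayed = transform(tx)
        if replayed is None:  # transaction filtered out
            continue
        for key, value in replayed.writes.items():
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
    return state


# Example data: a checkpoint and three transactions committed after it.
checkpoint = {"a": "1", "b": "2"}
tx_log = [
    LoggedTx(1, {"a": "10"}),
    LoggedTx(2, {"b": None}),
    LoggedTx(3, {"c": "3"}),
]

latest = roll_forward(checkpoint, tx_log)
as_of_tx_2 = roll_forward(checkpoint, tx_log, up_to_tx=2)
without_tx_2 = roll_forward(
    checkpoint, tx_log, transform=lambda tx: None if tx.tx_id == 2 else tx
)
```

In this sketch, point-in-time restore is a replay that stops early, and filtering is a hook applied to each transaction before it is reapplied. A real implementation would also have to deal with durability of the log, ordering across nodes, and consistency of the snapshot, none of which is modelled here.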