Hell Igniters,
In this thread I would like to share and discuss some thoughts on DML operations' implementation, so let's start and keep it here. Everyone is of course welcome to share their suggestions. For starters, I was thinking about semantics of INSERT. In traditional RDBMSs, INSERT works only for records whose primary keys don't conflict with those of records that are already persistent - you can't try to insert the same key more than once because you'll get an error. However, semantics of cache put is obviously different - it does not have anything about duplicate keys, it just quietly updates values in case of keys' duplication. Still, cache has putIfAbsent operation that is closer to traditional notion of INSERT, and H2's SQL dialect has MERGE operation which corresponds to semantics of cache put. Thus, I suggest that we implement MERGE as a separate operation backed by putIfAbsent operation, while INSERT will be implemented via put. And one more, probably more important thing: I suggest that we create separate class Update and corresponding operation update() in IgniteCache. The reasons are as follows: - Query bears some flags that are clearly redundant for Update (page size, locality) - query() method in IgniteCache (one that accepts Query) and query() methods in GridQueryIndexing return iterators. So, if we strive to leave interfaces unchanged, we still will introduce some design ugliness like query methods returning empty iterators for certain queries, and/or query flags that indicate whether it's an update query or not, etc. - If some Queries are update queries, then continuous queries can't be based on them - more design-wise ugly checks and stuff like that. - I'm pretty sure there's more I don't know about. Comments and suggestions are welcome. Sergi Vladykin, Dmitry Setrakyan, your opinions are of particular interest, please advise. Regards, Alex |
> Thus, I suggest that we implement MERGE as a separate operation backed by putIfAbsent operation, while INSERT will be implemented via put.
Sorry, of course I meant that MERGE has to be put-based, while INSERT has to be putIfAbsent-based. 2016-07-20 12:30 GMT+03:00 Alexander Paschenko <[hidden email]>: > Hell Igniters, > > In this thread I would like to share and discuss some thoughts on DML > operations' implementation, so let's start and keep it here. Everyone > is of course welcome to share their suggestions. > > For starters, I was thinking about semantics of INSERT. In traditional > RDBMSs, INSERT works only for records whose primary keys don't > conflict with those of records that are already persistent - you can't > try to insert the same key more than once because you'll get an error. > However, semantics of cache put is obviously different - it does not > have anything about duplicate keys, it just quietly updates values in > case of keys' duplication. Still, cache has putIfAbsent operation that > is closer to traditional notion of INSERT, and H2's SQL dialect has > MERGE operation which corresponds to semantics of cache put. Thus, I > suggest that we implement MERGE as a separate operation backed by > putIfAbsent operation, while INSERT will be implemented via put. > > And one more, probably more important thing: I suggest that we create > separate class Update and corresponding operation update() in > IgniteCache. The reasons are as follows: > > - Query bears some flags that are clearly redundant for Update (page > size, locality) > - query() method in IgniteCache (one that accepts Query) and query() > methods in GridQueryIndexing return iterators. So, if we strive to > leave interfaces unchanged, we still will introduce some design > ugliness like query methods returning empty iterators for certain > queries, and/or query flags that indicate whether it's an update query > or not, etc. > - If some Queries are update queries, then continuous queries can't be > based on them - more design-wise ugly checks and stuff like that. > - I'm pretty sure there's more I don't know about. > > Comments and suggestions are welcome. Sergi Vladykin, Dmitry > Setrakyan, your opinions are of particular interest, please advise. > > Regards, > Alex |
Like the idea of merge and insert. I need more time to think about the API changes.
Sergi, what do you think? Dmitriy On Jul 20, 2016, at 12:36 PM, Alexander Paschenko <[hidden email]> wrote: >> Thus, I suggest that we implement MERGE as a separate operation backed by putIfAbsent operation, while INSERT will be implemented via put. > > Sorry, of course I meant that MERGE has to be put-based, while INSERT > has to be putIfAbsent-based. > > 2016-07-20 12:30 GMT+03:00 Alexander Paschenko > <[hidden email]>: >> Hell Igniters, >> >> In this thread I would like to share and discuss some thoughts on DML >> operations' implementation, so let's start and keep it here. Everyone >> is of course welcome to share their suggestions. >> >> For starters, I was thinking about semantics of INSERT. In traditional >> RDBMSs, INSERT works only for records whose primary keys don't >> conflict with those of records that are already persistent - you can't >> try to insert the same key more than once because you'll get an error. >> However, semantics of cache put is obviously different - it does not >> have anything about duplicate keys, it just quietly updates values in >> case of keys' duplication. Still, cache has putIfAbsent operation that >> is closer to traditional notion of INSERT, and H2's SQL dialect has >> MERGE operation which corresponds to semantics of cache put. Thus, I >> suggest that we implement MERGE as a separate operation backed by >> putIfAbsent operation, while INSERT will be implemented via put. >> >> And one more, probably more important thing: I suggest that we create >> separate class Update and corresponding operation update() in >> IgniteCache. The reasons are as follows: >> >> - Query bears some flags that are clearly redundant for Update (page >> size, locality) >> - query() method in IgniteCache (one that accepts Query) and query() >> methods in GridQueryIndexing return iterators. So, if we strive to >> leave interfaces unchanged, we still will introduce some design >> ugliness like query methods returning empty iterators for certain >> queries, and/or query flags that indicate whether it's an update query >> or not, etc. >> - If some Queries are update queries, then continuous queries can't be >> based on them - more design-wise ugly checks and stuff like that. >> - I'm pretty sure there's more I don't know about. >> >> Comments and suggestions are welcome. Sergi Vladykin, Dmitry >> Setrakyan, your opinions are of particular interest, please advise. >> >> Regards, >> Alex |
I'd prefer to do MERGE operation last because in H2 it is not standard ANSI
SQL MERGE. Or may be not implement it at all, or may be contribute ANSI correct version to H2, then implement it on Ignite. Need to investigate the semantics deeper before making any decisions here. Lets start with simple scenarios for INSERT and go through all the possible cases and answer the questions: - What will happen on key conflict in TX cache? - What will happen on key conflict in Atomic cache? - What will happen with the previous two if we use DataLoader? - How to make these operations efficient (it will be simple enough to implement them with separate put/putIfAbsent operations but probably we will need some batching like putAllIfAbsent for efficiency)? As for API, we still will need to have a single entry point for all SQL queries/commands to allow any console work with it transparently. It would be great if we will be able to come up with something consistent with this idea on public API. Sergi On Wed, Jul 20, 2016 at 2:23 PM, Dmitriy Setrakyan <[hidden email]> wrote: > Like the idea of merge and insert. I need more time to think about the API > changes. > > Sergi, what do you think? > > Dmitriy > > > > On Jul 20, 2016, at 12:36 PM, Alexander Paschenko < > [hidden email]> wrote: > > >> Thus, I suggest that we implement MERGE as a separate operation backed > by putIfAbsent operation, while INSERT will be implemented via put. > > > > Sorry, of course I meant that MERGE has to be put-based, while INSERT > > has to be putIfAbsent-based. > > > > 2016-07-20 12:30 GMT+03:00 Alexander Paschenko > > <[hidden email]>: > >> Hell Igniters, > >> > >> In this thread I would like to share and discuss some thoughts on DML > >> operations' implementation, so let's start and keep it here. Everyone > >> is of course welcome to share their suggestions. > >> > >> For starters, I was thinking about semantics of INSERT. In traditional > >> RDBMSs, INSERT works only for records whose primary keys don't > >> conflict with those of records that are already persistent - you can't > >> try to insert the same key more than once because you'll get an error. > >> However, semantics of cache put is obviously different - it does not > >> have anything about duplicate keys, it just quietly updates values in > >> case of keys' duplication. Still, cache has putIfAbsent operation that > >> is closer to traditional notion of INSERT, and H2's SQL dialect has > >> MERGE operation which corresponds to semantics of cache put. Thus, I > >> suggest that we implement MERGE as a separate operation backed by > >> putIfAbsent operation, while INSERT will be implemented via put. > >> > >> And one more, probably more important thing: I suggest that we create > >> separate class Update and corresponding operation update() in > >> IgniteCache. The reasons are as follows: > >> > >> - Query bears some flags that are clearly redundant for Update (page > >> size, locality) > >> - query() method in IgniteCache (one that accepts Query) and query() > >> methods in GridQueryIndexing return iterators. So, if we strive to > >> leave interfaces unchanged, we still will introduce some design > >> ugliness like query methods returning empty iterators for certain > >> queries, and/or query flags that indicate whether it's an update query > >> or not, etc. > >> - If some Queries are update queries, then continuous queries can't be > >> based on them - more design-wise ugly checks and stuff like that. > >> - I'm pretty sure there's more I don't know about. > >> > >> Comments and suggestions are welcome. Sergi Vladykin, Dmitry > >> Setrakyan, your opinions are of particular interest, please advise. > >> > >> Regards, > >> Alex > |
Serj, are you asking what will happen as of today? Then the answer to all
your questions is that duplicate keys are not an issue, and Ignite always operates in **upsert** mode (which is essentially a *“put(…)” *method). However, the *“insert”* that is suggested by Alex would delegate to *“putIfAbsent(…)”*, which in database world makes more sense. However, in this case, the *“update”* syntax should delegate to *“replace(…)”*, as update should fail in case if a key is absent. Considering the above, a notion of “*upsert”* or “*merge” *operation is very much needed, as it will give a user an option to perform “insert-or-update” in 1 call. Does this make sense? D. On Wed, Jul 20, 2016 at 9:39 PM, Sergi Vladykin <[hidden email]> wrote: > I'd prefer to do MERGE operation last because in H2 it is not standard ANSI > SQL MERGE. Or may be not implement it at all, or may be contribute ANSI > correct version to H2, then implement it on Ignite. Need to investigate the > semantics deeper before making any decisions here. > > Lets start with simple scenarios for INSERT and go through all the possible > cases and answer the questions: > - What will happen on key conflict in TX cache? > - What will happen on key conflict in Atomic cache? > - What will happen with the previous two if we use DataLoader? > - How to make these operations efficient (it will be simple enough to > implement them with separate put/putIfAbsent operations but probably we > will need some batching like putAllIfAbsent for efficiency)? > > As for API, we still will need to have a single entry point for all SQL > queries/commands to allow any console work with it transparently. It would > be great if we will be able to come up with something consistent with this > idea on public API. > > Sergi > > > > > > > > > On Wed, Jul 20, 2016 at 2:23 PM, Dmitriy Setrakyan < > [hidden email]> > wrote: > > > Like the idea of merge and insert. I need more time to think about the > API > > changes. > > > > Sergi, what do you think? > > > > Dmitriy > > > > > > > > On Jul 20, 2016, at 12:36 PM, Alexander Paschenko < > > [hidden email]> wrote: > > > > >> Thus, I suggest that we implement MERGE as a separate operation backed > > by putIfAbsent operation, while INSERT will be implemented via put. > > > > > > Sorry, of course I meant that MERGE has to be put-based, while INSERT > > > has to be putIfAbsent-based. > > > > > > 2016-07-20 12:30 GMT+03:00 Alexander Paschenko > > > <[hidden email]>: > > >> Hell Igniters, > > >> > > >> In this thread I would like to share and discuss some thoughts on DML > > >> operations' implementation, so let's start and keep it here. Everyone > > >> is of course welcome to share their suggestions. > > >> > > >> For starters, I was thinking about semantics of INSERT. In traditional > > >> RDBMSs, INSERT works only for records whose primary keys don't > > >> conflict with those of records that are already persistent - you can't > > >> try to insert the same key more than once because you'll get an error. > > >> However, semantics of cache put is obviously different - it does not > > >> have anything about duplicate keys, it just quietly updates values in > > >> case of keys' duplication. Still, cache has putIfAbsent operation that > > >> is closer to traditional notion of INSERT, and H2's SQL dialect has > > >> MERGE operation which corresponds to semantics of cache put. Thus, I > > >> suggest that we implement MERGE as a separate operation backed by > > >> putIfAbsent operation, while INSERT will be implemented via put. > > >> > > >> And one more, probably more important thing: I suggest that we create > > >> separate class Update and corresponding operation update() in > > >> IgniteCache. The reasons are as follows: > > >> > > >> - Query bears some flags that are clearly redundant for Update (page > > >> size, locality) > > >> - query() method in IgniteCache (one that accepts Query) and query() > > >> methods in GridQueryIndexing return iterators. So, if we strive to > > >> leave interfaces unchanged, we still will introduce some design > > >> ugliness like query methods returning empty iterators for certain > > >> queries, and/or query flags that indicate whether it's an update query > > >> or not, etc. > > >> - If some Queries are update queries, then continuous queries can't be > > >> based on them - more design-wise ugly checks and stuff like that. > > >> - I'm pretty sure there's more I don't know about. > > >> > > >> Comments and suggestions are welcome. Sergi Vladykin, Dmitry > > >> Setrakyan, your opinions are of particular interest, please advise. > > >> > > >> Regards, > > >> Alex > > > |
No, this does not make sense.
There is no upsert mode in databases. There are operations: INSERT, UPDATE, DELETE, MERGE. I want to have clear understanding of how they have to behave in SQL databases and how they will actually behave in Ignite in different scenarios. Also I want to have clear understanding of performance implications of each decision here. Anything wrong with that? Sergi On Thu, Jul 21, 2016 at 1:04 PM, Dmitriy Setrakyan <[hidden email]> wrote: > Serj, are you asking what will happen as of today? Then the answer to all > your questions is that duplicate keys are not an issue, and Ignite always > operates in **upsert** mode (which is essentially a *“put(…)” *method). > > However, the *“insert”* that is suggested by Alex would delegate to > *“putIfAbsent(…)”*, which in database world makes more sense. However, in > this case, the *“update”* syntax should delegate to *“replace(…)”*, as > update should fail in case if a key is absent. > > Considering the above, a notion of “*upsert”* or “*merge” *operation is > very much needed, as it will give a user an option to perform > “insert-or-update” in 1 call. > > Does this make sense? > > D. > > On Wed, Jul 20, 2016 at 9:39 PM, Sergi Vladykin <[hidden email]> > wrote: > > > I'd prefer to do MERGE operation last because in H2 it is not standard > ANSI > > SQL MERGE. Or may be not implement it at all, or may be contribute ANSI > > correct version to H2, then implement it on Ignite. Need to investigate > the > > semantics deeper before making any decisions here. > > > > Lets start with simple scenarios for INSERT and go through all the > possible > > cases and answer the questions: > > - What will happen on key conflict in TX cache? > > - What will happen on key conflict in Atomic cache? > > - What will happen with the previous two if we use DataLoader? > > - How to make these operations efficient (it will be simple enough to > > implement them with separate put/putIfAbsent operations but probably we > > will need some batching like putAllIfAbsent for efficiency)? > > > > As for API, we still will need to have a single entry point for all SQL > > queries/commands to allow any console work with it transparently. It > would > > be great if we will be able to come up with something consistent with > this > > idea on public API. > > > > Sergi > > > > > > > > > > > > > > > > > > On Wed, Jul 20, 2016 at 2:23 PM, Dmitriy Setrakyan < > > [hidden email]> > > wrote: > > > > > Like the idea of merge and insert. I need more time to think about the > > API > > > changes. > > > > > > Sergi, what do you think? > > > > > > Dmitriy > > > > > > > > > > > > On Jul 20, 2016, at 12:36 PM, Alexander Paschenko < > > > [hidden email]> wrote: > > > > > > >> Thus, I suggest that we implement MERGE as a separate operation > backed > > > by putIfAbsent operation, while INSERT will be implemented via put. > > > > > > > > Sorry, of course I meant that MERGE has to be put-based, while INSERT > > > > has to be putIfAbsent-based. > > > > > > > > 2016-07-20 12:30 GMT+03:00 Alexander Paschenko > > > > <[hidden email]>: > > > >> Hell Igniters, > > > >> > > > >> In this thread I would like to share and discuss some thoughts on > DML > > > >> operations' implementation, so let's start and keep it here. > Everyone > > > >> is of course welcome to share their suggestions. > > > >> > > > >> For starters, I was thinking about semantics of INSERT. In > traditional > > > >> RDBMSs, INSERT works only for records whose primary keys don't > > > >> conflict with those of records that are already persistent - you > can't > > > >> try to insert the same key more than once because you'll get an > error. > > > >> However, semantics of cache put is obviously different - it does not > > > >> have anything about duplicate keys, it just quietly updates values > in > > > >> case of keys' duplication. Still, cache has putIfAbsent operation > that > > > >> is closer to traditional notion of INSERT, and H2's SQL dialect has > > > >> MERGE operation which corresponds to semantics of cache put. Thus, I > > > >> suggest that we implement MERGE as a separate operation backed by > > > >> putIfAbsent operation, while INSERT will be implemented via put. > > > >> > > > >> And one more, probably more important thing: I suggest that we > create > > > >> separate class Update and corresponding operation update() in > > > >> IgniteCache. The reasons are as follows: > > > >> > > > >> - Query bears some flags that are clearly redundant for Update (page > > > >> size, locality) > > > >> - query() method in IgniteCache (one that accepts Query) and query() > > > >> methods in GridQueryIndexing return iterators. So, if we strive to > > > >> leave interfaces unchanged, we still will introduce some design > > > >> ugliness like query methods returning empty iterators for certain > > > >> queries, and/or query flags that indicate whether it's an update > query > > > >> or not, etc. > > > >> - If some Queries are update queries, then continuous queries can't > be > > > >> based on them - more design-wise ugly checks and stuff like that. > > > >> - I'm pretty sure there's more I don't know about. > > > >> > > > >> Comments and suggestions are welcome. Sergi Vladykin, Dmitry > > > >> Setrakyan, your opinions are of particular interest, please advise. > > > >> > > > >> Regards, > > > >> Alex > > > > > > |
For me the main question here is how we define Key and Value for a cache
when using SQL interface. Unless we are going to re-architect the whole Ignite, it will still be a key-value storage in the first place, so whenever we do an INSERT, we must generate Key and Value. Given that values in INSERT are supposed to be fields of either Key or Value, this confuses me a lot. Maybe we should come up with DDL first? 2016-07-21 14:06 GMT+03:00 Sergi Vladykin <[hidden email]>: > No, this does not make sense. > > There is no upsert mode in databases. There are operations: INSERT, UPDATE, > DELETE, MERGE. > > I want to have clear understanding of how they have to behave in SQL > databases and how they will actually behave in Ignite in different > scenarios. Also I want to have clear understanding of performance > implications of each decision here. > > Anything wrong with that? > > Sergi > > On Thu, Jul 21, 2016 at 1:04 PM, Dmitriy Setrakyan <[hidden email]> > wrote: > > > Serj, are you asking what will happen as of today? Then the answer to all > > your questions is that duplicate keys are not an issue, and Ignite always > > operates in **upsert** mode (which is essentially a *“put(…)” *method). > > > > However, the *“insert”* that is suggested by Alex would delegate to > > *“putIfAbsent(…)”*, which in database world makes more sense. However, in > > this case, the *“update”* syntax should delegate to *“replace(…)”*, as > > update should fail in case if a key is absent. > > > > Considering the above, a notion of “*upsert”* or “*merge” *operation is > > very much needed, as it will give a user an option to perform > > “insert-or-update” in 1 call. > > > > Does this make sense? > > > > D. > > > > On Wed, Jul 20, 2016 at 9:39 PM, Sergi Vladykin < > [hidden email]> > > wrote: > > > > > I'd prefer to do MERGE operation last because in H2 it is not standard > > ANSI > > > SQL MERGE. Or may be not implement it at all, or may be contribute ANSI > > > correct version to H2, then implement it on Ignite. Need to investigate > > the > > > semantics deeper before making any decisions here. > > > > > > Lets start with simple scenarios for INSERT and go through all the > > possible > > > cases and answer the questions: > > > - What will happen on key conflict in TX cache? > > > - What will happen on key conflict in Atomic cache? > > > - What will happen with the previous two if we use DataLoader? > > > - How to make these operations efficient (it will be simple enough to > > > implement them with separate put/putIfAbsent operations but probably we > > > will need some batching like putAllIfAbsent for efficiency)? > > > > > > As for API, we still will need to have a single entry point for all SQL > > > queries/commands to allow any console work with it transparently. It > > would > > > be great if we will be able to come up with something consistent with > > this > > > idea on public API. > > > > > > Sergi > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Jul 20, 2016 at 2:23 PM, Dmitriy Setrakyan < > > > [hidden email]> > > > wrote: > > > > > > > Like the idea of merge and insert. I need more time to think about > the > > > API > > > > changes. > > > > > > > > Sergi, what do you think? > > > > > > > > Dmitriy > > > > > > > > > > > > > > > > On Jul 20, 2016, at 12:36 PM, Alexander Paschenko < > > > > [hidden email]> wrote: > > > > > > > > >> Thus, I suggest that we implement MERGE as a separate operation > > backed > > > > by putIfAbsent operation, while INSERT will be implemented via put. > > > > > > > > > > Sorry, of course I meant that MERGE has to be put-based, while > INSERT > > > > > has to be putIfAbsent-based. > > > > > > > > > > 2016-07-20 12:30 GMT+03:00 Alexander Paschenko > > > > > <[hidden email]>: > > > > >> Hell Igniters, > > > > >> > > > > >> In this thread I would like to share and discuss some thoughts on > > DML > > > > >> operations' implementation, so let's start and keep it here. > > Everyone > > > > >> is of course welcome to share their suggestions. > > > > >> > > > > >> For starters, I was thinking about semantics of INSERT. In > > traditional > > > > >> RDBMSs, INSERT works only for records whose primary keys don't > > > > >> conflict with those of records that are already persistent - you > > can't > > > > >> try to insert the same key more than once because you'll get an > > error. > > > > >> However, semantics of cache put is obviously different - it does > not > > > > >> have anything about duplicate keys, it just quietly updates values > > in > > > > >> case of keys' duplication. Still, cache has putIfAbsent operation > > that > > > > >> is closer to traditional notion of INSERT, and H2's SQL dialect > has > > > > >> MERGE operation which corresponds to semantics of cache put. > Thus, I > > > > >> suggest that we implement MERGE as a separate operation backed by > > > > >> putIfAbsent operation, while INSERT will be implemented via put. > > > > >> > > > > >> And one more, probably more important thing: I suggest that we > > create > > > > >> separate class Update and corresponding operation update() in > > > > >> IgniteCache. The reasons are as follows: > > > > >> > > > > >> - Query bears some flags that are clearly redundant for Update > (page > > > > >> size, locality) > > > > >> - query() method in IgniteCache (one that accepts Query) and > query() > > > > >> methods in GridQueryIndexing return iterators. So, if we strive to > > > > >> leave interfaces unchanged, we still will introduce some design > > > > >> ugliness like query methods returning empty iterators for certain > > > > >> queries, and/or query flags that indicate whether it's an update > > query > > > > >> or not, etc. > > > > >> - If some Queries are update queries, then continuous queries > can't > > be > > > > >> based on them - more design-wise ugly checks and stuff like that. > > > > >> - I'm pretty sure there's more I don't know about. > > > > >> > > > > >> Comments and suggestions are welcome. Sergi Vladykin, Dmitry > > > > >> Setrakyan, your opinions are of particular interest, please > advise. > > > > >> > > > > >> Regards, > > > > >> Alex > > > > > > > > > > |
I don't see why we need to start with DDL. We have table definitions with
key and value definitions in them, so what is the problem? Sergi On Thu, Jul 21, 2016 at 2:14 PM, Alexey Goncharuk < [hidden email]> wrote: > For me the main question here is how we define Key and Value for a cache > when using SQL interface. Unless we are going to re-architect the whole > Ignite, it will still be a key-value storage in the first place, so > whenever we do an INSERT, we must generate Key and Value. Given that values > in INSERT are supposed to be fields of either Key or Value, this confuses > me a lot. > > Maybe we should come up with DDL first? > > 2016-07-21 14:06 GMT+03:00 Sergi Vladykin <[hidden email]>: > > > No, this does not make sense. > > > > There is no upsert mode in databases. There are operations: INSERT, > UPDATE, > > DELETE, MERGE. > > > > I want to have clear understanding of how they have to behave in SQL > > databases and how they will actually behave in Ignite in different > > scenarios. Also I want to have clear understanding of performance > > implications of each decision here. > > > > Anything wrong with that? > > > > Sergi > > > > On Thu, Jul 21, 2016 at 1:04 PM, Dmitriy Setrakyan < > [hidden email]> > > wrote: > > > > > Serj, are you asking what will happen as of today? Then the answer to > all > > > your questions is that duplicate keys are not an issue, and Ignite > always > > > operates in **upsert** mode (which is essentially a *“put(…)” *method). > > > > > > However, the *“insert”* that is suggested by Alex would delegate to > > > *“putIfAbsent(…)”*, which in database world makes more sense. However, > in > > > this case, the *“update”* syntax should delegate to *“replace(…)”*, as > > > update should fail in case if a key is absent. > > > > > > Considering the above, a notion of “*upsert”* or “*merge” *operation is > > > very much needed, as it will give a user an option to perform > > > “insert-or-update” in 1 call. > > > > > > Does this make sense? > > > > > > D. > > > > > > On Wed, Jul 20, 2016 at 9:39 PM, Sergi Vladykin < > > [hidden email]> > > > wrote: > > > > > > > I'd prefer to do MERGE operation last because in H2 it is not > standard > > > ANSI > > > > SQL MERGE. Or may be not implement it at all, or may be contribute > ANSI > > > > correct version to H2, then implement it on Ignite. Need to > investigate > > > the > > > > semantics deeper before making any decisions here. > > > > > > > > Lets start with simple scenarios for INSERT and go through all the > > > possible > > > > cases and answer the questions: > > > > - What will happen on key conflict in TX cache? > > > > - What will happen on key conflict in Atomic cache? > > > > - What will happen with the previous two if we use DataLoader? > > > > - How to make these operations efficient (it will be simple enough to > > > > implement them with separate put/putIfAbsent operations but probably > we > > > > will need some batching like putAllIfAbsent for efficiency)? > > > > > > > > As for API, we still will need to have a single entry point for all > SQL > > > > queries/commands to allow any console work with it transparently. It > > > would > > > > be great if we will be able to come up with something consistent with > > > this > > > > idea on public API. > > > > > > > > Sergi > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Jul 20, 2016 at 2:23 PM, Dmitriy Setrakyan < > > > > [hidden email]> > > > > wrote: > > > > > > > > > Like the idea of merge and insert. I need more time to think about > > the > > > > API > > > > > changes. > > > > > > > > > > Sergi, what do you think? > > > > > > > > > > Dmitriy > > > > > > > > > > > > > > > > > > > > On Jul 20, 2016, at 12:36 PM, Alexander Paschenko < > > > > > [hidden email]> wrote: > > > > > > > > > > >> Thus, I suggest that we implement MERGE as a separate operation > > > backed > > > > > by putIfAbsent operation, while INSERT will be implemented via put. > > > > > > > > > > > > Sorry, of course I meant that MERGE has to be put-based, while > > INSERT > > > > > > has to be putIfAbsent-based. > > > > > > > > > > > > 2016-07-20 12:30 GMT+03:00 Alexander Paschenko > > > > > > <[hidden email]>: > > > > > >> Hell Igniters, > > > > > >> > > > > > >> In this thread I would like to share and discuss some thoughts > on > > > DML > > > > > >> operations' implementation, so let's start and keep it here. > > > Everyone > > > > > >> is of course welcome to share their suggestions. > > > > > >> > > > > > >> For starters, I was thinking about semantics of INSERT. In > > > traditional > > > > > >> RDBMSs, INSERT works only for records whose primary keys don't > > > > > >> conflict with those of records that are already persistent - you > > > can't > > > > > >> try to insert the same key more than once because you'll get an > > > error. > > > > > >> However, semantics of cache put is obviously different - it does > > not > > > > > >> have anything about duplicate keys, it just quietly updates > values > > > in > > > > > >> case of keys' duplication. Still, cache has putIfAbsent > operation > > > that > > > > > >> is closer to traditional notion of INSERT, and H2's SQL dialect > > has > > > > > >> MERGE operation which corresponds to semantics of cache put. > > Thus, I > > > > > >> suggest that we implement MERGE as a separate operation backed > by > > > > > >> putIfAbsent operation, while INSERT will be implemented via put. > > > > > >> > > > > > >> And one more, probably more important thing: I suggest that we > > > create > > > > > >> separate class Update and corresponding operation update() in > > > > > >> IgniteCache. The reasons are as follows: > > > > > >> > > > > > >> - Query bears some flags that are clearly redundant for Update > > (page > > > > > >> size, locality) > > > > > >> - query() method in IgniteCache (one that accepts Query) and > > query() > > > > > >> methods in GridQueryIndexing return iterators. So, if we strive > to > > > > > >> leave interfaces unchanged, we still will introduce some design > > > > > >> ugliness like query methods returning empty iterators for > certain > > > > > >> queries, and/or query flags that indicate whether it's an update > > > query > > > > > >> or not, etc. > > > > > >> - If some Queries are update queries, then continuous queries > > can't > > > be > > > > > >> based on them - more design-wise ugly checks and stuff like > that. > > > > > >> - I'm pretty sure there's more I don't know about. > > > > > >> > > > > > >> Comments and suggestions are welcome. Sergi Vladykin, Dmitry > > > > > >> Setrakyan, your opinions are of particular interest, please > > advise. > > > > > >> > > > > > >> Regards, > > > > > >> Alex > > > > > > > > > > > > > > > |
Suppose I have a PersonKey {int id;} class and Person {@SqlQueryField int
id; @SqlQueryField String name; @SqlQueryField int age;} class. How would an insert statement look like? INSERT INTO Person (_key, _val) values (...) INSERT INTO Person (_key, _val, id, name, age) values (...) -> Obviously will not be usable from any console unless we are able to parse "new PersonKey(1)" statements. INSERT INTO Person (id, name, age) values (...) -> How do we know how to construct the PersonKey? Most likely I am missing something, but this is not clear for me so far. |
I think these problems we need to solve regardless of DDL.
Sergi On 21 июля 2016 г., 16:40, Alexey Goncharuk <[hidden email]> wrote: > Suppose I have a PersonKey {int id;} class and Person {@SqlQueryField int > id; @SqlQueryField String name; @SqlQueryField int age;} class. > > How would an insert statement look like? > INSERT INTO Person (_key, _val) values (...) > INSERT INTO Person (_key, _val, id, name, age) values (...) -> Obviously > will not be usable from any console unless we are able to parse "new > PersonKey(1)" statements. > > INSERT INTO Person (id, name, age) values (...) -> How do we know how to > construct the PersonKey? Most likely I am missing something, but this is > not clear for me so far. > |
I agree. It looks like we first of all need some algorithm of key
generation for new objects, and it does not necessarily have to be involved with DDL. The first suggestion that comes to my mind on that matter is, obviously, marking some method of the class persisted with magical annotation - so that the value will be able to supply its own key when needed. Alex 2016-07-21 17:11 GMT+03:00 Sergi Vladykin <[hidden email]>: > I think these problems we need to solve regardless of DDL. > > Sergi > > On 21 июля 2016 г., 16:40, Alexey Goncharuk <[hidden email]> > wrote: > >> Suppose I have a PersonKey {int id;} class and Person {@SqlQueryField int >> id; @SqlQueryField String name; @SqlQueryField int age;} class. >> >> How would an insert statement look like? >> INSERT INTO Person (_key, _val) values (...) >> INSERT INTO Person (_key, _val, id, name, age) values (...) -> Obviously >> will not be usable from any console unless we are able to parse "new >> PersonKey(1)" statements. >> >> INSERT INTO Person (id, name, age) values (...) -> How do we know how to >> construct the PersonKey? Most likely I am missing something, but this is >> not clear for me so far. >> |
In reply to this post by Sergi
OK, then using your analogy, the current behavior in Ignite is MERGE for
the most part. My preference is that Ignite SQL should work no different from traditional databases, which means: - INSERT is translated into *putIfAbsent()* call in Ignite - UPDATE is translated into *replace()* call in Ignite - MERGE is translated into *put()* call in Ignite - For SQL BATCH calls we should delegate to Ignite batch operations, e.g. *putAll()* The above should hold true for atomic and transactional put/putAll calls, as well as for the data streamer. Does this make sense? D. On Thu, Jul 21, 2016 at 4:06 AM, Sergi Vladykin <[hidden email]> wrote: > No, this does not make sense. > > There is no upsert mode in databases. There are operations: INSERT, UPDATE, > DELETE, MERGE. > > I want to have clear understanding of how they have to behave in SQL > databases and how they will actually behave in Ignite in different > scenarios. Also I want to have clear understanding of performance > implications of each decision here. > > Anything wrong with that? > > Sergi > > On Thu, Jul 21, 2016 at 1:04 PM, Dmitriy Setrakyan <[hidden email]> > wrote: > > > Serj, are you asking what will happen as of today? Then the answer to all > > your questions is that duplicate keys are not an issue, and Ignite always > > operates in **upsert** mode (which is essentially a *“put(…)” *method). > > > > However, the *“insert”* that is suggested by Alex would delegate to > > *“putIfAbsent(…)”*, which in database world makes more sense. However, in > > this case, the *“update”* syntax should delegate to *“replace(…)”*, as > > update should fail in case if a key is absent. > > > > Considering the above, a notion of “*upsert”* or “*merge” *operation is > > very much needed, as it will give a user an option to perform > > “insert-or-update” in 1 call. > > > > Does this make sense? > > > > D. > > > > On Wed, Jul 20, 2016 at 9:39 PM, Sergi Vladykin < > [hidden email]> > > wrote: > > > > > I'd prefer to do MERGE operation last because in H2 it is not standard > > ANSI > > > SQL MERGE. Or may be not implement it at all, or may be contribute ANSI > > > correct version to H2, then implement it on Ignite. Need to investigate > > the > > > semantics deeper before making any decisions here. > > > > > > Lets start with simple scenarios for INSERT and go through all the > > possible > > > cases and answer the questions: > > > - What will happen on key conflict in TX cache? > > > - What will happen on key conflict in Atomic cache? > > > - What will happen with the previous two if we use DataLoader? > > > - How to make these operations efficient (it will be simple enough to > > > implement them with separate put/putIfAbsent operations but probably we > > > will need some batching like putAllIfAbsent for efficiency)? > > > > > > As for API, we still will need to have a single entry point for all SQL > > > queries/commands to allow any console work with it transparently. It > > would > > > be great if we will be able to come up with something consistent with > > this > > > idea on public API. > > > > > > Sergi > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Jul 20, 2016 at 2:23 PM, Dmitriy Setrakyan < > > > [hidden email]> > > > wrote: > > > > > > > Like the idea of merge and insert. I need more time to think about > the > > > API > > > > changes. > > > > > > > > Sergi, what do you think? > > > > > > > > Dmitriy > > > > > > > > > > > > > > > > On Jul 20, 2016, at 12:36 PM, Alexander Paschenko < > > > > [hidden email]> wrote: > > > > > > > > >> Thus, I suggest that we implement MERGE as a separate operation > > backed > > > > by putIfAbsent operation, while INSERT will be implemented via put. > > > > > > > > > > Sorry, of course I meant that MERGE has to be put-based, while > INSERT > > > > > has to be putIfAbsent-based. > > > > > > > > > > 2016-07-20 12:30 GMT+03:00 Alexander Paschenko > > > > > <[hidden email]>: > > > > >> Hell Igniters, > > > > >> > > > > >> In this thread I would like to share and discuss some thoughts on > > DML > > > > >> operations' implementation, so let's start and keep it here. > > Everyone > > > > >> is of course welcome to share their suggestions. > > > > >> > > > > >> For starters, I was thinking about semantics of INSERT. In > > traditional > > > > >> RDBMSs, INSERT works only for records whose primary keys don't > > > > >> conflict with those of records that are already persistent - you > > can't > > > > >> try to insert the same key more than once because you'll get an > > error. > > > > >> However, semantics of cache put is obviously different - it does > not > > > > >> have anything about duplicate keys, it just quietly updates values > > in > > > > >> case of keys' duplication. Still, cache has putIfAbsent operation > > that > > > > >> is closer to traditional notion of INSERT, and H2's SQL dialect > has > > > > >> MERGE operation which corresponds to semantics of cache put. > Thus, I > > > > >> suggest that we implement MERGE as a separate operation backed by > > > > >> putIfAbsent operation, while INSERT will be implemented via put. > > > > >> > > > > >> And one more, probably more important thing: I suggest that we > > create > > > > >> separate class Update and corresponding operation update() in > > > > >> IgniteCache. The reasons are as follows: > > > > >> > > > > >> - Query bears some flags that are clearly redundant for Update > (page > > > > >> size, locality) > > > > >> - query() method in IgniteCache (one that accepts Query) and > query() > > > > >> methods in GridQueryIndexing return iterators. So, if we strive to > > > > >> leave interfaces unchanged, we still will introduce some design > > > > >> ugliness like query methods returning empty iterators for certain > > > > >> queries, and/or query flags that indicate whether it's an update > > query > > > > >> or not, etc. > > > > >> - If some Queries are update queries, then continuous queries > can't > > be > > > > >> based on them - more design-wise ugly checks and stuff like that. > > > > >> - I'm pretty sure there's more I don't know about. > > > > >> > > > > >> Comments and suggestions are welcome. Sergi Vladykin, Dmitry > > > > >> Setrakyan, your opinions are of particular interest, please > advise. > > > > >> > > > > >> Regards, > > > > >> Alex > > > > > > > > > > |
Guys,
I would like to advance the discussion further. There's one quite important question that arose based on current state of work on this issue. If we use some kind of interactive console, like Visor, then how should it know whether SQL query it is requested to execute returns a result set or not? In JDBC world, solution is quite simple - there's base interface called Statement that all commands implement, and it has magic isResultSet method that tells whether statement is a query or an update command. The API proposed now has separate Query and Update operations which I believe to be a right thing by the reasons I outlined in the beginning of this thread. However, their lack of common ancestor prevents possible console clients from running text SQL commands in a fully transparent manner - like IgniteCache.execute(String sql). Therefore I see two possible ways of solving this: - we change API so that it includes new class or interface parenting both Query and Update, and clients use it to communicate with cache - we let (or make :) ) the client determine command type independently and behave accordingly - for it to work it will have some kind of command parsing by itself just to determine its type. Visor console may use simple library like JSqlParser (https://github.com/JSQLParser/JSqlParser; dual LGPL 2.1/ASF 2.0 licensed) to determine request type in terms of JDBC, and behave accordingly. Personally, I think that the second approach is better - and here's why. First, it does not seem wise to change API simply to make console (or any other) clients simpler. Programmatic APIs should be concise and short for programmatic use, console clients should be easy to use from console - and that's it: after all, console client exists to free a user from burden of doing things programmatically, so its aim is to adapt API to console or whatever UI. Second, possible complications in client implied by such approach certainly won't be dramatic - I don't think that additional single query parsing operation in client code will make it much harder to develop. Third, as I see it now, adding a new "synthetic" entity and new method would take more effort to adapting the client to new API. Dmitry, Sergi, I would like to hear what you think about it all. Thanks. - Alex 2016-07-21 21:17 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: > OK, then using your analogy, the current behavior in Ignite is MERGE for > the most part. > > My preference is that Ignite SQL should work no different from traditional > databases, which means: > > - INSERT is translated into *putIfAbsent()* call in Ignite > - UPDATE is translated into *replace()* call in Ignite > - MERGE is translated into *put()* call in Ignite > - For SQL BATCH calls we should delegate to Ignite batch operations, e.g. > *putAll()* > > The above should hold true for atomic and transactional put/putAll calls, > as well as for the data streamer. > > Does this make sense? > > D. > > On Thu, Jul 21, 2016 at 4:06 AM, Sergi Vladykin <[hidden email]> > wrote: > >> No, this does not make sense. >> >> There is no upsert mode in databases. There are operations: INSERT, UPDATE, >> DELETE, MERGE. >> >> I want to have clear understanding of how they have to behave in SQL >> databases and how they will actually behave in Ignite in different >> scenarios. Also I want to have clear understanding of performance >> implications of each decision here. >> >> Anything wrong with that? >> >> Sergi >> >> On Thu, Jul 21, 2016 at 1:04 PM, Dmitriy Setrakyan <[hidden email]> >> wrote: >> >> > Serj, are you asking what will happen as of today? Then the answer to all >> > your questions is that duplicate keys are not an issue, and Ignite always >> > operates in **upsert** mode (which is essentially a *“put(…)” *method). >> > >> > However, the *“insert”* that is suggested by Alex would delegate to >> > *“putIfAbsent(…)”*, which in database world makes more sense. However, in >> > this case, the *“update”* syntax should delegate to *“replace(…)”*, as >> > update should fail in case if a key is absent. >> > >> > Considering the above, a notion of “*upsert”* or “*merge” *operation is >> > very much needed, as it will give a user an option to perform >> > “insert-or-update” in 1 call. >> > >> > Does this make sense? >> > >> > D. >> > >> > On Wed, Jul 20, 2016 at 9:39 PM, Sergi Vladykin < >> [hidden email]> >> > wrote: >> > >> > > I'd prefer to do MERGE operation last because in H2 it is not standard >> > ANSI >> > > SQL MERGE. Or may be not implement it at all, or may be contribute ANSI >> > > correct version to H2, then implement it on Ignite. Need to investigate >> > the >> > > semantics deeper before making any decisions here. >> > > >> > > Lets start with simple scenarios for INSERT and go through all the >> > possible >> > > cases and answer the questions: >> > > - What will happen on key conflict in TX cache? >> > > - What will happen on key conflict in Atomic cache? >> > > - What will happen with the previous two if we use DataLoader? >> > > - How to make these operations efficient (it will be simple enough to >> > > implement them with separate put/putIfAbsent operations but probably we >> > > will need some batching like putAllIfAbsent for efficiency)? >> > > >> > > As for API, we still will need to have a single entry point for all SQL >> > > queries/commands to allow any console work with it transparently. It >> > would >> > > be great if we will be able to come up with something consistent with >> > this >> > > idea on public API. >> > > >> > > Sergi >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > On Wed, Jul 20, 2016 at 2:23 PM, Dmitriy Setrakyan < >> > > [hidden email]> >> > > wrote: >> > > >> > > > Like the idea of merge and insert. I need more time to think about >> the >> > > API >> > > > changes. >> > > > >> > > > Sergi, what do you think? >> > > > >> > > > Dmitriy >> > > > >> > > > >> > > > >> > > > On Jul 20, 2016, at 12:36 PM, Alexander Paschenko < >> > > > [hidden email]> wrote: >> > > > >> > > > >> Thus, I suggest that we implement MERGE as a separate operation >> > backed >> > > > by putIfAbsent operation, while INSERT will be implemented via put. >> > > > > >> > > > > Sorry, of course I meant that MERGE has to be put-based, while >> INSERT >> > > > > has to be putIfAbsent-based. >> > > > > >> > > > > 2016-07-20 12:30 GMT+03:00 Alexander Paschenko >> > > > > <[hidden email]>: >> > > > >> Hell Igniters, >> > > > >> >> > > > >> In this thread I would like to share and discuss some thoughts on >> > DML >> > > > >> operations' implementation, so let's start and keep it here. >> > Everyone >> > > > >> is of course welcome to share their suggestions. >> > > > >> >> > > > >> For starters, I was thinking about semantics of INSERT. In >> > traditional >> > > > >> RDBMSs, INSERT works only for records whose primary keys don't >> > > > >> conflict with those of records that are already persistent - you >> > can't >> > > > >> try to insert the same key more than once because you'll get an >> > error. >> > > > >> However, semantics of cache put is obviously different - it does >> not >> > > > >> have anything about duplicate keys, it just quietly updates values >> > in >> > > > >> case of keys' duplication. Still, cache has putIfAbsent operation >> > that >> > > > >> is closer to traditional notion of INSERT, and H2's SQL dialect >> has >> > > > >> MERGE operation which corresponds to semantics of cache put. >> Thus, I >> > > > >> suggest that we implement MERGE as a separate operation backed by >> > > > >> putIfAbsent operation, while INSERT will be implemented via put. >> > > > >> >> > > > >> And one more, probably more important thing: I suggest that we >> > create >> > > > >> separate class Update and corresponding operation update() in >> > > > >> IgniteCache. The reasons are as follows: >> > > > >> >> > > > >> - Query bears some flags that are clearly redundant for Update >> (page >> > > > >> size, locality) >> > > > >> - query() method in IgniteCache (one that accepts Query) and >> query() >> > > > >> methods in GridQueryIndexing return iterators. So, if we strive to >> > > > >> leave interfaces unchanged, we still will introduce some design >> > > > >> ugliness like query methods returning empty iterators for certain >> > > > >> queries, and/or query flags that indicate whether it's an update >> > query >> > > > >> or not, etc. >> > > > >> - If some Queries are update queries, then continuous queries >> can't >> > be >> > > > >> based on them - more design-wise ugly checks and stuff like that. >> > > > >> - I'm pretty sure there's more I don't know about. >> > > > >> >> > > > >> Comments and suggestions are welcome. Sergi Vladykin, Dmitry >> > > > >> Setrakyan, your opinions are of particular interest, please >> advise. >> > > > >> >> > > > >> Regards, >> > > > >> Alex >> > > > >> > > >> > >> |
I don't like any pre-parsing, especially with some libraries other than H2.
H2 itself has enough quirks to multiply it on quirks of another library. This is exactly what I was talking about - we need some single entry point on API for all the SQL commands and queries. Thats why I suggested SqlUpdate to extend Query. To me its is the cleanest approach. May be we need to change in some backward compatible way this Query hierarchy to get rid of extra methods but the idea is still the same. Sergi 2016-07-26 14:34 GMT+03:00 Alexander Paschenko < [hidden email]>: > Guys, > > I would like to advance the discussion further. There's one quite > important question that arose based on current state of work on this > issue. If we use some kind of interactive console, like Visor, then > how should it know whether SQL query it is requested to execute > returns a result set or not? In JDBC world, solution is quite simple - > there's base interface called Statement that all commands implement, > and it has magic isResultSet method that tells whether statement is a > query or an update command. The API proposed now has separate Query > and Update operations which I believe to be a right thing by the > reasons I outlined in the beginning of this thread. However, their > lack of common ancestor prevents possible console clients from running > text SQL commands in a fully transparent manner - like > IgniteCache.execute(String sql). Therefore I see two possible ways of > solving this: > > - we change API so that it includes new class or interface parenting > both Query and Update, and clients use it to communicate with cache > - we let (or make :) ) the client determine command type independently > and behave accordingly - for it to work it will have some kind of > command parsing by itself just to determine its type. Visor console > may use simple library like JSqlParser > (https://github.com/JSQLParser/JSqlParser; dual LGPL 2.1/ASF 2.0 > licensed) to determine request type in terms of JDBC, and behave > accordingly. > > Personally, I think that the second approach is better - and here's why. > > First, it does not seem wise to change API simply to make console (or > any other) clients simpler. Programmatic APIs should be concise and > short for programmatic use, console clients should be easy to use from > console - and that's it: after all, console client exists to free a > user from burden of doing things programmatically, so its aim is to > adapt API to console or whatever UI. > Second, possible complications in client implied by such approach > certainly won't be dramatic - I don't think that additional single > query parsing operation in client code will make it much harder to > develop. > Third, as I see it now, adding a new "synthetic" entity and new method > would take more effort to adapting the client to new API. > > Dmitry, Sergi, I would like to hear what you think about it all. Thanks. > > - Alex > > 2016-07-21 21:17 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: > > OK, then using your analogy, the current behavior in Ignite is MERGE for > > the most part. > > > > My preference is that Ignite SQL should work no different from > traditional > > databases, which means: > > > > - INSERT is translated into *putIfAbsent()* call in Ignite > > - UPDATE is translated into *replace()* call in Ignite > > - MERGE is translated into *put()* call in Ignite > > - For SQL BATCH calls we should delegate to Ignite batch operations, e.g. > > *putAll()* > > > > The above should hold true for atomic and transactional put/putAll calls, > > as well as for the data streamer. > > > > Does this make sense? > > > > D. > > > > On Thu, Jul 21, 2016 at 4:06 AM, Sergi Vladykin < > [hidden email]> > > wrote: > > > >> No, this does not make sense. > >> > >> There is no upsert mode in databases. There are operations: INSERT, > UPDATE, > >> DELETE, MERGE. > >> > >> I want to have clear understanding of how they have to behave in SQL > >> databases and how they will actually behave in Ignite in different > >> scenarios. Also I want to have clear understanding of performance > >> implications of each decision here. > >> > >> Anything wrong with that? > >> > >> Sergi > >> > >> On Thu, Jul 21, 2016 at 1:04 PM, Dmitriy Setrakyan < > [hidden email]> > >> wrote: > >> > >> > Serj, are you asking what will happen as of today? Then the answer to > all > >> > your questions is that duplicate keys are not an issue, and Ignite > always > >> > operates in **upsert** mode (which is essentially a *“put(…)” > *method). > >> > > >> > However, the *“insert”* that is suggested by Alex would delegate to > >> > *“putIfAbsent(…)”*, which in database world makes more sense. > However, in > >> > this case, the *“update”* syntax should delegate to *“replace(…)”*, as > >> > update should fail in case if a key is absent. > >> > > >> > Considering the above, a notion of “*upsert”* or “*merge” *operation > is > >> > very much needed, as it will give a user an option to perform > >> > “insert-or-update” in 1 call. > >> > > >> > Does this make sense? > >> > > >> > D. > >> > > >> > On Wed, Jul 20, 2016 at 9:39 PM, Sergi Vladykin < > >> [hidden email]> > >> > wrote: > >> > > >> > > I'd prefer to do MERGE operation last because in H2 it is not > standard > >> > ANSI > >> > > SQL MERGE. Or may be not implement it at all, or may be contribute > ANSI > >> > > correct version to H2, then implement it on Ignite. Need to > investigate > >> > the > >> > > semantics deeper before making any decisions here. > >> > > > >> > > Lets start with simple scenarios for INSERT and go through all the > >> > possible > >> > > cases and answer the questions: > >> > > - What will happen on key conflict in TX cache? > >> > > - What will happen on key conflict in Atomic cache? > >> > > - What will happen with the previous two if we use DataLoader? > >> > > - How to make these operations efficient (it will be simple enough > to > >> > > implement them with separate put/putIfAbsent operations but > probably we > >> > > will need some batching like putAllIfAbsent for efficiency)? > >> > > > >> > > As for API, we still will need to have a single entry point for all > SQL > >> > > queries/commands to allow any console work with it transparently. It > >> > would > >> > > be great if we will be able to come up with something consistent > with > >> > this > >> > > idea on public API. > >> > > > >> > > Sergi > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > On Wed, Jul 20, 2016 at 2:23 PM, Dmitriy Setrakyan < > >> > > [hidden email]> > >> > > wrote: > >> > > > >> > > > Like the idea of merge and insert. I need more time to think about > >> the > >> > > API > >> > > > changes. > >> > > > > >> > > > Sergi, what do you think? > >> > > > > >> > > > Dmitriy > >> > > > > >> > > > > >> > > > > >> > > > On Jul 20, 2016, at 12:36 PM, Alexander Paschenko < > >> > > > [hidden email]> wrote: > >> > > > > >> > > > >> Thus, I suggest that we implement MERGE as a separate operation > >> > backed > >> > > > by putIfAbsent operation, while INSERT will be implemented via > put. > >> > > > > > >> > > > > Sorry, of course I meant that MERGE has to be put-based, while > >> INSERT > >> > > > > has to be putIfAbsent-based. > >> > > > > > >> > > > > 2016-07-20 12:30 GMT+03:00 Alexander Paschenko > >> > > > > <[hidden email]>: > >> > > > >> Hell Igniters, > >> > > > >> > >> > > > >> In this thread I would like to share and discuss some thoughts > on > >> > DML > >> > > > >> operations' implementation, so let's start and keep it here. > >> > Everyone > >> > > > >> is of course welcome to share their suggestions. > >> > > > >> > >> > > > >> For starters, I was thinking about semantics of INSERT. In > >> > traditional > >> > > > >> RDBMSs, INSERT works only for records whose primary keys don't > >> > > > >> conflict with those of records that are already persistent - > you > >> > can't > >> > > > >> try to insert the same key more than once because you'll get an > >> > error. > >> > > > >> However, semantics of cache put is obviously different - it > does > >> not > >> > > > >> have anything about duplicate keys, it just quietly updates > values > >> > in > >> > > > >> case of keys' duplication. Still, cache has putIfAbsent > operation > >> > that > >> > > > >> is closer to traditional notion of INSERT, and H2's SQL dialect > >> has > >> > > > >> MERGE operation which corresponds to semantics of cache put. > >> Thus, I > >> > > > >> suggest that we implement MERGE as a separate operation backed > by > >> > > > >> putIfAbsent operation, while INSERT will be implemented via > put. > >> > > > >> > >> > > > >> And one more, probably more important thing: I suggest that we > >> > create > >> > > > >> separate class Update and corresponding operation update() in > >> > > > >> IgniteCache. The reasons are as follows: > >> > > > >> > >> > > > >> - Query bears some flags that are clearly redundant for Update > >> (page > >> > > > >> size, locality) > >> > > > >> - query() method in IgniteCache (one that accepts Query) and > >> query() > >> > > > >> methods in GridQueryIndexing return iterators. So, if we > strive to > >> > > > >> leave interfaces unchanged, we still will introduce some design > >> > > > >> ugliness like query methods returning empty iterators for > certain > >> > > > >> queries, and/or query flags that indicate whether it's an > update > >> > query > >> > > > >> or not, etc. > >> > > > >> - If some Queries are update queries, then continuous queries > >> can't > >> > be > >> > > > >> based on them - more design-wise ugly checks and stuff like > that. > >> > > > >> - I'm pretty sure there's more I don't know about. > >> > > > >> > >> > > > >> Comments and suggestions are welcome. Sergi Vladykin, Dmitry > >> > > > >> Setrakyan, your opinions are of particular interest, please > >> advise. > >> > > > >> > >> > > > >> Regards, > >> > > > >> Alex > >> > > > > >> > > > >> > > >> > |
BTW, the simplest way to solve this issue is to allow running SQL commands
inside of SqlFieldsQuery. We may add some additional convenience API for updates if we want, but JDBC client will always call it like this: cache.query(new SqlFieldsQuery("INSERT INTO MY_TABLE VALUES(?,?)").setArgs(1,2)); This will resolve any ambiguity. Sergi 2016-07-26 17:56 GMT+03:00 Sergi Vladykin <[hidden email]>: > I don't like any pre-parsing, especially with some libraries other than > H2. H2 itself has enough quirks to multiply it on quirks of another library. > > This is exactly what I was talking about - we need some single entry point > on API for all the SQL commands and queries. Thats why I suggested > SqlUpdate to extend Query. To me its is the cleanest approach. May be we > need to change in some backward compatible way this Query hierarchy to get > rid of extra methods but the idea is still the same. > > Sergi > > 2016-07-26 14:34 GMT+03:00 Alexander Paschenko < > [hidden email]>: > >> Guys, >> >> I would like to advance the discussion further. There's one quite >> important question that arose based on current state of work on this >> issue. If we use some kind of interactive console, like Visor, then >> how should it know whether SQL query it is requested to execute >> returns a result set or not? In JDBC world, solution is quite simple - >> there's base interface called Statement that all commands implement, >> and it has magic isResultSet method that tells whether statement is a >> query or an update command. The API proposed now has separate Query >> and Update operations which I believe to be a right thing by the >> reasons I outlined in the beginning of this thread. However, their >> lack of common ancestor prevents possible console clients from running >> text SQL commands in a fully transparent manner - like >> IgniteCache.execute(String sql). Therefore I see two possible ways of >> solving this: >> >> - we change API so that it includes new class or interface parenting >> both Query and Update, and clients use it to communicate with cache >> - we let (or make :) ) the client determine command type independently >> and behave accordingly - for it to work it will have some kind of >> command parsing by itself just to determine its type. Visor console >> may use simple library like JSqlParser >> (https://github.com/JSQLParser/JSqlParser; dual LGPL 2.1/ASF 2.0 >> licensed) to determine request type in terms of JDBC, and behave >> accordingly. >> >> Personally, I think that the second approach is better - and here's why. >> >> First, it does not seem wise to change API simply to make console (or >> any other) clients simpler. Programmatic APIs should be concise and >> short for programmatic use, console clients should be easy to use from >> console - and that's it: after all, console client exists to free a >> user from burden of doing things programmatically, so its aim is to >> adapt API to console or whatever UI. >> Second, possible complications in client implied by such approach >> certainly won't be dramatic - I don't think that additional single >> query parsing operation in client code will make it much harder to >> develop. >> Third, as I see it now, adding a new "synthetic" entity and new method >> would take more effort to adapting the client to new API. >> >> Dmitry, Sergi, I would like to hear what you think about it all. Thanks. >> >> - Alex >> >> 2016-07-21 21:17 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: >> > OK, then using your analogy, the current behavior in Ignite is MERGE for >> > the most part. >> > >> > My preference is that Ignite SQL should work no different from >> traditional >> > databases, which means: >> > >> > - INSERT is translated into *putIfAbsent()* call in Ignite >> > - UPDATE is translated into *replace()* call in Ignite >> > - MERGE is translated into *put()* call in Ignite >> > - For SQL BATCH calls we should delegate to Ignite batch operations, >> e.g. >> > *putAll()* >> > >> > The above should hold true for atomic and transactional put/putAll >> calls, >> > as well as for the data streamer. >> > >> > Does this make sense? >> > >> > D. >> > >> > On Thu, Jul 21, 2016 at 4:06 AM, Sergi Vladykin < >> [hidden email]> >> > wrote: >> > >> >> No, this does not make sense. >> >> >> >> There is no upsert mode in databases. There are operations: INSERT, >> UPDATE, >> >> DELETE, MERGE. >> >> >> >> I want to have clear understanding of how they have to behave in SQL >> >> databases and how they will actually behave in Ignite in different >> >> scenarios. Also I want to have clear understanding of performance >> >> implications of each decision here. >> >> >> >> Anything wrong with that? >> >> >> >> Sergi >> >> >> >> On Thu, Jul 21, 2016 at 1:04 PM, Dmitriy Setrakyan < >> [hidden email]> >> >> wrote: >> >> >> >> > Serj, are you asking what will happen as of today? Then the answer >> to all >> >> > your questions is that duplicate keys are not an issue, and Ignite >> always >> >> > operates in **upsert** mode (which is essentially a *“put(…)” >> *method). >> >> > >> >> > However, the *“insert”* that is suggested by Alex would delegate to >> >> > *“putIfAbsent(…)”*, which in database world makes more sense. >> However, in >> >> > this case, the *“update”* syntax should delegate to *“replace(…)”*, >> as >> >> > update should fail in case if a key is absent. >> >> > >> >> > Considering the above, a notion of “*upsert”* or “*merge” *operation >> is >> >> > very much needed, as it will give a user an option to perform >> >> > “insert-or-update” in 1 call. >> >> > >> >> > Does this make sense? >> >> > >> >> > D. >> >> > >> >> > On Wed, Jul 20, 2016 at 9:39 PM, Sergi Vladykin < >> >> [hidden email]> >> >> > wrote: >> >> > >> >> > > I'd prefer to do MERGE operation last because in H2 it is not >> standard >> >> > ANSI >> >> > > SQL MERGE. Or may be not implement it at all, or may be contribute >> ANSI >> >> > > correct version to H2, then implement it on Ignite. Need to >> investigate >> >> > the >> >> > > semantics deeper before making any decisions here. >> >> > > >> >> > > Lets start with simple scenarios for INSERT and go through all the >> >> > possible >> >> > > cases and answer the questions: >> >> > > - What will happen on key conflict in TX cache? >> >> > > - What will happen on key conflict in Atomic cache? >> >> > > - What will happen with the previous two if we use DataLoader? >> >> > > - How to make these operations efficient (it will be simple enough >> to >> >> > > implement them with separate put/putIfAbsent operations but >> probably we >> >> > > will need some batching like putAllIfAbsent for efficiency)? >> >> > > >> >> > > As for API, we still will need to have a single entry point for >> all SQL >> >> > > queries/commands to allow any console work with it transparently. >> It >> >> > would >> >> > > be great if we will be able to come up with something consistent >> with >> >> > this >> >> > > idea on public API. >> >> > > >> >> > > Sergi >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > On Wed, Jul 20, 2016 at 2:23 PM, Dmitriy Setrakyan < >> >> > > [hidden email]> >> >> > > wrote: >> >> > > >> >> > > > Like the idea of merge and insert. I need more time to think >> about >> >> the >> >> > > API >> >> > > > changes. >> >> > > > >> >> > > > Sergi, what do you think? >> >> > > > >> >> > > > Dmitriy >> >> > > > >> >> > > > >> >> > > > >> >> > > > On Jul 20, 2016, at 12:36 PM, Alexander Paschenko < >> >> > > > [hidden email]> wrote: >> >> > > > >> >> > > > >> Thus, I suggest that we implement MERGE as a separate >> operation >> >> > backed >> >> > > > by putIfAbsent operation, while INSERT will be implemented via >> put. >> >> > > > > >> >> > > > > Sorry, of course I meant that MERGE has to be put-based, while >> >> INSERT >> >> > > > > has to be putIfAbsent-based. >> >> > > > > >> >> > > > > 2016-07-20 12:30 GMT+03:00 Alexander Paschenko >> >> > > > > <[hidden email]>: >> >> > > > >> Hell Igniters, >> >> > > > >> >> >> > > > >> In this thread I would like to share and discuss some >> thoughts on >> >> > DML >> >> > > > >> operations' implementation, so let's start and keep it here. >> >> > Everyone >> >> > > > >> is of course welcome to share their suggestions. >> >> > > > >> >> >> > > > >> For starters, I was thinking about semantics of INSERT. In >> >> > traditional >> >> > > > >> RDBMSs, INSERT works only for records whose primary keys don't >> >> > > > >> conflict with those of records that are already persistent - >> you >> >> > can't >> >> > > > >> try to insert the same key more than once because you'll get >> an >> >> > error. >> >> > > > >> However, semantics of cache put is obviously different - it >> does >> >> not >> >> > > > >> have anything about duplicate keys, it just quietly updates >> values >> >> > in >> >> > > > >> case of keys' duplication. Still, cache has putIfAbsent >> operation >> >> > that >> >> > > > >> is closer to traditional notion of INSERT, and H2's SQL >> dialect >> >> has >> >> > > > >> MERGE operation which corresponds to semantics of cache put. >> >> Thus, I >> >> > > > >> suggest that we implement MERGE as a separate operation >> backed by >> >> > > > >> putIfAbsent operation, while INSERT will be implemented via >> put. >> >> > > > >> >> >> > > > >> And one more, probably more important thing: I suggest that we >> >> > create >> >> > > > >> separate class Update and corresponding operation update() in >> >> > > > >> IgniteCache. The reasons are as follows: >> >> > > > >> >> >> > > > >> - Query bears some flags that are clearly redundant for Update >> >> (page >> >> > > > >> size, locality) >> >> > > > >> - query() method in IgniteCache (one that accepts Query) and >> >> query() >> >> > > > >> methods in GridQueryIndexing return iterators. So, if we >> strive to >> >> > > > >> leave interfaces unchanged, we still will introduce some >> design >> >> > > > >> ugliness like query methods returning empty iterators for >> certain >> >> > > > >> queries, and/or query flags that indicate whether it's an >> update >> >> > query >> >> > > > >> or not, etc. >> >> > > > >> - If some Queries are update queries, then continuous queries >> >> can't >> >> > be >> >> > > > >> based on them - more design-wise ugly checks and stuff like >> that. >> >> > > > >> - I'm pretty sure there's more I don't know about. >> >> > > > >> >> >> > > > >> Comments and suggestions are welcome. Sergi Vladykin, Dmitry >> >> > > > >> Setrakyan, your opinions are of particular interest, please >> >> advise. >> >> > > > >> >> >> > > > >> Regards, >> >> > > > >> Alex >> >> > > > >> >> > > >> >> > >> >> >> > > |
I see your point. But what about my concerns from initial post?
Particularly about signatures of existing methods? I personally don't like an option of query() method always returning an empty iterator for any non-select query, it seems ugly design wise. - Alex 2016-07-26 18:15 GMT+03:00 Sergi Vladykin <[hidden email]>: > BTW, the simplest way to solve this issue is to allow running SQL commands > inside of SqlFieldsQuery. > > We may add some additional convenience API for updates if we want, but JDBC > client will always call it like this: > > cache.query(new SqlFieldsQuery("INSERT INTO MY_TABLE > VALUES(?,?)").setArgs(1,2)); > > This will resolve any ambiguity. > > Sergi > > 2016-07-26 17:56 GMT+03:00 Sergi Vladykin <[hidden email]>: > >> I don't like any pre-parsing, especially with some libraries other than >> H2. H2 itself has enough quirks to multiply it on quirks of another library. >> >> This is exactly what I was talking about - we need some single entry point >> on API for all the SQL commands and queries. Thats why I suggested >> SqlUpdate to extend Query. To me its is the cleanest approach. May be we >> need to change in some backward compatible way this Query hierarchy to get >> rid of extra methods but the idea is still the same. >> >> Sergi >> >> 2016-07-26 14:34 GMT+03:00 Alexander Paschenko < >> [hidden email]>: >> >>> Guys, >>> >>> I would like to advance the discussion further. There's one quite >>> important question that arose based on current state of work on this >>> issue. If we use some kind of interactive console, like Visor, then >>> how should it know whether SQL query it is requested to execute >>> returns a result set or not? In JDBC world, solution is quite simple - >>> there's base interface called Statement that all commands implement, >>> and it has magic isResultSet method that tells whether statement is a >>> query or an update command. The API proposed now has separate Query >>> and Update operations which I believe to be a right thing by the >>> reasons I outlined in the beginning of this thread. However, their >>> lack of common ancestor prevents possible console clients from running >>> text SQL commands in a fully transparent manner - like >>> IgniteCache.execute(String sql). Therefore I see two possible ways of >>> solving this: >>> >>> - we change API so that it includes new class or interface parenting >>> both Query and Update, and clients use it to communicate with cache >>> - we let (or make :) ) the client determine command type independently >>> and behave accordingly - for it to work it will have some kind of >>> command parsing by itself just to determine its type. Visor console >>> may use simple library like JSqlParser >>> (https://github.com/JSQLParser/JSqlParser; dual LGPL 2.1/ASF 2.0 >>> licensed) to determine request type in terms of JDBC, and behave >>> accordingly. >>> >>> Personally, I think that the second approach is better - and here's why. >>> >>> First, it does not seem wise to change API simply to make console (or >>> any other) clients simpler. Programmatic APIs should be concise and >>> short for programmatic use, console clients should be easy to use from >>> console - and that's it: after all, console client exists to free a >>> user from burden of doing things programmatically, so its aim is to >>> adapt API to console or whatever UI. >>> Second, possible complications in client implied by such approach >>> certainly won't be dramatic - I don't think that additional single >>> query parsing operation in client code will make it much harder to >>> develop. >>> Third, as I see it now, adding a new "synthetic" entity and new method >>> would take more effort to adapting the client to new API. >>> >>> Dmitry, Sergi, I would like to hear what you think about it all. Thanks. >>> >>> - Alex >>> >>> 2016-07-21 21:17 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: >>> > OK, then using your analogy, the current behavior in Ignite is MERGE for >>> > the most part. >>> > >>> > My preference is that Ignite SQL should work no different from >>> traditional >>> > databases, which means: >>> > >>> > - INSERT is translated into *putIfAbsent()* call in Ignite >>> > - UPDATE is translated into *replace()* call in Ignite >>> > - MERGE is translated into *put()* call in Ignite >>> > - For SQL BATCH calls we should delegate to Ignite batch operations, >>> e.g. >>> > *putAll()* >>> > >>> > The above should hold true for atomic and transactional put/putAll >>> calls, >>> > as well as for the data streamer. >>> > >>> > Does this make sense? >>> > >>> > D. >>> > >>> > On Thu, Jul 21, 2016 at 4:06 AM, Sergi Vladykin < >>> [hidden email]> >>> > wrote: >>> > >>> >> No, this does not make sense. >>> >> >>> >> There is no upsert mode in databases. There are operations: INSERT, >>> UPDATE, >>> >> DELETE, MERGE. >>> >> >>> >> I want to have clear understanding of how they have to behave in SQL >>> >> databases and how they will actually behave in Ignite in different >>> >> scenarios. Also I want to have clear understanding of performance >>> >> implications of each decision here. >>> >> >>> >> Anything wrong with that? >>> >> >>> >> Sergi >>> >> >>> >> On Thu, Jul 21, 2016 at 1:04 PM, Dmitriy Setrakyan < >>> [hidden email]> >>> >> wrote: >>> >> >>> >> > Serj, are you asking what will happen as of today? Then the answer >>> to all >>> >> > your questions is that duplicate keys are not an issue, and Ignite >>> always >>> >> > operates in **upsert** mode (which is essentially a *“put(…)” >>> *method). >>> >> > >>> >> > However, the *“insert”* that is suggested by Alex would delegate to >>> >> > *“putIfAbsent(…)”*, which in database world makes more sense. >>> However, in >>> >> > this case, the *“update”* syntax should delegate to *“replace(…)”*, >>> as >>> >> > update should fail in case if a key is absent. >>> >> > >>> >> > Considering the above, a notion of “*upsert”* or “*merge” *operation >>> is >>> >> > very much needed, as it will give a user an option to perform >>> >> > “insert-or-update” in 1 call. >>> >> > >>> >> > Does this make sense? >>> >> > >>> >> > D. >>> >> > >>> >> > On Wed, Jul 20, 2016 at 9:39 PM, Sergi Vladykin < >>> >> [hidden email]> >>> >> > wrote: >>> >> > >>> >> > > I'd prefer to do MERGE operation last because in H2 it is not >>> standard >>> >> > ANSI >>> >> > > SQL MERGE. Or may be not implement it at all, or may be contribute >>> ANSI >>> >> > > correct version to H2, then implement it on Ignite. Need to >>> investigate >>> >> > the >>> >> > > semantics deeper before making any decisions here. >>> >> > > >>> >> > > Lets start with simple scenarios for INSERT and go through all the >>> >> > possible >>> >> > > cases and answer the questions: >>> >> > > - What will happen on key conflict in TX cache? >>> >> > > - What will happen on key conflict in Atomic cache? >>> >> > > - What will happen with the previous two if we use DataLoader? >>> >> > > - How to make these operations efficient (it will be simple enough >>> to >>> >> > > implement them with separate put/putIfAbsent operations but >>> probably we >>> >> > > will need some batching like putAllIfAbsent for efficiency)? >>> >> > > >>> >> > > As for API, we still will need to have a single entry point for >>> all SQL >>> >> > > queries/commands to allow any console work with it transparently. >>> It >>> >> > would >>> >> > > be great if we will be able to come up with something consistent >>> with >>> >> > this >>> >> > > idea on public API. >>> >> > > >>> >> > > Sergi >>> >> > > >>> >> > > >>> >> > > >>> >> > > >>> >> > > >>> >> > > >>> >> > > >>> >> > > >>> >> > > On Wed, Jul 20, 2016 at 2:23 PM, Dmitriy Setrakyan < >>> >> > > [hidden email]> >>> >> > > wrote: >>> >> > > >>> >> > > > Like the idea of merge and insert. I need more time to think >>> about >>> >> the >>> >> > > API >>> >> > > > changes. >>> >> > > > >>> >> > > > Sergi, what do you think? >>> >> > > > >>> >> > > > Dmitriy >>> >> > > > >>> >> > > > >>> >> > > > >>> >> > > > On Jul 20, 2016, at 12:36 PM, Alexander Paschenko < >>> >> > > > [hidden email]> wrote: >>> >> > > > >>> >> > > > >> Thus, I suggest that we implement MERGE as a separate >>> operation >>> >> > backed >>> >> > > > by putIfAbsent operation, while INSERT will be implemented via >>> put. >>> >> > > > > >>> >> > > > > Sorry, of course I meant that MERGE has to be put-based, while >>> >> INSERT >>> >> > > > > has to be putIfAbsent-based. >>> >> > > > > >>> >> > > > > 2016-07-20 12:30 GMT+03:00 Alexander Paschenko >>> >> > > > > <[hidden email]>: >>> >> > > > >> Hell Igniters, >>> >> > > > >> >>> >> > > > >> In this thread I would like to share and discuss some >>> thoughts on >>> >> > DML >>> >> > > > >> operations' implementation, so let's start and keep it here. >>> >> > Everyone >>> >> > > > >> is of course welcome to share their suggestions. >>> >> > > > >> >>> >> > > > >> For starters, I was thinking about semantics of INSERT. In >>> >> > traditional >>> >> > > > >> RDBMSs, INSERT works only for records whose primary keys don't >>> >> > > > >> conflict with those of records that are already persistent - >>> you >>> >> > can't >>> >> > > > >> try to insert the same key more than once because you'll get >>> an >>> >> > error. >>> >> > > > >> However, semantics of cache put is obviously different - it >>> does >>> >> not >>> >> > > > >> have anything about duplicate keys, it just quietly updates >>> values >>> >> > in >>> >> > > > >> case of keys' duplication. Still, cache has putIfAbsent >>> operation >>> >> > that >>> >> > > > >> is closer to traditional notion of INSERT, and H2's SQL >>> dialect >>> >> has >>> >> > > > >> MERGE operation which corresponds to semantics of cache put. >>> >> Thus, I >>> >> > > > >> suggest that we implement MERGE as a separate operation >>> backed by >>> >> > > > >> putIfAbsent operation, while INSERT will be implemented via >>> put. >>> >> > > > >> >>> >> > > > >> And one more, probably more important thing: I suggest that we >>> >> > create >>> >> > > > >> separate class Update and corresponding operation update() in >>> >> > > > >> IgniteCache. The reasons are as follows: >>> >> > > > >> >>> >> > > > >> - Query bears some flags that are clearly redundant for Update >>> >> (page >>> >> > > > >> size, locality) >>> >> > > > >> - query() method in IgniteCache (one that accepts Query) and >>> >> query() >>> >> > > > >> methods in GridQueryIndexing return iterators. So, if we >>> strive to >>> >> > > > >> leave interfaces unchanged, we still will introduce some >>> design >>> >> > > > >> ugliness like query methods returning empty iterators for >>> certain >>> >> > > > >> queries, and/or query flags that indicate whether it's an >>> update >>> >> > query >>> >> > > > >> or not, etc. >>> >> > > > >> - If some Queries are update queries, then continuous queries >>> >> can't >>> >> > be >>> >> > > > >> based on them - more design-wise ugly checks and stuff like >>> that. >>> >> > > > >> - I'm pretty sure there's more I don't know about. >>> >> > > > >> >>> >> > > > >> Comments and suggestions are welcome. Sergi Vladykin, Dmitry >>> >> > > > >> Setrakyan, your opinions are of particular interest, please >>> >> advise. >>> >> > > > >> >>> >> > > > >> Regards, >>> >> > > > >> Alex >>> >> > > > >>> >> > > >>> >> > >>> >> >>> >> >> |
I don't see anything ugly in empty iterator, sorry if I insulted your taste
of beauty. If you will take a look at Jdbc, you will see that Statement.executeUpdate method returns number of updated rows, I'd prefer to return the same information, so it will not be empty (beauty is restored!). Sergi 2016-07-26 18:24 GMT+03:00 Alexander Paschenko < [hidden email]>: > I see your point. But what about my concerns from initial post? > Particularly about signatures of existing methods? I personally don't > like an option of query() method always returning an empty iterator > for any non-select query, it seems ugly design wise. > > - Alex > > 2016-07-26 18:15 GMT+03:00 Sergi Vladykin <[hidden email]>: > > BTW, the simplest way to solve this issue is to allow running SQL > commands > > inside of SqlFieldsQuery. > > > > We may add some additional convenience API for updates if we want, but > JDBC > > client will always call it like this: > > > > cache.query(new SqlFieldsQuery("INSERT INTO MY_TABLE > > VALUES(?,?)").setArgs(1,2)); > > > > This will resolve any ambiguity. > > > > Sergi > > > > 2016-07-26 17:56 GMT+03:00 Sergi Vladykin <[hidden email]>: > > > >> I don't like any pre-parsing, especially with some libraries other than > >> H2. H2 itself has enough quirks to multiply it on quirks of another > library. > >> > >> This is exactly what I was talking about - we need some single entry > point > >> on API for all the SQL commands and queries. Thats why I suggested > >> SqlUpdate to extend Query. To me its is the cleanest approach. May be we > >> need to change in some backward compatible way this Query hierarchy to > get > >> rid of extra methods but the idea is still the same. > >> > >> Sergi > >> > >> 2016-07-26 14:34 GMT+03:00 Alexander Paschenko < > >> [hidden email]>: > >> > >>> Guys, > >>> > >>> I would like to advance the discussion further. There's one quite > >>> important question that arose based on current state of work on this > >>> issue. If we use some kind of interactive console, like Visor, then > >>> how should it know whether SQL query it is requested to execute > >>> returns a result set or not? In JDBC world, solution is quite simple - > >>> there's base interface called Statement that all commands implement, > >>> and it has magic isResultSet method that tells whether statement is a > >>> query or an update command. The API proposed now has separate Query > >>> and Update operations which I believe to be a right thing by the > >>> reasons I outlined in the beginning of this thread. However, their > >>> lack of common ancestor prevents possible console clients from running > >>> text SQL commands in a fully transparent manner - like > >>> IgniteCache.execute(String sql). Therefore I see two possible ways of > >>> solving this: > >>> > >>> - we change API so that it includes new class or interface parenting > >>> both Query and Update, and clients use it to communicate with cache > >>> - we let (or make :) ) the client determine command type independently > >>> and behave accordingly - for it to work it will have some kind of > >>> command parsing by itself just to determine its type. Visor console > >>> may use simple library like JSqlParser > >>> (https://github.com/JSQLParser/JSqlParser; dual LGPL 2.1/ASF 2.0 > >>> licensed) to determine request type in terms of JDBC, and behave > >>> accordingly. > >>> > >>> Personally, I think that the second approach is better - and here's > why. > >>> > >>> First, it does not seem wise to change API simply to make console (or > >>> any other) clients simpler. Programmatic APIs should be concise and > >>> short for programmatic use, console clients should be easy to use from > >>> console - and that's it: after all, console client exists to free a > >>> user from burden of doing things programmatically, so its aim is to > >>> adapt API to console or whatever UI. > >>> Second, possible complications in client implied by such approach > >>> certainly won't be dramatic - I don't think that additional single > >>> query parsing operation in client code will make it much harder to > >>> develop. > >>> Third, as I see it now, adding a new "synthetic" entity and new method > >>> would take more effort to adapting the client to new API. > >>> > >>> Dmitry, Sergi, I would like to hear what you think about it all. > Thanks. > >>> > >>> - Alex > >>> > >>> 2016-07-21 21:17 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: > >>> > OK, then using your analogy, the current behavior in Ignite is MERGE > for > >>> > the most part. > >>> > > >>> > My preference is that Ignite SQL should work no different from > >>> traditional > >>> > databases, which means: > >>> > > >>> > - INSERT is translated into *putIfAbsent()* call in Ignite > >>> > - UPDATE is translated into *replace()* call in Ignite > >>> > - MERGE is translated into *put()* call in Ignite > >>> > - For SQL BATCH calls we should delegate to Ignite batch operations, > >>> e.g. > >>> > *putAll()* > >>> > > >>> > The above should hold true for atomic and transactional put/putAll > >>> calls, > >>> > as well as for the data streamer. > >>> > > >>> > Does this make sense? > >>> > > >>> > D. > >>> > > >>> > On Thu, Jul 21, 2016 at 4:06 AM, Sergi Vladykin < > >>> [hidden email]> > >>> > wrote: > >>> > > >>> >> No, this does not make sense. > >>> >> > >>> >> There is no upsert mode in databases. There are operations: INSERT, > >>> UPDATE, > >>> >> DELETE, MERGE. > >>> >> > >>> >> I want to have clear understanding of how they have to behave in SQL > >>> >> databases and how they will actually behave in Ignite in different > >>> >> scenarios. Also I want to have clear understanding of performance > >>> >> implications of each decision here. > >>> >> > >>> >> Anything wrong with that? > >>> >> > >>> >> Sergi > >>> >> > >>> >> On Thu, Jul 21, 2016 at 1:04 PM, Dmitriy Setrakyan < > >>> [hidden email]> > >>> >> wrote: > >>> >> > >>> >> > Serj, are you asking what will happen as of today? Then the answer > >>> to all > >>> >> > your questions is that duplicate keys are not an issue, and Ignite > >>> always > >>> >> > operates in **upsert** mode (which is essentially a *“put(…)” > >>> *method). > >>> >> > > >>> >> > However, the *“insert”* that is suggested by Alex would delegate > to > >>> >> > *“putIfAbsent(…)”*, which in database world makes more sense. > >>> However, in > >>> >> > this case, the *“update”* syntax should delegate to > *“replace(…)”*, > >>> as > >>> >> > update should fail in case if a key is absent. > >>> >> > > >>> >> > Considering the above, a notion of “*upsert”* or “*merge” > *operation > >>> is > >>> >> > very much needed, as it will give a user an option to perform > >>> >> > “insert-or-update” in 1 call. > >>> >> > > >>> >> > Does this make sense? > >>> >> > > >>> >> > D. > >>> >> > > >>> >> > On Wed, Jul 20, 2016 at 9:39 PM, Sergi Vladykin < > >>> >> [hidden email]> > >>> >> > wrote: > >>> >> > > >>> >> > > I'd prefer to do MERGE operation last because in H2 it is not > >>> standard > >>> >> > ANSI > >>> >> > > SQL MERGE. Or may be not implement it at all, or may be > contribute > >>> ANSI > >>> >> > > correct version to H2, then implement it on Ignite. Need to > >>> investigate > >>> >> > the > >>> >> > > semantics deeper before making any decisions here. > >>> >> > > > >>> >> > > Lets start with simple scenarios for INSERT and go through all > the > >>> >> > possible > >>> >> > > cases and answer the questions: > >>> >> > > - What will happen on key conflict in TX cache? > >>> >> > > - What will happen on key conflict in Atomic cache? > >>> >> > > - What will happen with the previous two if we use DataLoader? > >>> >> > > - How to make these operations efficient (it will be simple > enough > >>> to > >>> >> > > implement them with separate put/putIfAbsent operations but > >>> probably we > >>> >> > > will need some batching like putAllIfAbsent for efficiency)? > >>> >> > > > >>> >> > > As for API, we still will need to have a single entry point for > >>> all SQL > >>> >> > > queries/commands to allow any console work with it > transparently. > >>> It > >>> >> > would > >>> >> > > be great if we will be able to come up with something consistent > >>> with > >>> >> > this > >>> >> > > idea on public API. > >>> >> > > > >>> >> > > Sergi > >>> >> > > > >>> >> > > > >>> >> > > > >>> >> > > > >>> >> > > > >>> >> > > > >>> >> > > > >>> >> > > > >>> >> > > On Wed, Jul 20, 2016 at 2:23 PM, Dmitriy Setrakyan < > >>> >> > > [hidden email]> > >>> >> > > wrote: > >>> >> > > > >>> >> > > > Like the idea of merge and insert. I need more time to think > >>> about > >>> >> the > >>> >> > > API > >>> >> > > > changes. > >>> >> > > > > >>> >> > > > Sergi, what do you think? > >>> >> > > > > >>> >> > > > Dmitriy > >>> >> > > > > >>> >> > > > > >>> >> > > > > >>> >> > > > On Jul 20, 2016, at 12:36 PM, Alexander Paschenko < > >>> >> > > > [hidden email]> wrote: > >>> >> > > > > >>> >> > > > >> Thus, I suggest that we implement MERGE as a separate > >>> operation > >>> >> > backed > >>> >> > > > by putIfAbsent operation, while INSERT will be implemented via > >>> put. > >>> >> > > > > > >>> >> > > > > Sorry, of course I meant that MERGE has to be put-based, > while > >>> >> INSERT > >>> >> > > > > has to be putIfAbsent-based. > >>> >> > > > > > >>> >> > > > > 2016-07-20 12:30 GMT+03:00 Alexander Paschenko > >>> >> > > > > <[hidden email]>: > >>> >> > > > >> Hell Igniters, > >>> >> > > > >> > >>> >> > > > >> In this thread I would like to share and discuss some > >>> thoughts on > >>> >> > DML > >>> >> > > > >> operations' implementation, so let's start and keep it > here. > >>> >> > Everyone > >>> >> > > > >> is of course welcome to share their suggestions. > >>> >> > > > >> > >>> >> > > > >> For starters, I was thinking about semantics of INSERT. In > >>> >> > traditional > >>> >> > > > >> RDBMSs, INSERT works only for records whose primary keys > don't > >>> >> > > > >> conflict with those of records that are already persistent > - > >>> you > >>> >> > can't > >>> >> > > > >> try to insert the same key more than once because you'll > get > >>> an > >>> >> > error. > >>> >> > > > >> However, semantics of cache put is obviously different - it > >>> does > >>> >> not > >>> >> > > > >> have anything about duplicate keys, it just quietly updates > >>> values > >>> >> > in > >>> >> > > > >> case of keys' duplication. Still, cache has putIfAbsent > >>> operation > >>> >> > that > >>> >> > > > >> is closer to traditional notion of INSERT, and H2's SQL > >>> dialect > >>> >> has > >>> >> > > > >> MERGE operation which corresponds to semantics of cache > put. > >>> >> Thus, I > >>> >> > > > >> suggest that we implement MERGE as a separate operation > >>> backed by > >>> >> > > > >> putIfAbsent operation, while INSERT will be implemented via > >>> put. > >>> >> > > > >> > >>> >> > > > >> And one more, probably more important thing: I suggest > that we > >>> >> > create > >>> >> > > > >> separate class Update and corresponding operation update() > in > >>> >> > > > >> IgniteCache. The reasons are as follows: > >>> >> > > > >> > >>> >> > > > >> - Query bears some flags that are clearly redundant for > Update > >>> >> (page > >>> >> > > > >> size, locality) > >>> >> > > > >> - query() method in IgniteCache (one that accepts Query) > and > >>> >> query() > >>> >> > > > >> methods in GridQueryIndexing return iterators. So, if we > >>> strive to > >>> >> > > > >> leave interfaces unchanged, we still will introduce some > >>> design > >>> >> > > > >> ugliness like query methods returning empty iterators for > >>> certain > >>> >> > > > >> queries, and/or query flags that indicate whether it's an > >>> update > >>> >> > query > >>> >> > > > >> or not, etc. > >>> >> > > > >> - If some Queries are update queries, then continuous > queries > >>> >> can't > >>> >> > be > >>> >> > > > >> based on them - more design-wise ugly checks and stuff like > >>> that. > >>> >> > > > >> - I'm pretty sure there's more I don't know about. > >>> >> > > > >> > >>> >> > > > >> Comments and suggestions are welcome. Sergi Vladykin, > Dmitry > >>> >> > > > >> Setrakyan, your opinions are of particular interest, please > >>> >> advise. > >>> >> > > > >> > >>> >> > > > >> Regards, > >>> >> > > > >> Alex > >>> >> > > > > >>> >> > > > >>> >> > > >>> >> > >>> > >> > >> > |
What is the ticket number for this. Is the new API documented there?
On Tue, Jul 26, 2016 at 11:36 AM, Sergi Vladykin <[hidden email]> wrote: > I don't see anything ugly in empty iterator, sorry if I insulted your taste > of beauty. > > If you will take a look at Jdbc, you will see that Statement.executeUpdate > method returns number of updated rows, I'd prefer to return the same > information, so it will not be empty (beauty is restored!). > > Sergi > > > > 2016-07-26 18:24 GMT+03:00 Alexander Paschenko < > [hidden email]>: > > > I see your point. But what about my concerns from initial post? > > Particularly about signatures of existing methods? I personally don't > > like an option of query() method always returning an empty iterator > > for any non-select query, it seems ugly design wise. > > > > - Alex > > > > 2016-07-26 18:15 GMT+03:00 Sergi Vladykin <[hidden email]>: > > > BTW, the simplest way to solve this issue is to allow running SQL > > commands > > > inside of SqlFieldsQuery. > > > > > > We may add some additional convenience API for updates if we want, but > > JDBC > > > client will always call it like this: > > > > > > cache.query(new SqlFieldsQuery("INSERT INTO MY_TABLE > > > VALUES(?,?)").setArgs(1,2)); > > > > > > This will resolve any ambiguity. > > > > > > Sergi > > > > > > 2016-07-26 17:56 GMT+03:00 Sergi Vladykin <[hidden email]>: > > > > > >> I don't like any pre-parsing, especially with some libraries other > than > > >> H2. H2 itself has enough quirks to multiply it on quirks of another > > library. > > >> > > >> This is exactly what I was talking about - we need some single entry > > point > > >> on API for all the SQL commands and queries. Thats why I suggested > > >> SqlUpdate to extend Query. To me its is the cleanest approach. May be > we > > >> need to change in some backward compatible way this Query hierarchy to > > get > > >> rid of extra methods but the idea is still the same. > > >> > > >> Sergi > > >> > > >> 2016-07-26 14:34 GMT+03:00 Alexander Paschenko < > > >> [hidden email]>: > > >> > > >>> Guys, > > >>> > > >>> I would like to advance the discussion further. There's one quite > > >>> important question that arose based on current state of work on this > > >>> issue. If we use some kind of interactive console, like Visor, then > > >>> how should it know whether SQL query it is requested to execute > > >>> returns a result set or not? In JDBC world, solution is quite simple > - > > >>> there's base interface called Statement that all commands implement, > > >>> and it has magic isResultSet method that tells whether statement is a > > >>> query or an update command. The API proposed now has separate Query > > >>> and Update operations which I believe to be a right thing by the > > >>> reasons I outlined in the beginning of this thread. However, their > > >>> lack of common ancestor prevents possible console clients from > running > > >>> text SQL commands in a fully transparent manner - like > > >>> IgniteCache.execute(String sql). Therefore I see two possible ways of > > >>> solving this: > > >>> > > >>> - we change API so that it includes new class or interface parenting > > >>> both Query and Update, and clients use it to communicate with cache > > >>> - we let (or make :) ) the client determine command type > independently > > >>> and behave accordingly - for it to work it will have some kind of > > >>> command parsing by itself just to determine its type. Visor console > > >>> may use simple library like JSqlParser > > >>> (https://github.com/JSQLParser/JSqlParser; dual LGPL 2.1/ASF 2.0 > > >>> licensed) to determine request type in terms of JDBC, and behave > > >>> accordingly. > > >>> > > >>> Personally, I think that the second approach is better - and here's > > why. > > >>> > > >>> First, it does not seem wise to change API simply to make console (or > > >>> any other) clients simpler. Programmatic APIs should be concise and > > >>> short for programmatic use, console clients should be easy to use > from > > >>> console - and that's it: after all, console client exists to free a > > >>> user from burden of doing things programmatically, so its aim is to > > >>> adapt API to console or whatever UI. > > >>> Second, possible complications in client implied by such approach > > >>> certainly won't be dramatic - I don't think that additional single > > >>> query parsing operation in client code will make it much harder to > > >>> develop. > > >>> Third, as I see it now, adding a new "synthetic" entity and new > method > > >>> would take more effort to adapting the client to new API. > > >>> > > >>> Dmitry, Sergi, I would like to hear what you think about it all. > > Thanks. > > >>> > > >>> - Alex > > >>> > > >>> 2016-07-21 21:17 GMT+03:00 Dmitriy Setrakyan <[hidden email] > >: > > >>> > OK, then using your analogy, the current behavior in Ignite is > MERGE > > for > > >>> > the most part. > > >>> > > > >>> > My preference is that Ignite SQL should work no different from > > >>> traditional > > >>> > databases, which means: > > >>> > > > >>> > - INSERT is translated into *putIfAbsent()* call in Ignite > > >>> > - UPDATE is translated into *replace()* call in Ignite > > >>> > - MERGE is translated into *put()* call in Ignite > > >>> > - For SQL BATCH calls we should delegate to Ignite batch > operations, > > >>> e.g. > > >>> > *putAll()* > > >>> > > > >>> > The above should hold true for atomic and transactional put/putAll > > >>> calls, > > >>> > as well as for the data streamer. > > >>> > > > >>> > Does this make sense? > > >>> > > > >>> > D. > > >>> > > > >>> > On Thu, Jul 21, 2016 at 4:06 AM, Sergi Vladykin < > > >>> [hidden email]> > > >>> > wrote: > > >>> > > > >>> >> No, this does not make sense. > > >>> >> > > >>> >> There is no upsert mode in databases. There are operations: > INSERT, > > >>> UPDATE, > > >>> >> DELETE, MERGE. > > >>> >> > > >>> >> I want to have clear understanding of how they have to behave in > SQL > > >>> >> databases and how they will actually behave in Ignite in different > > >>> >> scenarios. Also I want to have clear understanding of performance > > >>> >> implications of each decision here. > > >>> >> > > >>> >> Anything wrong with that? > > >>> >> > > >>> >> Sergi > > >>> >> > > >>> >> On Thu, Jul 21, 2016 at 1:04 PM, Dmitriy Setrakyan < > > >>> [hidden email]> > > >>> >> wrote: > > >>> >> > > >>> >> > Serj, are you asking what will happen as of today? Then the > answer > > >>> to all > > >>> >> > your questions is that duplicate keys are not an issue, and > Ignite > > >>> always > > >>> >> > operates in **upsert** mode (which is essentially a *“put(…)” > > >>> *method). > > >>> >> > > > >>> >> > However, the *“insert”* that is suggested by Alex would delegate > > to > > >>> >> > *“putIfAbsent(…)”*, which in database world makes more sense. > > >>> However, in > > >>> >> > this case, the *“update”* syntax should delegate to > > *“replace(…)”*, > > >>> as > > >>> >> > update should fail in case if a key is absent. > > >>> >> > > > >>> >> > Considering the above, a notion of “*upsert”* or “*merge” > > *operation > > >>> is > > >>> >> > very much needed, as it will give a user an option to perform > > >>> >> > “insert-or-update” in 1 call. > > >>> >> > > > >>> >> > Does this make sense? > > >>> >> > > > >>> >> > D. > > >>> >> > > > >>> >> > On Wed, Jul 20, 2016 at 9:39 PM, Sergi Vladykin < > > >>> >> [hidden email]> > > >>> >> > wrote: > > >>> >> > > > >>> >> > > I'd prefer to do MERGE operation last because in H2 it is not > > >>> standard > > >>> >> > ANSI > > >>> >> > > SQL MERGE. Or may be not implement it at all, or may be > > contribute > > >>> ANSI > > >>> >> > > correct version to H2, then implement it on Ignite. Need to > > >>> investigate > > >>> >> > the > > >>> >> > > semantics deeper before making any decisions here. > > >>> >> > > > > >>> >> > > Lets start with simple scenarios for INSERT and go through all > > the > > >>> >> > possible > > >>> >> > > cases and answer the questions: > > >>> >> > > - What will happen on key conflict in TX cache? > > >>> >> > > - What will happen on key conflict in Atomic cache? > > >>> >> > > - What will happen with the previous two if we use DataLoader? > > >>> >> > > - How to make these operations efficient (it will be simple > > enough > > >>> to > > >>> >> > > implement them with separate put/putIfAbsent operations but > > >>> probably we > > >>> >> > > will need some batching like putAllIfAbsent for efficiency)? > > >>> >> > > > > >>> >> > > As for API, we still will need to have a single entry point > for > > >>> all SQL > > >>> >> > > queries/commands to allow any console work with it > > transparently. > > >>> It > > >>> >> > would > > >>> >> > > be great if we will be able to come up with something > consistent > > >>> with > > >>> >> > this > > >>> >> > > idea on public API. > > >>> >> > > > > >>> >> > > Sergi > > >>> >> > > > > >>> >> > > > > >>> >> > > > > >>> >> > > > > >>> >> > > > > >>> >> > > > > >>> >> > > > > >>> >> > > > > >>> >> > > On Wed, Jul 20, 2016 at 2:23 PM, Dmitriy Setrakyan < > > >>> >> > > [hidden email]> > > >>> >> > > wrote: > > >>> >> > > > > >>> >> > > > Like the idea of merge and insert. I need more time to think > > >>> about > > >>> >> the > > >>> >> > > API > > >>> >> > > > changes. > > >>> >> > > > > > >>> >> > > > Sergi, what do you think? > > >>> >> > > > > > >>> >> > > > Dmitriy > > >>> >> > > > > > >>> >> > > > > > >>> >> > > > > > >>> >> > > > On Jul 20, 2016, at 12:36 PM, Alexander Paschenko < > > >>> >> > > > [hidden email]> wrote: > > >>> >> > > > > > >>> >> > > > >> Thus, I suggest that we implement MERGE as a separate > > >>> operation > > >>> >> > backed > > >>> >> > > > by putIfAbsent operation, while INSERT will be implemented > via > > >>> put. > > >>> >> > > > > > > >>> >> > > > > Sorry, of course I meant that MERGE has to be put-based, > > while > > >>> >> INSERT > > >>> >> > > > > has to be putIfAbsent-based. > > >>> >> > > > > > > >>> >> > > > > 2016-07-20 12:30 GMT+03:00 Alexander Paschenko > > >>> >> > > > > <[hidden email]>: > > >>> >> > > > >> Hell Igniters, > > >>> >> > > > >> > > >>> >> > > > >> In this thread I would like to share and discuss some > > >>> thoughts on > > >>> >> > DML > > >>> >> > > > >> operations' implementation, so let's start and keep it > > here. > > >>> >> > Everyone > > >>> >> > > > >> is of course welcome to share their suggestions. > > >>> >> > > > >> > > >>> >> > > > >> For starters, I was thinking about semantics of INSERT. > In > > >>> >> > traditional > > >>> >> > > > >> RDBMSs, INSERT works only for records whose primary keys > > don't > > >>> >> > > > >> conflict with those of records that are already > persistent > > - > > >>> you > > >>> >> > can't > > >>> >> > > > >> try to insert the same key more than once because you'll > > get > > >>> an > > >>> >> > error. > > >>> >> > > > >> However, semantics of cache put is obviously different - > it > > >>> does > > >>> >> not > > >>> >> > > > >> have anything about duplicate keys, it just quietly > updates > > >>> values > > >>> >> > in > > >>> >> > > > >> case of keys' duplication. Still, cache has putIfAbsent > > >>> operation > > >>> >> > that > > >>> >> > > > >> is closer to traditional notion of INSERT, and H2's SQL > > >>> dialect > > >>> >> has > > >>> >> > > > >> MERGE operation which corresponds to semantics of cache > > put. > > >>> >> Thus, I > > >>> >> > > > >> suggest that we implement MERGE as a separate operation > > >>> backed by > > >>> >> > > > >> putIfAbsent operation, while INSERT will be implemented > via > > >>> put. > > >>> >> > > > >> > > >>> >> > > > >> And one more, probably more important thing: I suggest > > that we > > >>> >> > create > > >>> >> > > > >> separate class Update and corresponding operation > update() > > in > > >>> >> > > > >> IgniteCache. The reasons are as follows: > > >>> >> > > > >> > > >>> >> > > > >> - Query bears some flags that are clearly redundant for > > Update > > >>> >> (page > > >>> >> > > > >> size, locality) > > >>> >> > > > >> - query() method in IgniteCache (one that accepts Query) > > and > > >>> >> query() > > >>> >> > > > >> methods in GridQueryIndexing return iterators. So, if we > > >>> strive to > > >>> >> > > > >> leave interfaces unchanged, we still will introduce some > > >>> design > > >>> >> > > > >> ugliness like query methods returning empty iterators for > > >>> certain > > >>> >> > > > >> queries, and/or query flags that indicate whether it's an > > >>> update > > >>> >> > query > > >>> >> > > > >> or not, etc. > > >>> >> > > > >> - If some Queries are update queries, then continuous > > queries > > >>> >> can't > > >>> >> > be > > >>> >> > > > >> based on them - more design-wise ugly checks and stuff > like > > >>> that. > > >>> >> > > > >> - I'm pretty sure there's more I don't know about. > > >>> >> > > > >> > > >>> >> > > > >> Comments and suggestions are welcome. Sergi Vladykin, > > Dmitry > > >>> >> > > > >> Setrakyan, your opinions are of particular interest, > please > > >>> >> advise. > > >>> >> > > > >> > > >>> >> > > > >> Regards, > > >>> >> > > > >> Alex > > >>> >> > > > > > >>> >> > > > > >>> >> > > > >>> >> > > >>> > > >> > > >> > > > |
Sergi,
You wrote: > I'd prefer to return the same information, so it will not be empty Do you mean return iterator with single element that denotes number of rows? Dmitriy, You wrote: > What is the ticket number for this. Is the new API documented there? Overall issue number is 2294. There's no particular issue on API changes, but creating one seems to be a good idea, I will do it. - Alex 2016-07-27 9:20 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: > What is the ticket number for this. Is the new API documented there? > > On Tue, Jul 26, 2016 at 11:36 AM, Sergi Vladykin <[hidden email]> > wrote: > >> I don't see anything ugly in empty iterator, sorry if I insulted your taste >> of beauty. >> >> If you will take a look at Jdbc, you will see that Statement.executeUpdate >> method returns number of updated rows, I'd prefer to return the same >> information, so it will not be empty (beauty is restored!). >> >> Sergi >> >> >> >> 2016-07-26 18:24 GMT+03:00 Alexander Paschenko < >> [hidden email]>: >> >> > I see your point. But what about my concerns from initial post? >> > Particularly about signatures of existing methods? I personally don't >> > like an option of query() method always returning an empty iterator >> > for any non-select query, it seems ugly design wise. >> > >> > - Alex >> > >> > 2016-07-26 18:15 GMT+03:00 Sergi Vladykin <[hidden email]>: >> > > BTW, the simplest way to solve this issue is to allow running SQL >> > commands >> > > inside of SqlFieldsQuery. >> > > >> > > We may add some additional convenience API for updates if we want, but >> > JDBC >> > > client will always call it like this: >> > > >> > > cache.query(new SqlFieldsQuery("INSERT INTO MY_TABLE >> > > VALUES(?,?)").setArgs(1,2)); >> > > >> > > This will resolve any ambiguity. >> > > >> > > Sergi >> > > >> > > 2016-07-26 17:56 GMT+03:00 Sergi Vladykin <[hidden email]>: >> > > >> > >> I don't like any pre-parsing, especially with some libraries other >> than >> > >> H2. H2 itself has enough quirks to multiply it on quirks of another >> > library. >> > >> >> > >> This is exactly what I was talking about - we need some single entry >> > point >> > >> on API for all the SQL commands and queries. Thats why I suggested >> > >> SqlUpdate to extend Query. To me its is the cleanest approach. May be >> we >> > >> need to change in some backward compatible way this Query hierarchy to >> > get >> > >> rid of extra methods but the idea is still the same. >> > >> >> > >> Sergi >> > >> >> > >> 2016-07-26 14:34 GMT+03:00 Alexander Paschenko < >> > >> [hidden email]>: >> > >> >> > >>> Guys, >> > >>> >> > >>> I would like to advance the discussion further. There's one quite >> > >>> important question that arose based on current state of work on this >> > >>> issue. If we use some kind of interactive console, like Visor, then >> > >>> how should it know whether SQL query it is requested to execute >> > >>> returns a result set or not? In JDBC world, solution is quite simple >> - >> > >>> there's base interface called Statement that all commands implement, >> > >>> and it has magic isResultSet method that tells whether statement is a >> > >>> query or an update command. The API proposed now has separate Query >> > >>> and Update operations which I believe to be a right thing by the >> > >>> reasons I outlined in the beginning of this thread. However, their >> > >>> lack of common ancestor prevents possible console clients from >> running >> > >>> text SQL commands in a fully transparent manner - like >> > >>> IgniteCache.execute(String sql). Therefore I see two possible ways of >> > >>> solving this: >> > >>> >> > >>> - we change API so that it includes new class or interface parenting >> > >>> both Query and Update, and clients use it to communicate with cache >> > >>> - we let (or make :) ) the client determine command type >> independently >> > >>> and behave accordingly - for it to work it will have some kind of >> > >>> command parsing by itself just to determine its type. Visor console >> > >>> may use simple library like JSqlParser >> > >>> (https://github.com/JSQLParser/JSqlParser; dual LGPL 2.1/ASF 2.0 >> > >>> licensed) to determine request type in terms of JDBC, and behave >> > >>> accordingly. >> > >>> >> > >>> Personally, I think that the second approach is better - and here's >> > why. >> > >>> >> > >>> First, it does not seem wise to change API simply to make console (or >> > >>> any other) clients simpler. Programmatic APIs should be concise and >> > >>> short for programmatic use, console clients should be easy to use >> from >> > >>> console - and that's it: after all, console client exists to free a >> > >>> user from burden of doing things programmatically, so its aim is to >> > >>> adapt API to console or whatever UI. >> > >>> Second, possible complications in client implied by such approach >> > >>> certainly won't be dramatic - I don't think that additional single >> > >>> query parsing operation in client code will make it much harder to >> > >>> develop. >> > >>> Third, as I see it now, adding a new "synthetic" entity and new >> method >> > >>> would take more effort to adapting the client to new API. >> > >>> >> > >>> Dmitry, Sergi, I would like to hear what you think about it all. >> > Thanks. >> > >>> >> > >>> - Alex >> > >>> >> > >>> 2016-07-21 21:17 GMT+03:00 Dmitriy Setrakyan <[hidden email] >> >: >> > >>> > OK, then using your analogy, the current behavior in Ignite is >> MERGE >> > for >> > >>> > the most part. >> > >>> > >> > >>> > My preference is that Ignite SQL should work no different from >> > >>> traditional >> > >>> > databases, which means: >> > >>> > >> > >>> > - INSERT is translated into *putIfAbsent()* call in Ignite >> > >>> > - UPDATE is translated into *replace()* call in Ignite >> > >>> > - MERGE is translated into *put()* call in Ignite >> > >>> > - For SQL BATCH calls we should delegate to Ignite batch >> operations, >> > >>> e.g. >> > >>> > *putAll()* >> > >>> > >> > >>> > The above should hold true for atomic and transactional put/putAll >> > >>> calls, >> > >>> > as well as for the data streamer. >> > >>> > >> > >>> > Does this make sense? >> > >>> > >> > >>> > D. >> > >>> > >> > >>> > On Thu, Jul 21, 2016 at 4:06 AM, Sergi Vladykin < >> > >>> [hidden email]> >> > >>> > wrote: >> > >>> > >> > >>> >> No, this does not make sense. >> > >>> >> >> > >>> >> There is no upsert mode in databases. There are operations: >> INSERT, >> > >>> UPDATE, >> > >>> >> DELETE, MERGE. >> > >>> >> >> > >>> >> I want to have clear understanding of how they have to behave in >> SQL >> > >>> >> databases and how they will actually behave in Ignite in different >> > >>> >> scenarios. Also I want to have clear understanding of performance >> > >>> >> implications of each decision here. >> > >>> >> >> > >>> >> Anything wrong with that? >> > >>> >> >> > >>> >> Sergi >> > >>> >> >> > >>> >> On Thu, Jul 21, 2016 at 1:04 PM, Dmitriy Setrakyan < >> > >>> [hidden email]> >> > >>> >> wrote: >> > >>> >> >> > >>> >> > Serj, are you asking what will happen as of today? Then the >> answer >> > >>> to all >> > >>> >> > your questions is that duplicate keys are not an issue, and >> Ignite >> > >>> always >> > >>> >> > operates in **upsert** mode (which is essentially a *“put(…)” >> > >>> *method). >> > >>> >> > >> > >>> >> > However, the *“insert”* that is suggested by Alex would delegate >> > to >> > >>> >> > *“putIfAbsent(…)”*, which in database world makes more sense. >> > >>> However, in >> > >>> >> > this case, the *“update”* syntax should delegate to >> > *“replace(…)”*, >> > >>> as >> > >>> >> > update should fail in case if a key is absent. >> > >>> >> > >> > >>> >> > Considering the above, a notion of “*upsert”* or “*merge” >> > *operation >> > >>> is >> > >>> >> > very much needed, as it will give a user an option to perform >> > >>> >> > “insert-or-update” in 1 call. >> > >>> >> > >> > >>> >> > Does this make sense? >> > >>> >> > >> > >>> >> > D. >> > >>> >> > >> > >>> >> > On Wed, Jul 20, 2016 at 9:39 PM, Sergi Vladykin < >> > >>> >> [hidden email]> >> > >>> >> > wrote: >> > >>> >> > >> > >>> >> > > I'd prefer to do MERGE operation last because in H2 it is not >> > >>> standard >> > >>> >> > ANSI >> > >>> >> > > SQL MERGE. Or may be not implement it at all, or may be >> > contribute >> > >>> ANSI >> > >>> >> > > correct version to H2, then implement it on Ignite. Need to >> > >>> investigate >> > >>> >> > the >> > >>> >> > > semantics deeper before making any decisions here. >> > >>> >> > > >> > >>> >> > > Lets start with simple scenarios for INSERT and go through all >> > the >> > >>> >> > possible >> > >>> >> > > cases and answer the questions: >> > >>> >> > > - What will happen on key conflict in TX cache? >> > >>> >> > > - What will happen on key conflict in Atomic cache? >> > >>> >> > > - What will happen with the previous two if we use DataLoader? >> > >>> >> > > - How to make these operations efficient (it will be simple >> > enough >> > >>> to >> > >>> >> > > implement them with separate put/putIfAbsent operations but >> > >>> probably we >> > >>> >> > > will need some batching like putAllIfAbsent for efficiency)? >> > >>> >> > > >> > >>> >> > > As for API, we still will need to have a single entry point >> for >> > >>> all SQL >> > >>> >> > > queries/commands to allow any console work with it >> > transparently. >> > >>> It >> > >>> >> > would >> > >>> >> > > be great if we will be able to come up with something >> consistent >> > >>> with >> > >>> >> > this >> > >>> >> > > idea on public API. >> > >>> >> > > >> > >>> >> > > Sergi >> > >>> >> > > >> > >>> >> > > >> > >>> >> > > >> > >>> >> > > >> > >>> >> > > >> > >>> >> > > >> > >>> >> > > >> > >>> >> > > >> > >>> >> > > On Wed, Jul 20, 2016 at 2:23 PM, Dmitriy Setrakyan < >> > >>> >> > > [hidden email]> >> > >>> >> > > wrote: >> > >>> >> > > >> > >>> >> > > > Like the idea of merge and insert. I need more time to think >> > >>> about >> > >>> >> the >> > >>> >> > > API >> > >>> >> > > > changes. >> > >>> >> > > > >> > >>> >> > > > Sergi, what do you think? >> > >>> >> > > > >> > >>> >> > > > Dmitriy >> > >>> >> > > > >> > >>> >> > > > >> > >>> >> > > > >> > >>> >> > > > On Jul 20, 2016, at 12:36 PM, Alexander Paschenko < >> > >>> >> > > > [hidden email]> wrote: >> > >>> >> > > > >> > >>> >> > > > >> Thus, I suggest that we implement MERGE as a separate >> > >>> operation >> > >>> >> > backed >> > >>> >> > > > by putIfAbsent operation, while INSERT will be implemented >> via >> > >>> put. >> > >>> >> > > > > >> > >>> >> > > > > Sorry, of course I meant that MERGE has to be put-based, >> > while >> > >>> >> INSERT >> > >>> >> > > > > has to be putIfAbsent-based. >> > >>> >> > > > > >> > >>> >> > > > > 2016-07-20 12:30 GMT+03:00 Alexander Paschenko >> > >>> >> > > > > <[hidden email]>: >> > >>> >> > > > >> Hell Igniters, >> > >>> >> > > > >> >> > >>> >> > > > >> In this thread I would like to share and discuss some >> > >>> thoughts on >> > >>> >> > DML >> > >>> >> > > > >> operations' implementation, so let's start and keep it >> > here. >> > >>> >> > Everyone >> > >>> >> > > > >> is of course welcome to share their suggestions. >> > >>> >> > > > >> >> > >>> >> > > > >> For starters, I was thinking about semantics of INSERT. >> In >> > >>> >> > traditional >> > >>> >> > > > >> RDBMSs, INSERT works only for records whose primary keys >> > don't >> > >>> >> > > > >> conflict with those of records that are already >> persistent >> > - >> > >>> you >> > >>> >> > can't >> > >>> >> > > > >> try to insert the same key more than once because you'll >> > get >> > >>> an >> > >>> >> > error. >> > >>> >> > > > >> However, semantics of cache put is obviously different - >> it >> > >>> does >> > >>> >> not >> > >>> >> > > > >> have anything about duplicate keys, it just quietly >> updates >> > >>> values >> > >>> >> > in >> > >>> >> > > > >> case of keys' duplication. Still, cache has putIfAbsent >> > >>> operation >> > >>> >> > that >> > >>> >> > > > >> is closer to traditional notion of INSERT, and H2's SQL >> > >>> dialect >> > >>> >> has >> > >>> >> > > > >> MERGE operation which corresponds to semantics of cache >> > put. >> > >>> >> Thus, I >> > >>> >> > > > >> suggest that we implement MERGE as a separate operation >> > >>> backed by >> > >>> >> > > > >> putIfAbsent operation, while INSERT will be implemented >> via >> > >>> put. >> > >>> >> > > > >> >> > >>> >> > > > >> And one more, probably more important thing: I suggest >> > that we >> > >>> >> > create >> > >>> >> > > > >> separate class Update and corresponding operation >> update() >> > in >> > >>> >> > > > >> IgniteCache. The reasons are as follows: >> > >>> >> > > > >> >> > >>> >> > > > >> - Query bears some flags that are clearly redundant for >> > Update >> > >>> >> (page >> > >>> >> > > > >> size, locality) >> > >>> >> > > > >> - query() method in IgniteCache (one that accepts Query) >> > and >> > >>> >> query() >> > >>> >> > > > >> methods in GridQueryIndexing return iterators. So, if we >> > >>> strive to >> > >>> >> > > > >> leave interfaces unchanged, we still will introduce some >> > >>> design >> > >>> >> > > > >> ugliness like query methods returning empty iterators for >> > >>> certain >> > >>> >> > > > >> queries, and/or query flags that indicate whether it's an >> > >>> update >> > >>> >> > query >> > >>> >> > > > >> or not, etc. >> > >>> >> > > > >> - If some Queries are update queries, then continuous >> > queries >> > >>> >> can't >> > >>> >> > be >> > >>> >> > > > >> based on them - more design-wise ugly checks and stuff >> like >> > >>> that. >> > >>> >> > > > >> - I'm pretty sure there's more I don't know about. >> > >>> >> > > > >> >> > >>> >> > > > >> Comments and suggestions are welcome. Sergi Vladykin, >> > Dmitry >> > >>> >> > > > >> Setrakyan, your opinions are of particular interest, >> please >> > >>> >> advise. >> > >>> >> > > > >> >> > >>> >> > > > >> Regards, >> > >>> >> > > > >> Alex >> > >>> >> > > > >> > >>> >> > > >> > >>> >> > >> > >>> >> >> > >>> >> > >> >> > >> >> > >> |
Exactly. This will allow our Jdbc driver to work transparently.
Sergi 2016-07-27 12:40 GMT+03:00 Alexander Paschenko < [hidden email]>: > Sergi, > > You wrote: > > I'd prefer to return the same information, so it will not be empty > > Do you mean return iterator with single element that denotes number of > rows? > > Dmitriy, > > You wrote: > > What is the ticket number for this. Is the new API documented there? > > Overall issue number is 2294. There's no particular issue on API > changes, but creating one seems to be a good idea, I will do it. > > - Alex > > 2016-07-27 9:20 GMT+03:00 Dmitriy Setrakyan <[hidden email]>: > > What is the ticket number for this. Is the new API documented there? > > > > On Tue, Jul 26, 2016 at 11:36 AM, Sergi Vladykin < > [hidden email]> > > wrote: > > > >> I don't see anything ugly in empty iterator, sorry if I insulted your > taste > >> of beauty. > >> > >> If you will take a look at Jdbc, you will see that > Statement.executeUpdate > >> method returns number of updated rows, I'd prefer to return the same > >> information, so it will not be empty (beauty is restored!). > >> > >> Sergi > >> > >> > >> > >> 2016-07-26 18:24 GMT+03:00 Alexander Paschenko < > >> [hidden email]>: > >> > >> > I see your point. But what about my concerns from initial post? > >> > Particularly about signatures of existing methods? I personally don't > >> > like an option of query() method always returning an empty iterator > >> > for any non-select query, it seems ugly design wise. > >> > > >> > - Alex > >> > > >> > 2016-07-26 18:15 GMT+03:00 Sergi Vladykin <[hidden email]>: > >> > > BTW, the simplest way to solve this issue is to allow running SQL > >> > commands > >> > > inside of SqlFieldsQuery. > >> > > > >> > > We may add some additional convenience API for updates if we want, > but > >> > JDBC > >> > > client will always call it like this: > >> > > > >> > > cache.query(new SqlFieldsQuery("INSERT INTO MY_TABLE > >> > > VALUES(?,?)").setArgs(1,2)); > >> > > > >> > > This will resolve any ambiguity. > >> > > > >> > > Sergi > >> > > > >> > > 2016-07-26 17:56 GMT+03:00 Sergi Vladykin <[hidden email] > >: > >> > > > >> > >> I don't like any pre-parsing, especially with some libraries other > >> than > >> > >> H2. H2 itself has enough quirks to multiply it on quirks of another > >> > library. > >> > >> > >> > >> This is exactly what I was talking about - we need some single > entry > >> > point > >> > >> on API for all the SQL commands and queries. Thats why I suggested > >> > >> SqlUpdate to extend Query. To me its is the cleanest approach. May > be > >> we > >> > >> need to change in some backward compatible way this Query > hierarchy to > >> > get > >> > >> rid of extra methods but the idea is still the same. > >> > >> > >> > >> Sergi > >> > >> > >> > >> 2016-07-26 14:34 GMT+03:00 Alexander Paschenko < > >> > >> [hidden email]>: > >> > >> > >> > >>> Guys, > >> > >>> > >> > >>> I would like to advance the discussion further. There's one quite > >> > >>> important question that arose based on current state of work on > this > >> > >>> issue. If we use some kind of interactive console, like Visor, > then > >> > >>> how should it know whether SQL query it is requested to execute > >> > >>> returns a result set or not? In JDBC world, solution is quite > simple > >> - > >> > >>> there's base interface called Statement that all commands > implement, > >> > >>> and it has magic isResultSet method that tells whether statement > is a > >> > >>> query or an update command. The API proposed now has separate > Query > >> > >>> and Update operations which I believe to be a right thing by the > >> > >>> reasons I outlined in the beginning of this thread. However, their > >> > >>> lack of common ancestor prevents possible console clients from > >> running > >> > >>> text SQL commands in a fully transparent manner - like > >> > >>> IgniteCache.execute(String sql). Therefore I see two possible > ways of > >> > >>> solving this: > >> > >>> > >> > >>> - we change API so that it includes new class or interface > parenting > >> > >>> both Query and Update, and clients use it to communicate with > cache > >> > >>> - we let (or make :) ) the client determine command type > >> independently > >> > >>> and behave accordingly - for it to work it will have some kind of > >> > >>> command parsing by itself just to determine its type. Visor > console > >> > >>> may use simple library like JSqlParser > >> > >>> (https://github.com/JSQLParser/JSqlParser; dual LGPL 2.1/ASF 2.0 > >> > >>> licensed) to determine request type in terms of JDBC, and behave > >> > >>> accordingly. > >> > >>> > >> > >>> Personally, I think that the second approach is better - and > here's > >> > why. > >> > >>> > >> > >>> First, it does not seem wise to change API simply to make console > (or > >> > >>> any other) clients simpler. Programmatic APIs should be concise > and > >> > >>> short for programmatic use, console clients should be easy to use > >> from > >> > >>> console - and that's it: after all, console client exists to free > a > >> > >>> user from burden of doing things programmatically, so its aim is > to > >> > >>> adapt API to console or whatever UI. > >> > >>> Second, possible complications in client implied by such approach > >> > >>> certainly won't be dramatic - I don't think that additional single > >> > >>> query parsing operation in client code will make it much harder to > >> > >>> develop. > >> > >>> Third, as I see it now, adding a new "synthetic" entity and new > >> method > >> > >>> would take more effort to adapting the client to new API. > >> > >>> > >> > >>> Dmitry, Sergi, I would like to hear what you think about it all. > >> > Thanks. > >> > >>> > >> > >>> - Alex > >> > >>> > >> > >>> 2016-07-21 21:17 GMT+03:00 Dmitriy Setrakyan < > [hidden email] > >> >: > >> > >>> > OK, then using your analogy, the current behavior in Ignite is > >> MERGE > >> > for > >> > >>> > the most part. > >> > >>> > > >> > >>> > My preference is that Ignite SQL should work no different from > >> > >>> traditional > >> > >>> > databases, which means: > >> > >>> > > >> > >>> > - INSERT is translated into *putIfAbsent()* call in Ignite > >> > >>> > - UPDATE is translated into *replace()* call in Ignite > >> > >>> > - MERGE is translated into *put()* call in Ignite > >> > >>> > - For SQL BATCH calls we should delegate to Ignite batch > >> operations, > >> > >>> e.g. > >> > >>> > *putAll()* > >> > >>> > > >> > >>> > The above should hold true for atomic and transactional > put/putAll > >> > >>> calls, > >> > >>> > as well as for the data streamer. > >> > >>> > > >> > >>> > Does this make sense? > >> > >>> > > >> > >>> > D. > >> > >>> > > >> > >>> > On Thu, Jul 21, 2016 at 4:06 AM, Sergi Vladykin < > >> > >>> [hidden email]> > >> > >>> > wrote: > >> > >>> > > >> > >>> >> No, this does not make sense. > >> > >>> >> > >> > >>> >> There is no upsert mode in databases. There are operations: > >> INSERT, > >> > >>> UPDATE, > >> > >>> >> DELETE, MERGE. > >> > >>> >> > >> > >>> >> I want to have clear understanding of how they have to behave > in > >> SQL > >> > >>> >> databases and how they will actually behave in Ignite in > different > >> > >>> >> scenarios. Also I want to have clear understanding of > performance > >> > >>> >> implications of each decision here. > >> > >>> >> > >> > >>> >> Anything wrong with that? > >> > >>> >> > >> > >>> >> Sergi > >> > >>> >> > >> > >>> >> On Thu, Jul 21, 2016 at 1:04 PM, Dmitriy Setrakyan < > >> > >>> [hidden email]> > >> > >>> >> wrote: > >> > >>> >> > >> > >>> >> > Serj, are you asking what will happen as of today? Then the > >> answer > >> > >>> to all > >> > >>> >> > your questions is that duplicate keys are not an issue, and > >> Ignite > >> > >>> always > >> > >>> >> > operates in **upsert** mode (which is essentially a *“put(…)” > >> > >>> *method). > >> > >>> >> > > >> > >>> >> > However, the *“insert”* that is suggested by Alex would > delegate > >> > to > >> > >>> >> > *“putIfAbsent(…)”*, which in database world makes more sense. > >> > >>> However, in > >> > >>> >> > this case, the *“update”* syntax should delegate to > >> > *“replace(…)”*, > >> > >>> as > >> > >>> >> > update should fail in case if a key is absent. > >> > >>> >> > > >> > >>> >> > Considering the above, a notion of “*upsert”* or “*merge” > >> > *operation > >> > >>> is > >> > >>> >> > very much needed, as it will give a user an option to perform > >> > >>> >> > “insert-or-update” in 1 call. > >> > >>> >> > > >> > >>> >> > Does this make sense? > >> > >>> >> > > >> > >>> >> > D. > >> > >>> >> > > >> > >>> >> > On Wed, Jul 20, 2016 at 9:39 PM, Sergi Vladykin < > >> > >>> >> [hidden email]> > >> > >>> >> > wrote: > >> > >>> >> > > >> > >>> >> > > I'd prefer to do MERGE operation last because in H2 it is > not > >> > >>> standard > >> > >>> >> > ANSI > >> > >>> >> > > SQL MERGE. Or may be not implement it at all, or may be > >> > contribute > >> > >>> ANSI > >> > >>> >> > > correct version to H2, then implement it on Ignite. Need to > >> > >>> investigate > >> > >>> >> > the > >> > >>> >> > > semantics deeper before making any decisions here. > >> > >>> >> > > > >> > >>> >> > > Lets start with simple scenarios for INSERT and go through > all > >> > the > >> > >>> >> > possible > >> > >>> >> > > cases and answer the questions: > >> > >>> >> > > - What will happen on key conflict in TX cache? > >> > >>> >> > > - What will happen on key conflict in Atomic cache? > >> > >>> >> > > - What will happen with the previous two if we use > DataLoader? > >> > >>> >> > > - How to make these operations efficient (it will be simple > >> > enough > >> > >>> to > >> > >>> >> > > implement them with separate put/putIfAbsent operations but > >> > >>> probably we > >> > >>> >> > > will need some batching like putAllIfAbsent for > efficiency)? > >> > >>> >> > > > >> > >>> >> > > As for API, we still will need to have a single entry point > >> for > >> > >>> all SQL > >> > >>> >> > > queries/commands to allow any console work with it > >> > transparently. > >> > >>> It > >> > >>> >> > would > >> > >>> >> > > be great if we will be able to come up with something > >> consistent > >> > >>> with > >> > >>> >> > this > >> > >>> >> > > idea on public API. > >> > >>> >> > > > >> > >>> >> > > Sergi > >> > >>> >> > > > >> > >>> >> > > > >> > >>> >> > > > >> > >>> >> > > > >> > >>> >> > > > >> > >>> >> > > > >> > >>> >> > > > >> > >>> >> > > > >> > >>> >> > > On Wed, Jul 20, 2016 at 2:23 PM, Dmitriy Setrakyan < > >> > >>> >> > > [hidden email]> > >> > >>> >> > > wrote: > >> > >>> >> > > > >> > >>> >> > > > Like the idea of merge and insert. I need more time to > think > >> > >>> about > >> > >>> >> the > >> > >>> >> > > API > >> > >>> >> > > > changes. > >> > >>> >> > > > > >> > >>> >> > > > Sergi, what do you think? > >> > >>> >> > > > > >> > >>> >> > > > Dmitriy > >> > >>> >> > > > > >> > >>> >> > > > > >> > >>> >> > > > > >> > >>> >> > > > On Jul 20, 2016, at 12:36 PM, Alexander Paschenko < > >> > >>> >> > > > [hidden email]> wrote: > >> > >>> >> > > > > >> > >>> >> > > > >> Thus, I suggest that we implement MERGE as a separate > >> > >>> operation > >> > >>> >> > backed > >> > >>> >> > > > by putIfAbsent operation, while INSERT will be > implemented > >> via > >> > >>> put. > >> > >>> >> > > > > > >> > >>> >> > > > > Sorry, of course I meant that MERGE has to be > put-based, > >> > while > >> > >>> >> INSERT > >> > >>> >> > > > > has to be putIfAbsent-based. > >> > >>> >> > > > > > >> > >>> >> > > > > 2016-07-20 12:30 GMT+03:00 Alexander Paschenko > >> > >>> >> > > > > <[hidden email]>: > >> > >>> >> > > > >> Hell Igniters, > >> > >>> >> > > > >> > >> > >>> >> > > > >> In this thread I would like to share and discuss some > >> > >>> thoughts on > >> > >>> >> > DML > >> > >>> >> > > > >> operations' implementation, so let's start and keep it > >> > here. > >> > >>> >> > Everyone > >> > >>> >> > > > >> is of course welcome to share their suggestions. > >> > >>> >> > > > >> > >> > >>> >> > > > >> For starters, I was thinking about semantics of > INSERT. > >> In > >> > >>> >> > traditional > >> > >>> >> > > > >> RDBMSs, INSERT works only for records whose primary > keys > >> > don't > >> > >>> >> > > > >> conflict with those of records that are already > >> persistent > >> > - > >> > >>> you > >> > >>> >> > can't > >> > >>> >> > > > >> try to insert the same key more than once because > you'll > >> > get > >> > >>> an > >> > >>> >> > error. > >> > >>> >> > > > >> However, semantics of cache put is obviously > different - > >> it > >> > >>> does > >> > >>> >> not > >> > >>> >> > > > >> have anything about duplicate keys, it just quietly > >> updates > >> > >>> values > >> > >>> >> > in > >> > >>> >> > > > >> case of keys' duplication. Still, cache has > putIfAbsent > >> > >>> operation > >> > >>> >> > that > >> > >>> >> > > > >> is closer to traditional notion of INSERT, and H2's > SQL > >> > >>> dialect > >> > >>> >> has > >> > >>> >> > > > >> MERGE operation which corresponds to semantics of > cache > >> > put. > >> > >>> >> Thus, I > >> > >>> >> > > > >> suggest that we implement MERGE as a separate > operation > >> > >>> backed by > >> > >>> >> > > > >> putIfAbsent operation, while INSERT will be > implemented > >> via > >> > >>> put. > >> > >>> >> > > > >> > >> > >>> >> > > > >> And one more, probably more important thing: I suggest > >> > that we > >> > >>> >> > create > >> > >>> >> > > > >> separate class Update and corresponding operation > >> update() > >> > in > >> > >>> >> > > > >> IgniteCache. The reasons are as follows: > >> > >>> >> > > > >> > >> > >>> >> > > > >> - Query bears some flags that are clearly redundant > for > >> > Update > >> > >>> >> (page > >> > >>> >> > > > >> size, locality) > >> > >>> >> > > > >> - query() method in IgniteCache (one that accepts > Query) > >> > and > >> > >>> >> query() > >> > >>> >> > > > >> methods in GridQueryIndexing return iterators. So, if > we > >> > >>> strive to > >> > >>> >> > > > >> leave interfaces unchanged, we still will introduce > some > >> > >>> design > >> > >>> >> > > > >> ugliness like query methods returning empty iterators > for > >> > >>> certain > >> > >>> >> > > > >> queries, and/or query flags that indicate whether > it's an > >> > >>> update > >> > >>> >> > query > >> > >>> >> > > > >> or not, etc. > >> > >>> >> > > > >> - If some Queries are update queries, then continuous > >> > queries > >> > >>> >> can't > >> > >>> >> > be > >> > >>> >> > > > >> based on them - more design-wise ugly checks and stuff > >> like > >> > >>> that. > >> > >>> >> > > > >> - I'm pretty sure there's more I don't know about. > >> > >>> >> > > > >> > >> > >>> >> > > > >> Comments and suggestions are welcome. Sergi Vladykin, > >> > Dmitry > >> > >>> >> > > > >> Setrakyan, your opinions are of particular interest, > >> please > >> > >>> >> advise. > >> > >>> >> > > > >> > >> > >>> >> > > > >> Regards, > >> > >>> >> > > > >> Alex > >> > >>> >> > > > > >> > >>> >> > > > >> > >>> >> > > >> > >>> >> > >> > >>> > >> > >> > >> > >> > >> > > >> > |
Free forum by Nabble | Edit this page |