Hello Igniters,

The Ignite Spark SQL interface currently takes just a "table name" as a parameter, which it uses to supply a Spark dataset with data from the underlying Ignite SQL table of that name. To do this it loops through each cache and finds the first one with the given table name [1]. This causes issues if there are multiple tables registered in different caches with the same table name, as you can only access one of those caches from Spark. Is the right thing to do here:

1. Simply not support such a scenario and note in the Spark documentation that table names must be unique?
2. Pass an extra parameter through the Ignite Spark data source which optionally specifies the cache name?
3. Support namespacing in the existing table name parameter, i.e. "cacheName.tableName"?

Thanks,
Stuart.

[1] https://github.com/apache/ignite/blob/ca973ad99c6112160a305df05be9458e29f88307/modules/spark/src/main/scala/org/apache/ignite/spark/impl/package.scala#L119
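The ambiguity described above can be illustrated with a minimal, self-contained sketch of the first-match lookup (the names `CacheInfo` and `sqlCacheName` are hypothetical, for illustration only, and are not the actual Ignite internals):

```scala
// Hypothetical model of a cache holding SQL tables.
case class CacheInfo(cacheName: String, tables: Seq[String])

// Returns the first cache containing a table with the given name.
// If two caches both define "person", which one you get depends
// purely on iteration order.
def sqlCacheName(caches: Seq[CacheInfo], tabName: String): Option[String] =
  caches.find(_.tables.contains(tabName)).map(_.cacheName)

val caches = Seq(
  CacheInfo("employees", Seq("person")),
  CacheInfo("customers", Seq("person"))
)

// Both caches define "person"; only the first is ever visible to Spark.
sqlCacheName(caches, "person") // Some("employees")
```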
Stuart,

Two tables can have the same name only if they are located in different schemas. That said, adding schema name support makes sense to me for sure. We can implement this using either a separate SCHEMA_NAME parameter, or similar to what you suggested in option 3 but with the schema name instead of the cache name.

Please feel free to create a ticket.

-Val

On Tue, Aug 7, 2018 at 9:32 AM Stuart Macdonald <[hidden email]> wrote:
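From the Spark side, Val's two alternatives might look roughly like this (a hypothetical sketch only; the `"schema"` option key and the `"ignite"` format string are assumptions for illustration, not the actual API, and running it requires a live Ignite cluster):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ignite-schema-example").getOrCreate()

// Alternative A: a separate schema parameter on the data source.
val dfA = spark.read
  .format("ignite")
  .option("config", "ignite-config.xml")
  .option("table", "person")
  .option("schema", "employees") // disambiguates duplicate table names
  .load()

// Alternative B: namespacing in the existing table name parameter.
val dfB = spark.read
  .format("ignite")
  .option("config", "ignite-config.xml")
  .option("table", "employees.person") // schemaName.tableName
  .load()
```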
Thanks Val, here's the ticket:
https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-9228

(Thanks for correcting my terminology - I work mostly with the traditional CacheConfiguration interface, where I believe each cache occupies its own schema.)

Stuart.

On 7 Aug 2018, at 18:34, Valentin Kulichenko <[hidden email]> wrote:
Stuart, do you want to work on this ticket?

On Tue, 07/08/2018 at 11:13 -0700, Stuart Macdonald wrote:
Hi Nikolay, yes, would be happy to - will likely be early next week. I'll go with the approach of adding a new optional field to the Spark data source provider unless there are any objections.

Stuart.

On 9 Aug 2018, at 14:20, Nikolay Izhikov <[hidden email]> wrote:
Here's the initial pull request for this issue, please review and let me know your feedback. I had to combine the two approaches to enable this to work both for standard .read(), where we can add the schema option, and for catalog-based selects, where we use schemaName.tableName. Happy to discuss on a call if this isn't clear.

https://github.com/apache/ignite/pull/4551

On Thu, Aug 9, 2018 at 2:32 PM, Stuart Macdonald <[hidden email]> wrote:
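The combined approach described in the pull request message above could be sketched roughly as follows (hypothetical usage only; option keys, the `"ignite"` format string, and the table/schema names are illustrative assumptions, and the code needs a running Ignite cluster):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("combined-example").getOrCreate()

// Standard read path: an optional schema option on the data source.
val df = spark.read
  .format("ignite")
  .option("config", "ignite-config.xml")
  .option("table", "person")
  .option("schema", "employees")
  .load()
df.createOrReplaceTempView("employees_person")

// Catalog path: a schema-qualified table name in the SQL statement itself,
// since there is no per-table option hook when resolving via the catalog.
val fromCatalog = spark.sql("SELECT id, name FROM employees.person")
```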
Stuart, can you please move the ticket into PATCH_AVAILABLE state? You need to click the "Submit Patch" button in Jira.

D.

On Wed, Aug 15, 2018 at 10:22 AM, Stuart Macdonald <[hidden email]> wrote:
Hi Dmitriy, thanks - that’s done now,
Stuart.

On 16 Aug 2018, at 22:23, Dmitriy Setrakyan <[hidden email]> wrote:
Hi Stuart,
I see the review has already started and Nikolay has responded on GitHub. I've added you to the contributors list, so now you can assign issues to yourself. I also assigned https://issues.apache.org/jira/browse/IGNITE-9228 to you, so the issue can be correctly filtered by all committers. I hope you don't mind.

Sincerely,
Dmitriy Pavlov

On Fri, 17 Aug 2018 at 10:22, Stuart Macdonald <[hidden email]> wrote: