Spark+Ignite SQL syntax proposal


Spark+Ignite SQL syntax proposal

Nikolay Izhikov
Hello, guys.

I’m working on IGNITE-3084 [1] “Spark Data Frames Support in Apache Ignite”
and have a proposal to discuss.

I want to provide a consistent way to query Ignite key-value caches from the
Spark SQL engine.

To implement this, I have to determine the Java classes of the key and the value.
They are required to calculate the schema of a Spark Data Frame.
As far as I know, Ignite currently keeps no meta information for a key-value cache.

If the regular data source is used, a user can provide the key class and the value
class through options. Example:

```
val df = spark.read
  .format(IGNITE)
  .option("config", CONFIG)
  .option("cache", CACHE_NAME)
  .option("keyClass", "java.lang.Long")
  .option("valueClass", "java.lang.String")
  .load()

df.printSchema()

df.createOrReplaceTempView("testCache")

val igniteDF = spark.sql(
  "SELECT key, value FROM testCache WHERE key >= 2 AND value like '%0'")
```
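
To make the schema calculation concrete, here is a minimal sketch of how the two class names could be mapped to a Spark schema. It is not the actual IGNITE-3084 code: `schemaFor`, the supported-type list, and the flat key/value column layout are illustrative assumptions; composite value classes would need per-field reflection instead.

```
import org.apache.spark.sql.types._

// Minimal sketch: map keyClass/valueClass option values to a two-column schema.
def schemaFor(keyClass: String, valueClass: String): StructType = {
  def sparkType(className: String): DataType = className match {
    case "java.lang.Long"    => LongType
    case "java.lang.Integer" => IntegerType
    case "java.lang.String"  => StringType
    case "java.lang.Double"  => DoubleType
    case other => throw new IllegalArgumentException(s"Unsupported class: $other")
  }

  StructType(Seq(
    StructField("key", sparkType(keyClass), nullable = false),
    StructField("value", sparkType(valueClass), nullable = true)))
}

// schemaFor("java.lang.Long", "java.lang.String") gives
// key: LongType (non-null), value: StringType (nullable)
```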

But if we use the Ignite implementation of the Spark catalog, we don't want to
register existing caches by hand.
Anton Vinogradov proposed a syntax that I personally like very much:

*Let's use the following table name for a key-value cache:
`cacheName[keyClass,valueClass]`*

Example:

```
val df3 = igniteSession.sql(
  "SELECT * FROM `testCache[java.lang.Integer,java.lang.String]` WHERE key % 2 = 0")

df3.printSchema()

df3.show()
```
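
For illustration only, a hedged sketch of how a catalog implementation could recognize such table names; the helper name and the regular expression are assumptions, not part of the actual patch:

```
// Recognize the proposed `cacheName[keyClass,valueClass]` form; anything else
// is treated as a plain Ignite SQL table backed by QueryEntity metadata.
val KeyValueTable = """(\w+)\[([\w.$]+),([\w.$]+)\]""".r

def parseTableName(name: String): Option[(String, String, String)] = name match {
  case KeyValueTable(cache, keyCls, valCls) => Some((cache, keyCls, valCls))
  case _                                    => None
}

// parseTableName("testCache[java.lang.Integer,java.lang.String]")
//   == Some(("testCache", "java.lang.Integer", "java.lang.String"))
```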

Thoughts?

[1] https://issues.apache.org/jira/browse/IGNITE-3084

--
Nikolay Izhikov
[hidden email]

Re: Spark+Ignite SQL syntax proposal

Valentin Kulichenko
Nikolay,

I don't understand. Why do we need to provide key and value types in
SQL? What is the issue you're trying to solve with this syntax?

-Val


Re: Spark+Ignite SQL syntax proposal

Nikolay Izhikov
Hello, Valentin.

I implemented the ability to run Spark SQL queries against both:

1.  An Ignite SQL table. Internally, the table is described by a QueryEntity with
meta information about the data.
2.  A key-value cache - a regular Ignite cache without meta information about the
stored data.

In the second case, we have to know which types the cache stores,
so for this case I propose to use the syntax I described (both cases are sketched below).
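
To make the distinction concrete, a hedged sketch of the two cases using the data source options from my first message; the `table` option name and the `PERSON` table are assumptions for illustration, not the final API:

```
// 1. Ignite SQL table: the schema comes from the table's QueryEntity metadata.
val sqlTableDf = spark.read
  .format(IGNITE)
  .option("config", CONFIG)
  .option("table", "PERSON") // hypothetical option and table name
  .load()

// 2. Key-value cache: no metadata, so the key and value classes must be supplied.
val kvCacheDf = spark.read
  .format(IGNITE)
  .option("config", CONFIG)
  .option("cache", CACHE_NAME)
  .option("keyClass", "java.lang.Long")
  .option("valueClass", "java.lang.String")
  .load()
```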





--
Nikolay Izhikov
[hidden email]

Re: Spark+Ignite SQL syntax proposal

Valentin Kulichenko
Nikolay,

I don't think we need this, especially with this kind of syntax, which is
very confusing. The main use case for data frames is SQL, so let's concentrate
on it. We should use Ignite's SQL engine capabilities as much as possible.
If we see other use cases down the road, we can always support them.

-Val


Re: Spark+Ignite SQL syntax proposal

dmagda
I tend to agree with Val that key-value support seems excessive. My suggestion is to treat Ignite as a SQL database for this specific integration and implement only the relevant functionality.


Denis



Re: Spark+Ignite SQL syntax proposal

Nikolay Izhikov
OK, got it. I will remove key-value support from the catalog.


Re: Spark+Ignite SQL syntax proposal

Ray
Hi Nikolay,

Could you also implement the DataFrame support for the spark-2.10 module?
There are some legacy Spark users still on Spark 1.6 who need the
DataFrame features too.

Thanks





Re: Spark+Ignite SQL syntax proposal

Nikolay Izhikov
Hello, Ray.

I think it can be done as a second step, after DataFrame support for the
current Spark release has been merged.

Thoughts?
