Spark+Ignite SQL syntax proposal


Spark+Ignite SQL syntax proposal

Nikolay Izhikov
Hello, guys.

I’m working on IGNITE-3084 [1] “Spark Data Frames Support in Apache Ignite”
and have a proposal to discuss.

I want to provide a consistent way to query Ignite key-value caches from the
Spark SQL engine.

To implement this, I have to determine the Java classes of the key and the value.
They are required to calculate the schema of a Spark Data Frame.
As far as I know, Ignite currently keeps no meta information for a key-value cache.

If the regular data source is used, a user can provide the key class and the value
class through options. Example:

```
val df = spark.read
  .format(IGNITE)
  .option("config", CONFIG)
  .option("cache", CACHE_NAME)
  .option("keyClass", "java.lang.Long")
  .option("valueClass", "java.lang.String")
  .load()

df.printSchema()

df.createOrReplaceTempView("testCache")

val igniteDF = spark.sql(
  "SELECT key, value FROM testCache WHERE key >= 2 AND value like '%0'")
```
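
To make the schema calculation concrete, here is a minimal sketch of how the two class names could be mapped to a Spark schema. It is not the actual IGNITE-3084 code: `schemaFor`, the supported-type list, and the flat key/value column layout are illustrative assumptions; composite value classes would need per-field reflection instead.

```
import org.apache.spark.sql.types._

// Minimal sketch: map keyClass/valueClass option values to a two-column schema.
def schemaFor(keyClass: String, valueClass: String): StructType = {
  def sparkType(className: String): DataType = className match {
    case "java.lang.Long"    => LongType
    case "java.lang.Integer" => IntegerType
    case "java.lang.String"  => StringType
    case "java.lang.Double"  => DoubleType
    case other => throw new IllegalArgumentException(s"Unsupported class: $other")
  }

  StructType(Seq(
    StructField("key", sparkType(keyClass), nullable = false),
    StructField("value", sparkType(valueClass), nullable = true)))
}

// schemaFor("java.lang.Long", "java.lang.String") gives
// key: LongType (non-null), value: StringType (nullable)
```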

But if we use the Ignite implementation of the Spark catalog, we don't want to
register existing caches by hand.
Anton Vinogradov proposed a syntax that I personally like very much:

*Let's use the following table name for a key-value cache:
`cacheName[keyClass,valueClass]`*

Example:

```
val df3 = igniteSession.sql(
  "SELECT * FROM `testCache[java.lang.Integer,java.lang.String]` WHERE key % 2 = 0")

df3.printSchema()

df3.show()
```
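
For illustration only, a hedged sketch of how a catalog implementation could recognize such table names; the helper name and the regular expression are assumptions, not part of the actual patch:

```
// Recognize the proposed `cacheName[keyClass,valueClass]` form; anything else
// is treated as a plain Ignite SQL table backed by QueryEntity metadata.
val KeyValueTable = """(\w+)\[([\w.$]+),([\w.$]+)\]""".r

def parseTableName(name: String): Option[(String, String, String)] = name match {
  case KeyValueTable(cache, keyCls, valCls) => Some((cache, keyCls, valCls))
  case _                                    => None
}

// parseTableName("testCache[java.lang.Integer,java.lang.String]")
//   == Some(("testCache", "java.lang.Integer", "java.lang.String"))
```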

Thoughts?

[1] https://issues.apache.org/jira/browse/IGNITE-3084

--
Nikolay Izhikov
[hidden email]

Re: Spark+Ignite SQL syntax proposal

Valentin Kulichenko
Nikolay,

I don't understand. Why do we need to provide key and value types in
SQL? What is the issue you're trying to solve with this syntax?

-Val


Re: Spark+Ignite SQL syntax proposal

Nikolay Izhikov
Hello, Valentin.

I implemented the ability to run Spark SQL queries against both:

1.  An Ignite SQL table. Internally, the table is described by a QueryEntity with
meta information about the data.
2.  A key-value cache - a regular Ignite cache without meta information about the
stored data.

In the second case, we have to know which types the cache stores,
so for this case I propose to use the syntax I described (both cases are sketched below).
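
To make the distinction concrete, a hedged sketch of the two cases using the data source options from my first message; the `table` option name and the `PERSON` table are assumptions for illustration, not the final API:

```
// 1. Ignite SQL table: the schema comes from the table's QueryEntity metadata.
val sqlTableDf = spark.read
  .format(IGNITE)
  .option("config", CONFIG)
  .option("table", "PERSON") // hypothetical option and table name
  .load()

// 2. Key-value cache: no metadata, so the key and value classes must be supplied.
val kvCacheDf = spark.read
  .format(IGNITE)
  .option("config", CONFIG)
  .option("cache", CACHE_NAME)
  .option("keyClass", "java.lang.Long")
  .option("valueClass", "java.lang.String")
  .load()
```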





--
Nikolay Izhikov
[hidden email]

Re: Spark+Ignite SQL syntax proposal

Valentin Kulichenko
Nikolay,

I don't think we need this, especially with this kind of syntax, which is
very confusing. The main use case for data frames is SQL, so let's concentrate
on it. We should use Ignite's SQL engine capabilities as much as possible.
If we see other use cases down the road, we can always support them.

-Val


Re: Spark+Ignite SQL syntax proposal

dmagda
I tend to agree with Val that key-value support seems excessive. My suggestion is to treat Ignite as a SQL database for this specific integration and implement only the relevant functionality.


Denis



Re: Spark+Ignite SQL syntax proposal

Nikolay Izhikov
OK, got it. I will remove key-value support from the catalog.


Re: Spark+Ignite SQL syntax proposal

Ray
Hi Nikolay,

Could you also implement the DataFrame support for the spark-2.10 module?
There are some legacy Spark users still on Spark 1.6 who need the
DataFrame features too.

Thanks





Re: Spark+Ignite SQL syntax proposal

Nikolay Izhikov
Hello, Ray.

I think it can be done as a second step, after DataFrame support for the
current Spark release has been merged.

Thoughts?
