The Spark 2.4 support

Alexey Zinoviev
Hi, Igniters
I've started work on Spark 2.4 support.

We started the discussion here, in
https://issues.apache.org/jira/browse/IGNITE-12054

The Spark internals were totally refactored between versions 2.3 and 2.4.
The main changes touch:

   - External catalog and listeners refactoring
   - Changes in the semantics of HAVING operator support
   - Generation of pushed-down NULL filters in JOIN plans
   - Minor changes in plan generation that should be adopted in our
   integration module

I propose an initial solution: creating a new module, spark-2.4 (see
https://issues.apache.org/jira/browse/IGNITE-12247), and adding a new
profile, spark-2.4 (to avoid possible clashes with other Spark versions).

I've also turned the ticket into an umbrella ticket and created a few
tickets for the muted tests (about 7 of 211 tests are muted now).

If anybody is interested, please do an initial review of the modular
Ignite structure and the changes (without diving deep into the Spark code).

And yes, the proposed code is a copy-paste of the spark-ignite module with
a few fixes.

Re: The Spark 2.4 support

Ivan Pavlukhin
Hi Alexey,

As an outside observer very far from the Ignite Spark integration, I would
like to ask a humble question for my own understanding. Why does this
integration use Spark internals? Is that a common approach for
integrating with Spark?

Mon, 30 Sep 2019 at 16:17, Alexey Zinoviev <[hidden email]>:

>
> Hi, Igniters
> I've started the work on the Spark 2.4 support
>
> We started the discussion here, in
> https://issues.apache.org/jira/browse/IGNITE-12054
>
> The Spark internals were totally refactored between 2.3 and 2.4 versions,
> main changes touches
>
>    - External catalog and listeners refactoring
>    - Changes of HAVING operator semantic support
>    - Push-down NULL filters generation in JOIN plans
>    - minor changes in Plan Generation that should be adopted in our
>    integration module
>
> I propose the initial solution here via creation of new module spark-2.4
> here https://issues.apache.org/jira/browse/IGNITE-12247 and addition of new
> profile spark-2.4 (to avoid possible clashes with another spark versions)
>
> Also I've transformed ticket to an Umbrella ticket and created a few
> tickets for muted tests (around 7 from 211 tests are muted now)
>
> Please, if somebody interested in it, make an initial review of modular
> ignite structure and changes (without deep diving into Spark code).
>
> And yes, the proposed code is a copy-paste of spark-ignite module with a
> few fixes



--
Best regards,
Ivan Pavlukhin

Re: The Spark 2.4 support

Alexey Zinoviev
Yes, as I understand it, the integration has used Spark internals since
the first commit :)
The reason is that we take the Spark SQL query execution plan and try to
execute it on the Ignite cluster.
We also inherit from a lot of Developer API classes that could be
unstable. Spark has no good extension point, which is why we have to go
deeper.
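
To illustrate the idea of taking a query plan and executing it elsewhere, here is a toy sketch in Python. It is not Spark or Ignite code: the `Scan`, `Filter` and `Project` classes and the `to_sql` function are hypothetical names invented for this illustration, and the real integration works against Spark's own plan classes. The sketch only shows the general shape of the technique: walk a small plan tree and collapse it into one SQL string that the other engine could run.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scan:
    """Leaf node: read all rows of a table (hypothetical toy class)."""
    table: str


@dataclass
class Filter:
    """Keep only the rows of `child` that match a SQL predicate."""
    predicate: str
    child: object


@dataclass
class Project:
    """Keep only the named columns of `child`."""
    columns: List[str]
    child: object


def to_sql(plan) -> str:
    """Collapse a Project/Filter/Scan chain into a single SQL string.

    Raises ValueError for any shape this toy does not understand; a real
    integration would leave such a plan to the original engine instead.
    """
    columns = None
    predicates = []
    node = plan
    while not isinstance(node, Scan):
        if isinstance(node, Project) and columns is None:
            columns = node.columns          # outermost projection wins
        elif isinstance(node, Filter):
            predicates.append(node.predicate)
        else:
            raise ValueError(f"cannot push down: {node!r}")
        node = node.child

    sql = f"SELECT {', '.join(columns) if columns else '*'} FROM {node.table}"
    if predicates:
        # Innermost filter first, each predicate parenthesised.
        sql += " WHERE " + " AND ".join(f"({p})" for p in reversed(predicates))
    return sql


plan = Project(["name"], Filter("age > 30", Scan("person")))
print(to_sql(plan))  # prints: SELECT name FROM person WHERE (age > 30)
```

The point of the sketch is the dependency it creates: the translation has to understand every node type the planner can produce, so a change in how plans are generated can break it, which matches the kind of breakage described for the 2.3 to 2.4 upgrade.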

Mon, 30 Sep 2019 at 20:17, Ivan Pavlukhin <[hidden email]>:

> Hi Alexey,
>
> As an external watcher very far from Ignite Spark integration I would
> like to ask a humble question for my understanding. Why this
> integration uses Spark internals? Is it a common approach for
> integrating with Spark?

Re: The Spark 2.4 support

Nikolay Izhikov-2
Hello, Ivan.

I gave a talk about the internals of the Spark integration in Ignite.
It answers the question of why we have to use Spark internals.

You can watch my meetup talk (in Russian) [1] or, if you prefer text, read the article [2].

[1] https://www.youtube.com/watch?v=CzbAweNKEVY
[2] https://habr.com/ru/company/sberbank/blog/427297/

On Mon, 30/09/2019 at 20:29 +0300, Alexey Zinoviev wrote:

> Yes, as I understand it uses Spark internals from the first commit)))
> The reason - we take Spark SQL query execution plan and try to execute it
> on Ignite cluster
> Also we inherit a lot of Developer API related classes that could be
> unstable. Spark has no good point for extension and this is a reason why we
> should go deeper

Re: The Spark 2.4 support

dmagda
Nikolay,

Would you be able to review the changes? I'm not sure there is a better
candidate for now.

-
Denis


On Mon, Sep 30, 2019 at 11:01 AM Nikolay Izhikov <[hidden email]>
wrote:

> Hello, Ivan.
>
> I had a talk about internals of Spark integration in Ignite.
> It answers on question why we should use Spark internals.
>
> You can take a look at my meetup talk(in Russian) [1] or read an article
> if you prefer text [2].
>
> [1] https://www.youtube.com/watch?v=CzbAweNKEVY
> [2] https://habr.com/ru/company/sberbank/blog/427297/

Re: The Spark 2.4 support

Nikolay Izhikov-2
Yes, I can :)

On Mon, 30/09/2019 at 11:40 -0700, Denis Magda wrote:

> Nikolay,
>
> Would you be able to review the changes? I'm not sure there is a better candidate for now.
>
> -
> Denis

Re: The Spark 2.4 support

Alexey Zinoviev
Great talk and article, I studied them last year.

Mon, 30 Sep 2019, 21:42 Nikolay Izhikov <[hidden email]>:

> Yes, I can :)

Re: The Spark 2.4 support

Ivan Pavlukhin
Alexey, Nikolay,

Thank you for sharing details!

Tue, 1 Oct 2019 at 07:42, Alexey Zinoviev <[hidden email]>:

>
> Great talk and paper, I've learnt it last year



--
Best regards,
Ivan Pavlukhin

Re: The Spark 2.4 support

Alexey Zinoviev
Dear Nikolay Izhikov, I've recreated the PR for initial Spark 2.4 support.

The last commit,
https://github.com/apache/ignite/pull/7058/commits/60386802299deedc6ed60bf4736e922201a67fb8
contains the real changes relative to Spark 2.3.

I suggest merging this initial solution, with 95% support of Spark 2.4,
into master and continuing work on the known issues listed in JIRA.

This solution supports the new Spark version for all examples and for 95%
of the 2.3 tests.

Tue, 1 Oct 2019 at 08:48, Ivan Pavlukhin <[hidden email]>:

> Alexey, Nikolay,
>
> Thank you for sharing details!

Re: The Spark 2.4 support

dmagda
Alexey,

Please help me understand what it means that the 2.4 integration supports
"95% of tests of 2.3". Does it mean that 5% of the existing tests are
failing and, basically, need to be fixed?

-
Denis


On Mon, Nov 18, 2019 at 6:52 AM Alexey Zinoviev <[hidden email]>
wrote:

> Dear Nikolay Izhikov, I've recreated the PR for 2.4 initial support
>
> The last commit
>
> https://github.com/apache/ignite/pull/7058/commits/60386802299deedc6ed60bf4736e922201a67fb8
> contains
> real changes from Spark 2.3
>
> I suggest to merge to master this initial solution with 95% support of
> Spark 2.4 and continue work on known issues listed in JIRA
>
> This solution supports the new Spark version for all examples and 95% of
> tests of 2.3.

Re: The Spark 2.4 support

Alexey Zinoviev
Right, a few of the 200 tests fail due to known issues related to rare
cases, and they couldn't be fixed immediately. These tests are copies of
the 2.3 tests, and some of them may have no meaning for 2.4 because
Spark's behaviour has changed.

Mon, 18 Nov 2019, 19:42 Denis Magda <[hidden email]>:

> Alexey,
>
> Please help to understand what it means that 2.4 integration supports "95%
> of tests of 2.3". Does it mean that 5% of existing tests are failing and,
> basically, need to be fixed?
>
> -
> Denis
>
>

Re: The Spark 2.4 support

dmagda
Alexey, thanks for the details and for reaching this milestone with the
2.4 support.

Generally, I would advise merging the changes to master only after
we confirm the failing tests are not regressions. We should either remove
them, replace them with others, or just fix them.

-
Denis


On Mon, Nov 18, 2019 at 10:06 AM Alexey Zinoviev <[hidden email]>
wrote:

> Right, a few of the 200 tests fail due to known issues that couldn't be
> fixed immediately and are related to rare cases. These tests are copies of
> the 2.3 tests, and some of them may have no meaning for 2.4 because Spark's
> behaviour changed.
>

Re: The Spark 2.4 support

Nikolay Izhikov-2
Hello, Alexey.

Can we somehow highlight the changes in the Spark 2.4 module compared to the 2.3 one?
For now the changes look too large to me (+11,681 −1).

Are we sure we want to add such a huge piece of code to support two versions?
Can we extract the unchanged parts (those based on the Spark public API) and keep a single copy of them?
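
To make the extraction idea concrete, here is a minimal sketch of how a shared module could hold the unchanged logic once while each Spark version supplies only its own differences. All type and method names below (`SparkVersionAdapter`, `SharedQueryBuilder`, `Spark23Adapter`, `Spark24Adapter`) are hypothetical and do not exist in Ignite or Spark, and the extra NULL check in the 2.4 adapter is only an illustrative difference, not a description of actual Spark behaviour.

```java
// Hypothetical sketch: none of these types exist in Ignite or Spark.

/** Would live in a shared module; exposes only what differs between versions. */
interface SparkVersionAdapter {
    /** Spark version this adapter targets. */
    String sparkVersion();

    /** Version-specific step: how a simple equality filter is rendered. */
    String renderFilter(String column, String value);
}

/** Shared logic kept in a single copy and written against the adapter only. */
final class SharedQueryBuilder {
    private final SparkVersionAdapter adapter;

    SharedQueryBuilder(SparkVersionAdapter adapter) {
        this.adapter = adapter;
    }

    /** Builds a query string; only the filter part is delegated to the adapter. */
    String select(String table, String column, String value) {
        return "SELECT * FROM " + table + " WHERE " + adapter.renderFilter(column, value);
    }
}

/** Would live in the 2.3-specific module. */
final class Spark23Adapter implements SparkVersionAdapter {
    @Override public String sparkVersion() { return "2.3"; }

    @Override public String renderFilter(String column, String value) {
        return column + " = " + value;
    }
}

/** Would live in the 2.4-specific module; the NULL check is illustrative only. */
final class Spark24Adapter implements SparkVersionAdapter {
    @Override public String sparkVersion() { return "2.4"; }

    @Override public String renderFilter(String column, String value) {
        return column + " IS NOT NULL AND " + column + " = " + value;
    }
}

/** Small demo entry point that runs the shared builder with both adapters. */
final class SharedModuleSketch {
    public static void main(String[] args) {
        SparkVersionAdapter[] adapters = { new Spark23Adapter(), new Spark24Adapter() };
        for (SparkVersionAdapter adapter : adapters) {
            SharedQueryBuilder builder = new SharedQueryBuilder(adapter);
            System.out.println(adapter.sparkVersion() + ": " + builder.select("person", "id", "1"));
        }
    }
}
```

With a layout like this, a diff between the two version-specific modules would show only the adapter classes, which may also make the review of the 2.4 changes smaller.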

> On Nov 18, 2019, at 23:47, Denis Magda <[hidden email]> wrote:
>
> Alexey, thanks for the details and for reaching this milestone with the
> 2.4 support.
>
> Generally, I would advise merging the changes to master only after
> we confirm the failing tests are not regressions. We should either remove
> them, replace them with others, or just fix them.
>
> -
> Denis
>
>


Re: The Spark 2.4 support

Alexey Zinoviev
Yes, as I mentioned above, you can see the real changes, made after copying
the Spark 2.3 module, in a dedicated commit.
The last commit
https://github.com/apache/ignite/pull/7058/commits/60386802299deedc6ed60bf4736e922201a67fb8
contains the real changes relative to the Spark 2.3 module.


Mon, Nov 25, 2019 at 17:28, Nikolay Izhikov <[hidden email]>:

> Hello, Alexey.
>
> Can we somehow highlight the changes in the Spark 2.4 module compared to the 2.3 one?
> For now the changes look too large to me (+11,681 −1).
>
> Are we sure we want to add such a huge piece of code to support two
> versions?
> Can we extract the unchanged parts (those based on the Spark public API) and keep
> a single copy of them?
>