Service Grid new design overview

daradurvs
Hi, Igniters!

I’m working on the Service Grid redesign tasks, and the design seems to be finished.

The main goal of the Service Grid redesign is to provide the missing guarantees:
- Synchronous service deployment/undeployment;
- Failover on coordinator change;
- Propagation of deployment errors across the cluster;
- Introduction of a deployment failure policy;
- Prevention of deployment initiators hanging during deployment;
- etc.

I’d like to ask the community for its thoughts on the proposed design,
to be sure that all the important things have been considered.

Also, I have a question about service migration from AI 2.6 to the new
solution. It’s very hard to provide migration tools for users because
of the significant changes: we don’t use the utility cache anymore.
Should we spend time on this?

I’ve prepared a definition of the new Service Grid design; it’s described below:

*OVERVIEW*

All nodes (servers and clients) are able to host services, but client
nodes are excluded from service deployment by default. The only way to
deploy a service on client nodes is to specify a node filter in
ServiceConfiguration.
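
For example, deploying a service onto client nodes could look like the
following sketch; MyService is a placeholder, while setNodeFilter and
ClusterNode#isClient are the existing public API:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.services.Service;
import org.apache.ignite.services.ServiceConfiguration;
import org.apache.ignite.services.ServiceContext;

public class ClientNodeDeployExample {
    /** Placeholder no-op service, used only for the illustration. */
    static class MyService implements Service {
        @Override public void init(ServiceContext ctx) {}
        @Override public void execute(ServiceContext ctx) {}
        @Override public void cancel(ServiceContext ctx) {}
    }

    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            ServiceConfiguration cfg = new ServiceConfiguration();

            cfg.setName("my-service");
            cfg.setService(new MyService());
            cfg.setTotalCount(2);

            // Without this filter, client nodes are excluded by default.
            cfg.setNodeFilter(node -> node.isClient());

            ignite.services().deploy(cfg);
        }
    }
}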

All deployed services are identified internally by a “serviceId”
(IgniteUuid). This gives us a base for features such as hot
redeployment and service versioning: it’s important to be able to
identify and manage services that have the same name but different
versions.
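
As a toy illustration of the id-based identity (the registry shape and
field names here are hypothetical, not the actual internals), two versions
of the same service name can coexist when deployments are keyed by serviceId:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.ignite.lang.IgniteUuid;

class ServiceRegistrySketch {
    /** Minimal descriptor: a name plus a version. */
    static class ServiceInfo {
        final String name;
        final int ver;

        ServiceInfo(String name, int ver) {
            this.name = name;
            this.ver = ver;
        }
    }

    /** Keyed by serviceId, not by name, so same-named services can coexist. */
    final Map<IgniteUuid, ServiceInfo> deployed = new ConcurrentHashMap<>();

    IgniteUuid register(String name, int ver) {
        IgniteUuid srvcId = IgniteUuid.randomUuid(); // generated on deploy

        deployed.put(srvcId, new ServiceInfo(name, ver));

        return srvcId; // "billing" v1 and "billing" v2 get distinct ids
    }
}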

All actions on a service’s state change are processed according to a unified flow:
1) The initiator sends a request to change the service’s state [deploy,
undeploy] over disco-spi as a DynamicServicesChangeRequestBatchMessage;
every server node stores it in its own queue, so that if the coordinator
fails it can be processed on the new coordinator;
2) The coordinator calculates assignments, defines the actions in a new
message, ServicesAssignmentsRequestMessage, and sends it over disco-spi
to be processed by all nodes;
3) Each node applies the actions, builds a single map message,
ServicesSingleMapMessage, containing the service ids and the number of
instances deployed on that node, and sends it over comm-spi to the
coordinator (p2p);
4) Once the coordinator has received all single map messages, it builds a
ServicesFullMapMessage containing the service deployments across the
cluster and sends it over disco-spi to be processed by all nodes.
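
A condensed sketch of the aggregation in step 4, using plain maps in place
of the message classes defined in the next section; the logic is
illustrative only:

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

import org.apache.ignite.lang.IgniteUuid;

class FullMapAggregationSketch {
    /**
     * Folds per-node single maps (nodeId -> (serviceId -> instance count))
     * into the full per-service view (serviceId -> (nodeId -> instance count))
     * that the coordinator sends across the cluster.
     */
    static Map<IgniteUuid, Map<UUID, Integer>> aggregate(
        Map<UUID, Map<IgniteUuid, Integer>> singleMaps) {
        Map<IgniteUuid, Map<UUID, Integer>> fullMap = new HashMap<>();

        for (Map.Entry<UUID, Map<IgniteUuid, Integer>> perNode : singleMaps.entrySet()) {
            UUID nodeId = perNode.getKey();

            for (Map.Entry<IgniteUuid, Integer> e : perNode.getValue().entrySet())
                fullMap.computeIfAbsent(e.getKey(), k -> new HashMap<>()).put(nodeId, e.getValue());
        }

        return fullMap;
    }
}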

*MESSAGES*

class DynamicServicesChangeRequestBatchMessage {
    Collection<DynamicServiceChangeRequest> reqs;
}

class DynamicServiceChangeRequest {
    IgniteUuid srvcId; // Unique service id (generated on deploy; an existing id is used on undeploy)
    ServiceConfiguration cfg; // Empty in case of undeploy
    byte flags; // Change type flags [deploy, undeploy, etc.]
}

class ServicesAssignmentsRequestMessage {
    ServicesDeploymentExchangeId exchId;
    Map<IgniteUuid, Map<UUID, Integer>> srvcsToDeploy; // Deploy and reassign
    Collection<IgniteUuid> srvcsToUndeploy;
}

class ServicesSingleMapMessage {
    ServicesDeploymentExchangeId exchId;
    Map<IgniteUuid, ServiceSingleDeploymentsResults> results;
}

class ServiceSingleDeploymentsResults {
    int cnt; // Deployed instances count, 0 in case of undeploy
    Collection<byte[]> errors; // Serialized exceptions, to avoid issues at the SPI level
}

class ServicesFullMapMessage {
    ServicesDeploymentExchangeId exchId;
    Collection<ServiceFullDeploymentsResults> results;
}

class ServiceFullDeploymentsResults {
    IgniteUuid srvcId;
    Map<UUID, ServiceSingleDeploymentsResults> results; // Per node
}

class ServicesDeploymentExchangeId {
    UUID nodeId; // Id of the initiating, joined or failed node
    int evtType; // EVT_NODE_[JOINED/LEFT/FAILED], EVT_DISCOVERY_CUSTOM_EVT
    AffinityTopologyVersion topVer;
    IgniteUuid reqId; // Unique id of the custom discovery message
}
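
For illustration, a deploy request could be assembled from the classes
above as follows; the direct field access and the flag encoding are
assumptions, since the classes are design outlines rather than final code:

import java.util.Collections;

import org.apache.ignite.lang.IgniteUuid;
import org.apache.ignite.services.ServiceConfiguration;

class DeployRequestSketch {
    static final byte DEPLOY_FLAG = 1; // assumed encoding of the deploy flag

    static DynamicServicesChangeRequestBatchMessage deployRequest(ServiceConfiguration cfg) {
        DynamicServiceChangeRequest req = new DynamicServiceChangeRequest();

        req.srvcId = IgniteUuid.randomUuid(); // a new id is generated on deploy
        req.cfg = cfg;                        // would be empty for an undeploy
        req.flags = DEPLOY_FLAG;

        DynamicServicesChangeRequestBatchMessage batch =
            new DynamicServicesChangeRequestBatchMessage();

        batch.reqs = Collections.singletonList(req);

        return batch; // the initiator sends this over disco-spi
    }
}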

*COORDINATOR CHANGE*

All server nodes receive requests for service state changes and put them
into the deployment queue, but only the coordinator processes them. If
the coordinator leaves or fails, they will be processed on the new
coordinator.
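
A rough sketch of this failover behavior, assuming a simple local queue;
all names and types are illustrative:

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class DeploymentQueueSketch {
    private final Queue<Object> pendingReqs = new ConcurrentLinkedQueue<>();

    private volatile boolean crd; // whether the local node is the coordinator

    void onRequestReceived(Object req) {
        pendingReqs.add(req); // every server node stores the request

        if (crd)
            processNext();
    }

    void onCoordinatorChanged(boolean becameCoordinator) {
        crd = becameCoordinator;

        // The new coordinator resumes from the same queue the failed one had.
        while (crd && !pendingReqs.isEmpty())
            processNext();
    }

    private void processNext() {
        Object req = pendingReqs.poll();
        // Calculate assignments and send a ServicesAssignmentsRequestMessage...
    }
}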

*TOPOLOGY CHANGE*

Each topology change (NODE_JOINED/LEFT/FAILED event) triggers a service
deployment task: assignments are recalculated and applied for each
deployed service.
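
A hypothetical sketch of the recalculation: after a topology change, each
node compares the freshly computed assignments with what it has deployed
locally and starts or cancels instances to match; names are illustrative:

import java.util.Map;
import java.util.UUID;

import org.apache.ignite.lang.IgniteUuid;

class ReassignOnTopologyChangeSketch {
    void apply(UUID locNodeId,
        Map<IgniteUuid, Map<UUID, Integer>> newAssignments,
        Map<IgniteUuid, Integer> locallyDeployed) {
        for (Map.Entry<IgniteUuid, Map<UUID, Integer>> e : newAssignments.entrySet()) {
            int expected = e.getValue().getOrDefault(locNodeId, 0);
            int actual = locallyDeployed.getOrDefault(e.getKey(), 0);

            if (expected > actual)
                startInstances(e.getKey(), expected - actual);
            else if (expected < actual)
                cancelInstances(e.getKey(), actual - expected);
        }
    }

    private void startInstances(IgniteUuid srvcId, int cnt) { /* deploy locally */ }

    private void cancelInstances(IgniteUuid srvcId, int cnt) { /* undeploy locally */ }
}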

*CLUSTER ACTIVATION/DEACTIVATION*

- On deactivation:
    * local services are undeployed;
    * requests are not handled (including deployment / undeployment);
- On activation:
    * local services are redeployed;
    * requests are handled as usual;
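
A minimal sketch of this gate, assuming a plain boolean for the cluster
state; the real activation machinery is more involved:

import java.util.concurrent.atomic.AtomicBoolean;

class ActivationGateSketch {
    private final AtomicBoolean active = new AtomicBoolean(false);

    void onDeactivate() {
        active.set(false);

        undeployLocalServices(); // local instances are cancelled
    }

    void onActivate() {
        active.set(true);

        redeployLocalServices(); // local instances are started again
    }

    void onRequest(Runnable req) {
        if (!active.get())
            return; // deployment/undeployment requests are not handled

        req.run();
    }

    private void undeployLocalServices() { /* ... */ }

    private void redeployLocalServices() { /* ... */ }
}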

*RELATED LINKS*

https://cwiki.apache.org/confluence/display/IGNITE/IEP-17%3A+Oil+Change+in+Service+Grid
http://apache-ignite-developers.2346864.n4.nabble.com/Service-grid-redesign-td28521.html


--
Best Regards, Vyacheslav D.

Re: Service Grid new design overview

Nikolay Izhikov-2
Hello, Vyacheslav.

Thanks for sharing your design.

> I have a question about service migration from AI 2.6 to the new solution

Can you describe the consequences of not having a migration solution?
What will happen on the user side?



Re: Service Grid new design overview

Anton Vinogradov-2
Vyacheslav.

It looks like we are able to restart all services on grid startup from the
old definitions (inside the cache) in case persistence is turned on.
So I see no problem providing such an automated migration path.
Also, we can test it using the compatibility framework.

BTW, does the proposed solution provide the guarantee that services will be
redeployed after each cluster restart, since now we're not using the cache?


Re: Service Grid new design overview

daradurvs
Nick, Anton, thank you for stepping in.

AFAIK, an Ignite cluster can move its state to a new version of Ignite
using persistence only.

Since Ignite v2.3, persistence is configured per memory region, and the
system memory region is not persistent, which means the system
(utility) cache will not be recovered on cluster restart.

Here is a ticket which describes the same issue:
https://issues.apache.org/jira/browse/IGNITE-6629
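
For context, this is roughly how per-region persistence is configured
since 2.3 (the public DataStorageConfiguration API); the setting applies
to user caches in the configured region, not to the internal system region:

import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class PersistenceConfigExample {
    public static IgniteConfiguration config() {
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("persistent-region")
            .setPersistenceEnabled(true); // covers user caches in this region only

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region);

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }
}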

> BTW, does the proposed solution provide the guarantee that services will be
> redeployed after each cluster restart, since now we're not using the cache?

No, only the services described in IgniteConfiguration will be deployed
at node startup, just as now.

Am I wrong about something?


--
Best Regards, Vyacheslav D.

Re: Service Grid new design overview

Dmitriy Pavlov
Denis M. & Val, please share your vision on this topic.


Re: Service Grid new design overview

Valentin Kulichenko
Guys,

I believe we should preserve the behavior that we have now. What happens to
services if we restart a persistent cluster running 2.6? Are services
recreated or not? If YES, we should make sure the same happens after the
redesign. It would be even better if we preserve compatibility, i.e. allow a
seamless upgrade from an older version that uses the system cache to a newer
version that uses disco messages for service deployment. If NO, it's much
easier and we can leave it as is for now. However, eventually it would be
great to have an option to persist services and redeploy them after a
cluster restart.

-Val


Re: Service Grid new design overview

dsetrakyan
Agree with Val. I think all users would expect that a service is restarted
upon a node or cluster restart. Let's make sure we preserve this behavior.

D.


Re: Service Grid new design overview

daradurvs
Hi Igniters!

I had a private talk about the new Service Grid design with Alexey
Goncharuk, Vladimir Ozerov, Denis Mekhanikov, Nikolay Izhikov and Anton
Vinogradov, and I'd like to share the results.

The design looks good in general, but we have decided to improve the
unified request-processing flow as follows: when a node receives a
deployment request, it calculates the assignments itself and deploys
the service if needed, then sends the result to the coordinator,
instead of waiting for assignments from the coordinator.

For this change, we need to make the service assignment function
*deterministic*, which means the function returns the same result for
the same arguments on any node.

We all agreed with this change because it reduces the number of messages
needed to handle each request and makes the solution more flexible.
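
For illustration, a deterministic assignment function could look like the
sketch below: nodes are put into a stable order so every node computes an
identical result. The round-robin policy and all names are assumptions:

import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

class DeterministicAssignmentSketch {
    /** Same inputs produce the same map on every node: no coordinator needed. */
    static Map<UUID, Integer> assign(Collection<UUID> nodeIds, int totalCnt, int maxPerNode) {
        List<UUID> sorted = new ArrayList<>(nodeIds);

        sorted.sort(Comparator.naturalOrder()); // identical order on every node

        // Cap the total so the per-node limit can always be respected.
        int cap = maxPerNode > 0 ? maxPerNode * sorted.size() : totalCnt;
        int toDeploy = Math.min(totalCnt, cap);

        Map<UUID, Integer> res = new LinkedHashMap<>();

        for (int i = 0; i < toDeploy; i++)
            res.merge(sorted.get(i % sorted.size()), 1, Integer::sum);

        return res;
    }
}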

On Tue, Aug 28, 2018 at 12:26 AM Dmitriy Setrakyan
<[hidden email]> wrote:

>
> Agree with Val. I think all users would expect that a service is restarted
> upon a node or cluster restart. Let's make sure we preserve this behavior.
>
> D.
>
> On Fri, Aug 24, 2018 at 4:17 PM, Valentin Kulichenko <
> [hidden email]> wrote:
>
> > Guys,
> >
> > I believe we should preserve the behavior that we have now. What happens to
> > services if we restart a persistent cluster running 2.6? Are services
> > recreated or not? If YES, we should make sure the same happens after
> > redesign. Would be even better if we preserve compatibility, i.e. allow
> > seamless upgrade from older version that uses system cache to newer version
> > that uses disco messages for service deployment. If NO, it's much easier
> > and we can leave it as is for now. However, eventually would be great to
> > have an option to persist services and redeploy them after cluster restart.
> >
> > -Val
> >
> > On Fri, Aug 24, 2018 at 2:51 PM Dmitriy Pavlov <[hidden email]>
> > wrote:
> >
> > > Denis M. & Val please share your vision about this topic.
> > >
> > > пт, 24 авг. 2018 г. в 15:52, Vyacheslav Daradur <[hidden email]>:
> > >
> > > > Nick, Antron, thank you for stepping in.
> > > >
> > > > AFAIK, Ignite cluster can move its state to a new version of Ignite
> > > > using persistence only.
> > > >
> > > > Since Ignite v.2.3 persistence is configured on a memory region and
> > > > system memory region is not persistence, that means the system
> > > > (utility) cache will not be recovered on cluster restart.
> > > >
> > > > Here is a ticket which describes the same issue:
> > > > https://issues.apache.org/jira/browse/IGNITE-6629
> > > >
> > > > > BTW, Is proposed solution provides the guarantee that services will
> > be
> > > > > redeployed after each cluster restart since now we're not using the
> > > > cache?
> > > >
> > > > No, only services described in IgniteConfiguration will be deployed at
> > > > node startup as well as now.
> > > >
> > > > Am I wrong in something?
> > > > On Thu, Aug 23, 2018 at 5:59 PM Anton Vinogradov <[hidden email]>
> > wrote:
> > > > >
> > > > > Vyacheslav.
> > > > >
> > > > > It looks like we able to restart all services on grid startup from
> > old
> > > > > definitions (inside cache) in case persistence turned on.
> > > > > Se no problems to provide such automated migration case.
> > > > > Also, we can test it using compatibility framework.
> > > > >
> > > > > BTW, Is proposed solution provides the guarantee that services will
> > be
> > > > > redeployed after each cluster restart since now we're not using the
> > > > cache?
> > > > >
> > > > > чт, 23 авг. 2018 г. в 15:21, Nikolay Izhikov <[hidden email]>:
> > > > >
> > > > > > Hello, Vyacheslav.
> > > > > >
> > > > > > Thanks, for sharing your design.
> > > > > >
> > > > > > > I have a question about services migration from AI 2.6 to a new
> > > > solution
> > > > > >
> > > > > > Can you describe consequences of not having migration solution?
> > > > > > What will happen on the user side?
> > > > > >
> > > > > >
> > > > > > В Чт, 23/08/2018 в 14:44 +0300, Vyacheslav Daradur пишет:
> > > > > > > Hi, Igniters!
> > > > > > >
> > > > > > > I’m working on Service Grid redesign tasks and design seems to be
> > > > > > finished.
> > > > > > >
> > > > > > > The main goal of Service Grid redesign is to provide missed
> > > > guarantees:
> > > > > > > - Synchronous services deployment/undeployment;
> > > > > > > - Failover on coordinator change;
> > > > > > > - Propagation of deployment errors across the cluster;
> > > > > > > - Introduction of a deployment failures policy;
> > > > > > > - Prevention of deployment initiators hanging during deployment;
> > > > > > > - etc.
> > > > > > >
> > > > > > > I’d like to ask the community for their thoughts about the proposed
> > > > > > > design to be sure that all important things have been considered.
> > > > > > >
> > > > > > > Also, I have a question about services migration from AI 2.6 to the
> > > > > > > new solution. It’s very hard to provide migration tools for users
> > > > > > > because of the significant changes. We don’t use the utility cache
> > > > > > > anymore. Should we spend time on this?
> > > > > > >
> > > > > > > I’ve prepared a definition of the new Service Grid design; it’s
> > > > > > > described below:
> > > > > > >
> > > > > > > *OVERVIEW*
> > > > > > >
> > > > > > > All nodes (servers and clients) are able to host services, but
> > > > > > > client nodes are excluded from service deployment by default. The
> > > > > > > only way to deploy a service on client nodes is to specify a node
> > > > > > > filter in ServiceConfiguration.
> > > > > > >
> > > > > > > All deployed services are identified internally by a “serviceId”
> > > > > > > (IgniteUuid). This gives us a base for such features as hot
> > > > > > > redeployment and service versioning. It’s important to have the
> > > > > > > ability to identify and manage services with the same name but
> > > > > > > different versions.
> > > > > > >
> > > > > > > All actions on a service’s state change are processed according to a
> > > > > > > unified flow:
> > > > > > > 1) The initiator sends a request to change service state [deploy,
> > > > > > > undeploy] over disco-spi as a DynamicServicesChangeRequestBatchMessage;
> > > > > > > every server node stores it in its own queue so that, if the
> > > > > > > coordinator fails, it can be processed at the new coordinator;
> > > > > > > 2) The coordinator calculates assignments, defines the actions in a
> > > > > > > new ServicesAssignmentsRequestMessage, and sends it over disco-spi to
> > > > > > > be processed by all nodes;
> > > > > > > 3) Each node applies the actions, builds a ServicesSingleMapMessage
> > > > > > > that contains the service ids and the number of instances deployed on
> > > > > > > that single node, and sends the message over comm-spi to the
> > > > > > > coordinator (p2p);
> > > > > > > 4) Once the coordinator has received all single map messages, it
> > > > > > > builds a ServicesFullMapMessage that contains the service deployments
> > > > > > > across the cluster and sends the message over disco-spi to be
> > > > > > > processed by all nodes;
> > > > > > >
> > > > > > > *MESSAGES*
> > > > > > >
> > > > > > > class DynamicServicesChangeRequestBatchMessage {
> > > > > > >     Collection<DynamicServiceChangeRequest> reqs;
> > > > > > > }
> > > > > > >
> > > > > > > class DynamicServiceChangeRequest {
> > > > > > >     IgniteUuid srvcId; // Unique service id (generated on deploy,
> > > > > > >                        // existing id used on undeploy)
> > > > > > >     ServiceConfiguration cfg; // Empty in case of undeploy
> > > > > > >     byte flags; // Change type flags [deploy, undeploy, etc.]
> > > > > > > }
> > > > > > >
> > > > > > > class ServicesAssignmentsRequestMessage {
> > > > > > >     ServicesDeploymentExchangeId exchId;
> > > > > > >     Map<IgniteUuid, Map<UUID, Integer>> srvcsToDeploy; // Deploy and reassign
> > > > > > >     Collection<IgniteUuid> srvcsToUndeploy;
> > > > > > > }
> > > > > > >
> > > > > > > class ServicesSingleMapMessage {
> > > > > > >     ServicesDeploymentExchangeId exchId;
> > > > > > >     Map<IgniteUuid, ServiceSingleDeploymentsResults> results;
> > > > > > > }
> > > > > > >
> > > > > > > class ServiceSingleDeploymentsResults {
> > > > > > >     int cnt; // Deployed instances count, 0 in case of undeploy
> > > > > > >     Collection<byte[]> errors; // Serialized exceptions to avoid
> > > > > > >                                // issues at the spi level
> > > > > > > }
> > > > > > >
> > > > > > > class ServicesFullMapMessage {
> > > > > > >     ServicesDeploymentExchangeId exchId;
> > > > > > >     Collection<ServiceFullDeploymentsResults> results;
> > > > > > > }
> > > > > > >
> > > > > > > class ServiceFullDeploymentsResults {
> > > > > > >     IgniteUuid srvcId;
> > > > > > >     Map<UUID, ServiceSingleDeploymentsResults> results; // Per node
> > > > > > > }
> > > > > > >
> > > > > > > class ServicesDeploymentExchangeId {
> > > > > > >     UUID nodeId; // Initiating, joined or failed node id
> > > > > > >     int evtType; // EVT_NODE_[JOIN/LEFT/FAILED], EVT_DISCOVERY_CUSTOM_EVT
> > > > > > >     AffinityTopologyVersion topVer;
> > > > > > >     IgniteUuid reqId; // Unique id of the custom discovery message
> > > > > > > }
> > > > > > >
> > > > > > > *COORDINATOR CHANGE*
> > > > > > >
> > > > > > > All server nodes handle requests for service state changes and put
> > > > > > > them into the deployment queue, but only the coordinator processes
> > > > > > > them. If the coordinator leaves or fails, they will be processed on
> > > > > > > the new coordinator.
> > > > > > >
> > > > > > > *TOPOLOGY CHANGE*
> > > > > > >
> > > > > > > Each topology change (NODE_JOIN/LEFT/FAILED event) triggers a
> > > > > > > services deployment task. Assignments will be recalculated and
> > > > > > > applied for each deployed service.
> > > > > > >
> > > > > > > *CLUSTER ACTIVATION/DEACTIVATION*
> > > > > > >
> > > > > > > - On deactivation:
> > > > > > >     * local services are undeployed;
> > > > > > >     * requests are not handled (including deployment / undeployment);
> > > > > > > - On activation:
> > > > > > >     * local services are redeployed;
> > > > > > >     * requests are handled as usual;
> > > > > > >
> > > > > > > *RELATED LINKS*
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-17%3A+Oil+Change+in+Service+Grid
> > > > > > >
> > > > > > > http://apache-ignite-developers.2346864.n4.nabble.com/Service-grid-redesign-td28521.html
> > > > > > >
> > > >
> > > > --
> > > > Best Regards, Vyacheslav D.
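
To make step 4 of the quoted flow concrete: below is a rough sketch of how
the coordinator could fold the collected single map messages into the full
map. It assumes the message classes quoted above; the one-argument
constructor and the direct field access are illustrative simplifications,
not the actual implementation.

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import org.apache.ignite.lang.IgniteUuid;

// Called on the coordinator once a ServicesSingleMapMessage has arrived
// from every node participating in the exchange.
ServicesFullMapMessage buildFullMap(
    ServicesDeploymentExchangeId exchId,
    Map<UUID, ServicesSingleMapMessage> singles) { // node id -> single map

    Map<IgniteUuid, ServiceFullDeploymentsResults> full = new HashMap<>();

    for (Map.Entry<UUID, ServicesSingleMapMessage> e : singles.entrySet()) {
        UUID nodeId = e.getKey();

        // Regroup the per-node results by service id.
        for (Map.Entry<IgniteUuid, ServiceSingleDeploymentsResults> r
                 : e.getValue().results.entrySet()) {
            full.computeIfAbsent(r.getKey(), ServiceFullDeploymentsResults::new)
                .results.put(nodeId, r.getValue());
        }
    }

    ServicesFullMapMessage msg = new ServicesFullMapMessage();
    msg.exchId = exchId;
    msg.results = full.values();
    return msg;
}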



--
Best Regards, Vyacheslav D.

Re: Service Grid new design overview

dsetrakyan
I also hope that we will have some batching API to allow deployment of
multiple services together, either on grid startup or during a call to a
"deploy..." API.

D.

On Thu, Aug 30, 2018 at 5:04 AM, Vyacheslav Daradur <[hidden email]>
wrote:

> Hi Igniters!
>
> I had a private talk about the new Service Grid design with Alexey
> Goncharuk, Vladimir Ozerov, Denis Mekhanikov, Nikolay Izhikov and Anton
> Vinogradov, and I'd like to share the results.
>
> The design looks good in general, but we have decided to improve the
> unified flow of request processing as follows: when any node receives a
> deployment request, the node should calculate the assignments itself,
> deploy if needed, and then send the result to the coordinator, instead
> of waiting for assignments from the coordinator.
>
> For this change, we should make our service assignment function
> *deterministic*, meaning that the function returns the same results for
> the same arguments on any node.
>
> We all agreed with this change because it reduces the number of messages
> needed to handle each request and makes the solution more flexible.
>
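For example, a deterministic round-robin assignment over the filtered and
sorted topology could look roughly like the sketch below (illustrative
only: the method name is hypothetical and maxPerNodeCount handling is
omitted):

import java.util.*;
import java.util.stream.Collectors;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.services.ServiceConfiguration;

Map<UUID, Integer> reassign(ServiceConfiguration cfg, Collection<ClusterNode> top) {
    // Stable, node-independent order: apply the configured node filter,
    // then sort by node id.
    List<UUID> ids = top.stream()
        .filter(n -> cfg.getNodeFilter() == null || cfg.getNodeFilter().apply(n))
        .map(ClusterNode::id)
        .sorted()
        .collect(Collectors.toList());

    if (ids.isEmpty())
        return Collections.emptyMap();

    int total = cfg.getTotalCount() > 0 ? cfg.getTotalCount() : ids.size();

    // Round-robin over the sorted ids: the same topology snapshot yields
    // the same map on every node.
    Map<UUID, Integer> assigns = new TreeMap<>();

    for (int i = 0; i < total; i++)
        assigns.merge(ids.get(i % ids.size()), 1, Integer::sum);

    return assigns;
}
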
> On Tue, Aug 28, 2018 at 12:26 AM Dmitriy Setrakyan
> <[hidden email]> wrote:
> >
> > Agree with Val. I think all users would expect that a service is
> > restarted upon a node or cluster restart. Let's make sure we preserve
> > this behavior.
> >
> > D.
> >
> > On Fri, Aug 24, 2018 at 4:17 PM, Valentin Kulichenko <[hidden email]> wrote:
> >
> > > Guys,
> > >
> > > I believe we should preserve the behavior that we have now. What
> > > happens to services if we restart a persistent cluster running 2.6?
> > > Are services recreated or not? If YES, we should make sure the same
> > > happens after the redesign. It would be even better if we preserve
> > > compatibility, i.e. allow a seamless upgrade from an older version
> > > that uses the system cache to a newer version that uses disco
> > > messages for service deployment. If NO, it's much easier and we can
> > > leave it as is for now. However, eventually it would be great to
> > > have an option to persist services and redeploy them after a cluster
> > > restart.
> > >
> > > -Val
> > >
> > > On Fri, Aug 24, 2018 at 2:51 PM Dmitriy Pavlov <[hidden email]>
> > > wrote:
> > >
> > > > Denis M. & Val please share your vision about this topic.
> > > >
> > > > On Fri, Aug 24, 2018 at 15:52, Vyacheslav Daradur <[hidden email]>:
> > > >
> > > > > Nick, Anton, thank you for stepping in.
> > > > >
> > > > > AFAIK, an Ignite cluster can move its state to a new version of
> > > > > Ignite using persistence only.
> > > > >
> > > > > Since Ignite v2.3, persistence is configured per memory region,
> > > > > and the system memory region is not persistent, which means the
> > > > > system (utility) cache will not be recovered on cluster restart.
> > > > >
> > > > > Here is a ticket which describes the same issue:
> > > > > https://issues.apache.org/jira/browse/IGNITE-6629
> > > > >
> > > > > > BTW, does the proposed solution provide the guarantee that
> > > > > > services will be redeployed after each cluster restart, since
> > > > > > now we're not using the cache?
> > > > >
> > > > > No, only services described in IgniteConfiguration will be
> > > > > deployed at node startup, just as now.
> > > > >
> > > > > Am I wrong about something?
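
For reference, a minimal sketch of the per-region persistence
configuration mentioned above (user regions only; the system region is
internal and stays non-persistent):

import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

IgniteConfiguration cfg = new IgniteConfiguration();

DataStorageConfiguration storageCfg = new DataStorageConfiguration();

// Persistence is a per-region setting: caches bound to this region
// survive restarts; non-persistent regions do not.
storageCfg.setDefaultDataRegionConfiguration(
    new DataRegionConfiguration()
        .setName("persistent-region")
        .setPersistenceEnabled(true));

cfg.setDataStorageConfiguration(storageCfg);
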
> > > > > On Thu, Aug 23, 2018 at 5:59 PM Anton Vinogradov <[hidden email]> wrote:
> > > > > >
> > > > > > Vyacheslav,
> > > > > >
> > > > > > It looks like we are able to restart all services on grid startup
> > > > > > from the old definitions (inside the cache) in case persistence is
> > > > > > turned on.
> > > > > > So I see no problem in providing such an automated migration path.
> > > > > > Also, we can test it using the compatibility framework.
> > > > > >
> > > > > > BTW, does the proposed solution provide the guarantee that services
> > > > > > will be redeployed after each cluster restart, since now we're not
> > > > > > using the cache?
> > > > > >
> > > > > > On Thu, Aug 23, 2018 at 15:21, Nikolay Izhikov <[hidden email]>:
> > > > > >
> > > > > > > Hello, Vyacheslav.
> > > > > > >
> > > > > > > Thanks for sharing your design.
> > > > > > >
> > > > > > > > I have a question about services migration from AI 2.6 to a new solution
> > > > > > >
> > > > > > > Can you describe the consequences of not having a migration solution?
> > > > > > > What will happen on the user side?
> > > > > > >
> > > > > > >
> > > > > > > On Thu, 23/08/2018 at 14:44 +0300, Vyacheslav Daradur wrote:
> > > > > > >
> > > > > > > > [original design overview snipped; quoted in full at the top of this thread]
>
>
>
> --
> Best Regards, Vyacheslav D.
>

Re: Service Grid new design overview

daradurvs
Dmitriy,

Yes, as you can see in the first message of this thread, the
*DynamicServicesChangeRequestBatchMessage* request is a container that
can carry several actions, for batch operations like deployAll/cancelAll.

Also, it will be possible to combine *deploy* and *undeploy* operations
in one message, for example when redeploying a new version of a service.
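
From the user's perspective, batch deployment could look like the sketch
below. This is illustrative only: it assumes a deployAll method shaped as
discussed, and MyServiceA/MyServiceB stand in for user-defined Service
implementations.

import java.util.Arrays;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.services.ServiceConfiguration;

Ignite ignite = Ignition.ignite();

ServiceConfiguration svcA = new ServiceConfiguration()
    .setName("svcA")
    .setService(new MyServiceA())  // placeholder Service implementation
    .setTotalCount(2);             // two instances cluster-wide

ServiceConfiguration svcB = new ServiceConfiguration()
    .setName("svcB")
    .setService(new MyServiceB())  // placeholder Service implementation
    .setMaxPerNodeCount(1);        // at most one instance per node

// Both configurations travel in a single
// DynamicServicesChangeRequestBatchMessage.
ignite.services().deployAll(Arrays.asList(svcA, svcB));
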
On Thu, Aug 30, 2018 at 7:44 PM Dmitriy Setrakyan <[hidden email]> wrote:

>
> I also hope that we will have some batching API to allow deployment of
> multiple services together, either on grid startup or during a call to a
> "deploy..." API.
>
> D.
>
> On Thu, Aug 30, 2018 at 5:04 AM, Vyacheslav Daradur <[hidden email]> wrote:
>
> > [earlier messages snipped; quoted in full above in the thread]



--
Best Regards, Vyacheslav D.