[DISCUSSION] User-facing API for managing Maintenance Mode

Sergey Chugunov
Hello Ignite dev community,

As the internal implementation of Maintenance Mode [1] nears completion, I
want to discuss one more thing: the user-facing API for managing it (I will
use the control utility for examples).

What should be managed?
When a node enters MM, it may start some automatic actions (like
defragmentation) or wait for a user to intervene and resolve the issue
(as in the case of PDS corruption).

So for manually triggered operations, like PDS cleanup after corruption, we
should provide the user with a way to actually trigger the operation.
And for long-running automatic operations like defragmentation, actions
such as status and cancel are reasonable to implement.

At the same time, Maintenance Mode is a supporting feature; it doesn't bring
any value by itself but enables the implementation of other features.
Thus, putting it at the center of the API and building all commands around
a main "maintenance" command may not be right.

There are two alternatives: "*Big features deserve their own commands*"
and "*Everything should be unified*". Let's consider each.

Big features deserve their own commands
Here, each big feature gets its own command. Defragmentation is a big,
separate feature, so why shouldn't it have its own commands to request
or cancel it?

Examples
    *control.sh defragmentation request-for-node --nodeId <node-id>
[--caches <caches list>]* - defragmentation will start on the given node
after its restart.
    *control.sh defragmentation status* - prints the status of ongoing
defragmentation.
    *control.sh defragmentation cancel* - cancels ongoing defragmentation.
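
As a rough illustration of the command shape above, here is a minimal Python
sketch that parses these three subcommands with argparse. It only models the
proposed syntax; the parser layout, the comma-separated format for --caches,
and treating an omitted --caches as "all caches" are assumptions, not part
of the control utility.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Builds a parser for the proposed 'defragmentation' command (hypothetical)."""
    parser = argparse.ArgumentParser(prog="control.sh")
    commands = parser.add_subparsers(dest="command", required=True)

    defrag = commands.add_parser("defragmentation")
    actions = defrag.add_subparsers(dest="action", required=True)

    request = actions.add_parser("request-for-node")
    request.add_argument("--nodeId", required=True)
    # Assumption: caches are given as a comma-separated list;
    # None stands for "all caches".
    request.add_argument("--caches", type=lambda s: s.split(","), default=None)

    # 'status' and 'cancel' take no arguments in the proposal.
    actions.add_parser("status")
    actions.add_parser("cancel")
    return parser


args = build_parser().parse_args(
    ["defragmentation", "request-for-node", "--nodeId", "node-1", "--caches", "a,b"]
)
print(args.action, args.nodeId, args.caches)
```

Because the feature owns its subcommand tree, it can add arguments later
without touching any shared "maintenance" command.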

Another command - "maintenance" - will be used for more generic purposes.

Examples
    *control.sh maintenance list-records* - prints information about each
maintenance record (id and name of the record, parameters, description,
current status).
    *control.sh maintenance record-actions --id <record-id>* - prints
information about user-triggered actions available for this record (e.g.
for a PDS corruption record it may be "clean-corrupted-files").
    *control.sh maintenance execute-action --id <record-id> --action-name
<action-name>* - triggers execution of a particular action and prints the
results.
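
To make the three generic commands concrete, a minimal Python sketch of the
record model behind them might look as follows. The class names, the fields,
and the idea of storing actions as callables are hypothetical and only
illustrate the proposal.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MaintenanceRecord:
    """One maintenance record; all field names are hypothetical."""
    record_id: str
    name: str
    description: str
    status: str = "pending"
    # Action name -> callable that performs the action and returns a message.
    actions: Dict[str, Callable[[], str]] = field(default_factory=dict)


class MaintenanceRecords:
    def __init__(self) -> None:
        self._records: Dict[str, MaintenanceRecord] = {}

    def add(self, record: MaintenanceRecord) -> None:
        self._records[record.record_id] = record

    def list_records(self) -> List[MaintenanceRecord]:
        """Models 'maintenance list-records'."""
        return list(self._records.values())

    def record_actions(self, record_id: str) -> List[str]:
        """Models 'maintenance record-actions --id <record-id>'."""
        return sorted(self._records[record_id].actions)

    def execute_action(self, record_id: str, action_name: str) -> str:
        """Models 'maintenance execute-action --id ... --action-name ...'."""
        actions = self._records[record_id].actions
        if action_name not in actions:
            raise ValueError(
                f"unknown action {action_name!r} for record {record_id!r}"
            )
        return actions[action_name]()


records = MaintenanceRecords()
records.add(MaintenanceRecord(
    record_id="rec-1",
    name="pds-corruption",
    description="Corrupted PDS files were detected",
    actions={"clean-corrupted-files": lambda: "corrupted files removed"},
))
print(records.record_actions("rec-1"))
print(records.execute_action("rec-1", "clean-corrupted-files"))
```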

*Pros:*

   1. Big features like defragmentation get their own commands and more
   freedom in implementing them.
   2. It is emphasized that maintenance mode is just a supporting thing and
   not a first-class feature (it is not at the center of API).

*Cons:*

   1. Duplication of functionality. The same functions may be available via
   both the general maintenance command and the feature's own command.
   2. Information about a feature may be split across two commands: one
   piece is available in the "feature" command, another in the
   "maintenance" command.


Everything should be unified
We can go another way and gather all features that rely on MM under one
unified command.

The API for a node that is already in MM looks complete, logical, and
intuitive:
    *control.sh maintenance list-records* - outputs all records that have to
be resolved to finish maintenance.
    *control.sh maintenance record-actions --id <record-id>* - lists all
actions available for the record.
    *control.sh maintenance execute-action --id <record-id> --action-name
<action-name>* - executes the action with the given name (general actions
like "status" or "delete", or a more specific action like
"clean-corrupted-files" for a corrupted PDS).

But the API for requesting that a node enter maintenance mode becomes
vaguer.
    *control.sh maintenance available-operations* - prints all operations
available to request (for instance, defragmentation).
    *control.sh maintenance request-operation --id <operation-id> --params
<operation parameters>* - requests that the given operation start on the
next node restart.
Here we have to distinguish operations that are requested automatically
(like PDS corruption) and not show them to the user.
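
The distinction mentioned above could be sketched in Python as a flag on each
operation, so that automatically requested operations are hidden from the
listing and rejected when a user tries to request them. The names and the
`user_requestable` flag are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MaintenanceOperation:
    operation_id: str
    # Hypothetical flag: False for operations the node requests on its own.
    user_requestable: bool


OPERATIONS = [
    MaintenanceOperation("defragmentation", user_requestable=True),
    MaintenanceOperation("pds-corruption", user_requestable=False),
]


def available_operations(operations: List[MaintenanceOperation]) -> List[str]:
    """Models 'maintenance available-operations': hides automatic operations."""
    return [op.operation_id for op in operations if op.user_requestable]


def request_operation(
    operations: List[MaintenanceOperation],
    operation_id: str,
    params: Dict[str, str],
) -> Dict[str, object]:
    """Models 'maintenance request-operation --id ... --params ...'."""
    by_id = {op.operation_id: op for op in operations}
    op = by_id.get(operation_id)
    if op is None or not op.user_requestable:
        raise ValueError(f"operation {operation_id!r} cannot be requested by a user")
    # The returned dict stands in for a request to be applied on next restart.
    return {"operation": operation_id, "params": dict(params)}


print(available_operations(OPERATIONS))
print(request_operation(OPERATIONS, "defragmentation", {"caches": "a,b"}))
```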

*Pros:*

   1. Single API to get information and trigger actions without any
   duplication.


*Cons:*

   1. We restrict big features to the model provided by the maintenance
   command.
   2. This API puts maintenance at the center, although it is nothing
   more than a supporting feature.
   3. The API for requesting maintenance operations feels artificial
   rather than intuitive to me.


So what do you think? What looks better and more intuitive from your
perspective?

I will be glad to hear any feedback on the subject.

As a result of this discussion, I will create a ticket for implementation
and include it in IEP-53 [2].

[1] https://issues.apache.org/jira/browse/IGNITE-13366
[2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-53%3A+Maintenance+Mode

Re: [DISCUSSION] User-facing API for managing Maintenance Mode

Denis Mekhanikov
Sergey,

Thanks for such a detailed description!

I find the first option more attractive. I see maintenance mode as a
special state of a node that a user can turn on and off. If you want to
perform defragmentation, you need to turn that mode on. If you try to do it
in normal mode, you get an error and a suggestion to turn MM on. Certain
commands will depend on this mode.
It's like the "active" / "inactive" / "read-only" cluster states, but for
nodes. You need an active cluster to perform cache puts. Similarly,
you'll need a node in maintenance mode to perform PDS recovery.

The approach with a "maintenance" command introduces the limitation that
the control utility will have to know about every command that requires
maintenance. There is a chance that this command will become bloated with
options. It will also be problematic for plugins to introduce new commands
requiring the maintenance mode.
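
One way to picture this, as a Python sketch only: each command declares
whether it depends on maintenance mode, and a generic dispatcher enforces
that dependency, so plugins can register their own commands without the
control utility knowing about them in advance. All names here are
hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class NodeMode(Enum):
    NORMAL = "normal"
    MAINTENANCE = "maintenance"


@dataclass(frozen=True)
class Command:
    name: str
    handler: Callable[[List[str]], str]
    # Declared by the command itself (or by the plugin that provides it).
    requires_maintenance: bool = False


class CommandRegistry:
    def __init__(self) -> None:
        self._commands: Dict[str, Command] = {}

    def register(self, command: Command) -> None:
        self._commands[command.name] = command

    def run(self, name: str, args: List[str], node_mode: NodeMode) -> str:
        command = self._commands[name]
        # Generic check: the dispatcher knows nothing about specific features.
        if command.requires_maintenance and node_mode is not NodeMode.MAINTENANCE:
            raise RuntimeError(
                f"'{name}' requires maintenance mode; "
                "turn it on for this node and retry"
            )
        return command.handler(args)


registry = CommandRegistry()
registry.register(Command(
    name="defragmentation",
    handler=lambda args: "defragmentation started",
    requires_maintenance=True,
))
print(registry.run("defragmentation", [], NodeMode.MAINTENANCE))
```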

Denis

Tue, Sep 29, 2020 at 18:03, Sergey Chugunov <[hidden email]>:
