Apache Ignite Developers - Legacy Mail Archive

[DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Pavel Pereslegin

[DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Hello Igniters.

Recently, master key rotation for Apache Ignite Transparent Data
Encryption was implemented [1], but some security standards (PCI DSS
at least) require rotation of all encryption keys [2]. Currently,
encryption occurs when reading/writing pages to disk, and cache
encryption keys are stored in the metastore.

I'm going to contribute cache encryption key rotation and would like
to discuss the best way to re-encrypt existing data. I see two
different strategies.

1. In place re-encryption:
Using the old key, sequentially read all the pages from the datastore,
mark them as dirty and log them into the WAL. After a checkpoint, the
pages will be stored to disk encrypted with the new key (as usual,
along with updates). This strategy requires storing the identifier
(number) of the encryption key in the encrypted page.
pros:
- can work in the background with minimal performance impact (this
impact can be managed).
cons:
- page duplication in the WAL may affect performance and historical rebalance.
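
The in-place flow described above can be sketched roughly as follows. This is a minimal Python illustration, not Ignite code: `PageStore`, `reencrypt_scan` and `checkpoint` are hypothetical names, the XOR "cipher" is only a stand-in for real encryption, and the WAL is reduced to a plain list.

```python
from dataclasses import dataclass, field


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric 'cipher' (XOR). A stand-in for real encryption, NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


@dataclass
class PageStore:
    """Hypothetical page store: each on-disk page records the id of its key."""
    keys: dict          # key_id -> key bytes
    write_key_id: int   # the single key used for all writes
    disk: dict = field(default_factory=dict)   # page_id -> (key_id, ciphertext)
    dirty: dict = field(default_factory=dict)  # page_id -> plaintext awaiting checkpoint
    wal: list = field(default_factory=list)    # simplified WAL: logged page records

    def write_to_disk(self, page_id: int, plaintext: bytes) -> None:
        key_id = self.write_key_id
        self.disk[page_id] = (key_id, xor_cipher(plaintext, self.keys[key_id]))

    def read_from_disk(self, page_id: int) -> bytes:
        # The key id stored with the page selects the key for decryption.
        key_id, ciphertext = self.disk[page_id]
        return xor_cipher(ciphertext, self.keys[key_id])

    def reencrypt_scan(self, batch_size: int) -> int:
        """Mark up to batch_size old-key pages as dirty and log them to the WAL."""
        marked = 0
        for page_id, (key_id, _) in self.disk.items():
            if marked == batch_size:
                break
            if key_id != self.write_key_id and page_id not in self.dirty:
                plaintext = self.read_from_disk(page_id)
                self.dirty[page_id] = plaintext
                self.wal.append((page_id, plaintext))
                marked += 1
        return marked

    def checkpoint(self) -> None:
        # Dirty pages are written as usual, which means with the current write key.
        for page_id, plaintext in self.dirty.items():
            self.write_to_disk(page_id, plaintext)
        self.dirty.clear()


store = PageStore(keys={1: b"old-key"}, write_key_id=1)
for pid in range(4):
    store.write_to_disk(pid, f"page-{pid}".encode())

store.keys[2] = b"new-key"   # add the new key, keep the old one for reads
store.write_key_id = 2
while store.reencrypt_scan(batch_size=2):   # small batches act as a crude throttle
    store.checkpoint()
```

The `batch_size` parameter is one simple way to model the "impact can be managed" point, and the growth of `wal` by one record per page mirrors the page duplication listed as a con.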

2. Copy partition with re-encryption.
This strategy is similar to partition snapshotting [3] - create a
partition copy encrypted with the new key and then replace the
original partition file with the new one (see details [4]).
pros:
- should work faster than "in place" re-encryption.
cons:
- re-encryption in an active cluster (and on an unstable topology) can
be difficult to implement.
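
As a rough illustration of the copy strategy, here is a minimal Python sketch under simplifying assumptions: the partition is a flat file of fixed-size pages, nothing writes to it during the copy (the hard part named in the cons is not handled), and XOR stands in for real encryption. The function name, the `.reencrypt.tmp` suffix and `PAGE_SIZE` are all hypothetical.

```python
import os
import tempfile

PAGE_SIZE = 16  # illustrative only; real pages are far larger


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric 'cipher' (XOR). A stand-in for real encryption, NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def copy_partition_with_reencryption(path: str, old_key: bytes, new_key: bytes) -> int:
    """Write a re-encrypted copy next to the partition file, then swap it in."""
    tmp_path = path + ".reencrypt.tmp"
    pages = 0
    with open(path, "rb") as src, open(tmp_path, "wb") as dst:
        while True:
            page = src.read(PAGE_SIZE)
            if not page:
                break
            plaintext = xor_cipher(page, old_key)      # decrypt with the old key
            dst.write(xor_cipher(plaintext, new_key))  # encrypt with the new key
            pages += 1
        dst.flush()
        os.fsync(dst.fileno())  # make the copy durable before replacing
    os.replace(tmp_path, path)  # swap the new file in place of the original
    return pages


with tempfile.TemporaryDirectory() as tmp_dir:
    part_file = os.path.join(tmp_dir, "part-0.bin")
    plain_pages = [b"page-%02d-payload!" % i for i in range(3)]
    with open(part_file, "wb") as f:
        for p in plain_pages:
            f.write(xor_cipher(p, b"old-key"))
    copied = copy_partition_with_reencryption(part_file, b"old-key", b"new-key")
    with open(part_file, "rb") as f:
        first_page = xor_cipher(f.read(PAGE_SIZE), b"new-key")
```

Because the whole file is rewritten sequentially and no per-page WAL records are produced, this shape is consistent with the expectation that copying should be faster than the in-place approach.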

(See more detailed comparison [5])

Re-encryption of existing data is a long and rare procedure (it is
recommended to change the key every 6 months, and at least once every
2 years). Thus, re-encryption can be implemented for maintenance mode
(for example, on a stable topology in a read-only cluster), and in
that case the approach with partition copying seems simpler and faster.

So, what do you think - do we need "online" re-encryption and which of
the proposed options is best suited for this?

[1] https://issues.apache.org/jira/browse/IGNITE-12186
[2] https://www.pcisecuritystandards.org/documents/PCI_DSS_v3-2-1.pdf
[3] https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots#IEP-43:Clustersnapshots-Partitionscopystrategy
[4] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign.
[5] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Comparison

Anton Vinogradov-2

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

+1 to "In place re-encryption".

- It has a simple design.
- On a cluster under load, regular updates alone may be enough to
re-encrypt the data (load-friendly).
- Easy to throttle.
- Easy to continue.
- Design compatible with the multi-key architecture.
- It can be optimized to use its own WAL buffer and to re-encrypt pages
without restoring them on-heap.

On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin <[hidden email]> wrote:

Alexey Goncharuk

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Pavel, Anton,

How do you see the whole key rotation procedure working? Clearly, during
the re-encryption there will exist pages encrypted with both new and old
keys at the same time. Will a node continue to re-encrypt the data after it
restarts? If a node goes down during the re-encryption, but the rest of the
cluster finishes re-encryption, will we consider the procedure complete? By
the way, is the encryption key for the data the same on all nodes in the
cluster?

On Thu, May 14, 2020 at 11:30, Anton Vinogradov <[hidden email]> wrote:

Pavel Pereslegin

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Hello, Alexey.

> is the encryption key for the data the same on all nodes in the cluster?
Yes, each encrypted cache group has its own encryption key, and the
key is the same on all nodes.

> Clearly, during the re-encryption there will exist pages
> encrypted with both new and old keys at the same time.
Yes, there will be pages encrypted with different keys at the same time.
Currently, we only store one key per cache group. To rotate a key,
it is necessary to support several keys for some period of time (at
least for reading the WAL).
For the "in place" strategy, we'll store the encryption key identifier
on each encrypted page (we currently have some unused space on each
encrypted page, so I don't expect any memory overhead here). Thus, we
will have several keys for reading and one key for writing. I assume
that the old key will be automatically deleted once the last WAL
segment that needs it is deleted (and re-encryption is finished).
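
The "several keys for reading, one key for writing" idea could look roughly like this. It is only a sketch with hypothetical names (`GroupKeyring`, `prune`); how Ignite would actually track which key ids are still referenced by WAL segments and pages is an assumption here and is passed in as plain sets.

```python
class GroupKeyring:
    """Hypothetical per-cache-group keyring: many read keys, one write key."""

    def __init__(self, key_id: int, key: bytes) -> None:
        self._keys = {key_id: key}
        self.active_id = key_id

    def rotate(self, new_id: int, new_key: bytes) -> None:
        """Add a new key and make it the only key used for writing."""
        if new_id in self._keys:
            raise ValueError(f"key id {new_id} already exists")
        self._keys[new_id] = new_key
        self.active_id = new_id

    def key_for_write(self) -> tuple:
        return self.active_id, self._keys[self.active_id]

    def key_for_read(self, key_id: int) -> bytes:
        # The id comes from the page (or WAL record) being decrypted.
        return self._keys[key_id]

    def prune(self, ids_in_wal: set, ids_in_pages: set) -> list:
        """Drop keys that no WAL segment or page references any more."""
        in_use = set(ids_in_wal) | set(ids_in_pages) | {self.active_id}
        removed = sorted(k for k in self._keys if k not in in_use)
        for k in removed:
            del self._keys[k]
        return removed


ring = GroupKeyring(1, b"key-1")
ring.rotate(2, b"key-2")
# Old WAL segments still reference key 1, so nothing can be removed yet.
kept = ring.prune(ids_in_wal={1, 2}, ids_in_pages={2})
# After those segments are deleted and re-encryption is finished, key 1 goes.
dropped = ring.prune(ids_in_wal={2}, ids_in_pages={2})
```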

> Will a node continue to re-encrypt the data after it restarts?
Yes.

> If a node goes down during the re-encryption, but the rest of the
> cluster finishes re-encryption, will we consider the procedure complete?
I'm not sure, but it looks like the key rotation is complete when we
set the new key on all nodes so that the updates will be encrypted
with the new key (as required by PCI DSS).
The status of re-encryption can be obtained separately (locally or
cluster-wide).

I forgot to mention that with "in place" re-encryption it will be
impossible to quickly cancel re-encryption, because canceling would
mean re-encrypting everything back with the old key.

> How do you see the whole key rotation procedure working?
The initial design for re-encryption with "partition copying" is
described here [1]. I'll prepare a detailed design for "in place"
re-encryption if we go this way. In short: the new encryption key is
sent cluster-wide, then each node adds the new key and starts
background re-encryption.

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign.
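
The answers above separate two notions: key rotation (the new key is set on every node) and re-encryption (no page is left on the old key). A small Python sketch of that distinction, using a hypothetical `Node` class and no real cluster messaging, might be:

```python
class Node:
    """Hypothetical node: tracks its write key and the key id stored on each page."""

    def __init__(self, name: str, pages: int) -> None:
        self.name = name
        self.write_key_id = 1
        self.page_key_ids = [1] * pages  # persisted with the pages, survives restart

    def on_new_key(self, key_id: int) -> None:
        # From now on every update is encrypted with the new key.
        self.write_key_id = key_id

    def reencrypt_step(self, max_pages: int) -> int:
        """One background step; it can resume after restart from page_key_ids."""
        done = 0
        for i, key_id in enumerate(self.page_key_ids):
            if done == max_pages:
                break
            if key_id != self.write_key_id:
                self.page_key_ids[i] = self.write_key_id
                done += 1
        return done

    def reencryption_progress(self) -> float:
        if not self.page_key_ids:
            return 1.0
        current = sum(k == self.write_key_id for k in self.page_key_ids)
        return current / len(self.page_key_ids)


def rotation_complete(nodes: list, key_id: int) -> bool:
    return all(n.write_key_id == key_id for n in nodes)


def reencryption_complete(nodes: list) -> bool:
    return all(n.reencryption_progress() == 1.0 for n in nodes)


cluster = [Node("a", pages=4), Node("b", pages=2)]
for node in cluster:                 # "send the new encryption key cluster-wide"
    node.on_new_key(2)
rotated = rotation_complete(cluster, 2)           # rotation is done at this point
finished_early = reencryption_complete(cluster)   # re-encryption is not
while any(node.reencrypt_step(max_pages=2) for node in cluster):
    pass                             # background work, reported separately
finished = reencryption_complete(cluster)
```

Deriving progress from the key ids stored on the pages is what lets a restarted node simply continue, and it also shows why cancelling is not cheap: pages already moved to the new key would have to be rewritten again.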

On Sun, May 17, 2020 at 18:35, Alexey Goncharuk <[hidden email]> wrote:

Ivan Rakov

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Folks,

Just keeping you informed: my colleagues and I are highly interested in TDE
in general and key rotation specifically, but we haven't had enough time
so far.
We'll dive into this feature and participate in reviews next month.

--
Best Regards,
Ivan Rakov

On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin <[hidden email]> wrote:

Alexei Scherbakov

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Pavel Pereslegin,

I see another opportunity.
We can use rebalancing to re-encrypt node data with a new key.
It seems like a trivial procedure to me: stop a node, clear its database,
change the key, start the node and wait for rebalancing to complete.
Data will be re-encrypted during rebalancing.

Did I miss something?

On Fri, May 22, 2020 at 16:14, Ivan Rakov <[hidden email]> wrote:

--

Best regards,
Alexei Scherbakov

Nikolay Izhikov-2

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Hello, Alexei.

I think we want to implement this feature without node restarts.
In the ideal scenario, all nodes will stay alive and respond to user requests.

> On May 24, 2020, at 15:24, Alexei Scherbakov <[hidden email]> wrote:
>> DSS
>>>>>> at least) require rotation of all encryption keys [2]. Currently,
>>>>>> encryption occurs when reading/writing pages to disk, cache
>>> encryption
>>>>>> keys are stored in metastore.
>>>>>>
>>>>>> I'm going to contribute cache encryption key rotation and want to
>>>>>> consult what is the best way to re-encrypting existing data, I see
>>> two
>>>>>> different strategies.
>>>>>>
>>>>>> 1. In place re-encryption:
>>>>>> Using the old key, sequentially read all the pages from the
>>> datastore,
>>>>>> mark as dirty and log them into the WAL. After checkpoint pages
>> will
>>>>>> be stored to disk encrypted with the new key (as usual, along with
>>>>>> updates). This strategy requires store the identifier (number) of
>> the
>>>>>> encryption key into the encrypted page.
>>>>>> pros:
>>>>>> - can work in the background with minimal performance impact
>> (this
>>>>>> impact can be managed).
>>>>>> cons:
>>>>>> - page duplication in the WAL may affect performance and
>> historical
>>>>>> rebalance.
>>>>>>
>>>>>> 2. Copy partition with re-encryption.
>>>>>> This strategy is similar to partition snapshotting [3] - create
>>>>>> partition copy encrypted with the new key and then replace the
>>>>>> original partition file with the new one (see details [4]).
>>>>>> pros:
>>>>>> - should work faster than "in place" re-encryption.
>>>>>> cons:
>>>>>> - re-encryption in active cluster (and on unstable topology) can
>> be
>>>>>> difficult to implement.
>>>>>>
>>>>>> (See more detailed comparison [5])
>>>>>>
>>>>>> Re-encryption of existing data is a long and rare procedure (It is
>>>>>> recommended to change the key every 6 months, but at least once
>> every
>>>>>> 2 years). Thus, re-encryption can be implemented for maintenance
>> mode
>>>>>> (for example, on a stable topology in a read-only cluster) and in
>>> such
>>>>>> case the approach with partition copying seems simpler and faster.
>>>>>>
>>>>>> So, what do you think - do we need "online" re-encryption and which
>>> of
>>>>>> the proposed options is best suited for this?
>>>>>>
>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-12186
>>>>>> [2]
>>> https://www.pcisecuritystandards.org/documents/PCI_DSS_v3-2-1.pdf
>>>>>> [3]
>>>>>>
>>>>>
>>>
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots#IEP-43:Clustersnapshots-Partitionscopystrategy
>>>>>> [4]
>>>>>>
>>>>>
>>>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>>>>>> .
>>>>>> [5]
>>>>>>
>>>>>
>>>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Comparison
>>>>>>
>>>>>
>>>
>>
>
>
> --
>
> Best regards,
> Alexei Scherbakov

Alexei Scherbakov

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Nikolay,

Can you explain why such a restriction is necessary?
Most likely, having the currently re-encrypting node serve only demand
requests will have the least performance impact on the grid.

On Mon, May 25, 2020 at 11:08, Nikolay Izhikov <[hidden email]> wrote:

> Hello, Alexei.
>
> I think we want to implement this feature without nodes restart.
> In the ideal scenario all nodes will stay alive and respond to the user
> requests.
>

--

Best regards,
Alexei Scherbakov

Alexei Scherbakov

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

I mean: serving supply requests.

On Mon, May 25, 2020 at 11:15, Alexei Scherbakov <[hidden email]> wrote:

> Nikolay,
>
> Can you explain why such a restriction is necessary?
> Most likely, having the currently re-encrypting node serve only demand
> requests will have the least performance impact on the grid.
>
> --
>
> Best regards,
> Alexei Scherbakov
>

--

Best regards,
Alexei Scherbakov

Alexei Scherbakov

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

And this approach is definitely much simpler to implement, because all
corner cases are handled by the rebalancing code.

On Mon, May 25, 2020 at 11:16, Alexei Scherbakov <[hidden email]> wrote:

> I mean: serving supply requests.
>
> --
>
> Best regards,
> Alexei Scherbakov
>

--

Best regards,
Alexei Scherbakov
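
The rebalancing-based procedure proposed above (stop a node, clear its database, change the key, start the node, wait for rebalancing) can be sketched as a toy simulation. This is only an illustration under stated assumptions: `Node`, `rebalance`, `rolling_reencrypt` and `make_cluster` are hypothetical names, not Ignite APIs, and "encryption" is modelled by tagging each partition with a key id rather than by real cryptography.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    key_id: int                                  # id of the cache key this node writes with
    store: dict = field(default_factory=dict)    # partition -> (key_id, rows)
    online: bool = True


def rebalance(node, peers, owned_partitions):
    """Refill an emptied node from online peers; data is written under the node's own key."""
    for part in owned_partitions:
        supplier = next((p for p in peers if p.online and part in p.store), None)
        if supplier is None:
            raise RuntimeError(f"no online copy of partition {part}")
        _, rows = supplier.store[part]                  # supplier reads with its own key
        node.store[part] = (node.key_id, dict(rows))    # demander stores under the new key


def rolling_reencrypt(cluster, new_key_id):
    """Re-encrypt one node at a time so that the remaining nodes can supply data."""
    for node in cluster:
        owned = list(node.store)
        node.online = False            # 1. stop the node
        node.store.clear()             # 2. clear its database
        node.key_id = new_key_id       # 3. change the key
        node.online = True             # 4. start the node
        peers = [p for p in cluster if p is not node]
        rebalance(node, peers, owned)  # 5. wait for rebalancing to complete


def make_cluster():
    """Three nodes, three partitions, each partition stored on a primary and one backup."""
    data = {0: {"a": 1}, 1: {"b": 2}, 2: {"c": 3}}
    nodes = [Node(f"n{i}", key_id=1) for i in range(3)]
    for part, rows in data.items():
        for owner in (nodes[part], nodes[(part + 1) % 3]):
            owner.store[part] = (1, dict(rows))
    return nodes


cluster = make_cluster()
rolling_reencrypt(cluster, new_key_id=2)
```

In this sketch the procedure only works when every partition has at least one other online copy, and each node is unavailable for user requests while it is being refilled, which is the trade-off raised in the replies around it.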

Nikolay Izhikov-2

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

In reply to this post by Alexei Scherbakov

> Can you explain why such a restriction is necessary?

Re-encryption should have minimal impact on the cluster.

> Most likely, having the currently re-encrypting node serve only demand requests will have the least performance impact on the grid.

The current design assumes that re-encryption will be started on all nodes simultaneously.

Makes sense?

> On May 25, 2020, at 11:16, Alexei Scherbakov <[hidden email]> wrote:
>
> I mean: serving supply requests.
>
>>>>>>> If a node goes down during the re-encryption, but the rest of the
>>>>>>> cluster finishes re-encryption, will we consider the procedure
>>>>> complete?
>>>>>> I'm not sure, but it looks like the key rotation is complete when we
>>>>>> set the new key on all nodes so that the updates will be encrypted
>>>>>> with the new key (as required by PCI DSS).
>>>>>> Status of re-encryption can be obtained separately (locally or cluster
>>>>>> wide).
>>>>>>
>>>>>> I forgot to mention that with “in place” re-encryption it will be
>>>>>> impossible to quickly cancel re-encryption, because by canceling we
>>>>>> mean re-encryption with the old key.
>>>>>>
>>>>>>> How do you see the whole key rotation procedure will work?
>>>>>> Initial design for re-encryption with "partition copying" is described
>>>>>> here [1]. I'll prepare detailed design for "in place" re-encryption if
>>>>>> we'll go this way. In short, send the new encryption key cluster-wide,
>>>>>> each node adds a new key and starts background re-encryption.
>>>>>>
>>>>>> [1]
>>>>>>
>>>>>
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>>>>>> .
>>>>>>
>>>>>> вс, 17 мая 2020 г. в 18:35, Alexey Goncharuk <
>>> [hidden email]
>>>>>> :
>>>>>>>
>>>>>>> Pavel, Anton,
>>>>>>>
>>>>>>> How do you see the whole key rotation procedure will work? Clearly,
>>>>>> during
>>>>>>> the re-encryption there will exist pages encrypted with both new and
>>>>> old
>>>>>>> keys at the same time. Will a node continue to re-encrypt the data
>>>>> after
>>>>>> it
>>>>>>> restarts? If a node goes down during the re-encryption, but the rest
>>> of
>>>>>> the
>>>>>>> cluster finishes re-encryption, will we consider the procedure
>>>>> complete?
>>>>>> By
>>>>>>> the way, is the encryption key for the data the same on all nodes in
>>>>> the
>>>>>>> cluster?
>>>>>>>
>>>>>>> чт, 14 мая 2020 г. в 11:30, Anton Vinogradov <[hidden email]>:
>>>>>>>
>>>>>>>> +1 to "In place re-encryption".
>>>>>>>>
>>>>>>>> - It has a simple design.
>>>>>>>> - Clusters under load may require just load to re-encrypt the data.
>>>>>>>> (Friendly to load).
>>>>>>>> - Easy to throttle.
>>>>>>>> - Easy to continue.
>>>>>>>> - Design compatible with the multi-key architecture.
>>>>>>>> - It can be optimized to use own WAL buffer and to re-encrypt pages
>>>>>> without
>>>>>>>> restoring them to on-heap.
>>>>>>>>
>>>>>>>> On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin <[hidden email]>
>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hello Igniters.
>>>>>>>>>
>>>>>>>>> Recently, master key rotation for Apache Ignite Transparent Data
>>>>>>>>> Encryption was implemented [1], but some security standards (PCI
>>>>> DSS
>>>>>>>>> at least) require rotation of all encryption keys [2]. Currently,
>>>>>>>>> encryption occurs when reading/writing pages to disk, cache
>>>>>> encryption
>>>>>>>>> keys are stored in metastore.
>>>>>>>>>
>>>>>>>>> I'm going to contribute cache encryption key rotation and want to
>>>>>>>>> consult what is the best way to re-encrypting existing data, I see
>>>>>> two
>>>>>>>>> different strategies.
>>>>>>>>>
>>>>>>>>> 1. In place re-encryption:
>>>>>>>>> Using the old key, sequentially read all the pages from the
>>>>>> datastore,
>>>>>>>>> mark as dirty and log them into the WAL. After checkpoint pages
>>>>> will
>>>>>>>>> be stored to disk encrypted with the new key (as usual, along with
>>>>>>>>> updates). This strategy requires store the identifier (number) of
>>>>> the
>>>>>>>>> encryption key into the encrypted page.
>>>>>>>>> pros:
>>>>>>>>> - can work in the background with minimal performance impact
>>>>> (this
>>>>>>>>> impact can be managed).
>>>>>>>>> cons:
>>>>>>>>> - page duplication in the WAL may affect performance and
>>>>> historical
>>>>>>>>> rebalance.
>>>>>>>>>
>>>>>>>>> 2. Copy partition with re-encryption.
>>>>>>>>> This strategy is similar to partition snapshotting [3] - create
>>>>>>>>> partition copy encrypted with the new key and then replace the
>>>>>>>>> original partition file with the new one (see details [4]).
>>>>>>>>> pros:
>>>>>>>>> - should work faster than "in place" re-encryption.
>>>>>>>>> cons:
>>>>>>>>> - re-encryption in active cluster (and on unstable topology) can
>>>>> be
>>>>>>>>> difficult to implement.
>>>>>>>>>
>>>>>>>>> (See more detailed comparison [5])
>>>>>>>>>
>>>>>>>>> Re-encryption of existing data is a long and rare procedure (It is
>>>>>>>>> recommended to change the key every 6 months, but at least once
>>>>> every
>>>>>>>>> 2 years). Thus, re-encryption can be implemented for maintenance
>>>>> mode
>>>>>>>>> (for example, on a stable topology in a read-only cluster) and in
>>>>>> such
>>>>>>>>> case the approach with partition copying seems simpler and faster.
>>>>>>>>>
>>>>>>>>> So, what do you think - do we need "online" re-encryption and which
>>>>>> of
>>>>>>>>> the proposed options is best suited for this?
>>>>>>>>>
>>>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-12186
>>>>>>>>> [2]
>>>>>> https://www.pcisecuritystandards.org/documents/PCI_DSS_v3-2-1.pdf
>>>>>>>>> [3]
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots#IEP-43:Clustersnapshots-Partitionscopystrategy
>>>>>>>>> [4]
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>>>>>>>>> .
>>>>>>>>> [5]
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Comparison
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Best regards,
>>>> Alexei Scherbakov
>>>
>>>
>>
>> --
>>
>> Best regards,
>> Alexei Scherbakov
>>
>
>
> --
>
> Best regards,
> Alexei Scherbakov

Nikolay Izhikov-2

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

In reply to this post by Alexei Scherbakov

> And definitely this approach is much simpler to implement

I agree.

If we allow nodes to be taken offline for re-encryption, then we can implement a fully offline procedure:

1. Stop the node.
2. Execute a control.sh command that re-encrypts all data without starting the node.
3. Start the node.
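
Step 2 of the offline procedure could look roughly like the sketch below. It is only an illustration under stated assumptions: the page layout (a random IV followed by the ciphertext), the AES/CTR mode, and all class and method names are hypothetical and are not taken from Ignite or from any existing control.sh command. Pages are modelled as in-memory byte arrays rather than partition files.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/** Hypothetical offline re-encryption of a stopped node's pages; not Ignite code. */
class OfflineReencryptSketch {
    /** IV length in bytes (one AES block). */
    static final int IV_LEN = 16;

    static final SecureRandom RND = new SecureRandom();

    /** Encrypts one page; the result is the IV followed by the ciphertext (assumed layout). */
    static byte[] encryptPage(byte[] plain, SecretKey key) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        RND.nextBytes(iv);

        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] body = c.doFinal(plain);

        byte[] out = new byte[IV_LEN + body.length];
        System.arraycopy(iv, 0, out, 0, IV_LEN);
        System.arraycopy(body, 0, out, IV_LEN, body.length);
        return out;
    }

    /** Decrypts a page produced by encryptPage. */
    static byte[] decryptPage(byte[] enc, SecretKey key) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(enc, 0, IV_LEN));
        return c.doFinal(enc, IV_LEN, enc.length - IV_LEN);
    }

    /** Rewrites every page under the new key while the node is stopped. */
    static void reencryptAll(byte[][] pages, SecretKey oldKey, SecretKey newKey)
        throws GeneralSecurityException {
        for (int i = 0; i < pages.length; i++)
            pages[i] = encryptPage(decryptPage(pages[i], oldKey), newKey);
    }

    /** Generates a fresh 128-bit AES key for the demonstration. */
    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator g = KeyGenerator.getInstance("AES");
        g.init(128);
        return g.generateKey();
    }
}
```

Because the node is stopped, no page can change during the pass, so the sketch needs neither WAL logging nor a per-page key identifier.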

Pavel, could you please summarize once more: what are the disadvantages of the offline procedure?


Alexei Scherbakov

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

For me, the one big disadvantage of offline re-encryption is the
possibility of running out of WAL history.
If re-encryption takes a long time, we will get full rebalancing with
partition eviction.
This brings us back to re-encryption via full rebalancing, which I
proposed earlier.
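
The concern can be shown with a small model. This is a simplified sketch, not Ignite's rebalancing logic: it assumes a single update counter per partition, and the class, enum, and parameter names are hypothetical.

```java
/** Simplified model of which rebalance a node needs after offline downtime. */
class WalHistorySketch {
    enum RebalanceMode { NONE, HISTORICAL, FULL }

    /**
     * @param nodeCntr      last update counter applied by the node before it was stopped
     * @param oldestWalCntr smallest update counter still kept in the supplier's WAL history
     * @param currentCntr   current update counter on the supplier
     */
    static RebalanceMode modeAfterDowntime(long nodeCntr, long oldestWalCntr, long currentCntr) {
        // Nothing was missed while the node was offline.
        if (nodeCntr >= currentCntr)
            return RebalanceMode.NONE;

        // Historical rebalance needs every update after nodeCntr to still be in the WAL.
        if (oldestWalCntr <= nodeCntr + 1)
            return RebalanceMode.HISTORICAL;

        // WAL history ran out: the partition is evicted and reloaded in full.
        return RebalanceMode.FULL;
    }
}
```

The longer offline re-encryption runs, the further oldestWalCntr moves past nodeCntr, and the more likely the FULL branch becomes.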

--

Best regards,
Alexei Scherbakov

Nikolay Izhikov-2

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

> This will take us to re-encryption using full rebalancing

Rebalancing would require twice the effort for re-encryption:

1. Read and send the data from the supplier node.
2. Re-encrypt and write the data on the demander node.

instead of:

1. Read, re-encrypt, and write the data on the "demander" node.
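
The comparison can be written as a rough count of per-page node steps. This is a back-of-the-envelope sketch with hypothetical names; it counts each numbered step above once per page and ignores real costs such as network latency and checkpointing.

```java
/** Rough per-page step count for the two re-encryption routes. */
class ReencryptCostSketch {
    /** Full rebalancing: one step on the supplier plus one step on the demander per page. */
    static long rebalanceSteps(long pages) {
        long supplierReadAndSend = pages;      // step 1: read and send on the supplier
        long demanderEncryptAndWrite = pages;  // step 2: re-encrypt and write on the demander
        return supplierReadAndSend + demanderEncryptAndWrite;
    }

    /** Local re-encryption: a single read, re-encrypt, and write step per page. */
    static long localSteps(long pages) {
        return pages;
    }
}
```

For 1000 pages the model gives 2000 steps for rebalancing against 1000 for local re-encryption, which is the 2x factor mentioned above.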

> 25 мая 2020 г., в 11:46, Alexei Scherbakov <[hidden email]> написал(а):
>
> For me, the one big disadvantage for offline re-encryption is the
> possibility to run out of WAL history.
> If an re-encryption takes a long time we will get full rebalancing with
> partition eviction.
> This willl takes us to the re-encryption using full rebalancing, proposed
> by me earlier.
>
>
>
> пн, 25 мая 2020 г. в 11:27, Nikolay Izhikov <[hidden email]>:
>
>>> And definitely this approach is much simplier to implement
>>
>> I agree.
>>
>> If we allow to made nodes offline for reencryption then we can implement a
>> fully offline procedure:
>>
>> 1. Stop node.
>> 2. Execute some control.sh command that will reencrypt all data without
>> starting node
>> 3. Start node.
>>
>> Pavel, can you, please, write it one more time - what disadvantages in
>> offline procedure?
>>
>>> 25 мая 2020 г., в 11:20, Alexei Scherbakov <[hidden email]>
>> написал(а):
>>>
>>> And definitely this approach is much simplier to implement because all
>>> corner cases are handled by rebalancing code.
>>>
>>> пн, 25 мая 2020 г. в 11:16, Alexei Scherbakov <
>> [hidden email]
>>>> :
>>>
>>>> I mean: serving supply requests.
>>>>
>>>> пн, 25 мая 2020 г. в 11:15, Alexei Scherbakov <
>>>> [hidden email]>:
>>>>
>>>>> Nikolay,
>>>>>
>>>>> Can you explain why such restriction is necessary ?
>>>>> Most likely having a currently re-encrypting node serving only demand
>>>>> requests will have least preformance impact on a grid.
>>>>>
>>>>> пн, 25 мая 2020 г. в 11:08, Nikolay Izhikov <[hidden email]>:
>>>>>
>>>>>> Hello, Alexei.
>>>>>>
>>>>>> I think we want to implement this feature without nodes restart.
>>>>>> In the ideal scenario all nodes will stay alive and respond to the
>> user
>>>>>> requests.
>>>>>>
>>>>>>> 24 мая 2020 г., в 15:24, Alexei Scherbakov <
>>>>>> [hidden email]> написал(а):
>>>>>>>
>>>>>>> Pavel Pereslegin,
>>>>>>>
>>>>>>> I see another opportunity.
>>>>>>> We can use rebalancing to re-encrypt node data with a new key.
>>>>>>> It's a trivial procedure for me: stop a node, clear database, change
>> a
>>>>>> key,
>>>>>>> start node and wait for rebalancing to complete.
>>>>>>> Data will be re-encrypted during rebalancing.
>>>>>>>
>>>>>>> Did I miss something ?
>>>>>>>
>>>>>>> пт, 22 мая 2020 г. в 16:14, Ivan Rakov <[hidden email]>:
>>>>>>>
>>>>>>>> Folks,
>>>>>>>>
>>>>>>>> Just keeping you informed: I and my colleagues are highly interested
>>>>>> in TDE
>>>>>>>> in general and keys rotations specifically, but we don't have enough
>>>>>> time
>>>>>>>> so far.
>>>>>>>> We'll dive into this feature and participate in reviews next month.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards,
>>>>>>>> Ivan Rakov
>>>>>>>>
>>>>>>>> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin <[hidden email]> wrote:
>>>>>>>>
>>>>>>>>> Hello, Alexey.
>>>>>>>>>
>>>>>>>>>> is the encryption key for the data the same on all nodes in the cluster?
>>>>>>>>> Yes, each encrypted cache group has its own encryption key, the key is
>>>>>>>>> the same on all nodes.
>>>>>>>>>
>>>>>>>>>> Clearly, during the re-encryption there will exist pages
>>>>>>>>>> encrypted with both new and old keys at the same time.
>>>>>>>>> Yes, there will be pages encrypted with different keys at the same time.
>>>>>>>>> Currently, we only store one key for one cache group. To rotate a key,
>>>>>>>>> at a certain point in time it is necessary to support several keys (at
>>>>>>>>> least for reading the WAL).
>>>>>>>>> For the "in place" strategy, we'll store the encryption key identifier
>>>>>>>>> on each encrypted page (we currently have some unused space on
>>>>>>>>> encrypted page, so I don't expect any memory overhead here). Thus, we
>>>>>>>>> will have several keys for reading and one key for writing. I assume
>>>>>>>>> that the old key will be automatically deleted when a specific WAL
>>>>>>>>> segment is deleted (and re-encryption is finished).
>>>>>>>>>
>>>>>>>>>> Will a node continue to re-encrypt the data after it restarts?
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>>> If a node goes down during the re-encryption, but the rest of the
>>>>>>>>>> cluster finishes re-encryption, will we consider the procedure complete?
>>>>>>>>> I'm not sure, but it looks like the key rotation is complete when we
>>>>>>>>> set the new key on all nodes so that the updates will be encrypted
>>>>>>>>> with the new key (as required by PCI DSS).
>>>>>>>>> Status of re-encryption can be obtained separately (locally or
>>>>>>>>> cluster wide).
>>>>>>>>>
>>>>>>>>> I forgot to mention that with “in place” re-encryption it will be
>>>>>>>>> impossible to quickly cancel re-encryption, because by canceling we
>>>>>>>>> mean re-encryption with the old key.
>>>>>>>>>
>>>>>>>>>> How do you see the whole key rotation procedure will work?
>>>>>>>>> Initial design for re-encryption with "partition copying" is described
>>>>>>>>> here [1]. I'll prepare detailed design for "in place" re-encryption if
>>>>>>>>> we'll go this way. In short, send the new encryption key cluster-wide,
>>>>>>>>> each node adds a new key and starts background re-encryption.
>>>>>>>>>
>>>>>>>>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>>>>>>>>>
>>>>>>>>> On Sun, May 17, 2020 at 18:35, Alexey Goncharuk <[hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>> Pavel, Anton,
>>>>>>>>>>
>>>>>>>>>> How do you see the whole key rotation procedure will work? Clearly, during
>>>>>>>>>> the re-encryption there will exist pages encrypted with both new and old
>>>>>>>>>> keys at the same time. Will a node continue to re-encrypt the data after it
>>>>>>>>>> restarts? If a node goes down during the re-encryption, but the rest of the
>>>>>>>>>> cluster finishes re-encryption, will we consider the procedure complete? By
>>>>>>>>>> the way, is the encryption key for the data the same on all nodes in the
>>>>>>>>>> cluster?
>>>>>>>>>>
>>>>>>>>>> On Thu, May 14, 2020 at 11:30, Anton Vinogradov <[hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 to "In place re-encryption".
>>>>>>>>>>>
>>>>>>>>>>> - It has a simple design.
>>>>>>>>>>> - Clusters under load may require just load to re-encrypt the data.
>>>>>>>>>>> (Friendly to load).
>>>>>>>>>>> - Easy to throttle.
>>>>>>>>>>> - Easy to continue.
>>>>>>>>>>> - Design compatible with the multi-key architecture.
>>>>>>>>>>> - It can be optimized to use own WAL buffer and to re-encrypt pages without
>>>>>>>>>>> restoring them to on-heap.
>>>>>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Alexei Scherbakov

Alexei Scherbakov

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

On Mon, May 25, 2020 at 12:00, Nikolay Izhikov <[hidden email]> wrote:

> > This will take us to re-encryption using full rebalancing
>
> Rebalancing will require twice the effort for re-encryption:
>
> 1. Read and send data from the supplier node.
> 2. Re-encrypt and write data on the demander node.
>
> Instead of
>
> 1. Read, re-encrypt and write data on the "demander" node.
>

Usually, reading and sending is not a bottleneck. And don't forget that we can
run out of WAL history and fall back to full rebalancing with partition
eviction, which wastes all the effort spent on offline re-encryption.

On the other hand, for a grid with many nodes, one-by-one re-encryption
can take a long time.
It should also be possible to re-encrypt all data as fast as possible, for
example when the load can be switched to another grid; this is where offline
re-encryption will come in handy.

So, I suggest implementing offline re-encryption and online re-encryption
using rebalancing as a first step.

The next step can be online in-place re-encryption. It's important to measure
its business impact on an online grid.
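
The WAL-history concern above can be illustrated with a small toy model. This is
only a sketch under stated assumptions: PartitionState, its three counters and
rebalance_mode are hypothetical names for illustration, not Ignite APIs. It shows
why a long offline re-encryption can be wasted: once the updates a node missed
are no longer in the WAL, the partition is reloaded in full anyway.

```python
from dataclasses import dataclass


@dataclass
class PartitionState:
    """Hypothetical per-partition bookkeeping; not an Ignite class."""
    node_counter: int        # last update the offline node applied
    oldest_wal_counter: int  # oldest update still kept in the suppliers' WAL
    cluster_counter: int     # latest update applied in the cluster


def rebalance_mode(p: PartitionState) -> str:
    """Return how the partition would catch up after the node restarts."""
    if p.node_counter >= p.cluster_counter:
        return "none"        # nothing was missed while offline
    if p.node_counter + 1 >= p.oldest_wal_counter:
        return "historical"  # every missed update is still in the WAL
    return "full"            # partition is evicted and reloaded


def wasted_offline_work(parts: list) -> float:
    """Fraction of offline-re-encrypted partitions that get reloaded anyway."""
    if not parts:
        return 0.0
    full = sum(1 for p in parts if rebalance_mode(p) == "full")
    return full / len(parts)
```

In this model, the longer a node stays offline, the further oldest_wal_counter
moves past node_counter, and the larger the share of re-encrypted partitions
that is thrown away.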

>
>
> > On May 25, 2020, at 11:46, Alexei Scherbakov <[hidden email]> wrote:
> >
> > For me, the one big disadvantage of offline re-encryption is the
> > possibility of running out of WAL history.
> > If re-encryption takes a long time, we will get full rebalancing with
> > partition eviction.
> > This will take us to re-encryption using full rebalancing, which I
> > proposed earlier.
> >
> >
> >
> > On Mon, May 25, 2020 at 11:27, Nikolay Izhikov <[hidden email]> wrote:
> >
> >>> And definitely this approach is much simpler to implement
> >>
> >> I agree.
> >>
> >> If we allow nodes to be taken offline for re-encryption, then we can
> >> implement a fully offline procedure:
> >>
> >> 1. Stop the node.
> >> 2. Execute some control.sh command that will re-encrypt all data without
> >> starting the node.
> >> 3. Start the node.
> >>
> >> Pavel, can you please write one more time: what are the disadvantages of
> >> the offline procedure?
> >>
> >>> On May 25, 2020, at 11:20, Alexei Scherbakov <[hidden email]> wrote:
> >>>
> >>> And definitely this approach is much simpler to implement because all
> >>> corner cases are handled by the rebalancing code.
> >>>
> >>> On Mon, May 25, 2020 at 11:16, Alexei Scherbakov <[hidden email]> wrote:
> >>>
> >>>> I mean: serving supply requests.
> >>>>
> >>>> On Mon, May 25, 2020 at 11:15, Alexei Scherbakov <[hidden email]> wrote:
> >>>>
> >>>>> Nikolay,
> >>>>>
> >>>>> Can you explain why such a restriction is necessary?
> >>>>> Most likely, having a currently re-encrypting node serve only demand
> >>>>> requests will have the least performance impact on the grid.
> >>>>>
> >>>>> On Mon, May 25, 2020 at 11:08, Nikolay Izhikov <[hidden email]> wrote:
> >>>>>
> >>>>>> Hello, Alexei.
> >>>>>>
> >>>>>> I think we want to implement this feature without node restarts.
> >>>>>> In the ideal scenario, all nodes will stay alive and respond to
> >>>>>> user requests.

--

Best regards,
Alexei Scherbakov

Pavel Pereslegin

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Nikolay, Alexei,

thanks for your suggestions.

Offline re-encryption does not seem that simple: we need to read and replace
the existing encryption keys on all nodes (therefore, we should be
able to read/write the metastore and WAL and exchange data between the
baseline nodes). Re-encryption in maintenance mode (for example, in a
stable read-only cluster) will be simple, but it still looks very
inconvenient, if only because users will need to interrupt all
operations.

The main advantage of online "in place" re-encryption is that we'll
support multiple keys for reading, so key rotation itself does not
directly depend on background re-encryption.
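
A minimal sketch of the "several keys for reading, one key for writing" idea
follows. Everything in it is an assumption made for illustration: GroupKeys and
its methods are hypothetical names, the key identifier is stored as a single
leading byte, and _toy_cipher is an insecure XOR stand-in for the real page
cipher. It only shows how a key identifier on each page lets old and new pages
coexist after rotation.

```python
import hashlib


def _toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 keystream. A stand-in for a real cipher, NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class GroupKeys:
    """Hypothetical key set of one cache group: many keys to read, one to write."""

    def __init__(self, first_key: bytes):
        self.keys = {0: first_key}
        self.active_id = 0

    def rotate(self, new_key: bytes) -> int:
        """Add a key and make it the write key; older keys stay readable."""
        self.active_id = max(self.keys) + 1
        self.keys[self.active_id] = new_key
        return self.active_id

    def encrypt_page(self, plain: bytes) -> bytes:
        # The first byte records which key encrypted this page (ids < 256).
        return bytes([self.active_id]) + _toy_cipher(self.keys[self.active_id], plain)

    def decrypt_page(self, stored: bytes) -> bytes:
        key_id = stored[0]
        return _toy_cipher(self.keys[key_id], stored[1:])

    def drop_key(self, key_id: int) -> None:
        """Forget an old key once no page or WAL segment needs it."""
        if key_id == self.active_id:
            raise ValueError("cannot drop the write key")
        del self.keys[key_id]
```

With this layout, rotate() alone is enough for new updates to be written with
the new key; pages written earlier remain readable until they are re-encrypted
and the old key is dropped.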

So, the first step is similar to rotating the master key: once the new
key has been set for writing on all nodes, that's it, the cache group key
rotation is complete (this is what PCI DSS requires: encrypt new
updates with new keys).
The second step is to re-encrypt the existing data. As I said
previously, I thought about scanning all partition pages in some
background mode (storing progress on the metapage to continue after
restart), but the rebalance approach should also work here if I figure out
how to automate this process.
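
The background scan with progress stored on the metapage could look roughly
like the sketch below. It is a simplified model, not a design: Partition,
reencrypt_batch and the "reencrypt_next" meta field are hypothetical names, and
changing the key id of a page stands in for marking the page dirty so that the
next checkpoint writes it with the current key.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical partition: pages tagged with a key id, plus a meta page."""
    pages: list                               # each item is [key_id, payload]
    meta: dict = field(default_factory=dict)  # assumed to survive restarts


def reencrypt_batch(part: Partition, new_key_id: int, batch: int) -> bool:
    """Re-encrypt up to `batch` pages; return True when the scan is finished."""
    start = part.meta.get("reencrypt_next", 0)
    end = min(start + batch, len(part.pages))
    for idx in range(start, end):
        page = part.pages[idx]
        if page[0] != new_key_id:
            # Stand-in for "mark dirty, so the checkpoint rewrites the page
            # with the current write key".
            page[0] = new_key_id
    part.meta["reencrypt_next"] = end  # progress kept on the meta page
    return end == len(part.pages)
```

The batch size doubles as a simple throttle: a caller can sleep between batches
to limit the load, and after a restart the scan resumes from the stored index
instead of starting over.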

пн, 25 мая 2020 г. в 12:22, Alexei Scherbakov <[hidden email]>:

>
>
>
> пн, 25 мая 2020 г. в 12:00, Nikolay Izhikov <[hidden email]>:
>>
>> > This willl takes us to the re-encryption using full rebalancing
>>
>> Rebalance will require 2x efforts for reencryption
>>
>> 1. Read and send data from supplier node.
>> 2. Reencrypt and write data on demander node.
>>
>> Instead of
>>
>> 1. Read, reencrypt and write data on «demander» node.
>
>
> Usually, reading and sending is not a bottleneck. And don't forget we can run out of WAL history and fall back to full rebalancing with partition eviction eliminating all efforts from offline re-encryption.
>
> On the other side, for a grid having many nodes one-by-one re-encryption can take a long time.
> It should also be possible to re-encrypt all data as fast as possible if, for example, if a load can be switched to another grid, where offline encryption will come in handy.
>
> So, I suggest to implement offline re-encryption and online re-encryption using rebalancing as a first step.
>
> Next step can be online in-place re-encryption. It's important to measure business impact from it on online grid.
>
>>
>>
>>
>> > 25 мая 2020 г., в 11:46, Alexei Scherbakov <[hidden email]> написал(а):
>> >
>> > For me, the one big disadvantage for offline re-encryption is the
>> > possibility to run out of WAL history.
>> > If an re-encryption takes a long time we will get full rebalancing with
>> > partition eviction.
>> > This willl takes us to the re-encryption using full rebalancing, proposed
>> > by me earlier.
>> >
>> >
>> >
>> > пн, 25 мая 2020 г. в 11:27, Nikolay Izhikov <[hidden email]>:
>> >
>> >>> And definitely this approach is much simplier to implement
>> >>
>> >> I agree.
>> >>
>> >> If we allow to made nodes offline for reencryption then we can implement a
>> >> fully offline procedure:
>> >>
>> >> 1. Stop node.
>> >> 2. Execute some control.sh command that will reencrypt all data without
>> >> starting node
>> >> 3. Start node.
>> >>
>> >> Pavel, can you, please, write it one more time - what disadvantages in
>> >> offline procedure?
>> >>
>> >>> 25 мая 2020 г., в 11:20, Alexei Scherbakov <[hidden email]>
>> >> написал(а):
>> >>>
>> >>> And definitely this approach is much simplier to implement because all
>> >>> corner cases are handled by rebalancing code.
>> >>>
>> >>> пн, 25 мая 2020 г. в 11:16, Alexei Scherbakov <
>> >> [hidden email]
>> >>>> :
>> >>>
>> >>>> I mean: serving supply requests.
>> >>>>
>> >>>> пн, 25 мая 2020 г. в 11:15, Alexei Scherbakov <
>> >>>> [hidden email]>:
>> >>>>
>> >>>>> Nikolay,
>> >>>>>
>> >>>>> Can you explain why such restriction is necessary ?
>> >>>>> Most likely having a currently re-encrypting node serving only demand
>> >>>>> requests will have least preformance impact on a grid.
>> >>>>>
>> >>>>> пн, 25 мая 2020 г. в 11:08, Nikolay Izhikov <[hidden email]>:
>> >>>>>
>> >>>>>> Hello, Alexei.
>> >>>>>>
>> >>>>>> I think we want to implement this feature without nodes restart.
>> >>>>>> In the ideal scenario all nodes will stay alive and respond to the
>> >> user
>> >>>>>> requests.
>> >>>>>>
>> >>>>>>> 24 мая 2020 г., в 15:24, Alexei Scherbakov <
>> >>>>>> [hidden email]> написал(а):
>> >>>>>>>
>> >>>>>>> Pavel Pereslegin,
>> >>>>>>>
>> >>>>>>> I see another opportunity.
>> >>>>>>> We can use rebalancing to re-encrypt node data with a new key.
>> >>>>>>> It's a trivial procedure for me: stop a node, clear database, change
>> >> a
>> >>>>>> key,
>> >>>>>>> start node and wait for rebalancing to complete.
>> >>>>>>> Data will be re-encrypted during rebalancing.
>> >>>>>>>
>> >>>>>>> Did I miss something ?
>> >>>>>>>
>> >>>>>>> пт, 22 мая 2020 г. в 16:14, Ivan Rakov <[hidden email]>:
>> >>>>>>>
>> >>>>>>>> Folks,
>> >>>>>>>>
>> >>>>>>>> Just keeping you informed: I and my colleagues are highly interested
>> >>>>>> in TDE
>> >>>>>>>> in general and keys rotations specifically, but we don't have enough
>> >>>>>> time
>> >>>>>>>> so far.
>> >>>>>>>> We'll dive into this feature and participate in reviews next month.
>> >>>>>>>>
>> >>>>>>>> --
>> >>>>>>>> Best Regards,
>> >>>>>>>> Ivan Rakov
>> >>>>>>>>
>> >>>>>>>> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin <[hidden email]
>> >>>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hello, Alexey.
>> >>>>>>>>>
>> >>>>>>>>>> is the encryption key for the data the same on all nodes in the
>> >>>>>>>> cluster?
>> >>>>>>>>> Yes, each encrypted cache group has its own encryption key, the key
>> >>>>>> is
>> >>>>>>>>> the same on all nodes.
>> >>>>>>>>>
>> >>>>>>>>>> Clearly, during the re-encryption there will exist pages
>> >>>>>>>>>> encrypted with both new and old keys at the same time.
>> >>>>>>>>> Yes, there will be pages encrypted with different keys at the same
>> >>>>>> time.
>> >>>>>>>>> Currently, we only store one key for one cache group. To rotate a
>> >>>>>> key,
>> >>>>>>>>> at a certain point in time it is necessary to support several keys
>> >>>>>> (at
>> >>>>>>>>> least for reading the WAL).
>> >>>>>>>>> For the "in place" strategy, we'll store the encryption key
>> >>>>>> identifier
>> >>>>>>>>> on each encrypted page (we currently have some unused space on
>> >>>>>>>>> encrypted page, so I don't expect any memory overhead here). Thus,
>> >> we
>> >>>>>>>>> will have several keys for reading and one key for writing. I
>> >> assume
>> >>>>>>>>> that the old key will be automatically deleted when a specific WAL
>> >>>>>>>>> segment is deleted (and re-encryption is finished).
>> >>>>>>>>>
>> >>>>>>>>>> Will a node continue to re-encrypt the data after it restarts?
>> >>>>>>>>> Yes.
>> >>>>>>>>>
>> >>>>>>>>>> If a node goes down during the re-encryption, but the rest of the
>> >>>>>>>>>> cluster finishes re-encryption, will we consider the procedure
>> >>>>>>>> complete?
>> >>>>>>>>> I'm not sure, but it looks like the key rotation is complete when
>> >> we
>> >>>>>>>>> set the new key on all nodes so that the updates will be encrypted
>> >>>>>>>>> with the new key (as required by PCI DSS).
>> >>>>>>>>> Status of re-encryption can be obtained separately (locally or
>> >>>>>> cluster
>> >>>>>>>>> wide).
>> >>>>>>>>>
>> >>>>>>>>> I forgot to mention that with “in place” re-encryption it will be
>> >>>>>>>>> impossible to quickly cancel re-encryption, because by canceling we
>> >>>>>>>>> mean re-encryption with the old key.
>> >>>>>>>>>
>> >>>>>>>>>> How do you see the whole key rotation procedure will work?
>> >>>>>>>>> Initial design for re-encryption with "partition copying" is
>> >>>>>> described
>> >>>>>>>>> here [1]. I'll prepare detailed design for "in place" re-encryption
>> >>>>>> if
>> >>>>>>>>> we'll go this way. In short, send the new encryption key
>> >>>>>> cluster-wide,
>> >>>>>>>>> each node adds a new key and starts background re-encryption.
>> >>>>>>>>>
>> >>>>>>>>> [1]
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>> >>>>>>>>> .
>> >>>>>>>>>
>> >>>>>>>>> вс, 17 мая 2020 г. в 18:35, Alexey Goncharuk <
>> >>>>>> [hidden email]
>> >>>>>>>>> :
>> >>>>>>>>>>
>> >>>>>>>>>> Pavel, Anton,
>> >>>>>>>>>>
>> >>>>>>>>>> How do you see the whole key rotation procedure will work?
>> >> Clearly,
>> >>>>>>>>> during
>> >>>>>>>>>> the re-encryption there will exist pages encrypted with both new
>> >> and
>> >>>>>>>> old
>> >>>>>>>>>> keys at the same time. Will a node continue to re-encrypt the data
>> >>>>>>>> after
>> >>>>>>>>> it
>> >>>>>>>>>> restarts? If a node goes down during the re-encryption, but the
>> >>>>>> rest of
>> >>>>>>>>> the
>> >>>>>>>>>> cluster finishes re-encryption, will we consider the procedure
>> >>>>>>>> complete?
>> >>>>>>>>> By
>> >>>>>>>>>> the way, is the encryption key for the data the same on all nodes
>> >> in
>> >>>>>>>> the
>> >>>>>>>>>> cluster?
>> >>>>>>>>>>
>> >>>>>>>>>> чт, 14 мая 2020 г. в 11:30, Anton Vinogradov <[hidden email]>:
>> >>>>>>>>>>
>> >>>>>>>>>>> +1 to "In place re-encryption".
>> >>>>>>>>>>>
>> >>>>>>>>>>> - It has a simple design.
>> >>>>>>>>>>> - Clusters under load may require just load to re-encrypt the
>> >> data.
>> >>>>>>>>>>> (Friendly to load).
>> >>>>>>>>>>> - Easy to throttle.
>> >>>>>>>>>>> - Easy to continue.
>> >>>>>>>>>>> - Design compatible with the multi-key architecture.
>> >>>>>>>>>>> - It can be optimized to use own WAL buffer and to re-encrypt
>> >> pages
>> >>>>>>>>> without
>> >>>>>>>>>>> restoring them to on-heap.
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin <
>> >> [hidden email]
>> >>>>>>>
>> >>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Hello Igniters.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Recently, master key rotation for Apache Ignite Transparent Data
>> >>>>>>>>>>>> Encryption was implemented [1], but some security standards (PCI
>> >>>>>>>> DSS
>> >>>>>>>>>>>> at least) require rotation of all encryption keys [2].
>> >> Currently,
>> >>>>>>>>>>>> encryption occurs when reading/writing pages to disk, cache
>> >>>>>>>>> encryption
>> >>>>>>>>>>>> keys are stored in metastore.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I'm going to contribute cache encryption key rotation and want
>> >> to

Maksim Stepachev

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Hi!

Do you have any updates on this issue? Which implementation approach
have you chosen (in-place, offline, or in the background)? I know that we
want to add a partition defragmentation function; we could add a hook there
to integrate the re-encryption scheme. Could you update your IEP with your
plans?

Mon, 25 May 2020 at 12:50, Pavel Pereslegin <[hidden email]>:

> Nikolay, Alexei,
>
> thanks for your suggestions.
>
> Offline re-encryption does not seem so simple: we need to read/replace
> the existing encryption keys on all nodes (therefore, we should be
> able to read/write the metastore/WAL and exchange data between the
> baseline nodes). Re-encryption in maintenance mode (for example, in a
> stable read-only cluster) would be simple, but it still looks very
> inconvenient, at least because users would need to interrupt all
> operations.
>
> The main advantage of online "in place" re-encryption is that we'll
> support multiple keys for reading, and this procedure does not
> directly depend on background re-encryption.
>
> So, the first step is similar to rotating the master key: once the new
> key is set for writing on all nodes, the cache group key rotation is
> complete (this is what PCI DSS requires: encrypt new updates with new
> keys).
> The second step is to re-encrypt the existing data. As I said
> previously, I thought about scanning all partition pages in some
> background mode (storing progress on the meta page to continue after
> restart), but the rebalance approach should also work here if I figure
> out how to automate this process.

Pavel Pereslegin

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Hello, Maksim.

For the implementation, I chose the so-called "in-place background
re-encryption" design.

The first step is to rotate the key used for writing data; at the moment
this only works on an active cluster.
The second step is re-encryption of the existing data (so that the
previous encryption key can be removed).
If a node is restarted, re-encryption resumes once the metastorage becomes
ready for read/write. Each partition being re-encrypted (including the
index partition) has an attribute on its meta page that indicates whether
background re-encryption should be continued.
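
To make the two steps concrete, here is a minimal sketch of the idea. It is not the code from the draft PR. All names in it (ReencryptionSketch, KeyRing, Page, Partition, reencryptBatch, reencryptFrom) are hypothetical, the in-memory list stands in for a partition file, the resume index stands in for the meta page attribute, and AES/GCM is used only to keep the example short; none of this is claimed to match Ignite's actual cipher or page layout.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative only: these names are hypothetical, not Ignite classes. */
class ReencryptionSketch {
    /** Encrypted page tagged with the id of the key that encrypted it. */
    static final class Page {
        final int keyId;
        final byte[] iv;
        final byte[] data;

        Page(int keyId, byte[] iv, byte[] data) {
            this.keyId = keyId;
            this.iv = iv;
            this.data = data;
        }
    }

    /** Several keys for reading, one (the most recently added) for writing. */
    static final class KeyRing {
        private final Map<Integer, SecretKeySpec> keys = new HashMap<>();
        private final SecureRandom rnd = new SecureRandom();
        private int activeId = -1;

        /** Step 1: rotation. New writes use the new key from now on. */
        void rotate(int keyId, byte[] rawKey) {
            keys.put(keyId, new SecretKeySpec(rawKey, "AES"));
            activeId = keyId;
        }

        /** Drops an old key once no page references it any more. */
        void remove(int keyId) {
            if (keyId == activeId)
                throw new IllegalArgumentException("Cannot remove the write key");
            keys.remove(keyId);
        }

        Page encrypt(byte[] plain) {
            byte[] iv = new byte[12];
            rnd.nextBytes(iv);
            return new Page(activeId, iv, crypt(Cipher.ENCRYPT_MODE, activeId, iv, plain));
        }

        /** Picks the decryption key by the id stored in the page. */
        byte[] decrypt(Page page) {
            return crypt(Cipher.DECRYPT_MODE, page.keyId, page.iv, page.data);
        }

        private byte[] crypt(int mode, int keyId, byte[] iv, byte[] input) {
            SecretKeySpec key = keys.get(keyId);
            if (key == null)
                throw new IllegalStateException("Unknown key id: " + keyId);
            try {
                Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
                cipher.init(mode, key, new GCMParameterSpec(128, iv));
                return cipher.doFinal(input);
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** Partition pages plus a resume marker (assumed to live on the meta page). */
    static final class Partition {
        final List<Page> pages = new ArrayList<>();
        int reencryptFrom; // index of the next page to check
    }

    /** Step 2: re-encrypts at most batch pages; returns true when the scan is done. */
    static boolean reencryptBatch(Partition part, KeyRing ring, int batch) {
        int end = Math.min(part.pages.size(), part.reencryptFrom + batch);
        for (int i = part.reencryptFrom; i < end; i++) {
            Page page = part.pages.get(i);
            if (page.keyId != ring.activeId)
                part.pages.set(i, ring.encrypt(ring.decrypt(page)));
        }
        part.reencryptFrom = end;
        return end == part.pages.size();
    }

    public static void main(String[] args) {
        KeyRing ring = new KeyRing();
        ring.rotate(0, new byte[16]); // all-zero demo key, for illustration only
        Partition part = new Partition();
        for (int i = 0; i < 5; i++)
            part.pages.add(ring.encrypt(("row-" + i).getBytes(StandardCharsets.UTF_8)));

        ring.rotate(1, "0123456789abcdef".getBytes(StandardCharsets.UTF_8));
        while (!reencryptBatch(part, ring, 2)) {
            // a real implementation would throttle here
        }
        ring.remove(0);
        System.out.println(new String(ring.decrypt(part.pages.get(3)), StandardCharsets.UTF_8));
    }
}
```

In this sketch the batch size plays the role of throttling, and because the resume marker only moves forward after a batch completes, a restart would at worst re-check a few pages that already carry the new key id. The old key is removed only after the scan has finished.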

I updated the description in the wiki [1].
More details are in Jira [2].
A draft PR is available at [3].

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384
[2] https://issues.apache.org/jira/browse/IGNITE-12843
[3] https://github.com/apache/ignite/pull/7941

Tue, 7 Jul 2020 at 13:49, Maksim Stepachev <[hidden email]>:

>
> Hi!
>
> Do you have any updates on this issue? Which implementation approach
> have you chosen (in-place, offline, or in the background)? I know that we
> want to add a partition defragmentation function; we could add a hook there
> to integrate the re-encryption scheme. Could you update your IEP with your
> plans?
> > >> >>>
> > >> >>> Best regards,
> > >> >>> Alexei Scherbakov
> > >> >>
> > >> >>
> > >> >
> > >> > --
> > >> >
> > >> > Best regards,
> > >> > Alexei Scherbakov
> > >>
> > >
> > >
> > > --
> > >
> > > Best regards,
> > > Alexei Scherbakov
> >

Maksim Stepachev

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Hello everyone. Yesterday we discussed the implementation of TDE on a
conference call. Here is a summary of that call:

1. The wiki documentation should be expanded. It should describe, step
by step, how the feature works under the hood. What are the domain
objects in the implementation?
2. We should try to run the existing test suites in encryption mode.
Encryption should not affect any PDS or other tests.
3. The SPI needs an additional method such as getKeyDigest, because the
current implementation of GridEncryptionManager#masterKeyDigest() looks
strange: we reset the master key in order to calculate the digest. This
will not work well if we want to use VOLT as a key provider
implementation.
4. Recommendation - the encryption processor should be split into
separate classes using ordinary OOP decomposition. Right now this class
has more than 2000 lines and does not follow the SOLID principles; it
effectively inlines unrelated logic into a single class.
5. Recommendation - we should not use tuples and triples, because they
are a marker of a design problem.
6. Strict recommendation - please don't pass the context everywhere. It
should only be used in the parent class. You can pass the necessary
dependencies through the constructor, as in the DI pattern.
7. Question - the current implementation does not use the throttling
that is implemented in PDS. Users should set a throughput limit, such as
5 MB per second, rather than a timeout, packet size, or stream size.
8. Question - why do we add so many system properties? Why didn't we add
configuration for them?
9. Question - how can we optimize for the case where we can check that a
page has already been re-encrypted by parallel loading? Maybe we should
do this in Phase 4?
10. Question - the CRC is read in two places, encryptionFileIO and
filePageStore. What should we do about this?
11. We should remember complicated failover test scenarios, such as a
node that left when encryption started and rejoined after it finished,
or a baseline change or node leaving before, after, or in the middle of
the process, and so on.
12. How do we use a sandbox to protect the cluster from theft of master
and user keys via compute?
13. Will re-encryption continue after the cluster is completely stopped?
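
To make point 7 concrete, here is a minimal sketch of pacing a background
scan by a single throughput setting (bytes per second) instead of a
timeout, packet size, or stream size. The class ReencryptionThrottle and
its method pauseNanos are hypothetical names used only for illustration;
this is not the throttling that exists in PDS and not an Ignite API.

```java
/** Hypothetical helper, not part of Ignite: paces a background scan to a byte rate. */
final class ReencryptionThrottle {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    /** The only user-facing setting: allowed average throughput. */
    private final long bytesPerSecond;

    /** Total bytes processed since the scan started. */
    private long processedBytes;

    ReencryptionThrottle(long bytesPerSecond) {
        if (bytesPerSecond <= 0)
            throw new IllegalArgumentException("bytesPerSecond must be positive");

        this.bytesPerSecond = bytesPerSecond;
    }

    /**
     * Records a processed batch and returns how long the caller should pause
     * so that the average throughput since the start stays within the limit.
     *
     * @param bytes Size of the batch that was just processed.
     * @param elapsedNanos Time since the scan started.
     * @return Pause in nanoseconds, or 0 if the scan is not ahead of the limit.
     */
    long pauseNanos(long bytes, long elapsedNanos) {
        processedBytes += bytes;

        // Earliest elapsed time at which this many bytes are allowed.
        // Whole seconds and the remainder are computed separately to avoid overflow.
        long minElapsed = processedBytes / bytesPerSecond * NANOS_PER_SECOND
            + processedBytes % bytesPerSecond * NANOS_PER_SECOND / bytesPerSecond;

        return Math.max(0L, minElapsed - elapsedNanos);
    }
}
```

A scanning thread would call pauseNanos after each batch of pages and pass
the result to java.util.concurrent.locks.LockSupport.parkNanos. With a
limit of 5 MB per second, a caller that has processed 5 MB after half a
second is asked to pause for the remaining half second.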

If I forgot some points, you can add them to the message.

On Tue, Jul 7, 2020 at 17:40, Pavel Pereslegin <[hidden email]> wrote:

> Hello, Maksim.
>
> For implementation, I chose so-called "in place background
> re-encryption" design.
>
> The first step is to rotate the key for writing data; at the moment it
> works only on an active cluster.
> The second step is re-encryption (to remove the previous encryption
> key). If a node was restarted, re-encryption starts after the
> metastorage becomes ready for read/write. Each "re-encrypted" partition
> (including the index) has an attribute on the meta page that indicates
> whether background re-encryption should be continued.
>
> I updated the description in wiki [1].
> Some more details in jira [2].
> Draft PR [3].
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384
> [2] https://issues.apache.org/jira/browse/IGNITE-12843
> [3] https://github.com/apache/ignite/pull/7941
>

Nikolay Izhikov-2

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Hello, Maksim.

Thanks for the summary.
From my point of view, we should focus on the Phase 3 implementation first and do the refactoring for specific SPI implementations afterwards.

> 8. Question - why we add a lot of system properties?

Can you please list the system properties that should be moved to the configuration?

> 10. Question - CRC is read in two places encryptionFileIO and filePageStore - what should we do with this?

filePageStore checks the CRC of the encrypted page. This is required to confirm that the page was not corrupted on disk.
encryptionFileIO checks the CRC of the decrypted page (the CRC itself is stored in the encrypted data).
This is required to be sure that the decrypted page contains correct data and was not replaced with some malicious content.
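
The two checks can be illustrated with a minimal, self-contained sketch.
It is not the Ignite code: the class TwoLayerCrcSketch, the page layout
(payload followed by a 4-byte CRC), the AES/CBC/PKCS5Padding mode, and the
caller-supplied IV are all assumptions made for the example. The inner CRC
is computed over the plain page and encrypted together with it; the outer
CRC is computed over the encrypted bytes.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical illustration of the two CRC layers; not the Ignite implementation. */
final class TwoLayerCrcSketch {
    private static final String TRANSFORMATION = "AES/CBC/PKCS5Padding"; // Assumed mode.

    /** CRC32 of a byte range, truncated to 32 bits. */
    static int crc(byte[] data, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return (int)crc.getValue();
    }

    /** Builds the stored form: encrypt(page + innerCrc) followed by outerCrc. */
    static byte[] write(byte[] page, SecretKeySpec key, IvParameterSpec iv) throws Exception {
        // Inner layer: the CRC of the plain page is encrypted together with the page.
        byte[] plain = ByteBuffer.allocate(page.length + 4)
            .put(page).putInt(crc(page, 0, page.length)).array();

        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key, iv);
        byte[] enc = cipher.doFinal(plain);

        // Outer layer: the CRC of the encrypted bytes detects on-disk corruption.
        return ByteBuffer.allocate(enc.length + 4)
            .put(enc).putInt(crc(enc, 0, enc.length)).array();
    }

    /** Verifies the outer CRC, decrypts, verifies the inner CRC, returns the page. */
    static byte[] read(byte[] stored, SecretKeySpec key, IvParameterSpec iv) throws Exception {
        int encLen = stored.length - 4;

        if (ByteBuffer.wrap(stored, encLen, 4).getInt() != crc(stored, 0, encLen))
            throw new IllegalStateException("Outer CRC mismatch: page corrupted on disk");

        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, key, iv);
        byte[] plain = cipher.doFinal(stored, 0, encLen);

        int pageLen = plain.length - 4;

        if (ByteBuffer.wrap(plain, pageLen, 4).getInt() != crc(plain, 0, pageLen))
            throw new IllegalStateException("Inner CRC mismatch: decrypted content is invalid");

        return Arrays.copyOf(plain, pageLen);
    }
}
```

In this sketch the outer check runs without the key, so a store-level
reader can reject a damaged page before any decryption is attempted, while
the inner check can only pass if decryption with the right key produced
the bytes that were originally written.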

Here is the list of items that are not related to the Phase 3 implementation.
Please tell me what you think:

> 2. We should try to run the existing test suites in encryption mode.

We did it during TDE.Phase1 testing.

> 3. SPI requires an additional method such as getKeyDigest
> 4. Recommendation - the encryption processor should be divided into external subclasses

> 5. Recommendation - we should not use tuples and triples, because this is a marker of a design problem.
> 6. Strict recommendation - please don't put context everywhere

Actually, this is a question of taste and obviously not related to the current discussion.

> On July 24, 2020, at 14:27, Maksim Stepachev <[hidden email]> wrote:
>
> Hello everyone, yesterday we discussed the implementation of TDE over a
> conference call. I added a summary of this call here:
>
> 1. The wiki documentation should be expanded. It should describe the
> steps - how it works under the hood. What are the domain objects in the
> implementation?
> 2. We should try to run the existing test suites in encryption mode.
> Encryption should not affect any PDS or other tests.
> 3. SPI requires an additional method such as getKeyDigest, because the
> current implementation of GridEncryptionManager#masterKeyDigest() looks
> strange. We reset the master key to calculate the digest. This will not
> work well if we want to use VOLT as a key provider implementation.
> 4. Recommendation - the encryption processor should be divided into
> external subclasses, and we should use the OOP decomposition pattern for
> it. Right now, this class has more than 2000 lines and does not support
> SOLID. This is similar to inline unrelated logic with a single class.
> 5. Recommendation - we should not use tuples and triples, because this
> is a marker of a design problem.
> 6. Strict recommendation - please don't put context everywhere. it
> should only be used in the parent class. You can pass the necessary
> dependencies through the constructor, as in the DI pattern.
> 7. Question -the current implementation does not use the throttling that
> is implemented in PDS. Users should set the throughput such as 5 MB per
> second, but not the timeout, packet size, or stream size.
> 8. Question - why we add a lot of system properties? Why we didn’t add a
> configuration for it?
> 9. Question - How do we optimize when we can check that this page is
> already encrypted by parallel loading? Maybe we should do this in Phase 4?
> 10. Question - CRC is read in two places encryptionFileIO and
> filePageStore - what should we do with this?
> 11. We should remember about complicated test scenarios with failover
> like node left when encryption started and joined after it finished. In the
> process, the baseline changed node left before / after / in the middle of
> this process. And etc.
> 12. How to use a sandbox to protect our cluster of master and user key
> stealing via compute?
> 13. Will re-encryption continue after the cluster is completely stopped?
>
> If I forgot some points, you can add them to the message.
>
>
> вт, 7 июл. 2020 г. в 17:40, Pavel Pereslegin <[hidden email]>:
>
>> Hello, Maksim.
>>
>> For implementation, I chose so-called "in place background
>> re-encryption" design.
>>
>> The first step is to rotate the key for writing data, it only works on
>> the active cluster, at the moment..
>> The second step is re-encryption (to remove previous encryption key).
>> If node was restarted reencryption starts after metastorage becomes
>> ready for read/write. Each "re-encrypted" partition (including index)
>> has an attribute on the meta page that indicates whether background
>> re-encryption should be continued.
>>
>> I updated the description in wiki [1].
>> Some more details in jira [2].
>> Draft PR [3].
>>
>> [1]
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384
>> [2] https://issues.apache.org/jira/browse/IGNITE-12843
>> [3] https://github.com/apache/ignite/pull/7941
>>
>> вт, 7 июл. 2020 г. в 13:49, Maksim Stepachev <[hidden email]>:
>>>
>>> Hi!
>>>
>>> Do you have any updates about this issue? What types of implementations
>>> have you chosen (in-place, offline, or in the background)? I know that we
>>> want to add a partition defragmentation function, we can add a hole to
>>> integrate the re-encryption scheme. Could you update your IEP with your
>>> plans?
>>>
>>> пн, 25 мая 2020 г. в 12:50, Pavel Pereslegin <[hidden email]>:
>>>
>>>> Nikolay, Alexei,
>>>>
>>>> thanks for your suggestions.
>>>>
>>>> Offline re-encryption does not seem so simple, we need to read/replace
>>>> the existing encryption keys on all nodes (therefore, we should be
>>>> able to read/write metastore/WAL and exchange data between the
>>>> baseline nodes). Re-encryption in maintenance mode (for example, in a
>>>> stable read-only cluster) will be simple, but it still looks very
>>>> inconvenient, at least because users will need to interrupt all
>>>> operations.
>>>>
>>>> The main advantage of online "in place" re-encryption is that we'll
>>>> support multiple keys for reading, and this procedure does not
>>>> directly depend on background re-encryption.
>>>>
>>>> So, the first step is similar to rotating the master key when the new
>>>> key was set for writing on all nodes - that’s it, the cache group key
>>>> rotation is complete (this is what PCI DSS requires - encrypt new
>>>> updates with new keys).
>>>> The second step is to re-encrypt the existing data, As I said
>>>> previously I thought about scanning all partition pages in some
>>>> background mode (store progress on the metapage to continue after
>>>> restart), but rebalance approach should also work here if I figure out
>>>> how to automate this process.
>>>>
>>>> пн, 25 мая 2020 г. в 12:22, Alexei Scherbakov <
>>>> [hidden email]>:
>>>>>
>>>>>
>>>>>
>>>>> пн, 25 мая 2020 г. в 12:00, Nikolay Izhikov <[hidden email]>:
>>>>>>
>>>>>>> This willl takes us to the re-encryption using full rebalancing
>>>>>>
>>>>>> Rebalance will require 2x efforts for reencryption
>>>>>>
>>>>>> 1. Read and send data from supplier node.
>>>>>> 2. Reencrypt and write data on demander node.
>>>>>>
>>>>>> Instead of
>>>>>>
>>>>>> 1. Read, reencrypt and write data on «demander» node.
>>>>>
>>>>>
>>>>> Usually, reading and sending is not a bottleneck. And don't forget we
>>>> can run out of WAL history and fall back to full rebalancing with
>> partition
>>>> eviction eliminating all efforts from offline re-encryption.
>>>>>
>>>>> On the other side, for a grid having many nodes one-by-one
>> re-encryption
>>>> can take a long time.
>>>>> It should also be possible to re-encrypt all data as fast as possible
>>>> if, for example, if a load can be switched to another grid, where
>> offline
>>>> encryption will come in handy.
>>>>>
>>>>> So, I suggest to implement offline re-encryption and online
>>>> re-encryption using rebalancing as a first step.
>>>>>
>>>>> Next step can be online in-place re-encryption. It's important to
>>>> measure business impact from it on online grid.
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> 25 мая 2020 г., в 11:46, Alexei Scherbakov <
>>>> [hidden email]> написал(а):
>>>>>>>
>>>>>>> For me, the one big disadvantage for offline re-encryption is the
>>>>>>> possibility to run out of WAL history.
>>>>>>> If an re-encryption takes a long time we will get full rebalancing
>>>> with
>>>>>>> partition eviction.
>>>>>>> This willl takes us to the re-encryption using full rebalancing,
>>>> proposed
>>>>>>> by me earlier.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> пн, 25 мая 2020 г. в 11:27, Nikolay Izhikov <[hidden email]
>>> :
>>>>>>>
>>>>>>>>> And definitely this approach is much simplier to implement
>>>>>>>>
>>>>>>>> I agree.
>>>>>>>>
>>>>>>>> If we allow to made nodes offline for reencryption then we can
>>>> implement a
>>>>>>>> fully offline procedure:
>>>>>>>>
>>>>>>>> 1. Stop node.
>>>>>>>> 2. Execute some control.sh command that will reencrypt all data
>>>> without
>>>>>>>> starting node
>>>>>>>> 3. Start node.
>>>>>>>>
>>>>>>>> Pavel, can you, please, write it one more time - what
>> disadvantages
>>>> in
>>>>>>>> offline procedure?
>>>>>>>>
>>>>>>>>> 25 мая 2020 г., в 11:20, Alexei Scherbakov <
>>>> [hidden email]>
>>>>>>>> написал(а):
>>>>>>>>>
>>>>>>>>> And definitely this approach is much simpler to implement because all
>>>>>>>>> corner cases are handled by the rebalancing code.
>>>>>>>>>
>>>>>>>>> Mon, May 25, 2020 at 11:16, Alexei Scherbakov <[hidden email]>:
>>>>>>>>>
>>>>>>>>>> I mean: serving supply requests.
>>>>>>>>>>
>>>>>>>>>> Mon, May 25, 2020 at 11:15, Alexei Scherbakov <[hidden email]>:
>>>>>>>>>>
>>>>>>>>>>> Nikolay,
>>>>>>>>>>>
>>>>>>>>>>> Can you explain why such a restriction is necessary?
>>>>>>>>>>> Most likely, having a currently re-encrypting node serve only demand
>>>>>>>>>>> requests will have the least performance impact on the grid.
>>>>>>>>>>>
>>>>>>>>>>> Mon, May 25, 2020 at 11:08, Nikolay Izhikov <[hidden email]>:
>>>>>>>>>>>
>>>>>>>>>>>> Hello, Alexei.
>>>>>>>>>>>>
>>>>>>>>>>>> I think we want to implement this feature without node restarts.
>>>>>>>>>>>> In the ideal scenario, all nodes will stay alive and respond to
>>>>>>>>>>>> user requests.
>>>>>>>>>>>>
>>>>>>>>>>>>> On May 24, 2020, at 15:24, Alexei Scherbakov <[hidden email]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Pavel Pereslegin,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I see another opportunity.
>>>>>>>>>>>>> We can use rebalancing to re-encrypt node data with a new key.
>>>>>>>>>>>>> It's a trivial procedure to me: stop a node, clear the database,
>>>>>>>>>>>>> change the key, start the node and wait for rebalancing to complete.
>>>>>>>>>>>>> Data will be re-encrypted during rebalancing.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Did I miss something?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Fri, May 22, 2020 at 16:14, Ivan Rakov <[hidden email]>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Folks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Just keeping you informed: my colleagues and I are highly interested
>>>>>>>>>>>>>> in TDE in general and key rotation specifically, but we haven't had
>>>>>>>>>>>>>> enough time so far.
>>>>>>>>>>>>>> We'll dive into this feature and participate in reviews next month.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>> Ivan Rakov
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin <[hidden email]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello, Alexey.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> is the encryption key for the data the same on all nodes in the
>>>>>>>>>>>>>>>> cluster?
>>>>>>>>>>>>>>> Yes, each encrypted cache group has its own encryption key, the key
>>>>>>>>>>>>>>> is the same on all nodes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Clearly, during the re-encryption there will exist pages
>>>>>>>>>>>>>>>> encrypted with both new and old keys at the same time.
>>>>>>>>>>>>>>> Yes, there will be pages encrypted with different keys at the same
>>>>>>>>>>>>>>> time.
>>>>>>>>>>>>>>> Currently, we only store one key for one cache group. To rotate a
>>>>>>>>>>>>>>> key, at a certain point in time it is necessary to support several
>>>>>>>>>>>>>>> keys (at least for reading the WAL).
>>>>>>>>>>>>>>> For the "in place" strategy, we'll store the encryption key
>>>>>>>>>>>>>>> identifier on each encrypted page (we currently have some unused
>>>>>>>>>>>>>>> space on an encrypted page, so I don't expect any memory overhead
>>>>>>>>>>>>>>> here). Thus, we will have several keys for reading and one key for
>>>>>>>>>>>>>>> writing. I assume that the old key will be automatically deleted
>>>>>>>>>>>>>>> when a specific WAL segment is deleted (and re-encryption is
>>>>>>>>>>>>>>> finished).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Will a node continue to re-encrypt the data after it restarts?
>>>>>>>>>>>>>>> Yes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If a node goes down during the re-encryption, but the rest of the
>>>>>>>>>>>>>>>> cluster finishes re-encryption, will we consider the procedure
>>>>>>>>>>>>>>>> complete?
>>>>>>>>>>>>>>> I'm not sure, but it looks like the key rotation is complete when we
>>>>>>>>>>>>>>> set the new key on all nodes so that the updates will be encrypted
>>>>>>>>>>>>>>> with the new key (as required by PCI DSS).
>>>>>>>>>>>>>>> Status of re-encryption can be obtained separately (locally or
>>>>>>>>>>>>>>> cluster wide).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I forgot to mention that with “in place” re-encryption it will be
>>>>>>>>>>>>>>> impossible to quickly cancel re-encryption, because canceling
>>>>>>>>>>>>>>> means re-encrypting the data back with the old key.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> How do you see the whole key rotation procedure will work?
>>>>>>>>>>>>>>> The initial design for re-encryption with "partition copying" is
>>>>>>>>>>>>>>> described here [1]. I'll prepare a detailed design for "in place"
>>>>>>>>>>>>>>> re-encryption if we go this way. In short: send the new encryption
>>>>>>>>>>>>>>> key cluster-wide; each node adds the new key and starts background
>>>>>>>>>>>>>>> re-encryption.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sun, May 17, 2020 at 18:35, Alexey Goncharuk <[hidden email]>:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Pavel, Anton,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> How do you see the whole key rotation procedure will work? Clearly,
>>>>>>>>>>>>>>>> during the re-encryption there will exist pages encrypted with both
>>>>>>>>>>>>>>>> new and old keys at the same time. Will a node continue to
>>>>>>>>>>>>>>>> re-encrypt the data after it restarts? If a node goes down during
>>>>>>>>>>>>>>>> the re-encryption, but the rest of the cluster finishes
>>>>>>>>>>>>>>>> re-encryption, will we consider the procedure complete? By the way,
>>>>>>>>>>>>>>>> is the encryption key for the data the same on all nodes in the
>>>>>>>>>>>>>>>> cluster?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thu, May 14, 2020 at 11:30, Anton Vinogradov <[hidden email]>:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +1 to "In place re-encryption".
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - It has a simple design.
>>>>>>>>>>>>>>>>> - Clusters under load may require just load to re-encrypt the
>>>>>>>>>>>>>>>>> data. (Friendly to load).
>>>>>>>>>>>>>>>>> - Easy to throttle.
>>>>>>>>>>>>>>>>> - Easy to continue.
>>>>>>>>>>>>>>>>> - Design compatible with the multi-key architecture.
>>>>>>>>>>>>>>>>> - It can be optimized to use its own WAL buffer and to re-encrypt
>>>>>>>>>>>>>>>>> pages without restoring them to on-heap.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin <[hidden email]>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hello Igniters.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Recently, master key rotation for Apache Ignite Transparent
>>>>>>>>>>>>>>>>>> Data Encryption was implemented [1], but some security
>>>>>>>>>>>>>>>>>> standards (PCI DSS at least) require rotation of all
>>>>>>>>>>>>>>>>>> encryption keys [2]. Currently, encryption occurs when
>>>>>>>>>>>>>>>>>> reading/writing pages to disk, cache encryption keys are
>>>>>>>>>>>>>>>>>> stored in metastore.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm going to contribute cache encryption key rotation and
>>>>>>>>>>>>>>>>>> want to consult what is the best way to re-encrypting
>>>>>>>>>>>>>>>>>> existing data, I see two different strategies.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. In place re-encryption:
>>>>>>>>>>>>>>>>>> Using the old key, sequentially read all the pages from the
>>>>>>>>>>>>>>>>>> datastore, mark as dirty and log them into the WAL. After
>>>>>>>>>>>>>>>>>> checkpoint pages will be stored to disk encrypted with the
>>>>>>>>>>>>>>>>>> new key (as usual, along with updates). This strategy
>>>>>>>>>>>>>>>>>> requires store the identifier (number) of the encryption
>>>>>>>>>>>>>>>>>> key into the encrypted page.
>>>>>>>>>>>>>>>>>> pros:
>>>>>>>>>>>>>>>>>> - can work in the background with minimal performance
>>>>>>>>>>>>>>>>>> impact (this impact can be managed).
>>>>>>>>>>>>>>>>>> cons:
>>>>>>>>>>>>>>>>>> - page duplication in the WAL may affect performance and
>>>>>>>>>>>>>>>>>> historical rebalance.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2. Copy partition with re-encryption.
>>>>>>>>>>>>>>>>>> This strategy is similar to partition snapshotting [3] -
>>>>>>>>>>>>>>>>>> create partition copy encrypted with the new key and then
>>>>>>>>>>>>>>>>>> replace the original partition file with the new one (see
>>>>>>>>>>>>>>>>>> details [4]).
>>>>>>>>>>>>>>>>>> pros:
>>>>>>>>>>>>>>>>>> - should work faster than "in place" re-encryption.
>>>>>>>>>>>>>>>>>> cons:
>>>>>>>>>>>>>>>>>> - re-encryption in active cluster (and on unstable
>>>>>>>>>>>>>>>>>> topology) can be difficult to implement.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> (See more detailed comparison [5])
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Re-encryption of existing data is a long and rare procedure
>>>>>>>>>>>>>>>>>> (It is recommended to change the key every 6 months, but at
>>>>>>>>>>>>>>>>>> least once every 2 years). Thus, re-encryption can be
>>>>>>>>>>>>>>>>>> implemented for maintenance mode (for example, on a stable
>>>>>>>>>>>>>>>>>> topology in a read-only cluster) and in such case the
>>>>>>>>>>>>>>>>>> approach with partition copying seems simpler and faster.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So, what do you think - do we need "online" re-encryption
>>>>>>>>>>>>>>>>>> and which of the proposed options is best suited for this?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-12186
>>>>>>>>>>>>>>>>>> [2] https://www.pcisecuritystandards.org/documents/PCI_DSS_v3-2-1.pdf
>>>>>>>>>>>>>>>>>> [3] https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots#IEP-43:Clustersnapshots-Partitionscopystrategy
>>>>>>>>>>>>>>>>>> [4] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>>>>>>>>>>>>>>>>>> [5] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Comparison
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Alexei Scherbakov
>>