Multi-Version Concurrency Control


Multi-Version Concurrency Control

Serge Puchnin
CONTENTS DELETED
The author has deleted this message.

Re: Multi-Version Concurrency Control

Alexey Kuznetsov
Serge,

Cool feature!

I have the following questions:
1) Will MVCC be a kind of "cache mode"? Will we be able to have caches with the old behavior and caches with MVCC?
2) "Garbage collection" - maybe we should give it another name so it is not confused with JVM GC?

Thanks! 

On Tue, Aug 15, 2017 at 8:11 PM, Serge Puchnin <[hidden email]> wrote:
Hello Ignite Developers, 

I'd like to start a discussion about the design of the MVCC implementation [1].
It will help us solve an issue where a user might see a partially committed transaction.

Below you can find the proposed design.
Please provide your feedback/thoughts on the suggested solution.

Thanks a lot,
Sergey Puchnin 

[1] https://issues.apache.org/jira/browse/IGNITE-3478


Multi-Version Concurrency Control Architecture


Abstract

This page contains a high-level description of the MVCC architecture for JIRA https://issues.apache.org/jira/browse/IGNITE-3478

Problem Description

Current Ignite SQL does not take transaction boundaries into account. For example, if a transaction atomically changes the balance of two accounts, then a concurrent SQL query can see the transaction partially committed. This is acceptable for data analytics but does not work for stricter, more demanding scenarios.
It would be perfect if we had a mode which ensures transaction boundaries are taken into account for SQL queries.
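To make the anomaly concrete, here is a minimal sketch using the public Ignite transaction and SQL query APIs. The "accounts" cache, its configuration (TRANSACTIONAL atomicity, Integer/Long indexed types) and the balances are assumptions for illustration only, not part of the proposal:

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.transactions.Transaction;

public class PartialCommitAnomaly {
    /** Moves money between two accounts atomically inside a transaction. */
    static void transfer(Ignite ignite, int from, int to, long amount) {
        IgniteCache<Integer, Long> accounts = ignite.cache("accounts");
        try (Transaction tx = ignite.transactions().txStart()) {
            accounts.put(from, accounts.get(from) - amount);
            accounts.put(to, accounts.get(to) + amount);
            tx.commit();
        }
    }

    /** Concurrent SQL read: today the sum can reflect only one of the two puts. */
    static List<List<?>> totalBalance(Ignite ignite) {
        IgniteCache<Integer, Long> accounts = ignite.cache("accounts");
        return accounts.query(new SqlFieldsQuery("SELECT SUM(_VAL) FROM Long")).getAll();
    }
}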

Design Description

Multi-Version Concurrency Control (MVCC) is a concurrency control mechanism that has been implemented in numerous RDBMS systems.
The main idea of this solution is that, instead of changing data in place, a session creates a new version of the data, and different transactions are able to see the correct version in a concurrent environment.
Consequently, readers never block writers, writers never block readers, and readers do not need any locks.

In a cluster environment the data is distributed across different partitions and nodes, and it is necessary to support cases when read-write transactions are processed by the nodes in an arbitrary order.

At this point, we only have to consider the transactions that need to update or read data on more than one partition.

To provide cross-cache ACID transactions for all isolation levels and lock models in a distributed environment, it is necessary to determine which changes were made before a transaction started and are therefore visible to it.

To solve that, a new component is introduced into the system - a TS coordinator - and two new attributes are added to each entity version: minTS and maxTS.
The coordinator will order all changes for multi-partition transactions.
minTS and maxTS determine the visibility of a specific version for a transaction.
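For illustration only, a versioned entry could carry something like the following (the class and field names are mine, not the actual Ignite storage layout):

public class VersionedEntry<V> {
    final V value;
    /** Identifier of the transaction that created this version. */
    long minTS;
    /** Identifier of the transaction that superseded or deleted it; 0 means "still current". */
    volatile long maxTS;

    VersionedEntry(V value, long minTS) {
        this.value = value;
        this.minTS = minTS;
        this.maxTS = 0; // no newer version yet
    }
}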

The basic workflow includes the following steps.

1. Initialize

During initialization, a transaction asks the node that acts as the TS coordinator for a new transaction identifier (currentTX). currentTX is a monotonic, unique identifier used to order all changes.
The coordinator adds the current transaction to the active transaction list in the RUNNING state.
The coordinator also informs the transaction about all transactions that were registered before it and are still in the RUNNING state (the excluding TX list).
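A minimal, single-node sketch of this coordinator behaviour (all names are illustrative; coordinator failover is not shown):

import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

public class TsCoordinator {
    private final AtomicLong counter = new AtomicLong();
    /** Identifiers of transactions currently in the RUNNING state. */
    private final Set<Long> running = new HashSet<>();

    /** Registers a new transaction: returns currentTX and the excluding TX list. */
    public synchronized TxSnapshot begin() {
        long currentTx = counter.incrementAndGet();
        // Transactions registered earlier and still RUNNING must stay invisible to the new one.
        Set<Long> excluded = new HashSet<>(running);
        running.add(currentTx);
        return new TxSnapshot(currentTx, excluded);
    }

    /** Called on commit or rollback. */
    public synchronized void finish(long txId) {
        running.remove(txId);
    }

    public static class TxSnapshot {
        public final long currentTx;
        public final Set<Long> excludedTxs;

        TxSnapshot(long currentTx, Set<Long> excludedTxs) {
            this.currentTx = currentTx;
            this.excludedTxs = excludedTxs;
        }
    }
}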

2. Writing

At the Prepare stage the transaction acquires all necessary locks.
At the Commit stage the transaction:
1. Finds the current version in the PK index (minTS = currentTX or maxTS = 0).
2. Finds the current version in the data page by the link from #1.
3. Inserts the new version into the data page with minTS = currentTX, maxTS = 0.
4. Inserts the new version into the PK index with minTS = currentTX, maxTS = 0 and a link to #2.
5. Updates the current version in the data page (found via the link from #1), setting maxTS = currentTX.
6. Updates the current version in the PK index page (found at #1), setting maxTS = currentTX.
7. Finds the current version in the secondary index (minTS = currentTX or maxTS = 0).
8. Inserts the new version into the secondary index with minTS = currentTX, maxTS = 0 and a link to #4.
9. Updates the current version in the secondary index, setting maxTS = currentTX.

Due to the two-phase commit protocol it is not possible for two uncommitted transactions to update the same key.
For delete statements, steps 3, 4 and 8 should be skipped.
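Collapsing the PK index, data pages and secondary indexes into a single in-memory version chain, the commit-stage bookkeeping above could be sketched roughly as follows (reusing the VersionedEntry sketch from earlier; this is an illustration, not the on-page format):

import java.util.ArrayList;
import java.util.List;

public class MvccWriter<V> {
    /** All versions of one key, oldest first (stands in for the PK-index version chain). */
    private final List<VersionedEntry<V>> versions = new ArrayList<>();

    /** Commit-stage write: steps 1-6 of the list above; secondary indexes (7-9) omitted. */
    public synchronized void write(V newValue, long currentTx) {
        // Steps 1-2: locate the current version (the newest one, maxTS == 0).
        VersionedEntry<V> current = versions.isEmpty() ? null : versions.get(versions.size() - 1);

        // Steps 3-4: insert the new version with minTS = currentTX, maxTS = 0.
        versions.add(new VersionedEntry<>(newValue, currentTx));

        // Steps 5-6: mark the previous version as superseded by this transaction.
        if (current != null)
            current.maxTS = currentTx;
    }

    /** Delete: steps 3, 4 and 8 are skipped, only maxTS of the current version is set. */
    public synchronized void delete(long currentTx) {
        if (!versions.isEmpty())
            versions.get(versions.size() - 1).maxTS = currentTx;
    }
}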

3. Reading

For a transaction with Repeatable Read, a value will be cached on the near node.
For other levels, a transaction should find the version for which the following condition holds:
minTS <= CurrentTS and (maxTS > CurrentTS or maxTS = 0). In addition, minTS and maxTS must not belong to transactions in the excluding TX list.
For Read Committed, CurrentTS is the identifier obtained at the moment the statement starts.
For the Serializable level, CurrentTS is the identifier obtained at the moment the session starts. In addition, if a session finds that maxTS != 0, an error "It's not possible to serialize the result" should be raised.
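In code, the visibility rule could read as follows (illustrative only; a version whose maxTS belongs to a still-running, excluded transaction is treated here as still visible):

import java.util.Set;

public class MvccVisibility {
    /**
     * minTS     - transaction that created the version
     * maxTS     - transaction that superseded it, 0 if still current
     * currentTS - reader's snapshot id (statement start for Read Committed, session start for Serializable)
     * excluded  - transactions that were RUNNING when the snapshot was taken
     */
    public static boolean isVisible(long minTS, long maxTS, long currentTS, Set<Long> excluded) {
        boolean createdBeforeSnapshot = minTS <= currentTS && !excluded.contains(minTS);
        boolean notYetSuperseded = maxTS == 0 || maxTS > currentTS || excluded.contains(maxTS);
        return createdBeforeSnapshot && notYetSuperseded;
    }
}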

4. Garbage collection

During the garbage collection procedure, any entity versions whose maxTS is less than the minimal currentTX among sessions in the RUNNING state can be deleted. There are no cases in which we would still need to show these versions. The minTS of the next (surviving) version should then be updated to 0.
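A sketch of such a vacuum pass over one key's version chain (again reusing the VersionedEntry sketch; minActiveTx is the minimal currentTX among RUNNING sessions):

import java.util.Iterator;
import java.util.List;

public class MvccGarbageCollector {
    /** Drops versions no snapshot can see any more; the chain is ordered oldest first. */
    public static <V> void vacuum(List<VersionedEntry<V>> versions, long minActiveTx) {
        Iterator<VersionedEntry<V>> it = versions.iterator();
        while (it.hasNext()) {
            VersionedEntry<V> v = it.next();
            if (v.maxTS != 0 && v.maxTS < minActiveTx)
                it.remove();
            else
                break; // everything newer may still be visible to some snapshot
        }
        // The oldest surviving version becomes visible "since the beginning".
        if (!versions.isEmpty())
            versions.get(0).minTS = 0;
    }
}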

5. Sample 

As a sample, consider three write operations and the changes they produce in the data structures.
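As an illustration, a possible trace for three consecutive writes of the same key by transactions 10, 20 and 30 (the values are mine, following the rules above):

after TX 10:  v1(minTS=10, maxTS=0)
after TX 20:  v1(minTS=10, maxTS=20), v2(minTS=20, maxTS=0)
after TX 30:  v1(minTS=10, maxTS=20), v2(minTS=20, maxTS=30), v3(minTS=30, maxTS=0)

A reader with CurrentTS = 25 sees v2; once the minimal RUNNING currentTX passes 20, v1 can be garbage collected.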







--
Alexey Kuznetsov

Re: Multi-Version Concurrency Control

Serge Puchnin
CONTENTS DELETED
The author has deleted this message.

Re: Multi-Version Concurrency Control

Alexey Kuznetsov
Serge,

>#1 Could you please share any case when we need to use different modes for different caches?

Atomic caches?



--
Alexey Kuznetsov

Re: Multi-Version Concurrency Control

Serge Puchnin
CONTENTS DELETED
The author has deleted this message.