Monday, July 25, 2016

ACID (Part 3)

So in the last post we discussed Isolation (the third letter from ACID). We managed to cover the 1st isolation level which was the Read Uncommitted. That level of isolation in fact offers no isolation at all (thus poor consistency), but yield excellent performance because we don't have to jump to previous "versions" of the record being read. The three phenomenas are reproducible with the read uncommitted. The next level of isolation we want to talk about is Read Committed.

Here is the reference to all ACID posts that I authored:

ACID (Part 1) - Atomicity and Consistency
ACID (Part 2) - Isolation (Read Uncommitted)
ACID (Part 3) - Isolation (Read Committed)
ACID (Part 4) - Isolation (Repeatable Read)

Read Committed

This isolation level offers isolation against uncommitted reads. A transaction will only read entries that has been committed by other transactions. So from this definition we know that dirty reads phenomena cannot appear in this isolation level. But again, we need a good example that illustrates this. Lets do our Instagram example again.

Brief recap, the Picture table has the LikesCount field which keeps track of how many likes a picture got. This is a performance tweak in order to avoid querying the Like table every time we want to get the likes count for a picture. We notice that picture 1 has 1 like only by Jon and picture 3 has two likes by Reek and Sansa. Picture 2 has 2 likes by Jon and Reek. The Picture_ID is the primary key in the Picture table, while both the Picture_ID and the User constitute the primary key for the Like table. There are many fields that we can add but for simplicity we are sticking with these now.

Jon has already liked picture 2, but he is insisting in liking it again, we have nice atomicity and consistency that prevent him from doing so but lets see what happens. Jon fires up a like on picture 2, and fraction of a second later Sansa loads up picture 2, this will retrieve the number of likes. 

Jon sends the like, first query executes successfully, incrementing the likes count.

Before Jon second query executes, Sansa's Select kicks in to read picture 2 row, she is going to get 2 likes instead of 3. This is because we are operating under the Read Committed isolation level, and since Jon still did not commit his transaction, Sansa is getting the current committed value. So we have avoided a dirty read phenomena. 

Sansa issues another read to the Like table to get all the users who likes picture 2, she gets two rows, Jon and Reek. Consistent with the number of likes she got.

Jon transaction moves on and executes the second query which fails because of the constraint we have in place. Rolling back the entry for likes back to 2.

Jon gives up, his transaction is finished and he failed to ruin our system consistent state. But meanwhile a new user comes in, Eddard, likes picture 2, he is a brand new user who never liked picture 2 before so his transaction commits fine.

Sansa's transaction is still running she is querying other tables, doing some stuff, updating the view count perhaps, and then finally, she comes back for a final read of picture 2 getting the likes count. She is getting a different result although she executed the same query twice in the same transaction. So we still get non-repeatable read phenamona with read committed isolation level.  

She issues a final read to the Like table to find out the list of users who liked picture 2, and surprise surprise, she got an extra record this time, hence Phantom read is also reproducible with read committed. 

So this isolation level did save us from a dirty read but still we getting phantoms and non-repeatable reads. The reason is that Read Committed reads committed transactions, and during the life time of the transaction a lot of other transactions can commit stuff that touches the data our transaction is about to read.

Next up, Repeatable Read