Monday, July 4, 2016

Persistent Connections

English is not my first language, so I have to remind myself of definitions from time to time. So bear with me if you see me define a phrase before I elaborate on it. To persist something is to keep it existing. So what does a persistent connection refer to, and why is this something software engineers should know about?

TCP is the de facto transport protocol for network communication. When we want to send data reliably between node A and node B, we establish a TCP connection. Be it a simple visit to google.com or an Oracle database connection, under the hood it's almost always TCP.

A lot goes into establishing a connection: handshaking, acknowledgments and, when the connection is encrypted, making sure the parties are in fact who they say they are. We don't need to discuss the details of what exactly happens in this post, but one thing you should know: opening a TCP connection is expensive.

So imagine this scenario. You visit the http://www.nationalgeographic.com website. This uses the HTTP protocol, which underneath opens a TCP connection to the National Geographic server. So let's go back in time to 1996, when HTTP/1.0 was just released. This is what would happen when you visit that page:

Open TCP Connection to the National Geographic website (this has many steps, remember)
Read Index.html and send it back to the browser
Close TCP Connection
For each Image in the page
     Open TCP Connection
     Read Image from the server disk, send back to browser
     Close TCP Connection
Next image


Images are just an example of things we need to load; back then there were much nastier resources.

So now imagine the overhead of opening and closing all those connections: the network congestion from all the acknowledgments being sent back and forth, and the wasted processing cycles both the server and the client have to endure. That is why persistent connections became popular: open a connection, leave it open while we send everything we have, and close it once we are done. Here is a modern visit to nationalgeographic.com in 2016, with HTTP/1.1:


Open TCP Connection to the National Geographic website
Read Index.html and send it back to the browser
For each Image in the page
     Read Image from the server disk, send back to browser
Next image
... Do more 
... Do more
Close TCP Connection
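
To make the difference between the two visits concrete, here is a minimal Python sketch using only the standard library. It is an illustration under my own assumptions, not how a real browser works: the local test server, the `CountingHandler` class, the two `fetch_*` functions and the file paths are all made up for this demo. The server counts how many TCP connections it accepts, so we can compare one connection per resource against a single persistent connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler that counts accepted TCP connections."""

    protocol_version = "HTTP/1.1"  # lets the server keep connections alive
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request.
        super().setup()
        with CountingHandler.lock:
            CountingHandler.connections += 1

    def do_GET(self):
        body = f"content of {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_one_connection_each(host, port, paths):
    """The 1996 way: open, request, close, for every single resource."""
    for path in paths:
        conn = http.client.HTTPConnection(host, port)
        conn.request("GET", path, headers={"Connection": "close"})
        conn.getresponse().read()
        conn.close()


def fetch_persistent(host, port, paths):
    """The persistent way: open once, send every request, close once."""
    conn = http.client.HTTPConnection(host, port)
    for path in paths:
        conn.request("GET", path)
        conn.getresponse().read()  # drain the body before the next request
    conn.close()


def demo():
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    paths = ["/index.html", "/a.jpg", "/b.jpg", "/c.jpg"]  # made-up resources

    CountingHandler.connections = 0
    fetch_one_connection_each(host, port, paths)
    per_request = CountingHandler.connections

    CountingHandler.connections = 0
    fetch_persistent(host, port, paths)
    persistent = CountingHandler.connections

    server.shutdown()
    server.server_close()
    return per_request, persistent
```

Calling `demo()` returns the two connection counts: with four resources, the first style costs four TCP connections and the persistent style costs one.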


Now, I know this is very specific to browsers and web servers, but the same story is true for database connections. In my years of experience as a programmer working with databases, I developed a habit, and I am sure most of you did too, of opening a connection, sending a query and then closing the connection. That might be fine and barely noticeable if you have around 10 users working with your application. However, as you scale up, you will start noticing performance degradation, because every query pays the connection setup cost again.
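
One common way out of the open-query-close habit is to open a few connections once and reuse them. Below is a rough sketch of that idea. The `ConnectionPool` class and `run_queries` function are hypothetical names of mine, and I am using `sqlite3` only as a stand-in because it ships with Python; it does not use TCP at all, so treat it as a placeholder for a real networked database driver.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical minimal pool: opens `size` connections once and reuses them."""

    def __init__(self, connect, size):
        self.opened = 0  # how many connections were ever opened
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())
            self.opened += 1

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks if every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back instead of closing it

    def close(self):
        while not self._idle.empty():
            self._idle.get_nowait().close()


def run_queries(pool, n):
    """Run n small queries, borrowing a pooled connection for each one."""
    results = []
    for i in range(n):
        with pool.connection() as conn:
            results.append(conn.execute("SELECT ?", (i,)).fetchone()[0])
    return results
```

For example, with `pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=2)`, running `run_queries(pool, 5)` executes five queries while `pool.opened` stays at 2. Most database drivers and frameworks come with their own pooling, so in practice you would likely use that rather than roll your own.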

Another thing you gain from persistent connections is push events. This is how WhatsApp is able to freak you out by instantly delivering your wife's message "Where are you!" to your phone the moment she hits the send key on hers. WhatsApp does that by keeping a live open connection from your mobile to their server (they don't speak HTTP over it, though, but a messaging protocol that runs on top of TCP, reportedly a customized version of XMPP; we can touch upon that in some other post).
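
Here is a small sketch of what "push" means at the socket level. This is plain TCP on localhost with a made-up newline-delimited format, not WhatsApp's actual protocol, and `push_demo` is a hypothetical name. The point is that the client never asks for anything: it keeps the connection open and the server writes to it whenever it has something to say.

```python
import socket
import threading


def push_demo(messages):
    """Server pushes each message over an open connection; client just listens."""
    server = socket.create_server(("127.0.0.1", 0))  # port 0: pick a free port
    port = server.getsockname()[1]

    def serve():
        conn, _ = server.accept()
        with conn:
            for msg in messages:
                # The server speaks first; no request from the client is needed.
                conn.sendall(msg.encode() + b"\n")

    worker = threading.Thread(target=serve)
    worker.start()

    received = []
    with socket.create_connection(("127.0.0.1", port)) as client:
        with client.makefile("r", encoding="utf-8") as stream:
            for line in stream:  # blocks until the server pushes or closes
                received.append(line.rstrip("\n"))

    worker.join()
    server.close()
    return received
```

Without the open connection, the client would have to keep reconnecting and asking "anything new?", which is exactly the overhead we were trying to avoid.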

Disadvantages?
We mentioned the advantages of persistent connections, but are there any disadvantages? Yes, there is no free lunch, apparently.

When using persistent connections, you keep each connection alive on both the client and the server, so the more connections you have open, the more memory you eat up. So you have to be smart about closing idle connections.
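
One simple way to be smart about it is an idle timeout: remember when each connection was last used and close the ones that have been quiet for too long. The sketch below is only an illustration. `IdleTracker`, `touch` and `reap` are names I made up, and the connection objects are assumed to be hashable and to have a `close()` method (as sockets do).

```python
import time


class IdleTracker:
    """Hypothetical helper that closes connections idle longer than a timeout."""

    def __init__(self, idle_timeout, clock=time.monotonic):
        self.idle_timeout = idle_timeout  # seconds of silence we tolerate
        self._clock = clock               # injectable so it can be tested
        self._last_used = {}              # connection -> time of last activity

    def touch(self, conn):
        """Call this whenever the connection sends or receives something."""
        self._last_used[conn] = self._clock()

    def reap(self):
        """Close and forget every connection that has been idle too long."""
        now = self._clock()
        stale = [conn for conn, last in self._last_used.items()
                 if now - last > self.idle_timeout]
        for conn in stale:
            conn.close()
            del self._last_used[conn]
        return stale
```

A server would call `touch` on every request and run `reap` periodically, for example once per second from a background thread. Picking the timeout is a trade-off: too short and you are back to reopening connections all the time, too long and idle connections pile up in memory.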

Another problem that came with persistent connections is specific to TCP. In a nutshell, now that we keep connections alive on the server, attackers came up with this idea: "Hey, what if we made the server run out of memory by establishing millions of connections and never replying?" This is the idea behind a family of denial-of-service (DoS) attacks, and when the connections come from many machines at once, it is called a distributed denial-of-service (DDoS) attack. Again, we can touch more on this in another post.

-Hussein