Crash Course in Performance Tuning

One of my buddies runs a very popular sports blog that was hitting a performance bottleneck, and he was shopping around for a dedicated host. He wasn’t clear on how all the hardware specs would help him scale. I sent him this email hoping to give him some insight. I figured it was a good crash course on performance tuning concepts and might be useful to other people out there.

A quick crash course in performance tuning. When running a high workload, you’re eventually going to hit a “bottleneck”: some limiting factor that doesn’t allow you to run any faster. Think of it as the speed limit on a highway. You can fix a bottleneck by giving it more resources, which makes the site run faster until it hits the next bottleneck. You then address that bottleneck, and keep going until there is no clear bottleneck left.

There are four common bottlenecks: CPU, memory, network, and disk speed.
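The “fix one, find the next” loop can be made concrete with a toy Python model. It assumes each resource can be summarized by one capacity, meaning how many pageviews per second it could sustain on its own. The numbers are made up for illustration, and real resources interact in messier ways.

```python
def bottleneck(capacities: dict[str, float]) -> tuple[str, float]:
    """Return the resource with the lowest capacity and that capacity.

    Overall throughput can't exceed the slowest resource, just as a trip
    is limited by the lowest speed limit on the highway.
    """
    name = min(capacities, key=lambda r: capacities[r])
    return name, capacities[name]


# Illustrative (made-up) pageviews/second each resource could sustain alone.
capacities = {"cpu": 120.0, "memory": 90.0, "network": 400.0, "disk": 25.0}

print(bottleneck(capacities))  # ('disk', 25.0)

# Relieve the disk (e.g. by caching hot data in memory) and the limit
# moves to the next-slowest resource.
capacities["disk"] = 1000.0
print(bottleneck(capacities))  # ('memory', 90.0)
```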

Let’s go in reverse order:

Disk speed: This is by far the biggest issue for most websites. However, it is usually easy to address by taking the disks out of the equation and moving commonly used items into much faster memory. You can move items into memory by adding Squid for images, caching data in memcached so you don’t hit the database every time, or giving MySQL/InnoDB a larger buffer pool.
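The following minimal Python sketch shows the idea. It uses the standard library’s `functools.lru_cache` as an in-process stand-in for a memory cache such as memcached. `read_from_disk` is a hypothetical placeholder for any slow, disk-bound read, and the sleep only simulates latency.

```python
import functools
import time


def read_from_disk(item_id: int) -> str:
    """Hypothetical slow read, e.g. an uncached database or file lookup."""
    time.sleep(0.02)  # simulated disk latency
    return f"item-{item_id}"


@functools.lru_cache(maxsize=1024)
def read_cached(item_id: int) -> str:
    # Only the first call per item_id pays the disk cost; repeats are
    # answered from memory until the entry is evicted.
    return read_from_disk(item_id)


start = time.perf_counter()
for _ in range(100):
    read_cached(7)
print(f"100 reads took {time.perf_counter() - start:.3f}s")
print(read_cached.cache_info())  # CacheInfo(hits=99, misses=1, maxsize=1024, currsize=1)
```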

Network: There are two aspects to the network: the hardware and the bandwidth. The term “bandwidth,” as hosts use it, is misleading. It is just how _much_ data you are allowed to transfer over the course of a month. It says nothing about how _fast_ that data transfers. The speed of a network transfer depends on the hardware, such as the speed of the network port.
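A quick Python calculation shows why the two numbers measure different things. It assumes a 30-day month and decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second). The 100 Mbps port and the 1,000 GB quota are illustrative values, not any particular host’s plan.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def max_monthly_transfer_gb(port_mbps: float) -> float:
    """Data a port could move if it ran flat out all month, in GB."""
    bytes_per_second = port_mbps * 1_000_000 / 8
    return bytes_per_second * SECONDS_PER_MONTH / 1_000_000_000


def average_rate_mbps(quota_gb: float) -> float:
    """Average rate needed to use up a monthly quota evenly, in Mbps."""
    bits = quota_gb * 1_000_000_000 * 8
    return bits / SECONDS_PER_MONTH / 1_000_000


print(max_monthly_transfer_gb(100))      # 32400.0
print(f"{average_rate_mbps(1000):.2f}")  # 3.09
```

So a 1,000 GB plan only averages about 3 Mbps over a month. Each individual page still downloads at whatever speed the port and the rest of the network path allow.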

Memory: Every visitor to the website uses a certain amount of server memory while the server builds and sends them a webpage. That memory stays tied up for as long as the page is still being sent (i.e. while the connection is open). If the website isn’t sending pages fast enough, each visitor holds that memory longer, and as visitors pile up you eventually run out. Memory is also used for caching to relieve the disk bottleneck.
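You can estimate how slow responses eat memory with Little’s law: the average number of open connections equals the arrival rate times how long each connection stays open. The Python sketch below assumes every connection holds a fixed amount of memory. The 25 MB per connection and the 20 requests/second are assumed figures, not measurements.

```python
def memory_needed_mb(requests_per_second: float,
                     seconds_per_request: float,
                     mb_per_connection: float) -> float:
    # Little's law: average concurrent connections = arrival rate x time open.
    concurrent_connections = requests_per_second * seconds_per_request
    return concurrent_connections * mb_per_connection


# Same traffic (20 requests/second), 25 MB per connection (assumed).
fast = memory_needed_mb(20, 0.5, 25)  # pages sent in half a second
slow = memory_needed_mb(20, 5.0, 25)  # pages take five seconds to send

print(fast)  # 250.0
print(slow)  # 2500.0, more than a 2 GB box has
```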

CPU: The goal of all performance tuning is to make the CPU the bottleneck. The reason is that with a CPU bottleneck, you can throw more hardware at the problem and it will fix things. That is not typically the case with the other bottlenecks above.

Dedicated vs. Virtual hosting: There are _major_ advantages to not virtualizing. Virtualization does make things easier to manage, but it adds overhead that can slow the hardware down a lot, so the same hardware cannot run nearly as fast. Additionally, unless it’s “dedicated-virtual,” there are other websites on the same hardware. Those websites compete for the same hardware resources and keep yours from running as fast. A “dedicated” server makes sure the hardware is dedicated to your website only.

I Want to Go Fast: Speeding Up the Server Side

One of my websites has started getting a fair amount of traffic (approaching 1 million pageviews a month), and the load is high enough that I decided it needed its own server. This post shows how I scaled the server and website to handle even a front-page Digg.

The New Server:

2x Xeon 3.2 GHz Dual Core (4 cores total)
2 GB Memory
2x 73GB SCSI Drives

Software:

Linux: CentOS 5.0
Apache: 2.2
MySQL: 4.3 (on a separate server)
PHP: 5.1.6
Webpage: CakePHP Framework 1.2.2.8120

Website workload:

95% Read / 5% Write

Benchmark:

Apache Bench: ab -c 20 -n 1000 http://webpage (1,000 requests, 20 concurrent)

Base: 8.68 Pageviews/Second

Admittedly, the webpage is heavy. I took a stock Apache config, and on my first run I got 8.68 pageviews/second. I was pretty happy with this; however, my CPU utilization was 100%. While that is normally a good thing, I knew the CakePHP framework was eating up a lot of unnecessary CPU cycles. My next step was to fix CakePHP up a little.

CakePHP CacheTime Patch: 9.40 Pageviews/Second

There is a small performance bug in the code where it processes an entire page for caching even though the page won’t be cached.  Here’s the patch.

Even though this patch shortened the code path, we were still doing a lot of code processing. While PHP is relatively light, it is still an interpreted language, and interpreted languages can usually be sped up by caching their compiled form. In PHP’s case, that means turning on APC.

Turn on APC: 18.35 Pageviews/Second

APC caches PHP bytecode so it doesn’t need to be compiled on every single run. This nearly doubles performance and drops CPU utilization to 85%, which means we have a new bottleneck. Originally I thought it was disk IO, but iostat shows little disk activity. And since I’m running Apache Bench locally, it’s not the network. That leaves the database. Even though the database runs on another server whose CPU is only 30% utilized, there is latency involved in going to another server. I figure it’s time to fire up memcached and point CakePHP at it (this proves to be a mistake).

CakePHP using memcached: 17.20 Pageviews/Second

I actually lose performance running Cake’s caching through memcached. After some investigation, I see that Cake only uses the cache to store paths. With the default file cache backend, those files are still sitting in memory (in the OS file cache), and a file open/read is much quicker than making a connection to memcached and pulling the information over. I turn memcached back off for Cake’s caching.

Going back to the drawing board, I turn on database profiling and see that session management is taking 500 milliseconds! (I had been using the database for storing sessions.) After some investigation, I find that CakePHP has an undocumented way of storing sessions in its cache. This time I decide to use the APC cache instead of memcached (using APC for Cake’s internal path/model caching has no performance impact). I can get away with APC for session caching since I only have one webserver. If I ever have more than one webserver, I will have to move to memcached.
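A toy Python model shows why the single-webserver caveat matters. `LocalSessionStore` is a hypothetical class standing in for a per-server cache such as APC. The session id and user data are made up.

```python
from typing import Optional


class LocalSessionStore:
    """Per-server session storage, like keeping sessions in APC."""

    def __init__(self) -> None:
        self._sessions: dict[str, dict] = {}

    def save(self, session_id: str, data: dict) -> None:
        self._sessions[session_id] = data

    def load(self, session_id: str) -> Optional[dict]:
        return self._sessions.get(session_id)


# One webserver: the session saved at login is there on the next request.
web1 = LocalSessionStore()
web1.save("abc123", {"user": "alice"})
print(web1.load("abc123"))  # {'user': 'alice'}

# A second webserver with its own local store: a request routed there
# finds no session, so the user looks logged out.
web2 = LocalSessionStore()
print(web2.load("abc123"))  # None

# Toy fix: both servers share one store object, the role memcached
# would play in a multi-server setup.
shared = LocalSessionStore()
servers = {"web1": shared, "web2": shared}
servers["web1"].save("abc123", {"user": "alice"})
print(servers["web2"].load("abc123"))  # {'user': 'alice'}
```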

Session Caching in APC: 20.55 Pageviews/Second

The session caching did improve performance, and it also pushes my CPU utilization up to 95%. Still no disk IO. Looking at the database profiles, I see a few queries that are a bit heavy, so I decide to integrate memcached into my CakePHP models. Based on some code I found here, I reworked it so it could be used in models, which are a more appropriate place for it than the controller.
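The post doesn’t include the model code, so here is a hedged Python sketch of the same pattern: cache-aside around a heavy model query. `TTLCache` is a small in-process stand-in for memcached. `cached_model_query` and `ArticleModel` are hypothetical names, not CakePHP or memcached APIs.

```python
import functools
import time
from typing import Any, Callable


class TTLCache:
    """In-process stand-in for memcached: entries expire after ttl seconds."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (time.monotonic() + ttl, value)


cache = TTLCache()


def cached_model_query(ttl: float) -> Callable:
    """Cache-aside decorator for an expensive model method.

    The key is built from the method name and its arguments (not `self`),
    so every model instance shares the cached result. A result of None is
    never cached, because None also signals a miss.
    """
    def decorator(method: Callable) -> Callable:
        @functools.wraps(method)
        def wrapper(self, *args: Any) -> Any:
            key = f"{method.__qualname__}:{args!r}"
            value = cache.get(key)
            if value is None:
                value = method(self, *args)  # miss: run the heavy query
                cache.set(key, value, ttl)
            return value
        return wrapper
    return decorator


class ArticleModel:
    db_queries = 0  # counts how often the "database" is actually hit

    @cached_model_query(ttl=60)
    def top_articles(self, limit: int) -> list[str]:
        ArticleModel.db_queries += 1
        return [f"article-{i}" for i in range(limit)]


model = ArticleModel()
model.top_articles(5)
model.top_articles(5)
print(ArticleModel.db_queries)  # 1
```

With a 95% read / 5% write workload, a TTL bounds how stale cached results can get without explicit invalidation. Deleting the key on writes is the usual complement when staleness matters.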

Memcache for heavy calls: 21.55 Pageviews/Second

This helped some, and brought my CPU utilization to just over 99%. You would think this is good enough, since it could handle a front-page Digg a few times over, but I didn’t want to take any chances. I needed to shorten the code path to get pageviews up. That’s when I decided to code up “Lock Down Mode”.

If the server gets hit hard enough that the load goes above 10, the code switches over to “Lock Down Mode”. This cuts much of the site’s functionality and makes most pages static instead of dynamic. It is good enough for 99% of users and keeps the site up.
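Here is a rough Python sketch of the switch. It assumes “load” means the 1-minute Unix load average, read with `os.getloadavg()`, which is not available on every platform. `render_page`, `static_pages`, and `render_dynamic` are hypothetical names; the post doesn’t say how its static pages are produced.

```python
import os
from typing import Callable

LOAD_THRESHOLD = 10.0  # from the post: lock down when load goes above 10


def current_load() -> float:
    """1-minute load average, or 0.0 where the OS can't report it."""
    try:
        return os.getloadavg()[0]
    except (OSError, AttributeError):
        return 0.0


def should_lock_down(load: float, threshold: float = LOAD_THRESHOLD) -> bool:
    return load > threshold


def render_page(path: str,
                static_pages: dict[str, str],
                render_dynamic: Callable[[str], str]) -> str:
    # Under heavy load, serve a pre-built static copy when one exists,
    # skipping the framework and database entirely.
    if should_lock_down(current_load()) and path in static_pages:
        return static_pages[path]
    return render_dynamic(path)


static = {"/": "<html>front page (static)</html>"}
print(render_page("/", static, lambda p: f"<html>dynamic {p}</html>"))
```

In practice you might add some hysteresis, leaving lock-down only once the load drops well below 10, so the site doesn’t flap between modes. The post doesn’t say whether its implementation does this.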

Lock Down Mode: 108.70 Pageviews/Second

This seemed to do the trick. That should be enough to handle Digg, Slashdot, StumbleUpon and Reddit all at once.

Conclusion

While my work isn’t done, this is good enough for now. Going from 8.68 pageviews/second to 108 pageviews/second is not too bad. The secret to scaling a website is finding the current bottleneck and then figuring out how to address it.
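The short Python snippet below puts the post’s own numbers side by side, computing each measurement relative to the 8.68 pageviews/second base. It compares against the base rather than the previous step because the memcached experiment was reverted.

```python
BASE = 8.68  # pageviews/second with the stock config

results = [
    ("CakePHP CacheTime patch", 9.40),
    ("APC", 18.35),
    ("Cake cache in memcached (reverted)", 17.20),
    ("Session caching in APC", 20.55),
    ("Memcache for heavy calls", 21.55),
    ("Lock Down Mode", 108.70),
]

for name, pvs in results:
    print(f"{name:36} {pvs:7.2f} pv/s  {pvs / BASE:5.1f}x base")

print(f"overall: {results[-1][1] / BASE:.1f}x")  # overall: 12.5x
```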

Maybe one of these days I’ll write a post on speeding up the client side and moving to multiple servers.

Disclaimer: Yes, I know this is not a truly valid test: I ran Apache Bench locally, so it was taking up CPU resources itself and its requests never went over the network. But it was good enough to show relative performance improvements. Additionally, there were numerous small tweaks along the way that I didn’t mention, plus many webpage-specific performance improvements. Before all the website improvements, the “base” number was probably closer to 5 pageviews a second.

The Last Lecture

Wisdom:

Genetic Lib “press”

This article appeared in an Australian magazine

Inside the self-tuning “Genetic” Linux

In genetic-library updates: I/O workload fingerprinting should be released to LKML soon. Brandon Philips has a running version of the O(1) plugin, with a small positive improvement. That should follow shortly.

Genetic Library Updates

So, I’ve been pretty busy lately with MBA school, book writing, a full work plate, businesses to run, personal issues, and attempting a life on top of all that. As a result, the genetic library has suffered from a lack of updates. But enough excuses… what are the plans for reviving this patchset?

I have some good news. IBM has generously given me one person-year (PY) of help to work on the genetic library, which I will be overseeing. Unfortunately, I will still have my normal work responsibilities to attend to, so only my “free work time” will be available. This PY will attempt to create a plugin for the O(1) scheduler, to improve the chances that the genetic lib ever makes it into the kernel. Resumes are being accepted as we speak. :)

In other good news, I have an “I/O Workload Fingerprinting” feature coming to the genetic library, which will enable faster convergence. This feature is the main topic of my 2006 OLS paper, so I will actually have to do it.

Stay tuned for updates.

Who is Your Hero?

It strikes me that too many of America’s youth idolize sports stars as their heroes in life. Unfortunately, this idolization extends past the star’s abilities in the arena to their personal life. Needless to say, stars like Kobe Bryant and Mark McGwire are not model citizens.

A hero should be someone you want to emulate, someone who has done something to be proud of.

My question is: why don’t more children see their parents as heroes? From fathers serving in Iraq to mothers working three jobs to support their families, these are America’s heroes.

While I do not believe kids should grow up thinking that working three jobs is something to emulate, I do believe the mentality of doing what it takes is something to be proud of and something to strive for.

Perspective

Sometimes the best way to prove a point is through an example. It is a little long, but intriguing until the end. Take a look at a simpler life:

   Body Ritual among the Nacirema

Read the comment below after reading the study.

T-minus 6 days

Only 6 days until I start B-School at the University of Texas. The first week I go down to San Antonio for a Business Immersion Course (BIC). It’s basically business boot camp for us non-business types.

I’ll soon find out if my summer of business-book reading actually helped.

I’m excited about it right now, but I’m sure my tune will change once I’m studying for final exams again. My first classes are accounting and operations management. How hard can accounting be for an engineer? (famous last words)

genetic-lib 2.6.12-gl2 posted

After much work with Peter Williams and Con Kolivas, I finally have a decently performing genetic-lib implementation on interbench. It does orders of magnitude better in the burn & compile pieces. What really trips up Zaphod and the genetic-lib is one interactive child running in the background alongside a bunch of threads spinning in while(1) {}. The interactive task does not get classified as interactive, and it misses many of its checkpoints while the CPU hogs run. This latest version has a few bug fixes and detects interactive tasks sooner. The one thing to watch is that, with the quicker detection of CPU hogs, a burst of CPU activity from X can get it classified as a CPU hog. Please let me know if you see that happen.

I’ve been using interbench a lot lately, and it does an excellent job of catching the “bad children”. I can see almost immediately when some child gets mutated off into the woods and performance tanks. Con did a great job with the benchmark.
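For readers unfamiliar with the approach, here is a generic genetic-algorithm tuning loop in Python. It is not genetic-lib’s code. The single tunable (`timeslice_ms`), the synthetic fitness function that peaks at 40, and the population settings are all assumptions. In a runtime tuner, a child’s fitness would instead come from measuring the workload while that child’s settings are active. Keeping the best half of each generation is one simple way to stop a child that has mutated “off into the woods” from dragging performance down.

```python
import random


def fitness(timeslice_ms: float) -> float:
    # Hypothetical score that peaks at 40 ms; stands in for a measured
    # throughput or latency figure collected while a child runs.
    return -abs(timeslice_ms - 40.0)


def evolve(generations: int = 50, population: int = 8, seed: int = 1) -> float:
    rng = random.Random(seed)
    children = [rng.uniform(1.0, 200.0) for _ in range(population)]
    for _ in range(generations):
        # Rank children by fitness; the better half survives unchanged.
        children.sort(key=fitness, reverse=True)
        survivors = children[: population // 2]
        # Refill the population: cross two survivors, then mutate slightly.
        offspring = []
        while len(survivors) + len(offspring) < population:
            a, b = rng.sample(survivors, 2)
            child = (a + b) / 2 + rng.gauss(0.0, 5.0)
            offspring.append(min(max(child, 1.0), 200.0))  # clamp to range
        children = survivors + offspring
    return max(children, key=fitness)


best = evolve()
print(f"best timeslice: {best:.1f} ms")
```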

What is Your Most Valued Commodity?

Mine is time. With more and more people vying for bits of it, I have realized that time is a finite resource and should be cherished. In the last few years, I have noticed a fundamental difference in how I think. In the past I valued money above time. I would rather spend a day putting in a garage door opener than pay someone $60 to do it. Now, $60 for a free day sounds like a great trade-off.

That’s not to say I just throw money around to save my time. I don’t mind spending an extra hour negotiating with a car salesman to save $1,000 (plus I find it enjoyable). The trade-off between how much money is saved and how much time it takes must always be made.

There are other intangible commodities that the time trade-off applies to, mainly relationships. It is very important to take time to maintain relationships, as they are what will get you through when times are tough. They are also one of your greatest assets. Having a friend who knows a friend goes a long way in developing a career or a business.

The final time trade-off that I am always making is an emotional investment. To keep from burning out, and losing all the time that goes along with it, you need ways to decompress. Some people use television, others use sports. Whatever the method, the time traded for recreation goes a long way.

Update: As one co-worker pointed out, I am probably not old enough yet, but in time, my most valued commodity will probably change to health.
