Page-level caching with Nginx 0.6

In a further attempt to modify my websites so that they can withstand the Digg Effect, I have looked into getting Nginx, a lightweight HTTP server, to perform page-level caching.

Nginx can act as a reverse proxy, forwarding any HTTP request it receives to another web server. It can also store the response in a file, which it can then serve directly on future requests.

Update 2010-04-10 – Nginx >= 0.7 now has proxy_cache* directives. To do this in those versions, see Page-level caching with Nginx.

So I installed Nginx:

cd /usr/src
wget http://sysoev.ru/nginx/nginx-0.6.29.tar.gz
tar zxvf nginx-0.6.29.tar.gz

cd nginx-0.6.29
./configure --with-http_gzip_static_module --with-http_ssl_module
make
make install

Next, I moved Apache over to another port, e.g. 8080. To do this I changed the Listen and NameVirtualHost directives to use this port, and updated the VirtualHost blocks to match. I also added the following, so that the client address in the X-Real-IP header that Nginx sends to Apache is made available to scripts:

SetEnvIf X-Real-IP "^(.*)$" REMOTE_HOST=$1 REMOTE_ADDR=$1

I edited /usr/local/nginx/conf/nginx.conf, and commented out the default server, and added:

include vhosts/*.conf;

This lets you keep each Nginx server configuration in its own file.

I also set Nginx’s user to apache, so that Apache can also write to the cache files.

I made the /usr/local/nginx/vhosts directory, and created a new conf:

server {
    listen 80;
    server_name www.example.com;

    # if the request uri was a directory, store the index page name
    if ($request_uri ~ /$) {
        set $store_extra index.html;
    }

    # proxy module defaults
    proxy_store_access   user:rw  group:rw  all:r;
    proxy_set_header  X-Real-IP  $remote_addr;
    proxy_set_header  Host       $host;

    # if a precompressed gzip of the file exists, use it and force http proxies
    # to use separate caches based on User-Agent
    gzip_vary on;
    gzip_static on;

    location / {
        root /var/www/${host}/cache;
        index index.html;

        # set the location the proxy will store the data to. Add the index page
        # name if the uri was a directory (Nginx can't normally store these)
        proxy_store $document_root${request_uri}${store_extra};

        # go through the proxy if there is no cache
        if (!-f $document_root${request_uri}${store_extra}) {
            proxy_pass http://localhost:8080;
        }

        # workaround. headers module doesn't take into account proxy response
        # headers. It overwrites the proxy Cache-Control header, causing
        # private/no-cache/no-store to be wiped, so only set if not using proxy
        if (-f $document_root${request_uri}${store_extra}) {
            expires 0;
        }
    }

    # don't cache admin folder, send all requests through the proxy
    location /admin {
        proxy_pass http://localhost:8080;
    }

    # handle static files directly. Set their expiry time to max, so they'll
    # always use the browser cache after first request
    location ~* \.(css|js|png|jpe?g|gif|ico)$ {
        root /var/www/${host}/http;
        expires max;
    }
}
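
To make the proxy_store line easier to follow, here is a minimal shell sketch of how a request URI maps to a file in the cache directory under the configuration above. The function name cache_path is hypothetical and is not part of Nginx; it only mirrors the $document_root${request_uri}${store_extra} expression.

```shell
# Hypothetical helper (not part of Nginx): mirrors the expression
#   $document_root${request_uri}${store_extra}
# used by proxy_store in the configuration above.
cache_path() {
    doc_root=$1
    request_uri=$2
    # Same test as: if ($request_uri ~ /$) { set $store_extra index.html; }
    case "$request_uri" in
        */) store_extra=index.html ;;
        *)  store_extra= ;;
    esac
    printf '%s%s%s\n' "$doc_root" "$request_uri" "$store_extra"
}

cache_path /var/www/www.example.com/cache /blog/
# prints /var/www/www.example.com/cache/blog/index.html
cache_path /var/www/www.example.com/cache /about
# prints /var/www/www.example.com/cache/about
```

Note that $request_uri includes any query string, so a URI such as /blog/?page=2 does not end in a slash and would be stored under its own file name.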

Next I made the cache directory, and set the ownership so that Nginx could write to it, and started up Nginx:

mkdir /var/www/www.example.com/cache
chown apache /var/www/www.example.com/cache

/usr/local/nginx/sbin/nginx

All was working fine until I tested some pages that I didn’t want cached. PHP had set the Cache-Control headers so that proxies wouldn’t cache or store them (private, no-cache, no-store), but Nginx was storing them anyway. I modified Nginx’s source so that it wouldn’t do this. The source can be found at the end of this blog entry.
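
The rule itself is simple, and can be sketched in shell. This is only an illustration of the decision, not the patch: the function name is_cacheable is hypothetical, and the substring match is deliberately crude.

```shell
# Hypothetical helper, not the Nginx patch: decide from a Cache-Control
# value whether a proxied response may be stored on disk.
is_cacheable() {
    # Lower-case the value so the match is case-insensitive.
    value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$value" in
        *private*|*no-cache*|*no-store*) return 1 ;;  # must not be stored
        *) return 0 ;;                                 # safe to store
    esac
}

for header in "private, no-cache, no-store" "max-age=3600" ""; do
    if is_cacheable "$header"; then
        echo "store: '$header'"
    else
        echo "skip:  '$header'"
    fi
done
```

A response with no Cache-Control header at all is treated as storable here, which matches the behaviour described above for ordinary pages.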

Next, to address the Internet Explorer 6 bug where pages served with the HTTP header “Vary: Accept-Encoding” are not cached, I modified the code to send “Vary: User-Agent” instead. I have also included a patch for this below.

Now I have a fully static website, which will work with the minimum of resources, and will work even if Apache or MySQL go down. All that’s left is:

  • Store extra proxy response headers, such as Content-Type, to be played back. My sites don’t in fact use file extensions for dynamic content, so Nginx can’t match up the file to its associated mime type. At the moment I’m forcing the default type to text/html.
  • Invalidate the appropriate parts of the cache when the dynamic content has changed, otherwise I have to manually delete the cache files to get updates.
  • Store the entire site in the cache. At the moment, Nginx is only using the cache if it has already stored the proxy response.
  • Store a gzipped version of the cache for browsers that support gzipped content.
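
Until those items are done properly, the invalidation and gzip steps can be approximated by hand. The following is a rough shell sketch under the directory layout used above; the function names purge_cached and precompress_cache are hypothetical, and the demo runs in a temporary directory rather than the real cache.

```shell
# Hypothetical helpers; names and layout are assumptions based on the
# cache directory used in this post.

# Remove the stored copy of one URI (and its .gz sibling, if any), so the
# next request for it goes back through the proxy to Apache.
purge_cached() {
    cache_root=$1
    uri=$2
    case "$uri" in
        */) file=$cache_root${uri}index.html ;;
        *)  file=$cache_root$uri ;;
    esac
    rm -f -- "$file" "$file.gz"
}

# Write a .gz copy next to every cached file that lacks one, so that
# gzip_static has something to serve.
precompress_cache() {
    cache_root=$1
    find "$cache_root" -type f ! -name '*.gz' | while IFS= read -r file; do
        [ -f "$file.gz" ] || gzip -9 -c "$file" > "$file.gz"
    done
}

# Demo in a throwaway directory.
demo=$(mktemp -d)
mkdir -p "$demo/blog"
echo '<html>hello</html>' > "$demo/blog/index.html"
precompress_cache "$demo"      # creates blog/index.html.gz
purge_cached "$demo" /blog/    # removes both copies
rm -rf "$demo"
```

Purging removes the .gz copy along with the original so that a stale compressed page is never served after the content changes.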

The patch files I have created are for the latest version (0.6.29). To patch your copy, run:

cd /usr/src/nginx-0.6.29
patch -p0 < /path/to/patch.diff