Lessons Learned from Our Most Challenging Projects
Ever heard the phrase “clear your cache” and wondered what this voodoo magic is? In most situations, people are referring to your browser cache, which you clear so that you can see the latest data or content on a website or application. There are, however, four major types of caching typically used in web development and hosting.
Let’s dive in and explore them.
Web Caching (Browser/Proxy/Gateway):
Browser, proxy, and gateway caching work differently but have the same goal: to reduce overall network traffic and latency. Browser caching is controlled at the individual user level, whereas proxy and gateway caching operate on a much larger scale. The latter two allow cached information to be shared across larger groups of users. Commonly cached data includes DNS (Domain Name System) data, which is used to resolve domain names to IP addresses and mail server records. This type of data changes infrequently and is best cached for longer periods of time by the proxy and/or gateway servers. Browser caching helps users quickly navigate pages they have recently visited. This caching feature is free to take advantage of, yet it is often overlooked by most hosting companies and many developers. It relies on response headers such as Cache-Control and ETag: Cache-Control instructs the user’s browser to cache certain files for a certain period of time, and ETag lets the browser check whether its cached copy is still current.
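To make the header part concrete, here is a minimal sketch in Python of how a server might attach Cache-Control and ETag to a static file and answer a revalidation request. The function names (make_etag, respond) and the one-hour max-age are assumptions chosen for illustration; a real web server or framework usually handles this for you.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive an ETag from the file contents (HTTP expects it in quotes)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(
    body: bytes,
    if_none_match: Optional[str] = None,
    max_age: int = 3600,  # assumed lifetime: let browsers reuse the file for one hour
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a cacheable file."""
    etag = make_etag(body)
    headers = {
        "Cache-Control": f"public, max-age={max_age}",
        "ETag": etag,
    }
    # The browser sends If-None-Match with the ETag it already holds.
    # If the file is unchanged, answer 304 with no body and save the transfer.
    if if_none_match == etag:
        return 304, headers, b""
    return 200, headers, body


css = b"body { color: #333; }"
first_status, first_headers, _ = respond(css)
second_status, _, second_body = respond(css, if_none_match=first_headers["ETag"])
```

The first request gets the full file with a 200 status. The second, which presents the ETag from the first response, gets a 304 with an empty body, so the browser keeps using its cached copy.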
Data Caching:
Data caching is a very important tool when you have database-driven applications or CMS solutions, and this is my favorite type of caching. It’s best used for frequent calls to data that does not change rapidly. Data caching will help your website or application load faster, giving your users a better experience. It does this by avoiding extra trips to the DB to retrieve data sets that it knows have not changed. It stores the data in local memory on the server, which is the fastest way to retrieve information on a web server. The database is the bottleneck for almost all web applications, so the fewer DB calls the better. Most DB solutions will also make an attempt to cache frequently used queries in order to reduce turnaround time. For example, MS SQL Server caches execution plans for stored procedures and queries to speed up processing time. It’s standard practice to clear any cached data after the underlying data has been altered. This way the CMS’s front end will always have the most recent data and will not need to hit the database each time a user hits a page. Note: Overuse of data caching can cause memory issues if you create a loop that is constantly adding data to and removing data from the cache. However, when this technique is coupled with AJAX requests, which do partial page loads, you can dramatically improve your users’ experience and reduce their wait time.
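The pattern described above (read from memory first, go to the DB only on a miss, and clear the entry when the data changes) can be sketched in a few lines of Python. Everything here is hypothetical: DataCache, articles_db, get_title, and update_title are names invented for the example, and a plain dictionary stands in for the real database.

```python
import time
from typing import Any, Callable, Dict, Tuple


class DataCache:
    """A tiny in-memory data cache with a time-to-live per entry."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value, calling loader() only on a miss or expiry."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()  # the "trip to the DB"
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop an entry so the next read fetches fresh data."""
        self._store.pop(key, None)


articles_db = {42: "Original title"}  # stand-in for a real database table
cache = DataCache(ttl_seconds=60)


def get_title(article_id: int) -> str:
    return cache.get_or_load(f"article:{article_id}",
                             lambda: articles_db[article_id])


def update_title(article_id: int, title: str) -> None:
    articles_db[article_id] = title
    cache.invalidate(f"article:{article_id}")  # clear the cache after altering data
```

The time-to-live is a safety net for data that changes outside your code, and it also stops stale entries from being served forever. The explicit invalidate call in update_title is what keeps the front end current right after an edit.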
Most CMSs have built-in cache mechanisms; however, many users don’t understand them and simply ignore them. It’s best to understand what data cache options you have and to implement them whenever possible.
Application/Output Caching:
Application/output caching can drastically reduce your website load time and reduce server overhead. Unlike data caching, which stores raw data sets, application/output caching often utilizes server-level caching techniques that cache raw HTML. It can cover a whole page of data, parts of a page (headers/footers), or module data, but what gets cached is usually HTML markup. I have personally used this technique on many sites and CMSs, and have seen page load times reduced by more than 50%.
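As a rough illustration of caching part of a page, the Python sketch below stores the rendered HTML of a footer once and reuses it on every later page render. The names (cached_fragment, render_footer, render_page) are made up for this example, and a real CMS would also expire or clear these fragments when their content changes.

```python
import html
from typing import Callable, Dict

_fragment_cache: Dict[str, str] = {}  # cache key -> rendered HTML markup


def cached_fragment(key: str, render: Callable[[], str]) -> str:
    """Return stored markup for key, rendering it only the first time."""
    if key not in _fragment_cache:
        _fragment_cache[key] = render()
    return _fragment_cache[key]


def render_footer() -> str:
    """Pretend this is expensive: template work, menu lookups, and so on."""
    links = ["About", "Contact", "Privacy"]
    items = "".join(f"<li>{html.escape(name)}</li>" for name in links)
    return f"<footer><ul>{items}</ul></footer>"


def render_page(title: str, body: str) -> str:
    """Build a page, reusing the cached footer instead of re-rendering it."""
    footer = cached_fragment("footer", render_footer)
    return (
        f"<html><body><h1>{html.escape(title)}</h1>"
        f"<p>{html.escape(body)}</p>{footer}</body></html>"
    )


home = render_page("Home", "Welcome!")
about = render_page("About", "Who we are")  # footer comes straight from the cache
```

Note the difference from data caching: what is stored here is the finished markup, not the data that went into it, so the server skips both the data lookup and the rendering work.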
Distributed Caching:
Distributed caching is for the big dogs. Most high-volume systems, like Google, YouTube, Amazon, and many others, use this technique. This approach allows the web servers to store data in, and pull it from, the memory of a separate set of distributed servers. Once implemented, it allows the web servers to simply serve pages and not have to worry about running out of memory. The distributed cache can be made up of a cluster of cheaper machines whose only job is serving up memory. Once the cluster is set up, you can add a new machine’s worth of memory at any time without disrupting your users. Ever notice how large companies like Google can return results so quickly when they have hundreds of thousands of simultaneous users? They use clustered distributed caching, along with other techniques, to keep as much data as possible in memory, because memory retrieval is faster than file or DB retrieval. A few popular systems are Memcached for Linux and AppFabric for Windows Server. Although your average website will not need this much horsepower, it’s very cool to see it implemented. I have personally implemented such a system, and it’s quite a powerful structure once deployed.
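One question a distributed cache has to answer is which machine holds a given key, and how to add a machine without reshuffling everything. A common answer is consistent hashing, sketched below in Python. This is an illustration of the general idea only, not how any particular product mentioned above is implemented; the HashRing class and the node names are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _hash(value: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Assigns cache keys to nodes so that adding a node moves few keys."""

    def __init__(self, nodes: Iterable[str], replicas: int = 100) -> None:
        self._replicas = replicas         # virtual points per node, for even spread
        self._ring: List[int] = []        # sorted positions on the ring
        self._owner: Dict[int, str] = {}  # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place the node at several positions around the ring."""
        for i in range(self._replicas):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._ring, point)

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the next node point."""
        idx = bisect.bisect(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("cache-d")  # grow the cluster while it is in use
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

After cache-d joins, only the keys that now belong to it change owner. Every other key stays on the machine it was already on, which is why the cluster can grow without emptying the whole cache.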
I hope this helps explain the four major types of caching. Leave a note/question below if you have any follow-up thoughts.