Let’s say you get 1000 hits per second on your website’s home page, and you want users to feel that the site is as close to real time as possible without overloading your database. These hits display some data from the database and are mostly reads; for every 10 reads there is one write.
Yes, you could cache it for 1 minute and the database wouldn’t be hit that much, but then the site would look like it doesn’t update very often. Now, what happens if you use a 1 second cache? You eliminate most of the reads, and the site still looks like it is updating in real time. A user hitting F5/CTRL + R every couple of seconds will still see new content. Does this make sense? I am interested in your opinion.
[edit]I am talking about caching database results here. For example, take a page like the new submissions page on Digg: a lot of stuff gets submitted every second, and a lot more gets read, of course. If you cache this for one second you eliminate a lot of the reads.[/edit]
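The idea can be sketched as a tiny time-based cache wrapper. This is a minimal illustration, not any particular framework’s API; the decorator name and the fake query are hypothetical:

```python
import time

def micro_cache(ttl=1.0):
    """Cache a zero-argument function's result for `ttl` seconds."""
    def decorator(fn):
        state = {"expires": 0.0, "value": None}
        def wrapper():
            now = time.monotonic()
            if now >= state["expires"]:
                state["value"] = fn()        # hit the database at most once per ttl
                state["expires"] = now + ttl
            return state["value"]
        return wrapper
    return decorator

calls = 0

@micro_cache(ttl=1.0)
def latest_submissions():
    global calls
    calls += 1                               # count actual "database" hits
    return ["story %d" % calls]              # stand-in for a real SELECT

for _ in range(1000):                        # 1000 "hits" within the same second
    latest_submissions()
```

With the numbers from the post, the 1000 requests in that loop all land inside the same one-second window, so the underlying query runs only once.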
6 Comments
I can see that having an effect with the given numbers. But what about a 5 second cache? After all, it takes at least a couple of seconds to load any page in the browser. I think 5 or even 10 seconds would be just as good, unless the app is Ajaxified; then I would go lower.
do you mean browser caching, data caching (e.g. memcached/Application Cache) or output/proxy caching (e.g. squid) ?
I think it depends on what you want to save, and how often the data is modified… if it is modified once a day, then caching for 1 second is a bit daft 🙂 but if it can change within seconds, you probably can’t live with multi-minute or multi-hour caches.
Again, it depends on what you’re caching and where, but the most sensible approach is to consider the lifecycle of the data to understand how long it can be cached. You can of course do explicit event-driven cache refreshing: e.g. when you update dataset x, namespace y, or even object z, replace the cached object (though it depends on whether you’re caching single items or sets, etc.). It can be a useful option.
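Event-driven refreshing, as described above, can be sketched like this: instead of waiting for a TTL to expire, the write path replaces the cached object immediately. Everything here (the in-memory dict standing in for a cache and for the database) is a simplified assumption:

```python
# Write-through cache sketch: writes replace the cached object right away,
# so readers never see a stale copy and reads still avoid the database.
cache = {}

def get_dataset(key, load):
    """Return the cached dataset, loading it from the database on a miss."""
    if key not in cache:
        cache[key] = load()
    return cache[key]

def update_dataset(key, new_value, save):
    """Persist the change, then refresh the cached copy in the same step."""
    save(new_value)
    cache[key] = new_value   # the "event" that refreshes the cache

# Toy "database" and usage:
db = {"x": [1, 2, 3]}
get_dataset("x", lambda: db["x"])                          # primes the cache
update_dataset("x", [1, 2, 3, 4], lambda v: db.update(x=v))  # write + refresh
```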
I think we need more info if you want a better answer, as there is no magic one-size-fits-all solution, and it’s very dependent on the dynamics of the architecture and the data.
Yes, it’s a potentially good idea, as long as you only cache what you actually need. You would need to decide which parts of the page need caching: is it just a certain table of data, a few tables, or the whole page that is changing? You may even decide to have a collection of cached data objects refreshed at different intervals, since some may not have the same priority as others.
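The per-fragment idea in this comment could look something like the following sketch; the fragment names and TTL values are made up for illustration:

```python
# Hypothetical per-fragment cache lifetimes: each cached object on the page
# gets its own TTL, because not every part needs the same freshness.
fragment_ttls = {
    "latest_submissions": 1,    # changes every second, micro-cache it
    "top_stories_today": 60,    # a minute of staleness is acceptable
    "category_sidebar": 3600,   # rarely changes at all
}

def ttl_for(fragment):
    """Look up a fragment's TTL, falling back to a conservative default."""
    return fragment_ttls.get(fragment, 1)
```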
I modified the post because I was not clear in explaining what I was talking about
So, given your 1000 hits and 100 writes, you want to serve up 1 version per second, rather than 100 versions per second?
Insanely brilliant!
No one is going to know (or care) that it’s being updated once per second rather than 100 times per second.
I would totally do it, and I would do it at the output level (squid). This isn’t limited to web interfaces; you can leverage it in high-volume applications as well. My current company has a data-driven distributed application, and I’m looking to migrate as much of the data-driven logic to generated-code logic as possible. Anything we can do to cut data access, even cached data access, yields a massive performance increase. And since the data is static once the application is distributed, there’s no reason to even make data access calls.
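For caching at the output level, the application itself stays simple: it just declares a short freshness lifetime on its responses, and a caching proxy such as squid can then answer repeat requests from its own copy for that second. A minimal sketch, assuming a plain WSGI-style handler (the handler name and body are hypothetical):

```python
# The app emits a 1-second Cache-Control lifetime; a proxy in front of it
# (e.g. squid) may then serve the cached response for that second instead
# of passing every hit through to the application and database.
def home_page(environ, start_response):
    body = b"<html>...latest submissions...</html>"
    start_response("200 OK", [
        ("Content-Type", "text/html"),
        ("Cache-Control", "public, max-age=1"),  # fresh for one second
        ("Content-Length", str(len(body))),
    ])
    return [body]
```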