Tuesday, August 28, 2012

Web caching! How relevant is it?

Why is caching done?

It is done as an optimization step. The motives behind it are to:
  1. Reduce lag at the client side.
  2. Reduce load on the servers.
  3. Reduce the bandwidth the network infrastructure has to support.

Web caching mostly refers to intermediate machines or servers caching the content that is heavily accessed through them.

Caching can be done at different levels.

The following excerpt is from the Wikipedia article Web_cache, which lists some of the places where content is cached.

Web caches can be used in various systems.
  1. A search engine may cache a website.
  2. A forward cache is a cache outside the web-server's network, e.g. on the client software's ISP or company network.
  3. A network-aware forward cache is just like a forward cache but only caches heavily accessed items.
  4. A reverse cache sits in front of one or more Web servers and web applications, accelerating requests from the Internet.
  5. A client, such as a web browser, can store web content for reuse. For example, if the back button is pressed, the local cached version of a page may be displayed instead of a new request being sent to the web server.
  6. A web proxy sitting between the client and the server can evaluate HTTP headers and choose to store web content.
  7. A content delivery network can retain copies of web content at various points throughout a network.
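
Item 6 in the excerpt above, a proxy that evaluates HTTP headers before storing content, can be sketched roughly as follows. This is only a minimal illustration, assuming the proxy looks at nothing but the response's Cache-Control header; the function names are made up for this post, and a real shared cache applies more rules than this.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a dict of lowercase directives."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def shared_cache_may_store(response_headers):
    """Conservative guess at whether a shared cache (proxy, CDN) may store a response.

    Assumption for this sketch: only Cache-Control is consulted, and a
    response with no explicit caching directive is simply not stored.
    """
    cc = parse_cache_control(response_headers.get("Cache-Control", ""))
    # "private" marks content meant for a single user; "no-store" forbids storing.
    if "no-store" in cc or "private" in cc:
        return False
    return "public" in cc or "max-age" in cc or "s-maxage" in cc


if __name__ == "__main__":
    print(shared_cache_may_store({"Cache-Control": "public, max-age=3600"}))
    print(shared_cache_may_store({"Cache-Control": "private, max-age=60"}))
```

A personalized page would typically be sent with the private directive, which is exactly the kind of response this check refuses to store.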

Now, does web caching still have some relevance left?

So let us consider this,

How much of your time on the web do you spend browsing general web pages, i.e. pages that someone else sitting somewhere else would see exactly as you see them?
Mostly the answer is "not much", because most of the pages you use are personalized.

Read a newspaper? You have probably personalized your news to suit your taste.
Read feeds? Again, you have personalized the sources you want to read from.

In these two cases you personalized the look and feel and the presentation, but the content inside is still content that others can see.

But consider these:

Your mail,
Your social networking account,
And many other kinds of accounts.

All of these are strictly personal; they are meant for you only. Going by the rule of caching heavily accessed content, this data will rarely be cached: each personal page is requested by only one person, so it never becomes heavily accessed at a shared cache. And you are not the only one out there trying to access personal content.
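
To make the point concrete, here is a small sketch of a cache that only stores an item after it has been requested a few times. Everything in it is an assumption made for illustration: the class name, the threshold of 3, and the URLs are hypothetical, and real caches use more elaborate admission policies.

```python
from collections import Counter


class PopularityCache:
    """Toy cache that stores a key only once it has been requested `threshold` times."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.seen = Counter()   # request counts for keys not yet stored
        self.store = set()      # keys currently cached
        self.hits = 0
        self.misses = 0

    def request(self, key):
        """Return True on a cache hit, False on a miss."""
        if key in self.store:
            self.hits += 1
            return True
        self.misses += 1
        self.seen[key] += 1
        if self.seen[key] >= self.threshold:
            self.store.add(key)
        return False


def simulate(users=100, threshold=3):
    """Each user reads one shared page once and their own inbox twice."""
    cache = PopularityCache(threshold)
    for user in range(users):
        cache.request("/news/front-page")   # same URL for everybody
        cache.request(f"/inbox/{user}")     # hypothetical per-user URL
        cache.request(f"/inbox/{user}")
    return cache


if __name__ == "__main__":
    cache = simulate()
    print(cache.hits, cache.misses, sorted(cache.store))
```

In this toy run only the shared front page ever crosses the threshold; none of the per-user inbox URLs is stored, so every one of those requests goes back to the server.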

Forward caches, reverse caches, network-aware caches and web proxies don't consider personalization, unless some of these servers are meant just for you (in which case you are rich) or for a small group of people.

That leaves us with three more:

Web browser - this is the best place to cache content from a personalization point of view, but what about data that is constantly changing: your social network feeds, your news, your mail?
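
One way a browser can cope with changing data is to keep its copy but ask the server whether it is still current, using the ETag and If-None-Match headers that HTTP provides for this. The sketch below imitates that exchange in memory, with no real network involved; the Origin and BrowserCache classes are hypothetical stand-ins, and deriving the tag from a hash of the body is just one possible choice.

```python
import hashlib


class Origin:
    """Stand-in for a web server holding one changing resource."""

    def __init__(self, body):
        self.body = body

    def get(self, if_none_match=None):
        # The tag changes whenever the body changes.
        etag = hashlib.sha256(self.body.encode()).hexdigest()[:16]
        if if_none_match == etag:
            return 304, etag, None      # "Not Modified": no body is sent
        return 200, etag, self.body


class BrowserCache:
    """Stand-in for a browser that revalidates its copy on every fetch."""

    def __init__(self, origin):
        self.origin = origin
        self.etag = None
        self.body = None
        self.full_downloads = 0

    def fetch(self):
        status, etag, body = self.origin.get(if_none_match=self.etag)
        if status == 200:
            self.etag, self.body = etag, body
            self.full_downloads += 1
        return self.body


if __name__ == "__main__":
    origin = Origin("feed v1")
    browser = BrowserCache(origin)
    browser.fetch()
    browser.fetch()             # unchanged, answered with 304
    origin.body = "feed v2"
    print(browser.fetch(), browser.full_downloads)
```

The browser still has to contact the server each time, so this saves bandwidth rather than round trips, and for a feed that changes on nearly every visit it saves very little.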

Search engine - this place can be used to cache trending articles and results for the most queried search terms. (How long can the same article keep trending upwards?)

Content delivery - how many of you have not heard of content on demand? It serves the purpose of delivering the content you want, when you want it. If a new movie (or any other content) is released, it will be in high demand in the initial days, and then demand for it gradually reduces.

Since the content you browse is personalized, simply caching heavily accessed content doesn't work! Its relevance has reduced. Caching can be done only to a certain extent. Replicating the servers to multiple places is the way forward.

So my opinion here is that "the relevance of web caching, going by the rule of caching heavily accessed content, has reduced in the era of personalization, if not been completely eliminated".

Do leave your comments and opinions! Let's have an open discussion.
