Tuesday, August 28, 2012

Web caching! How relevant is it?

Why is caching done?

It is done as an optimization step. The motives behind it are to:
  1. Reduce lag at the client side.
  2. Reduce load on the servers.
  3. Reduce the bandwidth the network infrastructure has to support.

Web caching usually refers to intermediate machines or servers caching content that is heavily accessed through them.

Caching can be done at different levels.

The following excerpt from the Wikipedia article Web_cache explains some of the places where content is cached.

Web caches can be used in various systems.
  1. A search engine may cache a website.
  2. A forward cache is a cache outside the web-server's network, e.g. on the client software's ISP or company network.
  3. A network-aware forward cache is just like a forward cache but only caches heavily accessed items.
  4. A reverse cache sits in front of one or more Web servers and web applications, accelerating requests from the Internet.
  5. A client, such as a web browser, can store web content for reuse. For example, if the back button is pressed, the local cached version of a page may be displayed instead of a new request being sent to the web server.
  6. A web proxy sitting between the client and the server can evaluate HTTP headers and choose to store web content.
  7. A content delivery network can retain copies of web content at various points throughout a network.

Now, does web caching still have any relevance left?

So let us consider this:

             How much time do you spend on the web browsing general web pages, i.e. pages that look exactly the same to you as they do to someone else sitting somewhere else?
           Mostly the answer is not much, because most of the pages you use are personalized.

Read the newspaper? You have probably personalized your news to suit your taste.
Read feeds? Again, you have personalized the sources you want to read from.

          In these two places you personalized the look and feel and the presentation, but the content inside is the same content others can see.

But consider this:

Your mail,
your social networking account,
and many other kinds of accounts.

           All these are strictly personal; they are meant for you only. Going by the rule of caching heavily accessed content, this data will rarely be cached. And you are not the only one out there trying to access personal content.

           Forward caches, reverse caches, network-aware caches and web proxies don't take personalization into account, unless some of these servers are meant just for you (in which case you are rich!) or for a small group of people.

That leaves us with three more:

Web browser - this is the best place to cache content from a personalization point of view, but what about data that is constantly changing: your social network feeds, your news, your mail?

Search engine - this can be used to cache trending articles and results for the most queried search terms. (But how long can the same article keep trending upwards?)

Content delivery - how many of you have not heard of content on demand? It delivers the content you want, when you want it. When a new movie (or any other content) is released, it is in high demand during the initial days, and the demand gradually drops.

           Since the content you browse is personalized, simply caching heavily accessed content doesn't work! Its relevance has reduced, and caching can help only to a certain extent. Replicating servers to multiple places is the way forward.

           So my opinion is that "the relevance of web caching, going by the rule of caching heavily accessed content, has reduced in the era of personalization, if it has not been completely eliminated."

Do leave your comments and opinions! Let's have an open discussion.

Thursday, August 23, 2012

Sniffer for Linux

 Introduction 

          To fully understand this article, the reader should know the basic concepts of networking, the terms used in it, and some C programming.

          This article explains how a small program can be used to sniff packets from the network. I worked on this about 2½ years ago but didn't get a chance to share it back then. The content of this article is for Ubuntu and its derivatives. (It can be achieved on Microsoft Windows as well, with some modifications to the code presented here and corresponding changes to the commands.)

Content

           Before I start, keep in mind that whenever I say network, I mean LAN unless specified otherwise.

           The idea presented here is simple: with just a few lines of code and some very good libraries (thanks to the open source community), you can write a powerful tool that sniffs packets out of the network, not just the packets meant for your machine but also the ones addressed to any other machine on the same network.

This is how packets reach the destination computer on a LAN

           Whenever a packet reaches the gateway, the gateway checks whether the destination is one of the computers on the LAN it serves. If so, the destination IP address is resolved to the corresponding MAC address using ARP, and the frames are forwarded onto the LAN, where repeaters or hubs (if any) flood the LAN with them. (A switch, by contrast, normally forwards a frame only to the port where the destination MAC address was learned, so on a switched LAN you mostly see your own traffic plus broadcasts.)

            So every computer on the LAN reads these frames and matches the destination MAC address against its own. If it doesn't match, the frame is discarded; otherwise it is passed up the network stack.

This is what makes it possible to sniff frames

              It is possible to tell the computer to pass every frame up the network stack, whatever its destination MAC address, i.e. its own and others'. If you can achieve this, you are sniffing! This is called promiscuous mode.

So how do we achieve this?

Here is the code to do it. 

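#include <pcap.h>
#include <stdio.h>
#include <ctype.h>          /* isprint() */
#include <netinet/in.h>     /* IPPROTO_* constants, struct in_addr */
#include <arpa/inet.h>      /* ntohs(), inet_ntoa() */
#include <sys/time.h>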
#define SIZE_ETHERNET 14
#define SNAP_LEN 1518

/* IP header */
struct sniff_ip {
  u_char  ip_vhl;                 /* version << 4 | header length >> 2 */
  u_char  ip_tos;                 /* type of service */
  u_short ip_len;                 /* total length */
  u_short ip_id;                  /* identification */
  u_short ip_off;                 /* fragment offset field */
  #define IP_RF 0x8000            /* reserved fragment flag */
  #define IP_DF 0x4000            /* don't fragment flag */
  #define IP_MF 0x2000            /* more fragments flag */
  #define IP_OFFMASK 0x1fff       /* mask for fragmenting bits */
  u_char  ip_ttl;                 /* time to live */
  u_char  ip_p;                   /* protocol */
  u_short ip_sum;                 /* checksum */
  struct  in_addr ip_src,ip_dst;  /* source and destination address */
};

#define IP_HL(ip)               (((ip)->ip_vhl) & 0x0f)
#define IP_V(ip)                (((ip)->ip_vhl) >> 4)

/* TCP header */
typedef u_int tcp_seq;

struct sniff_tcp {
  u_short th_sport;               /* source port */
  u_short th_dport;               /* destination port */
  tcp_seq th_seq;                 /* sequence number */
  tcp_seq th_ack;                 /* acknowledgement number */
  u_char  th_offx2;               /* data offset, rsvd */
#define TH_OFF(th)      (((th)->th_offx2 & 0xf0) >> 4)
  u_char  th_flags;
  #define TH_FIN  0x01
  #define TH_SYN  0x02
  #define TH_RST  0x04
  #define TH_PUSH 0x08
  #define TH_ACK  0x10
  #define TH_URG  0x20
  #define TH_ECE  0x40
  #define TH_CWR  0x80
  #define TH_FLAGS        (TH_FIN|TH_SYN|TH_RST|TH_ACK|TH_URG|TH_ECE|TH_CWR)
  u_short th_win;                 /* window */
  u_short th_sum;                 /* checksum */
  u_short th_urp;                 /* urgent pointer */
};

//
//Prints the payload to stdout or to fd 1. 
//
void print_hex_ascii_line(const u_char *payload, int len, int offset)
{
 const u_char *ch;

 ch = payload;
 int i;
 int gap;
 
 for(i=0; i<len; i++)
 {
  printf("%02x ",*ch);
  ch++;
  if(i==7)
   printf(" ");
 }
 if(len < 8)
  printf(" ");
 
 if (len < 16) 
 {
  gap = 16 - len;
  for (i = 0; i < gap; i++) 
  {
   printf("   ");
  }
 }

 printf("   ");

 ch = payload;
 for(i=0; i<len; i++)
 {
  if(isprint(*ch))
   printf("%c",*ch);
  else
   printf(".");
 
  ch++;
 }
 
 printf("\n");

 return;
}
 
//
//Does book-keeping work required to print the payload. 
//
void print_payload(const u_char *payload, int len)
{ 
 int len_rem = len;
 int line_width = 16;                    
 int line_len;
 int offset = 0;                          
 const u_char *ch = payload;
 
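 if (len <= 0)
  return;

 if (len <= line_width)               // Data fits on a single line.
 {
  print_hex_ascii_line(ch, len, offset);
  return;
 }

 for ( ;; )                           // Data spans multiple lines.
 {
  line_len = line_width;              // len_rem > line_width here, so print a full line.
  print_hex_ascii_line(ch, line_len, offset);
  len_rem = len_rem - line_len;
  ch = ch + line_len;
  offset = offset + line_width;
  if (len_rem <= line_width)          // Last (possibly partial) line.
  {
   print_hex_ascii_line(ch, len_rem, offset);
   break;
  }
 }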
 
 return;
}

int cnt=1;

//
// Prints the details of the packet captured:
// Packet number.
// Protocol (always TCP here, because of the filter used :) ).
// Source IP address.
// Destination IP address.
// Payload, if there is any.
//
void got_packet(u_char *args, const struct pcap_pkthdr *header, const u_char *packet)
{
 const struct sniff_ip *ip;      /* IP header */
 const struct sniff_tcp *tcp;    /* TCP header */
 const u_char *payload;          /* packet payload */

 int size_ip;
 int size_tcp;
 int size_payload;
 int max_payload;

 if (header->caplen < SIZE_ETHERNET + 20)   // Too short to hold Ethernet + IPv4 headers.
  return;

 ip = (const struct sniff_ip *)(packet + SIZE_ETHERNET);
 size_ip = IP_HL(ip)*4;
 if (IP_V(ip) != 4 || size_ip < 20)         // The "tcp" filter also matches IPv6; skip it and bogus headers.
  return;

 printf("packet number %d\n", cnt);         // Prints the packet number.
 cnt++;
 switch(ip->ip_p) {                         // Prints the protocol.
  case IPPROTO_TCP:
   printf("   Protocol: TCP\n");
   break;
  case IPPROTO_UDP:
   printf("   Protocol: UDP\n");
   return;
  case IPPROTO_ICMP:
   printf("   Protocol: ICMP\n");
   return;
  case IPPROTO_IP:
   printf("   Protocol: IP\n");
   return;
  default:
   printf("   Protocol: unknown\n");
   return;
 }
 printf("       From: %s\n", inet_ntoa(ip->ip_src)); // Prints source IP address.
 printf("         To: %s\n", inet_ntoa(ip->ip_dst)); // Prints destination IP address.

 if (header->caplen < SIZE_ETHERNET + size_ip + 20) // TCP header not fully captured.
  return;

 tcp = (const struct sniff_tcp *)(packet + SIZE_ETHERNET + size_ip);
 size_tcp = TH_OFF(tcp)*4;
 if (size_tcp < 20) {
  printf("   Invalid TCP header length: %d bytes\n", size_tcp);
  return;
 }

 payload = packet + SIZE_ETHERNET + size_ip + size_tcp;
 size_payload = ntohs(ip->ip_len) - (size_ip + size_tcp);

 // Never read past the bytes actually captured (the snapshot length may truncate the packet).
 max_payload = (int)header->caplen - (SIZE_ETHERNET + size_ip + size_tcp);
 if (size_payload > max_payload)
  size_payload = max_payload;

 if (size_payload > 0) {
  printf("   Payload (%d bytes):\n", size_payload);
  print_payload(payload, size_payload);
 }
}

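int main(int argc, char *argv[])
{
 char *dev;
 char errbuf[PCAP_ERRBUF_SIZE];
 char filter_exp[] = "tcp";   // Filter expression is set to tcp, so it captures only TCP packets.
 pcap_t *handle;
 struct bpf_program fp;
 bpf_u_int32 net;             // IP network number of the device.
 bpf_u_int32 mask;            // Netmask of the device.

 if(argc == 2)
 {
  dev = argv[1];
 }
 else if(argc > 2)
 {
  fprintf(stderr, "usage: %s [device]\n", argv[0]);
  return 1;
 }
 else
 {
  // Checks for the default device. (Deprecated in newer libpcap, where pcap_findalldevs() replaces it.)
  dev = pcap_lookupdev(errbuf);
  if(dev == NULL)
  {
   fprintf(stderr, "couldn't find the default device: %s\n", errbuf);
   return 1;
  }
 }
 printf("Probing device %s\n", dev);

 // Gets the network number and netmask of the device; capturing still works if this fails.
 if(pcap_lookupnet(dev, &net, &mask, errbuf) == -1)
 {
  fprintf(stderr, "couldn't get netmask for device %s: %s\n", dev, errbuf);
  net = 0;
  mask = 0;
 }

 // Opens the device: snapshot length SNAP_LEN, promiscuous mode on (1), 1000 ms read timeout.
 handle = pcap_open_live(dev, SNAP_LEN, 1, 1000, errbuf);
 if(handle == NULL)
 {
  fprintf(stderr, "couldn't open device %s: %s\n", dev, errbuf);
  return 1;
 }

 // got_packet() assumes a 14-byte Ethernet header in front of every packet.
 if(pcap_datalink(handle) != DLT_EN10MB)
 {
  fprintf(stderr, "%s is not an Ethernet device\n", dev);
  pcap_close(handle);
  return 1;
 }

 if(pcap_compile(handle, &fp, filter_exp, 0, mask) == -1) // Compiles filter_exp; the last argument is the netmask.
 {
  fprintf(stderr, "couldn't parse filter %s: %s\n", filter_exp, pcap_geterr(handle));
  pcap_close(handle);
  return 1;
 }
 if(pcap_setfilter(handle, &fp) == -1) // Sets the filter as tcp.
 {
  fprintf(stderr, "couldn't install filter %s: %s\n", filter_exp, pcap_geterr(handle));
  pcap_freecode(&fp);
  pcap_close(handle);
  return 1;
 }

 // Calls got_packet() every time a packet is captured, stopping after 200 packets.
 pcap_loop(handle, 200, got_packet, NULL);

 pcap_freecode(&fp);
 pcap_close(handle);
 return 0; // Returns 0 if everything goes well.
}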
 

How to compile the code?

            First you have to install the libraries I was talking about. Use the following command on Ubuntu or any of its derivatives (on Fedora, use yum to install the libpcap-devel package).

                 $ sudo apt-get install libpcap-dev

Use the following command to compile with gcc:

                 $ gcc -o sniffer sniffer.c -lpcap

You can always create a makefile and use make, to save yourself the trouble of typing the compile command every time.

How to run it?

If you are connected to the network via Wi-Fi, then

                 $ sudo ./sniffer wlan0

(Here wlan0 identifies the device used to sniff packets.)

If you are using a wired connection, the following command should suffice:

                 $ sudo ./sniffer

This is because eth0 is usually the default device for wired connections, which the library picks up directly, so you don't have to set the device explicitly. You can still specify it explicitly if you want to.

You can find out which device to use by running

                 $ ifconfig

It lists all the network interfaces. Choose the active one, usually the one that has an inet address, broadcast address, netmask and other details assigned, unless you are connected via two network interfaces simultaneously.

Here is sample output:

packet number 21
   Protocol: TCP
       From: 74.125.135.84
         To: 192.168.1.2
   Payload (133 bytes):
16 03 01 00 51 02 00 00  4d 03 01 50 36 37 97 dd    ....Q...M..P67..
2e 7c 49 6c 4d 7e a8 fc  e5 2f c3 5c 2c a0 12 32    .|IlM~.../.\,..2
e2 b2 43 9c 56 77 57 6f  2f a6 57 20 8b 0e 89 44    ..C.VwWo/.W ...D
ef 9a 31 b2 01 8a cc f3  ea 02 cd 59 01 e8 78 39    ..1........Y..x9
81 07 e0 9c 90 7b 2c f8  f9 40 6b 94 c0 11 00 00    .....{,..@k.....
05 ff 01 00 01 00 14 03  01 00 01 01 16 03 01 00    ................
24 9b c8 38 11 cb ea 21  1c 3a c3 8d 12 aa 4a 3d    $..8...!.:....J=
9e a1 03 14 57 9e 9b dd  ba 46 dc ba 18 ee f7 f8    ....W....F......
95 fd 84 6e 71                                      ...nq

Do leave your feedback!

Q&A

1. The packet number in the above output is 21. What does it mean?

 First, let me explain why multiple packets are used for communication.
           Every communication between computers over a network (LAN, WLAN or any other type of network) happens as a series of packets sent and received by both computers. If the information to be sent is too big for a single packet, it is divided into smaller chunks and sent as a series of packets.

From the perspective of the above program, what does the number 21 mean?
         Packet number 21 means that since this sniffer started executing, it has already captured 20 TCP packets (the filter discards everything else), and the one above is the 21st.

Every time the sniffer is started,
Count is set to 1.
/* The section below runs for every packet captured by the sniffer */
For every packet captured,
                     it prints Count as the packet number,
                     then increments Count by one, and
                     prints all the other details.
/* Loop ends here */

This count includes the packets sent and received by the machine running the sniffer, not just those of other machines.

Saturday, August 11, 2012

Installing Linux Mint Maya on USB for a computer with an NVIDIA GPU

          I have been a long-time user of Linux, but I had never tried installing a persistent Linux on a pen drive before, so I gave it a try.

         I have listed below how I achieved this, to help others who may want to try it. Before we start, it's essential to know your hardware.

My Hardware:

Core i7, NVIDIA GF108 [Quadro 1000M], 4 GB DDR3 RAM.

These are the things that matter most.
Currently I am using Windows 7.

Before we start, note that:

The contents of this tutorial are valid only for Ubuntu and its variants. (With small variations this can be applied to other flavors as well; googling will help you.)

Things we need for this tutorial:

  1. A pen drive (at least 2 GB).
  2. A computer running Windows or Linux, capable of booting from a USB drive (that's basically anything manufactured in the last 4 or 5 years).
  3. Ubuntu or one of its variants.
  4. UNetbootin software.

Steps!

         1. Download your favorite OS. I prefer Linux Mint. (Yes, Mint is based on Ubuntu.)
You can find Mint here. I tried the KDE 64-bit version, but you can take your pick. There is one particular reason why I prefer Mint: it comes preloaded with lots of software, including video and audio codecs, which means you can play any music or movie out of the box, and you don't miss out on the support of the strong Ubuntu community.

        2.  Download UNetbootin. You can find it here.
Run it; you should see a window like this.



           3.  Navigate to the downloaded OS by choosing Diskimage and clicking the browse button.
           4.  Choose the amount of persistent storage you want in the "Space used to preserve files across reboots" section. (I used 4096 MB; I am using a 16 GB pen drive :) )
         5.  Choose your USB drive.

Hit OK.

It may look like UNetbootin has hung when it is about 80 to 90% done, but don't worry, that's common; it takes about 10 minutes to finish.
Once you're done, reboot.

           6.  While rebooting (before the Windows logo appears!), hit Esc and choose to boot from the USB drive. (This may be different for you depending on your hardware; go through your hardware manual to find out how to choose the USB drive as the boot medium.)

          7.  Navigate to the "Try Linux Mint without installing" option, press TAB and add

              nouveau.modeset=0

after quiet splash. This is essential if you are using an NVIDIA GPU like me, otherwise Linux will not boot! This parameter turns off kernel mode setting in nouveau, the open-source NVIDIA driver. (Don't worry, the next step explains how to get the GPU working properly.)
              If you are not using an NVIDIA GPU, you don't need this step or the next one.

         8.  After booting has finished, open a terminal (press Alt-F2 and choose Konsole) and run the following commands.

This adds the repository for installing the driver:

               sudo add-apt-repository ppa:ubuntu-x-swat/x-updates

This updates the repository information:

                sudo apt-get update 

 This installs the NVIDIA drivers:

               sudo apt-get install nvidia-current

This configures the drivers:
 
               sudo nvidia-xconfig

The following command restarts the KDE display manager (and with it the X server):

               sudo restart kdm

If you are using GNOME, then

               sudo restart gdm

For information on how to restart other display managers, google it and you will find answers, or let me know and I will help you out.


And you are done! These settings will be preserved across reboots.

If everything goes well, you should see something like this.


If you run into any trouble, feel free to contact me. Do leave your feedback!