
Computer Networks Project

Deadline : 7th April 2012, 9pm. Firm deadline, no extensions.

Project Goal : To build a TCP-based proxy web server that serves clients' HTTP requests and applies a cache replacement policy.

Requirements :

1) Build a client program that sends HTTP requests to the proxy server using TCP as the transport layer protocol. The client program will take as input the URL being requested by the client and generate an HTTP request using the typed-in URL.
   The client can send a regular GET request.
   The client can send a conditional GET request.
   (10% weightage)

2) Build a proxy server program that has two functions.
   -- The first function is to accept HTTP requests from client programs using a TCP socket. The proxy server will first check whether the received HTTP request is properly formed and then do further processing. If the HTTP request is poorly formed, the proxy server will return an error message to the client. (10% weightage)
   -- The second function is to process the HTTP request. The proxy server opens a TCP connection to the host mentioned in the client's HTTP request and downloads the requested URL object. The server will maintain a local copy of the object, which includes the last modified time of the object. The server then transmits this data back to the client program using the TCP socket. All further requests to the same URL are served by the proxy server itself from its local copy, until the client issues a conditional GET request. (20% weightage)

3) For a single HTTP object request, the client can issue a normal request to the proxy.

4) For multiple HTTP objects, each request must be sent in a different thread and processed accordingly.

5) There can be more than one client (from the same host machine) issuing HTTP requests (single or multiple object requests). For grading purposes we will test the code with two concurrent clients from the same PC.
   (Points 3, 4, 5 --> 10% weightage)

6) The cache will be limited to 100 HTTP requests (entries). If more requests are made, the cache needs to pick one object from the cache for replacement. The following policies are suggested:
   a) The oldest object in the cache.
   b) The largest object in the cache.
   For each of these, calculate the cache hit-ratio and miss-ratio, which can be estimated as follows:
      Hit-ratio = 100 * Number of hits / Total number of requests
      Miss-ratio = 100 - Hit-ratio
   Each time a request is sent to the proxy server by a client, the hit-ratio and miss-ratio should be reported to the client in addition to the response data. (20% weightage)

7) Implement an efficient way of storing the entries in the cache so that search, insert and delete operations are fast. Think of an apt data structure (Heap, Binary search tree, AVL tree, Hash table, Tries etc.) for facilitating this. Cache entries are to be indexed by URLs. The search, deletion and addition complexity must be reported when the cache is full (100 entries). The average of search/deletion/add complexity for the cache, starting from the point where it has one entry and up to 100 entries, must be reported as well (in seconds) in your design overview document. (20% weightage)

8) A log file should be maintained which will report the responses of the proxy server. Further details are explained in the Project Details section. You should implement the following commands on your client side, which retrieve information from the proxy server:
   a) Print Cache : Prints the data in the lookup table of your cache.
   b) Print Log : Prints the data in the log file.
   c) Search key : Given a key, you should be able to search your cache table and report the URL of the key if present, else report Not Present. Example : Search amazon should return http://www.amazon.com if present, else report Not Present. Your data structure should help you search fast.
   Please note that your cache has a fixed size, hence the cache table will change as explained in point 6. But your log file reports every action of your proxy server, hence it will be ever increasing.
   (10% weightage)

Critical Part :

1) The communication between your proxy server and your client program must take place over TCP.

2) The cache must be searchable by host names, i.e., all URLs belonging to a particular host must be displayed if the user inputs the host name, e.g., Amazon, Rediff etc.

3) The language of coding will be C/C++.

4) The projects will be done in groups of 2. One student can make the server and the other student can work on the client program. Mention your group members inside a separate Readme file that you will upload along with the project.
   - You can do the project alone if you wish. The grading will be uniform either way.

5) Plagiarism of any sort will give you a zero in the project.

Sample Input : http://cstar.iiit.ac.in/cn/index.html
Sample output : the html code of the page/error response , hit ratio , miss ratio

Project Details :

1) You will have to read the following parts of HTTP 1.1 (RFC 2616). The link for the RFC can be found on the internet.
   a. HTTP Request format
   b. HTTP Response format

2) Design your Proxy Server. It should handle these functionalities :
   - read the request from the client
   - satisfy the request from the local cache if possible
   - if the request cannot be satisfied from the local cache, then form a valid HTTP request and send it to the server
   - read the response from the server
   - check the Content-Type: header field of the response; if the type is not allowed, then inform the client in an HTTP response
   - send the response to the client
   - cache the content locally for future use

3) The HTTP client (e.g. Mozilla / your PC) will send a complete HTTP request to the proxy server. You can pass the same request to the HTTP server after you have extracted any information necessary for your purpose. An example of the client requesting a web page is shown below. When the client requests www.gnu.org, your proxy will see the following request:

      GET http://www.gnu.org/ HTTP/1.1
      Host: www.gnu.org
      User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.1) Gecko/20020827
      Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1
      Accept-Language: en-us, en;q=0.50
      Accept-Encoding: gzip, deflate, compress;q=0.9
      Accept-Charset: ISO-8859-1, utf-8;q=0.66, *;q=0.66
      Keep-Alive: 300
      Proxy-Connection: keep-alive
      Referer: http://www.gnu.org/

   It may not be exactly this, but something similar to this. Note the first two lines (the request line and the Host header). These two lines are necessary in HTTP 1.1. The first line is the HTTP GET request. The format of the GET request is

      GET <complete-path-of-resource> HTTP/1.1

   The second line is the HTTP header

      Host: www.gnu.org

   A carriage return and a line feed character (CRLF) terminate each line. When the browser connects to the proxy server it will send all these lines. You can use a single read statement to read this from the socket.

4) The proxy server will have to extract the hostname of the HTTP server to establish a connection with it. It can extract the hostname from the Host: header field in the client's request. Once the proxy server opens a socket connection to the HTTP server, it can pass on the same request that it received from the client. The HTTP server will respond to the proxy server's request with an HTTP response. The format of the response is as shown:

      HTTP/1.1 200 OK
      Date: Mon, 08 Sep 2003 15:37:44 GMT
      P3P: policyref="http://p3p.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE GOV"
      Cache-Control: private
      Pragma: no-cache
      Expires: Thu, 05 Jan 1995 22:00:00 GMT
      Connection: close
      Content-Type: text/html
      Content-Length: 1442

      <data>

   Usually the response headers also include a Last-Modified header. If you don't get one in your response, then use the Date header value as the last-modified value. This last-modified value will be useful in your caching.

5) Caching: Properly handle the caching directives in the response headers, according to the HTTP/1.1 protocol specification. For example, you should not cache a document whose caching directive indicates that it must not be cached.
   Format of the local proxy cache:
   a. Make a directory called ProxyServerCache and save the requested files in this directory.
   b. You can use any format to store the files inside the directory. For example, if the request is GET http://www.gnu.org HTTP/1.1 then you can create a directory inside the ProxyServerCache directory and store all the files for this site inside that directory.
   Create a lookup table which contains relevant information (like host URL, last modified date, path where the file is stored etc.). Basically, you should be able to satisfy the request from the local cache if the object is available locally. Think of an apt data structure (Heap, Binary search tree, AVL tree, Hash table, Tries etc.) for facilitating this. Cache entries are to be indexed by URLs. The search, deletion and addition complexity must be reported when the cache is full (100 entries). The average of search/deletion/add complexity for the cache, starting from the point where it has one entry and up to 100 entries, must be reported as well (in seconds).

6) Logging: Record the information about the requests and the caching of responses in a Logfile. Record in the Logfile whether the response was served from the cache or whether the origin server was contacted to get the response:

      www.google.com::served from cache
      www.gmail.com::contacted origin server

   When a response is cached, record the name of the cache file in the Logfile along with its size and its expiration time. For example:

      www.google.com::ProxyServerCache/<fname> 12450 Fri, 21 Jan 2011 22:21:32 GMT
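The lookup table, the 100-entry limit and the hit-ratio formula described above can be sketched as follows. This is only one possible design, assuming a hash table keyed by URL combined with a linked list that keeps insertion order, so that the "oldest object" policy can evict in constant time. The names CacheEntry and ProxyCache and all field names are hypothetical and not part of the assignment. With zero requests the sketch reports a hit-ratio of 0, which is an assumption as well.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical record for one cached object; the field names are assumptions.
struct CacheEntry {
    std::string url;            // key: full request URL
    std::string last_modified;  // Last-Modified (or Date) header value
    std::string file_path;      // e.g. a file under ProxyServerCache/
    std::size_t size;           // object size in bytes
};

// URL-indexed cache with "oldest object" replacement (policy a).
class ProxyCache {
public:
    explicit ProxyCache(std::size_t capacity)
        : capacity_(capacity), hits_(0), requests_(0) {
        assert(capacity > 0);
    }

    // Counts the request and returns the entry, or nullptr on a miss.
    const CacheEntry* lookup(const std::string& url) {
        ++requests_;
        auto it = index_.find(url);
        if (it == index_.end()) return nullptr;
        ++hits_;
        return &*it->second;
    }

    // Adds or refreshes an entry; a refreshed entry counts as the newest.
    void insert(const CacheEntry& entry) {
        auto it = index_.find(entry.url);
        if (it != index_.end()) {          // refresh: drop the stale copy first
            order_.erase(it->second);
            index_.erase(it);
        }
        if (order_.size() == capacity_) {  // full: evict the oldest entry
            index_.erase(order_.front().url);
            order_.pop_front();
        }
        order_.push_back(entry);
        index_[entry.url] = std::prev(order_.end());
    }

    // Hit-ratio = 100 * hits / total requests; Miss-ratio = 100 - Hit-ratio.
    double hit_ratio() const {
        return requests_ == 0 ? 0.0 : 100.0 * hits_ / requests_;
    }
    double miss_ratio() const { return 100.0 - hit_ratio(); }
    std::size_t size() const { return order_.size(); }

private:
    std::size_t capacity_;
    std::size_t hits_;
    std::size_t requests_;
    std::list<CacheEntry> order_;  // front = oldest, back = newest
    std::unordered_map<std::string, std::list<CacheEntry>::iterator> index_;
};
```

In this sketch search, insertion and eviction are all O(1) on average. The "largest object" policy would need a different ordering (for example a heap keyed by size), and searching by host name would need either a scan of the table or a second index.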

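The check for a properly formed request and the extraction of the hostname from the Host: header, as described in the Project Details, might look like the sketch below. It assumes the whole request is already in one string, accepts only GET with HTTP/1.1, and matches the header name "Host" exactly (real HTTP header names are case-insensitive). ParsedRequest, trim and parse_request are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical result type; the names are illustrative, not from the assignment.
struct ParsedRequest {
    std::string method;   // "GET"
    std::string target;   // e.g. "http://www.gnu.org/"
    std::string version;  // "HTTP/1.1"
    std::string host;     // value of the Host: header
};

// Removes leading and trailing spaces, tabs and CR characters.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Returns true and fills `out` if `raw` has a valid GET request line
// and a non-empty Host header; returns false for a poorly formed request.
bool parse_request(const std::string& raw, ParsedRequest& out) {
    std::istringstream in(raw);
    std::string line;

    // Request line: METHOD SP TARGET SP VERSION
    if (!std::getline(in, line)) return false;
    std::istringstream first(trim(line));
    std::string extra;
    if (!(first >> out.method >> out.target >> out.version)) return false;
    if (first >> extra) return false;  // more than three tokens
    if (out.method != "GET" || out.version != "HTTP/1.1") return false;

    // Header lines, up to the empty line that ends the header block.
    out.host.clear();
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty()) break;
        std::size_t colon = line.find(':');
        if (colon == std::string::npos) return false;  // malformed header
        if (line.substr(0, colon) == "Host")
            out.host = trim(line.substr(colon + 1));
    }
    return !out.host.empty();
}
```

When parse_request returns false, the proxy would send an error response to the client instead of contacting the origin server. On success, out.host is the name to resolve and connect to.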
Things to submit :

Submit a tar.gz file with the name: Project_RollNo1_RollNo2.tar.gz
The tar file should contain:
1. All the required C/C++ files.
2. Your cache repository.
3. A design overview document of 2-3 pages describing the schematic design of your proxy server and its important components. The average of search/deletion/add complexity for the cache, starting from the point where it has one entry and up to the point where it has 100 entries, must be reported as well (in seconds).
4. A README file if there are specific instructions that need to be followed to execute your proxy server.
