Hypertext Transfer Protocol

IS 373—Web Standards
Todd Will
Topics
• Intro to HTTP
• Following links
• What actually happens during a request
• Content
• Tips and Tricks
• For Next Week

CIS 373---Web Standards-HTTP


Intro
• HTTP is the Hypertext Transfer Protocol
• When you browse the web, data is transferred between the web server and your client machine using HTTP
• Major steps performed
– You start a browser that can understand and display HTML
– You either click on a link or type a URL into the address bar
– The browser sends a request to a web server (a program that listens for and responds to requests for data from clients)
• The requested item can be any digital resource
– The web server processes the request and returns the requested document to the browser
• The web server identifies the type of document to the browser
– The browser displays the document
• Images, JavaScript, and style sheets are downloaded if referenced
• Each additional item that is retrieved generates an additional request to the server
• HTTP only defines how the browser and the web server communicate with each other
– The actual data is moved using the TCP/IP protocols
• Simplified version of how HTTP works
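
The point that every referenced item costs one more request can be illustrated with a small sketch. It assumes a made-up page and only looks at `img`, `script`, and stylesheet `link` tags; the file names are invented for illustration, and a real browser follows more kinds of references than this.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect the URLs of images, scripts, and style sheets a page refers to."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(attrs["src"])
        elif (tag == "link"
              and (attrs.get("rel") or "").lower() == "stylesheet"
              and attrs.get("href")):
            self.resources.append(attrs["href"])


def referenced_resources(html):
    """Return the list of extra resources the browser would have to request."""
    collector = ResourceCollector()
    collector.feed(html)
    return collector.resources


# Hypothetical page used only for this illustration.
page = """<html><head>
<link rel="stylesheet" href="standard.css">
<script src="menu.js"></script>
</head><body><img src="logo.gif"><p>Hello</p></body></html>"""

extra = referenced_resources(page)
print(extra)
# One request for the page itself plus one per referenced item.
print("total requests:", 1 + len(extra))
```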

HTTP Versions
• HTTP/0.9
– Very primitive standard
– Earliest version
• HTTP/1.0
– Still in use, although most current clients and servers speak HTTP/1.1
– HTTP/0.9 is very rarely used anymore
• HTTP/1.1
– Extends and improves HTTP/1.0
– Supported by virtually all current browsers and servers
– The client can keep the connection open after downloading a file, so that a new connection does not have to be opened for every request
• Decreases server load
• Reduces bandwidth use

What happens in HTTP?
• Parse the URL
– The browser must first identify the parts of the URL of the request
– Most URLs have the form:
• protocol://server/request-URI
– The protocol part tells the browser how to retrieve the document
– The server part tells the browser which server to contact to find the document
– The request-URI identifies the specific document to retrieve
• Sending the Request
– Usually the protocol will be http
• Sometimes it is https, which requests the data over a secure connection
• Assume you wanted the document http://web.njit.edu/~txw5999/index.html
– GET /~txw5999/index.html HTTP/1.0
– Note – the request is all the server sees, independent of where the request
originated, whether it be by a robot, link validator, or browser
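
The two steps above (parse the URL, then send the request) can be sketched with Python's standard `urllib.parse` module. The helper name `request_line` is made up for this illustration; it only builds the text of the request line and does not contact any server.

```python
from urllib.parse import urlsplit


def request_line(url, version="HTTP/1.0"):
    """Split a URL into protocol, server, and the GET request line for it."""
    parts = urlsplit(url)
    path = parts.path or "/"          # an empty path means the server root
    if parts.query:
        path += "?" + parts.query     # keep any ?variable=value parameters
    return parts.scheme, parts.netloc, f"GET {path} {version}"


protocol, server, line = request_line("http://web.njit.edu/~txw5999/index.html")
print(protocol)  # the protocol part of the URL
print(server)    # the server the browser has to contact
print(line)      # the request line that is sent to that server
```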

Server Response
• Step 3: The server response
– Upon receiving the request, the web server must identify the document and return it to the user
– Sample response returned to the browser (headers, a blank line, then the HTML page)
• HTTP/1.0 200 OK
  Server: Netscape-Communications/1.1
  Date: Tuesday, 25-Nov-97 01:22:04 GMT
  Last-modified: Thursday, 20-Nov-97 10:44:53 GMT
  Content-length: 6372
  Content-type: text/html

  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> <HTML> ...
• Followed by the rest of the HTML page
• HTTP/1.0 tells the browser the version of HTTP used
• 200 OK is the most common response; this is the status code the server returns to say all is well (more on this later)
• Server: Netscape-Communications/1.1 identifies the web server software that returned the document
• Date: Tuesday, 25-Nov-97 01:22:04 GMT is the date and time the response was generated
• Last-modified: Thursday, 20-Nov-97 10:44:53 GMT tells when the document was last modified (useful in caching)
• Content-length: 6372 is the size of the document in bytes
• Content-type: text/html tells the browser the type of the returned document; it could also be image/gif or something else
• <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> is not a header but the first line of the document itself; it declares the version of HTML used
– The browser does not care how the page was produced, could be by scripts or straight
html
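
As a rough sketch of how a client might take the sample response apart, the code below splits it into status line, header fields, and body. The function name `parse_response` is an assumption made for this example, and real clients handle many cases (repeated headers, binary bodies) that this ignores.

```python
def parse_response(raw):
    """Split a raw HTTP response into version, status code, reason, headers, body."""
    head, _, body = raw.partition("\r\n\r\n")   # a blank line ends the headers
    lines = head.split("\r\n")
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for header_line in lines[1:]:
        name, _, value = header_line.partition(":")
        headers[name.strip().lower()] = value.strip()   # field names are case-insensitive
    return version, int(code), reason, headers, body


# The sample response from the slide, with explicit line endings.
raw = (
    "HTTP/1.0 200 OK\r\n"
    "Server: Netscape-Communications/1.1\r\n"
    "Date: Tuesday, 25-Nov-97 01:22:04 GMT\r\n"
    "Last-modified: Thursday, 20-Nov-97 10:44:53 GMT\r\n"
    "Content-length: 6372\r\n"
    "Content-type: text/html\r\n"
    "\r\n"
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> <HTML> ...'
)

version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)
print(headers["content-type"], headers["content-length"])
```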

The Client Request
• All requests follow the same basic pattern
• [METH] [REQUEST-URI] HTTP/[VER]
  [fieldname1]: [field-value1]
  [fieldname2]: [field-value2]

  [request body, if any]
– METH is the request method (GET, HEAD, ...)
– REQUEST-URI identifies the document to be retrieved
– VER is the HTTP version used
– Field names and values are covered on the next slide
• Getting a document
– A GET request means "send me this document"
– Assume you wanted the document http://web.njit.edu/~txw5999/index.html
– GET /~txw5999/index.html HTTP/1.0
• Longer version of a request
– GET / HTTP/1.0
  User-Agent: Mozilla/3.0 (compatible; Opera/3.0; Windows 95/NT4)
  Accept: */*
  Host: web.njit.edu:81
• HEAD works just like GET, except that only the headers are returned
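
The request pattern can be written out as a small helper that assembles the text a client would send. This is only a sketch: `build_request` is a hypothetical name, and it builds the message without opening a connection.

```python
def build_request(method, uri, headers=None, version="HTTP/1.0", body=""):
    """Assemble a request: request line, header fields, blank line, optional body."""
    lines = [f"{method} {uri} {version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Lines end in CRLF; an empty line separates the headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


# The longer request from the slide.
get_request = build_request("GET", "/", {
    "User-Agent": "Mozilla/3.0 (compatible; Opera/3.0; Windows 95/NT4)",
    "Accept": "*/*",
    "Host": "web.njit.edu:81",
})
# HEAD asks for the same resource but expects only the headers back.
head_request = build_request("HEAD", "/~txw5999/index.html")

print(get_request)
print(head_request)
```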

Get Header Fields
• Some of the header fields that can be used with GET are:
• User-Agent
– Identifies the browser or other client program (the user agent) making the request
– Example: "Mozilla/4.03 [en] (WinNT; I ;Nav)"
• Referer
– The referer field (yes, the standard spells it this way)
– Gives the URL of the page the request came from
– Useful for finding out which pages link to yours and where your visitors come from
• If-Modified-Since
– If the browser has the document in its cache, this field can be set to the time the cached version was received
– If the document has not changed since then, the server answers 304 Not Modified and the cached copy is used; otherwise the new version is returned
• Checks to make sure that the cache is current
• From
– The From field contains the email address of the person who is using the agent
– SPAMMER’S DREAM
– Web robots sometimes use it so that webmasters can contact the operator of the robot
• Authorization
– Holds the credentials (such as username and password) of the user if authorization is required to access the page
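
A sketch of how a client might fill in some of these fields. It assumes the Basic authentication scheme (username and password joined by a colon and base64-encoded) and uses the standard library to format the date; the user agent name, credentials, and timestamp are invented for illustration.

```python
import base64
from email.utils import formatdate


def basic_authorization(username, password):
    """Value of the Authorization field for the Basic scheme (an assumed scheme)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def if_modified_since(cached_at):
    """Format a Unix timestamp as an HTTP date for the If-Modified-Since field."""
    return formatdate(cached_at, usegmt=True)


# Hypothetical values, for illustration only.
request_headers = {
    "User-Agent": "ExampleBrowser/0.1",
    "Referer": "http://web.njit.edu/~txw5999/index.html",
    "If-Modified-Since": if_modified_since(0),
    "Authorization": basic_authorization("user", "pass"),
}

for name, value in request_headers.items():
    print(f"{name}: {value}")
```

Note that base64 is an encoding, not encryption, so Basic credentials are only protected when the request travels over https.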

HTTP Status Codes
• No need to memorize, just know they exist
• Codes are the same, but text can be different
• 1xx Informational
• Request received, continuing process.
– 100: Continue
– 101: Switching Protocols
• 2xx Success
• The action was successfully received, understood, and accepted.
– 200: OK
– 201: Created
– 202: Accepted
– 203: Non-Authoritative Information
– 204: No Content
– 205: Reset Content
– 206: Partial Content
– 207: Multi-Status (WebDAV extension)

3xx Status Codes
• 3xx Redirection
• The client must take additional action to complete the
request.
– 300: Multiple Choices
– 301: Moved Permanently
– 302: Found
– 303: See Other (since HTTP/1.1)
– 304: Not Modified
– 305: Use Proxy (since HTTP/1.1)
– 306 is no longer used, but reserved. Was used for 'Switch
Proxy'.
– 307: Temporary Redirect (since HTTP/1.1)
4xx Status Codes
• 4xx Client Error
• The request contains bad syntax or cannot be fulfilled.
– 400: Bad Request
– 401: Unauthorized
– 402: Payment Required
– 403: Forbidden
– 404: Not Found
– 405: Method Not Allowed
– 406: Not Acceptable
– 407: Proxy Authentication Required
– 408: Request Timeout
– 409: Conflict
– 410: Gone
– 411: Length Required
– 412: Precondition Failed
– 413: Request Entity Too Large
– 414: Request-URI Too Long
– 415: Unsupported Media Type
– 416: Requested Range Not Satisfiable
– 417: Expectation Failed
– 449: Retry With (Microsoft extension, not part of the HTTP standard)

5xx Status Codes
• 5xx Server Error
• The server failed to fulfill an apparently valid
request.
– 500: Internal Server Error
– 501: Not Implemented
– 502: Bad Gateway
– 503: Service Unavailable
– 504: Gateway Timeout
– 505: HTTP Version Not Supported
– 509: Bandwidth Limit Exceeded (non-standard, used by some servers)
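
Since the first digit of a status code gives its class, a program rarely needs the full table. The sketch below derives the class from the code and looks up the standard reason phrase with Python's `http.HTTPStatus`; the helper name `describe` is made up, and codes Python does not know about simply get a placeholder phrase.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code):
    """Return (class name, reason phrase) for a numeric status code."""
    status_class = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Code is not in Python's table (for example a vendor extension).
        phrase = "(no standard reason phrase)"
    return status_class, phrase


for code in (200, 304, 404, 449, 500):
    print(code, *describe(code))
```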
Browser Cache
• If a page has already been retrieved by your browser, it is usually stored in your cache
• If you return to that page, your browser first checks whether the data on that page has already been downloaded to your local drive
– If it finds the page or images, the browser loads them from your cache and only requests the changed information from the web server
– Browsers usually let you set a maximum size or a time limit for pages stored in your cache
• Most browsers have a refresh button that can be selected to force a reload of the page
• Caching reduces the number of requests and the server load, and can reduce bandwidth costs substantially
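
One way to picture "only request the changed information" is a cache that revalidates its copy with the server. The sketch below is a toy model, not how any particular browser works: `server_respond` and `BrowserCache` are hypothetical names, times are plain integers, and the server is simulated by a function instead of a network request.

```python
def server_respond(last_modified, if_modified_since=None):
    """Simulated web server: answer 304 if the document has not changed."""
    if if_modified_since is not None and last_modified <= if_modified_since:
        return 304, None
    return 200, f"document version {last_modified}"


class BrowserCache:
    def __init__(self):
        self.entries = {}   # url -> (time the copy was fetched, body)

    def get(self, url, now, server_last_modified):
        cached = self.entries.get(url)
        since = cached[0] if cached else None
        status, body = server_respond(server_last_modified, since)
        if status == 304:
            return cached[1], "from cache"      # cached copy is still current
        self.entries[url] = (now, body)         # store the fresh copy
        return body, "from server"


cache = BrowserCache()
print(cache.get("/index.html", now=10, server_last_modified=5))
print(cache.get("/index.html", now=20, server_last_modified=5))
print(cache.get("/index.html", now=30, server_last_modified=25))
```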
Proxy Cache
• Browser caches are stored on the local machine, whereas a proxy cache is stored on a proxy server
• The proxy is essentially a cache shared by many different users
• The user’s browser now checks the proxy to see if a page is already in its cache
– If the page is found, it is loaded into the user’s browser cache
– If the page is not found, the proxy requests it from the web server
• After getting the new page from the server, the proxy stores it in its cache for anyone else who may request that page
• The proxy then returns the cached page or item to the user’s local cache
• A proxy cache reduces network traffic dramatically and substantially reduces the load on the web server
• It skews log statistics dramatically, because requests that can be filled by the proxy cache are never seen by the web server

Proxy Cache Hierarchy
• You can also have a hierarchy of proxy caches, as in each
department in a company could have its own smaller proxy
cache. The page would be loaded from the local proxy cache
if it can be found, and if not, then make the request of the
company cache. If the company cache cannot fulfill the
request, then the request is sent to the web server to be filled.
The returned page or item will then be sent to the local proxy
cache and then to the local cache.
– This method gives an even larger reduction in network traffic
– However, the pages may not be the most current version that the web server would return
– Cached pages should be removed or refreshed as they go out of date
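
The lookup order described above (local cache, department proxy, company proxy, web server) can be sketched with plain dictionaries standing in for the caches. Everything here is a simplification under stated assumptions: `fetch` and `web_server` are hypothetical names, and expiry of stale pages is left out.

```python
def fetch(url, caches, origin):
    """Look a page up through a chain of caches, nearest first.

    Returns (page, level), where level is the index of the cache that had the
    page, or len(caches) if the request went all the way to the web server.
    """
    for level, cache in enumerate(caches):
        if url in cache:
            page = cache[url]
            break
    else:
        level = len(caches)
        page = origin(url)
    # On the way back, store the page in every cache nearer to the user.
    for cache in caches[:level]:
        cache[url] = page
    return page, level


requests_seen = []   # what the web server's log would contain


def web_server(url):
    requests_seen.append(url)
    return f"<html>page for {url}</html>"


department, company = {}, {}
alice, bob = {}, {}   # two users' browser caches behind the same proxies

print(fetch("/index.html", [alice, department, company], web_server)[1])
print(fetch("/index.html", [bob, department, company], web_server)[1])
print(len(requests_seen))
```

The second user is served by the department proxy, so the web server only ever logs one request, which is the log-skewing effect mentioned on the proxy cache slide.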

Caching Diagram

Cache Replacement Algorithms
• LRU: the algorithm replaces the least recently used
document
• FBR (Frequency Based): the algorithm takes into
account both the recency and frequency of access to a
page
• LRU/2: the algorithm replaces the page whose
penultimate (second-to-last) access is least recent
among all penultimate accesses
• SLRU: the algorithm combines both the recency and
frequency of access when making a replacement
decision
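
Of the four algorithms, LRU is the simplest to show in code. The sketch below is one common way to write it in Python using `OrderedDict`; the class name and the capacity measured in number of pages (not bytes) are simplifying assumptions for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Cache that evicts the least recently used page when it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # oldest access first, newest last

    def get(self, url):
        if url not in self.pages:
            return None              # cache miss
        self.pages.move_to_end(url)  # mark as most recently used
        return self.pages[url]

    def put(self, url, page):
        self.pages[url] = page
        self.pages.move_to_end(url)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # drop the least recently used page


cache = LRUCache(capacity=2)
cache.put("/a.html", "A")
cache.put("/b.html", "B")
cache.get("/a.html")         # /a.html is now the most recently used
cache.put("/c.html", "C")    # evicts /b.html
print(list(cache.pages))
```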

Replacement Algorithms
• The hit rate of all algorithms increases with cache size
• For very small caches (< 100 Mbytes), FBR and SLRU have the best performance, followed by LRU/2 and LRU
• For mid-size caches, LRU/2, SLRU and FBR have similar performance, followed by LRU
• For large caches (> 1 Gbyte), all algorithms have similar performance, very close to the best

Server Side Programming
• Server side scripts run on the web server to respond
to requests from the client
• There is no way for the client to know whether the
page has been generated from a script or was a
straight html file
• Used to dynamically change the output of a page
based on some type of input
– Can accept input from cookies to identify the user for
example and check for authorization to download a file
– Can also accept parameters as passed in the address bar
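
A minimal sketch of a server side script whose output depends on its input. It assumes the parameters arrive as a query string and that the user has already been identified (for example from a cookie); `render_page` and the `topic` parameter are hypothetical names chosen for this illustration.

```python
from html import escape
from urllib.parse import parse_qs


def render_page(query_string, logged_in_user=None):
    """Build an HTML page from address-bar parameters and the identified user."""
    params = parse_qs(query_string)
    topic = params.get("topic", ["home"])[0]   # default when no parameter is given
    if logged_in_user is None:
        return "<html><body><p>Please log in.</p></body></html>"
    # escape() keeps user-supplied text from being interpreted as HTML.
    return (f"<html><body><h1>{escape(topic)}</h1>"
            f"<p>Welcome, {escape(logged_in_user)}</p></body></html>")


print(render_page("topic=caching", logged_in_user="txw5999"))
print(render_page("topic=caching"))
```

Either way the browser just receives HTML; it cannot tell that the page was generated by a script.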

Server Side Programming
• When to use server side instead of client side
– Client side will be much faster to run since it does not need to generate a new
request to the web server every time something changes
– Use server side when the data that needs to be accessed is on the web server and not on the client machine
• Use server side to interact with a database on the web server
• Best used when infrequent interactions are required with the server
• Need to use server side when gathering information over time and the data is stored
on the web server
• Take for example Google
– It would not be good to download Google’s entire catalog of pages to the client
– Better to send the search query to the web server at Google and only return
those documents that match the user query
– Checking to ensure that the user has entered a search query before sending the
request to the web server would be best served by using a client side script

CGI
• Stands for Common Gateway Interface
• A standard interface that lets a web server run external programs (scripts) to generate the pages it returns to clients
• Supported in the same way by almost all web servers in existence
• Web server needs to differentiate between scripts and ordinary html files
– CGI scripts are placed in dedicated CGI directories on the server
– The web server is configured to treat all files in a particular folder as CGI scripts
– The conventional directory name is cgi-bin
More About CGI
• CGI programs are ordinary executable programs, either compiled or run by an interpreter such as Perl
• The web server passes request information to the CGI program in a number of environment variables
– Think of the ?variable=value seen at the end of URLs
• Example – the developer could require that the IP address be passed as a variable to ensure that a hit counter only counts unique visitors
– <img src="http://stats.vendor.com/cgi-bin/counter.pl?ip address">
• The CGI script returns a text string that can be used to identify the image to be displayed as above
– New image source would be:
• <img src="1000hits.jpg">
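
The hit counter idea can be sketched as follows. `QUERY_STRING` and `REMOTE_ADDR` are standard CGI environment variables; everything else (the function name, keeping the seen addresses in a Python set, the `Nhits.jpg` naming) is an assumption for illustration, and the slide's own counter.pl is not shown in the source. A real CGI program would read `os.environ` and keep its data in a file or database, since each request starts a fresh process.

```python
def counter_cgi(environ, seen_addresses):
    """Count unique visitors and return the CGI output naming the image to show."""
    # The address may be passed after the "?" or taken from the connection itself.
    address = environ.get("QUERY_STRING") or environ.get("REMOTE_ADDR", "")
    seen_addresses.add(address)          # a set ignores repeat visitors
    image_name = f"{len(seen_addresses)}hits.jpg"
    # CGI output: header fields, a blank line, then the content.
    return f"Content-Type: text/plain\r\n\r\n{image_name}"


seen = set()
print(counter_cgi({"REMOTE_ADDR": "10.0.0.1"}, seen))
print(counter_cgi({"REMOTE_ADDR": "10.0.0.1"}, seen))   # same visitor again
print(counter_cgi({"QUERY_STRING": "10.0.0.2"}, seen))
```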

Server Side Programming
• CGI is one way to develop server side scripts
– Slow and inefficient, because the server starts a new program for every request
• Better way is to use a server Application Programming Interface (API)
– The program essentially is a part of the server process
– The programming language is server dependent
– Much faster since the program is already in memory, so input data can be passed in and results obtained easily
– Examples include ASP, Java Server Pages, Python

Server Logs
• Most servers keep a log of all requests and responses
generated by the server
• A sample log is as follows:
– rip.axis.se - - [04/Jan/1998:21:24:46 +0100] "HEAD /ftp/pub/software/ HTTP/1.0" 200 6312 - "Mozilla/4.04 [en] (WinNT; I)"
– tide14.microsoft.com - - [04/Jan/1998:21:30:32 +0100] "GET /robots.txt HTTP/1.0" 304 158 - "Mozilla/4.0 (compatible; MSIE 4.0; MSIECrawler; Windows 95)"
– microsnot.HIP.Berkeley.EDU - - [04/Jan/1998:22:28:21 +0100] "GET /cgi-bin/wwwbrowser.pl HTTP/1.0" 200 1445 "http://www.ifi.uio.no/~larsga/download/stats/" "Mozilla/4.03 [en] (Win95; U)"
– isdn69.ppp.uib.no - - [05/Jan/1998:00:13:53 +0100] "GET /download/RFCsearch.html HTTP/1.0" 200 2399 "http://www.kvarteret.uib.no/~pas/" "Mozilla/4.04 [en] (Win95; I)"
– isdn69.ppp.uib.no - - [05/Jan/1998:00:13:53 +0100] "GET /standard.css HTTP/1.0" 200 1064 - "Mozilla/4.04 [en] (Win95; I)"

Server Logs (cont)
• This log can be useful in troubleshooting or
finding dead links
• You can also track page views to determine
the popularity of a page
• Good practice to review these to see if your
web server is having any problems
• Caching of web pages can distort these statistics, since pages served from a cache are viewed but not counted in the server log
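
A sketch of how such a log could be read by a program to count page views and spot dead links. The regular expression assumes the field layout of the sample log entries on the previous slide (host, two unused fields, time, request, status, size, referer, user agent); other servers may log different fields. The last sample line, with its 404, is invented for illustration and does not come from the original log.

```python
import re
from collections import Counter

LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'(?:-|"(?P<referer>[^"]*)") "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    return entry


def summarize(lines):
    """Count successful page views and requests for missing pages (dead links)."""
    views, dead_links = Counter(), Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        if entry["status"] == 404:
            dead_links[entry["path"]] += 1
        elif entry["method"] == "GET" and entry["status"] in (200, 304):
            views[entry["path"]] += 1
    return views, dead_links


sample = [
    'isdn69.ppp.uib.no - - [05/Jan/1998:00:13:53 +0100] "GET /download/RFCsearch.html HTTP/1.0" 200 2399 "http://www.kvarteret.uib.no/~pas/" "Mozilla/4.04 [en] (Win95; I)"',
    'isdn69.ppp.uib.no - - [05/Jan/1998:00:13:53 +0100] "GET /standard.css HTTP/1.0" 200 1064 - "Mozilla/4.04 [en] (Win95; I)"',
    'rip.axis.se - - [04/Jan/1998:21:24:46 +0100] "HEAD /ftp/pub/software/ HTTP/1.0" 200 6312 - "Mozilla/4.04 [en] (WinNT; I)"',
    # Hypothetical entry, added only to show a dead link.
    'client.example.org - - [05/Jan/1998:00:20:00 +0100] "GET /missing.html HTTP/1.0" 404 0 - "Mozilla/4.04 [en] (Win95; I)"',
]

views, dead_links = summarize(sample)
print(views.most_common())
print(dead_links.most_common())
```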

Cookies
• HTTP is stateless: each request is handled as an individual request
• The only way to carry data from one request to the next is by passing parameters or by using cookies
• Say you want to allow a user to log in to your website and
maintain that login across several different pages
– In straight http, you cannot pass the user information between different
pages
– You would need to generate a cookie to store the userid of the current
logged in user
• Cookies store data
• Cookies have an expiry date (can be days, weeks, or session)
• Need a script to generate and read data from cookies (html cannot do this)
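
A sketch of the script side of cookies, using Python's standard `http.cookies` module: one function produces the `Set-Cookie` header that stores the userid, the other reads the userid back from the `Cookie` header of a later request. The cookie name `userid` follows the slide; the function names and the one-day lifetime are assumptions for illustration, and a real site would store an unguessable session id, not the raw userid.

```python
from http.cookies import SimpleCookie


def login_cookie_header(userid, max_age=None):
    """Build the Set-Cookie header sent to the browser after a successful login."""
    cookie = SimpleCookie()
    cookie["userid"] = userid
    cookie["userid"]["path"] = "/"                 # valid for every page on the site
    if max_age is not None:
        cookie["userid"]["max-age"] = max_age      # omit for a session cookie
    return cookie.output(header="Set-Cookie:")


def userid_from_request(cookie_header):
    """Read the userid back from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get("userid")
    return morsel.value if morsel is not None else None


print(login_cookie_header("txw5999", max_age=86400))
print(userid_from_request("userid=txw5999; theme=dark"))
print(userid_from_request(""))   # cookies turned off or not yet set
```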

Cookies (cont)
• Useful to keep track of user data, but privacy issues
are involved that need to be resolved
• Keep in mind that the user can turn off cookies in his or her browser, so design the system so that it will not fail if a cookie cannot be read
– Can check to see if the browser supports cookies by trying
to write a cookie and then read it back

Tips and Hints about HTTP
• Hiding the source
– Not really possible, but you can try to hide it by putting blank lines at the top
– You can make the source look messy so the user has a hard time finding a particular part
– Web crawlers can save the HTML page without even loading it in a browser
• Downloading images
– You cannot stop the user from downloading images from your site
– Can watermark images to show where they came from
– An HTTP request does not care what kind of document it asks for; the server will return it anyway
• Passing parameters between web pages
– Can’t do this if the page is plain HTML
– Need to design a dynamic web page using ASP or Java in order to use the parameters
– Plain HTML pages will just drop the parameters and do nothing with them

Tips (cont)
• Preventing browsers from caching pages
– Set the expiration date of the content to a past date
– Advantage of caching is that the browser can fetch the page from the
cache without generating a new request to the web server
• Using a slash at the end of a URL
– If the URL points to a directory, then yes, include it
– If it is not included, then the web server must first check for the file
and then if the file does not exist then try to find the directory
– Some web servers can automatically direct you to an index or default
file, but only if you include the slash
– Good practice to do so
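
For the first tip (preventing caching), a server side script could send response headers along these lines. The `Expires` date in the past is what the slide describes; the `Cache-Control` and `Pragma` fields are additions not mentioned in the slides, included on the assumption that the server also wants to instruct HTTP/1.1 caches. The function name is hypothetical.

```python
from email.utils import formatdate


def no_cache_headers():
    """Response headers asking browsers and proxies not to reuse a cached copy."""
    return {
        # An expiration date in the past marks the content as already stale.
        "Expires": formatdate(0, usegmt=True),
        # Assumed additions for HTTP/1.1 caches and older HTTP/1.0 proxies.
        "Cache-Control": "no-cache, no-store, must-revalidate",
        "Pragma": "no-cache",
    }


for name, value in no_cache_headers().items():
    print(f"{name}: {value}")
```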

Remember
• HTTP verbs map onto CRUD
• Create → PUT (or POST)
• Read → GET
• Update → POST (or PUT)
• Delete → DELETE
• Of these, GET and POST are the most
important

For Next Week
• Read Zeldman Chapter 14
• HTTP web log reading
• Next week – Web Accessibility (making the
web accessible to everyone, including those
with disabilities)
