HTTP

In this post, I will document some basic concepts of HTTP. HTTP stands for Hypertext Transfer Protocol and defines how a client and a server communicate. I will also look into some of the headers associated with HTTP.
HTTP belongs to the application layer of the Internet protocol suite, along with SMTP, DNS, Telnet etc. Each exchange consists of a request from the client (typically a browser) and a response from the server.
A request and a response each have 3 parts. The first part is a single line: the request line for a request, or the status line for a response. The second part is the headers, one per line. The third part is the optional body, separated from the headers by a blank line. Let's examine the first line.
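
To make the three parts concrete, here is a minimal Python sketch that splits a raw request into its first line, headers and body. The sample request and the helper name split_message are made up for illustration, and a real parser has to handle many more cases than this one does.

```python
# A hand-written sample request; the host, path and form fields are illustrative only.
RAW_REQUEST = (
    "POST /login HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 16\r\n"
    "\r\n"
    "user=amy&lang=en"
)


def split_message(raw):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # A blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


start_line, headers, body = split_message(RAW_REQUEST)
method, target, version = start_line.split(" ")
print(method, target, version)
print(headers)
print(body)
```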

Resources
URL Structure - <scheme>://<host>:<port>/<path>?<query>#<frag>
Scheme - http, https, ftp etc.
Host - The computer on the internet that hosts this resource. The browser uses DNS to find the network address from the hostname.
Port - The port on which the host is listening. 80 is the default for http and 443 for https.
Path - Tells the host which specific resource is requested.
Query - or query string. Everything after ? and before # is the query. It contains information for the host to interpret. There is no formal standard for its contents, as it is up to the host to decide how to interpret this value.
Frag - The part after # is known as the fragment. It is not sent to the server and is processed only by the browser. It typically contains the id of the element to scroll to or focus on.
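
As a quick check of the structure above, Python's standard urllib.parse module can split a URL into these parts. The URL below is a made-up example.

```python
from urllib.parse import urlsplit, parse_qs

# Made-up URL that uses every part of the structure.
url = "http://example.com:8080/docs/page.html?lang=en&page=2#intro"
parts = urlsplit(url)

print(parts.scheme)    # scheme
print(parts.hostname)  # host
print(parts.port)      # port, as an int (None when the URL omits it)
print(parts.path)      # path
print(parts.query)     # query string, still in raw form
print(parts.fragment)  # fragment

# parse_qs turns the query string into a dict of lists,
# because the same key may appear more than once.
query = parse_qs(parts.query)
print(query)
```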

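URL Encoding - Certain characters such as space, # or ^ are considered unsafe in URLs, so they get percent-encoded: each one is replaced by % followed by the two-digit hexadecimal value of its byte. For ex, a space becomes %20.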
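
A small sketch of percent-encoding using quote and unquote from Python's urllib.parse. The file name is invented for the example.

```python
from urllib.parse import quote, unquote

# Invented file name containing a space, a # and a ^.
raw = "my file#1 ^draft.txt"

encoded = quote(raw)
print(encoded)  # my%20file%231%20%5Edraft.txt

# unquote reverses the encoding.
decoded = unquote(encoded)
print(decoded == raw)  # True
```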

Media Types and Content Negotiation - When a host responds to an HTTP request, it also sends the content type (or media type) so that the browser can read the content correctly. HTTP relies on the MIME (Multipurpose Internet Mail Extensions) standards to specify the content type. A content type of text/html means that text is the primary type and html is the subtype. Browsers don't rely on the file extension alone to determine the content type. Most browsers first look at the Content-Type header. If it is missing, they try to work out the type on their own by reading the first few bytes of the response. If that also fails, they fall back to the file extension.
Sometimes the host has the same content available in different representations, and the client can ask for a specific one. For ex, the client can request the content in Hindi using the Accept-Language header. If the host doesn't have it in Hindi, it might still send it in English. That's why it's called negotiation and not ultimatum.
On the server side, a content negotiation process typically requires 3 items - the object to serialize, the available formatters and the request. The process compares the formats the request prefers (from its Accept header) with the available formatters. If a match is found, it returns the formatter to use and the media type of the response. If not, the server can respond with HTTP 406 - Not Acceptable.
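
The negotiation steps can be sketched roughly as below. This is a simplified illustration, not how any particular framework does it: the FORMATTERS table and the parse_accept and negotiate helpers are hypothetical names, and partial wildcards such as text/* are not handled.

```python
import json

# Hypothetical formatters: media type -> function that serializes an object.
FORMATTERS = {
    "application/json": lambda obj: json.dumps(obj),
    "text/plain": lambda obj: str(obj),
}


def parse_accept(header):
    """Return the media types in an Accept header, most preferred first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        q = 1.0  # a missing q-value means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep the order the client sent.
    ranked = sorted(prefs, key=lambda p: p[1], reverse=True)
    return [media_type for media_type, q in ranked if q > 0]


def negotiate(accept_header, obj, formatters=FORMATTERS):
    """Return (status, media_type, body) for the best matching formatter."""
    for media_type in parse_accept(accept_header):
        if media_type == "*/*":
            # The client accepts anything: use the first formatter we have.
            media_type = next(iter(formatters))
        if media_type in formatters:
            return 200, media_type, formatters[media_type](obj)
    return 406, None, None


ok = negotiate("application/xml;q=0.9, application/json;q=0.8", {"id": 1})
rejected = negotiate("application/xml", {"id": 1})
print(ok)
print(rejected)
```

Here the client prefers XML, but since only JSON and plain text formatters exist, the first call falls through to JSON. The second call has nothing acceptable to offer and ends in 406.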

Request Methods
HEAD - similar to GET, but the server sends only the headers, without the response body.
GET - requests a static or a dynamic resource. GET parameters go into the query string.
POST - submits data to the server. POST parameters go into the body of the HTTP message.
PUT and DELETE - PUT uploads (creates or replaces) the resource at the specified URL, and DELETE removes it.
TRACE - the server echoes back the received request so that the client can see whether intermediate servers made any modifications to it.
OPTIONS - returns the HTTP methods that are supported on the requested URL.
CONNECT - to facilitate HTTPS encrypted communication over an unencrypted HTTP proxy. The client uses the CONNECT method to ask an HTTP proxy server to open a TCP connection to the desired destination. The proxy makes the connection on behalf of the client and, once it is established, simply relays the TCP stream to and from the client. Note that only the initial connection request is HTTP. This mechanism is how a client behind an HTTP proxy can access websites using TLS (formerly SSL), i.e. HTTPS.
PATCH - to facilitate partial modifications to a resource.
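
To see where the parameters end up for GET and POST, here is a small sketch using Request objects from Python's urllib.request. Nothing is sent over the network, the objects are only built and inspected. The endpoint and the parameters are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and parameters, used only for illustration.
BASE_URL = "http://example.com/search"
params = {"q": "http basics", "page": "1"}
encoded = urlencode(params)

# GET: the parameters travel in the query string of the URL.
get_req = Request(BASE_URL + "?" + encoded)

# POST: the same parameters travel in the message body instead.
post_req = Request(
    BASE_URL,
    data=encoded.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

# HEAD: like GET, but only the response headers are wanted.
head_req = Request(BASE_URL, method="HEAD")

for req in (get_req, post_req, head_req):
    print(req.get_method(), req.full_url, req.data)
```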
Let's now take a look at some of the more important request and response headers. 

Request Headers
Accept - content types the client can accept. Similarly, Accept-Charset, Accept-Language and Accept-Encoding list the charsets, languages and encodings the client can accept.
Content-Type, Content-Length and Content-MD5 - the type, the length in bytes and the Base64-encoded MD5 digest of the request body. Content-MD5 has since been removed from the HTTP specification (RFC 7231).
Authorization - authentication credentials for HTTP authentication
Cache-Control - directives that control caching.
Connection - control options for the current connection, for ex. keep-alive or close.
Cookie - an HTTP cookie previously sent by the server with the Set-Cookie header.
Max-forwards - max number of hops the message can be forwarded through proxies and gateways.
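Host - domain name of the server, optionally followed by the port. It is mandatory in HTTP/1.1.
Referer - address of the previous web page from which the link to the currently requested page was followed.
User-Agent - identifies the client software (for ex. the browser and its version) from which the request is coming.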
Via - informs the server of proxies through which the request was sent.
Date - date and time when the request was sent.
If-None-Match - works with the HTTP ETag, a validator used for web cache validation. When the server first returns a resource, it includes an ETag header in the response, and the browser stores the resource together with this ETag. On later requests for the same resource, the browser sends the stored value in If-None-Match. The server checks whether that value still matches the current version of the resource. If so, it sends a short 304 Not Modified response, which tells the browser that the copy it has is current and can be used.
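
The ETag and If-None-Match exchange can be sketched as follows. This is a simplified illustration under a few assumptions: the ETag is derived from a hash of the content (servers are free to choose other schemes), the helper names make_etag and respond are hypothetical, and lists of ETags, the * value and weak validators are ignored.

```python
import hashlib


def make_etag(content):
    """Build an ETag from the content (assumed scheme: truncated SHA-256)."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond(request_headers, content):
    """Return (status, response_headers, body) for a GET of `content`."""
    etag = make_etag(content)
    if request_headers.get("If-None-Match") == etag:
        # The client's cached copy is current: no body is needed.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, content


# First visit: nothing cached, so the full resource comes back with an ETag.
status1, headers1, body1 = respond({}, b"hello")

# Second visit: the client sends the stored ETag back.
status2, headers2, body2 = respond({"If-None-Match": headers1["ETag"]}, b"hello")

# The resource changed on the server, so the old ETag no longer matches.
status3, headers3, body3 = respond({"If-None-Match": headers1["ETag"]}, b"hello v2")

print(status1, status2, status3)
```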

Response Headers
Cache-Control - directives that control caching.
Connection - options that are desired for this connection.
Content-Encoding, Content-Language, Content-Length, Content-MD5 and Content-Type - metadata describing the content of the response.
Expires - date and time at which the response becomes stale.
Server - name of the server software.
Set-Cookie - sets a cookie which will be stored by the browser and sent back on later requests. The cookie can be used for authentication, setting user preferences, identifying the user session, contents of a shopping cart etc.
Via - informs the client about the proxies through which the response was sent.
WWW-Authenticate - indicates the authentication scheme that should be used to access the requested entity. For ex, basic or digest.
ETag - used for web cache validation. See If-None-Match request header above.
There are some non-standard headers as well. Usually they start with X-.
X-XSS-Protection - controls a browser filter meant to prevent the most common XSS attacks. The filter is on if the value is 1 and off if the value is 0.
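
As a rough illustration of the Set-Cookie and Cookie round trip, Python's standard http.cookies module can build and parse both header values. The cookie names and values below are invented.

```python
from http.cookies import SimpleCookie

# Server side: build the value of a Set-Cookie response header.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"          # invented session identifier
outgoing["session_id"]["path"] = "/"
outgoing["session_id"]["httponly"] = True  # hide the cookie from page scripts
set_cookie_value = outgoing["session_id"].OutputString()
print("Set-Cookie:", set_cookie_value)

# Later, server side again: parse the Cookie request header that the
# browser sends back. Only name=value pairs come back, not the attributes.
incoming = SimpleCookie()
incoming.load("session_id=abc123; theme=dark")
for name, morsel in incoming.items():
    print(name, "=", morsel.value)
```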

Response Status Codes
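1xx - Informational. For ex.
  • 100 - Continue
  • 122 - non-standard code used by Internet Explorer 7 when the requested URI is too long (> 2083 chars). The standard code for an overlong URI is 414, which belongs to 4xx.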
2xx - Success
  • 200 - OK
  • 204 - No Content
3xx - Redirection
4xx - Client Error
  • 400 - Bad Request
  • 401 - Unauthorized
  • 403 - Forbidden
  • 404 - Not Found
5xx - Server Error
  • 500 - Internal Server Error
  • 503 - Service Unavailable
  • 504 - Gateway Timeout
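
The first digit of a status code gives its class, which makes codes easy to group in code. Below is a minimal sketch using Python's http.HTTPStatus. The describe helper and the CLASSES table are names made up for this example, and codes that Python's table does not know get a placeholder phrase.

```python
from http import HTTPStatus

# Class of a status code, keyed by its first digit.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code):
    """Return 'code phrase [class]' for an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Not every number is a code known to Python's HTTPStatus table.
        phrase = "(unknown to HTTPStatus)"
    return f"{code} {phrase} [{CLASSES.get(code // 100, 'Unknown')}]"


for code in (200, 204, 404, 504, 122):
    print(describe(code))
```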