When building applications that use HTTP, there are a few things that are useful to know. You are not expected to know and remember everything, but you should understand the parts that matter for your own situation.
HTTP (Hypertext Transfer Protocol) is used by client and server applications to communicate over the internet. A client sends a request formatted according to the HTTP protocol, and the server sends a response back to the client. This is, for example, what happens when you navigate to a web page in a browser.
A browser is a client application running on a computer, which is used to visit websites that are hosted on servers available on the WWW (World Wide Web).
When a client sends an HTTP request, it usually uses one of the common methods, such as:
The GET method is typically used when requesting a resource over HTTP. The resource can be almost anything, including (but not limited to) web pages, images and video files.
The HEAD method is similar to GET, but the server returns only the head of the response, without the body. The head is the part that contains the response headers.
The POST method is used to send data to a server. Examples include online message boards such as guestbooks and forums, but it can also be used to upload images and video.
There are more methods than these, but in many web applications you will only deal with the ones above, because plain HTML forms in browsers can only submit GET and POST requests. So even though HTTP has a DELETE method for deleting a resource, a web application will usually send a POST instead. Avoid using GET for this, since GET requests are meant to be safe and may be repeated or prefetched. There is no visible difference to the user, so in practice it does not matter much.
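To make the difference between the methods concrete, here is a minimal Python sketch that builds the raw bytes of a GET, HEAD or POST request. The function name `build_request`, the `/guestbook` path and the form fields are hypothetical; this illustrates the message format only and is not a complete HTTP client.

```python
from typing import Dict, Optional


def build_request(method: str, host: str, path: str = "/", body: bytes = b"",
                  headers: Optional[Dict[str, str]] = None) -> bytes:
    """Return the raw bytes of an HTTP/1.1 request (illustrative sketch)."""
    # Request line (method, path, protocol version) followed by the mandatory Host header.
    lines = [f"{method} {path or '/'} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body or method == "POST":
        # A request body needs a length so the server knows where the request ends.
        lines.append(f"Content-Length: {len(body)}")
    # Every head line ends with CRLF, and an empty line separates the head from the body.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body
```

For example, `build_request("POST", "example.com", "/guestbook", b"name=Ann&msg=Hi", {"Content-Type": "application/x-www-form-urlencoded"})` produces a request whose body carries the form data, while GET and HEAD requests built this way consist of a head only.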
Sending requests over the HTTP protocol
To send an HTTP request without a browser, you will typically use a programming language. The exact code depends on the language, but it is often quite similar across languages: in some it takes only a few lines, while in others you need to write considerably more. It also depends on whether you are writing a desktop application or a web application running on a web server.
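In Python, for example, the standard library's `http.client` module can send a request in a few lines. This is a minimal sketch assuming plain HTTP (no TLS, so no `HTTPSConnection`); the function name `fetch` is hypothetical.

```python
import http.client


def fetch(host: str, path: str = "/", method: str = "GET",
          port: int = 80, timeout: float = 10.0):
    """Send one request and return (status, reason, headers, body)."""
    # HTTPConnection adds the Host header itself, based on `host` (and `port` if non-default).
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request(method, path)
        resp = conn.getresponse()
        body = resp.read()  # http.client knows HEAD responses have no body, so this is b"" for HEAD
        # Note: dict() keeps only the last value if a header name occurs more than once.
        return resp.status, resp.reason, dict(resp.getheaders()), body
    finally:
        conn.close()
```

Calling `fetch("example.com", "/index.html")` sends the same kind of GET request shown below and returns the status code, reason phrase, response headers and body.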
A typical GET request looks like this:
GET /index.html HTTP/1.1
Host: example.com
In the above example, a file called index.html in the server root is requested from the host example.com. Before the request can be sent, the client performs a DNS lookup to find the IP address that handles requests for example.com, opens a connection to that address (port 80 is the default for HTTP), and sends the request over it.
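The lookup-then-send sequence can be sketched with Python's `socket` module. This is a minimal illustration assuming plain HTTP; the function names `resolve` and `raw_get` are hypothetical, and the extra `Connection: close` header is an assumption added so the sketch can simply read until the server closes the connection.

```python
import socket
from typing import List


def resolve(host: str, port: int = 80) -> List[str]:
    """Return the IP addresses a DNS lookup gives for `host` (IPv4 and/or IPv6)."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP address.
    return list(dict.fromkeys(info[4][0] for info in infos))


def raw_get(host: str, path: str = "/", port: int = 80, timeout: float = 10.0) -> bytes:
    """Send a raw GET request like the one above and return the raw response bytes."""
    host_header = host if port == 80 else f"{host}:{port}"
    request = (f"GET {path} HTTP/1.1\r\n"
               f"Host: {host_header}\r\n"
               "Connection: close\r\n"
               "\r\n").encode("ascii")
    # create_connection() performs its own getaddrinfo() lookup and tries each address in turn.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection, so the response is complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

Note that even after the address has been resolved, the host name still travels inside the request in the Host header.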
Note that a requested path such as /directory/file does not necessarily mean that a file physically exists at that location on the server. Servers can intercept and rewrite incoming requests (URL rewriting). Typically, a server maps most or all requests to a web application, such as a CMS, which handles each request and determines what content is delivered back to the browser.
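The idea of routing every path to an application can be sketched as a tiny "front controller" in Python. The `PAGES` dictionary is a hypothetical stand-in for a CMS database, and `front_controller` is an illustrative name; a real server would forward requests to such code with, for example, a rewrite rule.

```python
from typing import Dict, Tuple

# Hypothetical content store standing in for a CMS database: paths are keys, not files on disk.
PAGES: Dict[str, str] = {
    "/": "<h1>Home</h1>",
    "/directory/file": "<h1>This page comes from the CMS, not from disk</h1>",
}


def front_controller(request_path: str, pages: Dict[str, str] = PAGES) -> Tuple[int, str]:
    """Map a request path to (status code, HTML) without touching the filesystem."""
    # Ignore the query string, so /directory/file?page=2 is looked up as /directory/file.
    path = request_path.split("?", 1)[0] or "/"
    if path in pages:
        return 200, pages[path]
    return 404, "<h1>Not Found</h1>"
```

Here /directory/file returns a page even though no such file exists, which is exactly why a URL path cannot be taken as proof of a file on disk.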
A server response might look like this:
HTTP/1.1 200 OK
Date: Fri, 01 Jan 2016 01:00:00 GMT
Server: Apache/126.96.36.199 (Unix) (Red-Hat/Linux)
Last-Modified: Sat, 05 Dec 2015 20:33:33 GMT
ETag: "8f9s94k9dm40giwpleit8s9tf9s8rn40"
Content-Type: text/html; charset=UTF-8
Content-Length: 140
Accept-Ranges: bytes
Connection: close

<html>
  <head>
    <title>Example page</title>
  </head>
  <body>
    <p>Hello World</p>
  </body>
</html>
In this case the server responds with the status line HTTP/1.1 200 OK, indicating that the requested file or resource was found on the server, followed by the response headers and the body of the response. Each line of the head ends with <CR><LF> (a carriage return followed by a line feed), and the head is separated from the body by an empty line, so the head ends with two <CR><LF> sequences in a row.
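Splitting a raw response into status line, headers and body follows directly from that rule. A minimal Python sketch (the function name `parse_response` is hypothetical); it deliberately ignores Content-Length, chunked transfer coding and repeated headers, which a real client must handle.

```python
from typing import Dict, Tuple


def parse_response(raw: bytes) -> Tuple[str, int, str, Dict[str, str], bytes]:
    """Split raw response bytes into (version, status code, reason, headers, body)."""
    # The head ends at the first empty line, i.e. the first CRLF CRLF sequence.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no end-of-head marker (CRLF CRLF) found")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: HTTP-version SP status-code SP reason-phrase (the reason may contain spaces).
    version, code, reason = (lines[0].split(" ", 2) + [""])[:3]
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body
```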
Request and response composition
Requests often contain only a head (GET and HEAD requests, for example), whereas a POST request also carries a body with the uploaded data. Responses usually contain both a head and a body. The head contains the headers, such as the Host header found in client requests and the Last-Modified timestamp found in server responses. The body of a response contains the actual content being sent back to the client.
There are many different headers, too many to list here, and you can even invent your own. The first line of each example is not a header: in a request it is the request line (method, path and protocol version), and in a response it is the status line (protocol version, status code and reason phrase). The headers can typically be accessed on the server side using a programming language such as PHP.
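In PHP, request headers show up in `$_SERVER` (for example `$_SERVER['HTTP_HOST']`). Python web applications written against the WSGI interface (PEP 3333) see them the same way, as `HTTP_*` keys in the `environ` dictionary. A minimal sketch; the application name `app` and the custom `X-Request-Id` header are hypothetical examples.

```python
from typing import Callable, Iterable


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI application that echoes a few request headers back as plain text."""
    # WSGI exposes request headers as HTTP_* keys:
    # Host -> HTTP_HOST, User-Agent -> HTTP_USER_AGENT, X-Request-Id -> HTTP_X_REQUEST_ID.
    host = environ.get("HTTP_HOST", "(missing)")
    agent = environ.get("HTTP_USER_AGENT", "(missing)")
    custom = environ.get("HTTP_X_REQUEST_ID", "(missing)")  # hypothetical custom header
    path = environ.get("PATH_INFO", "/")
    body = (f"Host: {host}\nUser-Agent: {agent}\n"
            f"X-Request-Id: {custom}\nPath: {path}\n").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` serves it with the standard library's reference server.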
The path (e.g. /index.html) in the first line of the request is not allowed to be empty. When a URL has no path, such as http://example.com, the client sends a slash "/" instead. The server needs a path to know what to deliver back to the client.
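The empty-path rule is easy to see with Python's `urllib.parse.urlsplit`, which leaves the path empty for a bare host name. A minimal sketch of how a client might derive the path for the request line; the function name `request_target` is hypothetical.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path (plus any query string) a client would put in the request line."""
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path becomes "/"
    return f"{path}?{parts.query}" if parts.query else path
```

So `request_target("http://example.com")` gives "/", and `request_target("http://example.com/index.html")` gives "/index.html".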
The Host request header is sent with every request, not only with GET, and in HTTP/1.1 it is required. This is different for older clients based on HTTP/1.0, which may leave it out. If the Host header is missing from a request, the server may respond unexpectedly and show a different website from one of its other virtual hosts, so be careful when configuring your vhosts!
Web servers hosting multiple websites on the same IP address need the Host header to know which domain (host) the client requested, which is why it is required in HTTP/1.1.
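Choosing a site from the Host header, and what happens when it is missing, can be sketched in a few lines of Python. This is an illustration rather than how any particular server is implemented, although Apache behaves similarly in spirit: with name-based virtual hosts, a request that matches no configured name is served by the first virtual host defined for that address. The `VHOSTS` table, the paths and `pick_vhost` are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical vhost table: host name -> document root.
VHOSTS: Dict[str, str] = {
    "example.com": "/var/www/example.com",
    "example.org": "/var/www/example.org",
}


def pick_vhost(host_header: Optional[str], vhosts: Dict[str, str], default: str) -> str:
    """Choose which site to serve from the Host header, falling back to a default site."""
    if not host_header:
        # Missing or empty Host (e.g. an old HTTP/1.0 client): the default site answers.
        return default
    name = host_header.strip().lower()  # host names are case-insensitive
    # Drop an optional port: "example.com:8080" -> "example.com", "[::1]:8080" -> "[::1]".
    if name.startswith("["):
        name = name.split("]", 1)[0] + "]"
    else:
        name = name.split(":", 1)[0]
    return vhosts.get(name, default)
```

The fallback is the point to watch: whatever site is the default will be shown to every client that sends no Host header or an unknown one.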
The Content-Type response header tells the client what type of content is being delivered (its media type, such as text/html) and the character set it is encoded in. For HTML pages, UTF-8 is very common.
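A client uses both parts of the header: the media type to decide how to handle the content, and the charset to turn the body bytes into text. A minimal Python sketch using the standard library's `email.message.Message` to parse the header value; `decode_body` is a hypothetical name, and the ISO-8859-1 fallback is an assumption (older HTTP/1.1 rules used it as the default for text types).

```python
from email.message import Message
from typing import Tuple


def decode_body(content_type: str, body: bytes, fallback: str = "iso-8859-1") -> Tuple[str, str]:
    """Return (media type, decoded text) using the charset named in a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()          # e.g. "text/html" (lower-cased)
    charset = msg.get_content_charset(fallback)  # e.g. "utf-8", or the fallback if none is given
    return media_type, body.decode(charset)
```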
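The ETag header is usually used for caching purposes. An ETag is an identifier for a specific version of a page, so it changes whenever the content generated by the web application changes. A client that has cached the page can send the ETag back in an If-None-Match request header; if the page has not changed, the server can answer with 304 Not Modified and no body, and the client uses its cached copy instead of downloading the page again.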
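One simple way to produce an ETag is to hash the generated page, as in this minimal Python sketch. Hashing the body is an assumed strategy (applications may instead use a version number or a database timestamp), and `make_etag` and `conditional_get` are hypothetical names.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """A strong ETag derived from the content: any change to the bytes changes the tag."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def conditional_get(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body), answering 304 when the client's cached copy is current."""
    etag = make_etag(body)
    if if_none_match:
        # The header may list several tags, and weak tags carry a W/ prefix.
        tags = [t.strip() for t in if_none_match.split(",")]
        tags = [t[2:] if t.startswith("W/") else t for t in tags]
        if "*" in tags or etag in tags:
            return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Length": str(len(body))}, body
```

With this approach the server still has to generate the page to compute the tag; the saving is in bandwidth, because an unchanged page is not sent again.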
The Last-Modified header is also used for caching purposes. It tells clients when the page was last modified, and a client can send that date back in an If-Modified-Since request header so that the server can answer 304 Not Modified if nothing has changed since. For a CMS, it is a good idea to use a page-global timestamp, so that even a change to the links in a menu also updates the timestamp.
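A page-global timestamp simply means taking the newest modification time of everything that goes into the page. A minimal Python sketch using the standard library's `email.utils` date helpers, which produce and parse the date format seen in the response example; the function names and the "article plus menu" breakdown are hypothetical, and all datetimes are assumed to be timezone-aware.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Iterable, Optional


def page_last_modified(parts: Iterable[datetime]) -> datetime:
    """The page is only as old as its newest component (article, menu, sidebar, ...)."""
    return max(parts)


def http_date(dt: datetime) -> str:
    """Format an aware datetime as an HTTP date, e.g. 'Sat, 05 Dec 2015 20:33:33 GMT'."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def is_not_modified(last_modified: datetime, if_modified_since: Optional[str]) -> bool:
    """True if the client's copy (dated by If-Modified-Since) is still current."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):  # older Pythons raise TypeError for unparsable dates
        return False
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    return last_modified.replace(microsecond=0) <= since
```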