Ok. I did not intend to write an article now. I have been working hard on a new tagging system for Brugbart's new CMS, since I finally have some time off from school.
One thing that has bothered me for a long time is the requirement to use absolute URLs for many things, including sitemaps. I find it totally redundant when not linking to something outside the domain, and while the gain is minuscule, using relative URLs does still save some bandwidth.
Now that I am starting to focus on internationalization of Brugbart, and therefore maintain multiple versions of the same site, the question of relative versus absolute URLs has popped up again.
I have in fact been through this decision process before, but I am leaning more than ever towards using relative URLs, especially since I now know that they are supported by browsers. It is an obvious place to optimize, both for simplicity and for a direct decrease in the data that needs to be transferred to the client.
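To put a rough number on the bandwidth argument, here is a minimal Python sketch that renders the same navigation links twice, once with relative and once with absolute URLs, and compares the sizes. The host `example.com` and the paths are made up for illustration, and the comparison is on uncompressed markup; with gzip enabled the difference will be smaller.

```python
# Hypothetical host and paths, used only for illustration.
HOST = "https://example.com"
PATHS = ["/", "/articles/", "/tags/seo/", "/contact/"]


def links_markup(paths, prefix=""):
    """Render one anchor element per path, optionally prefixed with a host."""
    return "".join(f'<a href="{prefix}{p}">{p}</a>' for p in paths)


relative_html = links_markup(PATHS)
absolute_html = links_markup(PATHS, prefix=HOST)

# Each absolute link carries the scheme and host once, so the saving
# grows linearly with the number of links on the page.
saved = len(absolute_html.encode("utf-8")) - len(relative_html.encode("utf-8"))
print(f"Bytes saved with relative URLs: {saved}")
```

The saving per link is simply the length of the scheme and host, so it only adds up on pages with many internal links.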
The article that inspired me to write about this can be found here: Why relative URLs should be forbidden for web developers
Relative URLs and SEO issues
This is where I find it important to distinguish between now and then, and of course to stress the importance of education. There was a time when developers did not know much about SEO, and therefore did not account for all the problems.
When it comes to SEO, however, I would argue that it is in fact such a tiny part of running a website that I find it interesting that some people can specialize in it. Not to mention that the field is dominated by myths and outdated information. I think search optimization should take care of itself, given that your site has been designed within certain standards.
I have, no doubt, messed up fatally when it comes to SEO, likely more than a couple of times. But the mistakes were discovered quickly, and I made the necessary modifications to the sites.
Google adapts quickly to changes these days. A mistake like disallowing your entire site in robots.txt is much less fatal than it used to be, since Google will quickly re-index your site once you get around to correcting the mistake.
Some developers are not very skilled, and could easily end up making a mistake like the above. But should that be a reason to disallow the use of a tool altogether? Certainly not!
The multiple-host environment
In an environment where you have the same CMS on multiple hosts, relative URLs are easier to implement, both for content on the site and for dynamically generated navigation links, and even for redirects.
The alternative, which is to save the domain in a settings file, is just not reliable when your CMS is handling multiple domains.
As I have pointed out earlier, the HTTP_HOST variable should normally be safe to use, given that the server has been configured correctly. So using it to create absolute URLs for dynamic content is relatively easy and safe.
Creating absolute URLs, however, adds another level of complexity to your code, and also raises the question of which protocol is used on the different hosts. What if the HTTP protocol is replaced in the future? Do you go back and change all those places where you are generating your URLs? You already need to account for HTTPS!
What if it is later decided that your site should be using HTTPS?
When you are creating your own CMS, these are questions that you may want to ask yourself. It is obviously not ideal if you later have to go back and change it.
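To illustrate the extra moving parts, here is a rough Python sketch of what generating an absolute URL involves compared to a relative one. It assumes a WSGI-style `environ` dictionary, where the host arrives as `HTTP_HOST` and the scheme as `wsgi.url_scheme`; the function names are hypothetical and not taken from Brugbart's CMS.

```python
def absolute_url(environ, path):
    """Build an absolute URL from the request environment.

    Both the scheme and the host have to be looked up on every request,
    and the host is only trustworthy if the server is configured correctly.
    """
    scheme = environ.get("wsgi.url_scheme", "http")
    host = environ["HTTP_HOST"]
    return f"{scheme}://{host}{path}"


def relative_url(path):
    """A root-relative URL needs neither the scheme nor the host."""
    return path


# Hypothetical request environment for illustration.
environ = {"HTTP_HOST": "example.com", "wsgi.url_scheme": "https"}

print(absolute_url(environ, "/articles/"))
print(relative_url("/articles/"))
```

With the relative version, a later switch to HTTPS or to another host requires no change to the code that generates the links.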
Issues of test environments getting indexed
Joost also mentions cases of test environments getting indexed by search engines, but I think this one falls under the umbrella of education and experience; some developers just have to learn the hard way :-P
I would also argue that it is generally bad practice to expose test environments to the internet unless really necessary, in which case you should consider password protecting them, or at least block indexing with robots.txt. It is not really hard to account for these problems; the problem is that a lot of developers are uneducated.
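As a small sanity check for the robots.txt approach, the following Python sketch uses the standard library's `urllib.robotparser` to confirm that a blanket `Disallow: /` rule keeps a crawler away from every path. The test host `test.example.com` is a made-up name, and the parser only shows how a well-behaved crawler would read the file; it says nothing about how any particular search engine actually behaves.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks all crawlers to stay away from the entire host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URL on a test environment.
blocked = not parser.can_fetch("Googlebot", "https://test.example.com/articles/")
print("Test environment blocked for crawlers:", blocked)
```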
301 redirects and relative URLs
The big question that I think Google should answer is whether its search engine will correctly understand relative 301 redirects.
If you ask me, there is no reason why this would not be the case, and in fact, even the Chrome browser responds correctly to relative URLs in HTTP Location headers. That is a very strong hint as to how Google's search engine will behave.
Google also recently started to support relative URLs in robots.txt, and after all, why should it not? This can be a huge time-saver for those who work with multiple hosts, in that they do not have to generate the robots.txt file dynamically, although you may enjoy some obvious benefits if you do generate it dynamically.
I think relative URLs are a natural development, once we realize that we rarely have to supply the full address when the host is already known. In addition:
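For reference, this is roughly how a client resolves a relative Location header: the value is combined with the URL that was requested. The Python sketch below uses `urljoin` from the standard library; the URLs are made up, and the helper name `resolve_location` is hypothetical. It shows the general resolution rule, not what Google's crawler does internally.

```python
from urllib.parse import urljoin


def resolve_location(request_url, location):
    """Resolve a Location header value against the URL that was requested."""
    return urljoin(request_url, location)


# Root-relative redirect: scheme and host are taken from the request URL.
print(resolve_location("https://example.com/old/page", "/new/page"))

# An absolute Location value is used as-is.
print(resolve_location("https://example.com/old/page", "https://example.org/"))
```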
- When used right, they will not cause any problems whatsoever, invalidating any argument against them.
- There are some clear benefits in bandwidth usage.
- They can be easier for CMS developers to implement, and may directly lower the maintenance burden.
- They can be faster to type, although this is less relevant.