Each page in the site should have a canonical URL. Maybe a page is available as
and many more. But one URL, e.g. http://eiði.com/%FAr%20ei%F0i/, should be the canonical one;
the others should redirect (301 Moved Permanently) to the canonical one.
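A minimal sketch of that rule, in Python. The alias table and the function name `resolve` are hypothetical and only illustrate the idea; a real CMS would generate the table from its own page database.

```python
from urllib.parse import urlsplit

# Hypothetical alias table: every known spelling of a page maps to
# the single canonical path for that page.
CANONICAL = {
    "/tv": "/tv/",
    "/tv/index.html": "/tv/",
    "/products/tv/": "/tv/",
}


def resolve(url):
    """Return (status, path) for a requested URL.

    Aliases answer 301 (Moved Permanently) with the canonical path;
    anything else is served as-is with 200.
    """
    path = urlsplit(url).path
    if path in CANONICAL:
        return 301, CANONICAL[path]
    return 200, path


print(resolve("http://gadget.com/products/tv/"))  # (301, '/tv/')
print(resolve("http://gadget.com/tv/"))           # (200, '/tv/')
```

Because the redirect is permanent, browsers and search robots can replace the alias with the canonical URL in their own records.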
The URL http://gadget.com/tv/ is easier to remember than
http://gadget.com/bin/iis/display-page.asp?product-category={af34fef887}
E.g. http://heima.olivant.fo/~egilstro/eiði.html
is encoded as http://heima.olivant.fo/~egilstro/ei%F0i.html
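The encoding above can be reproduced with Python's standard library. The example URL uses ISO 8859-1 (Latin-1) for the escaped byte; the UTF-8 form is shown only for comparison and is not something the text above asks for.

```python
from urllib.parse import quote, unquote

path = "/~egilstro/eiði.html"

# Percent-encode with Latin-1, as in the example URL: ð is the byte 0xF0.
latin1 = quote(path, safe="/~", encoding="latin-1")
print(latin1)  # /~egilstro/ei%F0i.html

# The same path encoded as UTF-8, where ð becomes the two bytes 0xC3 0xB0.
utf8 = quote(path, safe="/~", encoding="utf-8")
print(utf8)    # /~egilstro/ei%C3%B0i.html

# Decoding only round-trips if the same character encoding is used.
assert unquote(latin1, encoding="latin-1") == path
assert unquote(utf8, encoding="utf-8") == path
```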
http://gadget.com/products/tv/
may be a good URL if TVs are placed under the Products
menu item. But the relation to the menu structure should not be enforced.
http://gadget.com/tv/ is easier to remember,
and more robust if the TVs are moved from the Products menu to the Consumer Electronics submenu.
E.g. hoppa.com changed from "Defence" to the more objective term "Military" in a URL, without breaking the old link to http://hoppa.com/Defence/index.en.pl.gz?.
The content should be deliverable in XHTML (for easier consumption by other applications), and maybe HTML for better legacy browser compatibility, WML for mobile phones and PDF or XSL for printing. Highly structured data maybe as XML, comma-separated files, OpenOffice (or legacy Excel) spreadsheets...
WML pages are generally smaller than HTML pages. And PDF/XSL should generally contain everything the user wants to know in one file.
Graphs should be presented as SVG, with fallback to GIF, and the underlying data available as XML.
See w3.org/WAI
Validity is not an end in itself, but it is a good indication that the code is clean, and that it will do what is expected on most platforms.
In general, it is OK for a page to be invalid, as long as there is a reason for it, and the consequences, if any, are known and accepted.
The most practical way to do that, is probably to rewrite the original system to create a set of XML files suited for import into the new system.
But if the new system can read the database of the old one, or simply read its generated pages, nothing could be better.
Changing CMS should not break the links to the old site.
Old links should still work: if the new system uses new URLs, the
old URLs should be redirected (301 Moved Permanently) to the new ones.
If the CMS requires special support from the web hotel, that severely restricts our freedom to choose a web hosting provider.
If the site doesn't require data from the users, the CMS site could run offline. The CMS could FTP a complete site to the web hotel once a day, or FTP pages as they change.
If special CMS software must be installed on the server, it should be generally available software that is running on many different web hotels. Being locked to one web provider is dangerous.
At the very least, you should have access to all source, and the option of running the CMS on your own server.
Javascript allows some dynamics on a static server. Incredibly, the language setting of IE, Mozilla and Opera is invisible to Javascript. So language negotiation is not an option on static servers, at least not if the client is one of these mainstream browsers.
When dynamics are needed, the CMS should still not require anything more exotic than JSP, ASP, PHP, XSLT or MySQL. Maybe stuff like Cold Fusion, Oracle and MS SQL; these are rare, but still more common than any CMS offering by web hotels.
The Last-Modified header should be correct.
This is not easy, because the page is usually made from different 'atoms' that may have been updated at different times. Last-Modified of the page should be the maximum of Last-Modified for the atoms in the page.
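A small sketch of that rule in Python. The function name `page_last_modified` and the sample timestamps are made up for illustration; the only assumption about the CMS is that it records a UTC modification time per atom.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def page_last_modified(atom_mtimes):
    """Return the Last-Modified header value for a page, or None.

    atom_mtimes holds one timezone-aware UTC datetime per atom.
    If any atom's time is unknown (None), or there are no atoms,
    return None so that the header can be omitted instead of guessed.
    """
    if not atom_mtimes or any(m is None for m in atom_mtimes):
        return None
    newest = max(atom_mtimes)
    return format_datetime(newest, usegmt=True)


# Hypothetical atoms: article body, shared menu, footer.
atoms = [
    datetime(2004, 3, 1, 12, 0, tzinfo=timezone.utc),
    datetime(2004, 5, 17, 8, 30, tzinfo=timezone.utc),
    datetime(2003, 11, 2, 9, 15, tzinfo=timezone.utc),
]
print(page_last_modified(atoms))  # Mon, 17 May 2004 08:30:00 GMT
```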
Almost all CMS 'solve' this problem by claiming that all pages have been modified right now. (If they don't know the modification time, they should at least shut up, and omit the Last-Modified header.) This causes browsers to retrieve updates of pages that have not been changed, and makes it harder for search robots to determine what they should bother to re-index.
To save bandwidth, a HEAD request should return the HTTP headers only. (Not, like Cold Fusion does, return the entire page.) Likewise, a GET request with If-Modified-Since should return only the headers (304 Not Modified), unless the page has been modified.
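The behaviour can be sketched as follows. The request header involved is the standard If-Modified-Since. The function `respond` and its simplified arguments are hypothetical, not part of any particular server API.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(method, headers, body, last_modified):
    """Return (status, response_headers, payload) for a simplified request.

    last_modified is a timezone-aware UTC datetime for the page.
    """
    # HTTP dates have one-second resolution, so drop microseconds.
    last_modified = last_modified.replace(microsecond=0)
    out = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    since = headers.get("If-Modified-Since")
    if since:
        try:
            if last_modified <= parsedate_to_datetime(since):
                return 304, out, b""  # Not Modified: headers only
        except (TypeError, ValueError):
            pass  # unparsable or zone-less date: ignore the header

    out["Content-Length"] = str(len(body))
    if method == "HEAD":
        return 200, out, b""  # same headers as GET, but no body
    return 200, out, body


page = b"<html>...</html>"
modified = datetime(2004, 5, 17, 8, 30, tzinfo=timezone.utc)

status, _, payload = respond(
    "GET", {"If-Modified-Since": "Mon, 17 May 2004 08:30:00 GMT"}, page, modified)
print(status, len(payload))  # 304 0
```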
new.
Any change in a page, no matter how small, will cause Last-Modified to be updated, and the browser to retrieve the entire page when refreshing.
But smaller corrections should not trigger the emails that keep people informed of updates at the site, nor create new items in RSS feeds. Only a human editor can determine when a new page is really new.
When the user agent supports it, pages should be delivered in compressed form.
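One way this could look, assuming gzip is the only compression offered and the client announces support through the Accept-Encoding request header. The function name `encode_body` is invented for this sketch.

```python
import gzip


def encode_body(body, accept_encoding):
    """Return (payload, extra_headers), gzipped only if the client allows it."""
    headers = {"Vary": "Accept-Encoding"}  # caches must key on this header
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        # "gzip;q=0" means the client explicitly refuses gzip.
        if name.strip().lower() == "gzip" and q > 0:
            headers["Content-Encoding"] = "gzip"
            return gzip.compress(body), headers
    return body, headers


html = b"<p>hello</p>" * 200
packed, extra = encode_body(html, "gzip, deflate")
print(extra.get("Content-Encoding"), len(packed) < len(html))  # gzip True

plain, extra = encode_body(html, "identity")
print(extra.get("Content-Encoding"), plain == html)  # None True
```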
The CMS should accept client SSL certificates as authentication, when authentication is needed.
Pages should be presented for both HTML browsers and WAP. There should not be a 1-to-1 mapping between WAP and HTML pages, because WAP pages need to be smaller. RSS feeds and other metadata usually don't map to specific HTML pages.
A web site always has a structure, often a tree. A page must keep its URL, even if it is moved to a different location in that tree.
Like W3C often does: Quoting two URLs in a document, the URL of this version, and the URL of the newest version. (These URLs must be different, even when this version is the newest.)
I have a hard time defining this point, because exceptions are, well, exceptional.
One example: I have this group of gallery pages. A CMS would be a tremendous help in keeping link titles consistent with headers, thumbnails consistent with big images, <prev> links consistent with <next> links etc. Except that I want something special for leynar_surf.html and spot.html.
A CMS should be able to handle this kind of thing, but I don't know how.
The CMS should generate unique bookmarks to make deep linking from the outside easy.
And the URLs of internal pages should be visible to the user. I.e.: no frames, and no full page Flash.
When following a deep link, maybe from a search engine, the page should provide some context,
e.g. with pointers up and home.
If deep links for some reason are unwanted, they should be stopped politely.
Simply redirecting to the home page, or displaying 404 Not Found, is not quite polite enough.
For an image, one way to do it, could be sending an image of the URL of the page that the image is stolen from. Another could be returning the page that the image belongs in.
Titles should, even with only the first few characters shown, help the user navigate between his windows.
E.g. this page is titled CMS - Content Management Systems Wish List, and the first two characters, CM, are enough to identify it in my task bar.
There may be a fixed maximum for menu nesting, but there should be no minimum: A top menu item can exist without sub-items. We don't want:
just to get down to level 3.
Help Google by generating clean code, using informative file names, title elements etc, and supply correct update metadata.
Don't try to trick Google. That will backfire when Google catches up. And I haven't really gained anything, if I get a million hits on this page from people looking for Britney Spears.
The CMS should keep track of what language is used in each text, and declare the main language in the HTTP header. Page elements in other languages should be declared with the lang (or xml:lang) attribute.
Default language should be determined by browser settings. Translations should be assigned a lower q than originals, so someone who prefers English but understands Faroese, will get a Faroese original, rather than the English translation.
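A sketch of how such a weighting could work: each available version of a page gets a server-side quality (1.0 for the original, lower for translations), which is multiplied by the q the browser sends in Accept-Language. The function names and the value 0.5 are assumptions for illustration. The idea is similar in spirit to the source quality factor some servers support, but this is not any server's actual algorithm, and it matches language tags exactly rather than by prefix.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into {language tag: q}."""
    prefs = {}
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        prefs[tag] = q
    return prefs


def choose_variant(variants, header):
    """Pick the language with the highest (browser q) * (server quality).

    variants maps language tag -> server quality. Returns None when
    nothing is acceptable, so the caller can fall back to a default.
    """
    prefs = parse_accept_language(header)
    best, best_score = None, 0.0
    for lang, quality in variants.items():
        score = prefs.get(lang, prefs.get("*", 0.0)) * quality
        if score > best_score:
            best, best_score = lang, score
    return best


# Hypothetical page: Faroese original, English translation.
page = {"fo": 1.0, "en": 0.5}

print(choose_variant(page, "en, fo;q=0.8"))  # fo
print(choose_variant(page, "en"))            # en
print(choose_variant(page, "da"))            # None
```

With these numbers the reader who prefers English but accepts Faroese scores 0.8 for the original against 0.5 for the translation, so the original wins.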
Links between languages should link to the similar page in the other language. E.g. Uttanlands, International and International should link to each other, not to various translations of the home page.
When there is no similar page in the other language, there are two possibilities:
If you don't read Faroese, and someone sends you a link to Familjuferðir, you would expect to find an English translation behind English and Danish behind Dansk, but you won't.
Yeah, open source is OK. But from the above, it is clear that I consider other forms of openness more important.
I guess that XML, XSLT and JSP are good technologies to satisfy these wishes. But technology is not part of my demands; these are guesses.