On your Linux/UNIX account on something like student.computing.dcu.ie, make the files in a directory called public_html:

cd
mkdir public_html
edit public_html/index.html
edit public_html/file.html

These files are then served at URLs of the form:

http://host/~user/
http://host/~user/file.html

The URL will be something like:

http://student.computing.dcu.ie/~username/
The hierarchy of directories above the files needs to be executable by "others". See Notes on directory protections.

cd
chmod o+x .
chmod o+x public_html

All files need to be readable by "others". See Notes on file protections.

cd
chmod o+r public_html/index.html
chmod o+r public_html/file.html
chmod o+r public_html/image.jpg
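The chmod steps above can be bundled into one small helper. This is only a sketch: the function name publish_perms is made up for illustration, and it assumes that everything under public_html is meant to be public, so it should not be pointed at a directory that holds private files.

```shell
#!/bin/sh
# Sketch: open up a home directory and its public_html for web serving.
# publish_perms is a hypothetical helper name, not a standard command.
# Assumption: every file under public_html is meant to be world-readable.
publish_perms() {
    home_dir=$1
    web_dir="$home_dir/public_html"

    # The directories above the files must be executable (searchable) by others.
    chmod o+x "$home_dir" "$web_dir" || return 1

    # Any subdirectories of public_html need the same treatment.
    find "$web_dir" -type d -exec chmod o+x {} +

    # All files need to be readable by others.
    find "$web_dir" -type f -exec chmod o+r {} +
}
```

It would be called as publish_perms "$HOME". Trying it on a scratch copy first is a reasonable precaution.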
<html>
<head>
<title> My web page </title>
</head>
<body>
<h1> My web page </h1>
<p> I am a very interesting person and here are my poems. </p>
<p> Here is a link to my favourite artist, <a href="http://www.daniel-site.com/"> Daniel O'Donnell</a>. I hope to marry him some day. </p>
</body>
</html>
Read HTML Reference.
"View .. Source" on other people's pages round the Web to scavenge them for ideas (be careful not to scavenge actual content (text or image) though!)
HTML plus images is the most portable format, readable everywhere and on anything. Think of your users not just in the CA labs, but also at home, at work, abroad, on old machines and slow phone lines, on Web TV and palmtops. Why make them unable to read your work for no good reason? Use the lowest common denominator.
pdf, doc, ps, rtf, and anything that requires plug-ins in general, often break the clean Web model of browse-and-move-on. Instead we get a dialog asking us to save to disk (Where? And will litter be left behind?) and run the plugin to view.
Of course, this is all about integration of the plugin with the browser - you can set it up so that the plugin launches automatically and the file is deleted afterward. But of course you have to go and get the plugin. And wait 2 hours while it downloads. And install it. And reboot. And you've got other things to do. And are you really bothered about reading this document anyway? There's lots of other stuff to read on the Net. So you hit the "Back" button, and move on.
With the browsers many people currently use, pdf, doc, ps, rtf simply mean "not browsable", "off line".
Also, if it is in HTML, the content can be picked up in search engines (whereas content of ps, doc, pdf, etc. may be hidden).
But perhaps the most important reason to present everything in HTML is that people can link to it, can link to sub-sections within it, can link to labels within those sub-sections, and those sections in turn can link back out to everything else on the Web.
Another reason not to use Microsoft Word documents is the risk of spreading viruses. This risk does not exist with the other formats (certainly not with HTML).
Even if you are confident your Word files are uninfected, think of your users. As I'm browsing data, unless something is absolutely essential to my job, if I see that somebody's data is in Word format, I simply won't read it.
Robert X. Cringely points out the genius of Microsoft's public relations:
The wonder of all these Internet security problems is that they are continually labeled as "e-mail viruses" or "Internet worms," rather than the more correct designation of "Windows viruses" or "Microsoft Outlook viruses."
See How to write a CGI script.
My search engine simply does a grep of all my web pages on the spot, and pipes the result through a filter that generates tidy HTML code (so you can click on the pages in the results).
How to write a search engine in 9 lines of Shell
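The grep-and-filter idea described above might look roughly like the following. This is a minimal sketch, not the author's actual 9-line script: the function name search_pages, its arguments, and the choice of an HTML list for the results are all assumptions made for illustration, and it only looks at .html files directly inside one directory.

```shell
#!/bin/sh
# Sketch of a grep-based page search that emits clickable HTML.
# search_pages DIR BASE_URL QUERY   (hypothetical name and arguments)
# Assumption: file names contain no characters that need HTML or URL escaping.
search_pages() {
    dir=$1
    base_url=$2
    query=$3

    echo "<ul>"
    # -i: ignore case, -l: print only names of matching files,
    # -F: treat the query as a fixed string rather than a regular expression.
    grep -ilF -- "$query" "$dir"/*.html 2>/dev/null |
    while IFS= read -r path; do
        name=${path##*/}    # strip the directory part
        # The "filter": turn each matching file into a link.
        printf '<li><a href="%s/%s">%s</a></li>\n' "$base_url" "$name" "$name"
    done
    echo "</ul>"
}
```

For example, search_pages "$HOME/public_html" "http://host/~user" poems would list every top-level page that mentions "poems". A real CGI version would also need to print a Content-type header and validate the query before using it.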
<a class=main-navigation href="http://HumphrysFamilyTree.com/blog.html">Blog</a>
<!--#include virtual="/SSI/header.html" -->
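SSI is invisible to the client: the server inserts the included file before sending the page, so the browser only sees the resulting HTML. By contrast, with a client-side include such as:
<script src=http://file.js ></script>
the user (client-side) can see the include has happened.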
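To see why a server-side include leaves no trace for the client, here is a toy imitation of what the server does. It is a sketch under stated assumptions, not how a real web server implements SSI: expand_ssi is a made-up name, and it only handles an include directive that sits alone on its own line, written exactly in the form shown above.

```shell
#!/bin/sh
# Toy imitation of server-side include expansion.
# expand_ssi DOCROOT PAGE   (hypothetical name and arguments)
# Assumption: each directive is on a line of its own, in exactly the form
#   <!--#include virtual="/path/to/file" -->
expand_ssi() {
    docroot=$1
    page=$2
    while IFS= read -r line; do
        case $line in
            '<!--#include virtual="'*'" -->')
                # Cut the directive down to the path inside the quotes.
                inc=${line#'<!--#include virtual="'}
                inc=${inc%'" -->'}
                # Emit the included file in place of the directive.
                cat "$docroot$inc"
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done < "$page"
}
```

The output contains the contents of the included file but not the directive itself, which is all the browser would ever receive.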