On your Linux/UNIX account on:
    eiger.computing.dcu.ie
    camac.dcu.ie
in a directory called public_html:
    cd
    mkdir public_html
    edit public_html/index.html
    edit public_html/file.html
to make the files:
    http://host/~user/
    http://host/~user/file.html
The URL will be something like:
    http://computing.dcu.ie/~user/
    http://student.computing.dcu.ie/~user/
    http://student.dcu.ie/~user/
The hierarchy of directories above the files needs to be executable by "others". See Notes on directory protections.
    cd
    chmod o+x .
    chmod o+x public_html
All files need to be readable by "others". See Notes on file protections.
    cd
    chmod o+r public_html/index.html
    chmod o+r public_html/file.html
    chmod o+r public_html/image.jpg
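The chmod commands above can be applied to everything under public_html in one pass. The following is only a sketch: the function name publish_perms is made up for illustration, and it assumes a POSIX shell with find available.

```shell
# publish_perms HOME_DIR
# Hypothetical helper (not part of any standard tool): make HOME_DIR
# and everything under HOME_DIR/public_html reachable by "others".
publish_perms() {
    home_dir=$1
    site="$home_dir/public_html"

    # The web server must be able to pass through the home directory.
    chmod o+x "$home_dir" || return 1

    # Every directory in the site needs execute (search) permission for others.
    find "$site" -type d -exec chmod o+x {} + || return 1

    # Every regular file needs read permission for others.
    find "$site" -type f -exec chmod o+r {} + || return 1
}
```

It would be called as publish_perms "$HOME". Note that it opens up every file under public_html, so anything that should stay private must be kept out of that directory.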
    <html>
    <head>
    <title> My web page </title>
    </head>
    <body>
    <h1> My web page </h1>
    <p> I am a very interesting person and here are my poems. </p>
    <p> Here is a link to my favourite artist,
    <a href="http://www.daniel-site.com/"> Daniel O'Donnell</a>.
    I hope to marry him some day. </p>
    </body>
    </html>
Case of tags doesn't matter. Runs of blank space and new lines are all compressed into a single space.
You can probably leave out the <html> tags, and also the closing </body> and </html> - all browsers can display partial downloads. Also <p> starts a new paragraph no matter what, so you can leave out the </p> tags.
Read HTML Reference.
"View .. Source" on other people's pages round the Web to scavenge them for ideas (be careful not to scavenge actual content (text or image) though!)
HTML plus images is the most portable format, readable everywhere and on anything. Think of your users not just in the CA labs, but also at home, at work, abroad, on old machines and slow phone lines, on Web TV and palmtops. Why make them unable to read your work for no good reason? Use the lowest common denominator.
pdf, doc, ps, rtf, and anything that requires plug-ins in general, often break the clean Web model of browse-and-move-on. Instead we get a dialog asking us to save to disk (Where? And will litter be left behind?) and run the plugin to view.
Of course, this is all about integration of the plugin with the browser - you can set it up so that the plugin launches automatically and the file is deleted afterward. But of course you have to go and get the plugin. And wait 2 hours while it downloads. And install it. And reboot. And you've got other things to do. And are you really bothered about reading this document anyway? There's lots of other stuff to read on the Net. So you hit the "Back" button, and move on.
With the browsers many people currently use, pdf, doc, ps, rtf simply means "not browsable", "off line".
Also, if it is in HTML, the content can be picked up in search engines (whereas content of ps, doc, pdf, etc. may be hidden).
Though Google now searches pdf and other formats, including PostScript [ps], Word [doc], PowerPoint [ppt], Excel [xls] and Rich Text Format [rtf]. See discussions here and here. See also the new specialised Citation search engines, which search and index ps and pdf.
But perhaps the most important reason to present everything in HTML is that people can link to it, can link to sub-sections within it, can link to labels within those sub-sections, and those sections in turn can link back out to everything else on the Web.
(See HyperTeX, which embeds links within PS documents to other PS documents and ordinary html sites. Apparently PDF can also embed hyperlinks now. Can anyone find an example online of a link to a section within a PS or a PDF document?)
Another reason not to use Microsoft Word documents is the massive risk of spreading viruses (Word viruses are becoming the single most common type of virus). This risk does not exist with the other formats (certainly not with HTML). Even if you are confident your Word files are uninfected, think of your users. As I'm browsing data, unless something is absolutely essential to my job, if I see that somebody's data is in Word format, I simply won't read it.
On a similar note, never send anybody email in TNEF format. Is there anything more arrogant than an email program, Microsoft "Exchange", that sends messages that can only be read by Microsoft "Exchange"? Kind of defeats the whole purpose of email, don't you think?
Again, if some email is absolutely essential to my job, I will jump through hoops to read it. If I don't know that it's absolutely essential to my job, and it arrives in Microsoft TNEF format, I simply won't read it.
Robert X. Cringely points out the genius of Microsoft's public relations:
The wonder of all these Internet security problems is that they are continually labeled as "e-mail viruses" or "Internet worms," rather than the more correct designation of "Windows viruses" or "Microsoft Outlook viruses."
See How to write a CGI script.
My search engine simply does a grep of all my web pages on the spot, and pipes the result through a filter that generates tidy HTML code (so you can click on the pages in the results).
How to write a search engine in 9 lines of Shell
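For illustration only, a grep-plus-filter search of this kind might look like the sketch below. This is not the script linked above: the function name search_pages, its three arguments, and the URL prefix are all assumptions made for the example.

```shell
# search_pages DIR URL_PREFIX WORD
# Hypothetical sketch: list the .html files directly inside DIR that
# contain WORD, formatted as an HTML list of clickable links.
search_pages() {
    dir=$1
    prefix=$2
    word=$3

    echo "<ul>"
    # grep -l prints only the names of matching files;
    # -i ignores case, -F treats WORD as a literal string, not a regex.
    grep -liF -- "$word" "$dir"/*.html 2>/dev/null |
    while IFS= read -r page; do
        name=${page##*/}    # strip the directory part of the path
        printf '<li><a href="%s%s">%s</a></li>\n' "$prefix" "$name" "$name"
    done
    echo "</ul>"
}
```

A call such as search_pages "$HOME/public_html" "http://host/~user/" poems would print one list item per matching page. The sketch does not escape special characters in file names or in the search word, and a real CGI script would also need to print a Content-type header before the HTML.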