Notes on How to set up and maintain Web pages


Where to put your Web pages

On your Linux/UNIX account on:

eiger.computing.dcu.ie

camac.dcu.ie

create a directory called public_html and put your files in it:

cd
mkdir public_html
edit public_html/index.html
edit public_html/file.html

The files then appear on the Web as:

http://host/~user/
http://host/~user/file.html

The URL will be something like:

http://computing.dcu.ie/~user/
http://student.computing.dcu.ie/~user/

http://student.dcu.ie/~user/


Web hosting in DCU:



Protections

Every directory in the hierarchy above the files needs to be executable (that is, searchable) by "others". See Notes on directory protections.

cd
chmod o+x .
chmod o+x public_html
All files need to be readable by "others". See Notes on file protections
cd
chmod o+r public_html/index.html
chmod o+r public_html/file.html
chmod o+r public_html/image.jpg
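
Once there are more than a few files, doing the chmod commands by hand gets tedious. The following is a minimal sketch, not an official DCU script, that applies the same two rules to a whole tree. The function name publish_tree is made up for this example, and it assumes your pages live under the directory you pass in (default $HOME/public_html). It does not touch your home directory itself, so you still need the "chmod o+x ." from above.

```shell
#!/bin/sh
# publish_tree [DIR]
# Hypothetical helper: make everything under DIR visible to the web server.
# DIR defaults to $HOME/public_html (an assumption, matching the notes above).
publish_tree() {
    dir=${1:-"$HOME/public_html"}

    # Directories must be executable (searchable) by "others".
    find "$dir" -type d -exec chmod o+x {} +

    # Ordinary files must be readable by "others".
    find "$dir" -type f -exec chmod o+r {} +
}
```

Usage would be something like "publish_tree" with no arguments after adding new pages. Note that this opens up every file under the directory, so don't keep anything private in there.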





How to write them

  1. Raw HTML in a text editor

  2. Raw HTML in an assisted text editor

  3. WYSIWYG




Minimalist web page

<html>
<head>
<title> My web page </title>
</head>
<body>

<h1> My web page </h1>

<p> I am a very interesting person
and here are my poems. </p>

<p> Here is a link to my favourite artist,
<a href="http://www.daniel-site.com/"> Daniel O'Donnell</a>.
I hope to marry him some day. </p>

</body>
</html>

Tag names are not case-sensitive. Runs of blank space and new lines are all collapsed into a single space.

You can generally leave out the <html> tags, and also the closing </body> and </html> tags: browsers have to be able to display partial downloads, so they cope with pages that are never explicitly closed. Also, <p> always starts a new paragraph, so you can leave out the </p> tags.

Read HTML Reference.

Use "View .. Source" on other people's pages around the Web to scavenge them for ideas, but be careful not to scavenge actual content (text or images).



Some HTML tags.



Using other formats

(or converting other formats to HTML)




HTML plus images is the Ultimate Format

HTML plus images is the most portable format, readable everywhere and on anything. Think of your users not just in the CA labs, but also at home, at work, abroad, on old machines and slow phone lines, on Web TV and palmtops. Why make them unable to read your work for no good reason? Use the lowest common denominator.

pdf, doc, ps, rtf, and anything else that requires a plugin, often break the clean Web model of browse-and-move-on. Instead we get a dialog asking us to save to disk (Where? And will litter be left behind?) and then have to run the plugin to view the file.

Of course, this is all about integration of the plugin with the browser - you can set it up so that the plugin launches automatically and the file is deleted afterward. But of course you have to go and get the plugin. And wait 2 hours while it downloads. And install it. And reboot. And you've got other things to do. And are you really bothered about reading this document anyway? There's lots of other stuff to read on the Net. So you hit the "Back" button, and move on.

With the browsers many people currently use, pdf, doc, ps and rtf simply mean "not browsable", "off line".



HTML creates a seamless Web

Also, if it is in HTML, the content can be picked up in search engines (whereas content of ps, doc, pdf, etc. may be hidden).

That said, Google now searches pdf and other formats too, including PostScript [ps], Word [doc], PowerPoint [ppt], Excel [xls] and Rich Text Format [rtf]. See discussions here and here.

See also the new specialised Citation search engines, which search and index ps and pdf.


But perhaps the most important reason to present everything in HTML is that people can link to it, can link to sub-sections within it, can link to labels within those sub-sections, and those sections in turn can link back out to everything else on the Web.

(See HyperTeX - embedding links within PS documents to other PS documents and ordinary HTML sites. Apparently PDF can also embed hyperlinks now. Can anyone find an example online of a link to a section within a PS or a PDF document?)



HTML is safe

Another reason not to use Microsoft Word documents is the massive risk of spreading viruses (Word viruses are becoming the single most common type of virus). This risk does not exist with the other formats (certainly not with HTML). Even if you are confident your Word files are uninfected, think of your users. As I'm browsing data, unless something is absolutely essential to my job, if I see that somebody's data is in Word format, I simply won't read it.


On a similar note, never send anybody email in TNEF format. Is there anything more arrogant than an email program, Microsoft "Exchange", that sends messages that can only be read by Microsoft "Exchange"? Kind of defeats the whole purpose of email, don't you think?

Again, if some email is absolutely essential to my job, I will jump through hoops to read it. If I don't know that it's absolutely essential to my job, and it arrives in Microsoft TNEF format, I simply won't read it.


Robert X. Cringely points out the genius of Microsoft's public relations:

The wonder of all these Internet security problems is that they are continually labeled as "e-mail viruses" or "Internet worms," rather than the more correct designation of "Windows viruses" or "Microsoft Outlook viruses."



How to browse them


Relative links




How to upload them

  1. Edit them directly off disk in UNIX, or:

  2. Edit them in Windows. Make UNIX account look like a drive:

See Accessing UNIX remotely and from Windows.




Search engines

See How to write a CGI script.

My search engine simply does a grep of all my web pages on the spot, and pipes the result through a filter that generates tidy HTML code (so you can click on the pages in the results).
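
To make the idea concrete, here is a rough sketch of the grep-and-filter approach. It is not my actual script, just an illustration under some assumptions: the function name search_pages, the default directory $HOME/public_html and the placeholder base URL http://host/~user are all made up for the example. It searches for a literal string (no regular expressions), ignores case, and does not escape odd characters in file names.

```shell
#!/bin/sh
# search_pages TERM [DIR] [BASE_URL]
# Hypothetical sketch: grep every .html file under DIR for TERM and
# print the matching pages as an HTML list of clickable links.
search_pages() {
    term=$1
    dir=${2:-"$HOME/public_html"}      # assumed location of the pages
    base=${3:-"http://host/~user"}     # assumed URL prefix for DIR

    echo "<ul>"
    # grep -i: ignore case, -l: print file names only, -F: literal string.
    find "$dir" -type f -name '*.html' -exec grep -ilF -- "$term" {} + |
    sort |
    while IFS= read -r path; do
        rel=${path#"$dir"/}            # strip DIR to get the path under the URL
        echo "<li><a href=\"$base/$rel\">$rel</a></li>"
    done
    echo "</ul>"
}
```

For example, "search_pages poems" would list every page under public_html that mentions poems. To put this on the Web it would still need wrapping as a CGI script that reads the search term from the query and prints a Content-type header first.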

How to write a search engine in 9 lines of Shell