File - A named section of disk.
Normally neither user nor programmer deals with the disk directly,
but only with named files.
In a performance-critical application, you may need to implement your own
file system, but this is obviously very dangerous.
Traditionally, program data would be in very efficient binary format:
(2-byte-number)(4-byte-number)(1-byte-character)(2-byte-number)....
Program needs to know structure of the file to display it. Otherwise it doesn't know where to put boundaries - might display:
(4-byte-no)(1-byte-char)(2-byte-no)(4-byte-no)....
There has been a trend more recently towards program data that humans can read in a text editor:
(1-byte-character)(1-byte-character)(1-byte-character)....
which displays characters that express the contents. - html, xml, and (sort of) ps, tex
Example: To store a 2 byte short integer in a file:
Text is a much less efficient format - it might take 9 1-byte characters to display the 2-byte number - and this is why binary was so popular in the past. But we can think about adopting such schemes now because machines are getting more powerful, disk space bigger, and bandwidth better than they used to be.
HTML showed it could be done. XML is now taking the idea one step further.
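A rough sketch of the size difference, in shell. The value 12345 and the file names are made up for illustration, and writing the high byte first is just one possible convention for the binary file.

```shell
# Work in a scratch directory (hypothetical file names).
tmp=$(mktemp -d)

# Binary: 12345 is 0x3039, so write the two raw bytes 0x30 0x39
# (octal 060 071), high byte first.
printf '\060\071' > "$tmp/number.bin"

# Text: write the same value as decimal digits, one byte per character.
printf '%s' '12345' > "$tmp/number.txt"

# Compare sizes in bytes; $(( )) strips any padding wc adds.
bin_size=$(( $(wc -c < "$tmp/number.bin") ))
txt_size=$(( $(wc -c < "$tmp/number.txt") ))
echo "binary: $bin_size bytes, text: $txt_size bytes"

# Reading the binary file back needs the structure: two unsigned bytes.
set -- $(od -An -tu1 "$tmp/number.bin")
value=$(( $1 * 256 + $2 ))
echo "decoded value: $value"

rm -rf "$tmp"
```

Opened in a text editor, number.bin shows the characters 09, because 0x30 and 0x39 happen to be the codes for those two digits. Without knowing the structure, the bytes are misread. Wrapping the text version in markup would make it longer still.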
$ grep string *doc
Windows file system can spread over multiple pieces of hardware. Each is given its own (single-letter) drive. Can partition a single piece of hardware into multiple drives too:
drive:\dir\file
UNIX file system can spread over multiple pieces of hardware too. But these all just appear as sub-directories of a single file hierarchy:
/drive/dir/file
e.g. recall /floppy.
Can organise files in separate dirs
(Many web authors seem not to have discovered sub-dirs!).
Crucial to keep user files separate from system files (Why?).
Hence the excellent invention of
C:\My Documents,
to match its UNIX equivalent,
$home
Can reuse same file names in different sub-dirs (like index.html).
Currently files need names. Maybe this is a flaw - Real-world papers don't need names, because they look different visually, feel different, and are found in different places on your desk or on your shelves. Whereas all computer files (of same type) look the same in a File Manager. In future, maybe OS displays a representation of what is in the file (e.g. Windows File Manager preview is a step in the right direction - just not quick enough), or OS displays older files yellowed-with-age, etc., and lots of other visual clues, so it doesn't need a name. (But then how about programming?)
Anyway, currently files need names. Users and programmers are terrible at picking them. OS often doesn't help. UNIX and Mac allow long filenames (recall UNIX filenames). So does Windows post 95. Before that: You want to call a file:
photos.kenya.apr.1963.html
but Windows pre-Win 95 forces you to use 8 char name, 3 char extension, so you have to call it:
phka0463.htm
1 year later, you have no idea what this filename means.
Still, at least old Windows had sub-directories. For years I used VM/CMS which had 8 char filenames and no sub-directories!
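A minimal sketch of the 8.3 rule as a shell function. The function name is_8_3 is made up for illustration. It checks only the lengths and the single dot, not the restrictions old DOS/Windows placed on which characters were allowed.

```shell
# Return success if a name fits the 8.3 length limits.
# Simplified: lengths and a single dot only (an assumption, not the full rule).
is_8_3() {
    name=$1
    case $name in
        *.*.*) return 1 ;;                          # more than one dot
        *.*)   base=${name%.*}; ext=${name##*.} ;;  # split at the dot
        *)     base=$name; ext= ;;                  # no extension
    esac
    [ "${#base}" -ge 1 ] && [ "${#base}" -le 8 ] && [ "${#ext}" -le 3 ]
}

for f in photos.kenya.apr.1963.html phka0463.htm; do
    if is_8_3 "$f"; then
        echo "$f: fits 8.3"
    else
        echo "$f: does not fit 8.3"
    fi
done
```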
Short file names are good, though, for:
- Stuff you must type.
Utility names at the command-line (i.e. the program you call has a short filename). sed, grep, ls, cut, etc.
Some people say also URLs?
I would say: You should never type URLs.
(I never do.)
At most you type the host name that you saw on an ad.
For everything else you cut and paste, or click.
- Backward compatibility: Perhaps you want to allow your website to be downloaded and browsed offline on a Windows 3.1 machine. My website cannot be. (You can still browse it online on a Windows 3.1 machine though.)
Can selectively break the hierarchy with shortcuts.
ln -s dir shortcut
or in Windows see "Create Shortcut"
e.g.
$ ls -l /bin
lrwxrwxrwx 1 root root 9 Apr 14 1997 /bin -> ./usr/bin
Can also just give a file multiple names:
ln -s file secondname
e.g. (StarOffice is a Microsoft Office-compatible office suite on UNIX):
lrwxrwxrwx 1 humphrys staff 10 Dec 13 21:03 excel -> staroffice
-rwxr-xr-x 1 humphrys staff 10 May 4 1999 staroffice
lrwxrwxrwx 1 humphrys staff 10 Dec 13 21:03 win -> staroffice
lrwxrwxrwx 1 humphrys staff 10 Dec 13 21:03 word -> staroffice
"staroffice" itself contains:
soffice &
Can do this on Windows as well (have multiple shortcuts to a data file or program).
With shortcuts, if doing a recursive search of disk, can get infinite loop problems, or at least duplication. e.g. List all files on disk. If follow symbolic links may list files twice.
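The duplication case can be sketched with find. The directory and file names here are made up for illustration. By default find does not follow symbolic links, and the -L option tells it to follow them.

```shell
# Scratch directory with one real file and one shortcut to its directory.
tmp=$(mktemp -d)
mkdir "$tmp/real"
echo hello > "$tmp/real/a.txt"
ln -s real "$tmp/shortcut"      # shortcut -> real (relative to $tmp)

# Default: symbolic links are not followed, so a.txt is seen once.
plain=$(( $(find "$tmp" -type f | wc -l) ))

# -L: follow symbolic links, so a.txt is also reached via shortcut/.
followed=$(( $(find -L "$tmp" -type f | wc -l) ))

echo "without following links: $plain file(s)"
echo "following links:         $followed file(s)"
rm -rf "$tmp"
```

A link that points back up at one of its own parent directories would give the loop case. find implementations generally detect this and complain rather than recurse forever, but a hand-written recursive search has to guard against it itself.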
Q. Also, if delete file, do you delete symbolic link?
If so, how do you find them - do you have reverse directory of them?
Also, I make symbolic link to other user's file.
They delete file. They can't delete my link.
A. If link doesn't work, so what.
Might even leave it dangling as reminder.
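A small sketch of the dangling case, with made-up file names. The system keeps no reverse directory of links, so one way to find dangling ones after the fact is to search for symbolic links whose target no longer exists.

```shell
tmp=$(mktemp -d)
echo data > "$tmp/file"
ln -s file "$tmp/mylink"
rm "$tmp/file"                  # the owner deletes the file; the link stays

# -L tests the link itself; -e follows it to the (now missing) target.
if [ -L "$tmp/mylink" ] && [ ! -e "$tmp/mylink" ]; then
    state=dangling
else
    state=ok
fi
echo "mylink is $state"

# List every symbolic link under $tmp whose target does not exist.
dangling=$(find "$tmp" -type l ! -exec test -e {} \; -print)
echo "dangling links: $dangling"
rm -rf "$tmp"
```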
If your directory is readable by others on your local machine, someone on your machine can make it readable by the world on the Web (either maliciously or accidentally):
cd /homes/your-userid/public_html
ln -s /homes/other-userid/dir shortcut
The world can then read other user's directory through:
http://host/~your-userid/shortcut/
Has valid uses too. Might want to make one of your own dirs visible without having to have it under public_html, e.g. public_html disk is full, dir is on another disk.
Another example - SAMBA or read-write ftp may only drop you in home directory rather than root directory and you may not be able to go upwards. What you do is put symbolic links in your home directory and you can access any directory through them:
ln -s /var/mail email
ln -s /htdocs ht
General conclusion is that a basic hierarchy, with some cross-links for difficult points, is excellent way to structure complex data (e.g. Yahoo directory, Google directory) - rather than total cross-link free-for-all on one hand (e.g. the Web with just search engines and no directories), or rigid hierarchy on other (e.g. Dewey library system).
Interestingly, family trees are also basically hierarchical, with arbitrary cross-links, rather than strictly hierarchical as many people seem to think.
If it's data (1's and 0's), there's no real excuse for losing it. You can make a million copies and store them all over the world. Disk space is big and cheap. Machines are often idle. The network is always on. Backups can be automated across comms. links at night.
e.g. I currently have backups of different things on 4 machines in 3-4 sites in 2 countries, plus some old backups on floppy and some partial backups in 3 different sites. And that's not even including wherever the UNIX machine here is backed up to, nor in fact does it include where the machine in the other country is backed up to. Possibly 5-6 different sites in 2-3 countries in total. And there are also electronic copies of some of my public data in electronic archives (at least 3 more sites in other countries) and search engine databases (e.g. I can recover my page from Google's cache). And there are also electronic copies of my data in the Internet Archive.
OK I think I'm finished. But this is the modern world. If it's 1's and 0's, everyone can have their own copy. And you can keep your backups in foreign countries.
In future, backup and long-term storage will be essential part of "ISP" or "Network Computer" service.
Which of these is the most dangerous: