The Web
HTTP client
Web browser
Uses MIME types.
(a) Plug-in - Runs inside browser process.
(b) Helper application - Separate process.
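As a rough illustration of how a MIME type can drive the choice between the two, here is a minimal sketch. The table of types handled in-process and the function names are hypothetical, not taken from any real browser.

```python
# Hypothetical table: MIME types this imaginary browser renders in-process.
# Anything else is handed to a separate helper program.
IN_PROCESS_TYPES = {"text/html", "text/plain", "image/png", "image/jpeg"}


def media_type(content_type_header):
    """Strip parameters such as '; charset=utf-8' from a Content-Type value."""
    return content_type_header.split(";", 1)[0].strip().lower()


def choose_handler(content_type_header):
    """Return 'in-process' or 'helper' for a given Content-Type header value."""
    if media_type(content_type_header) in IN_PROCESS_TYPES:
        return "in-process"
    return "helper"


if __name__ == "__main__":
    for header in ("text/html; charset=utf-8", "application/pdf"):
        print(header, "->", choose_handler(header))
```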
HTTP server
Doesn't make separate disk access for every file request - too slow.
Instead maintains cache in memory of frequently accessed files.
Multi-threaded.
Site spread over multiple disks
so that many reads can proceed at once.
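The in-memory cache idea can be sketched as follows. This is a simplified illustration, not how any particular server does it: the class name is made up, and least-recently-used eviction is assumed as a stand-in for "frequently accessed". A lock is included because the server is multi-threaded.

```python
import threading
from collections import OrderedDict
from pathlib import Path


class FileCache:
    """Keep recently requested file contents in memory (LRU eviction assumed)."""

    def __init__(self, max_entries=100):
        self.max_entries = max_entries
        self.disk_reads = 0              # counts cache misses, for illustration
        self._entries = OrderedDict()    # path -> file contents
        self._lock = threading.Lock()    # many server threads share one cache

    def get(self, path):
        with self._lock:
            if path in self._entries:
                self._entries.move_to_end(path)   # mark as recently used
                return self._entries[path]
        # Cache miss: read from disk outside the lock so other threads
        # are not blocked. Two threads may both read the same file once.
        data = Path(path).read_bytes()
        with self._lock:
            self.disk_reads += 1
            self._entries[path] = data
            self._entries.move_to_end(path)
            while len(self._entries) > self.max_entries:
                self._entries.popitem(last=False)  # evict least recently used
        return data
```

Repeated calls to get() for the same path touch the disk only once, until the entry is evicted.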
For high-demand sites:
Multiple copies of entire site - "server farm"
- front end routes requests to different CPUs.
Problem: OK to have all (small size) requests come in through one front end
and get routed to searching nodes.
Not OK to have all (large size) replies go back through one front end - bottleneck.
Solution: TCP handoff
- trick to have the searching node reply directly
in a manner that is invisible to client.
The reply load is therefore distributed over all the nodes.
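The effect of handoff on the front end can be seen in a toy model. This is only a byte-counting simulation under assumed round-robin routing, not an implementation of TCP handoff itself; the class and attribute names are invented for the example.

```python
from itertools import cycle


class FrontEnd:
    """Toy model: count how many bytes pass through the front end."""

    def __init__(self, node_names, handoff):
        self._next_node = cycle(node_names)   # round-robin routing (assumed)
        self.handoff = handoff
        self.bytes_through_front_end = 0
        self.reply_bytes_per_node = {name: 0 for name in node_names}

    def handle(self, request_size, reply_size):
        node = next(self._next_node)
        # Every (small) request enters through the front end.
        self.bytes_through_front_end += request_size
        self.reply_bytes_per_node[node] += reply_size
        # Without handoff the (large) reply also returns via the front end.
        if not self.handoff:
            self.bytes_through_front_end += reply_size
        return node


if __name__ == "__main__":
    for handoff in (False, True):
        fe = FrontEnd(["node1", "node2", "node3"], handoff=handoff)
        for _ in range(300):
            fe.handle(request_size=200, reply_size=50_000)
        print("handoff" if handoff else "no handoff", fe.bytes_through_front_end)
```

With handoff the front end carries only the request bytes, while the reply bytes are spread evenly over the nodes.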
Some URL formats.
URI schemes listed above:
- http, https
- ftp (no password): hardly used any more - pre-web system
- ftp (with password): still important (now sftp:)
- file: very useful (may not need prefix)
- news: not as important as it used to be:
- read on Google
- so many other places to talk now - discussion websites, blogs
- gopher - port 70
- not used any more - pre-web system
- mailto: very useful
- but spammers search for these
- telnet: (now ssh:) useful - but rarely do it by clicking a link
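To see how a program pulls the scheme and other parts out of URLs like these, here is a short sketch using Python's standard urllib.parse. The example URLs and the helper name are made up for illustration.

```python
from urllib.parse import urlsplit


def scheme_summary(url):
    """Return (scheme, hostname, port, path) for a URL string."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port, parts.path


if __name__ == "__main__":
    examples = [
        "https://example.org/index.html",
        "ftp://user@example.org/pub/file.txt",
        "file:///home/user/notes.txt",
        "mailto:someone@example.org",
        "gopher://example.org:70/1",
    ]
    for url in examples:
        print(scheme_summary(url))
```

Note that schemes such as mailto: have no host part at all, so hostname and port come back as None.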
Others:
Keeping state
Relating one client-server stateless request
with other client-server requests.
Identify user (pay-to-view, register, personalisation).
Shopping carts.
- cookies
- Users may turn cookies off (for good reason).
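A minimal sketch of how a cookie can tie stateless requests together into a shopping cart. The in-memory session store, the cookie name session_id and the function name are assumptions for illustration; a real site would also set attributes such as expiry and would not keep sessions in a plain dict.

```python
import secrets
from http.cookies import SimpleCookie

# Server-side store: session id -> shopping cart (hypothetical, in memory only).
SESSIONS = {}


def handle_request(cookie_header, item=None):
    """Process one stateless request.

    cookie_header is the value of the request's Cookie header (or None).
    Returns (set_cookie, cart). set_cookie is a Set-Cookie header value for
    a new visitor, or None if the client presented a known session id.
    """
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get("session_id")
    session_id = morsel.value if morsel is not None else None

    set_cookie = None
    if session_id not in SESSIONS:
        # Unknown or missing id: start a new session and ask the client
        # to send the id back with every later request.
        session_id = secrets.token_hex(16)
        SESSIONS[session_id] = []
        reply = SimpleCookie()
        reply["session_id"] = session_id
        set_cookie = reply["session_id"].OutputString()

    cart = SESSIONS[session_id]
    if item is not None:
        cart.append(item)
    return set_cookie, cart
```

If the user has cookies turned off, every request arrives without the header and looks like a new visitor, so the cart is always empty.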
Structuring content
- XHTML
- Existing HTML is forgiving
- can skip end tags, etc. and it will still display.
- Tags can be in any case.
- Badly-formatted HTML is everywhere, and can be hard to parse:
- Parsing XML / HTML
- XHTML is trying to change this for the future
- make HTML unforgiving and case-sensitive.
The idea is to make it:
- Easier for programs to process content.
- Easier to process/display on small, low-memory devices (tiny browsers).
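The difference between forgiving and unforgiving parsing can be sketched with Python's standard library: html.parser accepts sloppy markup, while the XML parser rejects it. The sample markup and function names are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Mixed case, a missing </p> and an unclosed <B>: typical forgiving HTML.
SLOPPY = "<P>First paragraph<p>Second <B>bold</p>"


class TagCollector(HTMLParser):
    """Record every start tag the forgiving HTML parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)   # html.parser lower-cases tag names


def forgiving_parse(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def strict_parse(markup):
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(forgiving_parse(SLOPPY))
    print(strict_parse(SLOPPY))
    print(strict_parse("<p>First paragraph</p>"))
```

The forgiving parser reports the tags it found and carries on; the strict parser refuses the whole document.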
- I am sceptical of the XHTML vision:
- Yes, it is true that if we all migrated to XHTML,
it would make it easier for programs to process content.
But are you going to re-write 10 billion web pages?
Good luck with that.
- XHTML: "XML requires user-agents to fail when encountering malformed XML".
Question - Would you use such a browser?
i.e. One that wouldn't allow you to view a favourite site
because it had "malformed" XHTML.
Or would you (as the whole history of the Web shows)
simply move quietly to a different, more tolerant browser?
Anyone selling an unforgiving browser will lose a lot of money.
- "The recommendation for browsers to post an error
rather than attempt to render malformed content
should help eliminate malformed content."
- Yeah, right.
Because authors have nothing better to do.
- As for the idea of making it easier to display on small devices:
Well, my PDA displays malformed HTML beautifully with no problem,
and so does even my WAP phone!
- Joel Spolsky on HTML standards:
- Maybe "the way the web "should have" been built would be to have very, very strict standards and every web browser should be positively obnoxious about pointing them all out to you and web developers that couldn't figure out how to be "conservative in what they emit" should not be allowed to author pages that appear anywhere until they get their act together.
But, of course, if that had happened, maybe the web would never have taken off like it did, and maybe instead, we'd all be using a gigantic Lotus Notes network operated by AT&T. Shudder."
- About the idea that old web pages need to "change" to conform to standards:
"Those websites are out of your control. Some of them were developed by people who are now dead.
...
The idealists don’t care: they want those pages changed.
Some of those pages can’t be changed. They might be burned onto CD-ROMs. Some of them were created by people who are now dead. Most of them created by people who have no frigging idea what’s going on and why their web page, which they paid a designer to create 4 years ago, is now not working properly."
- Again, if the browser doesn't display the old pages, what will most people do?
That's right.
Dump the browser.
- In practice, rather than mixing XML and HTML, sites often provide them as entirely separate services.
- e.g. Company provides:
- 5,000 HTML pages displaying products and prices.
- 1 single XML dump of all machine-readable price data for the whole website,
a few MB of text.
Remote bots grab this dump with one request
instead of making 5,000 separate requests for tiny XML fragments.
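A sketch of what a bot might do with such a dump once it has been fetched. The XML layout (products, product, name, price) is invented for illustration; a real site would define its own format.

```python
import xml.etree.ElementTree as ET

# Hypothetical dump format: one <product> element per product.
DUMP = """<products>
  <product id="1001"><name>Widget</name><price currency="EUR">9.99</price></product>
  <product id="1002"><name>Gadget</name><price currency="EUR">24.50</price></product>
</products>"""


def load_prices(xml_text):
    """Parse the whole dump once and return {product id: price}."""
    root = ET.fromstring(xml_text)
    return {
        product.get("id"): float(product.findtext("price"))
        for product in root.iter("product")
    }


if __name__ == "__main__":
    prices = load_prices(DUMP)
    print(prices["1002"])
```

One request and one parse give the bot every price, with no need to scrape the HTML pages.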
Performance (client-side)
Caching
- Browser maintains cache.
- Site-wide (or ISP-wide) cache via proxy server.
- wwwproxy.computing.dcu.ie = 136.206.11.243 (forwards requests through 136.206.11.249)
- proxy.dcu.ie = alternates between returning 136.206.1.17 and 136.206.1.20 (for load balancing)
- port: 8080 or 3128
- proxy1.dcu.ie = 136.206.1.20
- proxy3.dcu.ie = 136.206.1.17
To set proxy, something like:
- Firefox - Tools - Options - Network - Settings
- IE - Tools - Options - Connections - LAN settings
You may use a proxy auto-config (PAC) file:
- http://www.computing.dcu.ie/proxy.pac
- http://proxy.dcu.ie/proxy.pac
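A program can be pointed at a proxy in much the same way as a browser. The sketch below uses Python's urllib.request with the host and port from the list above as the default; whether that proxy is reachable depends on being inside the network in question, so treat the address as an assumption.

```python
import urllib.request

# Proxy host and port taken from the notes above; only reachable from
# inside that network, if at all.
PROXY = "http://proxy.dcu.ie:8080"


def build_proxy_opener(proxy_url=PROXY):
    """Return an opener that sends http and https requests via the proxy."""
    handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    opener = build_proxy_opener()
    # opener.open("http://example.org/") would now go through the proxy.
    print([type(h).__name__ for h in opener.handlers])
```

Building the opener makes no network connection; requests only reach the proxy when opener.open() is called.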
Test the IP address other sites see: