How to write a search engine in 9 lines of Shell
The following CGI script
is a fully working search engine for your web pages:
#!/bin/sh
echo "Content-type: text/html"
echo
echo '<html> <head> <title> Search results </title> </head> <body>'
argument=`echo "$QUERY_STRING" | sed "s|q=||"`
cd /users/homes/me/public_html
echo '<pre>'
grep -i "$argument" *html */*html | sed -e 's|<|\&lt;|g' -e 's|>|\&gt;|g'
echo '</pre>'
Notes:
- The "sed" command is there because
there are HTML tags in the results returned by grep.
Unfortunately, these will be interpreted by your browser.
To just print the tag without interpreting it,
the search engine pipes the results through a program that
converts all < characters to &lt; and > to &gt;
- q= assumes that your input variable is called "q" in the HTML form
that sends data to this CGI script.
- Your web directories need to be readable
by the user the CGI script runs as
for the wildcard to work.
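The escaping step described in the first note can be tried on its own. The following is a minimal sketch, not part of the 9-line script: the function name escape_html is made up for illustration, and the extra rule for & is an addition (an assumption that you also want literal ampersands shown correctly).

```shell
#!/bin/sh
# Hypothetical helper: escape the characters a browser would treat as markup.
# The & rule must come first, otherwise it would re-escape the
# ampersands introduced by the &lt; and &gt; rules.
# In a sed replacement a bare & means "the matched text", hence the backslash.
escape_html() {
    sed -e 's|&|\&amp;|g' -e 's|<|\&lt;|g' -e 's|>|\&gt;|g'
}

# Example: a line such as grep might return from an HTML file.
printf '%s\n' 'index.html:<a href="x.html">Tom & Jerry</a>' | escape_html
```

Run as is, this prints index.html:&lt;a href="x.html"&gt;Tom &amp; Jerry&lt;/a&gt; which the browser then displays as the original tag text instead of interpreting it.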
Further enhancements you might make:
- Some extra security
would be wise, e.g. process the argument with a C++ program
before passing it to grep,
check your PATH, etc.
- Consider also where there are spaces in the argument
(multiple search words), etc.
- If you have more than 2 levels of web pages
you may write them out explicitly as
*/*/*html
etc.,
or get a recursive grep,
or use recursive find first to build the filespec:
cd /users/homes/me/public_html
filespec=`find . -type f -name "*html" | tr '\n' ' '`
grep -i "$argument" $filespec
Since each search will be using the same file list,
it would be more efficient to pre-build the list once,
and cache it in a file, and then:
read filespec < filelist.txt
grep -i "$argument" $filespec
(I hope you realise that a heavy-duty search engine
would go further and pre-index all the files in advance,
rather than grep-ing them on the spot.
But simple grep is alright for a personal website.)
- You might of course like to tidy up the output,
in particular so that someone can actually click on the page(s) returned.
- The pages are not ranked in order of relevance,
but only in the order in which
grep finds them.
How would you solve this?
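As one possible take on the security and multiple-word points above, the argument can be cleaned in Shell itself before it reaches grep. This is only a sketch under assumptions: the helper name clean_query and the set of permitted characters are made up for illustration, + and %20 are treated as spaces, and every other %XX escape is simply dropped rather than decoded.

```shell
#!/bin/sh
# Hypothetical helper: reduce a query string such as "q=foo+bar" to a
# plain search phrase containing only letters, digits, space, ".", "_" and "-".
clean_query() {
    printf '%s\n' "$1" |
        sed -e 's|^q=||' -e 's|&.*$||' -e 's|+| |g' -e 's|%20| |g' \
            -e 's|%[0-9A-Fa-f][0-9A-Fa-f]||g' |
        LC_ALL=C tr -cd 'A-Za-z0-9 ._-'
}

argument=`clean_query "$QUERY_STRING"`

# -F: treat the phrase as a fixed string, not a regular expression.
# -e: keeps a phrase beginning with "-" from being read as an option.
# An empty phrase would match every line, so skip the search in that case.
if [ -n "$argument" ]; then
    grep -i -F -e "$argument" *html */*html
fi
```

With this, several words are searched for as one exact phrase. Matching pages that contain the words anywhere would need a different approach, such as one grep per word.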
But the principle is that in Shell you can rustle up
a quick search engine for your personal pages,
or any subset of them, in a few lines.
For example, my search engine,
in about 55 lines of Shell
(with a C++ input pre-processor for security),
has the above enhancements.
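For the questions about clickable output and ranking raised in the list of enhancements, one simple approach (not necessarily the one used in the 55-line version) is to count the matching lines per page with grep -c and sort on that count. The function name ranked_links is made up for illustration, and the sketch assumes file names that are safe to print into HTML unescaped.

```shell
#!/bin/sh
# Hypothetical helper: list matching pages as links, most matching lines first.
# $1 is the search phrase; the remaining arguments are the files to search.
ranked_links() {
    phrase=$1
    shift
    # grep -c prints "file:count" for each file. Adding /dev/null forces the
    # file name prefix even when only one real file is given.
    grep -i -c -F -e "$phrase" -- "$@" /dev/null |
        # Turn "file:count" into "count file", splitting at the last colon.
        sed -e 's|^\(.*\):\([0-9][0-9]*\)$|\2 \1|' |
        # Drop pages with no matching lines (this also drops /dev/null).
        grep -v '^0 ' |
        # Highest count first.
        sort -k1,1nr |
        while read -r count file; do
            echo "<a href=\"$file\">$file</a> ($count matching line(s))<br>"
        done
}
```

In the CGI script this could replace the grep line, called as ranked_links "$argument" *html */*html and without the surrounding pre tags. Note that the count is the number of matching lines, which is only a rough stand-in for relevance.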