Alan Smeaton's Research Activities




Last updated: 07 June, 2003

Indexing and Browsing of Digital Video (1997-now). The largest series of projects I am involved with develops techniques to index and browse digital video. We have developed a demonstrator which records TV broadcasts digitally, analyses them, and allows a user to browse and then play back the broadcast. The demo system is called Físchlár, and the work is done as part of the Centre for Digital Video Processing, where far more details are available.

Clever Searching through Large Hypertexts including the WWW (1998-now). Working with Cathal Gurrin, we developed techniques to search through large hypertext structures, including the WWW, using link information, similar to what I did many years ago on guided tours through hypertext. We have taken part in the TREC web track as a first exercise in evaluating our techniques and have also tried them on larger document collections.

CEOLAIRE (1999-2003). Tom Sødring and I have worked on techniques for melody matching in music archives, part-sponsored by Enterprise Ireland. This led to the CEOLAIRE system, which does ringtone retrieval for mobile phones based on melody matching.

EuroGatherer: Personalised Information Filtering from the WWW (1997-2000). EuroGatherer was a consortium of 7 European partners in a project to develop a personalised information filtering system for the web. Our involvement (through Francis Crimmins) has been to develop the dispatcher service, which pulls together information from many different sources which is then matched against user profiles. EuroGatherer is scheduled to complete in January 2000.

ISOS: Irish Script on Screen: (1998-2000). This is a joint project between ourselves (Criostai MacIomhaire) and the Dublin Institute for Advanced Studies, with the collaboration of the Library in Trinity College Dublin, to make high quality digital images of Old Irish manuscripts and make them available on the WWW. Effectively this will be a digital library of images with catalogue information for easy access. Our material includes manuscripts from the Mount Melleray collection and the famous Book of Leinster.

Character shape coding for information retrieval: (1996 - 1999) We evaluated the upper-bound performance level for Larry Spitz's character shape coding (CSC) representation, a cheap and fast way to do information retrieval on scanned documents without having to do OCR, though its effectiveness was not known. This involves representing each character in the input documents as a code indicating the "shape" of that character. The mapping is designed so that it can represent scanned copies of poor-quality documents (old documents, for example) without full OCR, which would produce too many errors for subsequent information retrieval applications. Our evaluations incorporated the kind of noise introduced by the CSC recognition process into the document representation and included a mode where the user selects which CSCs to include in the search.
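As a rough illustration of the idea, the sketch below maps each character to a coarse shape class and searches over the encoded text. The four classes used here ('A', 'x', 'g', 'i') and the function names are my own assumptions for illustration, not the mapping Larry Spitz defined, and a real system would derive the codes from scanned page images rather than from plain text.

```python
# Hypothetical shape classes, chosen for illustration only:
#   'A' tall glyphs (capitals, digits, ascenders)
#   'x' letters confined to the x-height
#   'g' letters with descenders
#   'i' dotted letters
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gpqy")
DOTTED = set("ij")


def shape_code(ch: str) -> str:
    """Map one character to its (assumed) shape class."""
    if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
        return "A"
    if ch in DESCENDERS:
        return "g"
    if ch in DOTTED:
        return "i"
    if ch.islower():
        return "x"
    return ch  # spaces and punctuation pass through unchanged


def encode(text: str) -> str:
    """Encode a whole string as a sequence of shape codes."""
    return "".join(shape_code(c) for c in text)


def search(query: str, docs: list[str]) -> list[int]:
    """Return indices of documents containing every encoded query word."""
    q_codes = set(encode(query).split())
    return [i for i, d in enumerate(docs) if q_codes <= set(encode(d).split())]
```

Because many characters share one code, different words can collapse to the same encoding ("dog" and "bog" both become "Axg" under this toy mapping), which is the kind of ambiguity that makes retrieval effectiveness an open question.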

Data Fusion Applied to Searching WWW (May 96 - Jan 98) Francis Crimmins has developed a method of searching for information on the WWW by broadcasting a search to a number of well-known WWW search engines. This version essentially runs a Java applet which passes your query back to our fusion server, which in turn broadcasts your query to 6 search engines and "fuses" the results into one consolidated list which appears on the applet running on your machine. You can then interact with this applet to retrieve WWW pages directly to your machine rather than via our fusion server. Having marked some WWW pages as "relevant", you can then ask for these relevant ones (or any others) to be analysed in order to have additional search terms suggested to you. This is available with help and instructions on how to use it.
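The fusion step can be sketched as follows. The description above does not say which fusion formula the server uses, so this example substitutes reciprocal rank fusion as a stand-in; the function name `fuse`, the constant `k` and the sample result lists are all hypothetical.

```python
from collections import defaultdict


def fuse(ranked_lists, k=60):
    """Merge several ranked result lists into one consolidated list.

    Each engine contributes 1 / (k + rank) for every page it returns,
    so pages ranked highly by several engines rise to the top.
    This is reciprocal rank fusion, used here only as an assumed stand-in.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores, key=lambda u: (-scores[u], u))


# Made-up results from three engines (placeholders, not real URLs).
engine_results = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c"],
]
fused = fuse(engine_results)
```

A rank-based formula is convenient here because different search engines do not report comparable relevance scores, while every engine does return an ordering.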

Radio News Retrieval - the Taiscéalaí system - (Oct 96 - Sept 98) We have almost completed a project on information filtering of RTÉ radio news broadcasts, working with the School of Electronic Engineering and funded by FORBAIRT. Gerard Quinn worked on this, along with Dr. Mike Morony, a post-doc based in Electronic Engineering who worked on the speech processing. We dynamically index RTÉ radio news, taken in off the airwaves, by phone triples and deliver an information retrieval service to users via the WWW. There is a demo of this system online.
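Indexing by phone triples can be pictured with the minimal sketch below, which slides a window of three phones over each recognised phone sequence and stores the triples in an inverted index. The function names, the phone symbols and the overlap-count ranking are assumptions made for illustration; the text above does not describe how Taiscéalaí scores matches.

```python
from collections import defaultdict


def phone_triples(phones):
    """Return every run of three consecutive phones in a sequence."""
    return [tuple(phones[i:i + 3]) for i in range(len(phones) - 2)]


def build_index(broadcasts):
    """Build an inverted index from phone triple to broadcast ids."""
    index = defaultdict(set)
    for broadcast_id, phones in broadcasts.items():
        for triple in phone_triples(phones):
            index[triple].add(broadcast_id)
    return index


def search(index, query_phones):
    """Rank broadcasts by how many distinct query triples they contain."""
    counts = defaultdict(int)
    for triple in set(phone_triples(query_phones)):
        for broadcast_id in index.get(triple, ()):
            counts[broadcast_id] += 1
    return sorted(counts, key=lambda b: (-counts[b], b))


# Made-up phone sequences standing in for speech recogniser output.
broadcasts = {
    "news1": ["d", "uh", "b", "l", "ih", "n"],
    "news2": ["k", "ao", "r", "k"],
}
index = build_index(broadcasts)
hits = search(index, ["b", "l", "ih", "n"])
```

Matching on short phone sequences rather than whole words means a query can still hit a broadcast when the recogniser never produced the word itself, which suits news vocabulary full of names.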

Virtual Lectures and the Multimedia Repository (July 96 - May 97) Dublin City University funded a pilot project which delivers lectures to third year full-time and part-time students, virtually. We digitised the audio of all lectures and developed a synchronised presentation of OHP/blackboard material, using RealAudio multimedia synchronisation tools. Students accessed the lecture material in their own time, using equipment anywhere on campus or from home, instead of attending the traditional lecture presentation. The audio track was transcribed and we developed a search engine to allow students to search for material in the course, or access it via a hierarchical table of contents, or access it via a back-of-the-book type index. There is a demo of this available with a guest login. The virtual lectures were used in 1996/7 and 1997/8 (to over 350 students!) and I plan to use them again in 1998/9.

Efficient information retrieval on large volumes of text: (1994 - 1997) After BORGES, Fergus returned to his PhD topic, developing ways to reduce the processing costs of information retrieval without loss of effectiveness. After an M.Sc. in which he developed ways of partitioning signature files for subsequent retrieval, he worked on applying thresholding to inverted file searching, saving considerable processing costs without loss of effectiveness. This work has been evaluated in the NIST/ARPA sponsored TREC-4 and TREC-5 benchmarking exercises, for which we used a collection of 2 Gbytes of text. A couple of papers on this, and his PhD dissertation, are available from my list of online publications.

Multimedia image caption-based retrieval: (Oct 94 - Sept 96) Ian Quigley took Ray's word-word distance measurement work and applied it to a collection of 3,000 images which he hand-captioned, and we also built a collection of queries. Using a variety of ways to incorporate the word-word distance into an image-scoring function, we found a net improvement in retrieval effectiveness. A paper describing this, which was presented at SIGIR'96, is available.

Multimedia indexing and retrieval: (Oct 95 - Sept 97) Gavin Gollogley and Mark Burnett worked on a project sponsored by FORBAIRT on multimedia indexing and retrieval. Essentially, Gavin developed a web-based authoring tool which encoded meta-information, such as link types, into the web being created. Mark developed a search engine which downloads WWW pages from a remote host and indexes them locally.

Word-word semantic distances: (1994 - 1996) Ray Richardson completed his PhD with us a few years ago, in which he used WordNet as a knowledge base to develop techniques for measuring the semantic distances between words. He applied this to the TREC database (Wall Street Journal only) and obtained mixed results. We also evaluated his word sense disambiguation techniques on the Brown Corpus and performance was quite good. This thread of work is still continuing in the group and we will soon apply lexical chaining, a technique to identify the "essence" of a segment of text, which we will use in document indexing. Joe Carthy is working on this topic.
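To give a flavour of measuring distance over a lexical hierarchy, the sketch below counts the edges between two words through their nearest shared ancestor. Simple edge counting is an assumption on my part for this example and is not necessarily the measure Ray developed, and the tiny `PARENT` table is a made-up hierarchy, not data taken from WordNet.

```python
# Hypothetical miniature is-a hierarchy (child -> parent); not WordNet data.
PARENT = {
    "dog": "canine", "wolf": "canine", "canine": "mammal",
    "cat": "feline", "feline": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal",
}


def ancestors(word):
    """Return the chain from a word up to the root of the hierarchy."""
    chain = [word]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def distance(a, b):
    """Count edges from a to b via their nearest common ancestor.

    Returns None when the two words share no ancestor.
    """
    up_b = {node: steps for steps, node in enumerate(ancestors(b))}
    for steps_a, node in enumerate(ancestors(a)):
        if node in up_b:
            return steps_a + up_b[node]
    return None
```

With this toy table, `distance("dog", "wolf")` is smaller than `distance("dog", "sparrow")`, so a retrieval function could treat a query word and a document word as partially matching when their distance is small.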

BORGES: From Jan 1995 to July 1996 we worked on the BORGES project, funded by the CEC-LIBRARIES program, on methods to deliver an information filtering service on USENET news.