
Sign Language Recognition
We aim to build a system which can recognise a significant number of signs from a sign language for the Deaf, such as ISL (Irish Sign Language). The system should be able to run on a standard PC and use only a single webcam. The webcam feeds live video images to the PC, where they are processed.
As you can see, the system has picked up some objects in the background which have a similar colour. We can filter these out by selecting the largest yellow object in the image.
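Selecting the largest object can be sketched as a connected-component search. This is only a minimal illustration, assuming the colour test has already produced a binary mask (truthy where a pixel counts as "yellow"); the function name `largest_component` and the use of 4-connectivity are assumptions made for the example, not details of the system described here.

```python
from collections import deque

def largest_component(mask):
    """Return a mask that keeps only the largest 4-connected region of set pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected region.
            region = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    out = [[False] * cols for _ in range(rows)]
    for y, x in best:
        out[y][x] = True
    return out

# Hypothetical mask: a 2x2 "hand" plus two stray background pixels.
mask = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
]
cleaned = largest_component(mask)  # only the 2x2 block survives
```

The two isolated pixels are discarded because each forms a region of size 1, while the block in the middle has size 4.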
Because the size of the hand image will vary as the hand moves backwards and forwards, we scale all the images to a standard size of 32x32 pixels.
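As a rough sketch of this scaling step, the function below resamples a grey-level image (a list of rows) to 32x32 using nearest-neighbour sampling. The sampling method and the name `scale_to` are assumptions chosen for brevity; the actual system may well interpolate differently.

```python
def scale_to(image, size=32):
    """Resample a 2-D list of grey levels to size x size (nearest neighbour)."""
    rows, cols = len(image), len(image[0])
    return [
        [image[(y * rows) // size][(x * cols) // size] for x in range(size)]
        for y in range(size)
    ]

# Hypothetical 64x48 input whose grey level encodes the row number.
big = [[r for _ in range(48)] for r in range(64)]
small = scale_to(big)  # 32 rows of 32 pixels each
```

The same function also enlarges images smaller than 32x32, in which case source pixels are simply repeated.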
The next task in constructing any Sign Language Recognition system is to gather a large set of images of the signs which are to form the vocabulary of the system. Many examples of each sign should be gathered to allow for minor variations. As yet we have gathered signs from only one person; gathering signs from many people is a much larger task. The data set may contain many thousands of images.
The next task is to design the system so that it can recognise a new image D which it has not seen before. A naive approach would be simply to compare D with all the images stored in the data set and find the target image T with the closest match. This is the so-called “nearest neighbour” or “template matching” approach. But because there are so many images in the data set, this will take far too long. We can reduce the time by using two techniques known as “multi-scale” and “principal components analysis”.
The multi-scale approach works by using the idea of “divide-and-conquer”. We divide up the data set into groups of images which are similar to one another. We do this by deliberately blurring the images so that small differences between similar images are eroded. Thus a whole group of original images may be reduced to just one image, which represents the entire group, and the total size of the data set is reduced.
For example, the next three images show the sign 'a' at full resolution, then scaled to 32x32 pixels and finally blurred. The following set of three images shows the sign 'e' at full resolution, scaled to 32x32 and then blurred.
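The nearest-neighbour comparison and the grouping effect of blurring can be illustrated with a small sketch. Several choices here are assumptions made only for the example: the distance is a sum of squared pixel differences, blurring is approximated by averaging over square tiles, and blurred images are coarsely quantised so that near-identical images collapse onto the same key. The names `distance`, `nearest`, `blur` and `group_by_blur` are hypothetical.

```python
def distance(a, b):
    """Sum of squared pixel differences between two equally sized images."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def nearest(query, dataset):
    """Brute-force template matching: index of the stored image closest to query."""
    return min(range(len(dataset)), key=lambda i: distance(query, dataset[i]))

def blur(image, block):
    """Crude blur: replace each block x block tile by its mean grey level."""
    rows, cols = len(image), len(image[0])
    out = []
    for by in range(0, rows, block):
        out_row = []
        for bx in range(0, cols, block):
            tile = [image[y][x]
                    for y in range(by, min(by + block, rows))
                    for x in range(bx, min(bx + block, cols))]
            out_row.append(sum(tile) // len(tile))
        out.append(out_row)
    return out

def group_by_blur(dataset, block, levels=4):
    """Group image indices whose blurred, quantised versions coincide."""
    groups = {}
    for i, img in enumerate(dataset):
        key = tuple(tuple(v * levels // 256 for v in row) for row in blur(img, block))
        groups.setdefault(key, []).append(i)
    return list(groups.values())

# Hypothetical 4x4 images: a and b differ in one pixel, c is quite different.
a = [[0, 0, 255, 255] for _ in range(4)]
b = [row[:] for row in a]
b[0][0] = 40
c = [[255] * 4, [255] * 4, [0] * 4, [0] * 4]
dataset = [a, b, c]

print(group_by_blur(dataset, block=2))  # [[0, 1], [2]]

query = [row[:] for row in a]
query[0][0] = 30
print(nearest(query, dataset))          # 1
```

After blurring, images a and b fall into the same group, so only one representative per group needs to be compared at the coarsest level.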
As a first step, we need only search those images which remain distinguishable after blurring and find the target image T1 which matches D best. As a second step, we reduce the blurring slightly and then search only those images in the group represented by T1. We find the image T2 which matches D best. We then reduce the blurring still further and search only those images represented by T2, which gives the next target T3, and so on until the blurring has been reduced to zero. Suppose we search 10 images at each step. It then takes only 4 steps to find the best match out of 10,000 images, because 10 x 10 x 10 x 10 = 10,000.
But we can speed up the search further by using a technique known as Principal Component Analysis (PCA). This depends on the idea that each image can be represented by a point in a multi-dimensional space. If each image has 32x32 (1024) pixels and each pixel has a brightness value between 0 and 255, then each pixel forms one dimension of a 1024-dimensional space. Any 1024-pixel image forms a point within this space. This is a vast space, and the number of possible 1024-pixel images (256 to the power 1024) is truly gargantuan. The total set of hand images will occupy only a minuscule fraction of this space. Just as a 2-dimensional piece of paper forms a sub-space of a 3-dimensional room, the hand images will lie on a low-dimensional sub-space of the 1024-dimensional image space. The PCA algorithm is able to calculate the position and size of this sub-space. With any luck the sub-space may have only a small number of dimensions. This means that we can describe each image by far fewer than 1024 numbers.
We can combine multi-scale and PCA as follows. First of all we blur the entire data set of images to such an extent that it occupies a sub-space of only 3 dimensions. This means that we can actually visualise the distribution of points in the space. Each point represents an image, and the distance between points represents how similar their images are.
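The idea of describing each image by a few sub-space coordinates can be sketched with NumPy. This is a generic PCA via the singular value decomposition, not necessarily the computation used in the system described here; the function name `pca` and the synthetic data (random images built to lie exactly in a 3-dimensional sub-space of the 1024-dimensional image space) are assumptions made for illustration.

```python
import numpy as np

def pca(images, k):
    """Return (mean, components, coords) for the top-k principal components."""
    X = np.asarray(images, dtype=float).reshape(len(images), -1)  # one row per image
    mean = X.mean(axis=0)
    # SVD of the centred data: the rows of Vt are the principal directions,
    # ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:k]
    coords = (X - mean) @ components.T  # k numbers per image
    return mean, components, coords

# Synthetic stand-in for a blurred data set: 200 "images" of 1024 pixels
# that are all mixtures of just 3 underlying patterns.
rng = np.random.default_rng(0)
patterns = rng.normal(size=(3, 1024))
weights = rng.normal(size=(200, 3))
images = weights @ patterns

mean, components, coords = pca(images, k=3)
print(coords.shape)  # (200, 3)

# Three coordinates per image are enough to rebuild all 1024 pixels here.
rebuilt = mean + coords @ components
print(np.allclose(rebuilt, images))  # True
```

With real hand images the reconstruction from 3 coordinates would only be approximate, and how good it is depends on how strongly the images have been blurred.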
If we divide the space into two along each dimension, this divides the space into 8 separate groups, each containing images which are all similar to one another. We can then take each of these 8 groups in turn and reduce the blurring until the images in that group just occupy 3 dimensions. Using the PCA algorithm again, we can extract the sub-space which is occupied by that group and visualise it. We can then sub-divide this new sub-space into 8 groups, just as we did with the previous space. We repeat the whole process until we have reduced the blurring to zero.
To see the developed Sign Language Recognition System in operation, click HERE
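The division of a 3-dimensional sub-space into 8 groups, as described above, can be sketched as follows. The text does not say where each dimension is cut, so this example assumes a cut at the origin of the PCA coordinates (that is, at the mean image); the names `octant` and `split_into_octants` are hypothetical.

```python
def octant(point, centre=(0.0, 0.0, 0.0)):
    """Map a 3-D point to a group number 0..7, one bit per axis.

    Bit i is set when coordinate i is at or above the centre on that axis.
    """
    index = 0
    for axis in range(3):
        if point[axis] >= centre[axis]:
            index |= 1 << axis
    return index

def split_into_octants(points):
    """Return a dict mapping each group number to the indices of its points."""
    groups = {i: [] for i in range(8)}
    for n, p in enumerate(points):
        groups[octant(p)].append(n)
    return groups

# Hypothetical PCA coordinates of four images.
points = [(1.0, 1.0, 1.0), (-1.0, -1.0, -1.0), (0.5, -2.0, -0.1), (0.9, 0.2, 3.0)]
groups = split_into_octants(points)
# groups[7] == [0, 3], groups[0] == [1], groups[1] == [2]
```

Each non-empty group would then be processed again at a lower level of blurring, which gives the recursive structure of the search.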