Aspect Ratio Based Object Identification
Martin S. Mason
School of Computing and Mathematics, University of Plymouth
Plymouth, PL48AA, UK
Abstract— A Canny filter is used to separate pens and coins from a high contrast background. Contours are calculated around the objects and the objects are discriminated based on two criteria, the aspect ratio of the bounding box around each object and the ratio of the area of the object contour to the area of the bounding box.
Object identification is a common machine vision task. To identify particular objects in a complex image, it is first necessary to extract the objects of interest from the scene (figure-ground separation) and then to identify features on those objects that allow them to be uniquely classified. A variety of techniques, including color filtering, thresholding, and edge detection, can be employed to extract the objects of interest from the background. After the objects are separated, they can be characterized by geometric features such as corners[1], edges[2], or blobs[3], and these features can be matched against a database of stored representations.
Computer vision has become much more accessible due to the introduction of high level computer vision packages such as OpenCV[4] and Roborealm[5]. These packages are best described as rapid prototyping systems that allow the user to quickly try out new computer vision techniques.
In this particular problem, the machine vision system is asked to discriminate between pens and coins positioned against a high contrast white background [Fig 1]. This paper describes the algorithms used to identify the pens and coins in the image, compares the speed of the algorithms, and discusses their limitations and possible extensions.

Figure 1: Sample image of pencil and coins used for testing the image processing algorithm.
An image is read into a memory structure and converted to greyscale by setting each pixel equal to the normalized average of its red, green, and blue channels [Fig 2].

Figure 2: Greyscale image created from the average of red, green and blue color channels.
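The channel averaging described above can be sketched in a few lines of plain Python. This is only an illustration: the nested-list image layout, the integer rounding, and the name `to_greyscale` are assumptions and are not taken from the implementation used in this paper.

```python
def to_greyscale(rgb_image):
    """Average the red, green and blue channels of every pixel.

    rgb_image is assumed to be a list of rows, each row a list of
    (r, g, b) tuples with channel values in 0..255.
    """
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in rgb_image]


# A tiny 2x2 example image (hypothetical values).
image = [
    [(255, 255, 255), (30, 60, 90)],
    [(0, 0, 0), (200, 100, 0)],
]
grey = to_greyscale(image)
```

Dividing by three keeps the result in the same 0..255 range as the input channels, which is one reading of "normalized average".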
A Gaussian filter [Fig 3] is then applied, which smooths the image while still preserving edges. The filter convolves the image data with a convolution matrix sampled from the Gaussian function. Since the convolution matrix is largest at the center and tapers off toward the sides, each output pixel is dominated by its nearest neighbours, which helps to preserve edge information in the image. This Gaussian filter is described as the first step in the implementation of a Canny filter in [2].

Figure 3: The Gaussian blur smooths the image while retaining edge information.
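As a rough illustration of the Gaussian smoothing step, the sketch below builds a normalized 5×5 kernel and convolves it with a small image. The value of sigma, the border handling (edge pixels are replicated), and the function names are assumptions; the paper specifies only the 5×5 window.

```python
import math


def gaussian_kernel(size=5, sigma=1.0):
    """Sample a 2D Gaussian on a size x size grid and normalize it to sum to 1."""
    half = size // 2
    weights = [
        [math.exp(-(x * x + y * y) / (2 * sigma * sigma))
         for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def convolve(image, kernel):
    """Apply the kernel to every pixel, replicating edge pixels at the border.

    The Gaussian kernel is symmetric, so no kernel flip is needed.
    """
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky, krow in enumerate(kernel):
                for kx, kval in enumerate(krow):
                    yy = min(max(y + ky - half, 0), h - 1)
                    xx = min(max(x + kx - half, 0), w - 1)
                    acc += kval * image[yy][xx]
            out[y][x] = acc
    return out


kernel = gaussian_kernel(5, 1.0)

# A single bright pixel (isolated noise) in an otherwise black image.
spike = [[0] * 5 for _ in range(5)]
spike[2][2] = 255
blurred = convolve(spike, kernel)
```

The isolated spike is spread over its neighbours and its peak value drops sharply, which is the behaviour that suppresses small discontinuities before edge detection.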
A Canny filter[2] is then applied to the image [Fig 4]. The blurring in the previous step has served to remove any small discontinuities in the image. Next, the directional derivatives of the image are calculated by subtracting neighbouring pixels: left and right pixels for the horizontal derivative, and top and bottom pixels for the vertical derivative. A non-maximum suppression filter is then applied to the directional derivatives to preserve only those edges that lie on a ridge, i.e. the directional derivatives should have opposite signs on either side of the ridge. Finally, a threshold is applied to the ridges found by the non-maximum filter. Edges whose strength lies between half the threshold value and the threshold value are preserved only if they connect to edges that are above the threshold.

Figure 4: Canny filter produces a binary image based on the edges determined from the directional derivatives of the images.
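Two of the Canny stages described above, the directional derivatives and the final hysteresis threshold, can be sketched as follows. This is a simplified illustration and not the full filter: non-maximum suppression is omitted, the derivatives are taken as right-minus-left and bottom-minus-top neighbour differences (one reading of the description), 8-connectivity is assumed for linking edges, and all names are hypothetical.

```python
from collections import deque


def gradients(image):
    """Horizontal and vertical neighbour differences (zero on the border)."""
    h, w = len(image), len(image[0])
    gx = [[0] * w for _ in range(h)]
    gy = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if 0 < x < w - 1:
                gx[y][x] = image[y][x + 1] - image[y][x - 1]
            if 0 < y < h - 1:
                gy[y][x] = image[y + 1][x] - image[y - 1][x]
    return gx, gy


def hysteresis(magnitude, high):
    """Keep strong edges (>= high) and weak edges (>= high / 2) linked to them."""
    low = high / 2
    h, w = len(magnitude), len(magnitude[0])
    edges = [[0] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if magnitude[y][x] >= high:
                edges[y][x] = 1
                queue.append((y, x))
    # Grow outward from the strong edges through connected weak edges.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and not edges[ny][nx] and magnitude[ny][nx] >= low):
                    edges[ny][nx] = 1
                    queue.append((ny, nx))
    return edges


# A vertical step edge: dark on the left, bright on the right.
step = [[0, 0, 255, 255] for _ in range(3)]
gx, gy = gradients(step)

# One strong response, two weak ones attached to it, and one isolated weak one.
linked = hysteresis([[200, 100, 100, 0, 100]], high=150)
```

With an upper threshold of 150 the lower threshold is 75, matching the settings listed in Appendix 1; the isolated weak response at the end of the row is discarded because it does not touch a strong edge.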
The image is then dilated [Fig 5] to connect any small areas that may be disconnected despite the large hysteresis in the Canny filter. The dilate function takes the binary image and, wherever a pixel is white, sets the immediately adjacent horizontal and vertical pixels to white (a 1:1 square kernel). This process is run for several passes to grow the features of interest in the binary image.

Figure 5: The dilate fills in any small breaks in the image. This is an essential step for finding complete contours around the objects of interest.
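The dilation pass described above can be illustrated with a short sketch. It assumes a binary image stored as nested lists of 0 and 1 and grows each white pixel into its four horizontal and vertical neighbours; the function name and the data layout are assumptions made for the example.

```python
def dilate(binary, passes=1):
    """Set the 4-connected neighbours of every white pixel to white."""
    h, w = len(binary), len(binary[0])
    for _ in range(passes):
        out = [row[:] for row in binary]  # copy so one pass does not feed itself
        for y in range(h):
            for x in range(w):
                if binary[y][x]:
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            out[ny][nx] = 1
        binary = out
    return binary


# An edge with a one-pixel break in the middle.
broken = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
closed = dilate(broken, passes=1)

# A single pixel grown for two passes becomes a diamond.
dot = [[0] * 5 for _ in range(5)]
dot[2][2] = 1
grown = dilate(dot, passes=2)
```

A single pass closes the one-pixel gap, which is what allows a complete contour to be traced afterwards; larger gaps need more passes, at the cost of thickening every edge.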
Contours are now found around the objects of interest [Fig 6]. Contours are closed curves in the image, as described in [4], pp. 234-238. In this case only the external contours of the objects are of interest.

Figure 6: Contours are found on the image. The contour will be used to discriminate between the pens and coins.
At this point all of the image processing is done. The complex original image has been reduced to a set of point sequences that represent the objects of interest. These sequences will be used to discriminate between the pens and coins.
The image processing described in the previous section takes a complex scene and returns a set of point sequences that represent contours around each of the objects of interest in the image. The list of contours is searched, and any contour with a small area is discarded to remove noise from the image, as illustrated below [Fig 7].


Figure 7: The left image has many small contours that arise from noise in the image. The right image shows that these have been removed by a cut on the minimum contour area.
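The minimum-area cut can be sketched as below. The contour area is computed here with the shoelace formula over the contour's vertices, and the cut value of 50 square pixels is purely hypothetical since the paper does not state the value used; the function names are also assumptions.

```python
def contour_area(points):
    """Area enclosed by a closed contour of (x, y) vertices (shoelace formula)."""
    total = 0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the contour
        total += x0 * y1 - x1 * y0
    return abs(total) / 2


def remove_small_contours(contours, min_area):
    """Keep only the contours whose enclosed area is at least min_area."""
    return [c for c in contours if contour_area(c) >= min_area]


square = [(0, 0), (10, 0), (10, 10), (0, 10)]  # a real object, area 100
speck = [(0, 0), (2, 0), (2, 1)]               # a noise contour, area 1
kept = remove_small_contours([square, speck], min_area=50)
```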
A bounding box is generated around each contour [Fig 8]. The geometry of this bounding box is used to discriminate between pens and coins. The aspect ratio (the ratio of the width to the height) separates pens from coins in this and all of the other test images.

Figure 8: The bounding boxes are drawn around the objects of interest.
The bounding box around a coin is approximately square, which means that its height is approximately equal to its width. The aspect ratio is calculated as the bounding box width divided by the bounding box height. If this value is near one, the object is labelled as a coin. If the value is far from one, the object is labelled as a pen. This condition is sufficient if the pens are oriented nearly vertically or nearly horizontally in the frame. However, if a pen is oriented diagonally, its bounding box can have nearly the same aspect ratio as that of a coin. As a second check, the area enclosed by each contour is calculated and compared to the area enclosed by the bounding box. If the bounding box has a much larger area than the contour, the object is a pen oriented diagonally. Based on these two criteria, the pens and coins are now labelled [Fig 9].

Figure 9: The coins and pen are correctly labelled in the image as a result of the cuts on aspect ratio and contour to bounding box area.
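The two-stage decision described above can be sketched as follows. The aspect ratio limit of 0.3 comes from the settings in Appendix 1; the fill ratio cut of 0.5 is an assumption chosen only for illustration (a circle fills about 0.785 of its bounding box), as are the function names and the synthetic test shapes.

```python
import math


def contour_area(points):
    """Area enclosed by a closed contour of (x, y) vertices (shoelace formula)."""
    total = 0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2


def classify(points, aspect_limit=0.3, min_fill=0.5):
    """Label a contour as 'pen' or 'coin' from its axis-aligned bounding box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    if width == 0 or height == 0:
        raise ValueError("degenerate contour; remove noise contours first")
    # First cut: a bounding box far from square is a pen.
    aspect = width / height
    if aspect < aspect_limit or aspect > 1 / aspect_limit:
        return "pen"
    # Second cut: a near-square box that is mostly empty is a diagonal pen.
    fill = contour_area(points) / (width * height)
    return "coin" if fill >= min_fill else "pen"


# Synthetic contours (hypothetical test shapes, not taken from the test images).
coin = [(50 + 20 * math.cos(2 * math.pi * i / 32),
         50 + 20 * math.sin(2 * math.pi * i / 32)) for i in range(32)]
horizontal_pen = [(0, 0), (100, 0), (100, 8), (0, 8)]
diagonal_pen = [(0, 5), (5, 0), (100, 95), (95, 100)]

labels = [classify(c) for c in (coin, horizontal_pen, diagonal_pen)]
```

The diagonal pen has a square bounding box, so it passes the aspect ratio cut, but it fills less than a tenth of that box and is therefore caught by the second cut.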
Additional Techniques:
Since the coins are circular, a Hough Circles filter was also applied [Fig 10]. It detected the coins in the image but was not as robust as the technique that was eventually employed. Large camera angles result in parallax distortions to which the Hough circle filter is sensitive. The Hough circle filter also identified a great deal of noise in the image as small circles, leading to many false identifications of coins.

Figure 10: Coins identified using Hough Circle Technique.
The technique described consistently distinguished pens from coins in a wide variety of situations and orientations (see Appendix 1). It is also not highly computationally expensive. However, in the interest of implementing computer vision on a small embedded platform, the speed of the algorithm must always be considered. Replacing the Gaussian blur and Canny filter with a simple threshold [Fig 11] produced identical results on the test images while taking roughly 19% less processing time [Table 1].

Figure 11: Using a simple threshold to create a binary image. This image was then used to generate contours and the same
| Technique | Time for 1000 iterations (s) | Time per iteration (s) |
|---|---|---|
| Canny | 73.0 ± 0.8 | 0.073 |
| Threshold | 59.0 ± 0.8 | 0.059 |
Table 1: The threshold technique is faster than the Canny technique by well more than the measurement error.
At these rates the Canny technique could operate at only about 13.7 fps, while the threshold technique could run at nearly 17 fps.
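The frame rates follow directly from the timings in Table 1, as the short calculation below shows. The variable and function names are illustrative only.

```python
# Total time in seconds for 1000 iterations, taken from Table 1.
timings = {"Canny": 73.0, "Threshold": 59.0}
iterations = 1000


def frames_per_second(total_seconds, count):
    """Frames processed per second when count frames take total_seconds."""
    return count / total_seconds


fps = {name: round(frames_per_second(t, iterations), 1)
       for name, t in timings.items()}
```

This gives roughly 13.7 fps for the Canny pipeline and 16.9 fps for the threshold pipeline.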
Limitations to Technique:
While the techniques described work well on all of the test images, they fail when the background is similar to the objects. For instance, in an image of the pen and coins on a dark wood table, the objects could not be separated from the background. In addition, as lighting conditions change, shadows can be recognized as additional objects and produce false positives. Finally, if a pen and a coin are touching in the frame, they are recognized as a single contour and identified as only a single pen or a single coin. An image with several of these difficulties is shown in Appendix 2. This image is outside of the test image set, but the robustness of the technique is illustrated by its success with this more difficult image.
An aspect ratio based object identification system was developed that worked successfully on all of the test images. It is sufficiently robust that it also works on a more difficult image, as shown in Appendix 2. As summarized in Table 1, the threshold based system required the least time to run. A webcam was also used as input to this system, and it could successfully identify objects within the limitations discussed above. This technique could be extended to deal with changing lighting conditions through the use of an adaptive threshold, which varies the threshold value across the image based on the relative intensity of local pixels. The algorithm could be extended to deal with pens and coins that are touching by using contour geometry to define regions of interest and then applying successive erosion and dilation within those regions. Finally, the problem of identifying the pen and coin on a low contrast background could be pursued by looking for corners within the image and using a classifier to match them against a library of known pens and coins.
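The adaptive threshold suggested above was not implemented in this work, so the sketch below is only one possible form of it: each pixel is compared with the mean of its local window, and is marked as an object pixel when it is darker than that mean by more than a fixed offset. The window radius, the offset, and the function name are all assumptions.

```python
def adaptive_threshold(grey, radius=1, offset=10):
    """Mark a pixel 1 when it is darker than its local mean minus offset."""
    h, w = len(grey), len(grey[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Gather the window around (y, x), clipped at the image border.
            window = [
                grey[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            out[y][x] = 1 if grey[y][x] < local_mean - offset else 0
    return out


# A background that darkens from left to right, with one dark object pixel.
row = [[200, 190, 100, 170, 160]]
mask = adaptive_threshold(row, radius=1, offset=10)
```

Because the comparison is local, the gradual darkening of the background is not mistaken for an object, whereas a single global threshold would have to be tuned to the darkest part of the background.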
References
J. Shi and C. Tomasi (June 1994). “Good Features to Track”. 9th IEEE Conference on Computer Vision and Pattern Recognition. Springer.
A. Ferreira and P. Engel. “Positioning a Robot Arm: An Adaptive Neural Approach”. Proc. NICROSP ’96.
J. Canny (1986). “A Computational Approach To Edge Detection”. IEEE Trans. Pattern Analysis and Machine Intelligence 8: 679–714. doi:10.1109/TPAMI.1986.
T. Lindeberg (1998). “Feature detection with automatic scale selection”. International Journal of Computer Vision 30 (2): pp 77–116.
Learning OpenCV, 1st Edition. By: Gary Rost Bradski; Adrian Kaehler. Publisher: O’Reilly Media, Inc. Pub. Date: September 24, 2008
S. Genter (2010) Roborealm: Vision for Machines: www.roborealm.com
Appendix 1: Processed Images:
All images were processed with the following settings:

Figure 1: The Gaussian blur was set to use a 5×5 window. The upper threshold for the Canny filter was set to 150, with the lower threshold set to half of that, or 75. The circle parameter for the Hough Circles filter was set to 35. The minimum distance between circles for the Hough Circles filter was 30 pixels. The dilate was set to run for 3 passes of a 1:1 kernel, and the aspect ratio limit was set to 0.3, so that objects with an aspect ratio of less than 0.3 or greater than 3.33 would be considered pens.

Figure 4: coins1-small.jpg

Figure 1: coin2-small.jpg

Figure 3: pens1-small.jpg

Figure 4: pens2-small.jpg
Appendix 2: A more difficult image:
The following image [Fig 1] has shadows as a result of difficult lighting. The camera is at a significant angle relative to the subjects, resulting in parallax in the image. The pens are at nearly a forty-five degree angle and are not entirely in the frame.

Figure 1: Source image.

Figure 2: The processed image with the Pens and Coins correctly identified.