This article proposes a new approach to visual object class recognition that exploits semantic relations in the structure of a visual object class. The algorithm rests on a hypothesis in line with the Gestalt law of proximity in human vision: in an image, basic semantic structures are formed by line segments that lie in close proximity to each other. (In the proposed approach, arcs are also approximated by line segments, being broken into smaller segments based on a pixel deviation threshold.) These basic semantic structures are then combined hierarchically by the brain until a semantic meaning can be extracted from the structure. Following the same argument, the algorithm works bottom-up: it extracts the line segments in an image and then forms semantic groups of these segments based on a minimum distance threshold between them. The resulting line segment groups can be differentiated from each other by the number of group members and by their geometrical properties. The geometrical properties of these semantic groups are used to generate rotation-, translation-, and scale-invariant histograms, which serve as feature vectors for object class recognition tasks in a K-nearest-neighbor framework. The algorithm has been tested on a standard benchmark database, and the results are compared with existing approaches to understand the strengths and weaknesses of the grouping approach relative to other approaches.
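The pipeline summarized above (proximity grouping of line segments, an invariant histogram per group, nearest-neighbor classification) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the distance measure, the geometrical properties, or the histogram layout, so the sketch assumes that line segments have already been extracted, that proximity is measured between segment endpoints, and that the descriptor is a histogram of pairwise angles between segments. All function names and parameters (`group_segments`, `max_gap`, `angle_histogram`, `bins`, `knn_predict`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# A segment is a pair of endpoints: ((x1, y1), (x2, y2)).

def endpoint_distance(s, t):
    """Smallest distance between any endpoint of s and any endpoint of t
    (an assumed proximity measure; the paper may define it differently)."""
    return min(math.dist(p, q) for p in s for q in t)

def group_segments(segments, max_gap):
    """Link segments whose endpoints lie within max_gap, using union-find,
    and return the resulting groups as lists of segments."""
    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(segments)), 2):
        if endpoint_distance(segments[i], segments[j]) <= max_gap:
            parent[find(i)] = find(j)

    groups = {}
    for i, seg in enumerate(segments):
        groups.setdefault(find(i), []).append(seg)
    return list(groups.values())

def orientation(s):
    """Undirected orientation of a segment, in [0, pi)."""
    (x1, y1), (x2, y2) = s
    return math.atan2(y2 - y1, x2 - x1) % math.pi

def angle_histogram(group, bins=6):
    """Normalized histogram of the angles between all segment pairs in a group.
    Pairwise angles do not change under rotation, translation, or uniform scaling."""
    hist = [0.0] * bins
    for s, t in combinations(group, 2):
        d = abs(orientation(s) - orientation(t))
        d = min(d, math.pi - d)  # fold into [0, pi/2]
        idx = min(int(d / (math.pi / 2) * bins), bins - 1)
        hist[idx] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def knn_predict(query, labelled, k=3):
    """Majority vote among the k labelled feature vectors nearest to query.
    labelled is a list of (feature_vector, label) pairs."""
    nearest = sorted(labelled, key=lambda item: math.dist(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

For example, `group_segments(segments, max_gap=0.1)` applied to the four sides of a unit square plus one distant segment yields a group of four and a group of one. The number of members and the angle histogram then distinguish the two groups, in the spirit of the differentiation by group size and geometry described above.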
Nishat Ahmad, Jongan Park, "Defining Semantic Structure Features for Content-Based Visual Object Class Recognition," in Journal of Imaging Science and Technology, 2011, pp. 20509-1 to 20509-9, https://doi.org/10.2352/J.ImagingSci.Technol.2011.55.2.020509