In this paper, we present a new system that segments document images and labels each region as text, image, or background using a modified fuzzy c-means (FCM) algorithm. Each pixel is assigned a feature pattern extracted from the gray-level distribution and computed at different scales. This invariant feature pattern is then assigned to a specific region using fuzzy logic. Our algorithm is formulated by modifying the objective function of the standard FCM algorithm so that the labeling of a pixel is influenced by the labels in its immediate neighborhood. The neighborhood effect acts as a regularizer and biases the solution towards piecewise-homogeneous labelings. This regularization is useful when segmenting scans corrupted by scanner noise.
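The neighborhood-regularized clustering described above can be illustrated with a minimal sketch. The abstract does not give the modified objective function, so the code below assumes one common form of spatial regularization: each pixel's clustering cost is its own squared distance to a cluster center plus `alpha` times the mean squared distance of its 8 neighbors to that center. The names `spatial_fcm`, `neighbor_mean`, `alpha`, and `m`, the 3x3 neighborhood, and the quantile-based initialization are all assumptions made for illustration, not details taken from the paper. The multiscale feature extraction is not reproduced; the function simply accepts an array of per-pixel feature vectors.

```python
import numpy as np


def neighbor_mean(a):
    """Average each pixel's 8 neighbors (center excluded) over the first two axes."""
    pad = [(1, 1), (1, 1)] + [(0, 0)] * (a.ndim - 2)
    p = np.pad(a, pad, mode="edge")  # replicate border pixels
    h, w = a.shape[:2]
    total = np.zeros(a.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            if (dy, dx) != (1, 1):
                total += p[dy:dy + h, dx:dx + w]
    return total / 8.0


def _memberships(x, centers, alpha, m):
    """Fuzzy memberships under the assumed neighborhood-regularized cost."""
    # Squared distance of every pixel to every center: shape (H, W, C).
    dist = ((x[:, :, None, :] - centers[None, None, :, :]) ** 2).sum(axis=-1)
    # Assumed regularizer: neighbors' distances to the same center.
    cost = dist + alpha * neighbor_mean(dist) + 1e-12
    inv = cost ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=-1, keepdims=True)


def spatial_fcm(features, n_clusters=3, alpha=1.0, m=2.0, n_iter=50, tol=1e-5):
    """Sketch of fuzzy c-means with a neighborhood term (hypothetical API).

    features: (H, W) gray image or (H, W, D) per-pixel feature vectors.
    Returns (memberships of shape (H, W, C), centers of shape (C, D)).
    """
    x = np.asarray(features, dtype=float)
    if x.ndim == 2:
        x = x[..., None]
    d = x.shape[-1]

    # Deterministic initialization (an assumption): evenly spaced quantiles.
    qs = (np.arange(n_clusters) + 0.5) / n_clusters
    centers = np.quantile(x.reshape(-1, d), qs, axis=0)

    x_bar = neighbor_mean(x)  # neighborhood average of the features
    for _ in range(n_iter):
        um = _memberships(x, centers, alpha, m) ** m
        # Center update that minimizes the assumed objective for fixed memberships.
        num = np.einsum("hwc,hwd->cd", um, x + alpha * x_bar)
        den = (1.0 + alpha) * um.sum(axis=(0, 1))[:, None]
        new_centers = num / den
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    return _memberships(x, centers, alpha, m), centers
```

A hard labeling can be read off with `u.argmax(axis=-1)`. Setting `alpha=0` removes the neighborhood term and reduces the sketch to standard FCM, which makes it easy to compare the two on a noisy scan.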
Mohamed Nooman Ahmed, "A Modified Fuzzy C-Means Algorithm for Document Image Segmentation," in Proc. IS&T Int'l Conf. on Digital Printing Technologies (NIP18), 2002, pp. 815-818, https://doi.org/10.2352/ISSN.2169-4451.2002.18.1.art00099_2