
Scanned Compound Document Encoding Using Multiscale Recurrent Patterns

Francisco, N. ; Rodrigues, Nuno M. M. ; Silva, E. ; Carvalho, M. ; Faria, S.M.M. ; Silva, V.

IEEE Trans. on Image Processing, Vol. 19, No. 10, pp. 2712-2724, October 2010.

ISSN (print): 1057-7149
ISSN (online): 1941-0042

Journal Impact Factor: 3.315 (in 2008)

Digital Object Identifier: 10.1109/TIP.2010.2049181

Abstract
In this paper, we propose a new encoder for scanned compound documents, based upon a recently introduced coding paradigm called multidimensional multiscale parser (MMP). MMP uses approximate pattern matching, with adaptive multiscale dictionaries that contain concatenations of scaled versions of previously encoded image blocks. These features give MMP the ability to adjust to the input image's characteristics, resulting in high coding efficiencies for a wide range of image types. This versatility makes MMP a good candidate for compound digital document encoding. The proposed algorithm first classifies the image blocks as smooth (texture) and nonsmooth (text and graphics). Smooth and nonsmooth blocks are then compressed using different MMP-based encoders, adapted for encoding either type of blocks. The adaptive use of these two types of encoders resulted in performance gains over the original MMP algorithm, further increasing the performance advantage over the current state-of-the-art image encoders for scanned compound images, without compromising the performance for other image types.
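The two-stage structure described in the abstract (classify each block, then route it to an encoder suited to its type) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not say how blocks are classified, so the variance threshold below is purely an assumption, and `encode_smooth` and `encode_nonsmooth` are hypothetical placeholders standing in for the two MMP-based encoders. Block size and threshold values are likewise illustrative.

```python
from typing import List, Tuple

Block = List[List[int]]  # rows of grayscale pixel values


def split_blocks(image: Block, size: int) -> List[Block]:
    """Split a 2-D grayscale image into size x size tiles, row-major order."""
    blocks = []
    for top in range(0, len(image), size):
        for left in range(0, len(image[0]), size):
            blocks.append([row[left:left + size] for row in image[top:top + size]])
    return blocks


def variance(block: Block) -> float:
    """Population variance of the pixel values in a block."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def classify_block(block: Block, threshold: float = 100.0) -> str:
    """ASSUMPTION: a variance threshold stands in for the paper's classifier."""
    return "nonsmooth" if variance(block) > threshold else "smooth"


def encode_smooth(block: Block) -> Tuple[str, Block]:
    """Hypothetical placeholder for the encoder tuned to smooth (texture) blocks."""
    return ("smooth", block)


def encode_nonsmooth(block: Block) -> Tuple[str, Block]:
    """Hypothetical placeholder for the encoder tuned to text and graphics blocks."""
    return ("nonsmooth", block)


def encode_image(image: Block, size: int = 4, threshold: float = 100.0):
    """Classify every block, then dispatch it to the matching encoder."""
    encoded = []
    for block in split_blocks(image, size):
        if classify_block(block, threshold) == "smooth":
            encoded.append(encode_smooth(block))
        else:
            encoded.append(encode_nonsmooth(block))
    return encoded


# Usage: a 4x8 image whose left tile is flat and whose right tile alternates 0/255.
flat_row = [128, 128, 128, 128]
busy_row = [0, 255, 0, 255]
image = [flat_row + busy_row for _ in range(4)]
labels = [label for label, _ in encode_image(image)]
```

In this sketch the flat tile is routed to the smooth path and the high-contrast tile to the nonsmooth path; a real encoder would replace the placeholders with the actual compression stages.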