Classification and Retrieving Printed Arabic Document Images Based on Bagged Decision Tree Classifier
Keywords:
DIR, header-words, feature extraction, bagging, bagged decision tree
Abstract
Printed Arabic document image retrieval is needed in many applications, including electronic archiving, search engines, and document management systems. This paper proposes an adaptive, header-word-based classification and retrieval system for printed Arabic document images, built on a decision tree classifier improved by bagging. The proposed system applies preprocessing and segmentation techniques to prepare the document and to detect specific Arabic header words in the query document. A collection of discriminative features is then extracted from the detected header words so that each can be assigned to the correct class. Bagging is combined with the decision tree classifier to improve classification performance and hence the precision of document retrieval. Experimental tests show that the proposed system achieves a precision of 97.35%.
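To illustrate the general idea of bagging with tree classifiers, the following is a minimal sketch in plain Python. It is not the system described in this paper: the feature vectors, the class labels "A" and "B", and all function names are hypothetical, and a depth-1 tree (a decision stump) stands in for a full decision tree to keep the example short. Each stump is trained on a bootstrap sample (drawn with replacement) of the training set, and the ensemble predicts by majority vote.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a depth-1 decision tree (stump) by exhaustive search over splits."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(lab != left_label for lab in left)
            errors += sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:
        # No split possible (all rows identical): always predict the majority.
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    """Return the label of the side of the split that the row falls on."""
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_bagged(X, y, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample of the training set."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagged(models, row):
    """Combine the individual predictions by majority vote."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional feature vectors for two header-word classes.
X_train = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1], [1.1, 5.2],
           [3.0, 1.0], [3.2, 0.9], [2.9, 1.1], [3.1, 1.2]]
y_train = ["A", "A", "A", "A", "B", "B", "B", "B"]

models = fit_bagged(X_train, y_train, n_estimators=25, seed=0)
print(predict_bagged(models, [0.5, 6.0]))
print(predict_bagged(models, [4.0, 0.0]))
```

Training each tree on a different resample and averaging the votes reduces the variance of a single tree, which is the usual motivation for bagging. In practice a library implementation with full-depth trees would normally replace the hand-written stump.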