Proposed Method to Enhance Text Document Clustering Using Improved Fuzzy C Mean Algorithm with Named Entity Tag
Keywords:
Fuzzy clustering, document datasets, information extraction, named entity
Abstract
Text document clustering refers to grouping related text documents
for unsupervised document organization, text data mining, and automatic
topic extraction. The most common document representation model is the
vector space model (VSM), which represents a set of documents as vectors
of key terms. Traditional document clustering methods group related
documents without any user interaction. The method proposed in this
paper is an attempt to discover how clustering might be improved with
user guidance by selecting features that separate documents. These
features are the tags that appear in documents, such as named entity
tags, which mark information important for naming clusters in text.
The method introduces a document representation model that builds
combined features from named entity tags and uses an improved fuzzy
clustering algorithm.
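As a rough illustration of the combined representation described above, the sketch below appends weighted named entity tag counts to a plain term-count vector. The abstract does not state how the features are combined, so the function name `combined_vector`, the `tag_weight` parameter, and the tag labels are all hypothetical, and the tags are assumed to come from an external named entity tagger.

```python
from collections import Counter

def combined_vector(tokens, entity_tags, vocab, tag_set, tag_weight=2.0):
    """Return term counts over `vocab` followed by weighted tag counts over `tag_set`.

    Assumption: simple concatenation with a fixed weight; the paper's actual
    combination scheme is not given in the abstract.
    """
    term_counts = Counter(tokens)
    tag_counts = Counter(entity_tags)
    term_part = [float(term_counts[t]) for t in vocab]        # VSM part
    tag_part = [tag_weight * tag_counts[g] for g in tag_set]  # named entity part
    return term_part + tag_part

# Hypothetical toy vocabulary and tag set
vocab = ["oil", "price", "bank", "rate"]
tag_set = ["ORG", "LOC", "PER"]

# One document: its tokens and the entity tags found in it
doc_tokens = ["oil", "price", "oil", "opec"]
doc_tags = ["ORG", "LOC"]

vec = combined_vector(doc_tokens, doc_tags, vocab, tag_set)
print(vec)
```

Out-of-vocabulary tokens such as "opec" are ignored by the term part, while the entity tags still contribute to the vector, which is one way tag information could help separate documents.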
The proposed method is tested at two levels. The first level uses only the vector space model with the traditional fuzzy c-means algorithm; the second level uses the vector space model with combined named entity tag features and the improved fuzzy c-means algorithm. Both levels are evaluated on a subset of the Reuters-21578 dataset that contains 1150 documents on ten topics (150 documents for each topic). The results show that the second level achieves good performance for text document clustering, with an average categorization accuracy of 90%.
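For reference, the traditional fuzzy c-means baseline mentioned above can be sketched in a few lines. This is the standard textbook algorithm, not the improved variant proposed in the paper; the function name `fuzzy_c_means`, the fuzzifier `m = 2`, the Euclidean distance, and the toy points are assumptions made for illustration.

```python
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by u_ij ** m
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        new_u = []
        for i in range(n):
            dists = [max(sum((points[i][d] - centers[j][d]) ** 2
                             for d in range(dim)) ** 0.5, 1e-12)
                     for j in range(c)]
            new_u.append([1.0 / sum((dists[j] / dists[k]) ** (2.0 / (m - 1.0))
                                    for k in range(c))
                          for j in range(c)])
        change = max(abs(new_u[i][j] - u[i][j])
                     for i in range(n) for j in range(c))
        u = new_u
        if change < tol:
            break
    return centers, u

# Toy data: two well-separated groups standing in for document vectors
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers, u = fuzzy_c_means(points, c=2)
labels = [row.index(max(row)) for row in u]  # hard label = highest membership
print(labels)
```

Each row of `u` holds one document's degree of membership in every cluster, so a hard assignment is obtained only at the end by taking the largest membership.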