TY - JOUR
TI - Proposed Method to Enhance Text Document Clustering Using Improved Fuzzy C Mean Algorithm with Named Entity Tag
PY - 2022/10/04
Y2 - 2025/12/22
JF - مجلة المنصور
JA - مجلة المنصور
VL - 28
IS - 1
LA - en
UR - https://journal.muc.edu.iq/journal/article/view/98
SP - 43-62
AB - Text document clustering refers to the grouping of related text documents for unsupervised document organization, text data mining, and automatic topic extraction. The most common document representation model is the vector space model (VSM), which represents a set of documents as vectors of key terms. Traditional document clustering methods group related documents without any user interaction. The method proposed in this paper is an attempt to discover how clustering might be improved with user direction by selecting features to separate documents. These features are the tags that appear in documents, such as named entity tags, which denote important information for cluster names in the text. The method introduces a document representation model that takes into account combined named entity tag features and uses an improved fuzzy clustering algorithm. The proposed method is tested at two levels: the first level uses only the vector space model with traditional fuzzy c-means, and the second level uses the vector space model with combined named entity tag features and an improved fuzzy c-means algorithm. Testing uses a subset of the Reuters-21578 dataset that contains 1150 documents of ten topics (150 documents for each topic). The results show that using the second level as the clustering technique for text documents achieves good performance, with an average categorization accuracy of 90%.
ER -