Chinese Literature Clustering Research Based on Python K-means Algorithm
ZHAO Qian-yi; School of Information, Guizhou University of Finance and Economics
Clustering is an important means of organizing, summarizing and navigating text information effectively. K-means is a typical distance-based clustering algorithm. Applied to Chinese document clustering, it divides a set of documents into several categories according to content similarity and helps uncover hidden knowledge. This paper summarizes a Chinese literature clustering process that uses a Python implementation of the K-means algorithm. The initial number of clusters k is selected with three evaluation indexes: the Calinski-Harabasz (CH) index, the silhouette coefficient and the sum of squared errors (SSE). Using the resulting range of optimal k values, the documents are clustered once by keywords and once by abstracts, and the two sets of results are compared and analyzed; the comparison shows that clustering Chinese documents by abstracts gives better results. Finally, the literature within a single category can be clustered again by keywords to further explore hidden knowledge.
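The selection of k by the three indexes can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy 2-D points stand in for document vectors (for example TF-IDF vectors built from keywords or abstracts), the function names are hypothetical, and a deterministic farthest-first seeding replaces the random or k-means++ initialization that a library such as scikit-learn would normally use.

```python
import math

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with farthest-first seeding (an assumption of this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            updated.append(mean_point(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

def sse(points, labels, centroids):
    """Sum of squared distances from each point to its own centroid."""
    return sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))

def calinski_harabasz(points, labels, centroids):
    """CH index: between-cluster dispersion over within-cluster dispersion."""
    n, k = len(points), len(centroids)
    overall = mean_point(points)
    between = sum(labels.count(j) * sq_dist(centroids[j], overall) for j in range(k))
    return (between / (k - 1)) / (sse(points, labels, centroids) / (n - k))

def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        total += (b - a) / max(a, b)
    return total / len(points)

# Toy data: three well-separated groups standing in for document vectors.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (0, 10), (0, 11), (1, 10), (1, 11)]

scores = {}
for k in range(2, 6):
    labels, centroids = kmeans(points, k)
    scores[k] = (sse(points, labels, centroids),
                 calinski_harabasz(points, labels, centroids),
                 silhouette(points, labels))
    print(k, scores[k])

best_k_ch = max(scores, key=lambda k: scores[k][1])
best_k_sil = max(scores, key=lambda k: scores[k][2])
print("best k by CH:", best_k_ch, "| best k by silhouette:", best_k_sil)
```

Higher CH and silhouette values indicate a better partition, while SSE always decreases as k grows, so it is usually read by looking for the "elbow" where the decrease flattens. On real Chinese text, the points would first have to be produced by word segmentation and vectorization, which this sketch leaves out.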