Traditional clustering statistics methods are applicable only to low-dimensional data. This study therefore designs a clustering statistics method for high-dimensional sparse data based on fuzzy mathematics. Building on the fuzzy c-means clustering algorithm, the method first optimizes the initial cluster centers, which addresses the local-optimum problem and shortens the clustering time. A weighting mechanism is then introduced so that the method suits high-dimensional sparse data. Finally, cosine distance replaces the original Euclidean distance to improve the clustering results on high-dimensional sparse data. Experiments show that the method clusters well across different data dimensions. For low-dimensional data, the clustering result is best when the block ratio is 10%; for high-dimensional data, it is best when the block ratio is 40%. Across different sparsity levels, both the hit rate and the clustering efficiency of the method are high.
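The three ingredients named in the abstract (chosen initial centers, a weighting mechanism, and cosine distance inside fuzzy c-means) can be illustrated with a minimal sketch. The abstract does not give the paper's actual initialization or weighting scheme, so the farthest-point initialization, the per-feature `weights` argument, and all function names below are assumptions made for illustration, not the author's method.

```python
import math
import random


def cosine_distance(a, b):
    """Return 1 - cos(a, b); treated as 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def farthest_point_init(data, c, seed=0):
    """Assumed stand-in for the paper's center optimization: spread-out seeds."""
    rng = random.Random(seed)
    centers = [list(data[rng.randrange(len(data))])]
    while len(centers) < c:
        # Pick the point whose nearest existing center is farthest away.
        nxt = max(data, key=lambda p: min(cosine_distance(p, k) for k in centers))
        centers.append(list(nxt))
    return centers


def memberships(data, centers, m, eps):
    """Fuzzy memberships for the objective sum(u**m * D), D = cosine distance."""
    u = []
    for p in data:
        inv = [max(cosine_distance(p, k), eps) ** (-1.0 / (m - 1.0)) for k in centers]
        total = sum(inv)
        u.append([v / total for v in inv])
    return u


def fuzzy_c_means_cosine(data, c, weights=None, m=2.0, iters=50, eps=1e-9):
    """Return (memberships, centers) for rows of `data` grouped into c clusters."""
    dim = len(data[0])
    w = weights or [1.0] * dim  # hypothetical per-feature weights
    scaled = [[wi * xi for wi, xi in zip(w, row)] for row in data]
    centers = farthest_point_init(scaled, c)
    for _ in range(iters):
        u = memberships(scaled, centers, m, eps)
        new_centers = []
        for k in range(c):
            acc = [0.0] * dim
            for p, ui in zip(scaled, u):
                norm = math.sqrt(sum(x * x for x in p))
                if norm == 0.0:
                    continue  # all-zero rows carry no direction
                coef = (ui[k] ** m) / norm
                for j in range(dim):
                    acc[j] += coef * p[j]
            # The membership-weighted sum of unit vectors gives the direction
            # that minimizes the weighted cosine distance to the points.
            new_centers.append(acc)
        centers = new_centers
    return memberships(scaled, centers, m, eps), centers
```

Calling `fuzzy_c_means_cosine(rows, c=2)` on a list of equal-length numeric rows returns one membership vector per row, each summing to 1. Cosine distance compares directions rather than magnitudes, which is one common reason it is preferred over Euclidean distance for sparse, high-dimensional vectors.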
ZHOU Yanru. Design of Clustering Statistics Method for High-dimensional Sparse Data based on Fuzzy Mathematics [J]. Journal of Jilin Institute of Chemical Technology, 2021, 38(9): 107-111.