Abstract
Traditional clustering methods are applicable only to low-dimensional data. This study therefore designs a clustering method for high-dimensional sparse data based on fuzzy data statistics. Building on the fuzzy c-means clustering algorithm, the method first optimizes the initial cluster centers, which addresses the local-optimum problem and shortens clustering time. A weighting mechanism is then introduced so that the method suits high-dimensional sparse data. Finally, cosine distance replaces the original Euclidean distance to improve clustering quality on high-dimensional sparse data. Experiments show that the method clusters well across different data dimensions. For low-dimensional data, the clustering results are best when the block ratio is 10%; for high-dimensional data, they are best when the block ratio is 40%. Across different sparsity levels, the method achieves a high hit rate and high clustering efficiency.
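As a rough illustration of the ingredients named above, the following minimal sketch runs fuzzy c-means with cosine distance in place of Euclidean distance. It is not the study's implementation. The function name `cosine_fcm`, the farthest-point seeding used for the initial centers, and the optional per-feature `weights` vector (uniform by default) are all assumptions made for this example; the abstract does not specify the actual initialization or weighting scheme.

```python
import numpy as np

def cosine_fcm(X, n_clusters, m=2.0, weights=None, n_iter=100, tol=1e-6):
    """Fuzzy c-means with cosine distance (illustrative sketch only)."""
    X = np.asarray(X, dtype=float)
    if weights is None:
        # Assumption: uniform feature weights; the paper's scheme is not given.
        weights = np.ones(X.shape[1])

    # Fold the weights into the data, then scale each row to unit length so
    # that a dot product equals the (weighted) cosine similarity.
    Z = X * np.sqrt(weights)
    Z = Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)

    # Assumption: deterministic farthest-point seeding stands in for the
    # paper's optimized choice of initial cluster centers.
    centers = [Z[0]]
    for _ in range(1, n_clusters):
        nearest_sim = np.max(Z @ np.array(centers).T, axis=1)
        centers.append(Z[np.argmin(nearest_sim)])
    C = np.array(centers)

    U = None
    for _ in range(n_iter):
        # Cosine distance between every point and every center.
        D = np.clip(1.0 - Z @ C.T, 1e-12, None)

        # Membership update; the cosine distance plays the role of the
        # squared Euclidean distance, hence the exponent 1 / (m - 1).
        inv = D ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        # Center update: membership-weighted mean, rescaled to unit length.
        C = (U_new ** m).T @ Z
        C /= np.maximum(np.linalg.norm(C, axis=1, keepdims=True), 1e-12)

        converged = U is not None and np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, C
```

Calling `cosine_fcm(X, n_clusters=2)` on a small sparse matrix returns a membership matrix whose rows sum to 1 together with unit-length cluster centers; `U.argmax(axis=1)` gives a hard label for each row.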