Abstract
In recent years, with the rapid development of big data mining technology in the medical industry, precision clinical therapy has become a research hotspot in medical big data. In this study, a binary classification algorithm was built on the breast cancer dataset from the UCI database to predict breast tumour type. Random oversampling was used to handle the class imbalance in the dataset, and least absolute shrinkage and selection operator (Lasso) regression and sequential forward selection (SFS) were used for feature selection; the resulting models were compared by classification accuracy. The random forest algorithm using six selected features achieved the highest classification accuracy (97.07%), an improvement over the algorithm without feature selection, and could potentially provide new ideas for breast cancer detection.
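To make two of the techniques named above concrete, the following is a minimal standard-library sketch of random oversampling and of sequential forward selection. It is an illustration under stated assumptions, not the implementation used in the study: the function names `random_oversample` and `sequential_forward_selection`, the `seed` parameter, and the `score` callback are all hypothetical. In a setting like the one described, `score` would typically return the cross-validated accuracy of a classifier trained on the candidate feature subset, and oversampling would normally be applied to the training split only, so that duplicated rows do not leak into the evaluation data.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Original rows belonging to this class; duplicates are drawn from here.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def sequential_forward_selection(n_features, k, score):
    """Greedy SFS: start from an empty set and, at each step, add the
    feature index whose inclusion gives the highest score(subset).
    `score` is an assumed callback mapping a list of indices to a number."""
    selected = []
    while len(selected) < k:
        remaining = [j for j in range(n_features) if j not in selected]
        best = max(remaining, key=lambda j: score(selected + [j]))
        selected.append(best)
    return selected
```

For example, `random_oversample([[0.1], [0.2], [0.3], [0.9]], [0, 0, 0, 1])` returns six rows with three per class, and `sequential_forward_selection(n_features, 6, score)` would return six feature indices in the order they were added. Because the search is greedy, the selected subset is not guaranteed to be the best possible subset of that size.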