Abstract

Gene selection is crucial for cancer classification using microarray data. To improve cancer classification accuracy, we developed a new wrapper method for gene selection called ieGENES. First, we proposed a parsimonious kernel machine regularization (PKMR) model that uses ridge regularization in kernel-machine-driven classification to tackle multicollinearity and obtain stable estimates in high-dimensional settings. We then developed the ieGENES algorithm to optimally identify relevant genes while iteratively eliminating redundant ones based on leave-one-out cross-validation accuracy. In particular, we developed a new methodology to optimally update model parameters upon gene removal. The ieGENES algorithm was evaluated on six cancer microarray datasets and compared with existing methods, assessing classification accuracy and the number of differentially expressed genes (DEGs) identified. In terms of gene selection accuracy, ieGENES outperformed multiple wrapper methods on 5 out of 6 datasets (Colon, Leukemia, Hepato, Glioma, and Breast Cancers), with statistically significant improvements (p < 0.001). For the Colon dataset, ieGENES achieved 96.21% accuracy with 167 DEGs. The proposed ieGENES technique demonstrated superior performance in identifying DEGs for cancer diagnosis compared with existing techniques. It offers a promising tool for identifying biologically relevant genes in microarray data analysis and biomarker discovery for cancer research.
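
To make the general idea in the abstract concrete, the following is a minimal sketch of backward gene elimination driven by leave-one-out cross-validation (LOOCV) accuracy, using a ridge-regularized kernel classifier. It is not the ieGENES algorithm or the PKMR model: the linear kernel, the fixed ridge penalty `lam`, the greedy stopping rule, the toy data, and all function names are assumptions made for illustration. The paper's method for optimally updating model parameters upon gene removal is not reproduced here; the sketch simply refits from scratch and uses the standard closed-form LOOCV shortcut for ridge-type linear smoothers.

```python
import random

def linear_kernel(X, genes):
    """Gram matrix of a linear kernel restricted to the given gene columns."""
    return [[sum(a[g] * b[g] for g in genes) for b in X] for a in X]

def invert(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]

def loocv_accuracy(X, y, genes, lam=1.0):
    """Closed-form LOOCV accuracy of a ridge-regularized kernel classifier."""
    n = len(y)
    K = linear_kernel(X, genes)
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    A_inv = invert(A)
    # Hat matrix H = K (K + lam*I)^-1; the fitted values are H y.
    H = [[sum(K[i][k] * A_inv[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    correct = 0
    for i in range(n):
        fit = sum(H[i][j] * y[j] for j in range(n))
        # Prediction for sample i from a model trained without sample i.
        loo = (fit - H[i][i] * y[i]) / (1.0 - H[i][i])
        correct += (loo > 0) == (y[i] > 0)
    return correct / n

def backward_eliminate(X, y, lam=1.0):
    """Greedily drop genes while LOOCV accuracy does not decrease."""
    genes = list(range(len(X[0])))
    best = loocv_accuracy(X, y, genes, lam)
    while len(genes) > 1:
        # Score every single-gene removal and keep the most accurate one.
        scored = [(loocv_accuracy(X, y, [g for g in genes if g != d], lam), d)
                  for d in genes]
        acc, drop = max(scored)
        if acc < best:
            break
        best = acc
        genes.remove(drop)
    return genes, best

# Toy data (hypothetical): gene 0 carries the class signal, genes 1-5 are noise.
random.seed(0)
y = [1.0 if i % 2 == 0 else -1.0 for i in range(20)]
X = [[3.0 * yi + random.gauss(0, 0.3)] + [random.gauss(0, 1) for _ in range(5)]
     for yi in y]
selected, best = backward_eliminate(X, y)
print(selected, best)
```

Refitting for every candidate removal is expensive on real microarray data with thousands of genes, which is presumably why the paper introduces a dedicated parameter-update step; this sketch is only meant to show the shape of a LOOCV-guided elimination loop.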

Publication Date

2025-02-28

Publication Title

Journal of Biomedical Informatics

Volume

164

ISSN

1532-0464

Acceptance Date

2025-02-14

Deposit Date

2025-03-15

Funding

Shang-Ming Zhou was partly supported by the Baily Thomas Charitable Fund (Ref.: TRUST/VC/AC/SG/6298-9545). Shang-Ming Zhou was supported in part by the UK CDT in Artificial Intelligence, Machine Learning and Advanced Computing (EP/S023992/1).

Keywords

Cancer, Differentially expressed genes, Gene detection, Kernel machines, Machine learning, Microarray data

Creative Commons License

Creative Commons Attribution 4.0 International License

First Page

104803

Last Page

104803

Additional Files

ieGENES_Supp-mmc1.pdf (109 kB)
