Original scientific article

FEATURE SELECTION METHOD USING HYBRID SWARM WITH IMPROVED FUZZY C-MEANS CLUSTERING IN DATA MINING FOR DISEASE DETECTION

By
M. Birundha Rani

Mother Teresa Women’s University, Dindigul, India

Dr. A. Subramani

M.V. Muthiah Govt. Arts College for Women, Dindigul, India

Abstract

Feature Selection (FS) is a crucial technique for mitigating the dimensionality problem in Data Mining (DM) tasks, but conventional FS techniques do not scale well to very large feature spaces. The existing HPSO-IKM approach has a rather long processing time, so its stages must be further improved to shorten detection time. In addition, PSO's poor local search capability and slow convergence in the refinement phase prevent it from offsetting poor initialization by minimizing intra-clustering (IC) errors. This paper proposes a novel approach to the dimensionality problem in which a good feature subset is produced by combining a correlation measure with clustering. After Z-Score Normalization (ZSN) for pre-processing, a computational model is constructed to identify the relevant features under the given constraints, and a feature structure is built by extracting features with Principal Component Analysis (PCA). Next, Multi-Objective Glowworm Swarm Optimization with Improved Fuzzy C-Means Clustering (MOGWO-IFCM) removes unnecessary features and selects non-redundant features from each cluster according to correlation measures. In this approach, the Glowworm Swarm Optimization (GSO) technique supplies its best solution as the initial clustering center, which the Improved Fuzzy C-Means (IFCM) technique then refines. The proposed approach is evaluated on UCI datasets with a Modified Long Short-Term Memory (MLSTM) classifier, and the outcomes are compared with those of other well-known FS methods. Percentage-based criteria are used to verify the accuracy of the proposed technique with varying numbers of relevant features. The experimental results demonstrate the accuracy and efficiency of the proposed technique.
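To make the pipeline concrete, the sketch below walks through the main stages described above in plain Python/NumPy: Z-score normalization, PCA-based feature extraction, fuzzy c-means clustering of the features, and selection of the least-correlated representative from each cluster. It is only an illustrative approximation under stated assumptions: the scikit-learn breast-cancer data stand in for the UCI datasets, random initial memberships stand in for the GSO/MOGWO-optimized cluster centers, the cluster count and fuzzifier are assumed values, and the MLSTM classification stage is omitted.

```python
# Minimal sketch of the feature-selection pipeline from the abstract:
# ZSN -> PCA -> fuzzy c-means over features -> least-redundant feature per cluster.
# NOTE: the GSO/MOGWO seeding and the MLSTM classifier are NOT reproduced here;
# random initial memberships, the dataset, the cluster count c and fuzzifier m
# are illustrative assumptions, not the paper's settings.
import numpy as np
from sklearn.datasets import load_breast_cancer   # stand-in UCI-style dataset
from sklearn.preprocessing import StandardScaler  # Z-Score Normalization (ZSN)
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = load_breast_cancer().data                     # shape (n_samples, n_features)

# 1) Z-score normalization of every feature.
Xz = StandardScaler().fit_transform(X)

# 2) PCA as the feature-extraction step (kept only to illustrate the stage).
Xp = PCA(n_components=0.95).fit_transform(Xz)     # retain 95% of the variance

# 3) Fuzzy c-means over the *feature* vectors (features as objects),
#    so redundant features tend to fall into the same cluster.
F = Xz.T                                          # (n_features, n_samples)
c, m, n_iter = 5, 2.0, 100                        # assumed cluster count / fuzzifier
U = rng.dirichlet(np.ones(c), size=F.shape[0])    # random initial memberships
for _ in range(n_iter):
    Um = U ** m
    centers = (Um.T @ F) / Um.sum(axis=0)[:, None]
    d = np.linalg.norm(F[:, None, :] - centers[None, :, :], axis=2) + 1e-9
    U = 1.0 / (d ** (2 / (m - 1)) * np.sum(d ** (-2 / (m - 1)), axis=1, keepdims=True))

# 4) From each cluster keep the feature least correlated, on average,
#    with the other members, i.e. the least redundant representative.
labels = U.argmax(axis=1)
corr = np.abs(np.corrcoef(F))
selected = []
for k in range(c):
    members = np.flatnonzero(labels == k)
    if members.size:
        mean_corr = corr[np.ix_(members, members)].mean(axis=1)
        selected.append(int(members[mean_corr.argmin()]))
print("selected feature indices:", sorted(selected))
```

The intent is only to show where each stage of the proposed MOGWO-IFCM pipeline sits relative to the others; the paper's swarm optimizer, improved FCM variant, and MLSTM evaluation are not reproduced here.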



