Review of High-Dimensional Data Reduction Methods
Mahmoud Rokaya*
Department of Information Technology, Taif University, Saudi Arabia
*Corresponding Author: Mahmoud Rokaya, Department of Information Technology, Taif University, Saudi Arabia.
Received: November 14, 2022; Published: December 23, 2022
Abstract
In the current decade, most computational problems have come to involve high-dimensional data. Appropriate data reduction brings the computational load down to an acceptable range in both time and space. Most available data reduction methods are built on a statistical foundation; few adopt machine learning. Few works have considered ensemble learning as a way to merge several reduction methods into one that outperforms each of the individual methods it combines. This work presents the history of high-dimensional data reduction methods and analyzes recent developments in data reduction schemes, especially ensemble methods.
Keywords: Dimensionality Reduction; Random Projection; PCA; Curvilinear Component Analysis (CCA); Projected Support Points (PSPs); Sequential Ensemble (SEMSE); Projection Pursuit; Particle Swarm Optimization (PSO); Genetic Algorithm (GA); Quadratic Discriminant Analysis (QDA)