Acta Scientific Computer Sciences

Research Article Volume 4 Issue 6

Reduction of Training Data from Large Datasets using Encoder and Decoder Algorithm without Loss of Accuracy

Bagesh Kumar*, Akhil Shukla, Akhil Singh, Mohd Javed Ali and OP Vyas

IIIT Allahabad, India

*Corresponding Author: Bagesh Kumar, IIIT Allahabad, India.

Received: February 07, 2022; Published: May 25, 2022


The objective of this paper is to minimize the number of samples required to train algorithms that rely on support vectors while maximizing knowledge of the target class. We propose a method that uses an autoencoder in conjunction with farthest boundary point extraction to select the most promising frontier points from the original sample. The farthest frontier points are chosen using a geometrical approach that estimates the extreme points of a class, and the autoencoder learns a compressed representation of the data. We experiment on the following datasets: MNIST, Iris, credit card fraud detection, Indian Pines, and the Human Activity Recognition Database.

Keywords: Sample Reduction; Autoencoder; Dimensionality Reduction; Farthest Boundary Point Extraction; Multiclass Classification; SVM; Training Data Reduction
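
The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that "farthest" points can be approximated as the samples with the largest Euclidean distance from their class centroid, and that the encoding is applied before selection. The names `select_frontier_points`, `keep_per_class`, and `encode` are hypothetical; `encode` stands in for the encoder half of a trained autoencoder and defaults to the identity.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def select_frontier_points(
    samples: List[Point],
    labels: List[int],
    keep_per_class: int,
    encode: Callable[[Point], Point] = lambda x: x,
) -> List[int]:
    """Return the sorted indices of the samples kept for training.

    Assumption: "farthest" is approximated as the largest Euclidean
    distance from the class centroid in the encoded space. `encode` is a
    hypothetical stand-in for a trained autoencoder's encoder.
    """
    # Group (original index, encoded point) pairs by class label.
    by_class: Dict[int, List[Tuple[int, Point]]] = {}
    for idx, (x, y) in enumerate(zip(samples, labels)):
        by_class.setdefault(y, []).append((idx, encode(x)))

    kept: List[int] = []
    for members in by_class.values():
        center = centroid([z for _, z in members])
        # Rank class members from farthest to nearest to the centroid.
        ranked = sorted(
            members, key=lambda m: math.dist(m[1], center), reverse=True
        )
        kept.extend(idx for idx, _ in ranked[:keep_per_class])
    return sorted(kept)


samples = [(0, 0), (1, 0), (0, 1), (5, 5), (10, 10), (10, 11), (11, 10), (20, 20)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
reduced = select_frontier_points(samples, labels, keep_per_class=1)
```

The returned indices would then be used to subset the training set before fitting a support vector classifier. The centroid-distance rule is only a simple proxy for a boundary estimate, and it treats isolated outliers as frontier points.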




Citation: Bagesh Kumar., et al. “Reduction of Training Data from Large Datasets using Encoder and Decoder Algorithm without Loss of Accuracy”. Acta Scientific Computer Sciences 4.6 (2022): 59-74.


Copyright: © 2022 Bagesh Kumar., et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

