Principal Component Analysis (PCA) - An Effective Tool in Machine Learning
Abstract
Machine learning problems often involve thousands or even millions of features for each training instance. This makes training very slow and makes it much harder to find a good solution. The ability to reduce the number of features considerably without losing important information significantly reduces the amount of training data required to solve a task. Principal Component Analysis (PCA), the most popular dimensionality reduction algorithm used in machine learning, analyses the interrelationships among a large number of variables and explains their variability in terms of a smaller number of variables, called principal components, with a minimum loss of information. In this paper, the Iris dataset, which has 50 instances of each of 3 iris flower species (Iris-setosa, Iris-versicolor and Iris-virginica) described by four features (sepal length, sepal width, petal length and petal width), was used to demonstrate PCA. The model was implemented using the Anaconda Python 3.6 distribution. From the results, PCA clearly separates the species visually, showing their differences.
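As a rough illustration of the procedure the abstract describes, the following minimal sketch projects four-feature flower measurements onto two principal components. It is not the paper's implementation: it assumes NumPy is available, the helper name `pca` is hypothetical, and the nine hand-entered rows are only a small stand-in for the full Iris dataset so that the example runs on its own.

```python
import numpy as np

# Illustrative stand-in data (assumption): three rows per species, with columns
# sepal length, sepal width, petal length, petal width in cm.
# This is NOT the full Iris dataset.
X = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.9, 3.1, 4.9, 1.5],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
    [7.1, 3.0, 5.9, 2.1],
])
labels = ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 3 + ["Iris-virginica"] * 3


def pca(data, n_components=2):
    """Hypothetical helper: PCA via eigendecomposition of the covariance matrix."""
    centered = data - data.mean(axis=0)            # centre each feature at zero
    cov = np.cov(centered, rowvar=False)           # feature-by-feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]              # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]         # top principal directions
    projected = centered @ components              # coordinates in the reduced space
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return projected, explained_ratio


projected, explained_ratio = pca(X, n_components=2)

for label, (pc1, pc2) in zip(labels, projected):
    print(f"{label:16s} PC1={pc1:+.3f} PC2={pc2:+.3f}")
print("Share of variance kept by each component:", explained_ratio)
```

Plotting the two columns of `projected`, coloured by `labels`, gives the kind of two-dimensional view in which the species can be compared visually. A library routine such as scikit-learn's `PCA` class could be used instead of the hand-written helper.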
DOI: https://doi.org/10.23956/ijarcsse.v9i5.1007