Full Description
This textbook is intended for students of mathematics who have completed the foundational courses of their undergraduate studies and now want to specialize in Data Science and Machine Learning. It introduces the reader to the most important topics in these areas, focusing on rigorous proofs and a systematic understanding of the underlying ideas.
The textbook comes with 121 classroom-tested exercises. Topics covered include k-nearest neighbors, linear and logistic regression, clustering, best-fit subspaces, principal component analysis, dimensionality reduction, collaborative filtering, perceptron, support vector machines, the kernel method, gradient descent and neural networks.
Contents
Preface.- 1 What is Data (Science)?.- 2 Affine Linear, Polynomial and Logistic Regression.- 3 k-nearest Neighbors.- 4 Clustering.- 5 Graph Clustering.- 6 Best-Fit Subspaces.- 7 Singular Value Decomposition.- 8 Curse and Blessing of High Dimensionality.- 9 Concentration of Measure.- 10 Gaussian Random Vectors in High Dimensions.- 11 Dimensionality Reduction à la Johnson-Lindenstrauss.- 12 Separation and Fitting of High-Dimensional Gaussians.- 13 Perceptron.- 14 Support Vector Machines.- 15 Kernel Method.- 16 Neural Networks.- 17 Gradient Descent for Convex Functions.- Appendix: Selected Results of Probability Theory.- Bibliography.- Index.