Description
Data Compression for Data Mining Algorithms tackles important problems in the design of more efficient data mining algorithms by way of data compression techniques, and provides the first systematic and comprehensive description of the relationships between data compression mechanisms and the computations involved in data mining algorithms.

Data mining algorithms are powerful analytical techniques used across many disciplines, including business, engineering, and science. In the big data era, however, they carry high computational costs: association rule mining and classification often require multiple scans of the database, while clustering and outlier detection typically rely on Euclidean distance as the similarity measure.

Data Compression for Data Mining Algorithms addresses these challenges by focusing on the scalability of data mining algorithms. It leverages data compression techniques to reduce dataset sizes and applies principles of information theory to minimize the computations involved in tasks such as feature selection and similarity computation. The book covers the latest developments in both lossless and lossy data compression and presents data compression methods for data mining algorithm design from multiple points of view.

Key topics include Huffman coding, scalar and vector quantization, transforms, subbands, wavelet-based compression for scalable algorithms, and the role of neural networks, particularly deep learning, in feature selection and dimensionality reduction. The contents balance theoretical analysis with real-world applications, and the chapters are organized to give a solid overview of data compression techniques for data mining.
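Huffman coding, listed among the key topics, assigns shorter codewords to more frequent symbols (for example, frequent items in a transaction database), so the encoded data takes fewer bits. The following is a minimal Python sketch of the standard Huffman construction, not code from the book; the function names `huffman_code` and `encode` are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Build a prefix-free Huffman code table {symbol: bitstring} from an iterable of symbols."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        (only,) = freq
        return {only: "0"}
    tiebreak = count()  # unique counter so heapq never has to compare the dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Repeatedly merge the two least frequent subtrees.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def encode(symbols, table):
    """Concatenate the codewords for a sequence of symbols."""
    return "".join(table[s] for s in symbols)


table = huffman_code("abracadabra")
print(table)
print(len(encode("abracadabra", table)))
```

For "abracadabra" (11 symbols, 5 distinct), the encoded output is 23 bits, compared with 33 bits for a fixed-length 3-bit code.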
To give the reader a more complete understanding of the material, projects and problems solved with Python are interspersed throughout the text.
- Covers popular data compression methods and shows how they aid the development and application of data mining algorithms
- Includes projects and problems solved with Python to help readers write programs for both data compression and data mining problems
- Focuses on the scalability of data mining algorithms, leveraging data compression techniques to reduce dataset sizes and applying information theory principles to minimize computations
- Simplifies the field of data compression by covering the topics that are most useful from a data mining perspective
Table of Contents
Part I: Foundation
1. Overview and Contributions
2. Introduction to Data Mining Algorithms
3. Introduction to Data Compression Methods
Part II: Association Rule Mining
4. Huffman Coding for Association Rule Mining
5. Arithmetic Coding for Maximal Frequent Itemsets Mining
Part III: Classification
6. Feature Subset Selection for Decision Tree Construction
7. Neural Networks for Decision Tree Construction
8. Principal Component Analysis for Decision Tree Construction
9. Dictionary Techniques for Support Vector Machine
10. Quantization for Support Vector Machine
Part IV: Clustering and Outlier Detection
11. A Sparse Data Representation for Clustering
12. Dictionary Coding Based Compression for Clustering
13. Nearest Neighbor Based Compression for Outlier Detection
14. Huffman Coding for Outlier Detection
15. Arithmetic Coding for Outlier Detection



