Description
A hands-on and intuitive guide to the foundations of modern deep learning
In Deep Learning: Principles and Implementations, distinguished researcher and professor Weidong “Will” Kuang delivers an up-to-date exploration of how major deep learning algorithms and architectures are formalized and developed from mathematical equations. The book bridges theory and practice and covers a wide range of fundamental topics, including linear regression, logistic regression, basic neural networks, and convolutional neural networks, as well as other basic and advanced subjects in the field.
The author provides intuitive introductions to each subject and presents the development of algorithms and architectures from basic mathematical concepts. Along the way, he relies on straightforward math to keep the topics accessible for non-mathematicians and accompanies his explanations with tested Python sample code you can apply in your own work.
You’ll also find:
- Thorough introductions to both linear and logistic regression, offering a solid foundation and insight into neural networks
- Comprehensive explorations of neural networks, computer vision, natural language processing, generative models, and reinforcement learning
- Practical exercises that students and practitioners can use to apply and develop the concepts found in the book
- Balanced treatments of the mathematics, algorithms, architecture, and code that serve as the foundations of a complete understanding of deep learning
Perfect for undergraduate and graduate students with an interest in deep learning, Deep Learning: Principles and Implementations will also benefit practicing software engineers, faculty, and researchers whose work involves deep learning and related topics.
Table of Contents
Preface xv
Mathematical Notation xxi
1 Introduction to Deep Learning 1
1.1 Introduction 1
1.2 Types of Machine Learning 2
1.2.1 Supervised Learning 3
1.2.2 Unsupervised Learning 5
1.2.3 Reinforcement Learning 6
1.3 Data Representation in Machine Learning 6
1.3.1 Tensor 6
1.3.2 Datasets: Training, Validation, and Testing 7
1.3.3 Resources of Datasets 8
1.4 An Overview of Deep Learning 8
1.4.1 Perceptron 9
1.4.2 Multilayer Neural Networks and Backpropagation 10
1.4.3 Convolutional Neural Networks (CNNs) 11
1.4.4 Recurrent Neural Networks (RNNs) 12
1.4.5 Reinforcement Learning 14
1.5 Resources for Deep Learning 14
1.5.1 Frameworks 15
1.5.2 Resources for Studying Deep Learning 15
Exercises 17
References 18
2 Linear Regression 19
2.1 Linear Regression with Single Feature 19
2.1.1 Linear Regression Model 19
2.1.2 Loss Function 20
2.1.3 Analytic Solution 20
2.1.4 Gradient Descent Algorithm 22
2.2 Linear Regression with Multiple Features 25
2.3 Linear Models for Regression 28
2.3.1 Polynomial Curve Fitting 28
2.3.2 Linear Models with Basis Functions 29
2.4 Linear Regression – a Probabilistic Perspective View 31
2.4.1 Equivalence of Least Square Error and Maximum Likelihood Estimation 32
2.4.2 Loss Analysis: Bias and Variance 33
2.5 An Example: House Price Prediction 35
2.5.1 Practical Issues: Feature Scaling and Learning Rate 35
2.5.2 Linear Regression for House Price Prediction in Python 37
2.6 Summary and Further Reading 41
Exercises 42
References 44
3 Classification and Logistic Regression 45
3.1 Logistic Regression 45
3.1.1 Classification 45
3.1.2 Logistic Regression Model 46
3.1.3 Learn the Model: Find Optimal θ Based on a Dataset 49
3.2 Performance Metrics for Classification 52
3.2.1 Metrics for Two-Class Classification 52
3.2.2 Metrics for Multi-Class Classification 54
3.2.3 Receiver Operating Characteristic (ROC) Curve 55
3.3 Implementation of Logistic Regression in Python 56
3.4 Summary 61
Exercises 62
4 Basics of Neural Networks 67
4.1 A Simplest Neural Network: A Logistic Regression Unit 67
4.2 From Regression to Neural Networks 69
4.3 Neural Network Representation: Feedforward Propagation 72
4.4 Activation Functions 73
4.5 Network Training: Backward Propagation 76
4.6 Multi-class Classification: Softmax and Cross-Entropy Loss 79
4.6.1 Softmax Activation in Neural Network 79
4.6.2 Cross-Entropy Loss and Backpropagation 80
4.7 Practice in Python 82
4.7.1 A Simple Two-layer Neural Network for Binary Classification 82
4.7.2 Multi-class Classification on MNIST Dataset 91
4.8 Summary and Further Reading 100
Exercises 101
Reference 105
5 Practical Considerations in Neural Networks 107
5.1 Multiple-Layer Neural Networks 108
5.1.1 Architecture 108
5.1.2 Forward Propagation and Backward Propagation 109
5.2 Generalization and Model Selection 111
5.2.1 Generalization, Underfitting, and Overfitting 111
5.2.2 Training Set, Validation Set, and Test Set 112
5.2.3 Model Selection and K-Fold Cross-Validation 113
5.3 Regularization 115
5.3.1 Regularization for Linear Regression 115
5.3.2 Regularization for Logistic Regression 117
5.3.3 Regularization for Neural Network 117
5.3.4 Dropout for Regularization 118
5.4 Weight Initialization 119
5.4.1 Xavier Initialization 120
5.4.2 He Initialization 121
5.5 Mini-batch Gradient Descent 122
5.5.1 Three Types of Gradient Descent 122
5.5.2 Implementation of Mini-batch Gradient Descent 123
5.5.3 Selection of Mini-batch Size 124
5.6 Normalization 124
5.6.1 Input Feature Normalization 124
5.6.2 Batch Normalization 125
5.7 Adam Optimization 129
5.7.1 Gradient Descent with Momentum 129
5.7.2 Adam Optimization Algorithm 130
5.7.3 Learning Rate Decay 131
5.8 Gradient Checking 132
5.9 Examples in Python 133
5.9.1 A 3-Layer Network with Regularization and Dropout 133
5.9.2 A 3-Layer Network for Multi-classification with Mini-batch Training and Different Optimization Options 147
5.10 Summary and Further Reading 166
Exercises 166
References 169
6 Introduction to PyTorch 171
6.1 Why PyTorch? 171
6.2 Tensors 172
6.2.1 Tensor: Multidimensional Array 172
6.2.2 Indexing and Operations on Tensors 173
6.3 Data Representation Using Tensors 184
6.3.1 Images 184
6.3.2 Excel CSV Files 186
6.3.3 Converting Categorical Label to One-hot Label 189
6.4 Linear Regression Using PyTorch 189
6.4.1 Dataset 189
6.4.2 Linear Regression Without Using Autograd 190
6.4.3 Linear Regression Using Autograd 192
6.4.4 Linear Regression Using Autograd and Optim 195
6.5 Neural Networks Using PyTorch 198
6.5.1 Download Dataset and Transforms 198
6.5.2 Create Customized Datasets from CIFAR-10 199
6.5.3 Neural Network Model 200
6.5.4 Train the Model Using DataLoader 201
6.5.5 Access Parameters of the Trained Model 202
6.6 Summary and Further Reading 203
Exercises 203
References 204
7 Convolutional Neural Networks 205
7.1 Architecture of Convolutional Neural Networks 205
7.1.1 Motivation of Convolutional Neural Networks 205
7.1.2 Architecture of Convolutional Neural Networks 207
7.2 Convolution Layer 207
7.2.1 Convolution Operation 207
7.2.2 Stride and Zero-Padding 208
7.2.3 Convolution Implementation by Matrix Multiplication 210
7.3 Pooling Layer and Fully Connected Layer 212
7.3.1 Pooling Layer (POOL) 212
7.3.2 Fully Connected Layer (FC) 212
7.3.3 CNN Example: LeNet-5 213
7.4 Backpropagation in CNNs (Optional) 214
7.4.1 Backpropagation in CONV Layers 214
7.4.2 Backpropagation in Pooling Layers 219
7.5 Batch Normalization for CNNs 222
7.6 Implement CNNs in PyTorch 223
7.6.1 Dataset 224
7.6.2 Modules in torch.nn and Functions in torch.nn.functional 225
7.6.3 Training CNNs 227
7.6.4 Testing the Trained Model 228
7.6.5 Save and Load the Trained Model 229
7.6.6 CIFAR-10 Image Classifier 229
7.7 Summary and Further Reading 234
Exercises 235
References 238
8 Classic Architectures of CNNs 239
8.1 Datasets 239
8.1.1 MNIST 240
8.1.2 Fashion-MNIST 241
8.1.3 CIFAR-10 241
8.1.4 CIFAR-100 242
8.1.5 ImageNet (https://image-net.org/index.php) 243
8.1.6 COCO Dataset (https://cocodataset.org/#home) 243
8.1.7 Cityscapes (https://www.cityscapes-dataset.com/) 243
8.2 AlexNet 243
8.3 VGG: Networks Using Blocks 246
8.4 GoogLeNet 249
8.4.1 Inception Blocks 249
8.4.2 GoogLeNet Architecture 250
8.5 ResNet 250
8.5.1 Residual Block 252
8.5.2 ResNet Architectures 253
8.6 Pretrained Models 253
8.6.1 Load Pretrained Model Using Torchvision 255
8.6.2 Image Classification Using Pretrained AlexNet 258
8.6.3 Fine-Tune Pretrained Model 260
8.7 Summary and Further Reading 265
Exercises 266
References 267
9 Object Detection – YOLO 269
9.1 Introduction 269
9.2 YOLO (v1) 270
9.2.1 Architecture of YOLO v1 271
9.2.2 Training and Loss Function 271
9.2.3 Inference and Non-maximal Suppression (NMS) 275
9.3 YOLO (v2) 276
9.3.1 Architecture of YOLO v2 276
9.3.2 Anchor Boxes 278
9.3.3 Predictions From YOLO v2 279
9.4 YOLO (v3) 280
9.4.1 Architecture of YOLO v3 281
9.4.2 Loss Function of YOLO v3 284
9.5 Implementation of YOLO v3 Using Pre-trained Model 289
9.5.1 Model Architecture Specified by a Configuration File: yolov3.cfg 289
9.5.2 Create the Model and Load the Weights 290
9.5.3 Non-max Suppression 301
9.5.4 Put It All Together 307
9.6 A Metric for Object Detection: mAP 310
9.6.1 Precision and Recall in Object Detection 310
9.6.2 Mean Average Precision 311
9.7 Summary and Further Reading 314
Exercises 315
References 316
10 Introduction to Probabilistic Generative Models 319
10.1 Generative Models with Latent Variables 320
10.1.1 Graph Representation 320
10.1.2 Gaussian Mixture Models 321
10.2 EM Algorithm 323
10.2.1 EM Algorithm for GMMs 323
10.2.2 EM Algorithm for Latent Variable Models in General 327
10.3 Variational Auto-encoder (VAE) 333
10.3.1 Variational Lower Bound 333
10.3.2 Gradients of Variational Lower Bound 335
10.3.3 Variational Auto-encoder 337
10.4 VAE on MNIST Dataset in PyTorch 340
10.4.1 Architecture of VAE 340
10.4.2 Implementation in PyTorch 341
10.4.3 Conditional VAE 346
10.5 Summary and Further Reading 348
Exercises 348
References 350
11 Generative Adversarial Networks 351
11.1 Mathematical Description of the Original GAN 351
11.1.1 Principle and Algorithm 351
11.1.2 Convergence of GANs 353
11.2 Implementation of GANs 354
11.2.1 Alternating Two Training Processes 355
11.2.2 Transposed Convolutional Neural Networks 357
11.2.3 An Example of GAN 361
11.3 Practical Issues with the Original GAN 362
11.4 Conditional GAN 362
11.4.1 Principle and Loss Function for Conditional GANs 362
11.4.2 Implementation 363
11.5 InfoGAN 364
11.5.1 Principle and Loss Function 364
11.5.2 Implementation 366
11.6 Wasserstein GAN 367
11.6.1 Wasserstein Distance and WGAN Principle 367
11.6.2 WGAN with Weight Clipping 369
11.6.3 WGAN with Gradient Penalty 371
11.7 CycleGAN 372
11.7.1 Principle and Loss Function 372
11.7.2 Implementation 374
11.8 f-GANs 375
11.8.1 f-divergences 375
11.8.2 Variational Divergence Minimization 375
11.8.3 Basic GAN: A Special Case of f-divergence Model 376
11.8.4 Algorithm of f-GANs 378
11.9 Example: Deep Convolutional GAN on MNIST Dataset 378
11.9.1 Basic DCGAN 378
11.9.2 Conditional DCGAN for MNIST Dataset 385
11.10 Summary and Further Reading 394
Exercises 395
References 396
12 Diffusion Models 399
12.1 Revisit Variational Auto-Encoder 399
12.1.1 Evidence Lower Bound 399
12.1.2 Variational Auto-Encoder 400
12.2 Denoising Diffusion Probabilistic Models (DDPMs) 401
12.2.1 Diffusion Process and Denoising Process 401
12.2.2 ELBO of Diffusion Models 403
12.2.3 Training of the Denoising Diffusion Models 405
12.3 Score-Based Generative Modeling 409
12.3.1 Score Matching 410
12.3.2 Sampling by Langevin Dynamics 411
12.3.3 Noise Conditional Score Network with Multiple Noise Perturbations 412
12.3.4 Connection Between Denoising Diffusion Probabilistic Models and Score-Based Models 414
12.3.5 Continuous-Time Diffusion Modeling via Stochastic Differential Equation 415
12.4 Denoising Diffusion Implicit Models for Acceleration 417
12.4.1 Non-Markovian Forward Processes 417
12.4.2 Generative Process and DDIMs 419
12.4.3 Accelerated Generation Process 420
12.5 Guidance 421
12.5.1 Classifier Guidance 422
12.5.2 Classifier-Free Guidance 423
12.6 Implementation of a Simple Diffusion Model on MNIST Dataset 424
12.6.1 Architecture of Diffusion Model 424
12.6.2 Implementation in PyTorch 426
12.7 Summary and Further Reading 436
Exercises 436
References 437
13 Word Embedding 439
13.1 Introduction to Natural Language Processing 439
13.1.1 Pipeline of Natural Language Processing 440
13.1.2 Text Preprocess 440
13.1.3 Word Embedding 441
13.2 Word2vec 442
13.2.1 Continuous Bag-of-Word Model (CBOW) 442
13.2.2 Skip-Gram Model 448
13.3 Hierarchical Softmax in Word2vec 452
13.3.1 Problem with Softmax in CBOW and Skip-Gram Models 452
13.3.2 Hierarchical Softmax Using Binary Tree 453
13.3.3 Huffman Tree 456
13.3.4 Algorithms for CBOW and Skip-Gram Models with Hierarchical Softmax 457
13.4 Negative Sampling in Word2vec 459
13.4.1 Negative Sampling 459
13.4.2 Algorithms of Word2vec with Negative Sampling 461
13.5 GloVe 463
13.6 Implementation of a Skip-Gram Model by PyTorch 464
13.6.1 Dataset and Pre-process 464
13.6.2 Generate Training Batches 468
13.6.3 Skip-Gram Model 469
13.6.4 Training and Validation 470
13.7 Summary and Further Reading 472
Exercises 473
References 474
14 Recurrent Neural Networks 475
14.1 Introduction to Sequence Models 475
14.2 Basic RNNs 476
14.2.1 Vanilla RNN Architecture 477
14.2.2 Backpropagation Through Time (BPTT) 478
14.2.3 Gradient Exploding and Vanishing Problem 481
14.3 Long Short-Term Memory 482
14.3.1 Gated Recurrent Unit (GRU) 482
14.3.2 Long Short-Term Memory Unit 483
14.3.3 Variants of LSTM 485
14.4 Practical RNN Architectures 486
14.4.1 Task-Specific RNN Architectures 486
14.4.2 Deep RNN 487
14.4.3 Bidirectional RNN 488
14.4.4 Deep Bidirectional RNN 489
14.5 Sequence-to-Sequence Learning: An Application of RNNs 490
14.5.1 Language Modeling by RNNs 490
14.5.2 Encoder-Decoder RNN Architecture for Sequence-to-Sequence Learning 491
14.5.3 Beam Search 493
14.6 Attention Mechanism in Encoder-Decoder Architectures 494
14.7 BLEU: A Metric of Machine Translation 496
14.7.1 Clipped n-gram Precision 497
14.7.2 BLEU Definition 498
14.8 Implementations of RNNs Using PyTorch 499
14.8.1 Dataset 499
14.8.2 RNN Model 500
14.8.3 Training and Testing 502
14.9 Summary and Further Reading 504
Exercises 505
References 506
15 Transformer 509
15.1 Bahdanau Attention Mechanism 510
15.1.1 RNN Encoder-Decoder Revisit 510
15.1.2 Bahdanau Attention Mechanism 511
15.2 Attention Mechanism 512
15.2.1 General Attention Mechanism 512
15.2.2 Scaled Dot-Product Attention 512
15.3 Transformer Architecture 514
15.3.1 Input to the Encoder 514
15.3.2 Encoder 515
15.3.3 Decoder 517
15.3.4 Final Linear and Softmax Layer 519
15.4 BERT 520
15.4.1 BERT Architecture 520
15.4.2 Pre-training 521
15.4.3 Fine-tuning 523
15.5 Generative Pre-trained Transformer (GPT) 526
15.5.1 Architecture of GPT 526
15.5.2 Pre-training of GPT 527
15.5.3 Fine-tuning of GPT 527
15.6 Implementation of a Transformer in PyTorch 529
15.6.1 Overall Architecture 529
15.6.2 Building Blocks 530
15.6.3 Put It All Together 540
15.7 Summary and Further Reading 543
Exercises 543
References 544
16 Introduction to Reinforcement Learning 547
16.1 Definition of Markov Decision Process 547
16.1.1 Settings of Reinforcement Learning 547
16.1.2 Markov Decision Process 548
16.2 Policy, Value Function, and Bellman Equation 550
16.2.1 Policy and Value Function 551
16.2.2 Optimal Policies and Bellman Optimality Equations 555
16.3 Dynamic Programming for MDPs 557
16.3.1 Solve Bellman Equation by Dynamic Programming 557
16.3.2 Solve Bellman Optimality Equation by Dynamic Programming 559
16.4 Monte Carlo Learning 560
16.4.1 State Value Function Evaluation by Monte Carlo 560
16.4.2 Action-Value Function by Monte Carlo 562
16.5 Temporal Difference Learning 564
16.5.1 TD(0) Learning 564
16.5.2 TD(λ) Learning 565
16.5.3 SARSA: On-policy TD Control 566
16.5.4 Q-learning: Off-policy TD Control 567
16.6 Implementation of Q-Learning for a Mountain Car Task 568
16.6.1 Gym Environment 569
16.6.2 Q-learning 570
16.6.3 Performance Plot 574
16.7 Summary and Further Reading 575
Exercises 575
References 578
17 Deep Q-Learning 579
17.1 Value Function Approximation 579
17.1.1 State Value Function Approximation for Policy Evaluation 580
17.1.2 State-Action Value Function Approximation for Policy Control 581
17.2 Basic Deep Q-Network 582
17.2.1 Experience Replay 583
17.2.2 Deep Q-Network 583
17.3 Double Deep Q-Network 585
17.3.1 Double Q-learning 585
17.3.2 Double Deep Q-Network (Double DQN) 586
17.4 Implementation of DQN for Mountain Car-v0 586
17.4.1 Q-Network for Mountain Car 586
17.4.2 Python Programming for Mountain Car 587
17.4.3 Another Example: CartPole-v0 595
17.5 Summary and Further Reading 595
Exercises 596
References 598
18 Policy Gradient Methods 601
18.1 Introduction to Policy-Based Methods 601
18.2 Policy Gradient Theorem 602
18.3 REINFORCE Algorithm 605
18.3.1 Vanilla REINFORCE 606
18.3.2 Neural Networks for REINFORCE 607
18.3.3 REINFORCE with Baseline 608
18.4 Actor-Critic Methods 609
18.4.1 Advantage Actor-Critic Algorithm (TD-Error Actor-Critic) 610
18.4.2 Asynchronous Advantage Actor-Critic (A3C) Algorithm 611
18.5 Policy Optimization Methods 613
18.5.1 Trust Region Policy Optimization (TRPO) 613
18.5.2 Proximal Policy Optimization (PPO) 619
18.6 Deep Deterministic Policy Gradient (DDPG) 623
18.6.1 Q-network in DDPG 623
18.6.2 Policy Network in DDPG 623
18.6.3 DDPG Algorithm 624
18.6.4 Twin Delayed DDPG Algorithm 625
18.7 Soft Actor-Critic Algorithm 627
18.7.1 Entropy-Regularized Reinforcement Learning 627
18.7.2 Soft Actor-Critic Algorithm 628
18.8 On-Policy and Off-Policy 631
18.8.1 Revisit Q-Learning and SARSA 631
18.8.2 On-Policy and off-Policy in Policy Gradient Algorithms 632
18.9 Implementations of Policy Gradient Algorithms in Python 633
18.9.1 On-policy: PPO-clip Algorithm for Categorical Action Space: CartPole-v0 633
18.9.2 On-policy: PPO-clip Algorithm for Continuous Action Space: Pendulum-v0 639
18.9.3 Off-policy: DDPG for Continuous Action Space: Pendulum-v0 645
18.9.4 Differences Between On-Policy Training and Off-Policy Training 651
18.10 Summary and Further Reading 652
Exercises 653
References 656
Appendix A Mathematics in Machine Learning 657
Index 717