Deep Learning: Principles and Implementations

Print edition price: ¥21,907

  • Author: Kuang, Weidong
  • Price: ¥12,804 (¥11,640 excl. tax)
  • Publisher: Wiley (publication date: 2026/04/23)
  • Language: English
  • ISBN: 9781394256006
  • eISBN: 9781394256013


Description

A hands-on and intuitive guide to the foundations of modern deep learning

In Deep Learning: Principles and Implementations, distinguished researcher and professor Weidong “Will” Kuang delivers an up-to-date exploration of how major deep learning algorithms and architectures are formalized and developed from mathematical equations. The book bridges theory and practice, covering a wide range of fundamental topics, including linear regression, logistic regression, basic neural networks, and convolutional neural networks, as well as other basic and advanced subjects in the field.

The author provides intuitive introductions to each subject and presents the development of algorithms and architectures from basic mathematical concepts. Along the way, he relies on straightforward math to keep the topics accessible to non-mathematicians and accompanies his explanations with tested Python sample code you can apply in your own work.

You’ll also find:

  • Thorough introductions to both linear and logistic regression, offering a solid foundation and insight into neural networks
  • Comprehensive explorations of neural networks, computer vision, natural language processing, generative models, and reinforcement learning
  • Practical exercises that students and practitioners can use to apply and develop the concepts found in the book
  • Balanced treatments of the mathematics, algorithms, architecture, and code that serve as the foundations of a complete understanding of deep learning

Perfect for undergraduate and graduate students with an interest in deep learning, Deep Learning: Principles and Implementations will also benefit practicing software engineers, faculty, and researchers whose work involves deep learning and related topics.

Table of Contents

Preface xv

Mathematical Notation xxi

1 Introduction to Deep Learning 1

1.1 Introduction 1

1.2 Types of Machine Learning 2

1.2.1 Supervised Learning 3

1.2.2 Unsupervised Learning 5

1.2.3 Reinforcement Learning 6

1.3 Data Representation in Machine Learning 6

1.3.1 Tensor 6

1.3.2 Datasets: Training, Validation, and Testing 7

1.3.3 Resources of Datasets 8

1.4 An Overview of Deep Learning 8

1.4.1 Perceptron 9

1.4.2 Multilayer Neural Networks and Backpropagation 10

1.4.3 Convolutional Neural Networks (CNNs) 11

1.4.4 Recurrent Neural Networks (RNNs) 12

1.4.5 Reinforcement Learning 14

1.5 Resources for Deep Learning 14

1.5.1 Frameworks 15

1.5.2 Resources for Studying Deep Learning 15

Exercises 17

References 18

2 Linear Regression 19

2.1 Linear Regression with Single Feature 19

2.1.1 Linear Regression Model 19

2.1.2 Loss Function 20

2.1.3 Analytic Solution 20

2.1.4 Gradient Descent Algorithm 22

2.2 Linear Regression with Multiple Features 25

2.3 Linear Models for Regression 28

2.3.1 Polynomial Curve Fitting 28

2.3.2 Linear Models with Basis Functions 29

2.4 Linear Regression – A Probabilistic Perspective 31

2.4.1 Equivalence of Least-Squares Error and Maximum Likelihood Estimation 32

2.4.2 Loss Analysis: Bias and Variance 33

2.5 An Example: House Price Prediction 35

2.5.1 Practical Issues: Feature Scaling and Learning Rate 35

2.5.2 Linear Regression for House Price Prediction in Python 37

2.6 Summary and Further Reading 41

Exercises 42

References 44

3 Classification and Logistic Regression 45

3.1 Logistic Regression 45

3.1.1 Classification 45

3.1.2 Logistic Regression Model 46

3.1.3 Learn the Model: Find Optimal θ Based on a Dataset 49

3.2 Performance Metrics for Classification 52

3.2.1 Metrics for Two-Class Classification 52

3.2.2 Metrics for Multi-Class Classification 54

3.2.3 Receiver Operating Characteristic (ROC) Curve 55

3.3 Implementation of Logistic Regression in Python 56

3.4 Summary 61

Exercises 62

4 Basics of Neural Networks 67

4.1 The Simplest Neural Network: A Logistic Regression Unit 67

4.2 From Regression to Neural Networks 69

4.3 Neural Network Representation: Feedforward Propagation 72

4.4 Activation Functions 73

4.5 Network Training: Backward Propagation 76

4.6 Multi-class Classification: Softmax and Cross-Entropy Loss 79

4.6.1 Softmax Activation in Neural Network 79

4.6.2 Cross-Entropy Loss and Backpropagation 80

4.7 Practice in Python 82

4.7.1 A Simple Two-layer Neural Network for Binary Classification 82

4.7.2 Multi-class Classification on MNIST Dataset 91

4.8 Summary and Further Reading 100

Exercises 101

Reference 105

5 Practical Considerations in Neural Networks 107

5.1 Multiple-Layer Neural Networks 108

5.1.1 Architecture 108

5.1.2 Forward Propagation and Backward Propagation 109

5.2 Generalization and Model Selection 111

5.2.1 Generalization, Underfitting, and Overfitting 111

5.2.2 Training Set, Validation Set, and Test Set 112

5.2.3 Model Selection and K-Fold Cross-Validation 113

5.3 Regularization 115

5.3.1 Regularization for Linear Regression 115

5.3.2 Regularization for Logistic Regression 117

5.3.3 Regularization for Neural Network 117

5.3.4 Dropout for Regularization 118

5.4 Weight Initialization 119

5.4.1 Xavier Initialization 120

5.4.2 He Initialization 121

5.5 Mini-batch Gradient Descent 122

5.5.1 Three Types of Gradient Descent 122

5.5.2 Implementation of Mini-batch Gradient Descent 123

5.5.3 Selection of Mini-batch Size 124

5.6 Normalization 124

5.6.1 Input Feature Normalization 124

5.6.2 Batch Normalization 125

5.7 Adam Optimization 129

5.7.1 Gradient Descent with Momentum 129

5.7.2 Adam Optimization Algorithm 130

5.7.3 Learning Rate Decay 131

5.8 Gradient Checking 132

5.9 Examples in Python 133

5.9.1 A 3-Layer Network with Regularization and Dropout 133

5.9.2 A 3-Layer Network for Multi-classification with Mini-batch Training and Different Optimization Options 147

5.10 Summary and Further Reading 166

Exercises 166

References 169

6 Introduction to PyTorch 171

6.1 Why PyTorch? 171

6.2 Tensors 172

6.2.1 Tensor: Multidimensional Array 172

6.2.2 Indexing and Operations on Tensors 173

6.3 Data Representation Using Tensors 184

6.3.1 Images 184

6.3.2 Excel CSV Files 186

6.3.3 Converting Categorical Label to One-hot Label 189

6.4 Linear Regression Using PyTorch 189

6.4.1 Dataset 189

6.4.2 Linear Regression Without Using Autograd 190

6.4.3 Linear Regression Using Autograd 192

6.4.4 Linear Regression Using Autograd and Optim 195

6.5 Neural Networks Using PyTorch 198

6.5.1 Download Dataset and Transforms 198

6.5.2 Create Customized Datasets from CIFAR-10 199

6.5.3 Neural Network Model 200

6.5.4 Train the Model Using DataLoader 201

6.5.5 Access Parameters of the Trained Model 202

6.6 Summary and Further Reading 203

Exercises 203

References 204

7 Convolutional Neural Networks 205

7.1 Architecture of Convolutional Neural Networks 205

7.1.1 Motivation of Convolutional Neural Networks 205

7.1.2 Architecture of Convolutional Neural Networks 207

7.2 Convolution Layer 207

7.2.1 Convolution Operation 207

7.2.2 Stride and Zero-Padding 208

7.2.3 Convolution Implementation by Matrix Multiplication 210

7.3 Pooling Layer and Fully Connected Layer 212

7.3.1 Pooling Layer (POOL) 212

7.3.2 Fully Connected Layer (FC) 212

7.3.3 CNN Example: LeNet-5 213

7.4 Backpropagation in CNNs (Optional) 214

7.4.1 Backpropagation in CONV Layers 214

7.4.2 Backpropagation in Pooling Layers 219

7.5 Batch Normalization for CNNs 222

7.6 Implement CNNs in PyTorch 223

7.6.1 Dataset 224

7.6.2 Modules in torch.nn and Functions in torch.nn.functional 225

7.6.3 Training CNNs 227

7.6.4 Testing the Trained Model 228

7.6.5 Save and Load the Trained Model 229

7.6.6 CIFAR-10 Image Classifier 229

7.7 Summary and Further Reading 234

Exercises 235

References 238

8 Classic Architectures of CNNs 239

8.1 Datasets 239

8.1.1 MNIST 240

8.1.2 Fashion-MNIST 241

8.1.3 CIFAR-10 241

8.1.4 CIFAR-100 242

8.1.5 ImageNet (https://image-net.org/index.php) 243

8.1.6 COCO Dataset (https://cocodataset.org/#home) 243

8.1.7 Cityscapes (https://www.cityscapes-dataset.com/) 243

8.2 AlexNet 243

8.3 VGG: Networks Using Blocks 246

8.4 GoogLeNet 249

8.4.1 Inception Blocks 249

8.4.2 GoogLeNet Architecture 250

8.5 ResNet 250

8.5.1 Residual Block 252

8.5.2 ResNet Architectures 253

8.6 Pretrained Models 253

8.6.1 Load Pretrained Model Using Torchvision 255

8.6.2 Image Classification Using Pretrained AlexNet 258

8.6.3 Fine-Tune Pretrained Model 260

8.7 Summary and Further Reading 265

Exercises 266

References 267

9 Object Detection – YOLO 269

9.1 Introduction 269

9.2 YOLO (v1) 270

9.2.1 Architecture of YOLO v1 271

9.2.2 Training and Loss Function 271

9.2.3 Inference and Non-maximal Suppression (NMS) 275

9.3 YOLO (v2) 276

9.3.1 Architecture of YOLO v2 276

9.3.2 Anchor Boxes 278

9.3.3 Predictions From YOLO v2 279

9.4 YOLO (v3) 280

9.4.1 Architecture of YOLO v3 281

9.4.2 Loss Function of YOLO v3 284

9.5 Implementation of YOLO v3 Using Pre-trained Model 289

9.5.1 Model Architecture Specified by a Configuration File: yolov3.cfg 289

9.5.2 Create the Model and Load the Weights 290

9.5.3 Non-max Suppression 301

9.5.4 Put It All Together 307

9.6 A Metric for Object Detection: mAP 310

9.6.1 Precision and Recall in Object Detection 310

9.6.2 Mean Average Precision 311

9.7 Summary and Further Reading 314

Exercises 315

References 316

10 Introduction to Probabilistic Generative Models 319

10.1 Generative Models with Latent Variables 320

10.1.1 Graph Representation 320

10.1.2 Gaussian Mixture Models 321

10.2 EM Algorithm 323

10.2.1 EM Algorithm for GMMs 323

10.2.2 EM Algorithm for Latent Variable Models in General 327

10.3 Variational Auto-encoder (VAE) 333

10.3.1 Variational Lower Bound 333

10.3.2 Gradients of Variational Lower Bound 335

10.3.3 Variational Auto-encoder 337

10.4 VAE on MNIST Dataset in PyTorch 340

10.4.1 Architecture of VAE 340

10.4.2 Implementation in PyTorch 341

10.4.3 Conditional VAE 346

10.5 Summary and Further Reading 348

Exercises 348

References 350

11 Generative Adversarial Networks 351

11.1 Mathematical Description of the Original GAN 351

11.1.1 Principle and Algorithm 351

11.1.2 Convergence of GANs 353

11.2 Implementation of GANs 354

11.2.1 Alternating Two Training Processes 355

11.2.2 Transposed Convolutional Neural Networks 357

11.2.3 An Example of GAN 361

11.3 Practical Issues with the Original GAN 362

11.4 Conditional GAN 362

11.4.1 Principle and Loss Function for Conditional GANs 362

11.4.2 Implementation 363

11.5 InfoGAN 364

11.5.1 Principle and Loss Function 364

11.5.2 Implementation 366

11.6 Wasserstein GAN 367

11.6.1 Wasserstein Distance and WGAN Principle 367

11.6.2 WGAN with Weight Clipping 369

11.6.3 WGAN with Gradient Penalty 371

11.7 CycleGAN 372

11.7.1 Principle and Loss Function 372

11.7.2 Implementation 374

11.8 f-GANs 375

11.8.1 f-divergences 375

11.8.2 Variational Divergence Minimization 375

11.8.3 Basic GAN: A Special Case of f-divergence Model 376

11.8.4 Algorithm of f-GANs 378

11.9 Example: Deep Convolutional GAN on MNIST Dataset 378

11.9.1 Basic DCGAN 378

11.9.2 Conditional DCGAN for MNIST Dataset 385

11.10 Summary and Further Reading 394

Exercises 395

References 396

12 Diffusion Models 399

12.1 Revisit Variational Auto-Encoder 399

12.1.1 Evidence Lower Bound 399

12.1.2 Variational Auto-Encoder 400

12.2 Denoising Diffusion Probabilistic Models (DDPMs) 401

12.2.1 Diffusion Process and Denoising Process 401

12.2.2 ELBO of Diffusion Models 403

12.2.3 Training of the Denoising Diffusion Models 405

12.3 Score-Based Generative Modeling 409

12.3.1 Score Matching 410

12.3.2 Sampling by Langevin Dynamics 411

12.3.3 Noise Conditional Score Network with Multiple Noise Perturbations 412

12.3.4 Connection Between Denoising Diffusion Probabilistic Models and Score-Based Models 414

12.3.5 Continuous-Time Diffusion Modeling via Stochastic Differential Equation 415

12.4 Denoising Diffusion Implicit Models for Acceleration 417

12.4.1 Non-Markovian Forward Processes 417

12.4.2 Generative Process and DDIMs 419

12.4.3 Accelerated Generation Process 420

12.5 Guidance 421

12.5.1 Classifier Guidance 422

12.5.2 Classifier-Free Guidance 423

12.6 Implementation of a Simple Diffusion Model on MNIST Dataset 424

12.6.1 Architecture of Diffusion Model 424

12.6.2 Implementation in PyTorch 426

12.7 Summary and Further Reading 436

Exercises 436

References 437

13 Word Embedding 439

13.1 Introduction to Natural Language Processing 439

13.1.1 Pipeline of Natural Language Processing 440

13.1.2 Text Preprocessing 440

13.1.3 Word Embedding 441

13.2 Word2vec 442

13.2.1 Continuous Bag-of-Words Model (CBOW) 442

13.2.2 Skip-Gram Model 448

13.3 Hierarchical Softmax in Word2vec 452

13.3.1 Problem with Softmax in CBOW and Skip-Gram Models 452

13.3.2 Hierarchical Softmax Using Binary Tree 453

13.3.3 Huffman Tree 456

13.3.4 Algorithms for CBOW and Skip-Gram Models with Hierarchical Softmax 457

13.4 Negative Sampling in Word2vec 459

13.4.1 Negative Sampling 459

13.4.2 Algorithms of Word2vec with Negative Sampling 461

13.5 GloVe 463

13.6 Implementation of a Skip-Gram Model by PyTorch 464

13.6.1 Dataset and Preprocessing 464

13.6.2 Generate Training Batches 468

13.6.3 Skip-Gram Model 469

13.6.4 Training and Validation 470

13.7 Summary and Further Reading 472

Exercises 473

References 474

14 Recurrent Neural Networks 475

14.1 Introduction to Sequence Models 475

14.2 Basic RNNs 476

14.2.1 Vanilla RNN Architecture 477

14.2.2 Backpropagation Through Time (BPTT) 478

14.2.3 Gradient Exploding and Vanishing Problem 481

14.3 Long Short-Term Memory 482

14.3.1 Gated Recurrent Unit (GRU) 482

14.3.2 Long Short-Term Memory Unit 483

14.3.3 Variants of LSTM 485

14.4 Practical RNN Architectures 486

14.4.1 Task-Specific RNN Architectures 486

14.4.2 Deep RNN 487

14.4.3 Bidirectional RNN 488

14.4.4 Deep Bidirectional RNN 489

14.5 Sequence-to-Sequence Learning: An Application of RNNs 490

14.5.1 Language Modeling by RNNs 490

14.5.2 Encoder-Decoder RNN Architecture for Sequence-to-Sequence Learning 491

14.5.3 Beam Search 493

14.6 Attention Mechanism in Encoder-Decoder Architectures 494

14.7 BLEU: A Metric of Machine Translation 496

14.7.1 Clipped n-gram Precision 497

14.7.2 BLEU Definition 498

14.8 Implementations of RNNs Using PyTorch 499

14.8.1 Dataset 499

14.8.2 RNN Model 500

14.8.3 Training and Testing 502

14.9 Summary and Further Reading 504

Exercises 505

References 506

15 Transformer 509

15.1 Bahdanau Attention Mechanism 510

15.1.1 RNN Encoder-Decoder Revisit 510

15.1.2 Bahdanau Attention Mechanism 511

15.2 Attention Mechanism 512

15.2.1 General Attention Mechanism 512

15.2.2 Scaled Dot-Product Attention 512

15.3 Transformer Architecture 514

15.3.1 Input to the Encoder 514

15.3.2 Encoder 515

15.3.3 Decoder 517

15.3.4 Final Linear and Softmax Layer 519

15.4 BERT 520

15.4.1 BERT Architecture 520

15.4.2 Pre-training 521

15.4.3 Fine-tuning 523

15.5 Generative Pre-trained Transformer (GPT) 526

15.5.1 Architecture of GPT 526

15.5.2 Pre-training of GPT 527

15.5.3 Fine-tuning of GPT 527

15.6 Implementation of a Transformer in PyTorch 529

15.6.1 Overall Architecture 529

15.6.2 Building Blocks 530

15.6.3 Put It All Together 540

15.7 Summary and Further Reading 543

Exercises 543

References 544

16 Introduction to Reinforcement Learning 547

16.1 Definition of Markov Decision Process 547

16.1.1 Settings of Reinforcement Learning 547

16.1.2 Markov Decision Process 548

16.2 Policy, Value Function, and Bellman Equation 550

16.2.1 Policy and Value Function 551

16.2.2 Optimal Policies and Bellman Optimality Equations 555

16.3 Dynamic Programming for MDPs 557

16.3.1 Solve Bellman Equation by Dynamic Programming 557

16.3.2 Solve Bellman Optimality Equation by Dynamic Programming 559

16.4 Monte Carlo Learning 560

16.4.1 State Value Function Evaluation by Monte Carlo 560

16.4.2 Action-Value Function by Monte Carlo 562

16.5 Temporal Difference Learning 564

16.5.1 TD(0) Learning 564

16.5.2 TD(λ) Learning 565

16.5.3 SARSA: On-policy TD Control 566

16.5.4 Q-learning: Off-policy TD Control 567

16.6 Implementation of Q-Learning for a Mountain Car Task 568

16.6.1 Gym Environment 569

16.6.2 Q-learning 570

16.6.3 Performance Plot 574

16.7 Summary and Further Reading 575

Exercises 575

References 578

17 Deep Q-Learning 579

17.1 Value Function Approximation 579

17.1.1 State Value Function Approximation for Policy Evaluation 580

17.1.2 State-Action Value Function Approximation for Policy Control 581

17.2 Basic Deep Q-Network 582

17.2.1 Experience Replay 583

17.2.2 Deep Q-Network 583

17.3 Double Deep Q-Network 585

17.3.1 Double Q-learning 585

17.3.2 Double Deep Q-Network (Double DQN) 586

17.4 Implementation of DQN for MountainCar-v0 586

17.4.1 Q-Network for Mountain Car 586

17.4.2 Python Programming for Mountain Car 587

17.4.3 Another Example: CartPole-v0 595

17.5 Summary and Further Reading 595

Exercises 596

References 598

18 Policy Gradient Methods 601

18.1 Introduction to Policy-Based Methods 601

18.2 Policy Gradient Theorem 602

18.3 REINFORCE Algorithm 605

18.3.1 Vanilla REINFORCE 606

18.3.2 Neural Networks for REINFORCE 607

18.3.3 REINFORCE with Baseline 608

18.4 Actor-Critic Methods 609

18.4.1 Advantage Actor-Critic Algorithm (TD-Error Actor-Critic) 610

18.4.2 Asynchronous Advantage Actor-Critic (A3C) Algorithm 611

18.5 Policy Optimization Methods 613

18.5.1 Trust Region Policy Optimization (TRPO) 613

18.5.2 Proximal Policy Optimization (PPO) 619

18.6 Deep Deterministic Policy Gradient (DDPG) 623

18.6.1 Q-network in DDPG 623

18.6.2 Policy Network in DDPG 623

18.6.3 DDPG Algorithm 624

18.6.4 Twin Delayed DDPG Algorithm 625

18.7 Soft Actor-Critic Algorithm 627

18.7.1 Entropy-Regularized Reinforcement Learning 627

18.7.2 Soft Actor-Critic Algorithm 628

18.8 On-Policy and Off-Policy 631

18.8.1 Revisit Q-Learning and SARSA 631

18.8.2 On-Policy and Off-Policy in Policy Gradient Algorithms 632

18.9 Implementations of Policy Gradient Algorithms in Python 633

18.9.1 On-policy: PPO-clip Algorithm for Categorical Action Space: CartPole-v0 633

18.9.2 On-policy: PPO-clip Algorithm for Continuous Action Space: Pendulum-v0 639

18.9.3 Off-policy: DDPG for Continuous Action Space: Pendulum-v0 645

18.9.4 Differences Between On-Policy Training and Off-Policy Training 651

18.10 Summary and Further Reading 652

Exercises 653

References 656

Appendix A Mathematics in Machine Learning 657

Index 717
