Full Description
Introduces professionals and scientists to statistics and machine learning using the programming language R
Written by and for practitioners, this book provides an overall introduction to R, focusing on tools and methods commonly used in data science, and placing emphasis on practice and business use. It covers a wide range of topics in a single volume, including big data, databases, statistical machine learning, data wrangling, data visualization, and the reporting of results. The topics covered are all important for someone with a science/math background who is looking to quickly learn several practical technologies to enter or transition to the growing field of data science.
The Big R-Book for Professionals: From Data Science to Learning Machines and Reporting with R includes nine parts, starting with an introduction to the subject and followed by an overview of R and elements of statistics. The third part revolves around data, while the fourth focuses on data wrangling. Part 5 teaches readers about exploring data. In Part 6 readers learn to build models. Part 7 introduces the reader to the reality in companies, Part 8 covers reports and interactive applications, and finally Part 9 introduces the reader to big data and performance computing. The book also includes some helpful appendices.
 
Provides a practical guide for non-experts with a focus on business users
Contains a unique combination of topics including an introduction to R, machine learning, mathematical models, data wrangling, and reporting
Uses a practical tone and integrates multiple topics in a coherent framework
Demystifies the hype around machine learning and AI by enabling readers to understand the provided models and program them in R
Shows readers how to visualize results in static and interactive reports
Supplementary materials include PDF slides based on the book's content, as well as all the extracted R-code, and are available to everyone on a Wiley Book Companion Site
The Big R-Book is an excellent guide for science, technology, engineering, or mathematics students who wish to make a successful transition from the academic world to the professional. It will also appeal to all young data scientists, quantitative analysts, and analytics professionals, as well as those who make mathematical models.
Contents
Foreword
About the Author
Acknowledgements
Preface
About the Companion Site
I Introduction
1 The Big Picture with Kondratiev and Kardashev
2 The Scientific Method and Data
3 Conventions
II Starting with R and Elements of Statistics
4 The Basics of R
4.1 Getting Started with R
4.2 Variables
4.3 Data Types
4.3.1 The Elementary Types
4.3.2 Vectors
4.3.3 Accessing Data from a Vector
4.3.4 Matrices
4.3.5 Arrays
4.3.6 Lists
4.3.7 Factors
4.3.8 Data Frames
4.3.9 Strings or the Character-type
4.4 Operators
4.4.1 Arithmetic Operators
4.4.2 Relational Operators
4.4.3 Logical Operators
4.4.4 Assignment Operators
4.4.5 Other Operators
4.5 Flow Control Statements
4.5.1 Choices
4.5.2 Loops
4.6 Functions
4.6.1 Built-in Functions
4.6.2 Help with Functions
4.6.3 User-defined Functions
4.6.4 Changing Functions
4.6.5 Creating Function with Default Arguments
4.7 Packages
4.7.1 Discovering Packages in R
4.7.2 Managing Packages in R
4.8 Selected Data Interfaces
4.8.1 CSV Files
4.8.2 Excel Files
4.8.3 Databases
5 Lexical Scoping and Environments
5.1 Environments in R
5.2 Lexical Scoping in R
6 The Implementation of OO
6.1 Base Types
6.2 S3 Objects
6.2.1 Creating S3 Objects
6.2.2 Creating Generic Methods
6.2.3 Method Dispatch
6.2.4 Group Generic Functions
6.3 S4 Objects
6.3.1 Creating S4 Objects
6.3.2 Using S4 Objects
6.3.3 Validation of Input
6.3.4 Constructor functions
6.3.5 The Data slot
6.3.6 Recognising Objects, Generic Functions, and Methods
6.3.7 Creating S4 Generics
6.3.8 Method Dispatch
6.4 The Reference Class, refclass, RC or R5 Model
6.4.1 Creating RC Objects
6.4.2 Important Methods and Attributes
6.5 Conclusions about the OO Implementation
7 Tidy R with the Tidyverse
7.1 The Philosophy of the Tidyverse
7.2 Packages in the Tidyverse
7.2.1 The Core Tidyverse
7.2.2 The Non-core Tidyverse
7.3 Working with the Tidyverse
7.3.1 Tibbles
7.3.2 Piping with R
7.3.3 Attention Points When Using the Pipe
7.3.4 Advanced Piping
7.3.5 Conclusion
8 Elements of Descriptive Statistics
8.1 Measures of Central Tendency
8.1.1 Mean
8.1.2 The Median
8.1.3 The Mode
8.2 Measures of Variation or Spread
8.3 Measures of Covariation
8.3.1 The Pearson Correlation
8.3.2 The Spearman Correlation
8.3.3 Chi-square Tests
8.4 Distributions
8.4.1 Normal Distribution
8.4.2 Binomial Distribution
8.5 Creating an Overview of Data Characteristics
9 Visualisation Methods
9.1 Scatterplots
9.2 Line Graphs
9.3 Pie Charts
9.4 Bar Charts
9.5 Boxplots
9.6 Violin Plots
9.7 Histograms
9.8 Plotting Functions
9.9 Maps and Contour Plots
9.10 Heat-maps
9.11 Text Mining
9.11.1 Word Clouds
9.11.2 Word Associations
9.12 Colours in R
10 Time Series Analysis
10.1 Time Series in R
10.1.1 The Basics of Time Series in R
10.2 Forecasting
10.2.1 Moving Average
10.2.2 Seasonal Decomposition
11 Further Reading
III Data Import
12 A Short History of Modern Database Systems
13 RDBMS
14 SQL
14.1 Designing the Database
14.2 Building the Database Structure
14.2.1 Installing a RDBMS
14.2.2 Creating the Database
14.2.3 Creating the Tables and Relations
14.3 Adding Data to the Database
14.4 Querying the Database
14.4.1 The Basic Select Query
14.4.2 More Complex Queries
14.5 Modifying the Database Structure
14.6 Selected Features of SQL
14.6.1 Changing Data
14.6.2 Functions in SQL
15 Connecting R to an SQL Database
IV Data Wrangling
16 Anonymous Data
17 Data Wrangling in the tidyverse
17.1 Importing the Data
17.1.1 Importing from an SQL RDBMS
17.1.2 Importing Flat Files in the Tidyverse
17.2 Tidy Data
17.3 Tidying Up Data with tidyr
17.3.1 Splitting Tables
17.3.2 Convert Headers to Data
17.3.3 Spreading One Column Over Many
17.3.4 Split One Column into Many
17.3.5 Merge Multiple Columns Into One
17.3.6 Wrong Data
17.4 SQL-like Functionality via dplyr
17.4.1 Selecting Columns
17.4.2 Filtering Rows
17.4.3 Joining
17.4.4 Mutating Data
17.4.5 Set Operations
17.5 String Manipulation in the tidyverse
17.5.1 Basic String Manipulation
17.5.2 Pattern Matching with Regular Expressions
17.6 Dates with lubridate
17.6.1 ISO 8601 Format
17.6.2 Time-zones
17.6.3 Extract Date and Time Components
17.6.4 Calculating with Date-times
17.7 Factors with Forcats
18 Dealing with Missing Data
18.1 Reasons for Data to be Missing
18.2 Methods to Handle Missing Data
18.2.1 Alternative Solutions to Missing Data
18.2.2 Predictive Mean Matching (PMM)
18.3 R Packages to Deal with Missing Data
18.3.1 mice
18.3.2 missForest
18.3.3 Hmisc
19 Data Binning
19.1 What is Binning and Why Use It
19.2 Tuning the Binning Procedure
19.3 More Complex Cases: Matrix Binning
19.4 Weight of Evidence and Information Value
19.4.1 Weight of Evidence (WOE)
19.4.2 Information Value (IV)
19.4.3 WOE and IV in R
20 Factoring Analysis and Principle Components
20.1 Principle Components Analysis (PCA)
20.2 Factor Analysis
V Modelling
21 Regression Models
21.1 Linear Regression
21.2 Multiple Linear Regression
21.2.1 Poisson Regression
21.2.2 Non-linear Regression
21.3 Performance of Regression Models
21.3.1 Mean Square Error (MSE)
21.3.2 R-Squared
21.3.3 Mean Average Deviation (MAD)
22 Classification Models
22.1 Logistic Regression
22.2 Performance of Binary Classification Models
22.2.1 The Confusion Matrix and Related Measures
22.2.2 ROC
22.2.3 The AUC
22.2.4 The Gini Coefficient
22.2.5 Kolmogorov-Smirnov (KS) for Logistic Regression
22.2.6 Finding an Optimal Cut-off
23 Learning Machines
23.1 Decision Tree
23.1.1 Essential Background
23.1.2 Important Considerations
23.1.3 Growing Trees with the Package rpart
23.1.4 Evaluating the Performance of a Decision Tree
23.2 Random Forest
23.3 Artificial Neural Networks (ANNs)
23.3.1 The Basics of ANNs in R
23.3.2 Neural Networks in R
23.3.3 The Work-flow for Fitting a NN
23.3.4 Cross Validate the NN
23.4 Support Vector Machine
23.4.1 Fitting a SVM in R
23.4.2 Optimizing the SVM
23.5 Unsupervised Learning and Clustering
23.5.1 k-Means Clustering
23.5.2 Visualizing Clusters in Three Dimensions
23.5.3 Fuzzy Clustering
23.5.4 Hierarchical Clustering
23.5.5 Other Clustering Methods
24 Towards a Tidy Modelling Cycle with modelr
24.1 Adding Predictions
24.2 Adding Residuals
24.3 Bootstrapping Data
24.4 Other Functions of modelr
25 Model Validation
25.1 Model Quality Measures
25.2 Predictions and Residuals
25.3 Bootstrapping
25.3.1 Bootstrapping in Base R
25.3.2 Bootstrapping in the tidyverse with modelr
25.4 Cross-Validation
25.4.1 Elementary Cross Validation
25.4.2 Monte Carlo Cross Validation
25.4.3 k-Fold Cross Validation
25.4.4 Comparing Cross Validation Methods
25.5 Validation in a Broader Perspective
26 Labs
26.1 Financial Analysis with quantmod
26.1.1 The Basics of quantmod
26.1.2 Types of Data Available in quantmod
26.1.3 Plotting with quantmod
26.1.4 The quantmod Data Structure
26.1.5 Support Functions Supplied by quantmod
26.1.6 Financial Modelling in quantmod
27 Multi Criteria Decision Analysis (MCDA)
27.1 What and Why
27.2 General Work-flow
27.3 Identify the Issue at Hand: Steps 1 and 2
27.4 Step 3: the Decision Matrix
27.4.1 Construct a Decision Matrix
27.4.2 Normalize the Decision Matrix
27.5 Step 4: Delete Inefficient and Unacceptable Alternatives
27.5.1 Unacceptable Alternatives
27.5.2 Dominance - Inefficient Alternatives
27.6 Plotting Preference Relationships
27.7 Step 5: MCDA Methods
27.7.1 Examples of Non-compensatory Methods
27.7.2 The Weighted Sum Method (WSM)
27.7.3 Weighted Product Method (WPM)
27.7.4 ELECTRE
27.7.5 PROMethEE
27.7.6 PCA (Gaia)
27.7.7 Outranking Methods
27.7.8 Goal Programming
27.8 Summary MCDA
VI Introduction to Companies
28 Financial Accounting (FA)
28.1 The Statements of Accounts
28.1.1 Income Statement
28.1.2 Net Income: The P&L statement
28.1.3 Balance Sheet
28.2 The Value Chain
28.3 Further, Terminology
28.4 Selected Financial Ratios
29 Management Accounting
29.1 Introduction
29.1.1 Definition of Management Accounting (MA)
29.1.2 Management Information Systems (MIS)
29.2 Selected Methods in MA
29.2.1 Cost Accounting
29.2.2 Selected Cost Types
29.3 Selected Use Cases of MA
29.3.1 Balanced Scorecard
29.3.2 Key Performance Indicators (KPIs)
30 Asset Valuation Basics
30.1 Time Value of Money
30.1.1 Interest Basics
30.1.2 Specific Interest Rate Concepts
30.1.3 Discounting
30.2 Cash
30.3 Bonds
30.3.1 Features of a Bond
30.3.2 Valuation of Bonds
30.3.3 Duration
30.4 The Capital Asset Pricing Model (CAPM)
30.4.1 The CAPM Framework
30.4.2 The CAPM and Risk
30.4.3 Limitations and Shortcomings of the CAPM
30.5 Equities
30.5.1 Definition
30.5.2 Short History
30.5.3 Valuation of Equities
30.5.4 Absolute Value Models
30.5.5 Relative Value Models
30.5.6 Selection of Valuation Methods
30.5.7 Pitfalls in Company Valuation
30.6 Forwards and Futures
30.7 Options
30.7.1 Definitions
30.7.2 Commercial Aspects
30.7.3 Short History
30.7.4 Valuation of Options at Maturity
30.7.5 The Black and Scholes Model
30.7.6 The Binomial Model
30.7.7 Dependencies of the Option Price
30.7.8 The Greeks
30.7.9 Delta Hedging
30.7.10 Linear Option Strategies
30.7.11 Integrated Option Strategies
30.7.12 Exotic Options
30.7.13 Capital Protected Structures
VII Reporting
31 A Grammar of Graphics with ggplot2
31.1 The Basics of ggplot2
31.2 Over-plotting
31.3 Case Study for ggplot2
32 R Markdown
33 knitr and LaTeX
34 An Automated Development Cycle
35 Writing and Communication Skills
36 Interactive Apps
36.1 Shiny
36.2 Browser Born Data Visualization
36.2.1 HTML-widgets
36.2.2 Interactive Maps with leaflet
36.2.3 Interactive Data Visualisation with ggvis
36.2.4 googleVis
36.3 Dashboards
36.3.1 The Business Case: a Diversity Dashboard
36.3.2 A Dashboard with flexdashboard
36.3.3 A Dashboard with shinydashboard
VIII Bigger and Faster R
37 Parallel Computing
37.1 Combine foreach and doParallel
37.2 Distribute Calculations over LAN with Snow
37.3 Using the GPU
37.3.1 Getting Started with gpuR
37.3.2 On the Importance of Memory use
37.3.3 Conclusions for GPU Programming
38 R and Big Data
38.1 Use a Powerful Server
38.1.1 Use R on a Server
38.1.2 Let the Database Server do the Heavy Lifting
38.2 Using more Memory than we have RAM
39 Parallelism for Big Data
39.1 Apache Hadoop
39.2 Apache Spark
39.2.1 Installing Spark
39.2.2 Running Spark
39.2.3 SparkR
39.2.4 sparklyr
39.2.5 SparkR or sparklyr
40 The Need for Speed
40.1 Benchmarking
40.2 Optimize Code
40.2.1 Avoid Repeating the Same
40.2.2 Use Vectorisation where Appropriate
40.2.3 Pre-allocating Memory
40.2.4 Use the Fastest Function
40.2.5 Use the Fastest Package
40.2.6 Be Mindful about Details
40.2.7 Compile Functions
40.2.8 Use C or C++ Code in R
40.2.9 Using a C++ Source File in R
40.2.10 Call Compiled C++ Functions in R
40.3 Profiling Code
40.3.1 The Package profr
40.3.2 The Package proftools
40.4 Optimize Your Computer
IX Appendices
A Create your own R Package
A.1 Creating the Package in the R Console
A.2 Update the Package Description
A.3 Documenting the Functions
A.4 Loading the Package
A.5 Further Steps
B Levels of Measurement
B.1 Nominal Scale
B.2 Ordinal Scale
B.3 Interval Scale
B.4 Ratio Scale
C Trademark Notices
C.1 General Trademark Notices
C.2 R-Related Notices
C.2.1 Crediting Developers of R Packages
C.2.2 The R-packages used in this Book
D Code Not Shown in the Body of the Book
E Answers to Selected Questions
Bibliography
Nomenclature
Index


 
               
              


