Complete Introduction to R for Data Science
The Big R-Book : From Data Science to Learning Machines and Big Data (HAR/PSC)

E-book price: ¥11,762
  • E-book available

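  • In stock at our partner overseas book distributor. Orders usually ship within two weeks.
    Important notes
    1. Delivery may occasionally be delayed, or the item may prove unavailable.
    2. Orders for multiple copies may be shipped in separate parcels.
    3. We cannot accept requests for copies in pristine condition.
  • [Important: arrival delays]
    Because of the spread of COVID-19 in many countries, arrivals of foreign books and antiquarian foreign books have become unstable.
    Delivery is expected to take longer than the standard lead times shown on our site.
    We apologize for the inconvenience and ask for your understanding in advance.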

  • Binding: Hardcover / Pages: 892 p.
  • Language: ENG
  • Product code: 9781119632726
  • DDC classification: 005.133

Full Description


Introduces professionals and scientists to statistics and machine learning using the programming language R

Written by and for practitioners, this book provides an overall introduction to R, focusing on tools and methods commonly used in data science, and placing emphasis on practice and business use. It covers a wide range of topics in a single volume, including big data, databases, statistical machine learning, data wrangling, data visualization, and the reporting of results. The topics covered are all important for someone with a science/math background who is looking to quickly learn several practical technologies to enter or transition to the growing field of data science.

The Big R-Book for Professionals: From Data Science to Learning Machines and Reporting with R includes nine parts, starting with an introduction to the subject and followed by an overview of R and elements of statistics. The third part revolves around data, while the fourth focuses on data wrangling. Part 5 teaches readers about exploring data. Part 6 covers building models, Part 7 introduces the reader to the reality in companies, Part 8 covers reports and interactive applications, and finally Part 9 introduces the reader to big data and performance computing. It also includes some helpful appendices.

  • Provides a practical guide for non-experts with a focus on business users
  • Contains a unique combination of topics including an introduction to R, machine learning, mathematical models, data wrangling, and reporting
  • Uses a practical tone and integrates multiple topics in a coherent framework
  • Demystifies the hype around machine learning and AI by enabling readers to understand the provided models and program them in R
  • Shows readers how to visualize results in static and interactive reports
  • Supplementary material includes PDF slides based on the book's content, as well as all the extracted R-code, and is available to everyone on a Wiley Book Companion Site

The Big R-Book is an excellent guide for science, technology, engineering, or mathematics students who wish to make a successful transition from the academic world to the professional. It will also appeal to all young data scientists, quantitative analysts, and analytics professionals, as well as those who make mathematical models.

Contents

Foreword
About the Author
Acknowledgements
Preface / Why this book?
Contents

I Introduction
  1 The Big Picture with Kondratiev and Kardashev
  2 The Scientific Method and Data
  3 Conventions

II Starting with R and Elements of Statistics
  4 The Basics of R
    4.1 Variables
    4.2 Data Types
      4.2.1 Elementary Data Types
      4.2.2 Vectors
      4.2.3 Lists
      4.2.4 Matrices
      4.2.5 Arrays
      4.2.6 Factors
      4.2.7 Data Frames
    4.3 Operators
      4.3.1 Arithmetic Operators
      4.3.2 Relational Operators
      4.3.3 Logical Operators
      4.3.4 Assignment Operators
      4.3.5 Other Operators
      4.3.6 Loops
      4.3.7 Functions
      4.3.8 Packages
      4.3.9 Strings
    4.4 Selected Data Interfaces
      4.4.1 CSV Files
      4.4.2 Excel Files
      4.4.3 Databases
    4.5 Distributions
      4.5.1 Normal Distribution
      4.5.2 Binomial Distribution
  5 Lexical Scoping and environments
    5.1 Environments in R
    5.2 Lexical Scoping in R
  6 The Implementation of OO
    6.1 Base Types
    6.2 S3 Objects
      6.2.1 Creating S3 objects
      6.2.2 Creating generic methods
      6.2.3 Method dispatch
      6.2.4 Group generic functions
    6.3 S4 Objects
      6.3.1 Creating S4 Objects
      6.3.2 Recognising objects, generic functions, and methods
      6.3.3 Creating S4 Generics
      6.3.4 Method dispatch
    6.4 The reference class, refclass, RC or R5 model
      6.4.1 Creating R5 objects
    6.5 OO Conclusion
  7 Tidy R with the Tidyverse
    7.1 The Philosophy of the Tidyverse
    7.2 Packages in the tidyverse
    7.3 Working with the tidyverse
      7.3.1 tibbles
      7.3.2 Piping with R
      7.3.3 Attention points when using the pipe command
        7.3.3.1 Advanced piping
        7.3.3.2 Conclusion
  8 Elements of Descriptive Statistics
    8.1 Measures of Central Tendency
      8.1.1 Mean
      8.1.2 The Median
      8.1.3 The Mode
    8.2 Measures of Variation or Spread
    8.3 Measures of Covariation
    8.4 Chi Square Tests
  9 Further Reading

III Data Import
  10 A short history of modern database systems
  11 RDBMS
  12 SQL
    12.1 Designing the database
    12.2 Building the database
    12.3 Adding data to the database
    12.4 Querying the database
    12.5 Modifying an existing database
    12.6 Advanced features of SQL
  13 Connecting R to an SQL database

IV Data Wrangling
  14 Anonymising Data
  15 Data Wrangling in the tidyverse
    15.1 Tidy data
    15.2 Importing the data
      15.2.1 Importing from an SQL RDBMS
      15.2.2 Importing flat files in the tidyverse
        15.2.2.1 CSV Files
        15.2.2.2 Making sense of fixed width files
    15.3 Tidying up data with tidyr
      15.3.1 Splitting tables
      15.3.2 headers to data
      15.3.3 Spreading one column over many
      15.3.4 separate
      15.3.5 Unite
      15.3.6 Wrong Data
    15.4 Playing with tipples: SQL-like functionality
      15.4.1 Selecting
      15.4.2 Filtering
      15.4.3 Joining
      15.4.4 Mutating
      15.4.5 Set Operations
    15.5 String Manipulation in the tidyverse
      15.5.1 Basic string manipulation
      15.5.2 Pattern matching with regular expressions
        15.5.2.1 Regular Expressions
        15.5.2.2 Functions using Regex
    15.6 Dates with lubridate
        15.6.0.1 ISO 8601 Format
        15.6.0.2 Timezones
        15.6.0.3 Extract and set date and time components
        15.6.0.4 Calculating with date-times
    15.7 Factors with forcats
  16 Dealing with missing data
  17 Data Binning
    17.1 Tuning the binning procedure
    17.2 More complex cases: matrix binning
    17.3 Weight of evidence and information value
  18 Factoring analysis and principle components
    18.1 Principle components analysis
    18.2 Factor Analysis

V Explore Data
  19 Using Descriptive Statistics
  20 Standard Charts & Graphs
    20.1 Pie Charts
    20.2 Bar Charts
    20.3 Boxplots
    20.4 Violin plots
    20.5 Histograms
    20.6 Scatterplots
    20.7 Line Graphs
    20.8 Plotting Functions
    20.9 Maps and contour plots
  21 Elected Visualization Methods
    21.1 Heat-maps
    21.2 Text Mining
      21.2.1 Word Clouds
      21.2.2 Word Associations
    21.3 Colours in R
  22 Time Series Analysis
    22.1 Time Series in R
    22.2 Forecasting
      22.2.1 Moving Average
      22.2.2 Seasonal Decomposition

VI Modelling
  23 Regression Models
    23.1 Linear Regression
    23.2 Multiple Linear Regression
      23.2.1 Poisson Regression
      23.2.2 Non-Linear Regression
    23.3 Performance of regression models
      23.3.1 Mean Square Error (MSE)
      23.3.2 R-Squared
      23.3.3 Mean Average Deviation (MAD)
  24 Classification Models
    24.1 Logistic Regression
    24.2 The performance of binary classification models
      24.2.1 The Confusion Matrix and related measures
      24.2.2 ROC
      24.2.3 AUC
      24.2.4 AUC Gini for logistic regression
      24.2.5 Kolmogorov-Smirnov (KS) for logistic regression
      24.2.6 Finding an Optimal Cut-off
  25 Learning Machines
    25.1 Decision Tree
      25.1.1 Essential Background
      25.1.2 Important considerations
      25.1.3 Growing trees with R
      25.1.4 Evaluating the performance of a decision tree
        25.1.4.1 The performance of the regression tree
        25.1.4.2 The performance of the classification tree
    25.2 Random Forest
    25.3 Artificial Neural Networks (ANN)
      25.3.1 The basics of ANNs in R
      25.3.2 An example of a work-flow to develop an ANN
    25.4 Support Vector Machine
    25.5 Unsupervised learning and clustering
      25.5.1 k-means clustering
      25.5.2 Fuzzy clustering
      25.5.3 Hierarchical clustering
      25.5.4 Other clustering methods
  26 Towards a tidy modelling cycle with modelr
  27 Model Validation
    27.1 Model quality measures
    27.2 Predictions and residuals
    27.3 Bootstrapping
    27.4 Cross-Validation
      27.4.1 training and validating
    27.5 Monte-Carlo Cross Validation
    27.6 k-Fold Cross Validation
    27.7 Comparison
    27.8 Validation in a broader perspective
  28 Labs
    28.1 Financial Analysis with QuantMod
      28.1.1 The quantmod data structure
      28.1.2 Support functions supplied by quantmod
      28.1.3 Financial modelling in quantmod
  29 Multi Criteria Decision Analysis (MCDA)
    29.1 What and Why
    29.2 General Work-flow
    29.3 Identify the issue at hand: step 1 and 2
    29.4 STEP 3: the decision matrix
      29.4.1 Construct a decision matrix
      29.4.2 Normalize the decision matrix
    29.5 STEP 4: leave out inefficient and unacceptable alternatives
      29.5.1 Unacceptable Alternatives
      29.5.2 Dominance - inefficient alternatives
    29.6 Printing preference relationships
    29.7 STEP 6: MCDA Methods
      29.7.1 Examples of non-compensatory methods
      29.7.2 The weighted sum method (WSM)
      29.7.3 WPM
      29.7.4 ELECTRE
        29.7.4.1 ELECTRE I
        29.7.4.2 ELECTRE II
      29.7.5 PROMethEE
        29.7.5.1 PROMethEE I
        29.7.5.2 PROMethEE II
      29.7.6 PCA (Gaia)
      29.7.7 Outranking methods
      29.7.8 Goal Programming
    29.8 Summary MCDA

VII Introduction to Companies
  30 Financial Accounting
    30.1 The Statements of Accounts
      30.1.1 Income Statement
      30.1.2 Net Income: The P&L statement
      30.1.3 Balance Sheet
    30.2 The Value Chain
    30.3 Further Terminology
    30.4 Selected Financial Ratios
  31 Management Accounting
    31.1 Introduction
    31.2 Selected Methods in MA
      31.2.1 Cost Accounting
      31.2.2 Selected Cost Types
    31.3 Selected Use Cases of MA
      31.3.1 Balanced Scorecard
      31.3.2 Key Performance Indicators
        31.3.2.1 Selection of KPIs
  32 Asset Valuation Basics
    32.1 Time Value of Money
    32.2 Cash
    32.3 Bonds
      32.3.1 Valuation of Bonds
      32.3.2 Duration
        32.3.2.1 Macaulay Duration
        32.3.2.2 Modified Duration
    32.4 Equities
      32.4.1 Valuation of Equities
        32.4.1.1 CAPM
      32.4.2 Absolute Value Models
        32.4.2.1 Dividend Discount Model
        32.4.2.2 Free Cash Flow (FCF)
        32.4.2.3 Discounted Cash Flow Model
        32.4.2.4 Discounted Abnormal Operating Earnings valuation model
        32.4.2.5 Net Asset Value Method or Cost Method
        32.4.2.6 Excess Earnings Method
      32.4.3 Relative Value Models
        32.4.3.1 The Idea behind Relative Value Models
        32.4.3.2 Some Ratios that can be used in relative value models
        32.4.3.3 Measures Related to Company Value for External Stakeholders
        32.4.3.4 Relative Value Models in Practice
        32.4.3.5 Conclusions and Use
      32.4.4 Selection of Valuation Methods
      32.4.5 Pitfalls and Matters Requiring Attention for all Methods
        32.4.5.1 Results and Sensitivity
    32.5 Forwards and Futures
    32.6 Options
      32.6.1 Definitions
      32.6.2 Commercial Aspects
      32.6.3 Historic observations
      32.6.4 Valuation of Options at Maturity
      32.6.5 The Put-Call Parity
      32.6.6 The Black & Scholes Model
        32.6.6.1 Apply the Black and Scholes formula
      32.6.7 Dependencies
      32.6.8 Sensitivities: "the Greeks"
      32.6.9 Delta Hedging
      32.6.10 Linear Option Strategies
        32.6.10.1 The Limits of the Black and Scholes Model
      32.6.11 The Binomial Model
        32.6.11.1 Risk Neutral Method
        32.6.11.2 The Equivalent Portfolio Binomial Model
        32.6.11.3 Summary Binomial Model
      32.6.12 Exotic Options
      32.6.13 Integrated Option Strategies
      32.6.14 Capital Protected Structures

VIII Report
  33 ggplot2
  34 R-markdown
  35 knitr and LaTeX
  36 An automated development cycle
  37 Writing and communication skills
  38 Interactive apps
    38.1 Shiny
    38.2 Browser born data visualization
      38.2.1 HTML-widgets
      38.2.2 ggvis
      38.2.3 googleVis
    38.3 Dashboards
      38.3.1 The business case: a diversity dashboard
      38.3.2 A dashboard with flexdashboard
        38.3.2.1 Interactive dashboards with flexdashboard
      38.3.3 A dashboard with shinydashboard

IX Appendices
  39 Other Resources
  40 Levels of Measurement
    40.1 Nominal Scale
    40.2 Ordinal Scale
    40.3 Interval Scale
    40.4 Ratio Scale
  41 Trademark Notices
  42 Code snippets not shown in the body of the book
  43 Answers to questions

Bibliography
Index
Nomenclature