Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data (2nd revised edition)

E-book price: ¥10,423

  • Binding: Hardcover / Pages: 516 p.
  • Language: English
  • Product code: 9781439860915
  • DDC classification: 658.872

Basic Description

This revised edition focuses on large-scale statistical models that mine big data to identify structures (variables) with the appropriate predictive power, yielding reliable, robust, and relevant large-scale analyses. It incorporates 14 new chapters and expands the explanations of the author's own popular machine-learning GenIQ Model.

Full Description


The second edition of a bestseller, Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data is still the only book, to date, to distinguish between statistical data mining and machine-learning data mining. The first edition, titled Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data, contained 17 chapters of innovative and practical statistical data mining techniques. In this second edition, renamed to reflect the increased coverage of machine-learning data mining techniques, the author has completely revised, reorganized, and repositioned the original chapters and produced 14 new chapters of creative and useful machine-learning data mining techniques. In sum, the 31 chapters of simple yet insightful quantitative techniques make this book unique in the field of data mining literature.

The statistical data mining methods effectively consider big data for identifying structures (variables) with the appropriate predictive power in order to yield reliable and robust large-scale statistical models and analyses. In contrast, the author's own GenIQ Model provides machine-learning solutions to common and virtually unapproachable statistical problems. GenIQ makes this possible: its utilitarian data mining features start where statistical data mining stops.

This book contains essays offering detailed background, discussion, and illustration of specific methods for solving the most commonly experienced problems in predictive modeling and analysis of big data. They address each methodology and assign its application to a specific type of problem. To better ground readers, the book provides an in-depth discussion of the basic methodologies of predictive modeling and analysis. While this type of overview has been attempted before, this approach offers a truly nitty-gritty, step-by-step method that both tyros and experts in the field can enjoy playing with.

Contents

  • Introduction
    The Personal Computer and Statistics; Statistics and Data Analysis; EDA; The EDA Paradigm; EDA Weaknesses; Small and Big Data; Data Mining Paradigm; Statistics and Machine Learning; Statistical Data Mining; References
  • Two Basic Data Mining Methods for Variable Assessment
    Introduction; Correlation Coefficient; Scatterplots; Data Mining; Smoothed Scatterplot; General Association Test; Summary; References
  • CHAID-Based Data Mining for Paired-Variable Assessment
    Introduction; The Scatterplot; The Smooth Scatterplot; Primer on CHAID; CHAID-Based Data Mining for a Smoother Scatterplot; Summary; References; Appendix
  • The Importance of Straight Data: Simplicity and Desirability for Good Model-Building Practice
    Introduction; Straightness and Symmetry in Data; Data Mining Is a High Concept; The Correlation Coefficient; Scatterplot of (xx3, yy3); Data Mining the Relationship of (xx3, yy3); What Is the GP-Based Data Mining Doing to the Data?; Straightening a Handful of Variables and a Dozen of Two Baker's Dozens of Variables; Summary; References
  • Symmetrizing Ranked Data: A Statistical Data Mining Method for Improving the Predictive Power of Data
    Introduction; Scales of Measurement; Stem-and-Leaf Display; Box-and-Whiskers Plot; Illustration of the Symmetrizing Ranked Data Method; Summary; References
  • Principal Component Analysis: A Statistical Data Mining Method for Many-Variable Assessment
    Introduction; EDA Reexpression Paradigm; What Is the Big Deal?; PCA Basics; Exemplary Detailed Illustration; Algebraic Properties of PCA; Uncommon Illustration; PCA in the Construction of a Quasi-Interaction Variable; Summary
  • The Correlation Coefficient: Its Values Range between Plus/Minus 1, or Do They?
    Introduction; Basics of the Correlation Coefficient; Calculation of the Correlation Coefficient; Rematching; Calculation of the Adjusted Correlation Coefficient; Implication of Rematching; Summary
  • Logistic Regression: The Workhorse of Response Modeling
    Introduction; Logistic Regression Model; Case Study; Logits and Logit Plots; The Importance of Straight Data; Reexpressing for Straight Data; Straight Data for Case Study; Techniques When Bulging Rule Does Not Apply; Reexpressing MOS_OPEN; Assessing the Importance of Variables; Important Variables for Case Study; Relative Importance of the Variables; Best Subset of Variables for Case Study; Visual Indicators of Goodness of Model Predictions; Evaluating the Data Mining Work; Smoothing a Categorical Variable; Additional Data Mining Work for Case Study; Summary
  • Ordinary Regression: The Workhorse of Profit Modeling
    Introduction; Ordinary Regression Model; Mini Case Study; Important Variables for Mini Case Study; Best Subset of Variables for Case Study; Suppressor Variable AGE; Summary; References
  • Variable Selection Methods in Regression: Ignorable Problem, Notable Solution
    Introduction; Background; Frequently Used Variable Selection Methods; Weakness in the Stepwise; Enhanced Variable Selection Method; Exploratory Data Analysis; Summary; References
  • CHAID for Interpreting a Logistic Regression Model
    Introduction; Logistic Regression Model; Database Marketing Response Model Case Study; CHAID; Multivariable CHAID Trees; CHAID Market Segmentation; CHAID Tree Graphs; Summary
  • The Importance of the Regression Coefficient
    Introduction; The Ordinary Regression Model; Four Questions; Important Predictor Variables; P Values and Big Data; Returning to Question 1; Effect of Predictor Variable on Prediction; The Caveat; Returning to Question 2; Ranking Predictor Variables by Effect on Prediction; Returning to Question 3; Returning to Question 4; Summary; References
  • The Average Correlation: A Statistical Data Mining Measure for Assessment of Competing Predictive Models and the Importance of the Predictor Variables
    Introduction; Background; Illustration of the Difference between Reliability and Validity; Illustration of the Relationship between Reliability and Validity; The Average Correlation; Summary; Reference
  • CHAID for Specifying a Model with Interaction Variables
    Introduction; Interaction Variables; Strategy for Modeling with Interaction Variables; Strategy Based on the Notion of a Special Point; Example of a Response Model with an Interaction Variable; CHAID for Uncovering Relationships; Illustration of CHAID for Specifying a Model; An Exploratory Look; Database Implication; Summary; References
  • Market Segmentation Classification Modeling with Logistic Regression
    Introduction; Binary Logistic Regression; Polychotomous Logistic Regression Model; Model Building with PLR; Market Segmentation Classification Model; Summary
  • CHAID as a Method for Filling in Missing Values
    Introduction; Introduction to the Problem of Missing Data; Missing Data Assumption; CHAID Imputation; Illustration; CHAID Most Likely Category Imputation for a Categorical Variable; Summary; References
  • Identifying Your Best Customers: Descriptive, Predictive, and Look-Alike Profiling
    Introduction; Some Definitions; Illustration of a Flawed Targeting Effort; Well-Defined Targeting Effort; Predictive Profiles; Continuous Trees; Look-Alike Profiling; Look-Alike Tree Characteristics; Summary
  • Assessment of Marketing Models
    Introduction; Accuracy for Response Model; Accuracy for Profit Model; Decile Analysis and Cum Lift for Response Model; Decile Analysis and Cum Lift for Profit Model; Precision for Response Model; Precision for Profit Model; Separability for Response and Profit Models; Guidelines for Using Cum Lift, HL/SWMAD, and CV; Summary
  • Bootstrapping in Marketing: A New Approach for Validating Models
    Introduction; Traditional Model Validation; Illustration; Three Questions; The Bootstrap; How to Bootstrap; Bootstrap Decile Analysis Validation; Another Question; Bootstrap Assessment of Model Implementation Performance; Summary; References
  • Validating the Logistic Regression Model: Try Bootstrapping
    Introduction; Logistic Regression Model; The Bootstrap Validation Method; Summary; Reference
  • Visualization of Marketing Models: Data Mining to Uncover Innards of a Model
    Introduction; Brief History of the Graph; Star Graph Basics; Star Graphs for Single Variables; Star Graphs for Many Variables Considered Jointly; Profile Curves Method; Illustration; Summary; References; Appendix 1: SAS Code for Star Graphs for Each Demographic Variable about the Deciles; Appendix 2: SAS Code for Star Graphs for Each Decile about the Demographic Variables; Appendix 3: SAS Code for Profile Curves: All Deciles
  • The Predictive Contribution Coefficient: A Measure of Predictive Importance
    Introduction; Background; Illustration of Decision Rule; Predictive Contribution Coefficient; Calculation of Predictive Contribution Coefficient; Extra Illustration of Predictive Contribution Coefficient; Summary; Reference
  • Regression Modeling Involves Art, Science, and Poetry, Too
    Introduction; Shakespearean Modelogue; Interpretation of the Shakespearean Modelogue; Summary; Reference
  • Genetic and Statistic Regression Models: A Comparison
    Introduction; Background; Objective; A Pithy Summary of the Development of Genetic Programming; The GenIQ Model: A Brief Review of Its Objective and Salient Features; The GenIQ Model: How It Works; Summary; References
  • Data Reuse: A Powerful Data Mining Effect of the GenIQ Model
    Introduction; Data Reuse?; Illustration of Data Reuse; Modified Data Reuse: A GenIQ-Enhanced Regression Model; Summary
  • A Data Mining Method for Moderating Outliers Instead of Discarding Them
    Introduction; Background; Moderating Outliers Instead of Discarding Them; Summary
  • Overfitting: Old Problem, New Solution
    Introduction; Background; The GenIQ Model Solution to Overfitting; Summary
  • The Importance of Straight Data: Revisited
    Introduction; Restatement of Why It Is Important to Straighten Data; Restatement of Section 4.6 "Data Mining the Relationship of (xx3, yy3)"; Summary
  • The GenIQ Model: Its Definition and an Application
    Introduction; What Is Optimization?; What Is Genetic Modeling?; Genetic Modeling: An Illustration; Parameters for Controlling a Genetic Model Run; Genetic Modeling: Strengths and Limitations; Goals of Marketing Modeling; The GenIQ Response Model; The GenIQ Profit Model; Case Study: Response Model; Case Study: Profit Model; Summary; Reference
  • Finding the Best Variables for Marketing Models
    Introduction; Background; Weakness in the Variable Selection Methods; Goals of Modeling in Marketing; Variable Selection with GenIQ; Nonlinear Alternative to Logistic Regression Model; Summary; References
  • Interpretation of Coefficient-Free Models
    Introduction; The Linear Regression Coefficient; The Quasi-Regression Coefficient for Simple Regression Models; Partial Quasi-RC for the Everymodel; Quasi-RC for a Coefficient-Free Model; Summary
