Description
Understand applied statistics and its application in biology research
Biology and its related sciences generate prodigious quantities of data through experimentation and observation. Interpreting these data and using them to draw statistically defensible inferences has become one of the most significant components of modern biological research. There are, however, very few up-to-date resources through which graduate students and researchers in biology can familiarize themselves with the key methodologies of applied statistics as they apply to the life sciences.
Applied Statistics in Biology remedies this oversight with a thorough, accessible overview of statistics and its biological applications. Beginning with the history and fundamentals of statistics, it covers all major statistical modes of analysis that biologists might find useful, with an eye towards a robust quantitative education for biologists. Fully up to date and addressing all conventional approaches to statistical analysis, it’s a must-own for biology students and researchers alike.
Applied Statistics in Biology readers will also find:
- Treatment rooted in years of graduate teaching in statistics and biology
- Detailed discussion of topics including regression, non-Gaussian data, multivariate techniques, and many more
- A valuable complement to existing resources on applied statistics
Applied Statistics in Biology is ideal for graduate students in agriculture, biology, natural resources, and related fields, as well as for instructors and researchers in these and related subjects.
Table of Contents
Preface
1 Introduction
Statistics
Application of Statistics
Scientific Method
Statistical Null Hypothesis
Type I Error (α)
Type II Error (β)
Power of the Test
P-Value Misuse
Effect Size
Diagnostic Tests
Bias
Summary
SAS Code
R Code
JMP Method
References
Additional Reading
2 Data Management
Data Management Plan
Organize Files
Data Workbooks
Backup
Securing Data
Data Analysis
Data Preservation
Data Sharing
Summary
Additional Reading
3 Distributions
Measures of Central Tendency
Dispersion
Accuracy and Precision
Normal Distribution
Normal Probability Plot
Measures of Departures from Normality
Tests of Normality
Comparing Distributions
Comparing Two Mean Estimates
Student’s t-Test
Wald Z-Test
Bootstrap
Summary
SAS Code
R Code
JMP Method
References
Additional Reading
4 Goodness-of-Fit
χ² Distribution
Enumeration Data
Two Cell Tests
Sample Size to Differentiate Alternative Ratios
Contingency Tests, Goodness-of-Fit
Contingency Tests, No Expected Distribution
Meta-Analysis
Summary
SAS Code
R Code
JMP Method
References
Additional Reading
5 Variance Analyses—Gaussian
Factors
Experimental Unit
Effect Types
One-Factor Analysis
Experimental Error
F-Distribution
Replication
Randomized Complete Block
Arrangement
Variance Analysis
Block: Fixed or Random Effect?
Mixed Model Analysis
REML Estimation
Significance of Effects
Generalized Linear Mixed Model
Conditional and Marginal Models
Covariance Structure
Negative Variance Estimates
Means Comparisons
Contrasts
Estimate of a Difference
BLUE and BLUP Estimates
Multiplicity Adjustment
Letter Codes
Test CV
Power Analyses
Summary
SAS Code
R Code
JMP Method
References
Additional Reading
6 Correlation and Regression
Rank Correlations
Linear Regression
Model I
Model II
Prediction of Y from X
Broad and Narrow Inference
Regression Through the Origin
Inverse Prediction
Transformations for Linear Regression
Nonlinear Regression
Dosage Response
Segmented or Spline Regression
Logistic Regression
Creating Plots for Publication
Summary
SAS Code
R Code
JMP Method
References
Additional Reading
7 Regression in ANOVA
Unequally Spaced or Unequally Balanced Treatments
Dummy Variables
Optimum Treatment Level
Comparison of Regression Response
Comparison of Responses
Non-Gaussian Data
Summary
SAS Code
R Code
JMP Method
References
Additional Reading
8 Checking Model Fit
Violation of Assumptions
Fit the Model to the Data
Checking Assumptions
Residual Types
Residual Adjustment
Plots of Residuals
Model Modifications
Fit Statistics
Chi-Square/DF
Link Function
Outliers and Influential Observations
Influence Statistics for Generalized Models
Pea Study, Epilogue
Summary
SAS Code
R Code
JMP Method
References
Additional Reading
9 Non-Gaussian Data
Denominator df
Quantitative Data
Count Data
Zero-Inflated Models
Proportion Data, Continuous
Values of 0 and 1
Proportion Data, Discrete
Multinomial Data
Ordinal Multinomial Analysis
Nominal Multinomial Analysis
Compositional Data
Summary
SAS Code
R Code
JMP Method
References
Additional Reading
10 Error Control
Experimental Error
Variation Within Experimental Units
Heterogeneity Among Experimental Units
Analysis of Covariance
Heterogeneity Within a Study
Minimizing Heterogeneity
Post-hoc Detection of Heterogeneity
Spatial Error–Covariance Adjustment
Beyond the RCBD
Latin Square
Lattice Designs
Balanced Lattice
Partially Balanced Lattice
Simple Lattice Repeated
Rectangular Lattice
α-Designs
Augmented Designs
Partially Replicated Designs
Experimental Design Software
Summary
SAS Code
R Code
JMP Method
References
Additional Reading
11 Factorial Experiments
Expected Mean Squares
Estimation of Variance Components
Subsampling
Two-Factor Factorials
Three-Factor Factorials
Split-Plot
Do Not Under- or Over-Specify
Model Specification
Bias Correction
Split-Block
Repeated Measures
Correlated Errors Are Not Restricted to Time
Selection of Covariance Structure
Repeated Measures, Non-Gaussian
No Convergence
Adjusting for Baseline
Combined Experiments
Coefficients for Contrasts and Estimates
Investigating Interactions
Fixed, Random, or a Bit of Both?
Summary
SAS Code
R Code
JMP Method
References
Additional Reading
12 Response Surface
First-Order Designs
Second-Order Designs
Central Composite Design
Central Rotatable Composite Design
Mixture and Double Mixture Designs
Plotting Response Surfaces
Hoerl and Spline Models
Avoid Extrapolation
Summary
SAS Code
R Code
JMP Method
References
Additional Reading
13 Multiple Regression
Linear Model
Assumptions
Variable Selection—Fixed Effect Models
Variable Selection—Mixed Models
Multimodel Inference
Collinear Variables
Variance Inflation Factor
Collinearity Diagnostics
Adjusting Collinear Variables
Polynomial Models
Prediction Models Involving Collinear Variables
Cross-Validation
Model Validation
Latent Factor Regression
Summary
SAS Code
R Code
JMP Method
References
Additional Reading
14 Multivariate Analyses
Analyses of Dependence
Genotypic Correlations
Path Analysis
Analyses of Interdependence
Assumptions
Example Multivariate Dataset (GRIN)
Dimension Reduction
Value of Variables
Number of Components/Factors
Clustering
Distance Measures
Cluster Methods
Number of Clusters
Groupings Unknown
Partialling Out
Cluster Validation
Groupings Known
Canonical Correlation Analysis
Canonical Discriminant Analysis
Comparing Distance Matrices
Summary
SAS Code
R Code
JMP Method
References
Additional Reading
15 G×E Analysis
Fixed or Random Environments?
I. Univariate Models
Mean-CV
Regression Coefficient
Regression Deviation
Random Environment Effect
Yield Stability Index
Superiority Measure
II. Multivariate Models
Biplots
Confidence Intervals
AMMI or GGE?
G×E Analyses-Summary
SAS Code
R Code
JMP Method
References
Additional Reading



