Population Genomics with R

Print edition price
¥28,899
  • E-book

  • Author: Paradis, Emmanuel
  • Price: ¥20,179 (¥18,345 before tax)
  • Chapman and Hall/CRC (released 2020/05/05)
  • Points: 183 pt (check the order confirmation screen for the points actually awarded)
  • Language: English
  • ISBN: 9781138608184
  • eISBN: 9780429882425

Description

Population Genomics With R presents a multidisciplinary approach to the analysis of population genomics. The methods treated cover a large number of topics, from traditional population genetics to large-scale genomics with high-throughput sequencing data. Several dozen R packages are examined and integrated to provide a coherent software environment with a wide range of computational, statistical, and graphical tools. Small examples are used to illustrate the basics, and published data are used as case studies. Readers are expected to have a basic knowledge of biology, genetics, and statistical inference methods. Graduate students and postdoctoral researchers will find resources for analyzing their population genetic and genomic data, as well as help in designing new studies.

The first four chapters review the basics of population genomics, data acquisition, and the use of R to store and manipulate genomic data. Chapter 5 treats the exploration of genomic data, an important issue when analyzing large data sets. The other five chapters cover linkage disequilibrium, population genomic structure, geographical structure, past demographic events, and natural selection. These chapters include supervised and unsupervised methods, admixture analysis, an in-depth treatment of multivariate methods, and advice on how to handle GIS data. The analysis of natural selection, a traditional issue in evolutionary biology, has seen a revival with modern population genomic data. All chapters include exercises. Supplemental materials are available online (http://ape-package.ird.fr/PGR.html).

Table of Contents

1. Introduction

Heredity, Genetics, and Genomics

Principles of Population Genomics

Units

Genome Structures

Mutations

Drift and Selection

R Packages and Conventions

Required Knowledge and Other Readings

2. Data Acquisition

Samples and Sampling Designs

How Much DNA in a Sample?

Degraded Samples

Sampling Designs

Low-Throughput Technologies

Genotypes From Phenotypes

DNA Cleavage Methods

Repeat Length Polymorphism

Sanger and Shotgun Sequencing

DNA Methylation and Bisulfite Sequencing

High-Throughput Technologies

DNA Microarrays

High-Throughput Sequencing

Restriction Site Associated DNA

RNA Sequencing

Exome Sequencing

Sequencing of Pooled Individuals

Designing a Study With HTS

The Future of DNA Sequencing

File Formats

Data Files

Archiving and Compression

Bioinformatics and Genomics

Processing Sanger Sequencing Data With sangerseqR

Read Mapping With Rsubread

Managing Read Alignments With Rsamtools

Simulation of High-Throughput Sequencing Data

Exercises

3. Genomic Data in R

What is an R Data Object?

Data Classes for Genomic Data

The Class "loci" (pegas)

The Class "genind" (adegenet)

The Classes "SNPbin" and "genlight" (adegenet)

The Class "SnpMatrix" (snpStats)

The Class "DNAbin" (ape)

The Classes "XString" and "XStringSet" (Biostrings)

The Package SNPRelate

Data Input and Output

Reading Text Files

Reading Spreadsheet Files

Reading VCF Files

Reading PED and BED Files

Reading Sequence Files

Reading Annotation Files

Writing Files

Internet Databases

Managing Files and Projects

Exercises

4. Data Manipulation

Basic Data Manipulation in R

Subsetting, Replacement, and Deletion

Commonly Used Functions

Recycling and Coercion

Logical Vectors

Memory Management

Conversions

Case Studies

Mitochondrial Genomes of the Asiatic Golden Cat

Complete Genomes of the Fruit Fly

Human Genomes

Influenza H1N1 Virus Sequences

Jaguar Microsatellites

Bacterial Whole Genome Sequences

Metabarcoding of Fish Communities

Exercises

5. Data Exploration and Summaries

Genotype and Allele Frequencies

Allelic Richness

Missing Data

Haplotype and Nucleotide Diversity

The Class "haplotype"

Haplotype and Nucleotide Diversity From DNA Sequences

Genetic and Genomic Distances

Theoretical Background

Hamming Distance

Distances From DNA Sequences

Distances From Allele Sharing

Distances From Microsatellites

Summary by Groups

Sliding Windows

DNA Sequences

Summaries With Genomic Positions

Package SNPRelate

Multivariate Methods

Matrix Decomposition

Eigendecomposition

Singular Value Decomposition

Power Method and Random Matrices

Principal Component Analysis

adegenet

SNPRelate

flashpcaR

Multidimensional Scaling

Case Studies

Mitochondrial Genomes of the Asiatic Golden Cat

Complete Genomes of the Fruit Fly

Human Genomes

Influenza H1N1 Virus Sequences

Jaguar Microsatellites

Bacterial Whole Genome Sequences

Metabarcoding of Fish Communities

Exercises

6. Linkage Disequilibrium and Haplotype Structure

Why Linkage Disequilibrium is Important?

Linkage Disequilibrium: Two Loci

Phased Genotypes

Theoretical Background

Implementation in pegas

Unphased Genotypes

More Than Two Loci

Haplotypes From Unphased Genotypes

The Expectation–Maximization Algorithm

Implementation in haplo.stats

Locus-Specific Imputation

Maps of Linkage Disequilibrium

Phased Genotypes With pegas

SNPRelate

snpStats

Case Studies

Complete Genomes of the Fruit Fly

Human Genomes

Jaguar Microsatellites

Exercises

7. Population Genetic Structure

Hardy–Weinberg Equilibrium

F-Statistics

Theoretical Background

Implementations in pegas and in mmod

Implementations in snpStats and in SNPRelate

Trees and Networks

Minimum Spanning Trees and Networks

Statistical Parsimony

Median Networks

Phylogenetic Trees

Multivariate Methods

Principles of Discriminant Analysis

Discriminant Analysis of Principal Components

Clustering

Maximum Likelihood Methods

Bayesian Clustering

Admixture

Likelihood Method

Principal Component Analysis of Coancestry

A Second Look at F-Statistics

Case Studies

Mitochondrial Genomes of the Asiatic Golden Cat

Complete Genomes of the Fruit Fly

Influenza H1N1 Virus Sequences

Jaguar Microsatellites

Exercises

8. Geographical Structure

Geographical Data in R

Packages and Classes

Calculating Geographical Distances

A Third Look at F-Statistics

Hierarchical Components of Genetic Diversity

Analysis of Molecular Variance

Moran I and Spatial Autocorrelation

Spatial Principal Component Analysis

Finding Boundaries Between Populations

Spatial Ancestry (tess3r)

Bayesian Methods (Geneland)

Case Studies

Complete Genomes of the Fruit Fly

Human Genomes

Exercises

9. Past Demographic Events

The Coalescent

The Standard Coalescent

The Sequential Markovian Coalescent

Simulation of Coalescent Data

Estimation of θ

Heterozygosity

Number of Alleles

Segregating Sites

Microsatellites

Trees

Coalescent-Based Inference

Maximum Likelihood Methods

Analysis of Markov Chain Monte Carlo Outputs

Skyline Plots

Bayesian Methods

Heterochronous Samples

Site Frequency Spectrum Methods

The Stairway Method

CubSFS

Popsicle

Whole-Genome Methods (psmcr)

Case Studies

Mitochondrial Genomes of the Asiatic Golden Cat

Complete Genomes of the Fruit Fly

Influenza H1N1 Virus Sequences

Bacterial Whole Genome Sequences

Exercises

10. Natural Selection

Testing Neutrality

Simple Tests

Selection in Protein-Coding Sequences

Selection Scans

A Fourth Look at F-Statistics

Association Studies (LEA)

Principal Component Analysis (pcadapt)

Scans for Selection With Extended Haplotypes

FST Outliers

Time-Series of Allele Frequencies

Case Studies

Mitochondrial Genomes of the Asiatic Golden Cat

Complete Genomes of the Fruit Fly

Influenza H1N1 Virus Sequences

Exercises

A Installing R Packages

B Compressing Large Sequence Files

C Sampling of Alleles in a Population