Title: | Wavelet Analysis of Genomic Data from Admixed Populations |
---|---|
Description: | Implements wavelet-based approaches for describing population admixture. Principal Components Analysis (PCA) is used to define the population structure and produce a localized admixture signal for each individual. Wavelet summaries of the PCA output describe variation present in the data and can be related to population-level demographic processes. For more details, see J Sanderson, H Sudoyo, TM Karafet, MF Hammer and MP Cox. 2015. Reconstructing past admixture processes from local genomic ancestry using wavelet transformation. Genetics 200:469-481 <doi:10.1534/genetics.115.176842>. |
Authors: | Jean Sanderson |
Maintainer: | Murray Cox <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.3 |
Built: | 2024-11-14 04:13:36 UTC |
Source: | https://github.com/cran/adwave |
The example dataset provides simulated allele calls for 50 individuals from 3 populations. We consider a simple admixture scenario where two ancestral populations, popA (sample size = 15) and popB (sample size
= 15), merged 160 generations ago to form the admixed population, popAB (sample size
= 20). The ancestral populations contributed to the admixed population with probability 0.5.
data(admix)
data(admix)
A list with the following entries:
data
: 3,000 x 50 data matrix with 3,000 genotype calls (rows) for 50 individuals (columns). Raw entries take the value 0 if heterozygous, and -1 or 1 if homozygous. Polymorphisms are ordered by their physical positions. Column names list individual IDs, row names list polymorphism IDs.
map
: 3,000 x 2 data matrix providing a mapping to genetic distance (‘recombination distance’) for each polymorphism.
colplot
: vector of length 50. Plotting color for each individual.
populations
: list of length 3. Character vectors giving the IDs of individuals in each population. IDs must map to the column names of the data matrix.
Further description of the dataset can be found in Sanderson et al. (2015). The data were simulated using MaCS (Chen et al. 2009).
Sanderson J, H Sudoyo, TM Karafet, MF Hammer and MP Cox. 2015. Reconstructing past admixture processes from local genomic ancestry using wavelet transformation. Genetics 200:469-481. https://doi.org/10.1534/genetics.115.176842
Chen GK, P Marjoram and JD Wall. 2009. Fast and flexible simulation of DNA sequence data. Genome Research 19:136-142. https://doi.org/10.1101/gr.083634.108
data(admix) str(admix)
data(admix) str(admix)
Plotting function for objects of class adsig
.
plotsignal(x, ind = NULL, popA = NULL, popB = NULL, xlab = NULL, ylab = NULL, ylim = NULL, main = NULL)
plotsignal(x, ind = NULL, popA = NULL, popB = NULL, xlab = NULL, ylab = NULL, ylim = NULL, main = NULL)
x |
object of class |
ind |
character giving ID of a single individual to plot. |
popA |
(optional) name of ancestral population 1. |
popB |
(optional) name of ancestral population 2. |
xlab |
(optional) character string for x axis label. |
ylab |
(optional) character string for y axis label. |
ylim |
(optional) vector giving plotting range for y axis. |
main |
(optional) character string for title. |
Produces figure.
Jean Sanderson
Sanderson J, H Sudoyo, TM Karafet, MF Hammer and MP Cox. 2015. Reconstructing past admixture processes from local genomic ancestry using wavelet transformation. Genetics 200:469-481. https://doi.org/10.1534/genetics.115.176842
data(admix) # Generate the admixture signal with windowing AdexPCA2 <- signal(admix$data,popA="popA",popB="popB",populations=admix$populations, tol=0.001,n.signal=1000,window.size=0.01) # Plot resulting admixture signal for one individual plotsignal(AdexPCA2,ind="AD00001",popA=AdexPCA2$popA,popB=AdexPCA2$popB)
data(admix) # Generate the admixture signal with windowing AdexPCA2 <- signal(admix$data,popA="popA",popB="popB",populations=admix$populations, tol=0.001,n.signal=1000,window.size=0.01) # Plot resulting admixture signal for one individual plotsignal(AdexPCA2,ind="AD00001",popA=AdexPCA2$popA,popB=AdexPCA2$popB)
Produces estimates of localized ancestry for each individual.
signal(table, who = colnames(table), populations, popA = NA, popB = NA, normalize = FALSE, n.pca = 5, PCAonly = FALSE, verbose = TRUE, tol = 0.001, n.signal = NULL, window.size = NULL, genmap = NULL)
signal(table, who = colnames(table), populations, popA = NA, popB = NA, normalize = FALSE, n.pca = 5, PCAonly = FALSE, verbose = TRUE, tol = 0.001, n.signal = NULL, window.size = NULL, genmap = NULL)
table |
matrix of genotype calls (rows, length T) versus individuals (columns, length n). |
who |
individuals to include in the analysis. |
populations |
list containing a vector of IDs for each population in the analysis. |
popA |
name of ancestral population 1 (used for forming the axes of variation). Must match one of the names in |
popB |
name of ancestral population 2 (used for forming the axes of variation). Must match one of the names in |
normalize |
if |
n.pca |
number of PCA axes to compute (only the first principal component is used for forming the signals, but additional components may be desired for visualization). Default is 5. |
PCAonly |
if |
verbose |
if |
tol |
tolerance for normalization of admixture signals ( |
n.signal |
(optional) number of data points in the windowed signal. |
window.size |
(optional) size of window specified as a proportion of total length; |
genmap |
(optional) genetic distance of genotype calls, supplied as vector of length T. If specified, signals will be formulated in terms of genetic distance along the chromosome (rather than physical position). |
Applies PCA to genome-wide data using ancestral reference populations. The first eigenvector reflects the population structure. All individuals are then projected on to this axis to form the SNP-level admixture signals. PCA scores are used to estimate the proportion of admixture at the level of individuals (indP
) and populations (popP
). There is no restriction on the length of the data (number of SNPs) and the default is to provide an estimate of localized ancestry at each SNP.
Optionally, it is also possible to window the signals, producing processed signals of length n.signal
. The windows may be overlapping or disjoint with width specified through the window.size
option (see examples). If genmap
is specified, the signals will be formulated in terms of genetic distance along the chromosome (note: this function is not described in the accompanying paper).
Returns an object of class adsig
, a list with the following components:
call |
function call. |
date |
date of function call. |
individuals |
individuals for whom projections on the first principal component are calculated. |
n.snps |
number of polymorphisms in the table. |
signals |
The admixture signals, output as a |
n.tol |
the number of entries replaced by zero in the normalization procedure. This is dependent on the value set for the tolerance, tol. |
popP |
estimated proportion of admixture for each population. |
indP |
estimated proportion of admixture for each individual. |
pa.ind |
columns are principal axes in individual coordinates ( |
pa.snp |
columns are principal axes in polymorphism coordinates (T rows, |
G |
matrix of quadratic form in individual coordinates. |
ev |
vector of eigenvalues. |
gendist |
(only if |
Jean Sanderson
Sanderson J, H Sudoyo, TM Karafet, MF Hammer and MP Cox. 2015. Reconstructing past admixture processes from local genomic ancestry using wavelet transformation. Genetics 200:469-481. https://doi.org/10.1534/genetics.115.176842
data(admix) # EXAMPLE 1 # Generate the admixture signal AdexPCA <- signal(admix$data,popA="popA",popB="popB",populations=admix$populations,tol=0.001, n.signal=NULL) # Plot the resulting PCA plot(AdexPCA$pc.ind[,1],AdexPCA$pc.ind[,2],col=admix$colplot,xlab="PC1",ylab="PC2",pch=16) legend("bottomright",c("popA","popB","popAB"),col=c(3,4,2),pch=16) # EXAMPLE 2 # Generate the admixture signal with windowing AdexPCA2 <- signal(admix$data,popA="popA",popB="popB",populations=admix$populations,tol=0.001, n.signal=1000,window.size=0.01) # Plot resulting admixture signal for one individual plotsignal(AdexPCA2,ind="AD00001",popA=AdexPCA2$popA,popB=AdexPCA2$popB) # EXAMPLE 3 # Generate the admixture signal with windowing # As in EXAMPLE 2 but with n.signal reduced to 100 to provide disjoint windows AdexPCA3 <- signal(admix$data,popA="popA",popB="popB",populations=admix$populations,tol=0.001, n.signal=100,window.size=0.01) # Plot resulting admixture signal for one individual plotsignal(AdexPCA3,ind="AD00001",popA=AdexPCA2$popA,popB=AdexPCA2$popB) # EXAMPLE 4 # Generate the admixture signal in terms of genetic distance # As in EXAMPLE 2 but with genmap specified so that signals are formulated using genetic distances AdexPCA4 <- signal(admix$data,popA="popA",popB="popB",populations=admix$populations,tol=0.001, n.signal=1000,window.size=0.01,genmap=admix$map[,2]) # Plot resulting admixture signal for one individual plotsignal(AdexPCA4,ind="AD00001",popA=AdexPCA4$popA,popB=AdexPCA4$popB)
data(admix) # EXAMPLE 1 # Generate the admixture signal AdexPCA <- signal(admix$data,popA="popA",popB="popB",populations=admix$populations,tol=0.001, n.signal=NULL) # Plot the resulting PCA plot(AdexPCA$pc.ind[,1],AdexPCA$pc.ind[,2],col=admix$colplot,xlab="PC1",ylab="PC2",pch=16) legend("bottomright",c("popA","popB","popAB"),col=c(3,4,2),pch=16) # EXAMPLE 2 # Generate the admixture signal with windowing AdexPCA2 <- signal(admix$data,popA="popA",popB="popB",populations=admix$populations,tol=0.001, n.signal=1000,window.size=0.01) # Plot resulting admixture signal for one individual plotsignal(AdexPCA2,ind="AD00001",popA=AdexPCA2$popA,popB=AdexPCA2$popB) # EXAMPLE 3 # Generate the admixture signal with windowing # As in EXAMPLE 2 but with n.signal reduced to 100 to provide disjoint windows AdexPCA3 <- signal(admix$data,popA="popA",popB="popB",populations=admix$populations,tol=0.001, n.signal=100,window.size=0.01) # Plot resulting admixture signal for one individual plotsignal(AdexPCA3,ind="AD00001",popA=AdexPCA2$popA,popB=AdexPCA2$popB) # EXAMPLE 4 # Generate the admixture signal in terms of genetic distance # As in EXAMPLE 2 but with genmap specified so that signals are formulated using genetic distances AdexPCA4 <- signal(admix$data,popA="popA",popB="popB",populations=admix$populations,tol=0.001, n.signal=1000,window.size=0.01,genmap=admix$map[,2]) # Plot resulting admixture signal for one individual plotsignal(AdexPCA4,ind="AD00001",popA=AdexPCA4$popA,popB=AdexPCA4$popB)
Produces wavelet summaries for each individual and group. Returns the wavelet variance and average block size metric (ABS).
wavesum(x, populations, popA = NA, popB = NA, ml = NULL, type = "la8", t.factor = 1, fullWT = FALSE)
wavesum(x, populations, popA = NA, popB = NA, ml = NULL, type = "la8", t.factor = 1, fullWT = FALSE)
x |
object of class |
populations |
list containing a vector of individual IDs for each population in the analysis. |
popA |
name of ancestral population 1 (used for forming the axes of variation). Must match one of the names in |
popB |
name of ancestral population 2 (used for forming the axes of variation). Must match one of the names in |
ml |
number of wavelet scales in the decomposition. Must not exceed |
type |
name of the wavelet to use in the decomposition. The default, “la8”, is Daubechies Least Asymmetric wavelet of length 8. Other options include “haar”. |
t.factor |
multiplicative factor for thresholding. See paper for details. Default is 1. |
fullWT |
if |
Produces wavelet summaries for objects of class adsig
. The function computes the wavelet variance for each individual and population, extracts the informative wavelet variance based on levels observed in the ancestral populations, and computes summary measures of average block size metric (ABS) and peak wavelet scale for each individual and population.
See waveslim
documentation for details of the modwt
function and alternative wavelet options.
The code returns a list with the following components:
n.ind |
number of individuals in the analysis. |
n.group |
number of groups in the analysis. |
rv.ind |
matrix of dimension |
rv.group |
matrix of dimension |
threshold |
vector of length |
iv.ind |
matrix of dimension |
iv.group |
matrix of dimension |
abs.ind |
vector of length |
abs.group |
vector of length |
pws.ind |
vector of length |
pws.group |
vector of length |
wtmatrix |
(only if |
wtmatrix.group |
(only if |
Jean Sanderson
Sanderson J, H Sudoyo, TM Karafet, MF Hammer and MP Cox. 2015. Reconstructing past admixture processes from local genomic ancestry using wavelet transformation. Genetics 200:469-481. https://doi.org/10.1534/genetics.115.176842
data(admix) # Generate the admixture signal AdexPCA <- signal(admix$data,popA="popA",popB="popB",populations=admix$populations, tol=0.001, n.signal=NULL) # Compute wavelet summaries WSN <- wavesum(AdexPCA,populations=admix$populations,popA="popA",popB="popB") # Plot raw wavelet variance for each population barplot(WSN$rv.group[3,],ylim=c(0,0.9),col="red", names.arg=1:11,border=NA) barplot(WSN$rv.group[1,],ylim=c(0,0.9),col="green3",names.arg=1:11,border=NA,add=TRUE) barplot(WSN$rv.group[2,],ylim=c(0,0.9),col="blue", names.arg=1:11,border=NA,add=TRUE) legend("topright",c("popA","popB","popAB"),col=c(3,4,2),pch=15) box() # Plot informative wavelet variance for admixed population barplot(WSN$iv.group[3,],ylim=c(0,0.15),col="red",names.arg=1:11,border=NA) ABS <- round(WSN$abs.group[3],2) text(11,0.13,paste("ABS=",ABS)) box()
data(admix) # Generate the admixture signal AdexPCA <- signal(admix$data,popA="popA",popB="popB",populations=admix$populations, tol=0.001, n.signal=NULL) # Compute wavelet summaries WSN <- wavesum(AdexPCA,populations=admix$populations,popA="popA",popB="popB") # Plot raw wavelet variance for each population barplot(WSN$rv.group[3,],ylim=c(0,0.9),col="red", names.arg=1:11,border=NA) barplot(WSN$rv.group[1,],ylim=c(0,0.9),col="green3",names.arg=1:11,border=NA,add=TRUE) barplot(WSN$rv.group[2,],ylim=c(0,0.9),col="blue", names.arg=1:11,border=NA,add=TRUE) legend("topright",c("popA","popB","popAB"),col=c(3,4,2),pch=15) box() # Plot informative wavelet variance for admixed population barplot(WSN$iv.group[3,],ylim=c(0,0.15),col="red",names.arg=1:11,border=NA) ABS <- round(WSN$abs.group[3],2) text(11,0.13,paste("ABS=",ABS)) box()