Home

Package `synbreed`

1. set 200 random chosen values to NA set seed 123 ind1 lt sample 1 nrow maize coded geno 200 ind2 lt sample 1 ncol maize coded geno 200 original lt maize coded geno cbind ind1 ind2 maize coded geno cbind ind1 ind2 lt NA imputing of missing values by family structure maize imputed lt codeGeno maize coded impute TRUE impute type family label heter NULL compare in a cross table imputed lt maize imputed geno cbind ind1 ind2 t1 lt table original imputed sum of correct replacements sum diag t1 sum t1 compare with random imputation maize random lt codeGeno maize coded impute TRUE impute type random label heter NULL imputed2 lt maize random genoLcbind ind1 ind2 t2 lt table original imputed2 sum of correct replacements sum diag t2 sum t2 End Not run create gpData Create genomic prediction data object create gpData 9 Description This function combines all raw data sources in a single unified data object of class gpData This is a list with elements for phenotypic genotypic marker map pedigree and further covariate data All elements are optional Usage create gpData pheno NULL geno NULL map NULL pedigree NULL family NULL covar NULL reorderMap TRUE map unit cM repeated NULL modCovar NULL Arguments pheno data frame with individuals organized in rows and traits organized in columns
2. Example from Henderson 1977 dat lt data frame y c 132 147 156 172 time c 1 2 1 2 animal c 1 2 3 4 ped lt create pedigree ID c 6 5 1 2 3 4 Parl c 0 0 5 5 1 6 Par2 c 0 0 0 0 6 2 gp lt create gpData pheno dat pedigree ped A lt kin gp ret add assuming h2 sigma2u sigma2u sigma2 0 5 no REML fit possible due to the limited number of observations y lt 132 147 156 172 names y lt paste 1 4 mod1 lt list fit list sigma c 1 1 kin A model BLUP y y m NULL class mod1 lt gpMod predict mod1 c 5 6 prediction by hand X lt matrix 1 ncol 1 nrow 4 Z lt diag 6 c 1 2 AI lt solve A RI lt diag 4 res lt MME X Z AI RI y res b res u 1 2 40 simul pedigree simul pedigree Simulation of pedigree structure Description This function can be used to simulate a pedigree for a given number of generations and individu als Function assumes random mating within generations Inbred individuals may be generated by chance Usage simul pedigree generations 2 ids 4 animals FALSE familySize 1 Arguments generations integer Number of generations to simulate ids integer or vector of integers Number of genotypes in each generation If length equal one the same number will be replicated and used for each genera tion animals logical Should a pedigree for animals be simulated no inbreeding See Details familySize numeric
3. not used Author s Valentin Wimmer Examples plant pedigree ped lt simul pedigree gener 4 7 summary ped animal pedigree ped lt simul pedigree gener 4 7 animals TRUE summary ped summary relationship Matrix 45 summary relationshipMatrix Summary of relationship matrices Description Summary method for class relationshipMatrix Usage S3 method for class relationshipMatrix summary object Arguments object object of class relationshipMatrix not used Author s Valentin Wimmer Examples Not run data maize U lt kin codeGeno maize ret realized summary U End Not run summaryGenMap Summary of marker map information Description This function can be used to summarize information from a marker map in an object of class gpData Return value is a data frame with one row for each chromosome and one row summarizing all chromosomes Usage summaryGenMap map Arguments map data frame with columns chr and pos or a gpData object with element map 46 write beagle Details Summary statistics of differences are based on euclidian distances between markers with non missing position in map i e pos NA Value A data frame with one row for each chromosome and the intersection of all chromosomes and columns noM number of markers range range of positions i e difference between first and last marker avDist avarage distance of
4. pdf you can decide if you like to have all graph ics in one file or in multiple files Further arguments for plot Author s Hans Juergen Auinger Theresa Albrecht and Valentin Wimmer References For nonlinear regression curve Hill WG Weir BS 1988 Variances and covariances of squared linkage disequilibria in finite populations Theor Popul Biol 33 54 78 26 LDMap See Also pairwiseLD LDMap Examples Not run maize data example data maize maizeC lt codeGeno maize LD for chr 1 maizeLD lt pairwiseLD maizeC chr 1 type data frame scatterplot LDDist maizeLD type p pch 19 col hsv alpha 0 1 v 0 stacked bars with default categories LDDist maizeLD type bars stacked bars with user defined categories LDDist maizeLD type bars breaks list dist c 0 10 20 40 60 180 r2 c 1 0 6 0 4 0 3 0 1 0 End Not run LDMap LD Heatmap Description Visualization of pairwise Linkage Disequilibrium LD estimates generated by function pairwiseLD in a LD heatmap for each chromosome using the LDheatmap package Shin et al 2006 Usage LDMap LDmat gpData chr NULL file NULL fileFormat pdf onefile TRUE Arguments LDmat Object of class LDmat generated by function pairwiseLD and argument type matrix gpData Object of class gpData that was used in pairwiseLD chr numeric Return value is a plot for each chromosome in chr file Optionally a path to a file w
5. For unrepeated measures unique rownames should identify individuals For re peated measures the first column identifies individuals and a second column indicates repetitions see also argument repeated geno matrix with individuals organized in rows and markers organized in columns Genotypes could be coded arbitrarily Missing values should be coded as NA Colums or rows with only missing values not allowed Unique rownames iden tify individuals and unique colnames markers If no rownames are available they are taken from element pheno if available and if dimension matches If no colnames are used the rownames of map are used if dimension matches map data frame with one row for each marker and two columns named chr and pos First columns gives the chromosome numeric or character but not factor and second column the position on the chromosome in centimorgan or the physical distance relative to the reference sequence in basepairs Unique rownames indicate the marker names which should match with marker names in geno Note that order and number of markers must not be identical with the order in geno If this is the case gaps in the map are filled with NA to ensure the same number and order as in element geno of the resulting gpData object pedigree Object of class pedigree family data frame assigning individuals to families with names of individuals in rownames This information could be used for replacing of missing values with funct
6. frame with columns marker1 marker2 r2 and distance for all p p 1 2 marker pairs or thinned see Details For type matrix an object of class LDmat with one element for each chromosome is returned Each element is a list of 2 a p x p matrix with pairwise LD and the corresponding p x p matrix with pairwise distances Author s Valentin Wimmer References Hill WG Robertson A 1968 Linkage Disequilibrium in Finite Populations Theoretical and Applied Genetics 6 38 226 231 Purcell S Neale B Todd Brown K Thomas L Ferreira MAR Bender D Maller J Sklar P de Bakker PIW Daly MJ amp Sham PC 2007 PLINK a toolset for whole genome association and population based linkage analysis American Journal of Human Genetics 81 See Also LDDist LDMap Examples Not run data maize maizeC lt codeGeno maize maizeLD lt pairwiseLD maizeC chr 1 type data frame End Not run 32 plot LDdf plot LDdf Plot function for class LDdf Description The function visualises wheter the LD between adjacent values or visualization of pairwise Linkage Disequilibrium LD estimates generated by function pairwiseLD versus marker distance A single plot is generated for every chromosome Usage S3 method for class LDdf plot x gpData plotType dist dense FALSE nMarker TRUE centr NULL Arguments xX gpData plotType dense nMarker centr chr type br
7. model The model type see Arguments y The phenotypic records for the individuals in the training set g The predicted genetic values for the individuals in the training set m Predicted SNP effects if available kin Matrix kin Note The verbose output of the BLR function is written to a file BLRout txt in the working directory to prevent the screen output from overload Author s Valentin Wimmer Hans Juergen Auinger and Theresa Albrecht References Clifford D McCullagh P 2012 regress Gaussian Linear Models with Linear Covariance Struc ture R package version 1 3 8 URL http www csiro au Gustavo de los Campos and Paulino Perez Rodriguez 2010 BLR Bayesian Linear Regression R package version 1 2 http CRAN R project org package BLR See Also kin crossVal 22 kin Examples Not run data maize maizeC lt codeGeno maize pedigree based expected kinship matrix K lt kin maizeC ret kin DH maize covar DH marker based realized relationship matrix divide by an additional factor 2 because for testcross prediction the kinship of DH lines is used lt kin maizeC ret realized 2 BLUP models P BLUP mod1 lt gpMod maizeC model BLUP kin K G BLUP mod2 lt gpMod maizeC model BLUP kin U te th Co te Bayesian Lasso prior lt list varE list df 3 S 35 lambda list shape 0 52 rate 1e 4 value 20 type random mod3 lt gpMod maizeC model BL prior pr
8. ncol maize geno replace TRUE nrow 1 rownames newDHgeno lt newDH new pedigree newDHpedigree lt data frame ID newDH Par1 0 Par2 0 gener 0 new covar information newDHcovar lt data frame family NA DH 1 tbv 1000 row names newDH add individual Maize2 lt add individuals maize newDHpheno newDHgeno newDHpedigree newDHcovar summary maize2 add markers Add new markers to an object of class gpData Description This function adds new markers to the element geno of an object of class gpData and updates the marker map Usage add markers gpData geno map 4 add markers Arguments gpData object of class gpData to be updated geno matrix with new columns map data frame with columns chr and pos for new markers Details rownames in argument geno must match rownames in the element geno object of class gpData Value object of class gpData with new markers Author s Valentin Wimmer See Also add individuals discard markers Examples creating gpData object phenotypic data pheno lt data frame Yield rnorm 10 100 5 Height rnorm 10 10 1 rownames pheno lt 1 10 genotypic data geno lt matrix sample c 1 0 2 NA size 120 replace TRUE prob c 0 6 0 2 0 1 0 1 nrow 10 rownames geno lt 1 10 genetic map map lt data frame chr rep 1 3 each 4 pos rep 1 12 colnames geno lt rownames map lt paste M 1 12 sep
9. 1 0 1 nrow n rownames geno lt letters n 1 colnames geno lt paste M 1 12 sep create pedigree genetic map 11 one SNP is not mapped M5 and will therefore be removed map lt data frame chr rep 1 3 each 4 pos rep 1 12 map lt map 5 rownames map lt paste M c 1 4 6 12 sep simulate pedigree ped lt simul pedigree 3 c 3 3 n 6 combine in one object gp lt create gpData pheno geno map ped summary gp 9 plants with 2 traits 3 replcations n lt 9 pheno lt data frame ID rep letters 1 n 3 rep rep 1 3 each n Yield rnorm 3 n 200 5 Height rnorm 3 n 100 1 combine in one object gp2 lt create gpData pheno geno map repeated rep summary gp2 create pedigree Create pedigree object Description This function can be used to create a pedigree object Usage create pedigree ID Parl Par2 gener NULL sex NULL add ancestors FALSE Arguments ID Parl Par2 gener sex add ancestors Details vector of unique IDs identifying individuals vector of IDs identifying parent 1 with animals sire vector of IDs identifying parent 2 with animals dam vector identifying the generation If NULL gener will be 0 for unknown parents and max gener Par1 gener Par2 1 for generations 1 vector identifying the sex female 0 and male 1 logical Add ancestors which do not occur in ID to the pedigree
10. Missing values for parents in the pedigree should be coded with 0 for numeric ID or NA for character ID Value An object of class pedigree Column gener starts from 0 and pedigree is sorted by generation 12 cross Val Author s Valentin Wimmer See Also plot pedigree Examples example with 9 individuals id lt 1 9 par1 lt c 0 0 0 0 1 1 1 4 7 par2 lt c 0 0 0 0 2 3 2 5 8 gener lt c 0 0 0 0 1 1 1 2 3 create pedigree object using argument gener ped lt create pedigree id par1 par2 gener ped plot ped create pedigree object without using argument gener ped2 lt create pedigree id par1 par2 ped2 crossVal Cross validation of different prediction models Description Function for the application of the cross validation procedure on prediction models with fixed and random effects Covariance matrices must be committed to the function and variance components can be committed or reestimated with ASReml or the BLR function Usage crossVal gpData trait 1 cov matrix NULL k 2 Rep 1 Seed NULL sampling c random within popStruc across popStruc commit TS NULL ES NULL varComp NULL popStruc NULL VC est c commit ASReml BRR BL verbose FALSE Arguments gpData Object of class gpData trait numeric or character The name or number of the trait in the gpData object to be used as trait cov matrix list including covariance matr
11. Number of individuals in each full sib family in the last generation Details If animals FALSE the parents for the current generation will be randomly chosen out of the geno types in the last generation If Parl Par2 an inbreed is generated If animal TRUE each ID is either sire or dam Each ID is progeny of one sire and one dam Value An object of class pedigree with N sum ids genotypes Author s Valentin Wimmer See Also simul phenotype create pedigree plot pedigree simul phenotype 41 Examples example for plants ped lt simul pedigree gener 4 ids c 3 5 8 8 plot ped example for animals peda lt simul pedigree gener 4 ids c 3 5 8 8 animals TRUE plot peda simul phenotype Simulation of a field trial with single trait Description Simulates observations from a field trial using an animal model The field trial consists of multiple locations and randomized complete block design within locations A single quantitative trait is simulated according to the model Trait id A block loc e Usage simul phenotype pedigree NULL A NULL mu 100 ve NULL Nloc 1 Nrepl 1 Arguments pedigree object of class pedigree A object of class relationshipMatrix mu numeric Overall mean of the trait ve list containing the variance components vc consists of elements sigma2e sigma2a sigma21 sigma2b with the variance components of the residual the additive genetic effect the
12. as gpData object gp lt create gpData pheno geno map new data geno2 lt matrix c 0 0 1 1 1 2 2 1 1 2 1 2 0 2 1 1 1 2 2 2 ncol 2 rownames geno2 lt 1 10 map2 lt data frame pos c 0 3 5 chr c 1 2 rownames map2 lt colnames geno2 lt c M13 M14 adding new markers gp2 lt add markers gp geno2 map2 summary gp2 summary gp codeGeno 5 codeGeno Recode genotypic data imputation of missing values and preselection of markers Description This function combines all algorithms for processing of marker data within synbreed package Raw marker data is a matrix with elements of arbitrary format e g alleles coded as pair of observed alleles A T G C or by genotypes AA BB AB The function is limited to biallelic markers with a maximum of 3 genotypes per locus Raw data is recoded into the number of copies of the minor allele i e 0 1 and 2 Imputation of missing values can be done by random sampling from allele distribution the Beagle software or family information see details Additional preselection Of markers can be carried out according to the minor allele frequency and or fraction of missing values Usage codeGeno gpData impute FALSE impute type c random family beagle beagleAfterFamily fix replace value NULL maf NULL nmiss NULL label heter AB keep identical TRUE verbose FALSE minFam 5 showBeagleOutput FALSE tester NULL print report F
13. effects or numeric vector of marker effects to plot gpData object of class gpData with map position colored logical Color the chromosomes The default is FALSE with chromosomes distinguished by grey tones 28 MME add If TRUE the plot is added to an existing plot The default is FALSE pch a vector of plotting characters or symbols see points The default is an open circle ylab a title for the y axis see title further arguments for function plot Author s Valentin Wimmer Examples Not run data mice plot only random noise b lt rexp ncol mice geno 3 manhattanPlot b mice End Not run MME Mixed Model Equations Description Set up Mixed Model Equations for given design matrices i e variance components for random effects must be known Usage MME X Z GI RI y Arguments X Design matrix for fixed effects Z Design matrix for random effects GI Inverse of estimated variance covariance matrix of random genetic effects multplied by the ratio of residual to genetic variance RI Inverse of estimated variance covariance matrix of residuals without multi plying with a constant i e 0 y Vector of phenotypic records Details The linear mixed model is given by y Xb Zu e MME 29 with u N 0 G and e N 0 R Solutions for fixed effects b and random effects u are obtained by solving the corresponding mixed model equations Henderson 1984 x R X X R71Z BY XR ly ZRAIX
14. markers maxDist maximum distance of markers minDist minimum distance of markers Author s Valentin Wimmer See Also create gpData Examples Not run data maize summaryGenMap maize End Not run write beagle Prepare genotypic data for Beagle Description Create input file for Beagle software Browning and Browning 2009 from an object of class gpData This function is created for usage within function codeGeno to impute missing values Usage write beagle gp wdir getwd prefix Arguments gp gpData object with elements geno and map wdir character Directory for Beagle input files prefix character Prefix for Beagle input files write plink 47 Details The Beagle software must be used chromosomewise Consequently gp should contain only data from one chromosome use discard markers see Examples Value No value is returned Function creates files prefix ingput bgl with genotypic data in Beagle input format and prefix marker txt with marker information used by Beagle Author s Valentin Wimmer References BL Browning and S R Browning 2009 A unified approach to genotype imputation and haplotype phase inference for large data sets of trios and unrelated individuals Am J Hum Genet 84 210 22 See Also codeGeno Examples map lt data frame chr c 1 1 1 1 1 2 2 2 2 pos 1 9 geno lt matrix sample c 0 1 2 NA size 10 9 replace TRUE nrow 10 ncol 9 colnames geno lt row
15. of an object of class gpData Use function discard individuals to discard individuals from elements covar pheno geno and pedigree of an object of class gpData Usage discard markers gpData which discard individuals gpData which keepPedigree FALSE Arguments gpData object of class gpData which character vector either identifying the colnames of markers in geno to discard function discard markers or the rownames of individuals to discard func tion discard individuals keepPedigree logical Should the individual only be removed from elements geno and pheno but kept in the pedigree Value Object of class gpData Author s Valentin Wimmer and Hans Juergen Auinger 16 gpData2cross See Also create gpData Examples Not run example data set seed 311 pheno lt data frame Yield rnorm 10 200 5 Height rnorm 10 100 1 rownames pheno lt letters 1 10 geno lt matrix sample c A A B B NA size 120 replace TRUE prob c 0 6 0 2 0 1 0 1 nrow 10 rownames geno lt letters 1 10 colnames geno lt paste M 1 12 sep one SNP is not mapped M5 map lt data frame chr rep 1 3 each 4 pos rep 1 12 map lt map 5 rownames map lt paste M c 1 4 6 12 sep gp lt create gpData pheno pheno geno geno map map summary gp remove unmapped SNP M5 which has no postion in the map gp2 lt discard markers gp M5 summary gp2 discard ge
16. to plot on the x axis Other arguments for function igraph plotting Details The pedigree is structured top to bottom The first generation is printed in the first line Links over more than one generation are possible as well as genotypes with only one known parent Usually no structure in one generation is plotted If an effect is given the genotypes are ordered by this effect in the horizontal direction and a labeled axis is plotted at the bottom Value A named graph visualizing the pedigree structure Color is used to distinguish sex Note This function uses the plotting method for graphs in the library igraph Author s Valentin Wimmer and Hans Juergen Auinger See Also create pedigree simul pedigree plot relationshipMatrix 35 Examples example with 9 individuals id lt 1 9 parl lt c 0 0 0 0 1 1 1 4 7 par2 lt c 0 0 0 0 2 3 2 5 8 gener lt c 0 0 0 0 1 1 1 2 3 create pedigree object ped lt create pedigree id par1 par2 gener plot ped plot relationshipMatrix Heatmap for relationship Matrix Description Visualization for objects of class relationshipMatrix using a heatmap of pairwise relatedness coefficients Usage S3 method for class relationshipMatrix plot x Arguments x Object of class relationshipMatrix further graphical arguments passed to function levelplot in package lattice To create equal colorkeys for two heatmaps use at seq from to length 9 Auth
17. with u N 0 Go gives the mixed model equations X X X Z b X y ZX WZ G 1 u g Z y 14 cross Val Value An object of class 1ist with following items bu Estimated fixed and random effects of each fold within each replication n DS Size of the data set ES TS in each fold y TS Predicted values of all test sets within each replication n TS Size of the test set in each fold id TS List of IDs of each test sets within a list of each replication PredAbi Predictive ability of each fold within each replication calculated as correlation coefficient r yrs rs rankCor Spearman s rank correlation of each fold within each replication calculated be tween yrs and Yrs mse Mean squared error of each fold within each replication calculated between yrs and UTS bias Regression coefficients of a regression of the observed values on the predicted values in the TS A regression coefficient lt 1 implies inflation of predicted values and a coefficient of gt 1 deflation of predicted values m10 Mean of observed values for the 10 best predicted of each replication The k test sets are pooled within each replication k Number of folds Rep Replications sampling Sampling method Seed Seed for set seed rep seed Calculated seeds for each replication nr ranEff Number of random effects VC est method Method for the variance components committed or reestimated with ASReml BRR BL Author s Theresa Albrecht Refer
18. A SE OS Be a 47 write relationshipMatrix 2 2 ee 48 GenMap s 206 bbe a bE a SA a A ee de ae 50 relationshipMatrix 2 ee 50 Index 51 add individuals Add new individuals to objects of class gpData Description This function extends an object of class gpData by adding new phenotypes genotypes and pedigree Usage add individuals gpData pheno NULL geno NULL pedigree NULL covar NULL repl NULL Arguments gpData object of class gpData to be updated pheno data frame with new rows for phenotypes with rownames indicating individu als For repeated values the ID should be stored in a column with name ID geno matrix with new rows for genotypic data with rownames indicating individuals add markers 3 pedigree data frame with new rows for pedigree data covar data frame with new rows for covar information with rownames indicating individuals repl The column of the pheno data frame for the replicated measures If the values are not repeated or this column is named rep1 this argument is not needed Details colnames in geno pheno and pedigree must match existing names in gpData object Value object of class gpData with new individuals Author s Valentin Wimmer See Also add markers discard individuals Examples add one new DH line to maize data data maize newDHpheno lt data frame Trait 1000 row names newDH simulating genotypic data newDHgeno lt matrix sample c 0 1
19. ALSE Arguments gpData object of class gpData with arbitrary coding in element geno Missing values have to be coded as NA impute logical Should missing value be replaced by imputing impute type character with one out of fix random family beagle beagleAfterFamily default random See details replace value numeric scalar to replace missing values in case impute type fix maf numeric scalar Threshold to discard markers due to the minor allele frequency MAF Markers with a MAF lt maf are discarded thus maf in 0 0 5 If map in gpData is available markers are also removed from map nmiss numeric scalar Markers with more than nmiss fraction of missing values are discarded thus nmiss in 0 1 If map in gpData is available markers are also removed from map label heter This is either a scalar or vector of characters to identify heterozygous genotypes or a function returning TRUE if an element of the marker matrix is the heterozy gous genotype Defining a function is useful if number of unique heterozygous genotypes is large i e if genotypes are coded by alleles If the heterozygous genotype is coded like A T G C then label heter alleleCoding can be used Note that heterozygous values must be identified unambiguously by label heter Use label heter NULL if there are only homozygous geno types i e in DH lines to speed up computation and restrict imputation to values 0 and 2 keep
20. Package synbreed September 26 2012 Type Package Title Framework for the analysis of genomic prediction data using R Version 0 9 4 Date 2012 09 18 Author Valentin Wimmer Theresa Albrecht Hans Juergen Auinger Chris Carolin Schoen with con tributions by Larry Schaeffer Malena Erbe Ulrike Ober and Christian Reimer Depends R gt 2 14 lattice igraphO MASS LDheatmap qtl doBy BLR regress gt 1 3 8 abind synbreedData gt 1 3 Maintainer Valentin Wimmer lt Valentin Wimmer wzw tum de gt Description The package was developed within the Synbreed project for synergistic plant and ani mal breeding www synbreed tum de It contains a collection of functions required for ge nomic prediction in both plant and animal breeding This covers data processing data visualiza tion and analysis All functions are embedded within the framework of a single unified data ob ject The implementation is flexible with respect to a wide range of data formats This re search was funded by the German Federal Ministry of Education and Re search BMBF within the AgroClustEr Synbreed Synergistic plant and animal breed ing FKZ 0315528A URL http synbreed r forge r project org License GPL 2 LazyLoad yes LazyData no ZipData no R topics documented add individuals uaea E a e a E aa e E a a E e de aS 2 add markers lt c c son occa ee 3 COLEGIO oa a be heck de e a dd dd desa da 3 create 2pDA a pos s sopa
21. ZRIZ G ZR ly Matrix on left hand side of mixed model equation is denoted by LHS and matrix on the right hand side of MME is denoted as RHS Generalized Inverse of LHS equals prediction error variance matrix Square root of diagonal values multiplied with 0 equals standard error of prediction Note that variance components for fixed and random effects are not estimated by this function but have to be specified by the user i e GT must be multiplied with shrinkage factor A Value A list with the following arguments b Estimations for fixed effects vector u Predictions for random effects vector LHS left hand side of MME RHS right hand side of MME C Generalized inverse of LHS This is the prediction error variance matrix SEP Standard error of prediction for fixed and random effects SST Sum of Squares Total SSR Sum of Squares due to Regression residuals Vector of residuals Author s Valentin Wimmer References Henderson C R 1984 Applications of Linear Models in Animal Breeding Univ of Guelph Guelph ON Canada See Also regress crossVal Examples Not run data maize realized kinship matrix maizeC lt codeGeno maize U lt kin maizeC ret realized 2 solution with gpMod m lt gpMod maizeC kin U model BLUP gt solution with MME diag U lt diag U 0 000001 to avoid singularities 30 pairwiseLD determine shrinkage parameter lambda lt m fit sigma 2 m fit sigma 1
22. ait repl markerEffects fixed object of class gpData character Type of genomic prediction model BLUP indicates best linear unbiased prediction BLUP using REML for both pedigree based P BLUP and marker based G BLUP model BL and BRR indicate Bayesian Lasso and Bayesian Ridge Regression respectively object of class relationshipMatrix only required for model BLUP Use a pedigree based kinship to evaluate P BLUP or a marker based kinship to eval uate G BLUP For BL and BRR also a kinship structure may be used as additional polygenic effect u in the Bayesian regression models see BLR pack age logical If TRUE genetic values will be predicted for genotyped but not phe notyped individuals Default is FALSE Note that this option is only meaning ful for marker based models For pedigree based model please use function predict gpMod numeric or character A vector with names or numbers of the traits to fit the model numeric or character A vector with names or numbers of the repeated values of gpData pheno to fit the model logical Should marker effects be estimated for a G BLUP model i e RR BLUP Plose note that in this case also the variance components pertaining to model G BLUP are reported instead of those from the G BLUP model see vignette If the variance components are committed to crossVal it must be guaranteed that there also the RR BLUP model is used e g no cov matrix object sh
23. ap or a data frame with columns chr specifying the chromosome of the marker and pos position of the marker within chromosome measured with genetic or physical distances dense logical Should density visualization for high density genetic maps be used nMarker logical Print number of markers for each chromosome centr numeric vector Positions for the centromeres in the same order as chromo somes in map If maize centromere positions of maize in Mbp are used file Optionally a path to a file where the plot is saved to fileFormat character At the moment two file formats are supported pdf and png Default is pdf further graphical arguments for function plot Details The plot is similar to plotGenMap with the option dense TRUE but here the LD between adjacent markers is plotted along the chromosomes 38 predict gpMod Value Plot of neighbour LD along each chromosome One chromosome is displayed from the first to the last marker Author s Theresa Albrecht and Hans Juergen Auinger See Also plotGenMap pairwiseLD Examples Not run data maize maize2 lt codeGeno maize LD lt pairwiseLD maize2 chr 1 10 type matrix plotNeighbourLD LD maize2 nMarker FALSE End Not run predict gpMod Prediction for genomic prediction models Description S3 predict method for objects of class gpMod A genomic prediction model is used to predict the genetic performance for e g unphenotyped indi
24. ar in pheno geno and pedigree Two additional columns in covar named phenotyped and genotyped are automatically generated to identify individuals that appear in the corresponding gpData object Value Object of class gpData which is a list with the following elements covar pheno geno pedigree map phenoCovars info Note data frame with information on individuals array individuals x traits x replications with phenotypic data matrix marker matrix containing genotypic data Columns marker are in the same order as in map 1f reorderMap TRUE object of class pedigree data frame with columns chr and pos and markers sorted by pos within chr array with phenotypic covariates list with additional information on data coding of data unit in map In case of missing row names or column names in one item information is substituted from other el ements assuming the same order of individuals markers and a warning specifying the assumptions 1s returned Please check them carefully Author s Valentin Wimmer and Hans Juergen Auinger See Also codeGeno summary gpData gpData2data frame Examples set seed 123 9 plants with 2 traits n lt 9 only for n gt 6 pheno lt data frame Yield rnorm n 200 5 Height rnorm n 100 1 rownames pheno lt letters 1 n marker matrix geno lt matrix sample c AA AB BB NA Size n 12 replace TRUE prob c 0 6 0 2 0
25. ber of observations used to estimate LD Only required for type nls Optionally a path to a file where the plot is saved to character At the moment two file formats are supported pdf and png Default is pdf logical If fileFormat pdf you can decide if you like to have all graph ics in one file or in multiple files further graphical arguments for function plot plot LDmat 33 Details For more Details see at plotNeighbourLD or LDDist Author s Hans Juergen Auinger See Also plotNeighbourLD LDDist plotGenMap pairwiseLD plot LDmat Plot function for class LDmat Description A function to visualize Linkage Disequilibrium estimates between adjacent markers or isualization of pairwise Linkage Disequilibrium LD estimates generated by function pairwiseLD in a LD heatmap for each chromosome using the LDheatmap package Shin et al 2006 Usage S3 method for class LDmat plot x gpData plotType map dense FALSE nMarker TRUE centr NULL chr NULL file Arguments x Object of class LDmat i e the output of function pairwiseLD with argument type matrix gpData Object of class gpData with object map plotType You can decide if you like to have a plot with the LD of the neighbouring markers option neighbour or you like to have a heatmap of the LD default option map dense For plotType neighbour logical Should density visualization for high density genetic map
26. ce for large data sets of trios and unrelated individuals Am J Hum Genet 84 210 223 Examples create marker data for 9 SNPs and 10 homozygous individuals snp9 lt matrix c AA AA AA BB AA AA AA AA NA AA AA BB BB AA AA BB AA NA AA AA AB BB AB AA AA BB NA AA AA BB BB AA AA AA AA NA AA AA BB AB AA BB BB BB AB AA AA BB BB AA NA BB AA NA AB AA BB BB BB AA BB BB NA AA AA NA BB NA AA AA AA AA AA NA NA BB BB BB BBS BB AA AA NA AA BB BB BB AA AA NA ncol 9 byrow TRUE create gpData set names for markers and individuals colnames snp9 lt paste SNP 1 9 sep rownames snp9 lt paste ID 1 10 100 sep create object of class gpData gp lt create gpData geno snp9 code genotypic data gp coded lt codeGeno gp impute TRUE impute type random comparison gp coded geno gp geno example with heterogeneous stock mice Not run data mice summary mice heterozygous values must be labeled may run some seconds mice coded lt codeGeno mice label heter function x substr x 1 1 substr x 3 3 example with maize data and imputing by family data maize first only recode alleles maize coded lt codeGeno maize label heter NULL
27. cter A vector with the names or numbers of the trait that should be extracted from pheno Default is 1 scalar logical Only return phenotypic data scalar logical Include all individuals with phenotypes in the data frame and fill the genotypic data with NA scalar logical Include all individuals with genotypes in the data frame and fill the phenotypic data with NA character or numeric A vector which contains names or numbers of replica tion that should be drawn from the phenotypic values and covariates Default is NULL 1 e all values are used logical If TRUE columns with the phenotypic covariables are attached from element phenoCovars to the data frame Only required for repeated measure ments further arguments to be used in function reshape The argument times could be useful to rename the levels of the grouping variable such as locations or environments Argument all geno can be used to predict the genetic value of individuals without phenotypic records using the BLR package Here the genetic value of individuals with NA as phenotype is predicted by the marker profile For multiple measures phenotypic data in object gpData is arranged with replicates in an array With gpData2data frame this could be reshaped to long format with multiple observations in one column In this levels of the groupi are added Value case one column for the phenotype and 2 additional columns for the id and the ng variable suc
28. descent IBD Note that the diagonal elements of the gametic relationship matrix are 1 The off diagonals of individuals with unknown or unrelated parents in the pedigree are 0 If ret gam is specified the gametic relationship matrix constructed by pedigree is returned The gametic relationship matrix can be used to construct other types of relationship matrices If ret add the additive numerator relationship matrix is returned The additive relationship of individuals A alleles A1 A2 and B alleles B1 B2 is given by the entries of the gametic rela tionship matrix 0 5 A1 B1 41 B2 42 B1 42 B2 where 41 B1 denotes the element A1 B 1 in the gametic relationship matrix If ret kin the kinship matrix is returned which is half of the additive relationship matrix If ret dom the dominance relationship matrix is returned The dominance relationship matrix between individuals A 41 42 and B B1 B2 in case of no inbreeding is given by 41 B1 42 B2 41 B2 42 B1 where 41 C1 denotes the element A1 C1 in the gametic relationship matrix Marker based relatedness return arguments realized realizedAB sm and sm smin Function kin provides different types of measures for marker based relatedness An element geno must be available in the object of class gpData Furthermore genotypes must be coded by the number of copies of the minor allele i e function codeGeno must be a
29. e e a e ee aa 8 Cr eate pedisree gt s ge ki de ri AAA 11 Cross Val on ed a tad aware Biden Ace da ta amp od a ore Bk ee i 12 2 add individuals discatdmarkers 4 6604 5 a e a a ee 15 gpData2crOSS 2 6 s op a o RS RR Re a a 16 epData2data Irame 4 sa eed er is A A RA td 18 SpMOd a x a o a rara o ole eb dor bE Boe Gate oo 20 Ki is gie ehh ebed e a EE See ees 22 EDDISt ecu o Re Se SR Oe Spee e E Sho Bae Ee a ee 25 LDMap sran Gok BSS a ae AG a a a AS Ga ar ges a 26 manhattanPlot i 5 24 A a eR ea ea ee 21 MME s gti edie a p e a E a eh e a a a es 28 patrwiseleD 2 2 04 Faw eye oe A ti Ss A ASS A A 30 Plot EDA s sei ge ek os Dr Ga o ce ee be rd La Hs 32 plot LDmat 2468 rd e ls de a EE 39 plotpedigree s p rias OR E Se R E de 34 plot relationshipMatrix oaoa e 39 plotGenMap i oep ot e a RA e ee 36 plotNeighbourLD s po c 0 0 00 00 0020 000 37 predictgpMod s o e vi be osos oras ea a ee ee ee 38 Simul pedipre bn 5 5 4 Sa or a a e a a Got ae es 40 simul phenotype 2 a 41 summary cvData s 2444405 44 884 eee oe Ro See ee eee eee 42 Summary SpDala usaras RG Be a AG Ea wa AG Ba Oo 43 summary gpMod 2 2 2 0 00 43 summary pedigree spea r ea a k a aa a a e p a ah a a ia a p a aa 44 summary relationshipMatrix 0 0 000002 eee ee 45 summaryGenMap estira es eee See a we eS ee a ee God ew ook 45 Write beaple s 344 2 25 52 245 S244 0404 ii ES BAe eee 46 Wiite plNKk eo he ws Pe eS ee ee ee A
30. e used file Optionally a path to a file where the plot is saved to fileFormat character At the moment two file formats are supported pdf and png Default is pdf further graphical arguments for function plot Details In the low density plot the unique positions of markers are plotted as horizontal lines In the high density plot the distribution of the markers is visualized as a heatmap of density estimation together with a color key In this case the number of markers within an interval of equal bandwidth bw is counted The high density plot is typically useful if the number of markers exceeds 200 per chromosome on average Value Plot of the marker positions within each chromosome One chromosome is displayed from the first to the last marker Author s Valentin Wimmer and Hans Juergen Auinger plotNeighbourLD 37 See Also create gpData Examples Not run low density plot data maize plotGenMap maize high density plot data mice plotGenMap mice dense TRUE nMarker FALSE End Not run plotNeighbourLD Plot neighbour linkage disequilibrium Description A function to visualize Linkage Disequilibrium estimates between adjacent markers Usage plotNeighbourLD LD gpData dense FALSE nMarker TRUE centr NULL file NULL fileFormat pdf Arguments LD object of class LDmat i e the output of function pairwiseLD using argument type matrix gpData object of class gpData with object m
31. eaks file fileFormat onefile Object of class LDdf i e the output of function pairwiseLD with argument type data frame Object of class gpData with object map You can decide if you like to have a plot with the LD of the neighbouring markers option neighbour or you like to have a scatter plot of distance and LD default option dist For plotType neighbour logical Should density visualization for high density genetic maps be used For plotType neighbour logical Print number of markers for each chro mosome For plotType neighbour numeric vector Positions for the centromeres in the same order as chromosomes in map If maize centromere positions of maize in Mbp are used For plotType dist numeric scalar or vector Return value is a plot for each chromosome in chr Note Remember to add in a batch script one empty line for each chromosome if you use more than one chromosome For plotType dist character string to specify the type of plot Use p for a scatterplot bars for stacked bars or n1s for scatterplot together with nonlinear regression curve according to Hill and Weir 1988 For plotType dist list containing breaks for stacked bars optional only for type bars Components are dist with breaks for distance on x axis and r2 for breaks on for r2 on y axis By default 5 equal spaced categories for dist and r2 are used For plotType dist numeric Num
32. ences Albrecht T Wimmer V Auinger HJ Erbe M Knaak C Ouzunova M Simianer H Schoen CC 2011 Genome based prediction of testcross values in maize Theor Appl Genet 123 339 350 Mosier CI 1951 I Problems and design of cross validation 1 Educ Psychol Measurement 11 5 11 Crossa J de los Campos G Perez P Gianola D Burgueno J et al 2010 Prediction of genetic val ues of quantitative traits in plant breeding using pedigree and molecular markers Genetics 186 713 724 Gustavo de los Campos and Paulino Perez Rodriguez 2010 BLR Bayesian Linear Regression R package version 1 2 http CRAN R project org package BLR See Also summary cvData discard markers 15 Examples loading the maize data set Not run data maize Maize2 lt codeGeno maize U lt kin maize2 ret realized cross validation cv maize lt crossVal maize2 cov matrix list U k 5 Rep 1 Seed 123 sampling random varComp c 26 5282 48 5785 VC est commit cv maize2 lt crossVal maize2 k 5 Rep 1 Seed 123 sampling random varComp c 0 0704447 48 5785 VC est commit comparing results both are equal cv maize PredAbi cv maize2 PredAbi summary cv maize summary cv maize2 End Not run discard markers Subsets for objects of class gpData Description Both functions could be used to get subsets from an object of class gpData Use function discard markers to discard markers from elements geno and map
33. eno maize ret realized UL1 3 1 3 End Not run Index Topic IO write relationshipMatrix 48 Topic textasciitildekwd1 plot LDdf 32 plot LDmat 33 Topic textasciitildekwd2 plot LDdf 32 plot LDmat 33 Topic hplot LDDist 25 LDMap 26 manhattanPlot 27 plot pedigree 34 plot relationshipMatrix 35 plotGenMap 36 plotNeighbourLD 37 Topic manip add individuals 2 add markers 3 codeGeno 5 create gpData 8 create pedigree 11 discard markers 15 gpData2data frame 18 write beagle 46 write plink 47 Topic methods summary cvData 42 summary gpData 43 summary gpMod 43 summary pedigree 44 summary relationshipMatrix 45 GenMap 50 relationshipMatrix 50 add individuals 2 4 add markers 3 3 BLR 13 21 codeGeno 5 10 17 47 create gpData 8 16 17 19 37 46 create pedigree 11 34 40 cross2gpData gpData2cross 16 51 crossVal 12 21 29 43 discard individuals 3 discard individuals discard markers 15 discard markers 4 15 gpData2cross 16 gpData2data frame 0 18 gpMod 20 39 44 kin 21 22 LDDist 25 27 31 33 34 LDheatmap 27 LDMap 26 26 31 34 manhattanPlot 27 MME 28 pairwiseLD 26 27 30 33 34 38 48 plot GenMap plotGenMap 36 plot LDdf 32 plot LDmat 33 plot pedigree 12 34 40 plot relationshipMatrix 24 35 plotGenMap 33 34 36 38 plotNeighbourLD 33 34 37 points 28 predict gpMod 38 p
34. enotypes the less frequent genotype is coded as 2 and the remaining genotype as O Note that function codeGeno will terminate with an error whenever more than three genotypes are found 2 1 Discarding duplicated markers if keep identical FALSE before starting of the imputing step From identical marker based on pairwise complete oberservations one is discarded randomly For getting identical results use the function set seed before code geno 3 Replace missing values by replace value or impute missing values according to one of the following methods Imputing is done according to impute type family This option is only suitable for homozygous individuals such as doubled haploid lines structured in families Suppose an observation 2 is missing NA for a marker j in family k If marker j is fixed in family k the imputed value will be the fixed allele If marker j is segregating for the population k the value is 0 with probability of 0 5 and 2 with probability of 0 5 To use this algorithm family information has to be stored as variable family in list element covar of an object of class gpData This column should contain a character or numeric to identify family of all genotyped individuals beagle Use Beagle Genetic Analysis Software Package Browning and Browning 2007 2009 to infer missing genotypes If you use this option please cite the original papers in publications Beagle uses a HMM to reconstruct missing genotypes by the
35. es are imputed by replace value Note that only 0 1 or 2 should be chosen 4 Recoding of alleles after imputation if necessary due to changes in allele frequencies caused by the imputed alleles 5 Discarding markers with a minor allele frequency of lt maf 6 Discarding duplicated markers if keep identical FALSE From identical marker based on pair wise complete oberservations one is discarded randomly For getting identical results use the func tion set seed before code geno 7 Restoring original data format gpData matrix or data frame Information about imputing is reported after a call of codeGeno Note Beagle is included in the synbreed package Once required Beagle is called using path package Value An object of class gpData containing the recoded marker matrix If maf or nmiss were specified or keep identical FALSE dimension of geno and map may be reduced due to selection of mark ers The genotype which is homozygous for the minor allele is coded as 2 the other homozygous genotype is coded as 0 and heterozygous genotype is coded as 1 Author s Valentin Wimmer and Hans Juergen Auinger References S R Browning and B L Browning 2007 Rapid and accurate haplotype phasing and missing data inference for whole genome association studies using localized haplotype clustering Am J Hum Genet 81 1084 1097 B L Browning and S R Browning 2009 A unified approach to genotype imputation and haplotype phase inferen
36. flanking markers The beagle executive file beagle jar version 3 3 1 is in the directory exec of the package Function codeGeno will create a directory beagle for Beagle input and output files if it does not exist and run Beagle with default settings The information on marker position is taken from ele ment map Indeed the postion in map pos must be available for all markers By default three genotypes 0 1 2 are imputed To restrict the imputation only to homozygous genotypes use label heter NULL beagleAfterFamily In the first step missing genotypes are imputed according to the algo rithm with impute type family but only for markers that are fixed within the family Moreover markers with a missing position map pos NA are imputed using the algorithm of impute type family In the second step the remaining genotypes are imputed by Beagle codeGeno 7 random The missing values for a marker j are sampled from the marginal allele distribution of marker j With 2 possible genotypes to force this option use label heter NULL i e O and 2 values are sampled from distribution with probabilities P x 0 1 p and P x 2 p where p is the minor allele frequency of marker j In the standardd case of 3 genotypes i e with heterozygous genotypes values are sampled from distribution P x 0 1 p P x 1 p 1 p and P x 2 p assuming Hardy Weinberg equilibrium for all loci fix All missing valu
37. gt 1d threshold is reported when PLINK is used This argument can only be used for type data frame ld window numeric Window size for pairwise differences which will be reported by PLINK only for use plink TRUE argument 1d window kb in PLINK to thin the output dimensions Only SNP pairs with a distance lt 1d window are reported default 99999 rm unmapped logical Remove markers with unknown postion in map before using PLINK pairwiseLD 31 Details The function write plink is called to prepare the input files and the script for PLINK The ex ecutive PLINK file plink exe must be available e g in the working directory or through path variables The function pairwiseLD calls PLINK and reads the results The evaluation is per formed separately for every chromosome The measure for LD is r This is defined as D pap PAPB and E D r gt PAPBPaPb where pag is defined as the observed frequency of haplotype AB pa 1 pa and pg 1 py the observed frequencies of alleles A and B If the number of markers is high a threshold for the LD can be used to thin the output In this case only pairwise LD above the threshold is reported argument 1d window r2 in PLINK Default PLINK options used no parents no sex no pheno allow no sex ld window p ld window kb 99999 Value For type data frame an object of class LDdf with one element for each chromosome is returned Each element is a data
38. h as replications years of locations in multi environment trials A data frame with the individuals names in the first column the phenotypes in the next column s and the marker gen otypes in subsequent columns egpData2data frame Author s Valentin Wimmer and Hans Juergen Auinger See Also create gpData reshape Examples example data with unrepeated observations set seed 311 simulating genotypic and phenotypic data pheno lt data frame Yield rnorm 12 100 5 Height rnorm 12 100 1 rownames pheno lt letters 4 15 geno lt matrix sample c A A B B NA size 120 replace TRUE prob c 0 6 0 2 0 1 0 1 nrow 10 rownames geno lt letters 1 10 colnames geno lt paste M 1 12 sep different subset of individuals in pheno and geno create gpData object gp lt create gpData pheno pheno geno geno summary gp gp covar as data frame with individuals with genotypes and phenotypes gpData2data frame gp trait 1 2 as data frame with all individuals with phenotypes gpData2data frame gp 1 2 all pheno TRUE tt as data frame with all individuals with genotypes gpData2data frame gp 1 2 a11l geno TRUE example with repeated observations set seed 311 simulating genotypic and phenotypic data pheno lt data frame ID letters 1 10 Trait c rnorm 10 1 2 rnorm 10 2 0 2 rbeta 10 2 4 repl rep 1 3 each 10 geno lt matrix rep c 1 0 2 10 nrow 10 co
39. here the plot is saved to fileFormat character At the moment two file formats are supported pdf and png Default is pdf onefile logical If fileFormat pdf you can decide if you like to have all graph 1cs in one file or in multiple files Further arguments that could be passed to function LDheatmap manhattanPlot 27 Details Note If you have an LDmat object with more than one chromosome and you like to plot all chro mosomes you need to put an empty line for each chromosome in your script after the LDMap function Author s Hans Juergen Auinger Theresa Albrecht and Valentin Wimmer References Shin JH Blay S McNeney B Graham J 2006 LDheatmap An R Function for Graphical Display of Pairwise Linkage Disequilibria Between Single Nucleotide Polymorphisms Journal of Statistical Software 16 Code Snippet 3 URL http stat db stat sfu ca 8080 statgen research LDheatmap See Also pairwiseLD LDheatmap LDDist Examples Not run data maize maizeC lt codeGeno maize LD for chr 1 maizeLD lt pairwiseLD maizeC chr 1 type matrix LDMap maizeLD maizeC End Not run manhattanPlot Manhattan plot for SNP effects Description Plot of SNP effects along the chromosome e g for the visualization of marker effects generated by function gpMod Usage manhattanPlot b gpData NULL colored FALSE add FALSE pch 19 ylab NULL Arguments b object of class gpMod with marker
40. ices for the random effects Size and order of rows and columns should be equal to rownames of y If no covariance is given an identity matrix and marker genotypes are used for a marker regression In general a covariance matrix should be non singular and positive definite to be invertible if this is not the case a constant of 1e 5 is added to the diagonal elements of the covariance matrix cross Val 13 k numeric Number of folds for k fold cross validation thus k should be in 2 nrow y default 2 Rep numeric Number of replications default 1 Seed numeric Number for set seed to make results reproducable sampling Different sampling strategies can be random within popStruc or across popStruc If sampling is commit test sets have to specified in TS see Details TS A optional list of vectors with IDs for the test set in each fold within a list of replications same layout as output for id TS ES A optional list of IDs for the estimation set in each fold within each replication varComp A vector of variance components for the random effects which has to be spec ified if VC est commit The first variance components should be the same order as the given covariance matrices the last given variance component is for the residuals popStruc Vector of length nrow y assigning individuals to a population structure If no popStruc is defined family information of gpData is used Only required for options
41. identical logical Should duplicated markers be kept NOTE From a set of identical markers with respect to the non missing alleles the one with the smallest num ber of missing values is kept For those with an identical number of missing values the first one is kept and all others are removed 6 codeGeno verbose logical If TRUE verbose output is generated during the steps of the algorithm This is useful to obtain numbers of discarded markers due to different criteria minFam For impute type family and beagleAfterFamily each family should have at least minFam members with available information for a marker to impute missing values according to the family The default is 5 showBeagle0utput logical Would you like to see the output of the Beagle software package The default is FALSE tester This option is in testing mode at the moment print report logical Should a file SNPreport txt be generated containing further infor mation on SNPs This includes SNP name original coding of major and minor allele MAF and number of imputed values Details Coding of genotypic data is done in the following order depending on choice of arguments not all steps are performed 1 Discarding markers with fraction gt nmiss of missing values 2 Recoding alleles from character factor numeric into the number of copies of the minor alleles i e 0 1 and 2 In codeGeno in the first step heterozygous genotypes are coded as 1 From the other g
42. ion codeGeno covar data frame with further covariates for all individuals that either appear in pheno geno or pedigree ID e g sex or age rownames must be specified to identify individuals Typically this element is not specified by the user reorderMap logical Should markers in geno and map be reordered by chromosome number and position within chromosome according to map default TRUE map unit Character Unit of position in map i e cM for genetic distance or bp for physical distance default cM repeated This column is used to identify the replications of the phenotypic values The unique values become the names of the third dimension of the pheno object in the gpData This argument is only required for repeated measurements modCovar vector with colnames which identify columns with covariables in pheno This argument is only required for repeated measurements 10 Details create gpData The class gpData is designed to provide a unified framework for data related to genomic prediction analysis Every data source can be omitted In this case the corresponding argument must be NULL By default argument reorderMap markers in geno are ordered by their position in map Individuals are ordered in alphabetical order An object of class gpData can contain different subsets of individuals or markers in the elements pheno geno and pedigree In this case the id in covar comprises all individuals that either appe
43. ion and population based linkage analysis American Journal of Human Genetics 81 See Also pairwiseLD Examples Not run write plink maize type data frame gt write relationshipMatrix Writing relationshipMatrix in table format Description This function can be used to write an object of class relationshipMatrix in the table format used by other software i e WOMBAT or ASReml The resulting table has three columns the row the column and the entry of the inverse relationshipMatrix Usage write relationshipMatrix relationshipMatrix file NULL sorting c WOMBAT ASReml t type c ginv inv none digits 10 write relationshipMatrix 49 Arguments relationshipMatrix Object of class relationshipMatrix file Path where the output should be written If NULL the result is returned in R sorting Type of sorting Use WOMBAT for row wise sorting of the table and AS Reml for column wise sorting type A character string indicating which form of relationshipMatrix should be returned One of ginv Moore Penrose generalized inverse inv inverse or none no inverse digits Numeric The result is rounded to digits Details Note that WOMBAT only uses the generalized inverse relationship matrix and expects a file with the name ranef gin where ranef is the name of the random effect with option GIN in the MODEL part of the parameter file For ASREML ei
44. ior nIter 6000 burnIn 1000 thin 5 summary mod1 summary mod2 summary mod3 End Not run kin Relatedness based on pedigree or marker data Description This function implements different measures of relatedness between individuals in an object of class gpData 1 Expected relatedness based on pedigree and 2 realized relatedness based on marker data See Details The function uses as first argument an object of class gpData An argument ret controls the type of relatedness coefficient Usage kin gpData ret c add kin dom gam realized realizedAB sm sm smin DH NULL Arguments gpData object of class gpData ret character The type of relationship matrix to be returned See Details DH logical vector of length n TRUE or 1 if individual is a doubled haploid DH line and FALSE or 0 otherwise Only used for pedigree based relatedness coeffi cients kin 23 Details Pedigree based relatedness return arguments add kin dom and gam Function kin provides different types of measures for pedigree based relatedness An element pedigree must be available in the object of class gpData In all cases the first step is to build the gametic relationship The gametic relationship is of order 2n as each individual has two alleles e g individual A has alleles 41 and 42 The gametic relationship is defined as the matrix of probabilities that two alleles are identical by
45. lnames geno lt c M1 M2 M3 rownames geno lt letters 1 10 create gpData object gp lt create gpData pheno pheno geno geno repeated rep1 reshape of phenotypic data and merge of genotypic data tt levels of grouping variable loc are named a b and c gpData2data frame gp onlyPheno FALSE times letters 1 3 19 20 gpMod gpMod Genomic predictions models for objects of class gpData Description This function fits genomic prediction models based on phenotypic and genotypic data in an ob ject of class gpData The possible models are Best Linear Unbiased Prediction BLUP using a pedigree based or a marker based genetic relationship matrix and Bayesian Lasso BL or Bayesian Ridge regression BRR BLUP models are fitted using the REML implementation of the regress package Clifford and McCullagh 2012 The Bayesian regression models are fitted using the Gibbs Sampler of the BLR package de los Campos and Perez 2010 The covariance structure in the BLUP model is defined by an object of class relationshipMatrix The training set for the model fit consists of all individuals with phenotypes and genotypes All data is restricted to individuals from the training set used to fit the model Usage gpMod gpData model c BLUP BL BRR kin NULL predict FALSE trait 1 repl NULL markerEffects FALSE fixed NULL random NULL Arguments gpData model kin predict tr
46. location effect and the block effect Nloc numeric Number of locations in the field trial Nrepl Numeric Number of complete blocks within location Details Either pedigree or A must be specified If pedigree is given pedigree information is used to set up numerator relationship matrix with function kinship If unrelated individuals should be used for simulation use identity matrix for A True breeding values for N individuals is simulated according to following distribution tbv N 0 Ao Observations are simulated according to y N mu tbv block loc 0 If no location or block effects should appear use sigma21 0 and or sigma2b 0 42 summary cvData Value A data frame with containing the simulated values for trait and the following variables ID Factor identifying the individuals Names are extracted from pedigree or row names of A Loc Factor for Location Block Factor for Block within Location Trait Trait observations TBV Simulated values for true breeding values of individuals Results are sorted for locations within individuals Author s Valentin Wimmer See Also simul pedigree Examples Not run ped lt simul pedigree gener 5 varcom lt list sigma2e 25 sigma2a 36 sigma21 9 sigma2b 4 field trial with 3 locations and 2 blocks within locations data simul lt simul phenotype ped mu 10 vc varcom Nloc 3 Nrepl 2 head data simul analysis of variance anova 1m Trait ID Loc Loc Bl
47. multiply G with shrinkage parameter GI lt solve U lambda y lt maizeC pheno 1 n lt length y X lt matrix 1 ncol 1 nrow n mme lt MME y y Z diag n GI GI X X RI diag n comparison head m fit predicted 1 m fit beta head mme u End Not run pairwiseLD Pairwise LD between markers Description Estimate pairwise Linkage Disequilibrium LD between markers measured as r using an object of class gpData For the general case a gateway to the software PLINK Purcell et al 2007 is established to estimate the LD A within R solution is available for marker data with only 2 geno types i e homozgous inbred lines Return value is an object of class LDdf which is a data frame with one row per marker pair or an object of class LDMat which is a matrix with all marker pairs Additionally the euclidian distance between position of markers is computed and returned Usage pairwiseLD gpData chr NULL type c data frame matrix use plink FALSE 1d threshold 0 1d window 99999 rm unmapped TRUE Arguments gpData object of class gpData with elements geno and map chr numeric scalar or vector Return value is a list with pairwise LD of all markers for each chromosome in chr type character Specifies the type of return value see Value use plink logical Should the software PLINK be used for the computation 1d threshold numeric Threshold for the LD to thin the output Only pairwise LD
48. names map lt paste SNP 1 9 sep rownames geno lt paste ID 1 10 100 sep gp lt create gpData geno geno map map gp1 lt discard markers gp rownames mapLmap chr 1 Not run write beagle gp1 prefix test write plink Prepare data for PLINK Description Create input files and corresponding script for PLINK Purcell et al 2007 to estimate pairwise LD through function pairwiseLD Usage write plink gp wdir getwd prefix paste substitute gp 1d threshold 0 type c data frame matrix 1d window 99999 48 write relationshipMatrix Arguments gp gpData object with elements geno and map wdir character Directory for PLINK input files prefix character Prefix for PLINK input files ld threshold numeric Threshold for the LD used in PLINK type character Specifies the type of return value for PLINK 1d window numeric Window size for pairwise differences which will be reported by PLINK only for use plink TRUE argument 1d window kb in PLINK to thin the output dimensions Only SNP pairs with a distance lt 1d window are reported default 99999 Value No value returned Files prefix map prefix ped and prefixPlinkScript txt are created in the working directory Author s Valentin Wimmer References Purcell S Neale B Todd Brown K Thomas L Ferreira MAR Bender D Maller J Sklar P de Bakker PIW Daly MJ 8 Sham PC 2007 PLINK a toolset for whole genome associat
49. notypes with missing values in the marker matrix gp3 lt discard individuals gp names which rowSums is na gp geno gt 0 summary gp3 End Not run gpData2cross Conversion between objects of class cross and gpData Description Function to convert an object of class gpData to an object of class cross F2 intercross class in the package qt1 and vice versa If not done before function codeGeno is used for recoding in gpData2cross Usage gpData2cross gpData cross2gpData cross Arguments gpData object of class gpData with non empty elements for pheno geno and map cross object of class cross further arguments for function codeGeno Only used in gpData2cross gpData2cross 17 Details In cross genotypic data is splitted into chromosomes while in gpData genotypic data comprises all chromosomes because separation into chromosomes in not required for genomic prediction Note that coding of genotypic data differs between classes In gpData genotypic data is coded as the number of copies of the minor allele i e O 1 and 2 Thus function codeGeno should be applied to gpData before using gpData2cross to ensure correct coding In cross coding for F2 intercross is AA 1 AB 2 BB 3 When using gpData2cross or cross2gpData resulting genotypic data has correct format Value Object of class cross of gpData for function gpData2cross and cross2gpData respectively Author s Valentin Wimmer and Han
50. o R Dekkers J 2007 The Impact of Genetic Relationship information on Genome Assisted Breeding Values Genetics 177 2389 2397 vanRaden P 2008 Efficient methods to compute genomic predictions Journal of Dairy Science 91 4414 4423 Astle W and D J Balding 2009 Population Structure and Cryptic Relatedness in Genetic Asso ciation Studies Statistical Science 24 4 451 471 Reif J C Melchinger A E and Frisch M Genetical and mathematical properties of similarity and dissimilarity coefficients applied in plant breeding and seed bank management Crop Science January February 2005 vol 45 no 1 p 1 7 Rogers J 1972 Measures of genetic similarity and genetic distance In Studies in genetics VII volume 7213 Univ of Texas Austin Hayes B J and M E Goddard 2008 Technical note Prediction of breeding values using marker derived relationship matrices J Anim Sci 86 See Also plot relationshipMatrix Examples Not run data maize K lt kin maize ret kin plot K End Not run Not run data maize U lt kin codeGeno maize ret realized plot U End Not run Example for Legarra et al 2009 J Dairy Sci 92 p 4660 id lt 1 17 LDDist 25 par1 lt c 0 0 0 0 0 0 0 0 1 3 5 7 9 11 4 13 13 par2 lt c 0 0 0 0 0 0 0 0 2 4 6 8 10 12 11 15 14 ped lt create pedigree id par1 par2 gp lt create gpData pedigree ped additive relati
51. ock data data simul End Not run summary cvData Summary of options and results of the cross validation procedure Description summary method for class cvData Usage S3 method for class cvData summary object Arguments object object of class cvData not used summary gpData Author s Theresa Albrecht See Also crossVal 43 summary gpData Summary for class gpData Description S3 summary method for objects of class gpData Usage S3 method for class gpData summary object Arguments object object of class gpData not used Author s Valentin Wimmer Examples Not run data maize summary maize End Not run summary gpMod Summary for class g Mod Description S3 summary method for objects of class gpMod Usage HH S3 method for class gpMod summary object 44 Arguments object object of class gpMod not used See Also gpMod Examples Not run data maize maizeC lt codeGeno maize marker based realized relationship matrix U lt kin maizeC ret realized 2 BLUP model mod lt gpMod maizeC model BLUP kin U summary mod End Not run summary pedigree summary pedigree Summary of pedigree information Description Summary method for class pedigree Usage S3 method for class pedigree summary object Arguments object object of class pedigree
52. onship A lt kin gp ret add dominance relationship D lt kin gp ret dom gt LDDist LD versus distance Plot Description Visualization of pairwise Linkage Disequilibrium LD estimates generated by function pairwiseLD versus marker distance A single plot is generated for every chromosome Usage LDDist LDdf chr NULL type p breaks NULL n NULL file NULL fileFormat pdf onefile TRUE Arguments LDdf object of class LDdf which is the output of function pairwiseLD and argument type data frame chr numeric scalar or vector Return value is a plot for each chromosome in chr Note Remember to add in a batch script one empty line for each chromosome if you use more than one chromosome type Character string to specify the type of plot Use p for a scatterplot bars for stacked bars or n1s for scatterplot together with nonlinear regression curve according to Hill and Weir 1988 breaks list containing breaks for stacked bars optional only for type bars Com ponents are dist with breaks for distance on x axis and r2 for breaks on for r2 on y axis By default 5 equal spaced categories for dist and r2 are used n numeric Number of observations used to estimate LD Only required for type n1s file character path to a file where plot is saved to optional fileFormat character At the moment two file formats are supported pdf and png Default is pdf onefile logical If fileFormat
53. or s Valentin Wimmer and Hans Juergen Auinger Examples small pedigree ped lt simul pedigree gener 4 7 gp lt create gpData pedigree ped A lt kin gp ret add plot A big pedigree Not run data maize K lt kin maize ret kin U lt kin codeGeno maize ret realized 2 equal colorkeys plot K at seq 0 2 length 9 plot U at seq 0 2 length 9 End Not run 36 plotGenMap plotGenMap Plot marker map Description A function to visualize low and high density marker maps Usage S3 method for class GenMap plot x dense FALSE nMarker TRUE bw 1 centr NULL file NULL fileFormat pdf plotGenMap map dense FALSE nMarker TRUE bw 1 centr NULL file NULL fileFormat pdf Arguments x object of class GenMap i e the map object in a gpData object map object of class gpData with object map or a data frame with columns chr specifying the chromosome of the marker and pos position of the marker within chromosome measured with genetic or physical distances dense logical Should density visualization for high density genetic maps be used nMarker logical Print number of markers for each chromosome bw numeric Bandwidth to use for dense TRUE to control the resolution default 1 map unit centr numeric vector Positions for the centromeres in the same order as chromo somes in map If maize centromere positions of maize in Mbp ar
54. ould be specified A formula for fixed effects The details of model specification are the same as for 1m only right hand side required Only for model BLUP gpMod 21 random A formula for random effects of the model Specifies the matrices to include in the covariance structure Each term is either a symmetric matrix or a factor Independent Gaussian random effects are included by passing the corresponding block factor For mor details see regress Only for model BLUP further arguments to be used by the genomic prediction models i e prior values and MCMC options for the BLR function see BLR or parameters for the REML algorithm in regress Details By default an overall mean is added to the model If no kin is specified and model BLUP a G BLUP model will be fitted For BLUP further fixed and random effects can be added through the arguments fixed and random Only a subset of the individuals the training set is used to fit the model This contains all individuals with phenotypes and genotypes If kin does not match the dimension of the training set if e g ancestors are included the respective rows and columns from the trainings set are choosen Marker effects for model BLUP are extracted from the corresponding G BLUP model using their functional relationship In this case fit reports the G BLUP model Value Object of class gpMod which is a list of fit The model fit returned by the genomic prediction method
55. pplied in advance If ret realized the realized relatedness between individuals is computed according to the for mulas in Habier et al 2007 or vanRaden 2008 ZZ U 25 pill pi where Z W P W is the marker matrix P contains the allele frequencies multiplied by 2 p is the allele frequency of marker 7 and the sum is over all loci If ret realizedAB the realized relatedness between individuals is computed according to the formula in Astle and Balding 2009 ad wi 2pi wi 2p 4 MD 2pi 1 pi where w is the marker genotype p is the allele frequency at marker locus 7 and M is the number of marker loci and the sum is over all loci If ret sm the realized relatedness between individuals is computed according to the simple matching coefficient Reif et al 2005 The simple matching coefficient counts the number of shared alleles across loci It can only be applied to homozygous inbred lines i e only genotypes 0 and 2 To account for loci that are alike in state but not identical by descent IBD Hayes and God dard 2008 correct the simple matching coefficient by the minimum of observed simple matching coefficients S Smin I Smin where s is the matrix of simple matching coefficients This formula is used with argument ret sm smin 24 kin Value An object of class relationshipMatrix Author s Valentin Wimmer and Theresa Albrecht References Habier D Fernand
56. rint summary cvData summary cvData 42 print summary gpData summary gpData 43 print summary gpMod summary gpMod 43 print summary gpModList summary gpMod 43 print summary pedigree summary pedigree 44 print summary relationshipMatrix summary relationshipMatrix 45 read cross 17 regress 21 29 52 reshape 19 simul pedigree 34 40 42 simul phenotype 40 41 summary cvData 14 42 summary gpData 10 43 summary gpMod 43 summary gpModL ist summary gpMod 43 summary pedigree 44 summary relationshipMatrix 45 summaryGenMap 45 title 28 write beagle 46 write plink 47 write relationshipMatrix 48 INDEX
57. s Juergen Auinger References Broman K W and Churchill S S 2003 R qtl Qtl mapping in experimental crosses Bioinfor matics 19 889 890 See Also create gpData read cross codeGeno Examples Not run from gpData to cross data maize maizeC lt codeGeno maize maize cross lt gpData2cross maizeC tt descriptive statistics summary maize cross plot maize cross use function scanone maize cross lt calc genoprob maize cross step 2 5 result lt scanone maize cross pheno col 1 method em display of LOD curve along the chromosome plot result from cross to gpData data fake f2 fake f2 gpData lt cross2gpData fake f2 summary fake f2 gpData End Not run 18 gpData2data frame gpData2data frame Merge of phenotypic and genotypic data Description Create a data fra datasets using the c me out of phenotypic and genotypic data in object of class gpData by merging ommon id The shared data set could either include individuals with phenotypes and genotypes default or additional unphenotyped or ungenotyped individuals In the latter cases the missing observations are filled by NA s Usage gpData2data frame gpData trait 1 onlyPheno FALSE all pheno FALSE Arguments gpData trait onlyPheno all pheno all geno repl phenoCovars Details all geno FALSE repl NULL phenoCovars TRUE object of class gpData numeric or chara
58. s be used nMarker For plotType neighbour logical Print number of markers for each chro mosome centr For plotType neighbour numeric vector Positions for the centromeres in the same order as chromosomes in map If maize centromere positions of maize in Mbp are used chr For plotType map numeric scalar or vector Return value is a plot for each chromosome in chr Note Remember to add in a batch script one empty line for each chromosome if you use more than one chromosome file Optionally a path to a file where the plot is saved to fileFormat character At the moment two file formats are supported pdf and png Default is pdf onefile logical If fileFormat pdf you can decide if you like to have all graph ics in one file or in multiple files Further arguments that could be passed to function LDheatmap 34 plot pedigree Details For more details see at plotNeighbourLD or LDMap Author s Hans Juergen Auinger See Also plotNeighbourLD LDDist plotGenMap pairwiseLD plot pedigree Visualization of pedigree Description A function to visualize pedigree structure by a graph using the igraph package Each genotype is represented as vertex and direct offsprings are linked by an edge Usage S3 method for class pedigree plot x effect NULL Arguments x object of class pedigree or object of class gpData with element pedigree effect vector of length nrow pedigree with effects
59. sampling within popStruc or sampling across popStruc VC est Should variance components be reestimated with ASReml or with Bayesian Ridge Regression BRR or Bayesian Lasso BL of the BLR package within the estimation set of each fold in the cross validation If VC est commit the variance components have to be defined in varComp For ASRem1 ASReml soft ware has to be installed on the system verbose Logical Whether output shows replications and folds further arguments to be used by the genomic prediction models i e prior values and MCMC options for the BLR function see BLR Details In cross validation the data set is splitted into an estimation ES and a test set TS The effects are estimated with the ES and used to predict observations in the TS For sampling into ES and TS k fold cross validation is applied where the data set is splitted into k subsets and k 1 comprising the ES and 1 is the TS repeated for each subset To account for the family structure Albrecht et al 2011 sampling can be defined as random Does not account for family structure random sampling within the complete data set within popStruc Accounts for within population structure information e g each family is splitted into k subsets across popStruc Accounts for across population structure information e g ES and TS contains a set of complete families The following mixed model equation is used for VC est commit y Xb Zu e
60. ther the relationship could be saved as grm or its generalized inverse as giv Author s Valentin Wimmer References Meyer K 2006 WOMBAT A tool for mixed model analyses in quantitative genetics by REML J Zhejinag Uni SCIENCE B 8 815 821 Gilmour A Cullis B Welham S and Thompson R 2000 ASREML program user manual NSW Agriculture Orange Agricultural Institute Forest Road Orange Australia Examples Not run example with 9 individuals id lt 1 9 parl lt c 0 0 0 0 1 1 1 4 7 par2 lt c 0 0 0 0 2 3 2 5 8 gener lt c 0 0 0 0 1 1 1 2 3 ped lt create pedigree id par1 par2 gener gp lt create gpData pedigree ped A lt kin ped ret add write relationshipMatrix A type ginv End Not run 50 relationshipMatrix GenMap Extract or replace part of map data frame Description Extract or replace part of an object of class GenMap Usage S3 method for class GenMap x Arguments x object of class GenMap indices Examples Not run data maize head maize map End Not run relationshipMatrix Extract or replace part of relationship matrix Description Extract or replace part of an object of class relationshipMatrix Usage S3 method for class relationshipMatrix x Arguments x object of class relationshipMatrix indices Examples Not run data maize U lt kin codeG
61. viduals using an object of class gpMod estimated by a training set Usage S3 method for class gpMod predict object newdata Arguments object object of class gpMod which is the model used for the prediction newdata for model BL and BRR an object of class gpData with the marker data of the unphenotyped individuals For model BLUP a character vector with the names of the individuals to predict If newdata NULL the genetic performances of the individuals for the training set are returned not used predict gpMod 39 Details For models model RR and BL the prediction for the unphenotyped individuals is given by Y p W with the estimates taken from the gpMod object For the prediction using model BLUP the full relationship matrix including individuals of the training set and the prediction set must be specified in the gpMod This model is used to predict the unphenotyped individuals of the prediction set by solving the corresponding mixed model equations using the variance components of the fit in gpMod Value a named vector with the predicted genetic values for all individuals in newdata Author s Valentin Wimmer References Henderson C 1977 Best linear unbiased prediction of breeding values not in the model for records Journal of Dairy Science 60 783 787 Henderson CR 1984 Applications of linear models in animal breeding University of Guelph See Also gpMod Examples

Package `synbreed`

Contents

Download Pdf Manuals

Related Search

Related Contents