User Manual for GIGI-Check v1.06
Contents
1. changes so that GIGI is now compatible with the new Inheritance Vectors file format generated by MORGAN's gl_auto version 3.2. Because this new file format no longer requires users to provide GIGI with the meiosis indexes, GIGI can now use the MORGAN pedigree file directly. With past versions, which relied on MORGAN's gl_auto version 3.1 or earlier, users had to create the pedigree meiosis file by manually parsing the console output of gl_auto. To avoid confusion, we suggest that users use the pedigree file instead of the pedigree meiosis file from now on. We also understand the importance of backward compatibility: you may continue to use gl_auto output from versions before 3.2, together with the pedigree meiosis file in place of the pedigree file. GIGI will detect which kind of file you are using and read it properly.

Files in GIGI-Check software distribution
• GIGI-Check software code and its dependency files (the Mersenne random number generator)
• example folder

Inferring IVs using gl_auto
The first step is to use framework markers to infer IVs. For this purpose, we use gl_auto, a program in the MORGAN package that is freely available at http://www.stat.washington.edu/thompson/Genepi/MORGAN/Morgan.shtml. To infer IVs in gl_auto, we need to supply the required files in MORGAN format:
1. Pedigree file
2. Marker file: this is a composite file that contains the map positions of framework markers in centiMorgans, assuming the Haldane map functi
2. 999999
prog NUM: to print the progress of imputation every NUM markers.
outD: to specify the absolute directory path where the output files will be created. If this flag is missing, the output files are saved to the user's current directory. E.g., /home/charles/output (no "/" character at the end of the directory name).
long: to read the dense genotype marker file in the long format (rows are markers and columns are genotypes of individuals).
A1: to perform error detection using only the faster Method A1. Note: even though summary statistic A1 is not affected by the allele frequencies of the markers or by the error probability, GIGI-Check still requires both to be specified. For example, for each multi-allelic marker, the correct number of allele frequency values must be specified, even though the values can be arbitrary.
threshold NUM: sets the error detection threshold that determines which markers are flagged by A1, as summarized in the output file summary markers flagged txt. This number must be between 0 and 1 (default is 0.05).

An example of running GIGI-Check with additional options:
./GIGI-Check example param v3_2 txt outD /home/charles/GIGI-Check_output

To run GIGI-Check with an error detection threshold of 0.1 and with A1 only:
./GIGI-Check example param v3_2 txt threshold 0.1 A1

Making the Parameter File
The parameter file tells GIGI-Check where to look for the required files and where to save the output fi
3. Second, framework markers that are multi-allelic tend to be more informative than di-allelic markers, if they are available. Third, if framework markers are SNPs, markers with high minor allele frequencies in the sample tend to be more informative. Fourth, framework markers should be moderately dense but not too dense (e.g., not denser than 1 marker per 0.3 cM), because of concerns about MCMC mixing and violation of the assumption of linkage equilibrium. If the framework markers are multi-allelic, this spacing should be greater (e.g., not denser than 1 marker per 2 cM) for good MCMC mixing.

What do I have to be careful about with the genetic map that I am using?
The map positions must be map distances based on the Haldane map function, not the Kosambi map function and not sequence positions. If the map is based on the Kosambi map function, the user will need to convert the map positions to the Haldane map function using an appropriate conversion method. Also, since recombination fractions are relative to each other, we strongly encourage the user to generate both the framework map positions and the dense marker positions at the same time.

Appendix A
In versions prior to 1.06, GIGI-Check only supports inheritance vectors from MORGAN gl_auto version 3.1 or earlier. If you generated the inheritance vectors file using MORGAN gl_auto version 3.1 or earlier, you must use the Pedigree Meiosis file instead of the Pedigree file.
a) The Pedigree Meiosis file contains the infor
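The manual does not say which conversion method to use for a Kosambi-based map. One common approach, sketched below in Python, converts each interval between adjacent markers from Kosambi distance to a recombination fraction and then to Haldane distance, and accumulates the results. The function name kosambi_to_haldane is hypothetical, and keeping the first marker's position unchanged as the origin is an assumption of this sketch, not something GIGI-Check prescribes.

```python
import math

def kosambi_to_haldane(kosambi_cm):
    """Convert ascending Kosambi map positions (cM) to Haldane positions (cM).

    Assumption: the first position is kept as the origin, and each interval
    between adjacent markers is converted independently.
    """
    if not kosambi_cm:
        return []
    haldane_cm = [kosambi_cm[0]]
    for prev, curr in zip(kosambi_cm, kosambi_cm[1:]):
        d_k = (curr - prev) / 100.0                # interval length in Morgans
        if d_k < 0:
            raise ValueError("positions must be in ascending order")
        r = 0.5 * math.tanh(2.0 * d_k)             # Kosambi: distance -> recombination fraction
        r = min(r, 0.5 - 1e-12)                    # keep the logarithm below finite
        d_h = -0.5 * math.log(1.0 - 2.0 * r)       # Haldane: recombination fraction -> distance
        haldane_cm.append(haldane_cm[-1] + 100.0 * d_h)
    return haldane_cm

# A 10 cM Kosambi interval corresponds to a slightly longer Haldane interval.
print(kosambi_to_haldane([0.0, 10.0]))
```

Applying the same function to the framework map and the dense map in one pass (for example on the merged, sorted list of positions) keeps the two maps on a common scale, in line with the advice above to generate both sets of positions at the same time.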
4. User Manual for GIGI-Check v1.06
Author: Charles Y.K. Cheung (cykc@uw.edu), Ellen M. Wijsman (wijsman@uw.edu)
Department of Biostatistics, University of Washington
Last Modified on 2/3/2015

Contents
Introduction
Citing GIGI-Check
Software URL
What's new in version 1.06
Files in GIGI-Check software distribution
Inferring IVs using gl_auto
Installing GIGI-Check
Running GIGI-Check
Options
Making the Parameter File
File Formats
Output files
Frequently Asked Questions (FAQs)
Appendix A
License
Acknowledgement

Introduction
GIGI-Check is a C++ program to detect Mende
5. d by applicable law, will any GIGI-Check copyright holder be liable to you for damages, including any general, special, incidental or consequential damages arising out of the use or inability to use the program (including but not limited to loss of data or data being rendered inaccurate or losses sustained by you or third parties or a failure of the program to operate with any other programs).

Acknowledgement
Supported by funding from the National Institutes of Health, grants R37GM046255, P01HL030086, and P50AG05136. I would also like to thank Elizabeth Marchani for providing feedback
6. fault assumed genotype file format in GIGI-Check is the traditional wide format. To let GIGI-Check know that your file is in the long format, use the long flag, e.g.,
./GIGI-Check example param longFormat txt long

Option 1: The long format
Each row of the file contains the genotypes for a marker. Consistent with BEAGLE's genotype data file format, this file requires a header line. The header contains the following information:
id person1 person1 person2 person2 person3 person3
The id is the name of the marker. The pair of columns for each individual represents the first allele and second allele of that individual. Unlike BEAGLE's genotype file format, GIGI-Check's long file format does not have the T column, because GIGI-Check assumes that every row contains a marker. An example of this file looks like:
id 101 101 102 102 103 103
rs0001 1 1 1 2 1 1
rs0002 1 2 0 0 1 2
rs0003 2 2 2 2 1 2
Note: if the long format is used, the outputs will also be generated in the long format.

Option 2: The (default) wide format
Each row of the file contains the genotypes for an individual. This file does not contain a header line. The first column specifies the name of an individual. The subject_ID can be an alphanumeric string. In this wide format, the name of the marker is not retained.
Subject_ID allele1_marker1 allele2_marker1 allele1_marker2 allele2_marker2 allele1_marker3 allele2_marker3
An example of this file looks li
7. ke
101 1 1 1 2 2 2
102 1 2 0 0 2 2
103 1 1 1 2 1 2

f) allele frequencies of the dense marker file (line 7)
Each row contains the allele frequencies of one dense marker. The first column is the allele frequency of allele 1, the second column is the allele frequency of allele 2, etc. Each row must sum to 1. The allele frequencies file is space-delimited.
allele frequencies of marker 1
allele frequencies of marker 2
allele frequencies of marker 3
e.g.
0.4 0.6
0.2 0.3 0.5
0.8 0.2

Output files
GIGI-Check displays the results of error detection in two files: results GIGI Check txt and summary markers flagged txt.

A) results GIGI Check txt
The first line is the header. The result for each dense marker is displayed on a separate line, e.g.,
A1 pctConsistent   A2 probNoErr   ML person   ML person prob   allele1   allele2
0.866667           0.888544
0.9                0.954627
0.766667           0.784216
0.933333           0.927275
1                  0.988929
1                  0.989176
0.4                0.439488       00514       0.823529         2         2
0.566667           0.561086

Main summaries (refer to the GIGI-Check paper):
A1 pctConsistent: percent consistency (Method A1)
A2 probNoErr: the posterior probability that there is no error in this marker (Method A2)
Summaries associated with Method A2:
ML person: the most likely person with a genotyping error
ML person prob: the posterior probability that this person has a genotyping error
allele1, allele2: the most likely true genotype of ML perso
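The rule that each row of the allele frequencies file must sum to 1 is easy to check before a run. The following Python sketch reports the rows that break it; the function name check_allele_frequencies and the tolerance of 1e-6 are assumptions made for illustration and are not part of GIGI-Check.

```python
def check_allele_frequencies(lines, tol=1e-6):
    """Return the 1-based indices of rows that are not valid frequency rows.

    A row is reported if it is empty, has a value outside [0, 1],
    or does not sum to 1 within the tolerance `tol` (an assumed value).
    """
    bad_rows = []
    for index, line in enumerate(lines, start=1):
        freqs = [float(value) for value in line.split()]  # space-delimited row
        out_of_range = any(f < 0.0 or f > 1.0 for f in freqs)
        if not freqs or out_of_range or abs(sum(freqs) - 1.0) > tol:
            bad_rows.append(index)
    return bad_rows

# The three example rows from the manual all pass the check.
rows = ["0.4 0.6", "0.2 0.3 0.5", "0.8 0.2"]
print(check_allele_frequencies(rows))
```

In practice the rows would be read from the file named on line 7 of the parameter file, for example with `open(path).read().splitlines()`.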
8. les. GIGI-Check needs the pedigree file (or the pedigree meiosis file that the user prepares from the output of gl_auto), the dense marker file, the IVs file from gl_auto, the map of the sparse marker file, the map of the dense marker file, and the allele frequencies of the dense marker file. An example of the parameter file is found in the example directory under example param v3_2 txt.

In GIGI-Check, the parameter file is organized as follows:
line 1: filename of the pedigree file (or the pedigree meiosis file, if using framework IVs generated by MORGAN v3.1 or earlier)
line 2: framework IVs file
line 3: number of sampled realizations in the IV file
line 4: filename of the map positions of the framework markers file (cM, based on the Haldane map function)
line 5: filename of the map positions of the dense markers file (cM, based on the Haldane map function)
line 6: filename of the dense marker genotype file
line 7: filename of the allele frequencies of the dense markers file
line 8: the assumed allelic error rate (e.g., 0.01 for 1%)

Notes:
I suggest using absolute paths for the filenames instead of relative paths. A relative path is relative to the directory containing the executable program. The parameter file in the example folder is created using relative paths.
line 1: GIGI-Check can only process 1 pedigree at a time.
line 2: when you run gl_auto, you should instruct gl_auto to print Meiosis Indicators instead of Founder Genome Labels.
line 3: this co
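The eight-line layout above can be checked with a short script before launching a run. This Python sketch maps the lines to named fields and validates the two numeric entries. The field names, the function name parse_parameter_text, and the choice to skip blank lines are all assumptions of the sketch; GIGI-Check's own parser may behave differently.

```python
# Field names are illustrative labels for lines 1 to 8 of the parameter file.
PARAM_FIELDS = [
    "pedigree_file",           # line 1 (or the pedigree meiosis file)
    "framework_iv_file",       # line 2
    "n_realizations",          # line 3
    "framework_map_file",      # line 4
    "dense_map_file",          # line 5
    "dense_genotype_file",     # line 6
    "dense_allele_freq_file",  # line 7
    "allelic_error_rate",      # line 8
]

def parse_parameter_text(text):
    """Map the first eight non-blank lines of a parameter file to named fields."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < len(PARAM_FIELDS):
        raise ValueError(f"expected {len(PARAM_FIELDS)} lines, found {len(lines)}")
    params = dict(zip(PARAM_FIELDS, lines))
    params["n_realizations"] = int(params["n_realizations"])
    params["allelic_error_rate"] = float(params["allelic_error_rate"])
    if not 0.0 <= params["allelic_error_rate"] <= 1.0:
        raise ValueError("allelic error rate must be between 0 and 1")
    return params

# Hypothetical file names, used only to exercise the parser.
example = "ped.txt\nivs.txt\n1000\nfw.map\ndense.map\ndense.geno\ndense.freq\n0.01\n"
print(parse_parameter_text(example))
```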
9. lian consistent genotyping errors of dense markers in pedigree data. It detects genotyping errors by using Inheritance Vectors (IVs), which are inferred from sparse framework genotypes available on a subset of relatives in the pedigree. Thus, our error detection approach consists of two steps. The first step is to infer IVs at the positions of framework markers using gl_auto, an MCMC-based program from the MORGAN package. We assume that these markers are free of genotyping errors. The second step is to detect errors in dense genotypes with GIGI-Check, using the IVs and the pedigree structure file from MORGAN.

In this documentation, we use the following terminology:
• Framework markers are a relatively sparse set of markers used to infer IVs on a chromosome of interest.
• Dense markers are markers, with missing genotypes on some subjects, in which we want to detect genotyping errors. For example, these dense markers may be genotypes obtained from sequence data or from a dense SNP panel, and may be typed on fewer and even different subjects in the pedigree.
See the publication describing GIGI-Check (below) for more information.

GIGI-Check is developed under the Linux environment.

Citing GIGI-Check
Cheung CYK, Thompson EA, Wijsman EM (2014). Detection of Mendelian Consistent Genotyping Errors in Pedigrees. Genetic Epidemiology 38(4): 291-299.

Software URL
http://faculty.washington.edu/wijsman/software.shtml

What's new in version 1.06
We have made
10. mation about the structure of the pedigree. This file is different from the pedigree file used in gl_auto. In addition to the pedigree structure, this file also contains information that GIGI-Check needs to determine how the Inheritance Vectors (meiosis indicators) are organized, i.e., which subject in the pedigree the i-th line of the meiosis indicators belongs to, and whether that meiosis indicator is the person's maternal or paternal chromosome. We need to create this file from the console output of gl_auto.

Pedigree Meiosis file: create the pedigree meiosis file from the console output of gl_auto
It is very easy to make this file. When we run gl_auto, the program prints a large amount of output to the console. This console output contains the content of the pedigree meiosis file that we need to extract.
1. To extract this content, we first direct the console output to a file using the > directive, so we can subsequently extract the content from this file, i.e.,
./gl_auto gl auto parameter file > glauto console output txt
2. Then we extract the pedigree meiosis content from the console output to a new file. To simplify the creation of this file, use the Perl script extractPedMeiosis.pl.
• Usage: perl extractPedMeiosis.pl glauto console output txt FILENAME PED MEIO
  o We need to have Perl installed in Linux.
  o This assumes that glauto console output txt is in the same directory as extractPedMeiosis.pl.
• Alternatively, thi
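For readers without Perl, the extraction step can be approximated in Python. This sketch is not a port of extractPedMeiosis.pl, whose exact behavior is not documented here. It assumes that the table in the gl_auto console output starts at a header line whose first column is "name" and which also contains the column "Compnt", and that the table ends at the first blank line. Both are assumptions to verify against your own console output.

```python
def extract_ped_meiosis(console_lines):
    """Return the header line and table rows of the pedigree meiosis section.

    Assumptions (not confirmed by the manual): the header's first token is
    "name" and it contains the token "Compnt"; a blank line ends the table.
    """
    table = []
    in_table = False
    for line in console_lines:
        tokens = line.split()
        if not in_table:
            if tokens and tokens[0] == "name" and "Compnt" in tokens:
                in_table = True
                table.append(line.rstrip("\n"))
        elif not tokens:
            break  # assumed end of the table
        else:
            table.append(line.rstrip("\n"))
    return table

# Hypothetical console output, used only to illustrate the extraction.
sample = [
    "some gl_auto log text",
    "name name_pa name_ma Compnt pat_meio mat_meio",
    "2100_908 2100_901 2100_902 1 2 1",
    "",
    "more log text",
]
print(extract_ped_meiosis(sample))
```

Writing the returned lines to a new file, one per line, would give a candidate pedigree meiosis file; it is worth comparing it against the table in the console output before use.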
11. n
Note: If ML person prob is less than A2 probNoErr, then ML person, ML person prob, allele1, and allele2 are not printed, because it is more likely that the marker does not contain a genotyping error.

B) summary markers flagged txt
This file contains a summary of which markers are flagged, e.g.,
Markers flagged by percent consistency (A1) because they are below the error detection threshold of 0.05. Markers are indexed by 1..N
8 27 63 67 74 86 90 91 132

Frequently Asked Questions (FAQs)

1. What threshold should we use to flag for error?
This decision is a tradeoff between sensitivity and specificity. In computer simulations, we see that in almost all cases a threshold very close to 0 can detect most of the errors that are detectable by this class of approach. However, we often see that sensitivity increases markedly if we relax the detection threshold slightly from 0; beyond that point, sensitivity does not continue to increase at the same rate as the threshold is relaxed further. For convenience, we set the default threshold to 0.05 (5%), which the user can adjust. Please refer to the paper.

2. How do I choose framework markers?
Because the framework markers are assumed to be sparse, we want to choose framework markers that are informative about which chromosomes are being transmitted at the framework loci. First, framework markers typed on a large number of subjects tend to be most informative
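The effect of the threshold can be explored offline by re-applying it to the percent consistency (A1) values from the results file. The Python sketch below returns the 1-based indices of markers whose value falls below the threshold, mirroring the wording of the summary file. The function name flag_markers is hypothetical, and the strict "less than" comparison is an assumption, since the manual does not say how ties with the threshold are handled.

```python
def flag_markers(pct_consistent, threshold=0.05):
    """Return 1-based indices of markers whose A1 value is below the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    # Assumption: a marker is flagged only when strictly below the threshold.
    return [i for i, value in enumerate(pct_consistent, start=1) if value < threshold]

# Illustrative values, not taken from a real GIGI-Check run.
values = [0.87, 0.02, 0.4, 0.0]
print(flag_markers(values))        # default threshold of 0.05
print(flag_markers(values, 0.5))   # a more relaxed threshold flags more markers
```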
12. nce in centiMorgans (cM) based on the Haldane map function. Markers must be in ascending order, consistent with the order used in gl_auto. Each line contains the position of a marker.
position of Marker1
position of Marker2
position of Marker3
...
position of MarkerN
e.g.
1.0
2.0
3.0
4.0

d) map positions of dense markers (line 5)
Similar to the marker map for the framework panel of markers, the marker map of dense markers is a text file that contains the map distance in centiMorgans (cM) based on the Haldane map function. Markers must be in ascending order. Each line contains the position of a marker, e.g.,
0.5
0.7
0.9
1.1
1.15

e) dense marker genotype file (line 6)
The dense markers are the markers in which we want to detect Mendelian consistent genotyping errors. The dense marker file contains the genotypes of observed individuals.
a. This file is space-delimited.
b. The markers should be sorted by ascending map positions.
c. Alleles are labeled numerically (starting from 1, 2, ...) in ascending order. We use 0 to indicate a missing allele. If the original genotypes are alphanumeric, users first need to convert the genotypes to an indexed numerical format.

Users have two options for the format of the genotype file: 1) the newer long format (rows are markers) or 2) the more traditional wide format (rows are individuals). The long format should be used for files with many markers (e.g., > 10,000).

Note: The de
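Because both map files must list positions in ascending order, a quick check can catch sorting mistakes early. This Python sketch reads one position per line and raises an error at the first position that is smaller than the one before it. The function name read_map_positions is hypothetical, and treating equal adjacent positions as acceptable is an assumption of the sketch.

```python
def read_map_positions(lines):
    """Parse one map position (cM) per line and verify ascending order."""
    positions = [float(ln) for ln in lines if ln.strip()]
    for i in range(1, len(positions)):
        # Assumption: ties are allowed; only a decrease is treated as an error.
        if positions[i] < positions[i - 1]:
            raise ValueError(
                f"marker {i + 1} at {positions[i]} cM precedes "
                f"marker {i} at {positions[i - 1]} cM"
            )
    return positions

# The dense-marker example from the manual is in ascending order.
print(read_map_positions(["0.5", "0.7", "0.9", "1.1", "1.15"]))
```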
13. on, allele frequencies of framework markers, and genotype data of framework markers
3. Parameter file used to run gl_auto

In the parameter file for gl_auto, please make sure to use the option output meiosis indicators instead of output founder genome labels. GIGI-Check uses the Framework IVs file produced by gl_auto, which represents the IVs as meiosis indicators at the position of each framework marker. Refer to the documentation of MORGAN for guidance on setting up these files and on running gl_auto. Example files used to infer IVs using gl_auto are included under the example gl auto example directory.

For versions of GIGI-Check before ver 1.06, we need to obtain 2 files from running gl_auto:
a. Framework IVs file (see instructions above)
b. Pedigree meiosis file (see Appendix A)

Installing GIGI-Check
Simply unzip the files, navigate to the code directory, and type make. If make does not work, go to the directory containing GIGI-Check.cpp and build the program with
g++ GIGI-Check.cpp -o GIGI-Check

Running GIGI-Check
GIGI-Check accepts a parameter file. To run GIGI-Check, type
./GIGI-Check <parameter file> options
To run the example file, go to the main GIGI-Check program directory and type
./GIGI-Check example param v3_2 txt

Options
The flags are case sensitive.
Flags: seed NUM, prog NUM, outD, long, A1, threshold NUM
Purpose:
seed NUM: to change the default seed. NUM should be a number between 1 and
14. rresponds to the number of samples that the user actually prints to the meiosis indicator file.

File Formats
Examples of these files are provided in the example directory (refer to param txt for the filenames of these files).

a) pedigree file (line 1)
This is the pedigree file used by MORGAN gl_auto to infer IVs. For an example of this file, please refer to ped52 MORGAN ped in the example directory. Specifically, GIGI only cares that (1) the first word in the first line is "input", which is used to verify that this is a pedigree file, and (2) the first subject in the pedigree starts on the fifth line, with the first 3 columns being subjectID, fatherID, momID, e.g.,
input pedigree size 52
input pedigree record names 3 integers 2
input pedigree record trait 1 integer 2
*****
101 0 0 1 0
102 0 0 2 0
201 101 102 1 0
202 101 102 2 0
20100020
For both gl_auto and GIGI, the pedigree should be ordered so that ancestors are specified before their descendants. For more information, please refer to http www stat washington edu thompson Genepi MORGAN morgan3 11 tut html morgan tut_2 html SEC13

b) framework IVs file (line 2)
The Inheritance Vectors file describes the descent pattern of chromosomes at the positions of the framework markers. It is the output file that gl_auto generates.

c) map positions of framework markers (line 4)
The map positions of the framework markers file is a text file which contains the map dista
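The ordering rule for the pedigree file (ancestors before descendants) can be verified with a few lines of Python. The sketch below takes records of (subjectID, fatherID, momID), such as the first three columns of each line from the fifth line onward, and lists subjects that appear before one of their parents. The function name misordered_subjects is hypothetical, and treating the ID "0" as a missing parent is an assumption based on the example above.

```python
def misordered_subjects(records, missing="0"):
    """Return subjects listed before at least one of their parents.

    `records` is an iterable of (subject_id, father_id, mother_id) strings.
    Assumption: the ID "0" marks a founder's missing parent.
    """
    seen = set()
    problems = []
    for subject, father, mother in records:
        for parent in (father, mother):
            if parent != missing and parent not in seen:
                problems.append(subject)
                break  # report each subject once
        seen.add(subject)
    return problems

# Founders first, then their children: no problems are reported.
pedigree = [("101", "0", "0"), ("102", "0", "0"), ("201", "101", "102")]
print(misordered_subjects(pedigree))
```

A parent ID that never appears in the file is reported in the same way, which also helps catch typos in the parent columns.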
15. s file can also be easily created by the user. See the following instructions and example for how to create the file manually.

For manual creation of the Pedigree Meiosis file from the console output of gl_auto:
Using a text editor, open the console output file and find the line that begins with
name name pa name ma Compnt pat meio mat meio
Copy this line and the table below it, paste them into another file, and save it. The file includes the header line and looks like this:
name name pa name ma Compnt pat meio mat meio
2100_6 0 0 1 0 0
2100_21 0 0 1 0 0
2100_25 0 0 1 0 0
2100_29 0 0 1 0 0
2100_31 0 0 1 0 0
2100_39 0 0 1 0 0
2100_907 0 0 1 0 0
2100_908 2100_901 2100_902 1 2 1
2100_909 0 0 1 0 0
2100_910 2100_901 2100_902 1 4 3
2100_911 2100_901 2100_902 1 6 5
2100_915 2100_907 2100_908 1 8 7
(until the end of table)

License
GIGI-Check is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. There is NO WARRANTY for the program, to the extent permitted by applicable law. In no event, unless require