
App C H3Africa EGA Submission Guidelines 13092013


Contents

1. H3ABioNet: Pan African Bioinformatics Network for H3Africa
European Genome-phenome Archive (EGA)
General Information
1. What is the EGA?
The EGA is a centralised repository that permanently stores, and provides user access to, all types of personally identifiable genetic and phenotypic biomedical research data. The EGA archives data from study participants who have expressly consented to the release of their data only for specific research use and only to bona fide researchers. Very strict policies and protocols determine how the data are managed, stored and distributed. Only members of the EGA are allowed to process data on site, and all data are encrypted with the encryption keys/codes.
2. EGA System and Security
The EGA is a secure computing facility that uses a shared EBI setup for data submissions and has a petabyte-scale archive for original data files and for data access via the website. All data outside this secure EGA site are encrypted. The EBI web team has implemented strict web protocols, which include phpBB for https logins, and also maintains the file archive and data submission components. The EGA itself is not visible to any EBI network machine, only to EGA staff. The EGA data archive is modular, to provide high security and performance for large archived datasets. Individuals, their samples and their phenotypes are stored in separate databases. Experi
2. If preparing XMLs for submission, there are 2 key stages. The first stage, for sequence data, involves the preparation of Submission, Study, Sample, Experiment, DAC, Policy and Run XML files for annotation (7 XML files in total). The second stage requires a Submission XML different from the one used in stage 1, plus a Dataset XML file (2 XML files in total). More details on the XML files are provided in Appendix A. Example XML files that can be used or modified for submission are provided on the EGA user manual website.
https://www.ebi.ac.uk/ega/submission/manual
Example XML Submission
The XML files are uploaded to a Test XML user account, provided within the submission pack, which acts as a testing area where submitters can validate their XMLs. Once validated, the finished XMLs are uploaded to a Production XML user account, also provided within the submission pack.
Note:
• There is a Java uploader that needs to be installed if using the dropbox to upload files instead of Aspera or FTP.
• Data files affiliated to a submission are uploaded into private submission drop boxes using the FTP or Aspera protocols, which are provided as part of the submission procedure.
• Submitters are encouraged to use the EGA uploader tool (https://www.ebi.ac.uk/ega/submission/tools), which encrypts your files, generates md5sums and uploads the files to your submission dropbox.
• Data files may also be uploaded manually using FTP or Aspera, but submitters must ensure that all data files are encr
3. A receipt will be generated upon successful completion. Upon completion, an EGA website space is prepared and an EGA member will contact you to ensure the details are correct.
Stage 1: Submission.xml, Study.xml, Sample.xml, Experiment.xml, DAC.xml, Policy.xml, Run.xml
Stage 2: Submission.xml, Dataset.xml
Please note: each stage requires a Submission.xml, which defines the submission transaction.
APPENDIX B: Example of a genotype data submission process
Taken directly from EGA documentation.
What follows is an EGA-AF walk-through based on a hypothetical case-control genotype submission consisting of 2 human lung samples genotyped with 2 different platforms (Affymetrix_500K and Illumina_550K).
i. Individual contact details
Provide details of the submitter and one other contact person. Please add contacts to adjacent columns, as shown.
Person Affiliation: JDRF WT Diabetes Group
Person Last Name: Micheal
Person First Name: Hughes
Person Mid Initials: R
Person Phone: 01223 XXXXXX
Person Email: Micheal.Hughes@somewhere.com
Person Fax: 01223 XXXXXX
Person Roles: Submitter
ii. Details of data providers and data abstract
Details of the submitting organisation and abstract of the data being submitted.
Organisation Name: Leicester University, Department of Health Sciences, University of Leicester
Organisation Address: Adrian Building, University Road, Leicester LE1 7RH
Organisation Description: The National Child Development Study (NCDS), also known as
4. according to shared data type, technology and by case control. We also like to capture the number of samples that make up a dataset and the Data Access Committee responsible for approving access to the named dataset. You will find the Dataset component located in the tab at the bottom of the sheet shown here.
Datafiles
What follows is an example of how to map your samples, detailed in the Samples and phenotype tab, to the genotype files added to your upload account. You will find the Genotype and SNP component located in the tab at the bottom of the sheet shown here.
[Table: example Data files sheet. Each row maps a sample to its genotype platform (Affymetrix 500K, Affymetrix 15K or Illumina 550K), its encrypted .gpg raw data, signal and genotype files, any additional association files, and its dataset (Gencode 500K, Gencode 15K or Gencode 550K).]
5. [Table continued: Illumina 550K rows of the Data files sheet.]
Important note: If you have uploaded files NOT using the EGA uploader, you must upload the encrypted and unencrypted md5sum values of all files uploaded to your submission account. Your submission will not be processed without md5sum values supplied for all files.
APPENDIX C: Example of a GWAS summary aggregate submission
What follows is an EGA-GSF example based on a hypothetical case-control genotype summary submission conducted on 1500 human lung samples genotyped with 2 different platforms (Affymetrix_500K and Illumina_550K).
Individual contact details
Details of the submitter and a contact person for the submitted data.
Person Affiliation: JDRF WT Diabetes Group | Wellcome Trust
Person Last Name: Micheal | Sue
Person First Name: Hughes | Peters
Person Mid Initials: R | G
Person Phone: 01223 XXXXXX | 01223 XXXXXX
Person Email: Micheal.hughes@somewhere.com | David.pete@somewhere.com
Person Fax: 01223 XXXXXX | 01223 XXXXXX
Person Roles: Submitter | Principal Investigator
Details of data provider and data abstract
Details of the data source organisation and abstract of the data being submitted.
Comment [Organisation Name]: Leicester University, Department of Health Sciences, University of Leicester
Comment [Organisation Description]: The National C
6. has been granted ethical approval and is in accordance with the applicable laws and regulations.
Sincerely,
<Representative of study, e.g. Principal Investigator>
7. log in details to this account should have been provided at the beginning of the submission process.
Test XML upload account (recommended for first-time users): https://www-test.ebi.ac.uk/ena/submit/drop-box/submit/
Production XML upload account: https://www.ebi.ac.uk/ena/submit/drop-box/submit/
Submitters are advised to use the Test XML upload account when submitting XMLs to the EGA for the first time. The test service is identical to the production service, except that all submissions are discarded on the following day. We recommend that you validate all XMLs using the VALIDATE action in your submission XML before submitting them using the ADD action.
Stage 1 XML Descriptions
• Submission XML: describes the submission transaction, contact details, and md5 checksum values before and after encryption. ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_4/SRA.submission.xsd
• Study XML: describes the study in detail (title, study name and abstract). Also provides a unique identifier (accession) that can be used within the submission receipt. ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_4/SRA.study.xsd
• Sample XML: description of each of the samples used in the study. ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_4/SRA.sample.xsd
• Experiment XML: experimental details such as library preparation, sequencing platforms and type, etc. (different XML files for different platforms and experimental types). ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_4/SRA.experiment.xs
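The validate-then-add advice in this excerpt can be sketched as a small Python helper that writes the same submission XML twice, once with VALIDATE for the test account and once with ADD for production. This is only an illustration: the element and attribute names (SUBMISSION, ACTIONS, ACTION, source, schema), the alias and the centre name are assumptions modelled loosely on the submission schema cited above, and should be checked against SRA.submission.xsd before use.

```python
import xml.etree.ElementTree as ET


def build_submission_xml(alias, center_name, action, files):
    """Build a submission XML string (hypothetical layout, see lead-in).

    action is "VALIDATE" (dry run) or "ADD" (real submission);
    files is a list of (schema, source_file) pairs.
    """
    if action not in ("VALIDATE", "ADD"):
        raise ValueError("action must be VALIDATE or ADD")
    root = ET.Element("SUBMISSION", alias=alias, center_name=center_name)
    actions = ET.SubElement(root, "ACTIONS")
    for schema, source in files:
        # One ACTION wrapper per metadata file (assumed structure).
        wrapper = ET.SubElement(actions, "ACTION")
        ET.SubElement(wrapper, action, source=source, schema=schema)
    return ET.tostring(root, encoding="unicode")


# Hypothetical file list using the example file names from the guidelines.
stage1_files = [
    ("study", "Study_example.xml"),
    ("sample", "Sample_example.xml"),
]
dry_run = build_submission_xml("my_submission", "EXAMPLE_CENTRE", "VALIDATE", stage1_files)
real_run = build_submission_xml("my_submission", "EXAMPLE_CENTRE", "ADD", stage1_files)
```

Keeping the file list in one place and switching only the action means the XML sent to production is the same one that passed validation on the test service.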
8. reads 4 _ 12
The first line for each read must start with "@".
The base calls and quality scores must be separated by a line starting with "+".
The Fastq files must be compressed using gzip or bzip2.
Example of Fastq file containing single reads:
@read_name
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
Example of Fastq file containing paired reads:
@read_name/1
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
@read_name/2
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
where <cycle> indicates the cycle number that starts the second read. The Fastq files should be compressed using gzip.
ii. Secondary analysis formats
The EGA supports 2 types of analyses: reference alignments in BAM format and sequence variations in VCF format.
b. Array Based Submissions
The EGA supports submission of processed data from all types of array-based technologies, such as genotype, gene expression, methylation, etc. The EGA also archives any associated phenotypic data. EGA does not provide any m
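The Fastq rules in this excerpt (a header line starting with "@", a separator line starting with "+", Phred-scaled ASCII qualities) can be checked before upload with a short script. The following is a minimal sketch, not an EGA tool: the function names are hypothetical, it handles only ASCII-encoded qualities in uncompressed four-line records, and the offset guess is a common heuristic that can be wrong for uniformly high-quality Phred+33 data.

```python
def parse_fastq(lines):
    """Parse four-line Fastq records into (name, bases, qualities) tuples.

    Raises ValueError when a record breaks the layout described above.
    """
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("Fastq input must contain four lines per read")
    records = []
    for i in range(0, len(rows), 4):
        header, bases, separator, qualities = rows[i:i + 4]
        if not header.startswith("@"):
            raise ValueError(f"read header must start with '@': {header!r}")
        if not separator.startswith("+"):
            raise ValueError(f"separator must start with '+': {separator!r}")
        if len(bases) != len(qualities):
            raise ValueError(f"bases and qualities differ in length for {header!r}")
        records.append((header[1:], bases, qualities))
    return records


def guess_phred_offset(quality_strings):
    """Heuristically guess the ASCII offset (33 or 64); None if ambiguous."""
    lowest = min(ord(ch) for q in quality_strings for ch in q)
    if lowest < 59:
        return 33  # characters this low cannot occur with offset 64
    if lowest >= 64:
        return 64  # consistent with offset 64, though not proof of it
    return None


example = ["@read_name", "GATT", "+", "!!II"]
reads = parse_fastq(example)
offset = guess_phred_offset([q for _, _, q in reads])
```

Running the check locally catches layout errors early, but it does not replace the archive's own validation.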
9. the 1958 British Birth Cohort (1958BC), is a continuing multi-disciplinary longitudinal study which takes as its subjects all the people born in one week in March 1958 in England, Scotland and Wales. The resource is used widely for research in genetic and genomic epidemiology, in particular as a platform for genetic association studies.
Website URL: http://www.le.ac.uk/projects/birthcohort
Abstract for data submitted: This GWAS extends the Wellcome Trust Case Control Consortium (WTCCC, www.wtccc.org.uk) by using 4,000 additional T1D cases in the UK and 2,500 additional controls recruited from the 1958 Birth Cohort. The EGA hosts the control data for this study. The cases are available from dbGaP.
Additional contact: Wellcome Trust; Sue Peters, G; phone 01223 XXXXXX; David.peters@somewhere.com; fax 01223 XXXXXX; Principal Investigator
iii. Attaching policy documentation
Fields: Name of data provider; Address of data provider; Website of data provider
Policy statements are required for EGA data archiving and distribution. Add the policy document to your submission upload account. Please refer to your submission pack for an example of a Policy document.
Please add the name and path of your policy document: <EXAMPLE: document/policy_document.doc>
Notes on policy documentation:
• Document MUST be undersigned by an individual capab
10. 50
Data files
Filename | Dataset Name
Gencode500_summary file | Gencode_500K
Gencode500_read_me | Gencode_500K
APPENDIX D: Example Policy Statement
European Genome-Phenome Archive
c/o European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD
United Kingdom
To whom it may concern,
... to the European Genome Archive (EGA) for the restricted access by legitimate academic institutions that have agreed to comply with the terms of a Data Access Agreement drafted by ...
There are a number of steps that a researcher must take to obtain access to this data, and the process is overseen by our Data Access Committee, called <NAME and EMAIL ADDRESS OF CDAC>.
<If the CDAC consists of a single individual, please provide individual details, for example:> <NAME> is a Postdoctoral Fellow for <NAME> and is authorized to approve researchers to have encrypted access to the data submitted to the EGA.
Please be advised that <INDIVIDUAL NAME and EMAIL ADDRESS> is authorized to upload data to the EGA for archiving and distribution as part of your submission process, which will enable approved researchers to have encrypted access to the data.
We can confirm that this submission is consistent with the informed consent of the participants of the study or
11. A only accepts de-identified data with a DAC-approved plan. Accepted data types include manufacturer-specific raw data formats from the array-based and new sequencing platforms. The processed data, which may include genotypes, structural variants or any summary-level statistical analyses from the original study authors, are stored in databases. The EGA will also accept and provide access to any phenotype data associated with the data. Prior to submission, submitters must contact the EGA.
EGA Data Types and Formats for Submission
1. What type of data does the EGA accept?
The EGA accepts:
a. Sequence data (raw unaligned and analysis aligned), including RNA-Seq, re-sequencing, epigenomics, transcriptomics and other sequence-based assays
b. Array-based data: genotypes (SNP), expression, and their associated phenotypic information
c. Analysis-based submissions
https://www.ebi.ac.uk/ega/submission
https://www.ebi.ac.uk/ega/submission/sequence
2. What file formats does the EGA accept?
a. Sequence Based Submissions
i. Sequences
BAM format is the preferred EGA option. All BAM files submitted must be readable with SAMtools and Picard. Currently, BAM files must be de-multiplexed prior to submission; plans to accept BAM files with reads from multiple samples are in the pipeline.
Table 1: Summary of information on supported file formats for EGA Sequence
12. All consortium members have access to the consortium web page, which is used to collate data on consortium projects in the EGA. The pre-publication phase for this is usually between 6 and 12 months.
4. Consortium Specific Website
The EGA creates a separate website for each consortium that deposits data within its system. The website provides information about the consortium, a link back to that consortium's website and a list of studies archived by that consortium. Each study is assigned a stable identifier that can be referred to in publications. Authorised data access requires a user to log in, and files designated for distribution are encrypted and moved to a dedicated disk area outside the EGA. Each file is made available only to those users who have been granted access to the data by the DAC. Hyperlinks to download data are available on the secure login website to approved users only, and a script verifies access to the data before downloads begin. For large datasets web download is not optimal, and the EGA enables temporary FTP access via an Aspera account. The Aspera account requires a password and can only be used by one person at a time.
5. Data Submission
Before the EGA can accept any data submission, the following policy documentation must be submitted with the data: a Data Access Agreement, a data access application form and a Policy statement (examples are provided in the submission pack when contacting the EGA to submit data). EG
13. accept but not happy with this format, for the same reasons as the scarf format above. Submitters should convert from qseq to Fastq format.
Data submissions in PacBio HDF5 format are accepted.
Complete Genomics data should be submitted using the intact Complete Genomics directory tree structure, containing the ASM, LIB and MAP subfolders. Each individual genome should be submitted as a single Run object associated with a single Experiment and Sample object. Please note that the reads and mappings in the MAP directory should be included in the submission.
If submitting Fastq files, the compression algorithm to use is gzip or bzip2. For Fastq format, primary sequence data submissions of single and paired reads are accepted as Fastq files that meet the following requirements:
• Quality scores must be in Phred scale. For example, quality scores from early Solexa pipelines must be converted to use this scale.
• Both ASCII and space-delimited decimal encoding of quality scores are supported.
• We will automatically detect the Phred quality offset of either 33 or 64.
• No technical reads (adapters, linkers, barcodes) are allowed.
• Single reads must be submitted using a single Fastq file and can be submitted with or without read names.
• Paired reads must be split and submitted using either one or two Fastq files. The read names must have a suffix identifying the first and second read from the pair, for example 1 and 2 (regular expression for the
14. amples and phenotypes
What follows is a small sample of the Samples and phenotypes component, which consists of 2 samples from two individuals. Both samples have been genotyped using the Affymetrix_500K and Illumina_550K platforms, and three types of genotype calling software have been used (chiamo, brimm and Illuminus). You will find the Samples and phenotypes component located in the tab at the bottom of the sheet shown here.
[Table: example sample sheet with columns Sample Name, Description, Characteristics (Organism, Organism_Part, Cell_line, Gender, Region, Case_Control, Disease_state) with their EFO accessions, and Sample file name. The two example rows describe human epithelial cell lines derived from lung tissue (EFO_0000934, Lung): one control from East England and one case from Scotland.]
Important note: If you have uploaded files NOT using the EGA uploader, you must upload the encrypted and unencrypted md5sum values of all files uploaded to your submission account. Your submission will not be processed without md5sum values supplied for all files.
Datasets
What follows is a small sample of the dataset component. We suggest that each dataset should consist of a common set of data. The example below consists of three datasets grouped
15. at document (EGA-AF) spreadsheet template to add metadata and policy documentation associated with each genotype submission. The EGA will only process the submission once the EGA-AF is completed and received.
The EGA-AF template comprises 4 parts within a spreadsheet:
1. Investigator and Policy documents: information about the study and policy documentation; contact details of the submitter
2. Sample and phenotype information
3. Datasets and description
4. Data files and how the data are organised for distribution
Appendix B walks through an example submission.
3. GWAS summary aggregate submission
The EGA accepts submissions of complete summary-level data associated with processed data such as genotypes, structural variants and whole genome sequence, with any value associated with these calls. As an example, summary-level data associated with genotypes called with separate algorithms can be submitted. If applicable, please ensure that your submission of summary-level data does not contravene the original consent agreements signed by the participants of the study.
WE DO NOT ACCEPT SUMMARY SUBMISSIONS BASED ON TOP LEVEL SNPS.
The steps are similar: contact the EGA, receive a submission pack with account details, upload data, and document. The steps for the documentation are provided below.
The EGA Genotype Submission Format (EGA-GSF) is a spreadsheet template for submitters to add metadata associated with a summary-level submission. Once co
16. based data
Supported: BAM (most preferred); SFF for 454 data; Illumina Scarf Format converted to FastQ before submission; Complete Genomics data, with some caveats (see below); reference alignments in BAM format; sequence variations in VCF format.
Not supported: colour-spaced BAM; files not de-multiplexed (barcoded) before submission.
• Colour-spaced BAM files are not supported. Data files have to be de-multiplexed before submission, so that each run is submitted with files containing data for a single sample only.
• Signal data is no longer accepted for the Illumina GA, HiSeq and SOLiD platforms but continues to be supported for the 454 platform. The minimum submission level for the EGA is base/colour calls with quality scores.
• As BAM is near optimal in terms of compression, files should be submitted uncompressed.
• For 454 data the EGA accepts SFF files, which are also compressed and should be submitted without further compression.
• Illumina Scarf Format: the EGA will accept it but is not keen on it, as these submissions cannot be processed or made available in other formats. The EGA requires submitters to convert Illumina Scarf Format data to Fastq prior to submission, and to convert the scarf-format log-odds qualities to Phred qualities when preparing the FastQ files for submission.
https://www.ebi.ac.uk/ega/submission/array_based
Illumina qseq format
17. d
• DAC XML: description of the data access policy and URL. ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_4/EGA.dataset.xsd
• Policy XML: describes the data access agreement to be linked to the DAC. ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_3/EGA.policy.xsd
• Run XML: describes the data file and its relation to the experiment. ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_4/SRA.run.xsd
• Another useful file is the Analysis XML, which can be used to submit BAM files to the EGA, with one BAM file for each analysis.
• A similar XML file is also used for submitting VCF files to the EGA.
• Dataset XML: describes the data files that constitute the dataset, is linked to the specific Policy in place, and is defined by the Run XML and Analysis XML. ftp://ftp.sra.ebi.ac.uk/meta/xsd/sra_1_4/SRA.dataset.xsd
The Submission_example.xml, Study_example.xml, Sample_example.xml, Experiment_example.xml, Run_example.xml, DAC_example.xml and Policy_example.xml files are submitted to your Production XML upload account. On successful completion, you obtain a receipt with accession numbers for each object.
Stage 2 XML Descriptions
The second metadata submission XML consists of 1 named XML object with the file name dataset used for stage 1, and the people to contact for issues arising with the submission. The Dataset XML groups each of the experiments' accession numbers into a single dataset with accession policy
18. hild Development Study (NCDS), also known as the 1958 British Birth Cohort (1958BC), is a continuing multi-disciplinary longitudinal study which takes as its subjects all the people born in one week in March 1958 in England, Scotland and Wales. The resource is used widely for research in genetic and genomic epidemiology, in particular as a platform for genetic association studies.
Comment [Organisation Address]: Adrian Building, University Road, Leicester LE1 7RH
Comment [Website of data provider]
Comment [Abstract for data submitted]: This GWAS extends the Wellcome Trust Case Control Consortium (WTCCC, http://www.wtccc.org.uk) by using 4,000 additional T1D cases in the UK and 2,500 additional controls recruited from the 1958 Birth Cohort. The EGA hosts the control data for this study. The cases are available from dbGaP.
Further details of Study
Investigation Title: WTCCC 1958 Birth Cohort Controls
EGA Study Accession Number: EGAS0000000001
Publication Title: Controls for genome wide association studies
Publication Status: SUBMITTED
PubMed ID: 12345
Publication Manuscript: publication_manuscript.doc
EGA release date: 29/06/2011
Dataset
Dataset Name | Dataset accession number | Description | Data type | Technology | Controls
Gencode 500K | | Gencode summary using 500K | Genotype | Affymetrix 500K | 4
19. ing the necessary information from a user wishing to access data. Completion of a Data access application form by the applicant(s) should form part of the application process to the DAC.
• MalariaGen Data access form
• Wellcome Trust Case Control Consortium Data access form
Policy statements
Please find below a policy statement example. All submitters must provide the policy statements captured in this template. An example policy statement is shown in Appendix D.
APPENDIX A: Creating and submitting XMLs
Taken verbatim from https://www.ebi.ac.uk/ega/submission/manual
All metadata required by the EGA may be collected using the EGA's XMLs. Submitters are required to prepare, validate and submit the XMLs.
Prepare XMLs
Validate XMLs
Working with XML
We recommend manipulating EGA metadata using an XML editor, preferably one with the ability to validate against XML schemas. A good article on choosing an XML editor can be found here. Alternatively, XML can be edited in standard text editors and then checked using an XML validator, e.g. xmllint (a free Unix-based XML validator).
General concepts
Aliases and center names
Every EGA object must be uniquely identified within the submission account using the alias attribute. The aliases can be used in submissions to make references between EGA objects. Please find more information about the use of aliases and center name
20. le of confirming the statements made therein, e.g. the Principal Investigator.
• Please add your policy document template to your data file upload account, or email it directly to the EGA Helpdesk. You can view an example of policy statements here.
iv. Details of your Data Access Committee (DAC)
Please refer to your submission pack for example forms.
<EXAMPLE: WTCCC>
<EXAMPLE: https://www.wtccc.org.uk>
<EXAMPLE: cooc welcome oru
Please add the name and path for your Data access application form: <EXAMPLE: document/Data_access_application_form.doc>
Please add the name and path for your Data Access Agreement (DAA): <EXAMPLE: document/DAA.doc>
Notes on Data Access Committees:
• Please add your Data access application form and Data Access Agreement form to your data file upload account, or email them directly to the EGA Helpdesk.
• View examples of a Data Access Application and a Data Access Agreement.
• Further information on DACs can be found here.
v. Further details of study and release policy
Further details of your study
Investigation Title: WTCCC 1958 Birth Cohort Controls
EGA Study Accession Number: EGAS0000000001
Publication Title: Controls for genome wide association studies
Data types to be submitted: Genotype
Tiered access required: NO
Publication Status: SUBMITTED
PubMed ID: 12345
Publication Manuscript: publication_manuscript.doc
EGA release date: 29/06/2011
EGA Array based Format document S
21. mental data are also stored in separate databases, depending on the type of data. The EGA short read archive is only for raw sequence data. Processed data types must be submitted separately to the EGA and are stored in dedicated databases.
3. How is data accessed from the EGA?
The EGA implements a distributed access-granting policy, where the decision to grant access to the data resides with the relevant consortium Data Access Committee (DAC). In order to access data in the EGA, a scientist has to apply in writing to the DAC. The DAC, not the EGA, is responsible for making all data access decisions; the EGA merely facilitates the secure archiving and hosting of the data. The DAC is composed of members of the organisation that produced the data, not EGA personnel. The Data Access Agreement (DAA) is a contract made directly between the relevant DAC and the applicant wishing to access the data. The EGA will only provide access to the data once a successful application has been passed on to the EGA by the DAC.
https://www.ebi.ac.uk/ega/node/66
The EGA provides encryption keys/codes physically and offline after a user is granted access to a particular dataset or study by approval from a DAC. These can be used to obtain web access. The EGA supports consortium members in accessing the data before publication by allowing access to only the consortium data and study websites, by means of authorised secure logins
22. mpleted and validated, the EGA-GSF is used to produce a website that will describe and link to the submitted data. The EGA can only process a submission once a completed EGA Genotype Submission Format document is received from the submitter.
The EGA-GSF spreadsheet consists of three components:
1. Study and investigators: information including the title, description, publication and contact details.
2. Dataset: define how your data is going to be organised into datasets for distribution.
3. Data Files: filenames of data files to be submitted and the name of the dataset to which they will be affiliated.
An example is provided in Appendix C.
What happens after data is submitted successfully?
A draft website is prepared which will point to your study, dataset and Data Access Committee. Once your draft website is completed, a member of the EGA will be in touch before your website goes live to ensure:
1. Your study is represented accurately.
2. Access to EGA user management tools is provided to the Data Access Committee named contacts.
3. Further information regarding the role of the Data Access Committee can be found here.
Finally, your data is archived within our databases and prepared for encrypted distribution upon the request of permitted EGA account holders. We strongly advise yo
23. ore information, other than what is above, on files accepted for array-based submissions in their online user manual.
c. Analysis Based Submissions
These include genotypes, summary or aggregate data, structural variants (VCF), expression and phenotype. Accepted data types include raw data (manufacturer-specific formats from array-based and NGS platforms) and processed data such as genotypes, structural variants and aligned reads.
Key Stages of EGA Submission
1. Sequence Based Submissions to the EGA
There are 4 key stages for the submission of sequence-based data according to the EGA manual (https://www.ebi.ac.uk/ega/submission/manual):
a. Contact EGA: contact the EGA and provide details of your data file types and anticipated size.
b. Receive submission pack: the submission pack includes login details for the account, documents providing details of the key stages of submission, and a policy statements template for completion and return.
c. Upload data: upload data files into your data upload account using the EGA Webin Data uploader, which automatically encrypts files and creates md5sum check values to make sure all your data is uploaded correctly.
d. Document: provide details of the study, samples, experiments, policy and datasets. Metadata must be produced at this stage, either using the Webin EGA tool or by creating and submitting your own XMLs.
https://www.ebi.ac.uk/ega/submission/phenotypes
24. s below.
alias attribute: every object should have a name that is unique within your submission account. Once submitted successfully, every alias will be assigned an accession.
refname attribute: when an object references another by its alias, the alias goes into the refname attribute. For example, if a sample has the alias sample1 and an experiment uses this sample, then the EXPERIMENT SAMPLE refname should be sample1.
center_name attribute: the center_name attribute is required within the submission XML and will be propagated to all other XMLs if not individually provided. This element is the controlled vocabulary acronym or abbreviation that is provided to the account holder when the account is first generated for an institute. If the submitter is brokering a submission for another institute, the submitter should use their special broker account name in broker_name, while the data centre acronym remains in center_name.
run_center attribute: many submitting centres contract out the sequencing to another centre. In these cases the sequencing centre should be acknowledged in the run_center attribute. Again, this is controlled vocabulary and the acronym should be sought from the EGA before submitting.
Validating and submitting your EGA XMLs
Please submit your EGA XMLs to your XML upload account. Please note that your
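The alias and refname rules above lend themselves to a quick local consistency check before submission. The sketch below is illustrative only: the XML snippets use a simplified, hypothetical layout (the SAMPLE_SET, EXPERIMENT_SET and SAMPLE_DESCRIPTOR element names are assumptions, not taken from this document), and the function simply reports every refname that does not match an alias defined in the supplied files, plus any alias used twice.

```python
import xml.etree.ElementTree as ET


def check_aliases(xml_documents):
    """Return (duplicate_aliases, unresolved_refnames) for a set of XML strings."""
    seen, duplicates, references = set(), [], []
    for text in xml_documents:
        for element in ET.fromstring(text).iter():
            alias = element.get("alias")
            if alias is not None:
                if alias in seen:
                    duplicates.append(alias)
                seen.add(alias)
            refname = element.get("refname")
            if refname is not None:
                references.append((element.tag, refname))
    # A refname is unresolved when no object in the batch carries that alias.
    unresolved = [(tag, name) for tag, name in references if name not in seen]
    return duplicates, unresolved


# Hypothetical, simplified metadata files.
sample_xml = '<SAMPLE_SET><SAMPLE alias="sample1"/></SAMPLE_SET>'
good_experiment = (
    '<EXPERIMENT_SET><EXPERIMENT alias="exp1">'
    '<SAMPLE_DESCRIPTOR refname="sample1"/></EXPERIMENT></EXPERIMENT_SET>'
)
bad_experiment = good_experiment.replace('refname="sample1"', 'refname="sample2"')

ok_result = check_aliases([sample_xml, good_experiment])
bad_result = check_aliases([sample_xml, bad_experiment])
```

This only checks references within one batch of files; references to objects already accessioned in the account would need to be looked up separately.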
25. u NOT to delete your data until we confirm that your data has been successfully archived.
Policy documentation required for submissions
The following policy documentation must be prepared and submitted to the EGA together with your data files and associated metadata:
• Data Access Agreement (DAA)
• Data access application form
• Policy statements
All policy documentation should be emailed directly to the EGA Helpdesk. Please be advised that the EGA cannot process your submission without the documentation shown below.
Data Access Agreement (DAA)
Please find below links to examples of Data Access Agreements (DAA) used by existing Data Access Committees (DACs). The Data Access Agreement is a contract made between the user and the Data Access Committee. The agreement should be drafted by the DAC and includes, but is not limited to, details of data use, publication restrictions and storage. Completion of a DAA by the applicant(s) should form part of the application process to the DAC.
• Wellcome Trust Case Control Consortium DAA
• Wellcome Trust Sanger Institute Cancer Genome Project (UK Academic)
• Wellcome Trust Sanger Institute Cancer Genome Project (US Corporate)
Data access application form
Please find below links to examples of Data access forms used by existing Data Access Committees (DACs). The Data access form should be drafted by the DAC for the purpose of captur
26. ypted with GnuPG and that md5sum values are provided in the format required. Please note that the EGA uploader tool may also be used to encrypt your files and generate md5sum values without uploading your files.
• All submissions, except summary/aggregate-level submissions, require policy documentation. This consists of Policy statements, a Data Access Agreement (DAA) and a Data access application form.
• The EGA also requires the submission of associated metadata, which includes contact details of the submitter and sample descriptions. For sequence submissions we use XMLs and/or Webin, and for array-based submissions we use the Array based format document to collect this information.
2. Array Based Submissions
The EGA accepts processed data from all types of array-based technologies, such as genotypes, gene expression, methylation, etc.
https://www.ebi.ac.uk/ega/submission/manual Contact_ Genot
The key stages of array-based submissions are:
a. Contact EGA helpdesk: provide details of the data sizes and types you wish to submit.
b. Receive a submission pack: includes unique accession numbers, login details for the account, submission guide documents and policy statements for completion.
c. Upload data: upload your data using the EGA Webin tool, which automatically encrypts data and generates md5sum values for checking.
d. Document: provide details of the protocols, samples, experiments and policy documentation. One uses the EGA Array based Form
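For manual uploads, the guidelines ask for md5sum values of each file both before and after encryption. A minimal Python sketch of that bookkeeping follows. It assumes the encrypted copy has already been produced with GnuPG as a separate file; the function names and the returned dictionary layout are hypothetical, and the exact format in which the EGA expects the values is given in the submission pack, not here.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_pair(plain_path, encrypted_path):
    """Collect the unencrypted and encrypted md5 values for one data file."""
    return {
        "file": Path(plain_path).name,
        "unencrypted_md5": md5_of_file(plain_path),
        "encrypted_md5": md5_of_file(encrypted_path),
    }
```

For example, `checksum_pair("sample1.bam", "sample1.bam.gpg")` (hypothetical file names) would return both digests for one file. Reading in chunks keeps memory use flat even for very large BAM files.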
