ChIP Sequencing using Biomedical Genomics Workbench

1. QIAGEN Tutorial: ChIP Sequencing using the Biomedical Genomics Workbench. November 24, 2015. Sample to Insight. CLC bio, a QIAGEN Company, Silkeborgvej 2, Prismet, 8000 Aarhus C, Denmark. Telephone: +45 70 22 32 44. www.clcbio.com, support-clcbio@qiagen.com

ChIP Sequencing using the Biomedical Genomics Workbench

This tutorial takes you through a complete ChIP sequencing workflow using the Biomedical Genomics Workbench. The tutorial makes use of the peak-shape-based Transcription Factor ChIP-Seq tool present in Biomedical Genomics Workbench 2.0 and higher.

ChIP sequencing is used to analyze the interactions of proteins with genomic DNA. After a cross-linking step that covalently links proteins and DNA, ChIP-seq uses chromatin immunoprecipitation (ChIP) to fish out the relevant pieces of genomic DNA. By subsequent massively parallel DNA sequencing and mapping to the reference genome, it is possible to identify binding sites of DNA-associated proteins. It can be used to accurately map global binding sites for any protein of interest when specific antibodies are available. A natural next step for bioinformatics analysis is to extract the binding regions and perform pattern discovery to learn about any conserved binding motif in the DNA.

Usually a control experiment is performed where the immunoprecipitation step is left out. This contro
2. ChIP Sequencing in figure 4 by clicking on the arrow to the right. We then press Next and check that only the two lists control chr21 and nrsf chr21 are selected (figure 5). Clicking Next will allow you to select a reference sequence, as shown in figure 6.

Figure 4: Select sequence list containing the reads. Since we want to map two lists, we choose the batch mode.

Figure 5: Check that all reads are used as input for the mapping.

Figure 6: Specify the reference sequences and reference masking.

At the top you select NC_000021 Genome by clicking the Browse and select element button. You can select either single sequences or a list of Seq
3. In addition to the NRSF ChIP-seq dataset, we will also use a control experiment where the immunoprecipitation step is left out. In this tutorial we will look at only a subset of the data, namely the reads of the NRSF and control experiments mapping to human chromosome 21.

Importing the raw sequencing data

First download the data set from our web site: http://download.clcbio.com/testdata/raw_data/ChIP-seq/NRSF_chr21_bx.zip. Unzip the file somewhere on your computer (e.g. the Desktop). Start the Biomedical Genomics Workbench and import the sequencing data:

File | Import | Illumina

This will bring up the dialog shown in figure 1.

[Figure 1 screenshot: Illumina High-Throughput Sequencing Import dialog, step "Import files and options". Selected files: control chr21 fastq and nrsf chr21 fastq. General options: Paired reads, Discard read names, Discard quality scores. Paired read information: minimum distance 180, maximum distance 250. Illumina options: Remove failed reads; Quality scores: NCBI/Sanger or Illumina Pipeline 1.8 and later.]
4. can be interpreted in a similar fashion. We note that since this is a control experiment, the value of the relative strand correlation is not important, and the status would be OK also for low values. As for NRSF, the fact

In this tutorial we used only the subsets of the data mapping to chromosome 21. The complete datasets can be found at the UCSC website. The complete NRSF dataset is available at http://hgdownload-test.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeHudsonalphaChipSeq/release1/wgEncodeHudsonalphaChipSeqRawDataRep1K562Nrsf.fastq.gz. The complete control dataset is available at http://hgdownload-test.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeHudsonalphaChipSeq/release1/wgEncodeHudsonalphaChipSeqRawDataRep1K562Control.fastq.gz.

Number of reads: 307,787. Status: Very low. For mammalian cells this value should be at least 10 million reads. For smaller organisms (e.g. worm and fly) this value should be at least 2 million reads.

Relative strand correlation: 1.192. Status: OK. The relative strand correlation describes the ratio between the fragment-length peak and the read-length peak in the cross-correlation plot. This value should be greater than 0.8 for transcription factor binding sites, but can be lower for ChIP-seq input or for histone marks.

Normalized strand coefficient: 2.3751. Status: OK. The normalized strand coefficient describes the ratio between the
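To make the two strand-based measures concrete, here is a minimal sketch of how they could be derived from a cross-correlation profile. It assumes the definitions given in the ENCODE guidelines (Landt et al. 2012), with background subtraction for the relative strand correlation; the Workbench may compute them differently. The function name `strand_metrics` and the toy profile are hypothetical.

```python
def strand_metrics(cross_correlation, read_length):
    """Compute (fragment_shift, NSC, RSC) from a {shift: correlation} profile.

    Assumed definitions (ENCODE-style, not taken from the Workbench):
      NSC = fragment-length peak / background
      RSC = (fragment-length peak - background) / (read-length peak - background)
    where background is the minimum cross-correlation value.
    """
    background = min(cross_correlation.values())
    read_peak = cross_correlation[read_length]
    # Take the highest correlation at shifts beyond the read length
    # as the fragment-length peak.
    fragment_shift = max(
        (shift for shift in cross_correlation if shift > read_length),
        key=cross_correlation.get,
    )
    fragment_peak = cross_correlation[fragment_shift]
    if read_peak == background:
        raise ValueError("read-length peak equals background; RSC is undefined")
    nsc = fragment_peak / background
    rsc = (fragment_peak - background) / (read_peak - background)
    return fragment_shift, nsc, rsc


# Toy profile (hypothetical values): shift in bp -> strand cross-correlation.
profile = {0: 0.25, 36: 0.5, 100: 0.625, 150: 0.75, 200: 0.5, 400: 0.25}
fragment_shift, nsc, rsc = strand_metrics(profile, read_length=36)
print(fragment_shift, nsc, rsc)
```

In this toy profile the fragment-length peak sits at a shift of 150 bp, well above both the read-length peak and the background, which is the pattern expected for a good transcription factor experiment.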
5. Figure 14: Select the reference gene track.

Choose NC_000021 Gene as the reference gene track, then click Next and Save the result. The file nrsf chr21 Reads Peaks Annotated will be generated. You should now have the files depicted in figure 15.

Visualizing the results

The best way to visualize the results is to create a Genome Browser View:

Toolbox | Genome Browser | Create New Genome Browser View

Select the tracks we created so far, as shown in figure 16, and then press Finish.

[Figure 15 screenshot: folder ChIP Sequencing containing control chr21, nrsf chr21, NC_000021 Gene, NC_000021 Genome, control chr21 Reads, control chr21 mapping summary report, nrsf chr21 Reads, nrsf chr21 mapping summary report, nrsf chr21 Reads Peaks, nrsf chr21 Reads QC Report, nrsf chr21 Reads Peak shape filter, nrsf chr21 Reads Peak shape score, nrsf chr21 Reads Peaks Annotated.]

Figure 15: All files created after the Transcription Factor ChIP-Seq analysis is done.
6. eak | 42269275 | 206 | 25.81 | 3.20E-147 | C2CD2 | 22097 | ZNF295 | 10637

Figure 17: A very strong peak near the gene SYNJ1.

Bibliography

[Landt et al., 2012] Landt, S. G., Marinov, G. K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou, S., Bernstein, B. E., Bickel, P., Brown, J. B., Cayting, P., Chen, Y., DeSalvo, G., Epstein, C., Fisher-Aylor, K. I., Euskirchen, G., Gerstein, M., Gertz, J., Hartemink, A. J., Hoffman, M. M., Iyer, V. R., Jung, Y. L., Karmakar, S., Kellis, M., Kharchenko, P. V., Li, Q., Liu, T., Liu, X. S., Ma, L., Milosavljevic, A., Myers, R. M., Park, P. J., Pazin, M. J., Perry, M. D., Raha, D., Reddy, T. E., Rozowsky, J., Shoresh, N., Sidow, A., Slattery, M., Stamatoyannopoulos, J. A., Tolstorukov, M. Y., White, K. P., Xi, S., Farnham, P. J., Lieb, J. D., Wold, B. J., and Snyder, M. (2012). ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res, 22(9):1813-31.

[Marinov et al., 2014] Marinov, G. K., Kundaje, A., Park, P. J., and Wold, B. J. (2014). Large-scale quality analysis of published ChIP-seq data. G3 (Bethesda), 4(2):209-23.

[Rye et al., 2011] Rye, M. B., Sætrom, P., and Drabløs, F. (2011). A manually curated ChIP-seq benchmark demonstrates room for improvement in current peak-finder programs. Nucleic Acids Res, 39(4):e25.
7. fragment-length peak and the background cross-correlation values. This value should be greater than 1.05 for ChIP-seq experiments.

Figure 13: Quality measures for the control ChIP-seq dataset.

that the number of reads is very low is due to the fact that only a small subset of the data was used.

The quality report contains additional information that could be used for troubleshooting. For example, if the relative strand correlation or the normalized strand coefficient were classified as low, the cross-correlation plots should be examined in more detail. More information regarding the cross-correlation plots and the Transcription Factor ChIP-Seq tool can be found in the user manual. Click the Help button or go to http://clcsupport.com/clcgenomicsworkbench/current/index.php?manual=Running_ChIP_Seq_Analysis_tool.html.

After having verified that the quality of the ChIP-seq datasets is acceptable, the next step is to annotate the peaks with information about their nearest upstream and downstream genes. This can be done using the Annotate with Nearby Gene Information tool:

Toolbox | Epigenomics Analysis | Annotate with Nearby Gene Information

First select the track to annotate, nrsf chr21 Reads Peaks. After clicking Next, the dialog shown in figure 14 will appear.
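The idea behind annotating a peak with its nearest genes can be sketched in a few lines. This is only an illustration under simplifying assumptions (genes sorted by start, non-overlapping, strand ignored); it is not how the Annotate with Nearby Gene Information tool is implemented. The function `nearest_genes` and the gene coordinates are hypothetical.

```python
from bisect import bisect_right


def nearest_genes(peak_center, genes):
    """Return ((left_name, distance), (right_name, distance)) for a peak.

    genes: list of (start, end, name), sorted by start and non-overlapping.
    The left gene is the last one starting at or before the peak center;
    the right gene is the next one. A side with no gene yields None.
    A peak lying inside a gene gets distance 0 to that gene.
    """
    starts = [start for start, _end, _name in genes]
    index = bisect_right(starts, peak_center)
    left = genes[index - 1] if index > 0 else None
    right = genes[index] if index < len(genes) else None

    def describe(gene):
        if gene is None:
            return None
        start, end, name = gene
        if start <= peak_center <= end:
            distance = 0
        elif peak_center < start:
            distance = start - peak_center
        else:
            distance = peak_center - end
        return name, distance

    return describe(left), describe(right)


# Hypothetical gene track: (start, end, name).
gene_track = [(100, 200, "geneA"), (500, 800, "geneB"), (1200, 1500, "geneC")]
between = nearest_genes(350, gene_track)
inside = nearest_genes(600, gene_track)
before_first = nearest_genes(50, gene_track)
print(between, inside, before_first)
```

A distance of 0, as in the second call, corresponds to a peak that falls within a gene, which is how a 0 in a distance column of the annotated peak table can be read.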
8. Figure 1: Import raw reads.

When analyzing your own data, you should select the sequencing technology appropriate for your data. This dataset consists of two fastq files obtained using an Illumina sequencer, so the Illumina importer should be chosen. Select the nrsf chr21 and control chr21 fastq files and make sure the Paired reads checkbox is not checked. The options to discard read names and quality scores are not significant in this context and can safely be left unchecked because of the relatively small number of reads. Click Next, Save the imported reads list, and click Finish. After a short while, the raw reads from both files have been imported.

Next, import the reference genome sequence that was also included in the zip file. In this tutorial we will use only human chromosome 21 as reference. The reference data is provided in clc format in the files NC_000021 Genome and NC_000021 Gene, which are the genomic chromosome 21 reference sequence and the gene annotation track for chromosome 21, respectively. To import the clc files, drag and drop them into the Biomedical Genomics Workbench or use the Standard Import tool:

File | Import | Standard Import

Locate the NC_000021 Genome and NC_000021 Gene clc files.
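Before importing, it can be useful to check how many reads a fastq file contains, since the number of reads is one of the quality measures reported later. The following is a minimal sketch that assumes the common four-lines-per-record fastq layout; the function `count_fastq_reads` and the sample records are hypothetical and not part of the tutorial data.

```python
import io


def count_fastq_reads(handle):
    """Count records in a four-line-per-record FASTQ stream.

    Blank lines are skipped. Raises ValueError if a record header
    does not start with '@'.
    """
    non_blank = (line for line in handle if line.strip())
    count = 0
    for index, line in enumerate(non_blank):
        if index % 4 == 0:
            if not line.startswith("@"):
                raise ValueError(f"unexpected record header: {line.strip()!r}")
            count += 1
    return count


# Two made-up records standing in for a real file.
sample = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nTTGACCAA\n+\nIIIIIIII\n"
)
n_reads = count_fastq_reads(sample)
print(n_reads)
```

For a file on disk, the same function can be called on an open text handle, for example `with open(path) as handle: count_fastq_reads(handle)`, where `path` points to one of the unzipped fastq files.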
9. Select the default option Automatic import. The Biomedical Genomics Workbench will correctly recognize that the file is in clc format (see figure 2).

Figure 2: Import reference data. Use the standard importer to import the reference sequence along with the gene annotation track.

Then press Next and choose a folder where the result will be saved. You should now have the files depicted in figure 3.

Figure 3: The files created after the importing step is done.

Mapping the reads to the reference genome

Once the data has been imported, the next step in the analysis is to map the reads to the reference genome:

Toolbox | Resequencing Analysis | Map Reads to Reference

The dialog shown in figure 4 allows you to choose the files containing the raw reads. Since we want to map two lists, we check the Batch option to enable the batch mode and select the folder where the sequence lists are stored
10. l data is typically used to correct for sequencing biases, e.g. genomic regions that are more accessible, repeated regions, or copy number aberrations. For further information, see the Wikipedia entry at http://en.wikipedia.org/wiki/Chip-Seq.

The workflow consists of five parts:

• Importing the raw sequencing data
• Mapping the reads to a reference genome
• Calling peaks
• Visualizing the results

In this tutorial we will focus on how to run the analysis, and we will not go through the technical details of how the Transcription Factor ChIP-Seq analysis is implemented. The user manual already explains the details of the algorithm. Click the Help button in the dialog (see below) to read this, or go to http://clcsupport.com/clcgenomicsworkbench/current/index.php?manual=ChIP_Seq_Analysis.html.

We will look at a subset of a ChIP-seq dataset for the transcription factor NRSF (Neural Restrictive Silencer Factor) on the human cell line Gm12878. Also known as REST (RE1-Silencing Transcription factor), NRSF is a transcription factor involved in the repression of neural genes in non-neuronal cells, such as the lymphoblastoid cell line Gm12878. We therefore expect NRSF ChIP-seq peaks to be associated with genes involved in neural activity. The data was collected by the Myers Lab at the HudsonAlpha Institute for Biotechnology. This dataset is well studied and has often been used to evaluate the performance of ChIP-seq algorithms [Rye et al., 2011].
11. og shown in figure 8 now appears.

Figure 8: Create report and Save.

Check Create report to obtain a detailed report about the read mapping, and leave Collect un-mapped reads unchecked since we are not interested in those reads. Click Finish. You can follow the progress of the mapping both in the status bar at the bottom left corner and under the Processes tab. There is also a log showing the progress. Because the reference sequence is fairly large (human chromosome 21, 47 Mbp), it may take a few minutes to map the data.

Calling peaks

The results of the read mapping are now used as input to the Transcription Factor ChIP-Seq tool to detect significant peaks:

Toolbox | Epigenomics Analysis | Transcription Factor ChIP-Seq

This opens a dialog where you select nrsf chr21 Reads (see figure 9) and click Next.
12. Figure 16: Create a Genome Browser View to visualize the results.

Once the Genome Browser View is created, the easiest way to explore peaks is to make a split view of the table and the peak annotation track by double-clicking on the label nrsf chr21 Reads Peaks Annotated in the left side of the View Area. You will then be able to browse through the peaks by clicking in the table, as the peak annotation track and the table are connected (figure 17): the view jumps to the position of the peak selected in the table. You can browse through all 144 peaks found for this sample by selecting rows in the table.

Next, we sort the table according to P-value so that we can look at the top peak (figure 17). You can sort a column by clicking on its header, in this case P-value. The strongest peak is close to the gene SYNJ1 (synaptojanin 1). This gene encodes a phosphoinositide phosphatase that regulates levels of membrane phosphatidylinositol-4,5-bisphosphate. The expression of this enzyme affects synaptic transmission, and thus it is not a surprise that this gene i
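The same ranking can be reproduced outside the Workbench on an exported peak table. The sketch below sorts a handful of peaks by P-value, smallest first. The gene names and P-values are transcribed from the table in figure 17, whose extraction is partly garbled, so treat them as illustrative; the list structure and variable names are hypothetical.

```python
# (5' gene, P-value) pairs as read from figure 17; deliberately unsorted.
peaks = [
    ("CRYAA", 5.89e-160),
    ("DYRK1A", 5.47e-186),
    ("SYNJ1", 3.83e-271),
    ("RPS26P4", 4.61e-180),
]

# Sort ascending by P-value: the most significant peak comes first.
ranked = sorted(peaks, key=lambda peak: peak[1])
top_gene, top_p_value = ranked[0]

for gene, p_value in ranked:
    print(f"{gene}\t{p_value:.2e}")
```

Sorting ascending mirrors clicking the P-value column header once in the table view, so the first row is the strongest peak.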
13. oss-correlation plot. This value should be greater than 0.8 for transcription factor binding sites, but can be lower for ChIP-seq input or for histone marks.

Normalized strand coefficient: 2.488. Status: OK. The normalized strand coefficient describes the ratio between the fragment-length peak and the background cross-correlation values. This value should be greater than 1.05 for ChIP-seq experiments.

Figure 12: Table of quality measures for the NRSF ChIP-seq dataset.

For each of the 3 measures, the table provides the name, the value, notes to better understand the meaning of the measure, and a status. The status is OK if the value is reflective of sufficient quality, and Low or Very Low if the value is lower than the quality threshold. For more details on how the quality thresholds were determined, see [Landt et al., 2012] and [Marinov et al., 2014].

In figure 12 the values for the relative strand correlation and the normalized strand coefficient are OK, while the number of reads is classified as Very Low. This should not be surprising or worrisome, because the data used in this tutorial is a small subset of a ChIP-seq experiment. In fact, the full dataset consists of about 16 million reads, which is significantly higher than the threshold value. However, in normal circumstances a small number of reads would be a strong indicator that the ChIP-seq experiment is of low quality. The quality measures table for the control experiment (figure 13)
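The thresholds quoted in the quality tables can be collected into a small check. This is a sketch of the rules as stated in this tutorial, not the Workbench's own classification: it only reports whether each measure meets its threshold and does not try to reproduce the distinction between Low and Very Low. The function `qc_meets_thresholds` and its parameters are hypothetical.

```python
def qc_meets_thresholds(n_reads, relative_strand_correlation,
                        normalized_strand_coefficient,
                        mammalian=True, is_control=False):
    """Return a dict mapping each quality measure to True (meets threshold) or False.

    Thresholds as stated in the tutorial:
      reads >= 10 million (mammalian) or >= 2 million (smaller organisms),
      relative strand correlation > 0.8 (not required for control/input data),
      normalized strand coefficient > 1.05.
    """
    min_reads = 10_000_000 if mammalian else 2_000_000
    return {
        "number_of_reads": n_reads >= min_reads,
        "relative_strand_correlation": (
            is_control or relative_strand_correlation > 0.8
        ),
        "normalized_strand_coefficient": normalized_strand_coefficient > 1.05,
    }


# Values reported for the control dataset in figure 13.
control_qc = qc_meets_thresholds(307_787, 1.192, 2.3751, is_control=True)
print(control_qc)
```

With the control values from figure 13, only the number of reads falls below its threshold, which matches the explanation that the tutorial data is a small chromosome 21 subset.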
14. Figure 9: Select the first set of mapped reads.

You can now choose control chr21 Reads as control data (see figure 10). You can leave the Maximum P-value for peak calling at the default value of 0.1. A smaller P-value can be specified to obtain a smaller number of high-quality peaks, while a higher P-value threshold can be set to obtain a higher number of peaks.

Figure 10: Choose control data.

After clicking Next, you can choose the output data to be generated (see figure 11). In this tutorial we select all the possible outputs the Transcription Factor ChIP-Seq tool can generate. Click Next, choose the folder where you want your results to be saved, and click Finish.

After a few minutes the analysis will complete and the following results will appear:

• nrsf chr21 Reads Peaks: the list of all called peaks.
• nrsf chr21 Reads QC Report: the quality control report. The QC report contains metrics about the quality of the ChIP-seq experiment.
• nrsf chr21 Reads Peak shape fil
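The effect of the Maximum P-value setting can be illustrated on a list of candidate peaks. This is a sketch only: the peak names and P-values are made up, and whether the Workbench treats the threshold as inclusive is an assumption here.

```python
def filter_peaks(peaks, max_p_value=0.1):
    """Keep peaks whose P-value does not exceed max_p_value (inclusive, by assumption)."""
    return [peak for peak in peaks if peak["p_value"] <= max_p_value]


# Hypothetical candidate peaks.
candidates = [
    {"name": "peak_1", "p_value": 1e-50},
    {"name": "peak_2", "p_value": 0.004},
    {"name": "peak_3", "p_value": 0.08},
    {"name": "peak_4", "p_value": 0.3},
]

default_calls = filter_peaks(candidates)               # default threshold 0.1
strict_calls = filter_peaks(candidates, max_p_value=0.01)
print(len(default_calls), len(strict_calls))
```

Lowering the threshold from 0.1 to 0.01 drops the weakest candidate that still passed the default, which is the trade-off described above: fewer peaks, but of higher confidence.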
15. s inhibited by NRSF, whose function is to repress neural genes in non-neuronal cells. Note the nicely distributed green (forward) and red (reverse) reads for this peak; this is a typical shape for transcription factors.

[Figure 17 screenshot: Track List around positions 33,022,200 to 33,023,200 showing NC_000021 Genome, NC_000021 Gene (gene annotations), nrsf chr21 Reads, control chr21 Reads (307,787 reads), nrsf chr21 Reads Peak shape score (graph), and nrsf chr21 Reads Peaks Annotated (144 peak annotations), with the table view (144 rows) below.]

Chromosome | Region | Name | Center of peak | Length | Peak shape score | P-value | 5' gene | 5' distance | 3' gene | 3' distance
(illegible) | (illegible) | Peak | (illegible) | (illegible) | (illegible) | 3.83E-271 | SYNJ1 | 381 | C21orf66 | (illegible)
NC_000021 | 38208454..38208647 | Peak | (illegible) | (illegible) | 29.06 | 5.47E-186 | DYRK1A | 398904 | KCNJ6 | 0
NC_000021 | 39822927..39823121 | Peak | 39823024 | 195 | 28.59 | 4.61E-180 | RPS26P4 | 37112 | B3GALT5 | 128002
NC_000021 | 43567226..43567423 | Peak | 43567323 | 198 | 26.92 | 5.89E-160 | CRYAA | 101243 | C21orf136 | 5582
NC_000021 | 42269166..42269371 | P
16. ter: the peak shape filter contains the peak shape that was learned during the ChIP-seq analysis.
• nrsf chr21 Reads Peak shape score: a graph track containing the peak shape score. The track shows the peak shape score for each genomic position.

Figure 11: Select the output data to be generated.

Before continuing the analysis or looking at the results, we recommend looking at the quality control report. The most important sections of the report are the tables containing quality measures. The report nrsf chr21 Reads QC Report will contain one table for the NRSF dataset (figure 12) and one for the control dataset (figure 13).

[Number] of reads: [...] 047. Status: Very low. For mammalian cells this value should be at least 10 million reads. For smaller organisms (e.g. worm and fly) this value should be at least 2 million reads.

Relative strand correlation: 1.012. Status: OK. The relative strand correlation describes the ratio between the fragment-length peak and the read-length peak in the cr
17. uences as reference sequences, but in this tutorial we are using only chromosome 21. Click Next and set mapping options as shown in figure 7.

For ChIP-seq we recommend stringent mapping settings. Setting the length fraction to 0.5 specifies the minimum fraction of a read's length that must match the reference sequence, and setting the similarity fraction to 0.8 specifies the minimum fraction of similarity between the read and the reference sequence. The mismatch, insertion and deletion costs are here set to 2, 3 and 3. Next, select Ignore for non-specific match handling. These settings are not important for the result of this tutorial, but they may be important when you work with your own data. For more information about the other settings, please click the Help button.

Figure 7: A stringent read matching is desired for ChIP-seq.

After clicking Next, the dial
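To see how the length fraction and similarity fraction act together, consider the following sketch. It is one plausible reading of the two settings (the similarity is measured within the aligned part of the read) and is not the Workbench's mapping code. The function `passes_mapping_thresholds` and the example reads are hypothetical.

```python
def passes_mapping_thresholds(read_length, aligned_length, matching_positions,
                              length_fraction=0.5, similarity_fraction=0.8):
    """Return True if an alignment satisfies both minimum fractions.

    aligned_length / read_length must reach length_fraction, and
    matching_positions / aligned_length must reach similarity_fraction.
    """
    if read_length <= 0 or aligned_length <= 0:
        return False
    aligned_share = aligned_length / read_length
    identity = matching_positions / aligned_length
    return aligned_share >= length_fraction and identity >= similarity_fraction


# Hypothetical 36 bp reads.
examples = {
    "full length, 2 mismatches": passes_mapping_thresholds(36, 36, 34),
    "half aligned, perfect": passes_mapping_thresholds(36, 18, 18),
    "only a third aligned": passes_mapping_thresholds(36, 12, 12),
    "full length, low identity": passes_mapping_thresholds(36, 36, 25),
}
for description, accepted in examples.items():
    print(f"{description}: {accepted}")
```

Raising either fraction makes the filter stricter: with a length fraction of 0.9, for instance, the half-aligned read above would no longer be accepted.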
