
GenEx User Guide - Gene

1. Figure 17. Classification of genes and samples visualized with dendrograms and a heat map indicating expression levels. The heat map appearance can be altered by mirroring in any of the dendrogram nodes.

Experimental design

Designing experiments is more difficult than analyzing results. In a well-designed experiment, confounding variation is minimized and the number of subjects is sufficient to obtain conclusive results. A good strategy is to perform a fully nested pilot study before specifying the test protocol for a larger study (Figure 18; Tichopad et al. 2009, Clinical Chemistry 55(10), 1816-1823).

Figure 18. Nested experimental design.

Figure 19 shows the result of a nested pilot study in which three heifers were studied by collecting three blood samples from each. The samples were reverse transcribed in triplicate, and each cDNA was analyzed in triplicate using qPCR. Using a nested ANOVA, the variation arising from the different experimental steps can be estimated and expressed either as standard deviations or as variance contributions (Kitchen et al. 2010, Methods 50, 231-236). β-actin and caspase 3 show generally low standard deviations in all steps. In contrast, interleukin 1β and interferon γ levels varied substantially among the heifers. For liver samples the picture was different, with the data evidencing large variation
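The nested ANOVA decomposition described above can be sketched in Python for a balanced four-level design (animals, samples, RT replicates, qPCR replicates). This is a minimal sketch that assumes a balanced design and the classical method-of-moments estimators derived from expected mean squares. The function `nested_variance_components` is hypothetical and not part of GenEx, whose estimator may differ.

```python
import numpy as np


def nested_variance_components(cq):
    """Estimate variance components for a balanced 4-level nested design.

    cq has shape (animals, samples, rt_replicates, qpcr_replicates).
    Returns (sd, pct): SD in cycles and % variance contribution per level.
    """
    cq = np.asarray(cq, dtype=float)
    a, b, c, n = cq.shape

    # Means at each nesting level
    m_rt = cq.mean(axis=3)        # per RT replicate, shape (a, b, c)
    m_s = m_rt.mean(axis=2)       # per sample, shape (a, b)
    m_a = m_s.mean(axis=1)        # per animal, shape (a,)
    grand = m_a.mean()

    # Sums of squares for each level
    ss_q = ((cq - m_rt[..., None]) ** 2).sum()
    ss_rt = n * ((m_rt - m_s[..., None]) ** 2).sum()
    ss_s = c * n * ((m_s - m_a[:, None]) ** 2).sum()
    ss_a = b * c * n * ((m_a - grand) ** 2).sum()

    # Mean squares: sum of squares divided by degrees of freedom
    ms_q = ss_q / (a * b * c * (n - 1))
    ms_rt = ss_rt / (a * b * (c - 1))
    ms_s = ss_s / (a * (b - 1))
    ms_a = ss_a / (a - 1)

    # Method-of-moments estimates; negative estimates are truncated to zero
    var = {
        "qPCR": ms_q,
        "RT": max((ms_rt - ms_q) / n, 0.0),
        "sampling": max((ms_s - ms_rt) / (c * n), 0.0),
        "animal": max((ms_a - ms_s) / (b * c * n), 0.0),
    }
    total = sum(var.values())
    sd = {k: float(np.sqrt(v)) for k, v in var.items()}
    pct = {k: 100.0 * v / total if total > 0 else 0.0 for k, v in var.items()}
    return sd, pct
```

For the heifer study, `cq` would have shape (3, 3, 3, 3). Negative variance estimates can occur by chance when there are few replicates. Truncating them to zero is one common convention, not the only one.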
2. Figure 11. Left: input data for comparing normalization with reference genes to normalization with the total amount of RNA using Normfinder. A column indexing the total RNA concentrations used, in logarithmic scale, has been added. Right: output indicating that for these particular samples, total RNA normalization is almost as stable as normalization with the single optimum reference gene.

An older method for identifying good reference genes, still in use, is geNorm. It uses the same input data as Normfinder, but it does not consider groups: all samples are treated as coming from a single population. geNorm sequentially eliminates the gene that shows the highest variation relative to all the other genes, based on paired expression values in all the studied samples. The variability is reflected by a so-called M value (Figure 12). Because of the elimination process, geNorm cannot identify an optimum reference gene. Instead it ends up suggesting a pair of genes that shows high correlation and should be suitable for normalization. The M value is related to the SD, but the M values of the genes are computed from different sample sizes and are therefore not strictly comparable. Furthermore, as the comparison of any individual candidate gene is performed towar
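The geNorm M value and the sequential elimination can be sketched as follows. This is a minimal sketch that assumes Cq differences between two genes stand in for log2 expression ratios (i.e. roughly 100% efficiency). The function names are hypothetical and are not geNorm's or GenEx's API.

```python
from statistics import stdev


def genorm_m_values(cq):
    """cq maps gene name -> list of Cq values over the same samples.

    M for gene j is the mean, over all other genes k, of the standard
    deviation across samples of the pairwise difference Cq_j - Cq_k.
    """
    m = {}
    for j in cq:
        sds = [
            stdev([x - y for x, y in zip(cq[j], cq[k])])
            for k in cq
            if k != j
        ]
        m[j] = sum(sds) / len(sds)
    return m


def genorm_rank(cq):
    """Repeatedly drop the gene with the highest M until two genes remain."""
    remaining = dict(cq)
    eliminated = []
    while len(remaining) > 2:
        m = genorm_m_values(remaining)
        worst = max(m, key=m.get)
        eliminated.append((worst, m[worst]))
        del remaining[worst]
    # The final pair cannot be ranked against each other
    return eliminated, sorted(remaining)
```

The sketch makes the limitation mentioned above concrete. Once two genes remain, their M values are computed only from each other and are necessarily equal, so no single optimum gene can be named.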
3. Figure 8. Estimation of LOD by fitting the rate of positive calls as a function of concentration.

Selecting reference genes

With qPCR, the amount of target molecules in a sample is measured rather than their concentration. A large sample is expected to contain more target molecules than a small one, and normalization must be applied to compensate for the effect of size. There are several options for how to normalize. A popular option in gene expression analysis is to normalize with reference genes. This should compensate not only for variation in sample amount but also for variations in extraction yield, reverse transcription efficiency and RNA quality. In the early days of PCR, genes needed for basic housekeeping functions were thought to have stable expression and could serve as references. Experience has shown that this is not always true, and before a gene is used as a reference this assumption should be validated. A good reference gene must meet two criteria: its expression is stable among samples, and it is unaffected by the treatment applied. Stability of expression is reflected by the standard deviation (SD) of biological replicates. However, we cannot just take a set of samples, measure the Cqs and calculate the SDs, because we do not know how to normalize the samples for this exercise. Of course, we can use the same amount of RNA in the analyses, but then how do we know that all samples we
4. Finally, the study is paired, meaning that each subject received both treatments and a sample was collected after each treatment. Paired study designs are more powerful because the pairing reduces confounding variation. This elevates the power of the test, so the experiment requires fewer subjects. Paired samples can be, for example, samples collected from all subjects before treatment and a second set collected after treatment. They can also be positive and negative samples collected from the same subject, or from genetically similar individuals such as siblings, identical twins or clones. A special type of paired study design is repeated sampling, used when there are more than two subsequent measurements; in that case the word "paired" is generally replaced by "repeated". Specialized statistical procedures are available to analyze repeated samplings.

Data import

The experimental design is defined partly by deciding on the experimental factors and covariates involved in the experiment, and partly when the samples and assays are mixed as they are dispensed into the qPCR containers. This information is critical for proper analysis and mining of the measured data. Several of the leading qPCR instrument and assay providers have recognized this. The Roche LC480 software for RealTime Ready custom and focus panels, for example, names all genes and samples, indicates reference genes, and specifies technical and biological replicates at various levels. This information is transferred to GenEx and a
5. for the false discovery rate due to multiple testing, including Bonferroni, Benjamini-Hochberg, Westfall & Young and Benjamini-Yekutieli (Figure 15).

Figure 13. Comparison of the expression of four genes in three groups. Bars indicate mean expression, and the error bars indicate either SD, SE or CI.

Figure 14. Comparison of the expression of four genes between two groups using the t-test. Left: descriptive statistics, including normality test and p-values. Right: bar graph showing the differential expression in linear scale. Note that the indicated confidence intervals are asymmetric around the means.
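Two of the corrections named above, Bonferroni and Benjamini-Hochberg, can each be sketched in a few lines. This is a minimal Python sketch of the standard textbook procedures, not GenEx's implementation. Westfall & Young (which is permutation based) and Benjamini-Yekutieli are omitted.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvalues)
    return [min(p * m, 1.0) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Take the p-values [0.01, 0.04, 0.03, 0.005]. Bonferroni gives approximately [0.04, 0.16, 0.12, 0.02]. Benjamini-Hochberg gives approximately [0.02, 0.04, 0.04, 0.02], which illustrates why FDR-based corrections are less conservative.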
6. The residual plot shows the deviations between the measured Cqs of the standard samples and the Cqs predicted by the standard curve (Figure 6). If the straight-line standard curve is adequate to model the data, the residuals should fluctuate randomly. If the model is inadequate, runs of positive and negative residuals will be seen. GenEx performs a statistical test for the number of runs and, if there are too few, warns that the linear standard curve may be an inadequate model. Outliers are readily identified visually in the residual plot, and GenEx further uses Grubbs' test to support the removal of outliers. In general, no more than one outlier should be removed from a standard curve (EP6-A, Vol. 23, No. 16, 2003, Approved guideline, NCCLS). If multiple outliers are indicated, the approach used is likely to be unstable and should be reviewed. When replicates are available, the residual plot also reveals whether noise increases at low concentrations. A reliable standard curve is critical for accurate estimation of the concentrations of field samples, which in GenEx are referred to as test samples. The estimates improve if the field samples are available in replicates that can be averaged to reduce confounding variation. Concentrations of the unknown samples are estimated by entering the standard curve at the measured Cq and reading out the log of the concentration on the x ax
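The idea behind the runs check can be sketched with the Wald-Wolfowitz runs test on the signs of the residuals, using the normal approximation. This guide does not state GenEx's exact test or thresholds, so treat this as an illustration. `runs_test` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def runs_test(residuals):
    """One-sided runs test for too few runs in the signs of the residuals.

    residuals must be ordered by concentration. Returns (runs, p_value);
    a small p-value means fewer runs than expected by chance, suggesting
    a systematic misfit of the straight-line model.
    """
    signs = [r > 0 for r in residuals if r != 0]  # zero residuals are ignored
    n1 = sum(signs)
    n2 = len(signs) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("need both positive and negative residuals")
    runs = 1 + sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean) / sqrt(var)
    return runs, NormalDist().cdf(z)
```

With only a handful of standards, the normal approximation is rough, and exact run-count tables would be more appropriate.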
7. Figure 3. Scatter plot comparing replicate measurements performed in separate plates (left) and under two different conditions (right). Differentially expressed genes are tabulated (bottom).

In the Data Manager, subjects and genes can be removed temporarily from the analysis, so that results from subsets of the data can be compared. Data can also be mean-centered (subtraction of the mean value) or autoscaled (subtraction of the mean followed by division by the standard deviation) to change the weights of the genes or samples in the analyses. This is particularly useful in expression profiling, where genes with different expression levels can be assigned equal weights. Some analyses apply models built from measured data, such as the standard curve, reverse calibration, neural networks, self-organizing maps and potential curves. For these, samples and genes can be assigned either training or test status. Training data are used to create the model, which is then applied to classify the test samples. The various analyses available in GenEx are listed in Table 1.

Table 1. Analyses available in different versions of GenEx (GenEx version comparison: Standard, Enterprise). Analyses listed: interplate calibration; PCR efficiency correction; normalize to sample amount; normalize to reference genes/samples; normalize to spike; missing data handling and primer dim
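Mean centering and autoscaling, as described above, can be sketched with NumPy using samples as rows and genes as columns. This is an illustration, not GenEx code. Using the sample standard deviation (`ddof=1`) is an assumption.

```python
import numpy as np


def mean_center(data):
    """Subtract each column's (gene's) mean."""
    data = np.asarray(data, dtype=float)
    return data - data.mean(axis=0)


def autoscale(data):
    """Mean-center each column, then divide by its standard deviation.

    A column with zero standard deviation would cause division by zero.
    """
    data = np.asarray(data, dtype=float)
    return mean_center(data) / data.std(axis=0, ddof=1)
```

After autoscaling, every gene has mean 0 and SD 1. A gene that varies by 0.5 cycles therefore carries the same weight in a multivariate analysis as one that varies by several cycles.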
8. Figure 15. Comparison of the expression of multiple genes between two groups, corrected for the high false discovery rate due to multiple testing. Columns 2 and 3 indicate the results of the normality test, and column 4 the differential expression in log scale. Column 5 indicates p-values calculated with the t-test. The p-values are color coded:
green: significant after Bonferroni correction;
yellow: below the stipulated uncorrected confidence level but not significant with Bonferroni correction;
red: above the confidence threshold.
The last three columns indicate p-values calculated using the Benjamini-Hochberg, Westfall & Young and Benjamini-Yekutieli corrections for the high false discovery rate.

Expression profiling

The t-test and ANOVA are univariate methods that analyze the expression of every gene separately, effectively assuming that the genes are expressed independently of each other. This is rarely the case: gene expression levels tend to be correlated. Multivariate statistical methods can exploit this correlation. GenEx offers several unsupervised methods, as well as a selection of supervised ones, to classify samples and categorize genes based on expression profiles. Unsupervised methods classify samples and genes based on the measured profiles only. They include classical hierarchical clustering combined with a heat map, which can be based on various clustering schemes, including Ward's algorithm a
9. D among animals analyzed with a single sample collected from each animal, with RT performed in triplicate and qPCR in duplicate, is estimated to be 1.02 cycles. This can be fed into a power analysis to estimate the number of animals needed to detect a particular difference with a certain confidence and power. If we accept a 5% false-positive rate (95% confidence) and a 5% false-negative rate (95% power), we can construct a graph showing how many subjects are needed to measure a particular difference due to treatment. For example, to measure a 2-fold difference (ΔCq of 1) under these criteria, we require 15 animals (Figure 20).

Figure 20. Power analysis assuming SD = 1.055 cycles and 5% false-positive and 5% false-negative calls. The straight line and arrow indicate that 15 subjects are needed to measure a difference of 1 cycle due to treatment under those conditions.

More information

An extensive help file, accessible in the software, further describes how the various analyses are performed. Help can also be found in the audiovisual tutorials on MultiD's homepage (www.multid.se/tutorials.php). Hundreds of GenEx users share experiences and advice in the free support forum (www.qpcrforum.com).
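The relation between SD, detectable difference, confidence and power can be illustrated with the textbook normal-approximation sample size formula. This is a rough sketch, not the calculation behind Figure 20. GenEx's tool probably uses a t-distribution-based or iterative method and a specific design (paired or two-group), so the numbers will not necessarily match the 15 subjects quoted above. `sample_size` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist


def sample_size(sd, difference, alpha=0.05, power=0.95, two_groups=True):
    """Approximate number of subjects (per group if two_groups) needed to
    detect `difference` (in cycles) with a two-sided test at level alpha.

    Normal approximation: n = (z_{1-alpha/2} + z_{power})^2 * (sd/diff)^2,
    doubled for two independent groups.
    """
    z = NormalDist().inv_cdf
    k = (z(1 - alpha / 2) + z(power)) ** 2 * (sd / difference) ** 2
    return ceil(2 * k if two_groups else k)
```

With SD = 1 cycle and a 1-cycle difference, this approximation gives 13 subjects for a paired or one-sample design and 26 per group for two independent groups. Halving the SD or doubling the difference cuts the requirement roughly fourfold.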
10. GenEx User Guide, Version 1.0. Copyright 2001-2012 MultiD Analyses AB, www.multid.se

Introduction

As the qPCR field advances, the design of experiments and the analysis of data are becoming more important and more challenging. Studies are now designed with multiple markers and nested levels. They explore or confirm the effect of multiple factors, occasionally in paired designs. These analyses require more information on how the experiment was set up. Only then can references, standards and controls be handled appropriately, and the variance and covariance in the measured data be correctly accounted for. Proper handling of such data requires software that supports both the planning and design of experiments and the data analysis. GenEx is the main qPCR software on the market today.

Data arrangement

When groups are compared, data are classically arranged with the measured Cq values (explained variables) in columns headed by the experimental group label (explanatory variables). This arrangement provides an easy overview of the data (see figure). However, it is not practical for advanced studies, which may include:
more than one nominal factor;
covariates (variables of metric character, e.g. time, age, dose);
multiple markers;
replicate measurements;
repeated measurements (the same subject sampled repeatedly);
multiplate measurements.
A more flexible approach is to arrange the data with samples as rows and all variables in columns (Figure 1). The format is readily generalize
11. ations including

Limit of detection

The limit of detection (LOD) is the lowest amount of analyte in a sample that can be detected with stated probability, although perhaps not quantified as an exact value. Here, analyte refers to the targeted nucleic acid (World Health Organization 1995, document WHO BS 95 173; EP17-A, Vol. 24, No. 10, 2004, Approved guideline, NCCLS). For classical tests, where a signal is measured against a background, LOD is estimated from the standard deviation of the blank readout at the standard curve intercept. This approach is, however, not applicable to qPCR. Because of its real-time readout, qPCR gives no reading for a negative sample; the Cq of a blank sample is formally infinite. Instead, for an analytical process that involves qPCR, LOD can be estimated from multiple standard curves (Burns et al., European Food Research and Technology, Volume 226, Number 6, 1513-1524, DOI 10.1007/s00217-007-0683-z). A minimum of six standard curves is recommended, and concentrations around the expected LOD should be assessed. The measured data are transferred to binary format indicating positive and negative PCRs, and the fraction of positive calls at each concentration is calculated. LOD is the concentration at which replicates are positive at the stated rate, e.g. 95%. GenEx fits the measured positive rates at different concentrations to estimate the LOD (Figure 8).
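Fitting the fraction of positive replicates against log concentration can be sketched with a two-parameter logistic model, fitted by maximum likelihood using Newton-Raphson. The guide does not say which model GenEx fits. The logistic curve, the function names and the example counts in the usage note are assumptions for illustration.

```python
from math import exp, log


def fit_logistic(log_conc, positives, totals, iterations=50):
    """Fit P(positive) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    log_conc: log10 concentrations; positives/totals: counts per level.
    Assumes the data are not perfectly separated (otherwise no finite fit).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, n in zip(log_conc, positives, totals):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * x)))
            w = n * p * (1.0 - p)
            g0 += y - n * p              # gradient of the log-likelihood
            g1 += (y - n * p) * x
            h00 += w                     # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < 1e-10 and abs(step1) < 1e-10:
            break
    return b0, b1


def lod(b0, b1, rate=0.95):
    """log10 concentration at which the fitted positive rate equals `rate`."""
    return (log(rate / (1.0 - rate)) - b0) / b1
```

Usage with made-up counts: `fit_logistic([0, 1, 2, 3, 4], [2, 5, 9, 12, 12], [12] * 5)` followed by `lod(b0, b1)` gives the log10 concentration at which 95% of replicates are expected to be positive.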
12. d a plurality of genes assumed to most closely resemble the anticipated stable behavior, it is prone to systematic failure when a group of co-regulated unstable genes is included in the analysis. Any such co-regulated complex of unstable genes may dominate over the stable genes and hence point to deviant genes as candidates. Usually the gene rankings by geNorm and Normfinder are similar, which is reassuring. Should the rankings differ, there is reason to suspect that the selection includes one or more regulated genes, and the result should be interpreted with caution.

Figure 12. Ranking of reference gene candidates by sequential elimination using geNorm. The final two genes selected cannot be compared further.

Relative quantification

Treatment groups are readily compared visually in bar graphs using descriptive statistics (Figure 13). Statistical comparison can be made with several tests:
one-way ANOVA (one factor, two or more levels);
two-way ANOVA (two factors, two or more levels each);
for two groups, the t-test (paired/unpaired, one-tailed/two-tailed) or non-parametric tests (Mann-Whitney, Wilcoxon).
The difference between the groups is shown in either linear or logarithmic scale, and the confidence interval is indicated (Figure 14). Note that the confidence interval of the differential expression is asymmetric when data are presented in linear scale. When the expression of many genes is compared, GenEx offers several means to control
13. d to any number of markers. Additional columns and rows that specify the experimental design can be added, indexing samples, markers, plates, etc. These are referred to as classification columns and classification rows, and their labels start with a special prefix character. In the example shown in Figure 2, "Repeat" indexes qPCR technical replicates: samples with the same index are replicates on the qPCR level. These are expected to be highly similar and are averaged during data pre-processing.

Figure 2. Example of data arrangement in GenEx. The columns are:
1st: sample names;
2nd and 3rd: measured Cq values;
4th: technical replicate index;
5th: treatment group index;
6th: paired sample index.
The bottom row identifies marker and reference genes.

"Treatment" indexes the treatment groups that will eventually be compared using a statistical test.
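With samples as rows and classification columns, averaging qPCR technical replicates means grouping the rows that share a replicate index and taking the mean Cq per gene. The sketch below uses hypothetical column names (`GeneA`, `Ref`, `Repeat`, `Treatment`, `Pair`) loosely modeled on Figure 2, with made-up Cq values. It does not reproduce the GenEx file format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: one per qPCR well, Cq values per gene plus
# classification columns. Rows sharing a Repeat index are technical
# replicates of the same sample.
rows = [
    {"GeneA": 22.1, "Ref": 16.7, "Repeat": 1, "Treatment": 1, "Pair": 1},
    {"GeneA": 22.3, "Ref": 16.9, "Repeat": 1, "Treatment": 1, "Pair": 1},
    {"GeneA": 19.8, "Ref": 16.6, "Repeat": 2, "Treatment": 2, "Pair": 1},
    {"GeneA": 20.0, "Ref": 16.4, "Repeat": 2, "Treatment": 2, "Pair": 1},
]


def average_replicates(rows, genes, key="Repeat"):
    """Collapse rows with the same replicate index into one averaged row.

    Classification columns are copied from the first row of each group
    (they are assumed identical within a group).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    averaged = []
    for key_value in sorted(groups):
        members = groups[key_value]
        out = {c: v for c, v in members[0].items() if c not in genes}
        for g in genes:
            out[g] = mean(r[g] for r in members)
        averaged.append(out)
    return averaged
```

`average_replicates(rows, ["GeneA", "Ref"])` returns one row per sample, which then acts as a single data point in the downstream statistics.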
14. The standard curve is the best straight-line fit of the Cqs measured for the standard samples to their concentrations in logarithmic scale (Figure 6). It is calculated using linear regression and defines two parameters. The intercept is the Cq expected for a sample containing a single template molecule. From the slope, the PCR efficiency is estimated. GenEx also calculates the uncertainties in the estimates of the slope and the intercept, which are reflected by the dashed lines in the plot (the Working-Hotelling area). It also calculates the confidence interval for the PCR efficiency (Figure 6). Calculating the confidence information is essential, since it reflects the precision of the estimated efficiency. In this example, the precision of the estimated efficiency is quite high because a large number of standards (21) was used and a wide concentration range (6 logs) was covered. In the literature we frequently see standard curves based on substantially fewer standards. The PCR efficiencies estimated from such standard curves are highly uncertain, and any corrections based on them are unreliable.

Figure 6 (values, lower < estimate < upper). Slope: -3.503664972 < -3.420833333 < -3.338001693. Intercept: 31.213375170 < 31.583609523 < 31.954243876. Efficiency: 0.9253672932 < 0.9603
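The standard-curve fit, the efficiency estimate and the reverse calibration can be sketched in a few lines. The relation E = 10^(-1/slope) - 1 is the standard one and reproduces the value above (a slope of -3.4208 gives E ≈ 0.960). The Working-Hotelling confidence bands are not computed here, and the function names are hypothetical.

```python
from statistics import mean


def fit_standard_curve(log10_conc, cq):
    """Least-squares line Cq = intercept + slope * log10(conc)."""
    x_bar, y_bar = mean(log10_conc), mean(cq)
    sxx = sum((x - x_bar) ** 2 for x in log10_conc)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log10_conc, cq))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def pcr_efficiency(slope):
    """Amplification efficiency from the slope (1.0 means 100 %)."""
    return 10 ** (-1.0 / slope) - 1.0


def estimate_log10_conc(cq, slope, intercept):
    """Reverse calibration: read log10(conc) off the curve at a measured Cq."""
    return (cq - intercept) / slope
```

If concentrations are given in copies, the intercept is the Cq at log10(conc) = 0, i.e. the expected Cq of a single template molecule.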
15. er correction; relative quantities and fold changes; plots (scatter plots, line plots, bar plots); PCA; hierarchical clustering (dendrogram); heat map analysis; self-organizing map (SOM); artificial neural networks (ANN); support vector machine (SVM); standard curve; reverse calibration; limit of detection (LOD); partial least squares (PLS); trilinear decomposition; descriptive statistics; parametric tests; non-parametric tests; one-way ANOVA; two-way ANOVA; nested ANOVA; Spearman rank correlation coefficient; Pearson correlation coefficient; sample size; experimental design optimization.

Standard curve and reverse calibration

Amounts of pathogens in field samples can be quantified with qPCR by comparing the measured Cqs of the field samples with those of standard samples by means of a standard curve. A representative data arrangement is shown in Figure 4. In addition to the measured data, the concentrations of the standards are given in a classification column. Additional classification columns can be used to ind
16. ex replicates and to identify the standard and test samples. Any technical replicates are averaged during pre-processing, and the averaged Cq is considered a single data point. Independently prepared standards are treated as different data points. Replicate measurements of field samples, in contrast, are averaged and used as a single, more precise estimate. Groups are created in the GenEx Data Manager and assigned either test or training status. Samples can also be reversibly removed from the analysis (Figure 5). A confidence level is set for the analysis.
17. f the heat map can be changed in GenEx among equivalent mathematical solutions by mirroring the dendrograms in their nodes.

Figure 16. Classification of genes expressed during the development of Xenopus laevis based on hierarchical clustering (left), SOM (center) and PCA (right). The dendrogram can be mirrored in any of the white nodes.

Supervised methods require a training set of samples with known classification, for example negative and positive samples, or short-, medium- and long-term survival. A model developed from the training set can then be used to classify new data (called test data in GenEx). The procedure is similar to a regression based on a standard curve. Here, however, it is based on multiple genes, and the model does not have to be linear. Supervised methods available in GenEx include:
Partial Least Squares (PLS), which calculates a single standard curve from the expression of multiple genes to predict concentrations or other measures of test samples;
Potential Curves, a variant for predicting new data based on PCA;
Artificial Neural Networks (ANN) and Support Vector Machines (SVM), which are multivariate non-linear methods for classifying samples.
Logistic regression, Probit, Receiver Operating Characteristics (ROC) and Survival Analysis will soon also be available.
18. ical effect and invalidate the statistical inference in most of the statistical tests employed. Outliers in the data can be tested for based on the standard deviation and with Grubbs' outlier test. The pre-processing of data is logged and stored in a log file.

Screening by correlation

Several companies, including Roche, Exiqon, Life Technologies, Lonza, Qiagen and TATAA Biocenter, offer pre-plated assays for convenient expression profiling and screening. Data from those plates are readily read into GenEx. Rarely are all assays relevant for every study. One strategy is to analyze a few representative samples of each kind in a pilot study, to identify differentially expressed genes for a larger downstream study. This is readily done using the GenEx scatter plot (Figure 3). Replicate measurements can be compared to test the reproducibility between plates (top left), or to screen for differentially expressed genes under two conditions (top right, bottom). Correlations between gene expression levels can be quantified by calculating the Pearson or Spearman correlation coefficients. This is typically applied to larger numbers of samples. It has, for example, been used to reveal correlations between genes expressed in individual cells (Stahlberg et al. 2010, Nucl. Acids Res. 1-12, doi 10.1093/nar/gkq1182).

Preparing the data for analysis

Groups for comparison are created using the GenEx Data Manager. Treatment groups or treatment facto
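Pearson and Spearman correlations between two genes' expression profiles can be sketched without external libraries, since Spearman's coefficient is Pearson's computed on ranks, with tied values sharing their average rank. This is an illustration only, and the helper names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (undefined for constant input)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Spearman's coefficient captures any monotonic relationship and is less sensitive to outliers. Pearson's measures linear association only.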
19. in the TATAA reference gene panel in 14 brain samples from mice, estimated using Normfinder (intergroup variation).

A good strategy with Normfinder is to inspect the calculated intra- and intergroup variations, identify any genes that appear regulated or exceedingly unstable, and remove them from the data set using the GenEx Data Manager. The Normfinder analysis is then repeated without considering the groups, since the remaining genes are not regulated. This produces a more robust result, with a single SD estimate for each gene on which the gene ranking is based (Figure 10). The gene with the lowest SD is the optimum reference gene. GenEx also calculates the accumulated SD expected if multiple reference genes are used for normalization. With a larger number of reference genes, random variations in the genes' expression partially cancel, reducing the SD. The accumulated SD can be compared for different numbers of reference genes, added in order of stability. The plot shows a minimum at the number of reference genes that gives the lowest SD (Figure 10). However, analyzing more genes costs time and money. When deciding, one should weigh the degree of improvement against the overall noise contributed by the reference genes. In the example, the largest improvement is observed when the 2nd reference gene is included; additional reference genes only slightly improve the result. Furthermore, the noise contribution from the best
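Why averaging more reference genes lowers the SD, and why a minimum can appear, can be sketched under the simplifying assumption that the genes' errors are independent. The SD of the mean of the k best genes is then sqrt(SD_1² + ... + SD_k²) / k. This guide does not give GenEx's exact accumulated-SD calculation, and the SD values in the usage note are made up.

```python
from math import sqrt


def accumulated_sd(sds_ranked):
    """SD of the mean of the k best genes, for k = 1..len(sds_ranked).

    Assumes independent errors; sds_ranked is sorted from most to least
    stable gene.
    """
    result = []
    total_var = 0.0
    for k, sd in enumerate(sds_ranked, start=1):
        total_var += sd ** 2
        result.append(sqrt(total_var) / k)
    return result
```

Take ranked SDs of [0.10, 0.15, 0.30, 0.50] cycles. The accumulated SD is about 0.100, 0.090, 0.117 and 0.153, so the minimum falls at two genes. Adding a much noisier gene can make the average worse rather than better.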
20. in the sampling step.

Figure 18 (panel text): three subjects (heifers) are tested; three samples are collected from each subject and extracted; the extracted material is reverse transcribed.

Figure 19. Decomposition of the variance in a 4-level nested study of blood samples collected from heifers (β-actin, caspase 3, IL-1β, IFNγ). Left: estimated standard deviations (SD) among heifers (red), among replicate blood samplings (green), among reverse transcription replicates (yellow) and among qPCR replicates (blue). Right: the same data presented as variance contributions, expressed in percentages.

Knowing the costs associated with the different experimental steps, the follow-up study can be cost-optimized. Consider, for example, genes exhibiting these standard deviations:
SD = 0.1 cycle for the qPCR and RT steps;
SD = 0.2 cycles for the sampling/extraction step;
SD = 1 cycle for the variation among the animals.
Assume the following costs:
1 unit for the qPCR;
3 units for the RT;
10 units for sampling/extraction;
100 units for each animal.
With a total budget of 1000 units, the best we can do is to analyze 8 animals, sample each animal once, perform RT in triplicate and qPCR in duplicate. The total standard error (SE) for this study is expected to be about 0.36 cycles (Figure 20). Using the same tool, the S
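The cost-optimized design above can be reproduced with a brute-force search, although the formulas used here are our assumptions rather than formulas stated in the guide. The nested cost model is total = animals × (100 + samples × (10 + RT × (3 + qPCR × 1))). The standard error uses SE² = σ²_animal/nA + σ²_sampling/(nA·nS) + σ²_RT/(nA·nS·nRT) + σ²_qPCR/(nA·nS·nRT·nQ). Together they reproduce both the 1000-unit cost and the SE of about 0.36 cycles quoted above. The function name is hypothetical.

```python
from itertools import product
from math import sqrt


def optimize_design(sd, cost, budget, max_reps=10):
    """Search (animals, samples, RT, qPCR) replicate numbers within budget.

    sd and cost are dicts keyed by "animal", "sampling", "RT", "qPCR".
    Returns (SE, design, total_cost) for the design with the smallest SE.
    """
    best = None
    for n_a, n_s, n_rt, n_q in product(range(1, max_reps + 1), repeat=4):
        # Nested cost: each animal is sampled n_s times, each sample gets
        # n_rt RT reactions and each RT product n_q qPCR reactions
        total = n_a * (cost["animal"] + n_s * (cost["sampling"]
                       + n_rt * (cost["RT"] + n_q * cost["qPCR"])))
        if total > budget:
            continue
        se2 = (sd["animal"] ** 2 / n_a
               + sd["sampling"] ** 2 / (n_a * n_s)
               + sd["RT"] ** 2 / (n_a * n_s * n_rt)
               + sd["qPCR"] ** 2 / (n_a * n_s * n_rt * n_q))
        if best is None or se2 < best[0]:
            best = (se2, (n_a, n_s, n_rt, n_q), total)
    if best is None:
        raise ValueError("budget too small for even a minimal design")
    return sqrt(best[0]), best[1], best[2]
```

With the SDs and costs above, the search returns 8 animals, 1 sample, 3 RT and 2 qPCR replicates, costing exactly 1000 units with SE ≈ 0.361. The same formula with a single animal (nA = 1) gives an SD of √1.045 ≈ 1.02 cycles. That is the SD quoted for the power analysis.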
21. is (Figure 7). The Working-Hotelling area, which reflects the prediction uncertainty, is wider than before because of the additional error contribution from the measured Cq. GenEx calculates confidence intervals for the estimated concentrations. The confidence intervals are symmetric around the mean in logarithmic scale but asymmetric around the mean in linear scale (Figure 7). The uncertainty in the estimates is larger than most people think. The standard curve in the example is based on 7 concentrations of standard covering 6 logs, each measured in triplicate for a total of 21 readings, and the assay has 96% efficiency. Even so, the uncertainty in the estimated concentrations of the unknowns is substantial. For example, the estimated concentration of Test 1 is 46700 copies, with a 95% confidence interval of 31000-61000 copies. With a less accurate standard curve, the precision of the estimated concentration would be even worse.
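The classical inverse-prediction (calibration) interval from ordinary least squares shows how an interval that is symmetric in log scale becomes asymmetric in linear scale. This is a simpler textbook approximation, not GenEx's Working-Hotelling-based calculation. The caller must supply the t quantile (for example, from statistical tables), and the helper name is hypothetical.

```python
from math import sqrt
from statistics import mean


def concentration_interval(log10_conc, cq, cq_unknown, t_crit):
    """Estimate an unknown's concentration with an approximate CI.

    log10_conc, cq: the standards; cq_unknown: replicate Cqs of the
    unknown; t_crit: t quantile with n - 2 degrees of freedom.
    Returns (low, estimate, high) in linear units.
    """
    n, m = len(cq), len(cq_unknown)
    x_bar, y_bar = mean(log10_conc), mean(cq)
    sxx = sum((x - x_bar) ** 2 for x in log10_conc)
    slope = sum((x - x_bar) * (y - y_bar)
                for x, y in zip(log10_conc, cq)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(log10_conc, cq))
    s = sqrt(sse / (n - 2))                  # residual standard deviation
    y0 = mean(cq_unknown)
    x0 = (y0 - intercept) / slope            # estimated log10 concentration
    se_x0 = (s / abs(slope)) * sqrt(1 / m + 1 / n
                                    + (y0 - y_bar) ** 2 / (slope ** 2 * sxx))
    half = t_crit * se_x0                    # symmetric in log10 scale
    return 10 ** (x0 - half), 10 ** x0, 10 ** (x0 + half)
```

Because the half-width is added and subtracted in log10 scale, the upper limit lies farther from the estimate than the lower limit once converted to copies. This is the asymmetry described above. Averaging more replicates of the unknown (larger `m`) narrows the interval only partly, since the curve's own uncertainty remains.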
22. it is essential to correctly define the statistical unit, often referred to as a Subject when organisms are used. To use common statistical methods, each unit should be associated with a single value for each variable. This value, however, must frequently be assembled from various measurements, i.e. responses of target and reference genes, estimated amplification efficiencies, etc. To integrate the process into a logical workflow, GenEx provides an intuitive wizard with the following sequential operations:

1. Interplate calibration
Many studies cannot fit in a single experimental run or, for practical reasons, have to be extended over time. qPCR instruments perform baseline correction and set the threshold separately for each run, which introduces a bias between the Cqs measured in different runs. This bias can be compensated for by a common amplification in all plates, in which the same sample is analyzed for a given assay. This sample is called the Inter-Plate Calibrator (IPC). Any variation in the measured Cqs of the IPC among runs reflects systematic variation due to instrument factors and should be compensated for. If a common threshold and baseline correction are used, a single IPC for each channel in the instrument is sufficient. Performing separate inter-plate calibrations for each target is not recommended. Since every correction adds confounding variation to the data, unimportant corrections should be avoided, as they may i
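The interplate calibration step can be sketched as subtracting, from every Cq in a run, that run's deviation of the IPC from the IPC mean across runs. This is a minimal sketch that assumes a single IPC per run and an additive run bias. The data layout and function name are hypothetical.

```python
from statistics import mean


def interplate_calibrate(cq_by_run, ipc_by_run):
    """Remove run-specific bias using an inter-plate calibrator (IPC).

    cq_by_run: run id -> list of measured Cqs in that run.
    ipc_by_run: run id -> list of replicate Cqs of the IPC in that run.
    """
    ipc_means = {run: mean(values) for run, values in ipc_by_run.items()}
    overall = mean(ipc_means.values())
    corrected = {}
    for run, values in cq_by_run.items():
        offset = ipc_means[run] - overall   # systematic shift of this run
        corrected[run] = [cq - offset for cq in values]
    return corrected
```

Averaging replicate IPC measurements, as recommended for a robust calibrator, reduces the noise that the offset itself adds to every corrected Cq.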
23. l is selected, which corresponds to ΔΔCq in the classical approach, and the data are converted to linear scale. The reference level can be any of the following:
the most expressed sample;
the least expressed sample;
the mean expression of all the samples;
the mean expression of a group of samples;
a percentage (the sum of the expression in all samples set to 1).
It is also possible to convert the ΔCq values to an arbitrary linear scale.

Convert to log scale
For statistical analysis with parametric methods, the data should be converted to logarithmic scale. Available options are log2, log10, ln and log(X+1).

Not all steps in the workflow are needed, since some cancel the effect of others. The appropriate steps depend on the experimental design, on the controls and references that are available, and on the analysis that will be performed.

In addition to the pre-processing workflow, GenEx has correction for missing data. GenEx recognizes two types of missing data: random missing (failed experiments) and non-random missing (off-scale data). Random missing data among technical replicates are handled automatically: they are replaced based on the available information in the course of the pre-processing. This is very useful, since the user can ignore the missing values and they are still accounted for. There are also several ways to handle non-random missing (off-scale) data caused by too-low target amounts, which may bias the biolog
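Choosing the reference level only rescales the linear relative quantities, and after log conversion that rescaling becomes an additive shift. A minimal sketch follows. The option names loosely mirror the list above but are not GenEx identifiers, and the group-mean option is omitted.

```python
from math import log2


def rescale(quantities, reference="max"):
    """Express linear relative quantities against a chosen reference level."""
    levels = {
        "max": max(quantities),                     # most expressed sample = 1
        "min": min(quantities),                     # least expressed sample = 1
        "mean": sum(quantities) / len(quantities),  # mean of all samples = 1
        "sum": sum(quantities),                     # all samples sum to 1
    }
    ref = levels[reference]
    return [q / ref for q in quantities]


def to_log2(quantities):
    """Convert linear quantities to log2 scale for parametric statistics."""
    return [log2(q) for q in quantities]
```

For example, `to_log2(rescale([1.0, 2.0, 4.0, 1.0], "max"))` gives [-2.0, -1.0, 0.0, -2.0]. Picking a different reference level would shift all four values by the same constant, leaving group differences unchanged.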
24. mpair data quality rather than improving it. For the same reason, it is a good strategy to use a robust sample as the IPC and to analyze it in replicates. In multiplate experiments, the runs and the inter-plate calibrators should be indexed in classification columns.

2. Efficiency correction
If the PCR efficiency has been estimated, the measured Cq values can be corrected to account for suboptimal amplification. PCR efficiencies are typically estimated from serial dilutions run separately. The PCR efficiencies may then be listed in a classification row for automatic correction in GenEx.

3. Normalize using spiking
PCR efficiency depends on the sample matrix. Usually the sample matrix, and thus the PCR efficiency, is assumed to be constant. Occasionally, however, there are variations. These can be tested for using an exogenous spike added to the samples. Differential expression of the spike between the test sample and a standard sample reflects the test sample's specific inhibition, which can to some degree be accounted for.

Normalize to sample amount
Measured Cq values depend on the sample input, such as the sample volume processed, the amount of RNA used for reverse transcription, or the cell count. If the sample input varies, the data may have to be normalized. The sample input should be indicated in a classification column.

Average qPCR replicates
If qPCR replicates are available, they should be indexed in a classification column and their Cq values averaged.

C
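Efficiency correction and normalization to sample amount can both be sketched as simple Cq transformations, though the exact forms here are assumptions. One common efficiency correction rescales a Cq to the cycles a 100%-efficient reaction would need, Cq × log(1 + E) / log 2. Normalizing to input adds log2(amount) cycles, assuming 100% efficiency. This guide does not state that GenEx uses exactly these forms, and the function names are hypothetical.

```python
from math import log, log2


def efficiency_corrected_cq(cq, efficiency):
    """Cq expressed in cycles of a 100 %-efficient reaction.

    efficiency is a fraction, e.g. 0.92 for 92 %. Derived from
    (1 + E) ** cq == 2 ** cq_corrected at the threshold.
    """
    return cq * (log(1.0 + efficiency) / log(2.0))


def normalize_to_amount(cq, amount):
    """Cq per unit of input (e.g. per ng RNA), assuming 100 % efficiency.

    Quantity is proportional to 2 ** -cq, so dividing it by the input
    amount adds log2(amount) cycles.
    """
    return cq + log2(amount)
```

For instance, a Cq of 25 measured at 90% efficiency corresponds to about 23.15 cycles at 100% efficiency. A sample with four times the input of another is shifted by 2 cycles.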
25. nd several distance measures, including the Euclidean distance and the magnitude of the Pearson correlation. While the Euclidean distance clusters genes based on similarities and considers up-regulation and down-regulation to be opposite (hence anticorrelated), a distance based on the magnitude of the Pearson correlation considers up-regulation and down-regulation to be correlated. The latter is useful for classifying, for example, genes that show the same temporal response to treatment regardless of whether they are up- or down-regulated. The clustering is visualized in a dendrogram, which in GenEx can be mirrored in every node. Mirroring in a node changes the visual appearance of the dendrogram while producing an equivalent mathematical solution. A small self-organizing map (SOM) can be used to force classification into a defined small number of groups based on expression similarities. SOM can also be used to validate a classification model based on the distribution of samples/genes in a large map. Principal component analysis (PCA) groups samples/genes based on correlated expression in reduced space. Figure 16 shows an example of hierarchical clustering, SOM, and PCA of genes expressed during the development of the African clawed frog Xenopus laevis from the oocyte to the tadpole stage (Bergkvist et al. 2010, Methods 50:323-335). Hierarchical clustering of samples and genes can be combined, also showing the measured intensities in a heat map (Figure 17). The appearance o
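To see how the choice of distance changes the clustering, here is a small, self-contained sketch of average-linkage hierarchical clustering with the two distance measures discussed above. The distance 1 - |r| is one common way to turn the magnitude of the Pearson correlation into a distance; GenEx's exact definitions and linkage method may differ, and the three gene profiles are made-up toy data.

```python
import math
from statistics import mean

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def abs_pearson_distance(x, y):
    """0 for perfectly correlated OR perfectly anticorrelated profiles."""
    return 1.0 - abs(pearson(x, y))

def cluster(profiles, dist):
    """Naive average-linkage agglomerative clustering.
    Returns the dendrogram as nested tuples of profile names."""
    members = {name: [name] for name in profiles}
    trees = {name: name for name in profiles}
    while len(members) > 1:
        keys = list(members)
        best = None
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                # Average distance between all members of the two clusters
                d = mean(dist(profiles[p], profiles[q])
                         for p in members[a] for q in members[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = a + "+" + b
        members[merged] = members.pop(a) + members.pop(b)
        trees[merged] = (trees.pop(a), trees.pop(b))
    return next(iter(trees.values()))
```

With a rising profile, its mirror image and a flat profile, the Euclidean distance pairs the rising gene with the flat one, while 1 - |r| pairs the rising and falling genes because they respond at the same times. Swapping the two elements of any tuple corresponds to mirroring that node: the tree is drawn differently but describes the same clustering.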
26. orrect for genomic DNA background. When quantifying RNA levels using RT-qPCR, the assays may also amplify genomic copies of the target if the DNase treatment used is insufficient. The amount of genomic background can be assessed either by measuring NoRT controls or by using the ValidPrime approach. The contribution to Cq from the genomic background can be calculated and the Cqs corrected. Normalization with reference genes. In expression studies, normalization to endogenous controls such as stably expressed reference genes is popular. In GenEx you can normalize to any number of reference genes; you can even normalize sets of reporter genes to sets of reference genes to match the genes' properties, such as expression levels, stabilities, and distribution in tissues. It is also possible to normalize to the mean expression of all the genes (global normalization). Optionally, reference genes can be indexed in a classification row for automatic processing. Normalization to the expression of reference genes corresponds to calculating ΔCq in the classical approach. Average technical replicates. If additional technical replicates are available, such as RT, extraction, and sampling replicates, they should be indexed in classification columns and averaged. Normalize with reference sample(s). In some paired designs, systematic variation can be reduced by normalizing to the paired sample during pre-processing. Relative quantities. An arbitrary reference leve
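The genomic background correction and the reference-gene normalization reduce to short calculations, sketched below under explicit assumptions. The NoRT signal is subtracted on the linear scale assuming 100% efficiency (the ValidPrime approach is not shown), reference genes are combined by averaging their Cq values (equivalent to the geometric mean of their quantities), and paired normalization subtracts the ΔCq of each sample's paired sample. The function names are hypothetical and this is not GenEx's code.

```python
import math
from statistics import mean

def correct_gdna(cq_rt, cq_nort):
    """Remove genomic DNA background estimated from a NoRT control.
    Subtraction is done on the linear scale, assuming 100% PCR efficiency."""
    signal = 2.0 ** -cq_rt - 2.0 ** -cq_nort
    if signal <= 0:
        raise ValueError("NoRT signal is as strong as the RT signal")
    return -math.log2(signal)

def delta_cq(cq_target, cq_references):
    """Normalize to one or more reference genes (or to all genes for global
    normalization): dCq = Cq(target) - mean Cq(references)."""
    return cq_target - mean(cq_references)

def normalize_to_paired(dcq, pairs):
    """Paired designs: subtract the dCq of each sample's paired reference sample."""
    return {s: dcq[s] - dcq[ref] for s, ref in pairs.items()}
```

A NoRT control five cycles later than the RT reaction contributes about 3% of the signal, so correct_gdna(25.0, 30.0) returns a Cq only about 0.05 cycles later than 25.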
27. re extracted and reverse transcribed with the same yield, and that they have the same mRNA/total RNA ratio and the same RNA quality. Furthermore, if we were to assume that we can evaluate gene expression stabilities based on samples normalized to the same amount of RNA, then we would already have decided that total RNA is the best norm. The gene selected based on minimum SD measured on samples having the same amount of RNA will be the gene whose variation correlates most with that of the total amount of RNA, and we may then as well normalize to the amount of RNA directly. If we suspect that the total amount of RNA is not the best norm, we have to identify optimum reference genes using a different strategy.
[Figure 9: two NormFinder output windows, "Intergroup variation" and "Intragroup variation", listing values for each candidate gene]
Figure 9: Intergroup and intragroup variations estimated with NormFinder for genes from the TATAA reference panels in representative brain samples of wild-type and obese mice. An appropriate app
28. reference gene is only 0.05 cycles, and as little as 0.04 cycles when combining the two best reference genes. Considering that the repeatability of a qPCR instrument is rarely less than 0.1 cycle (estimated as the SD of technical replicates), using more than one reference gene, and definitely more than two, will not appreciably improve the quality of the data in this study. Using NormFinder, normalization with reference genes can be compared to normalization with total RNA by adding an extra column to the data sheet with the RNA concentration of each analyzed sample in logarithmic scale (Figure 11). The algorithm is ignorant of the nature of the variables and simply compares their variation. For the data in our example, normalization with total RNA is essentially as stable as normalization with PPIA, which is the single optimum reference gene here. In this study, the samples analyzed were flash-frozen biopsies from mouse brains, from which RNA of very high quality (RIN 8-9) was extracted. Our experience is that for samples with high-quality RNA, normalization to the total amount of RNA is often as good as normalization with a single reference gene. In samples of poor RNA integrity, or when expression may have been induced, normalization with reference genes is preferred.
[Figure 11 input data table: Cq values for candidate reference genes (column headers include 18S rRNA, GUSB and ARBP) and a column of total RNA concentrations in logarithmic scale]
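The accumulated SD quoted above can be approximated by treating the reference genes' errors as independent, so that the SD of the mean of the n best genes is sqrt(SD_1^2 + ... + SD_n^2)/n. The independence assumption is this sketch's, not necessarily the one used in GenEx, and the second SD value in the example (0.057 cycles) is hypothetical.

```python
import math

def accumulated_sd(sds):
    """SD of the mean of the 1, 2, ..., n best reference genes.

    Assumes the genes' errors are independent, so the variance of their mean
    is sum(SD_i ** 2) / n ** 2. Genes are taken in order of increasing SD.
    """
    out = []
    total = 0.0
    for n, sd in enumerate(sorted(sds), start=1):
        total += sd ** 2
        out.append(math.sqrt(total) / n)
    return out
```

For example, accumulated_sd([0.05, 0.057]) gives 0.05 for the best gene alone and about 0.038 for the pair; once this value is well below the roughly 0.1-cycle repeatability of the instrument, adding further genes brings little.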
29. roach to select reference genes is a special form of analysis of variance, which in the qPCR literature is best known through the tool NormFinder (Andersen et al. 2004, Cancer Res. Aug 1; 64(15):5245-50). NormFinder is applied to a panel of candidate reference genes that is analyzed in a set of representative samples. In essence, NormFinder calculates a global average expression of all the genes in all the samples, to which the individual genes are compared. Based on this comparison, an SD is estimated for each candidate reference gene. Furthermore, if the samples are from different treatment groups, NormFinder separates the variation into an intragroup and an intergroup contribution. Figure 9 shows an example where reference genes were sought for an obesity study in mice in which wild-type mice and an obese strain were compared. The genes were selected from the TATAA reference gene panel (www.tataa.com), which was measured on seven representative mice from each strain. The estimated intragroup variation is the SD of the genes in the different treatment groups, while the intergroup variation is the differential expression and sums to zero for every gene over all the groups. Good reference genes should have low intergroup variation in all groups and negligible
Figure 10: SD (left) and accumulated SD (right) for the reference gene candidates [plots: SD per gene; accumulated SD vs. number of genes]
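The split into intergroup and intragroup variation can be illustrated with a deliberately simplified calculation: remove the sample effect (the mean Cq of all candidate genes in each sample) and the gene's own mean, then take the mean residual per group as the intergroup term and the SD within each group as the intragroup term. This is not the model-based estimator of Andersen et al., which also accounts for estimation uncertainty and combines the two terms into a single stability value; it only shows the structure of the decomposition. The function name and data layout are hypothetical.

```python
from statistics import mean, stdev

def variation_components(cq, groups):
    """Simplified intergroup/intragroup variation per candidate reference gene.

    cq[gene][sample] holds Cq values; groups[sample] holds the treatment group.
    Returns {gene: (intergroup, intragroup)}, each a {group: value} dict.
    """
    genes = list(cq)
    samples = list(groups)
    # Sample effect: average Cq of all candidate genes in each sample
    sample_mean = {s: mean(cq[g][s] for g in genes) for s in samples}
    result = {}
    for g in genes:
        resid = {s: cq[g][s] - sample_mean[s] for s in samples}
        gene_mean = mean(resid.values())
        centered = {s: r - gene_mean for s, r in resid.items()}
        inter, intra = {}, {}
        for grp in sorted(set(groups.values())):
            vals = [centered[s] for s in samples if groups[s] == grp]
            inter[grp] = mean(vals)   # group offset (differential expression)
            intra[grp] = stdev(vals)  # spread within the group
        result[g] = (inter, intra)
    return result
```

With equal group sizes, as with seven mice per strain, the intergroup terms of each gene sum to zero, matching the description above; a gene that shifts between groups shows a large intergroup term even if it is tight within each group.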
30. rranged appropriately for downstream analysis using a wizard. A similar high-level, user-friendly solution is provided by Exiqon, which offers a customized version of GenEx with a powerful wizard to read data from their miRCURY LNA Universal RT microRNA PCR platform (www.exiqon.com/qpcr-software). On the BioMark microfluidic platform from Fluidigm, technical and biological replicates are indicated by the naming of assays and samples, and appropriate classification columns are created automatically. Data generated on other qPCR instruments can also be read by GenEx, including the Stratagene MX300X from Agilent, Realplex from Eppendorf, CFX96/384 from Bio-Rad, Eco from Illumina, and the many different qPCR platforms from Life Technologies. These efforts by the instrument manufacturers to transfer experimental design information automatically, or at least semi-automatically, into GenEx substantially simplify the pre-processing needed to prepare qPCR data for statistical analysis. Data pre-processing. For most studies performed today, the ΔΔCq method is not sufficient to analyze qPCR data. It is not that we calculate differently; rather, the studies have become larger and the experiments more complex. In fact, for most studies performed today it is not even possible to write a closed-form expression for the resulting expression response. Instead, the measured data must be processed sequentially to account for the various aspects of the experiment. In particular
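The idea that qPCR data are processed sequentially, rather than through one closed-form expression, can be sketched as a pipeline of steps applied in a fixed order. The step factories below are hypothetical toy operations standing in for real corrections (a multiplicative rescaling like efficiency correction and an additive shift like reference-gene subtraction); the example shows that the same two steps applied in a different order give a different result, so the order of the workflow is not arbitrary.

```python
from typing import Callable, Dict, List, Tuple

Data = Dict[str, float]                     # sample name -> Cq value
Step = Tuple[str, Callable[[Data], Data]]   # (step name, transformation)

def run_pipeline(data: Data, steps: List[Step]) -> Data:
    """Apply pre-processing steps one after another, in the given order."""
    for _name, step in steps:
        data = step(data)
    return data

def shift(offset: float) -> Callable[[Data], Data]:
    """Hypothetical step factory: subtract a constant from every Cq."""
    return lambda d: {s: cq - offset for s, cq in d.items()}

def scale(factor: float) -> Callable[[Data], Data]:
    """Hypothetical step factory: multiply every Cq by a constant."""
    return lambda d: {s: cq * factor for s, cq in d.items()}
```

Scaling 24.0 by 0.5 and then subtracting 2.0 gives 10.0, whereas subtracting first and then scaling gives 11.0. Each step returns a new dictionary, so the raw data are left untouched.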
31. rs in multifactorial studies, such as studies of the effect of gender, or covariates such as age, time, or drug load, can be indexed in classification columns and used to assign subjects to groups automatically. The groups are assigned colors and symbols for plotting. A neat feature is that colors and symbols can be set independently, which makes it possible to assign subjects to multiple groups and to identify these in plots by the shape, size, and color of the symbol. Even shades of colors can be used creatively to indicate various levels of covariates on an ordinal scale; for example, a darker shade indicates a higher drug load.
[Figure: expression plots with sample panels labeled Treatment A 1h plate 1, Treatment A 1h plate 2 and Treatment A 24h, followed by a table of genes with Cq values]
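Independent color and symbol assignment can be sketched as two separate lookups driven by different classification columns, with a covariate mapped to the shade of the color. The palettes, column names (treatment, gender, dose) and shading rule below are hypothetical illustrations, not GenEx settings.

```python
from collections import defaultdict

# Hypothetical palettes; the actual colors and symbols are chosen in GenEx.
COLORS = {"control": (0, 0, 255), "treated": (255, 0, 0)}
MARKERS = {"female": "o", "male": "^"}

def assign_groups(subjects, column):
    """Group subject IDs by their value in one classification column."""
    groups = defaultdict(list)
    for sid, row in subjects.items():
        groups[row[column]].append(sid)
    return dict(groups)

def shade(rgb, level, max_level):
    """Darker shade for a higher covariate level (level 0 = lightest tint)."""
    strength = 0.3 + 0.7 * level / max_level  # fraction of the full color, 0.3 to 1.0
    return tuple(round(255 - (255 - c) * strength) for c in rgb)

def plot_style(row, max_dose):
    """Color and shade from treatment and dose; symbol independently from gender."""
    return {"color": shade(COLORS[row["treatment"]], row["dose"], max_dose),
            "marker": MARKERS[row["gender"]]}
```

Because color and marker come from different columns, a treated male at the highest dose is drawn as a full-red triangle, while a treated male at a low dose keeps the triangle but gets a lighter red.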
