ProteoIQ User Guide Version 1.4.1

1. Label Free Quantification
[Figure: column selection dialog. Individual Column Options include Average Spectral Count, Avg Spectral Count Std Deviation, NSAF, Log2 Relative Expression, and Log2 Relative Expression Std Deviation. Normalization Options include No Normalization, Spectral Count Normalization, and Protein Size Normalization. Data Groups include Total Proteome, Biological Samples, and Sample Replicates.]
Note: Selecting Sample Replicates in the Protein Table Column Selection Menu will display the results associated with each replicate.
Modifying Plot Views
All of the Plot Views in ProteoIQ can be customized by right-clicking in the plot area and then selecting Properties, as shown in Figure 53. Fields that can be modified include:
  1. Chart Header Text
  2. X Axis Label Text
  3. Y Axis Label Text
  4. Line Colors
  5. Font Type and Colors
  6. Axis Tick Marks
  7. Plot Background Color
Figure 53. Chart Properties Menu. [Figure: chart titled "Comparison of Protein Expression Across Biological Groups" with the Chart Properties dialog open. Callouts: "Right click to access chart properties"; "Chart Properties Allow the Chart View to be Modified".]
2. Experimental Approach for GeLC Analysis
  Preparing results for GeLC Analysis in ProteoIQ
  How to display a 1D Gel View
  Interpreting the 1D Gel View
  1D Gel View by Protein
  [illegible entry]
Quantitation Overview: Label Free
  Quantitation Without Normalization: Label Free
  Quantitation With Normalization: Label Free
  Normalization by Selected Proteins
  [illegible entry]
  Protein Size Normalization (NSAF)
  Modifying Label Free Quantitation Settings
Quantitation Overview: Reporter Ion Quantitation
  The ProteoIQ Quantitation Workflow: Reporter Ion Quantitation
  Modifying Reporter Quantitation Settings
Pseudo Spectral Counts
  The Pseudo Spectral Count
3. Getting Started
The ProteoIQ user guide includes information about the use of ProteoIQ version 1.3 for the analysis of proteomics results derived from database searching of MS/MS data.
Contents
  Tutorials and Quick Start Guide
  System Requirements
  Installing ProteoIQ
  Licensing ProteoIQ
  Contact Information
Tutorials and Quick Start Guide
PDF versions of the tutorials, help guide, and quick start guide can be found in the ProteoIQ installation path at C:\Program files\ProteoIQ\docs. These documents can also be accessed online at https://www.bioinquire.com/library.php.
Tutorials
  1. Database Wizard Guide
  2. How to Create Hyperlinks
  3. Guide to Grouping Database Search Results
  4. Managing Protein Sets
Onboard help is also available in ProteoIQ. Select HELP to access the onboard help guide.
System Requirements
This release of ProteoIQ requires Java version 1.6_14 or higher. ProteoIQ will not work on earlier versions of Java. If Java version 1.6_14 is not installed, ProteoIQ will automatically install it during the installation process. ProteoIQ can be operated on Windows, Mac, or Linux operating systems.
Minimum requirements are given in two tables, one for projects with fewer than 400,000 spectra in a single ProteoIQ project and one for projects with more than 400,000 spectra. Each table lists the Processor, System Type, Memory, Disk Space, and Internet Connection.
4. [Figure: protein table screenshot. Recoverable column headings: Is Top, Sequence Id, Sequence Name, Total Score, plus spectral count and relative expression columns. Rows list Homo sapiens proteins such as apolipoprotein B precursor, complement component 3 precursor, complement component 4A and 4B preproprotein, ceruloplasmin precursor, complement factor B preproprotein, alpha-2-macroglobulin precursor, complement component 5 preproprotein, vitamin D-binding protein precursor, and complement factor H isoform a precursor. The numeric cells are not reliably recoverable.]
5. [Figure: Describe Proteome page of the New Project Wizard. Grouping options: Combine all results files (MudPIT); Each result file represents an individual sample; Group files by biological sample (Number of Biological Samples: 3; Number of Biological Replicates for each sample: 3). Decoy options: Spectra were searched against a decoy database (separate target and decoy databases, or concatenated target and decoy database); Spectra were searched against a non-target database.]
Table 11. Fields in the Describe Proteome Section
  Proteome Name: Enter the name of the proteome here. A descriptive name should be given, as a single project may contain multiple proteomes.
  Instrument Description: Enter the type of mass spectrometer used in the proteome analysis. This is an optional field and has no effect on the data processing.
  Combine all results files (MudPIT): Groups all database search results as if from a single biological sample. See Combine All Results Files for detailed information.
  Each result file represents an individual sample: Performs protein identification in each database search result using ONLY peptides identified from that specific result file. See topic Each Result Represents an Individual Sample for more information.
  Group files by biological sample: Allows the user to define the number of bio
6. Test Parse Rules
The Test Parse Rules section displays the results of the Parse Rule Settings. ProteoIQ applies the parse rules to the specified database and displays the Protein ID and Description fields. To ensure proper indexing of the database, confirm that a unique Protein ID is displayed for each protein entry. For Concatenated Target/Decoy Databases, the Protein Type is also displayed. It indicates whether the sequence is being considered a target or a decoy, based on the Decoy Identifier Parse Rule (Figure 44).
Figure 44. [Figure: sequence file test results (FASTA Format Test: PASS) with Protein ID, Description, and Protein Type (TARGET/DECOY) columns. Callout: "Indicates if the sequence is a target or decoy as specified by the decoy identifier parse rule".]
Specify Hyperlinks
The Specify Hyperlinks window allows hyperlinks to be added to a database during indexing. Once a protein set is created, the hyperlinks will appear as links from the protein name to the registered database specified by the hyperlink. Figure 45 shows the Specify Hyperlinks window. For a detailed description of how to create new hyperlinks please see the Create
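The check described under Test Parse Rules can be illustrated outside ProteoIQ with a minimal Python sketch. Everything in it is an assumption made for illustration: the two FASTA header lines, the regular expressions standing in for the Protein ID and Description parse rules, and the DECOY_ prefix standing in for the decoy identifier. It does not reproduce ProteoIQ's own parse rule syntax.

```python
import re

# Hypothetical parse rules (assumptions, not ProteoIQ defaults).
PROTEIN_ID_RULE = re.compile(r"^>(\S+)")          # text up to the first space
DESCRIPTION_RULE = re.compile(r"^>\S+\s+(.*)$")   # everything after the ID
DECOY_IDENTIFIER = "DECOY_"                       # assumed decoy prefix

# Hypothetical header lines from a concatenated target/decoy database.
headers = [
    ">sp|P00761|TRYP_PIG Trypsin",
    ">DECOY_sp|P00761|TRYP_PIG Reversed trypsin",
]

def apply_parse_rules(header):
    """Return (protein_id, description, protein_type) for one header line."""
    protein_id = PROTEIN_ID_RULE.match(header).group(1)
    match = DESCRIPTION_RULE.match(header)
    description = match.group(1) if match else ""
    is_decoy = protein_id.startswith(DECOY_IDENTIFIER)
    return protein_id, description, "DECOY" if is_decoy else "TARGET"

rows = [apply_parse_rules(h) for h in headers]

# Proper indexing requires a unique Protein ID for every entry.
ids = [row[0] for row in rows]
ids_are_unique = len(ids) == len(set(ids))

for row in rows:
    print(row)
print("Unique Protein IDs:", ids_are_unique)
```

If two entries produced the same Protein ID under a given rule, the uniqueness check would fail, which is the situation the Test Parse Rules step is meant to expose before the database is saved.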
7. [Figure: Protein FDR plot. Y axis: Number of Protein Groups. X axis: Peptide Scores. Series: Protein Groups, Decoy Protein Groups, Protein FDR.]
Figure 64 shows an example of how the Protein FDR Plot should be used to evaluate whether Protein False Discovery Rate based protein identification is reliable for your data set. The Protein FDR Plot contains three data series:
  1. Distribution of target proteins at each score threshold (RED)
  2. Calculated Protein False Discovery Rate at each score threshold (GREEN)
  3. Distribution of decoy proteins at each score threshold (BLUE)
For the Protein FDRs to be accurate, the distributions should approximate a Gaussian distribution. There should also be good separation between the target and decoy peptide distributions (Figure 64). It is important to view distributions based on the Maximum Starting Peptide Coverage, since this plot will contain the largest number of data points for viewing a complete distribution. It is also helpful to check the Output Notes section, which lists the number of proteins at each score threshold used to calculate the Protein False Discovery Rates.
Whole Proteome Visualization
To view a snapshot of the entire protein set, select one of the following:
  Pie Chart
  Venn Diagram
  Group Summary
  2D Gel View
  1D Gel View
  1D Gel View by Protein
Access Pie Chart Plots Experimental Reproducibility P
8. [Figure: Describe Project page of the New Project Wizard with the fields Project Name, User Name, Project Description, and Project Path.]
Table 10. Fields in the Describe Project Section
  Project Name: The project name should be entered in this field. A ProteoIQ project may contain multiple proteomes and peptide/protein sets, along with all the parameters used in its creation. A project may be saved and reopened at any time.
  User Name: Enter a user name here to assist you in identifying your project in the future. This field is not required.
  Project Description: Enter a description of the project here to assist you in identifying your project. This field is optional.
  Project Path: Specifies the location where the project will be saved. Click Browse to locate the project path on your computer or network.
Describe Proteome
The Describe Proteome section contains fields for adding information about the project and defines how the database search results will be grouped (Figure 26). Table 11 describes the fields.
Figure 26. Describe Proteome. [Figure: Describe Proteome page with example entries for Proteome Name and Instrument Description (QTOF).]
9. Edit Menu
The Edit Menu (Figure 16) accesses the Find, Copy, and Edit functions, as described in Table 2.
Figure 16. Edit Menu. [Figure: menu listing Select All, Select by Find (Ctrl+F), Select by List (Ctrl+L), Copy (Alt+C), Copy to New Protein Set, Clear Selected, Edit Normalization Settings, and Edit Reporter Quant Settings.]
Table 2. Edit Menu Items
  Select All: Selects all proteins or peptides in the peptide or protein set viewer. Access: Edit > Select All. Shortcut key: Alt+A.
  Select by Find: Selects data in the peptide or protein set viewer based on user-defined search criteria. Access: Edit > Select by Find. Shortcut key: Ctrl+F.
  Select by List: Loads a tab-delimited file containing user-defined search criteria and selects data in the peptide or protein set viewer. Access: Edit > Select by List. Shortcut key: Ctrl+L.
  Copy: Copies selected information. Access: Edit > Copy.
  Copy to New Protein Set: Copies selected peptides or proteins to a new Protein Set accessible in the Navigation Bar. Access: Edit > Copy to New Protein Set.
  Clear Selected: Deselects selected items in the peptide and protein set viewer. Access: Edit > Clear Selected.
  Edit Normalization Settings: Opens the Normalization Settings dialogue box. Access: Edit > Edit Normalization Settings.
  Edit Reporter Quant or Label Free Settings: Opens the edit Reporter Quantitation or Label Free Settings dialogue box. Access: Edit > Edit Reporter Quant/Label Free Settings.
View
10. [Figure: Help menu listing Help Contents and About.]
Table 5. Help Menu
  Help Contents: Opens the onboard help guide. Access: Help > Help Contents.
  About: Opens the About page to display version information. Access: Help > About.
Pop-Up Menus
Right-clicking on the selected peptides or proteins in the Peptide and Protein Set Viewer opens the Pop-Up Menus, as shown in Figure 12 and Figure 8. Pop-Up Menu items are listed in Table 6.
Table 6. Pop-Up Menu
  Copy: Copies selected information. Access: Right-click > Copy Selected. Shortcut: Ctrl+C.
  Copy to New Protein Set: Copies selected peptides or proteins to a new Protein Set accessible in the Navigation Bar. Access: Right-click > Copy to New Protein Set.
  Select Matching Proteins: Selects proteins in the Protein Set Viewer that contain the selected peptides. Access: Right-click > Select Matching Proteins.
  View Protein Info: Opens the Protein Sequence Viewer. Access: Right-click > View Protein Info.
  Normalize SC from selected proteins: Opens the Normalize SC From Selected Proteins dialogue box. Access: Right-click > Normalize SC from selected proteins.
Statistical Algorithms Overview
ProteoIQ provides two methods for statistical analysis of peptide and protein identifications. Confidence in peptide and protein assignments can be determined using Peptide Probability or Protein Probability thresholds, or using False Discovery Rate analysis. See the discussion on Evaluating Confidence in Peptide and Protein Identifica
11. FASTA Database
ProteoIQ maps peptide identifications to proteins present in a FASTA database. To ensure accurate results, the same database used to generate the search results should be used to create the ProteoIQ project. See Modify Database Wizard for information on importing databases in ProteoIQ.
Creating a New Project: Label Free Quantitation
To create a new project in ProteoIQ, first select the New Project Wizard from the Navigation Pane, as shown in Figure 24, then select Label Free Quantitation.
Figure 24. New Project Wizard. [Figure: callout "Select New Project Wizard to create a new project"; a Reporter Quantitation (iTRAQ, TMT) option is also shown.]
There are twelve sections in the New Project Wizard:
  Describe Project
  Describe Proteome
  Describe Samples
  Choose Database
  Search Results Options
  Select Target Files for Parsing
  Select Decoy Files for Parsing
  Select Non-Target Files for Parsing
  Peptide Parsing Options
  Parse Files
  MS Ions Data
  Cluster Peptides
The following describes the individual sections in the New Project Wizard.
Describe Project
The Describe Project section contains fields for adding information about the new project and defines where the project will be saved. Figure 25 shows the Describe Project section of the New Project Wizard. Table 10 describes the fields.
Figure 25. Describe Project
12. Manually group search result files to source: if selected, allows source files with different names to be grouped together as if from the same source MS/MS file.
Peptide Parsing Options
The Peptide Parsing Options window, shown in Figure 31, is used to set the parameters for creation of the first protein set. Apply filters to the data and select parameters for creating MS/MS spectral views during peptide parsing. Table 13, Table 14, and Table 15 contain descriptions of each parameter.
Figure 31. Peptide Parsing Options. [Figure: panels for Peptide filters (for example Min Peptide Length (AA): 5); Options for generating MS/MS spectra (Extract no ion data; Extract ion data for peptides identified by a single spectrum; Extract ion data for top scoring peptides; Extract ion data for ALL peptides); Initial Protein Set Options (for example Min Protein Probability: 0.5; Max Protein FDR). Callouts: "Set statistical thresholds for validating peptide and protein identifications"; "Select to automate the parsing and clustering process".]
There are five components to the Peptide Parsing Options:
  Peptide filters
  Initial Protein Set Options
  Protein and peptide statistical thresholds
  Options for generating MS/MS spectral views
  Specifying name and location fo
13. Minimum Requirements (fewer than 400,000 spectra in a single ProteoIQ project)
  Processor: 1.5 GHz Pentium IV 32-bit (x86) or better
  System Type: 32-bit
  Memory: 1.5 GB or more
  Disk Space: 200 MB for installation; 2 GB to run
  Internet Connection: Cable or DSL
Minimum Requirements (more than 400,000 spectra in a single ProteoIQ project)
  Processor: 2.0 GHz 64-bit (x64) or better
  System Type: 64-bit
  Memory: 2 GB or more
  Disk Space: 200 MB for installation; 4 GB to run
  Internet Connection: Cable or DSL
The number of spectra in a single project is scalable with the amount of RAM.
The computer must be connected to the internet during installation for license validation. For limited-use licenses (ProteoIQ 40 and 200), the computer must have access to the internet each time the software is used.
The amount of disk space required is at least twice the amount of RAM (e.g., 4 GB of RAM requires at least 8 GB of disk space).
Installing ProteoIQ
Double-click on the ProteoIQ installer and follow the instructions to install ProteoIQ. Windows Vista users may need to right-click on the ProteoIQ installer and select Run as administrator. The default installation path for ProteoIQ is C:\Program files\ProteoIQ.
Help files and sample data
The help files and the quick start guide can be found in C:\Program files\ProteoIQ\docs. To locate sample data files, go to C:\Program files\ProteoIQ\Sample data\Comparative Mouse Cell Proteome. To open the sample data, select Open Project from the main toolbar, browse for the above folder, then select validated Mouse cell proteome.piq.
Licensing
From the ProteoIQ navigation panel, select View License. To enter a new license, type the li
14. Starting Peptide Coverage
[Related entries listed on this page: Discriminant score F value, Peptide Probability, Protein Probability, Protein Group Probability.]
Description
In ProteoIQ, when a protein false discovery rate is to be applied, the user must determine the starting peptide coverage (C) to use in the calculation of false discovery rates. The concept of a false discovery rate is based on the assumption that the null hypotheses (proteins identified in the random database) follow a quasi-normal distribution of minimal scores (S). Score distributions at increasing C may not necessarily look normal themselves, especially when the sample size is small. As the sample size (number of protein identifications) increases for each C, the score distribution approaches normality and the false discovery rate calculations become more accurate. In practice, determining C prior to protein false discovery rate analysis depends on the distribution of minimal scores (S) in the target and decoy protein assignments. By default, ProteoIQ begins the protein false discovery rate calculation using a starting peptide coverage of 3. For large data sets, the starting peptide coverage can be increased.
Note: Choosing the correct starting peptide coverage is largely dependent on the frequency of protein identifications in the decoy database at that coverage level. After the protein groups have been rendered, the user can evaluate the starting peptide coverage level by viewin
15. The following sections provide an overview of how to use ProteoIQ. The following topics are covered:
  Opening a ProteoIQ Project
  Saving Projects
  Sharing/Moving ProteoIQ Projects
  Creating a New ProteoIQ Project: Overview
  What You Need to Create a New Project
  Creating a New ProteoIQ Project: Label Free Quantitation
  Creating a New ProteoIQ Project: Reporter Ion Quantitation
  Modifying Databases
  Using the Filter Pane
  Comparing Protein Sets
  Customizing Results Views
  Using the Plots
  Exporting Results
Opening a ProteoIQ Project
This section describes how to open a ProteoIQ Project and provides instructions for accessing the Demo ProteoIQ Project, as shown in Figure 20.
Figure 20. Open Project. [Figure: Open dialog browsing the Sample Data folder. Callouts: "Open an existing ProteoIQ project"; "Select piq file"; "Click Open".]
Open a ProteoIQ Project
A Demo ProteoIQ Project is included with the installation package. To open the Demo ProteoIQ Project:
  1. Select Navigation Menu > Open Project.
  2. Browse to C:\Program Files\ProteoIQ\Sample Data\Validated Demo Dataset One.piq.
  3. Select Open.
Note: A ProteoIQ Project contains multiple files. For the project to be opened correctly, ALL files must be pre
16. Enter the minimum number of amino acids that must be present in an identified peptide for it to be included in the final peptide list. For example, if the Min Peptide Length is set to 5, then any peptides with 4 or fewer amino acids will be removed.
Warning: Low-scoring peptide matches are required for accurate calculation of peptide probabilities by PeptideProphet. Therefore, if probability thresholds are to be applied, the user should consider NOT applying filters during the parsing step.
Parse Files / MS Ions Data
The Parse Files progress window displays information about each database search result as ProteoIQ extracts the relevant peptide and protein information (Figure 32).
[Figure: Parse Files progress window. Number of Files to Parse: Num Target Files 48, Num Decoy Files 48.]
Note: For large data sets, especially LTQ results, the parsing process will take some time. In these cases, we recommend running the parsing and clustering steps in auto mode by selecting Auto Generate Initial Protein Set.
Cluster Peptides
Figure 33 shows the Cluster Peptides progress window, which displays the status of mapping peptides to a FASTA database during the creation of a new ProteoIQ Project.
[Figure: Cluster Peptides progress window showing memory usage (MB) and clustering progress.]
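The Min Peptide Length rule described on this page can be sketched in a few lines of Python. This is only an illustration of the stated rule (peptides shorter than the minimum are removed). The function name and the peptide sequences are hypothetical and are not part of ProteoIQ.

```python
def filter_by_min_length(peptides, min_length=5):
    """Keep only peptides with at least min_length amino acids."""
    return [p for p in peptides if len(p) >= min_length]

# Hypothetical peptide sequences with 3, 4, 5, and 7 residues.
peptides = ["AGK", "LVNR", "PEPTK", "SAMPLEK"]

kept = filter_by_min_length(peptides, min_length=5)
print(kept)  # ['PEPTK', 'SAMPLEK']
```

With a minimum of 5, the 3- and 4-residue peptides are dropped and the 5-residue peptide is kept. As the warning above notes, filtering at this stage removes matches that probability models may need, so the filter is best left off when probability thresholds will be applied later.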
17. Modify Database Wizard
Figure 4 shows the ProteoIQ Modify Database Wizard. The Modify Database Wizard enables FASTA databases to be saved in ProteoIQ. Previously saved databases can be modified within the wizard, and custom hyperlinks can be registered to link protein identifications to external information resources. There are five components to the Modify Database Wizard:
  1. Describe Database
  2. Specify Parse Rules
  3. Test Parse Rules
  4. Specify Hyperlinks
  5. Save Database
Figure 4. Modify Database Wizard. [Figure: wizard with the example database Human GenBank Target selected. Callouts: "Modify existing databases"; "Wizard provides step by step instructions for loading FASTA databases in ProteoIQ"; "Load new databases"; "Register custom hyperlinks".]
Note: For more information on loading FASTA databases, please see the Database Wizard tutorial.
Note: For more information on adding custom hyperlinks to your database, please see the Cre
18. Note: The average spectral count will be used when multiple groups are selected.
Note: These settings only affect protein sets created after the settings are applied.
Quantitation Overview: Reporter Ion Quantitation
Figure 73 shows a flow chart of how quantitation is performed for a binary comparison of two biological samples with two replicates using an iTRAQ 2-plex reagent. Table 34 contains descriptions of the data types specifically associated with Reporter Ion Quantitation.
Figure 73. Quantitation Flow Chart: Reporter Ion Quantitation. Example: iTRAQ 2-Plex Experiment
  Sample 1: label with 114. Sample 2: label with 117. Analyze in duplicate (Rep 1 and Rep 2).
  Peptides matching Protein X: Rep 1 has Peptide 1 and Peptide 2; Rep 2 has Peptide 1.
  Extract reporter ion intensities:
    Rep 1, Peptide 1: 114.1 = 1000; 117.1 = 1500
    Rep 1, Peptide 2: 114.1 = 3000; 117.1 = 4000
    Rep 2, Peptide 1: 114.1 = 500; 117.1 = 1000
  Calculate reporter ion ratios (reference group: 114.1):
    Rep 1, Peptide 1: 1.5 = 1500/1000; Log2 Rel Exp = 0.58
    Rep 1, Peptide 2: 1.33 = 4000/3000; Log2 Rel Exp = 0.41
    Rep 2, Peptide 1: 2.0 = 1000/500; Log2 Rel Exp = 1.0
  Average across all peptides matching the protein:
    Rep 1: Average Log2 Rel Exp = 0.49
    Rep 2: Average Log2 Rel Exp = 1.0
  Average across all replicates (internal or external):
    Average Log2 Rel Exp = 0.74; Log2 Rel Expression SDEV = 0.36
The ProteoIQ Quant
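The arithmetic in the Figure 73 example can be reproduced with a short Python sketch. The intensities and the 114.1 reference channel come from the figure. The variable and function names are hypothetical, and the use of a sample standard deviation across replicates is an assumption chosen because it is consistent with the SDEV reported in the figure. It is not a statement of how ProteoIQ is implemented.

```python
import math
import statistics

REFERENCE = "114.1"     # reference group in the Figure 73 example
TEST_CHANNEL = "117.1"

# Reporter ion intensities per replicate and peptide (values from Figure 73).
replicates = {
    "Rep 1": [
        {"114.1": 1000, "117.1": 1500},  # Peptide 1
        {"114.1": 3000, "117.1": 4000},  # Peptide 2
    ],
    "Rep 2": [
        {"114.1": 500, "117.1": 1000},   # Peptide 1
    ],
}

def log2_relative_expression(peptide):
    """Log2 of the test channel intensity over the reference intensity."""
    return math.log2(peptide[TEST_CHANNEL] / peptide[REFERENCE])

# Average across all peptides matching the protein, per replicate.
replicate_means = {
    name: statistics.mean([log2_relative_expression(p) for p in peptides])
    for name, peptides in replicates.items()
}

# Average and sample standard deviation across replicates.
values = list(replicate_means.values())
protein_mean = statistics.mean(values)
protein_sdev = statistics.stdev(values)

print(replicate_means)
print(round(protein_mean, 2), round(protein_sdev, 2))
```

Computed at full precision, the replicate averages are 0.50 and 1.00, the protein average is 0.75, and the standard deviation is about 0.35. The figure reports 0.49, 0.74, and 0.36, which apparently reflects rounding of the intermediate values before averaging.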
19. Sample Names: Inserting a unique name for each sample assists the user in identifying the individual sample sets in a complete proteome. This field is useful if biological replicates were analyzed (e.g., patient 1, 2, or 3). This field is optional, as default sample names will be generated for the number of biological samples specified in the Describe Proteome page.
Replicates: Input the number of technical replicates performed on each biological replicate. This option is only available if Group Files By Biological Sample is selected on the Describe Proteome page.
Choose Database
Select a target or decoy database using the drop-down menus in the Choose Database(s) page (Figure 28). Database definitions may be modified, or a new database may be created, by selecting the Modify Databases button. To read more about creating databases in ProteoIQ, see the Database Wizard tutorial.
Figure 28. Choose Database. [Figure: callouts: "Select target database from the list"; "Select decoy database from the list"; "If you need to add a new database or edit an existing database press the Modify Databases button".]
20. [Figure: protein table screenshot (continued). Rows list Homo sapiens proteins such as complement component 5 preproprotein, vitamin D-binding protein precursor, complement factor H isoforms a and b, hemopexin, inter-alpha globulin inhibitor H1, H2, and H4, and apolipoprotein A-IV precursor. The numeric cells are not reliably recoverable.]
One protein or multiple proteins can be selected by a single right click on the row of interest (Figure 8). Right-clicking on the selected proteins allows the highlighted proteins to be ex
21. [Figure, continued: pseudo spectral count example using the iTRAQ 2-plex intensities]
  Average reporter ion intensity per peptide:
    Rep 1, Peptide 1: Avg Int = 1250
    Rep 1, Peptide 2: Avg Int = 3500
    Rep 2, Peptide 1: Avg Int = 750
  Calculate Normalized Reporter Ion Intensities (Pseudo Spectral Count):
    Rep 1, Peptide 1: 114.1: 0.8 = 1000/1250; 117.1: 1.2 = 1500/1250
    Rep 1, Peptide 2: 114.1: 0.85 = 3000/3500; 117.1: 1.14 = 4000/3500
    Rep 2, Peptide 1: 114.1: 0.66 = 500/750; 117.1: 1.33 = 1000/750
The normalized reporter ion intensities represent spectral counts. For example, for sample one, replicate one, peptide one has a pseudo spectral count of 0.8. At this point, the spectral counts are calculated as described under the Label Free Quantitation section.
Note: Once pseudo spectral counts are calculated for an individual peptide, normalization and replicate analysis are performed exactly as described under the Label Free Quantitation section.
The Pseudo Spectral Count Workflow
  1. For every peptide, the reporter ions are detected based upon the user-specified Mass Window and Reporter Method. Note: for a reporter ion to be detected, the ion must be within the Mass Window specified and the peptide must contain the modification associated with the reporter label.
  2. Reporter ion intensities are then saved for each peptide.
  3. The average reporter ion intensity is calculated across all reporter ions for a given peptide.
  4. Pseudo spectral counts are then calculated by dividing the intensity of each reporter ion by the average reporter ion intensity.
Pseudo Spectral Counts Without Normalization
Table 35 contains descriptions of the data types specifically associat
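Steps 3 and 4 of the Pseudo Spectral Count Workflow can be sketched in Python as follows. The intensities are taken from the example on this page. The function name and data layout are hypothetical and serve only to illustrate the division of each reporter ion intensity by the per-peptide average.

```python
def pseudo_spectral_counts(reporter_intensities):
    """Divide each reporter ion intensity by the peptide's average intensity."""
    average = sum(reporter_intensities.values()) / len(reporter_intensities)
    return {channel: intensity / average
            for channel, intensity in reporter_intensities.items()}

# Reporter ion intensities from the example (replicate, peptide).
peptides = {
    ("Rep 1", "Peptide 1"): {"114.1": 1000, "117.1": 1500},
    ("Rep 1", "Peptide 2"): {"114.1": 3000, "117.1": 4000},
    ("Rep 2", "Peptide 1"): {"114.1": 500, "117.1": 1000},
}

counts = {key: pseudo_spectral_counts(ions) for key, ions in peptides.items()}

for key, value in counts.items():
    print(key, {channel: round(v, 3) for channel, v in value.items()})
```

Because every intensity is divided by the mean of the same peptide's reporter ions, the pseudo spectral counts for one peptide always average to 1. For sample one, replicate one, peptide one this gives 0.8 for the 114.1 channel, matching the value quoted in the text.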
23. Please see the Grouping Database Search Results tutorial for a more detailed description of this process. Result File Type allows you to select the database search results format to be grouped.
Figure 30. Select Files for Parsing. [Figure: callouts: "Select Data File Format"; "Browse for database search files"; "Add selected files"; "Choose Biological Group and Replicate"; "Assign files to Biological Group and Replicate"; "Database search results are assigned to Biological Group and Replicate".]
Grouping database search results in ProteoIQ is performed as follows:
  1. Select the search results format. Note: multiple search result formats can be imported into ProteoIQ within the same project.
  2. Browse for the database search results. The name of the peak list used to produce each database search result is rendered in the Available Results Files viewer.
  3. Select the files for grouping. Once selected, the files appear highlighted.
  4. Move the highlighted files to the Total Selected Files viewer by selecting Add.
  5. Select the appropriate result file(s) in the Total Selected Files viewer.
  6. Select the Biological Group or Replicate from the List of Samples drop-down menu.
  7. Click Assign Files to Sample. The assigned results files will then appear tabbed under the Biological Group name in the Total Selected Files viewer.
At any time, the results files may be removed or ungrouped by highlighting either the result file or the group name and selecting the Remove button.
24. References
  Weatherly DB, Atwood JA 3rd, Minning TA, Cavola C, Tarleton RL, Orlando R. A heuristic method for assigning a false discovery rate for protein identifications from Mascot database search results. Molecular and Cellular Proteomics 2005; 4: 762-72.
  Nesvizhskii AI, Keller A, Kolker E, Aebersold R. A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem 2003; 75: 4646-4658.
  Nesvizhskii AI, Aebersold R. Analysis, statistical validation and dissemination of large-scale proteomics data sets generated by tandem MS. Drug Discov Today 2004; 4: 173-181.
  Zhang B, Chambers MC, Tabb DL. Proteomic parsimony through bipartite graph analysis improves accuracy and transparency. J Proteome Res 2007; 6: 3549-57.
Source
TOP indicates that the protein sequence can account for all peptides within a PROTEIN GROUP. OTHER indicates that the protein is a member of a PROTEIN GROUP but is not the TOP. Protein assignments listed as OTHER contain a subset of the peptides that were observed in the TOP identification, but they cannot be distinguished as unique proteins because of shared peptide representation.
Note: A protein being listed as OTHER does not mean that the protein was not identified. It simply means that the peptides observed for that protein do not allow it to be distinguished from the other proteins within the group. It is always plausible that several or all of the members of a protein group are
25. Figure: Group Summary Venn Diagram. Callouts: Select Metric (Data Source) to Display (for example, Number of Protein Groups); Diagram Type (Traditional Venn Diagram or Strict Venn Diagram); Groups are Color Coded (Collection A, Collection B, Collection C).

Table 31: Group Summary Venn Diagram Data Sources
Field: Description
Number of Peptides: Compares the number of peptides for all proteins between Biological Groups.
Number of Proteins: Compares the total number of proteins between Biological Groups.
Number of Protein Groups: Compares the total number of protein groups between Biological Groups.

Traditional Venn Diagram Labeling
Under traditional labeling, the total number of protein groups, proteins, or peptides is reported for each biological group along with the intersection data across biological groups. Under traditional labeling there is no concept of uniqueness, as indicated in the example below.

Number of proteins: Biological Group A = 10; Biological Group B = 10; Intersection (A and B) = 5
Diagram labels: Group A 10, A∩B 5, Group B 10

Strict Venn Diagram Labeling
Under strict labeling, the non-intersection areas display the protein groups, proteins, or peptides that are unique to each biological group, as displayed in the example below.

Number of proteins: Biological Group A = 10; Biological Group B = 10; Intersection (A and B) = 5
(The labels of the strict diagram are not recoverable from the extracted text.)

2D Gel View
The 2D Gel View (Figure 67) presents
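The difference between the traditional and strict Venn labeling described in this section can be sketched with plain sets. This is only an illustration: the protein identifiers and the function name `venn_labels` are made up, and only the 10 / 10 / 5 counts come from the example in the text. The strict labels follow from the definition (members unique to each group), since the strict diagram itself is not legible here.

```python
def venn_labels(group_a, group_b, strict=False):
    """Return the labels for (A region, intersection, B region)."""
    a, b = set(group_a), set(group_b)
    shared = a & b
    if strict:
        # Strict labeling: non-intersection areas show only unique members.
        return len(a - shared), len(shared), len(b - shared)
    # Traditional labeling: each group shows its full total.
    return len(a), len(shared), len(b)


# Hypothetical identifiers: 10 proteins per group, 5 of them shared.
group_a = [f"P{i}" for i in range(1, 11)]   # P1..P10
group_b = [f"P{i}" for i in range(6, 16)]   # P6..P15

print("traditional:", venn_labels(group_a, group_b))
print("strict:     ", venn_labels(group_a, group_b, strict=True))
```

With traditional labels the three numbers cannot simply be added to get the total number of distinct proteins; with strict labels they can.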
26. Viewer

Figure 12: Right-click functionality in the Peptide Set Viewer. Callouts: Single-clicking on rows highlights (selects) the peptides of interest. Right-click menu items: Copy Selected (Ctrl+C), Select Matching Proteins, Copy Selected to New Protein Set. Select Matching Proteins highlights the proteins in the Protein Set Viewer that contain the selected peptides. Copy Selected to New Protein Set creates a protein set containing ONLY the selected peptides.

MS/MS Spectra Viewer
Figure 13 is an example of the MS/MS Spectra Viewer. To open the spectra viewer, double-click on the peptide sequence in the Peptide Set Viewer or in the Protein Sequence Viewer. The spectra viewer contains three panels. The left-most panel is for selection of the spectra labeling parameters that are rendered in the annotated MS/MS spectrum. Select to annotate spectra by Charge State or Fragment Ion Type, or search for Neutral Losses.

Figure 13: MS/MS Spectra Viewer. Callouts: Export spectra as images or spreadsheet format; Spectral View and Peptide Fragmentation Table; Select how MS/MS spectra are annotated; Specify thresholds for peak labeling; Label regardless of intensity; Mod Loss selections (values not recoverable from the extracted text).
27. 150 110 62.5 75 75

2. Calculate Log Rel Expression across replicates (Replicate 1 as an example):
Log Rel Expression for Sample 1 Replicate 1: 0.40 = Log(133 / 101)
Log Rel Expression for Sample 1 Replicate 2: 0.57 = Log(150 / 101)
Log Rel Expression for Sample 1 Replicate 3: 0.110 = Log(110 / 101)
Log Rel Expression for Sample 2 Replicate 1: -0.69 = Log(62.5 / 101)
Log Rel Expression for Sample 2 Replicate 2: -0.42 = Log(75 / 101)
Log Rel Expression for Sample 2 Replicate 3: -0.42 = Log(75 / 101)

Calculate Sample 1 Log Rel Expression: Log Rel Exp 0.37 = (0.40 + 0.57 + 0.110) / 3; Log Rel Expression SDEV = 0.22
Calculate Sample 2 Log Rel Expression: Log Rel Exp -0.51 = (-0.69 + -0.42 + -0.42) / 3; Log Rel Expression SDEV = 0.15

The ProteoIQ Quantitation Workflow (Label Free)
1. Total Spectral Counts are first determined for each replicate by summing the spectral counts for all proteins identified in that replicate (shown as Total SPC).
2. The spectral counts for each protein in each replicate are then normalized based on the Total Spectral Counts in each replicate, or using the spectral counts for the selected proteins used in Control Protein Normalization.
3. Normalization factors are calculated between Biological Samples (designated as Normalization Factor) using Max Replicate SC.
4. The Replicate Normalized spectral counts are then normalized using the Biological Sample Normalization factor. In the example above each replicate normalize
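The Log Rel Expression arithmetic in the worked example above can be reproduced with a few lines of Python. This is a sketch under stated assumptions, not ProteoIQ's code: the logarithm is taken as base 2 (which matches 0.40 for 133 / 101), the standard deviation is the sample standard deviation, and the denominator 101 is copied from the example rather than derived. The function names are illustrative.

```python
import math
import statistics

# Normalized spectral counts per replicate, taken from the worked example.
sample_1 = [133, 150, 110]
sample_2 = [62.5, 75, 75]

# Denominator used in the worked example. How it is derived is not shown
# in this section, so it is hard-coded here (assumption).
REFERENCE = 101


def log2_rel_expression(counts, reference=REFERENCE):
    """Log2 of each replicate count relative to the reference value."""
    return [math.log2(count / reference) for count in counts]


def summarize(counts):
    """Mean and sample standard deviation of the per-replicate log values."""
    values = log2_rel_expression(counts)
    return statistics.mean(values), statistics.stdev(values)


mean_1, sdev_1 = summarize(sample_1)
mean_2, sdev_2 = summarize(sample_2)
print(f"Sample 1: Log Rel Exp {mean_1:.2f}, SDEV {sdev_1:.2f}")
print(f"Sample 2: Log Rel Exp {mean_2:.2f}, SDEV {sdev_2:.2f}")
```

The results agree with the values in the example to within about 0.01; small differences are expected because the example averages already-rounded numbers.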
28. (Screenshot residue: rows of the protein table in the Protein Set Viewer, listing Homo sapiens proteins such as complement factor H isoform b precursor, inter-alpha globulin inhibitor H4, and apolipoprotein A-IV precursor, with their scores, spectral counts, and expression values. The individual values are not reliably recoverable from the extracted text.)

Peptide Table Options in Peptide Set Viewer
Figure 50 shows the Peptide Column selection menu that is accessible by selecting View > Peptide Table Options > Select Columns. Select columns to display by checking the box adjacent to each column name. Table 22 provides descriptions of each data type. Figu
29. A SC Standard Deviation: The standard deviation of spectral counts between replicates within a Biological Sample.
A SC % Standard Deviation: The percent standard deviation of spectral counts between replicates within a Biological Sample.
A SC Log2 Relative Expression: Log2 (Average SC for protein A in Biological Sample X / Sum of the Average SC for protein A in Biological Samples X, Y, Z).
A SC Log2 Relative Expression Std Dev: The standard deviation of the Log2 Expression Values calculated for matched replicates between the Biological Samples.

Quantitation With Normalization (Label Free)
Three methods for normalization are available in ProteoIQ. The following sections discuss each method:
- Normalization by Selected Proteins
- Sample Normalization
- Protein Normalization (NSAF)

Normalization by Selected Proteins
Spectral counts for selected proteins or internal standards can be used to perform normalization across replicates and biological groups. The following steps describe how to perform normalization by selected proteins (Figure 71):
1. Highlight the proteins to use for normalization in the protein set viewer.
2. Right-click the selected protein to open the Pop-Up Menu.
3. Select Normalize SC From Selected Proteins.
4. The Normalization Dialogue Box will appear.
5. Select Spectral Counts of User Specified Control Protein(s) in the Normalization Dialogue Box.
6. The normalization factors for each replicate and biological samp
30. Biological Samples

Protein Size Normalization (NSAF)
To display Protein Normalized Spectral Counts, select View > Protein Table Options > Select Columns. Protein normalized spectral counts, more commonly referred to as normalized spectral abundance factors (NSAF), are calculated as the number of spectral counts for protein x (SpC) divided by the number of amino acids (L) in protein x, divided by the sum of SpC/L for all proteins in the experimental dataset. Spectral count normalization based on protein size and protein set composition has been shown to improve the accuracy of spectral count quantification (see references below). NSAF values have also been applied to study protein complexes and generate protein interaction networks.

References
Zybailov BL, Florens L, Washburn MP. Quantitative shotgun proteomics using a protease with broad specificity and normalized spectral abundance factors. Mol Biosyst 2007; 3:354-360.
Zybailov BL, Mosley AL, Sardiu ME, Coleman MK, Florens L, Washburn MP. Statistical analysis of membrane proteome expression changes in Saccharomyces cerevisiae. J Proteome Res 2006; 5:2339-2347.
Powell DW, Weaver CM, Jennings JL, McAfee KJ, He Y, Weil PA, Link AJ. Cluster analysis of mass spectrometry data reveals a novel component of SAGA. Mol Cell Biol 2005; 24:7249-7259.

Modifying Label Free Quantitation Settings
By default ProteoIQ calculates Log Relative
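Returning to the NSAF definition given under Protein Size Normalization, the calculation is (SpC / L) for one protein divided by the sum of SpC / L over all proteins. The following minimal sketch shows that arithmetic; the protein names, counts, and lengths are invented for illustration, and the function `nsaf` is not a ProteoIQ API.

```python
def nsaf(proteins):
    """Compute NSAF values.

    proteins maps a protein ID to (spectral_count, length_in_amino_acids).
    """
    # Length-normalized spectral counts (SpC / L) for every protein.
    ratios = {pid: spc / length for pid, (spc, length) in proteins.items()}
    total = sum(ratios.values())
    # Divide each ratio by the sum over the whole dataset.
    return {pid: ratio / total for pid, ratio in ratios.items()}


# Hypothetical dataset: (spectral count, protein length in amino acids).
example = {
    "protA": (200, 400),
    "protB": (50, 100),
    "protC": (100, 1000),
}

values = nsaf(example)
for pid, value in values.items():
    print(f"{pid}: NSAF = {value:.4f}")
```

In this made-up dataset protA has four times the spectral counts of protB but also four times the length, so both receive the same NSAF. The values always sum to 1 across the dataset.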
31. Fields: Fragment Charge States, Ion Series, Minimum Intensity Percentile, Fragment Mass Tolerance

Field: Description
Fragment Charge States: Four selections are available: 1, 2, 3, and >3. Selection of a charge state will label ions in the MS/MS spectrum that arise from the selected ion series (y, b, etc.) at the assigned charge state. For example, if a charge state of 2 and an ion series of y are selected, ProteoIQ will label the peaks in the MS/MS spectrum that correlate to the m/z of predicted doubly charged y ions from the identified peptide. By default, charge is considered [M+H]+, [M+2H]2+, and [M+3H]3+ for 1, 2, and 3, etc.
Ion Series: From the peptide sequence, ProteoIQ predicts the m/z of the common fragment ions (y, b, a, c, x, and z). Selecting an ion series labels the MS/MS spectrum if there are ions present of the selected ion series that meet the criteria set in the left panel.
Minimum Intensity Percentile: ProteoIQ generates a distribution of intensities for the fragment ions present in each MS/MS spectrum. The intensities are binned to create a histogram based on the frequency of intensities observed in the MS/MS spectrum. The minimum intensity percentile requires that the peaks in the MS/MS spectrum be above the setting in order to be labeled with an ion series annotation. For example, a minimum intensity percentile of 90 requires that a peak in the MS/MS spectrum has intensity greater than 90% of all ions in the spectrum to be considered for annotation with an ion series la
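The minimum intensity percentile rule described above can be illustrated with a simple rank-based filter. This is a simplified sketch, not ProteoIQ's implementation: the guide describes a binned histogram, whereas the code below ranks the raw intensities directly, and the function name `peaks_to_label` and the intensities are hypothetical.

```python
from bisect import bisect_left


def peaks_to_label(intensities, min_percentile):
    """Return indexes of peaks that are more intense than at least
    min_percentile percent of all peaks in the spectrum."""
    ranked = sorted(intensities)
    total = len(ranked)
    keep = []
    for index, intensity in enumerate(intensities):
        # Number of peaks strictly weaker than this one.
        weaker = bisect_left(ranked, intensity)
        if 100.0 * weaker / total >= min_percentile:
            keep.append(index)
    return keep


# Hypothetical spectrum with ten peaks of increasing intensity.
spectrum = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

print(peaks_to_label(spectrum, 90))
print(peaks_to_label(spectrum, 50))
```

Raising the percentile keeps only the strongest peaks as candidates for ion series labels; a setting of 0 would make every peak a candidate.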
32. (Figure residue: Filter Pane and Advanced Filter Pane. Sections visible in the screenshots: Statistical Filters (Min Peptide Probability, Min Protein Probability, Min Protein Group Probability); Filter Previous Set; Protein Set Name; Peptide Filters (score, score differential, peptide length in AA, spectra per peptide, proteins per peptide, protein groups per peptide, charge states, Include All Spectra for ANY Matching Peptide); Protein Filters; False Discovery Rate Filters (Metric to Use for FDR, Max Peptide FDR, Max Protein FDR, Starting Pep Coverage for ProFDR); Probability Filters (Discriminant Score (F val), Min Peptide Probability, Min Protein Probability, Min Protein Group Probability).)

How Filtering Works
The filtering pane is broken down into three sections:
1. Peptide Filters
2. Protein Filters
3. Statistical Filters

When a filter is applied, the peptide and protein sets are filtered according to the parameters set in the Filter Pane, and a new peptide or protein set is displayed that can be accessed via the Protein Set Navigation Bar. Filters are applied in the following order:
1. If a Peptide False Discovery Rate is assigned, peptides are removed based upon the minimum Xcorr, F value, Ion Score, or Hype
33. (Figure residue: Protein Column Selection Menu. General Information / Protein Description: Protein Group Number, Is Top, Sequence Id, Sequence Name, Protein Length (AA), Protein Weight (kDa). Quality Metrics: Protein Score, Protein Group Probability, Protein Probability, Number of Peptides, Sequence Coverage, Total Spectral Count. Reporter Ion Quantification, Individual Column Options: Pseudo Spectral Count, Pseudo SC Std Deviation, Pseudo SC % Std Deviation, Log2 Relative Expression, Log2 Relative Expression Std Deviation. Normalization Options: No Normalization, Spectral Count Normalization, Normalized Spectral Abundance Factors (NSAF). Data Groups: Total Proteome, Biological Samples.)

Table 23: Protein Column Selection Menu, Data Groups
Field: Description
Total Proteome: Displays the Total Values for each protein identification. Totals are calculated by summing the values across all Biological Groups and Replicates.
Biological Samples: Displays the Quality Metrics and Quantification Data for each Biological Group.

Table 24: Protein Column Selection Menu, General Information
Fields: Protein Group Number, Is Top, Sequence ID, Sequence Name, Protein Length, Protein Weight, Protein Group Probability, Protein Probability, Number of Peptides, % Sequence Coverage, Total Spectral Count
Description: Ranks the Protein Groups based on Total Score TOP in
34. (Figure residue: Modify Database Wizard. Tabs: Select Database, Specify Parse Rules, Test Parse Rules, Specify Hyperlinks. Example values: Database Name "Human GenBank Target"; Database Type "Target"; Description "Homo sapiens GenBank 4 Mar 2008"; Browse Sequence File "Human GenBank Target 1 fasta"; Register Hyperlinks Yes/No; Delete Database.)

There are five sections in the Modify Database Wizard. The following describe the individual sections:
- Describe Database
- Specify Parse Rules
- Test Parse Rules
- Specify Hyperlinks
- Save Database

Describe Database
The Describe Database section contains fields for adding information about the new database. The information is stored when the database is saved and can be accessed through the Database Wizard at any time by selecting the database name in the drop-down menu. Figure 42 shows the Describe Database section of the Modify Database Wizard. Table 18 describes the fields.

Figure 42: Describe Database. (Same screenshot fields as above, with a Save Database button.)

Table 18: Describe Proteome
Field: Description
Select Database: The drop down menu al
35. Fields (continued): Log Relative Expression Std Deviation, No Normalization, Spectral Count Normalization, Protein Size Normalization

Descriptions, in table order:
- Calculated by dividing the Sum of the Total Spectral Counts for a protein in each Replicate by the number of Replicates.
- If ≥2 replicates are assigned, ProteoIQ calculates the standard deviation for each protein identification using the Total Spectral Counts and Average Spectral Count in each replicate for a given Biological Group.
- The percent standard deviation, more commonly referred to as relative standard deviation (RSD), is calculated as (Avg Spectral Count Std Deviation / Average Spectral Count) × 100%.
- Calculated by taking the Log of the Relative Expression Value. For more detailed information see Quantitation Overview (Label Free).
- Log Relative Expression Std Deviation: If ≥2 replicates are assigned, ProteoIQ calculates the standard deviation of the Log Relative Expression Value across all replicates within a Biological Group.
- No Normalization: Calculates Relative Expression and Log Relative Expression without normalization.
- Spectral Count Normalization: Total Spectral Counts are first normalized by comparing the total spectral counts for all proteins identified in each biological group. The normalized Spectral Counts are then used to calculate Relative Expression and Log Relative Expression. For more information see the Sample Normalization topic.
- Protein Size Normalization: Displays the Protein Size Normalized Total Spectral Counts for each Biological Group. For more information on how the Protein
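The average, standard deviation, and relative standard deviation (RSD) described in the table above can be sketched as follows. The replicate counts are illustrative, the function name `replicate_summary` is not part of ProteoIQ, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption.

```python
import statistics


def replicate_summary(spectral_counts):
    """Return (average, standard deviation, RSD in percent) for one protein
    across the replicates of a Biological Group."""
    average = statistics.mean(spectral_counts)
    # A standard deviation is only defined when at least 2 replicates exist.
    if len(spectral_counts) < 2:
        return average, None, None
    std_dev = statistics.stdev(spectral_counts)  # sample std dev (assumption)
    rsd = 100.0 * std_dev / average if average else None
    return average, std_dev, rsd


# Hypothetical spectral counts for one protein in three replicates.
average, std_dev, rsd = replicate_summary([133, 150, 110])
print(f"Average SC: {average:.1f}")
print(f"Std Deviation: {std_dev:.2f}")
print(f"% Std Deviation (RSD): {rsd:.2f}")
```

Because RSD is scaled by the average, it allows the replicate variability of low- and high-abundance proteins to be compared on the same footing.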
36. Parsing
- Select Decoy Files for Parsing
- Select Non-Target Files for Parsing
- Peptide Parsing Options
- Parse Files
- Parse MS Ion Data
- Cluster Peptides
- Peptide/Protein Viewer

Figure 2: New Project Wizard (Label Free Quantitation). Callouts: Wizard provides step-by-step instructions for importing database search results into ProteoIQ; Define parameters for project creation. Fields visible in the screenshot: Proteome Name (example: Three cell comparison); Instrument Description (example: LTQ); Combine all results files (MudPIT); Each result file represents an individual sample; Group files by biological sample; Number of Biological Samples; Number of Biological Replicates for each sample; Spectra were searched against a decoy database; Spectra were searched against a non-target database.

Note: For more information on grouping database search results, please see the Grouping database search results tutorial.

New Project Wizard (Reporter Ion Quantitation)
Figure 3 shows the ProteoIQ New Project Wizard for Re
37. Plots > Comparative Proteomics > Quantitative Expression Heat Map

The Heat Map is similar to standard Heat Maps for microarray analysis, in which the Log2 expression values are color coded based on relative expression. The colors range from red (lowest expression) to green (highest expression). Absolute values (spectral counts, peptides, etc.) can also be selected, and the number of color partitions can be set.

Figure 79: Quantitative Expression Heat Map. Callouts: Select Biological Sample to Display; Select Metric; Choose Log2 or Absolute Value Scale. (The rows in the screenshot list Homo sapiens complement proteins, for example complement component 9 and complement factor H isoform b precursor.)

Table 38: Quantitative Expression Heat Map
Field: Description
Choose One or Mo
38. Project Wizard (Label Free Quantitation)
- New Project Wizard (Reporter Quantitation)
- Modify Database Wizard
- Peptide/Protein Set Viewer
- Statistical Algorithms Overview
- Data File Formats

Navigation Panel
Figure 1 shows the ProteoIQ Navigation panel. When ProteoIQ is launched, the navigation panel is docked to the left side of the screen. The following items can be accessed via the navigation panel:
- New Project Wizard
- Open Project
- Modify Database Wizard
- License Manager
- Help Guide
- Minimize and Close

Figure 1: ProteoIQ Navigation Panel. Callouts: Create a new project and add database search results; Open an existing ProteoIQ project; Add FASTA databases and create hyperlinks; License Manager; Onboard help.

Note: You can open multiple windows simultaneously. For example, an existing project can be viewed while a New Project is being created.

New Project Wizard (Label Free Quantitation)
Figure 2 shows the ProteoIQ New Project Wizard for Label Free Quantitation. The New Project Wizard is used to create a new ProteoIQ project. Step-by-step instructions are provided to assist the user in grouping and importing database search results. Navigating through the New Project Wizard occurs in thirteen steps:
- Describe Project
- Describe Proteome
- Describe Samples
- Choose Database
- Search Result Options
- Select Target Files for
39. Results to Groups and Replicates
Enter Initial Statistical Thresholds for Filtering Peptide and Protein Identifications

What You Need to Create a New Project
Table 9 describes the files required to create a new project in ProteoIQ. For more details on creating new projects in ProteoIQ, see the Database Wizard and Grouping Database Search Results tutorials located at C:\Program Files\ProteoIQ\docs.

Table 9: Files required to create a new ProteoIQ project
File Type: Description (File Extension)
Database Search Result (Mascot): Contains search parameters, peptides, protein assignments, and MS/MS spectra.
Database Search Result (X!Tandem): Contains search parameters, peptides, protein assignments, and MS/MS spectra.
Database Search Result (SEQUEST): SEQUEST creates folders containing peptide files (out), spectral files (dta), and parameters files. ProteoIQ requires all three, or SRF files can also be imported into ProteoIQ. (OUT, DTA, and PARAMS)
FASTA Database: Contains the protein sequences used to perform the database search.

Mascot DAT
Mascot renders results in two formats: its native data format (DAT) and an HTML view. The HTML files should NOT be used in ProteoIQ since they do not contain all of the information needed to perform protein grouping and statistical analysis. The DAT files are typically located in the directory Inetpub\Mascot\data and are organized by a
40. Sample

The y axis displays the Spectral Counts, Pseudo Spectral Counts, or Number of Peptides for the protein, and the x axis displays the Average Spectral Counts or Average Pseudo Spectral Counts for the protein, calculated across all replicates within the selected Biological Group.

Visualizing Quantitative Results
ProteoIQ includes multiple plots for viewing the results of a quantitative proteome. The following sections describe how to visualize protein expression data in ProteoIQ:
- Quantitative Expression Scatter
- Quantitative Expression Heat Map
- Relative Expression Changes Cluster Plot
- Relative Expression Differences Dot

Quantitative Expression Scatter
The quantitative expression scatter plot is used to compare Spectral Counts, Pseudo Spectral Counts, Peptides, or Sequence Coverage between two or more Biological Samples (Figure 78). To open the quantitative expression scatter plot, select Plots > Comparative Proteomics > Quantitative Expression Scatter.

Figure 78: Quantitative Expression Scatter Plot (chart title: Comparison of Protein Expression Across Biological Groups). Callouts: Select Biological Samples to Display; Select Reference; Select Metric (for example, Total Spectral Count); Add Trendline. X-axis label: Average of Patient, Patient2, Patient3 Total Spectral Count (SC). Pati
41. Table Options > Select Columns. Table 33 contains descriptions of the data types specifically associated with normalized spectral count quantitation.

Sample Normalization
In many cases, over-sampling of the proteome of one biological sample relative to another biological sample can occur. In these cases, normalizing the spectral counts for each protein in the under-sampled proteome relative to the over-sampled state can make the quantitation more accurate. ProteoIQ performs this normalization by comparing the total spectral counts for all proteins identified in each biological group and replicate. The normalization factors are calculated such that the total spectral counts for all proteins in each replicate and biological sample are equal. The normalization factors are then applied to the spectral counts for each protein, as shown in Example 2. See Quantitation Overview for a detailed description of how normalization is applied in ProteoIQ.

Example 2: Normalization Factors
Total Spectral Counts for all proteins within one Biological Sample:
Biological Sample 1: 130,000
Biological Sample 2: 100,000
Biological Sample 3: 200,000

Normalization Factors:
Biological Sample 1: 1.53
Biological Sample 2: 2.00
Biological Sample 3: 1.00

Normalization Factors are then applied to individual protein spectral counts.

To view the normalization factors for Sample Normalization, perform the following: Highlight one or more proteins in the protein s
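The normalization factors in Example 2 are consistent with dividing the largest total spectral count by each sample's total. The sketch below assumes that rule; it is inferred from the example numbers, not taken from a stated ProteoIQ formula, and the function names and the spectral count of 40 are illustrative.

```python
# Total spectral counts per biological sample, from Example 2.
totals = {
    "Biological Sample 1": 130_000,
    "Biological Sample 2": 100_000,
    "Biological Sample 3": 200_000,
}


def normalization_factors(sample_totals):
    """Scale every sample up to the most deeply sampled one (assumed rule)."""
    reference = max(sample_totals.values())
    return {name: reference / total for name, total in sample_totals.items()}


def normalize(spectral_count, factor):
    """Apply a sample's factor to one protein's spectral count."""
    return spectral_count * factor


factors = normalization_factors(totals)
for name, factor in factors.items():
    print(f"{name}: factor {factor:.3f}")

# A protein with 40 spectral counts in Biological Sample 2 (hypothetical).
print(normalize(40, factors["Biological Sample 2"]))
```

For Biological Sample 1 the computed factor is 200,000 / 130,000, about 1.538, which the guide lists as 1.53.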
42. an overview of protein expression across the selected biological groups. To view the 2D Gel View, select Plots > Gel Based Proteomics > 2D Gel View.

To display a 2D Gel View:
1. Select the biological groups to display from the Choose One or More Groups for Comparison menu.
2. Choose the metric to display. Note: the size or darkness of the spot is dependent upon the selected value.
3. Click View Chart.

Figure 67: 2D Gel View. Callouts: Choose One or More Groups for Comparison; Select Biological Group; Mouse over to display protein name and information; Double-click on a protein spot to open the Protein Sequence Viewer; Single-click to select the protein in the Protein Set Viewer; View Chart updates selections; Groups are Color Coded (Cell Type 1, Cell Type 2, Cell Type 3). Y-axis label: Molecular Weight (kDa).

Interacting With The 2D Gel View
In the 2D Gel View, the Y axis is the theoretical molecular weight and the X axis is the theoretical isoelectric point. Both values are calculated from the properties of the amino acids in the protein and do not include contributions from detected modifications. The size of the protein spot is directly proportional to the data source value. For example, if Total Spectral Counts is selected, then a protein with 200 spectral counts will have a s
43. are listed under the modification column. Customize the peptide table by selecting Peptide Table Options.

Figure 11: Peptide Set Viewer. Callouts: Select peptides tab to display identified peptides; All columns can be sorted or moved; Double-click on peptide sequence to access MS/MS spectral view; Find peptides with PTMs by sorting the modification column. Columns visible in the screenshot include Mr (expt), Charge, Score, Delta, Prob, Peptide Sequence, Modification, Total Spectra, and Intensity. (The peptide rows in the screenshot are not reliably recoverable from the extracted text.)
44. (Figure residue: Describe Experiment section of the New Project Wizard. Callout: Mass tolerance (Da) for matching and labeling reporter ions in each spectrum. Wizard buttons: Back, Next, Help, Cancel.)

Table 16: Reporter methods supported in ProteoIQ
Reporter Method: Reporter Masses Used for Quantitation
iTRAQ2 (Duplex): 114.1, 117.1
iTRAQ4 (Four-plex): 114.1, 115.1, 116.1, 117.1
iTRAQ8 (Eight-plex): 113.1, 114.1, 115.1, 116.1, 117.1, 118.1, 119.1, 121.1
Tandem Mass Tag (TMT2): 126.0, 127.0
Tandem Mass Tag (TMT6): 126.0, 127.0, 128.0, 129.0, 130.0, 131.0

Table 17: Fields in the Describe Experiment Section
Fields: Choose Report Method, Experimental Multiplicity, Number of Replicates, Replication Method, Mass Tolerance (Da)

Field: Description
Choose Report Method: Select the reporter ion used for your experiment: iTRAQ, TMT, or Custom. Note: the custom report ion selection allows the user to input data from any report ion, whether commercial or in-house, that was utilized in an experiment. Neither the mass nor the number of ions is limited.
Experimental Multiplicity: Multiplicity indicates that multiple reporter ion experiments were performed using a single label, with a greater number of samples than the number of reporter ions utilized. Example: a researcher wants to analyze three cell types but only has an iTRAQ2 reagent. Multiplicity allows him/her to perform two MS experiments. Experiment o
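To make the role of the Mass Tolerance (Da) field concrete, the following sketch matches peaks in a spectrum to the reporter masses from Table 16. It is an illustration under assumptions: taking the most intense peak inside the tolerance window is a choice made for this example, not a documented ProteoIQ behavior, and the spectrum, the tolerance value, and the function name are hypothetical.

```python
# Reporter masses copied from Table 16 (two methods shown).
REPORTER_MASSES = {
    "iTRAQ4": [114.1, 115.1, 116.1, 117.1],
    "TMT2": [126.0, 127.0],
}


def match_reporter_ions(peaks, masses, tolerance_da):
    """Match (m/z, intensity) peaks to reporter masses.

    Returns {reporter_mass: intensity}, using the most intense peak within
    +/- tolerance_da of each reporter mass, or 0.0 when none is found.
    """
    matched = {}
    for mass in masses:
        in_window = [
            intensity for mz, intensity in peaks if abs(mz - mass) <= tolerance_da
        ]
        matched[mass] = max(in_window) if in_window else 0.0
    return matched


# Hypothetical MS/MS peak list: (m/z, intensity).
spectrum = [
    (114.11, 1200.0),
    (115.09, 800.0),
    (116.30, 50.0),   # too far from 116.1 for a 0.05 Da tolerance
    (117.12, 400.0),
    (175.12, 900.0),  # ordinary fragment ion, not a reporter
]

result = match_reporter_ions(spectrum, REPORTER_MASSES["iTRAQ4"], tolerance_da=0.05)
for mass, intensity in result.items():
    print(f"reporter {mass}: intensity {intensity}")
```

A tolerance that is too tight can miss real reporter peaks, while one that is too wide can pick up neighboring fragment ions, so the value should reflect the mass accuracy of the instrument.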
45. by selecting Add.
3. Select the appropriate reporter ion mass in the Ions Assigned to Groups viewer.
4. Select the Biological Group from the List of Samples drop-down menu.
5. Click Assign Ion to Sample. The assigned reporter ion masses will then appear as tabbed under the Biological Group name in the Ions Assigned to Groups viewer.

At any time, the reporter ions may be removed or ungrouped by highlighting either the reporter ion mass or group name and selecting the remove button.

Select Files for Parsing
The Select Files for Parsing window is used to add Target, Decoy, or Non-Target database search results and assign them to Replicates and Plex (Figure 39). Please see the Grouping Database Search Results tutorial for a more detailed description of this process. Result File Type allows you to select the database search results format to be grouped.

Figure 39: Select Files for Parsing. Callouts: Database search results are assigned to Plex and/or Replicate; Add selected files. Panels visible in the screenshot: Available Results Files; Files Assigned to External Replicates. Wizard sections listed in the screenshot: Describe Project, Describe Experiment, Describe Correction Factor, Map Ions to Groups, Choose Database(s), Search Result Options, Select Target Files For Parsing, Select Decoy Files For Parsing, Select NonTarget Files For Parsing, Select Label Modification.
46. name to the new parse rule in the Description field.
- Enter Protein ID parse rule.
- Enter Protein Name parse rule.
- Select Next to test the rule.
- Save database to permanently store the new parse rule.

Table 19: Specify Parse Rules
Field: Description
Select Existing: Existing database parse rules can be accessed via the drop-down menu, or a new set of parse rules can be created.
Description: A description of the parse rule can be provided in the description text box. Text entered here will be stored with the parse rule once saved.
Protein ID Parse Rule: The regular expression that defines how to parse an accession string or protein ID from the FASTA header line.
Protein Name Parse Rule: The regular expression that defines how to parse the protein name and description string from the FASTA header line.
Decoy Identifier Rule: For use with concatenated databases ONLY. When creating a concatenated target and decoy database, the user must add a unique identifier to the decoy sequences. One example would be to add the term Rev to all protein names for the decoy sequences in the FASTA database. If Rev is entered in the Decoy Identifier Rule field, then all peptides matching proteins with the name Rev will be treated as decoy matches for the FDR calculations. For more information on creating concatenated target decoy databases, see the Database Wizard Tutorial.
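The parse rules in Table 19 are regular expressions applied to FASTA header lines. The sketch below shows what such rules might look like for a simple header layout. Everything in it is an assumption made for illustration: the header format, the two patterns, and the choice to look for the decoy identifier in the protein ID. The regular expression dialect that ProteoIQ itself expects is not stated in this section, so these patterns may need adapting before being entered in the wizard.

```python
import re

# Hypothetical rules for headers of the form ">ACCESSION description".
PROTEIN_ID_RULE = re.compile(r"^>(\S+)")
PROTEIN_NAME_RULE = re.compile(r"^>\S+\s+(.*)$")
DECOY_IDENTIFIER = "Rev"  # term added to decoy entries, as in Table 19


def parse_header(header):
    """Return (protein_id, protein_name, is_decoy) for one FASTA header."""
    protein_id = PROTEIN_ID_RULE.match(header).group(1)
    name_match = PROTEIN_NAME_RULE.match(header)
    protein_name = name_match.group(1) if name_match else ""
    # Assumption: the decoy term is searched for in the protein ID only.
    is_decoy = DECOY_IDENTIFIER in protein_id
    return protein_id, protein_name, is_decoy


target = ">NP_000604.1 hemopexin [Homo sapiens]"
decoy = ">Rev_NP_000604.1 hemopexin [Homo sapiens]"

print(parse_header(target))
print(parse_header(decoy))
```

Testing a rule against a handful of real header lines before saving it, as the Test Parse Rules step does, helps catch patterns that silently capture the wrong field.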
47. only found in protein set 1, and protein set 4 will have proteins only found in protein set 2.

Customizing Results Views
The peptide and protein tables in the Peptide and Protein Set Viewer can be modified to adjust the view. The following topics discuss how to customize the results views:
- Moving and Sorting Columns
- Peptide Table Options in the Peptide Set Viewer
- Protein Table Options in the Protein Set Viewer
- Customize Protein Table in Protein Sequence Viewer
- Customize Peptide Table in Protein Sequence Viewer
- Modifying Plot Views
- Selecting Proteins in View

Moving and Sorting Columns
Columns in both the Peptide and Protein Set Viewer can be modified using three options (Figure 49):
1. Rearrange columns by dragging the column header.
2. Sort results by clicking on the column header.
3. Select View > Peptide Table Options or View > Protein Table Options to set which columns and rows are displayed.

Figure 49: Protein Set Viewer Customization. Callouts: Select View > Protein Table Options; Select proteins tab to display identified proteins; All columns can be sorted or moved; Drag columns to change view. Menu items visible in the screenshot: View, Plots, Help; Basic Filtering Pane; Advanced Filtering Pane; Peptide Table Options; Protein Table Options.
48. score

Descriptions, in table order:
- The number of non-redundant peptides matching a given protein.
- Calculated as the sum of the number of amino acids in all identified peptides, divided by the total number of amino acids in the protein, multiplied by one hundred.
- Number of MS/MS sequencing events for all peptides assigned to a given protein. Also known as spectral counts.
- Filters protein identifications based on their identification in a percentage of replicates. For example, if 4 replicates were performed and a value of 75% of replicates was applied, then the protein would have to be identified in 3 out of 4 replicates to be included in the final protein set.
- This function allows the user to perform a text search of the entire protein dataset. Proteins with names that contain the user-specified text string will be rendered as a new protein set.
- Filter produces a protein set containing proteins identified in the selected biological samples. Two options are available. If In is selected, the protein set will contain proteins identified in the selected sample. If Not In is selected, the created protein set will contain proteins identified in the biological sample that was not selected.

Field: Description
Metric to use for FDR: The drop-down menu allows the user to select the type of peptide scoring function to utilize when ProteoIQ calculates the protein false discovery rate. Three choices are available: 1. Discriminant score (Fval) 2. Score for Mascot Ion Sco
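Two of the calculations described above, sequence coverage and the replicate percentage filter, reduce to short formulas. The sketch below follows the wording of the descriptions literally; the function names, peptide sequences, and protein length are hypothetical, and summing peptide lengths assumes the identified peptides do not overlap.

```python
def sequence_coverage(peptides, protein_length):
    """Percent coverage: total residues in identified peptides divided by
    the protein length, times 100 (assumes non-overlapping peptides)."""
    covered = sum(len(peptide) for peptide in peptides)
    return 100.0 * covered / protein_length


def passes_replicate_filter(replicates_identified, total_replicates, min_percent):
    """True if the protein was identified in at least min_percent percent
    of the replicates."""
    return 100.0 * replicates_identified / total_replicates >= min_percent


# Hypothetical protein of 100 residues with two identified peptides.
print(sequence_coverage(["PEPTIDE", "ABC"], 100))

# The 75% example from the text: 3 of 4 replicates passes, 2 of 4 does not.
print(passes_replicate_filter(3, 4, 75))
print(passes_replicate_filter(2, 4, 75))
```

If identified peptides can overlap, a stricter implementation would count each covered residue position once instead of summing peptide lengths.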
49. [Screenshot residue: clustering progress log with repeated lines of the form "Clustering gene N of 34966".]

BIOINQUIRE PROTEOIQ

Note: For large databases the clustering process will take some time. In these cases we recommend running the parsing and clustering steps in auto mode by selecting Auto Generate Initial Protein Set.

Creating a New Project: Reporter Ion Quantitation

To create a new project in ProteoIQ, first select the New Project Wizard from the Navigation Pane as shown in Figure 34, then select Reporter Ion Quantitation.

Figure 34. New Project Wizard. [Screenshot callout: select New Project Wizard to create a new project. Dialog text: "Please choose the quantitation method for this proteome", with options Label free (Spectral Counting) and Reporter Quantitation (iTRAQ, TMT).]

There are seventeen sections in the New Project Wizard. The following describe the individual sections in the New Project Wizard:
- Describe Project
- Describe Proteome
- Describe Experiment
- Describe Correction Factor
- Describe Samples
- Map Ions to Groups
- Choose Database
- Search Results Options
- Select Files for Parsing
- Select Label Modification
- Peptide Pa
50. [Figure residue: rows of the peptide table from the Peptide Set Viewer (peptide sequences with modifications, masses, charge states, scores, and probabilities); the individual values are not recoverable.]

One peptide or multiple peptides can be selected by a single right click on the row of interest (Figure 12). Right-clicking on the selected peptides allows the highlighted peptides to be extracted to a new protein set that is accessed via the Protein Set Navigation Bar. Select Matching Proteins highlights proteins containing the selected peptides in the Protein Set
51. [Figure residue: annotated MS/MS spectrum, x-axis Mass (m/z) from 200 to 1,800. Peptide: FIOVGVISWGWDVCK, 57.0215 (C15); Precursor Mass: 903.4883. Callouts: annotate spectra with neutral losses; this table can be customized.]

Peptide Fragmentation Table

The bottom panel contains a table with the predicted m/z of the selected ion series. The color codes indicate that the specified ion series was detected in the selected MS/MS spectrum.

Protein Set Parameters Field and Output Notes

Since multiple Protein Sets can be created in ProteoIQ, the parameters and protein set information are reported in the Protein Set Parameters and Output Notes panes (Figure 14). Both are located at the bottom of the Protein/Peptide Set Viewer. Custom text may be entered into these text fields to help keep track of important information related to each Protein Set.

Figure 14. Protein Set Parameters and Output Notes. [Screenshot callout: all parameters used to create a protein set are recorded and displayed in the Protein Set Parameters pane. Output and Notes pane text: Peptide Filter Settings: Min Pep Score 2.0; Peptides with modification(s) 57.0215; Min Peptide Probability 0.9. Protein Filter Settings: Biclo
52. Biological Sample 1 (proteinX Log = 0.5) is selected as Y Axis and Biological Sample 2 (proteinX Log = 0.5) is selected as Reference Group, the Log Expression Difference will be calculated as follows: 1.0 - 0.5 = 0.5. When more than one Biological Sample is selected as the reference (X axis), the Data Source is averaged for each protein between the selected Biological Samples prior to the Log calculation being performed.

Expression Level Data Source | The Data Source specifies which type of data to use for the Log relative expression calculation. Options are Non-Normalized Average Spectral Counts or Normalized Spectral Counts. Note: for reporter quantitation the Data Source field is deactivated and expression levels are calculated as described in Quantitation Overview: Reporter Ion Quantitation.
Order Protein by Value | Select either Descending or Ascending to modify the order of the Log2 Relative Expression Difference values.
Show Proteins in All Groups | Hides proteins that are NOT present in all the selected Biological Samples.
Shade by Expression Value | Sets the color intensity for the data points based on the metric chosen in the Choose Data Source drop-down menu and the Min Significance Level setting. For example, if Normalized Spectral Counts is selected and the Min Significance Level is set to 50, the spots colored dark blue have normalized spectral counts exceeding 50 and the lighter colored spots are proteins with fewer than 50 spectral counts.

Other Topics
- Protein Grouping
- Source
- Redundancy and Multiple
53. Charge States
- Discriminant Score (F-value)

Protein Grouping

Correlating peptides to proteins in a large-scale proteomics project is not a straightforward process, because protein databases contain many homologous proteins which are often indistinguishable by single peptide identification. To account for this, it has become readily accepted that peptide identifications should be distributed into protein groups. A protein group represents a set of peptides used to identify a group of homologous proteins. In the example below, five peptides were identified:

APHLRK (peptide A)
EIEQLFEHHHQQK (peptide B)
CGYKQPTPVQR (peptide C)
HGVSPARER (peptide D)
LSTVHECPSHK (peptide E)

Peptides A, B, C, and D all map to protein 1. However, peptides C and D are also observed on protein 2; therefore proteins 1 and 2 are grouped as PROTEIN GROUP 1, with protein 1 being listed as TOP since it is able to explain the presence of all four peptides, whereas protein 2 can only account for two. The identification of peptide E allows a unique PROTEIN GROUP 2 to be formed, because peptide E is not observed in the sequence of protein 1 or 2.

[Figure residue: diagram of PROTEIN GROUP 1 and PROTEIN GROUP 2, the latter containing LSTVHECPSHK.]

For a more detailed discussion on the formation of protein groups and their importance in the calculation of false discovery rates or probabilities, see the following references
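As an illustration of the five-peptide grouping example in the Protein Grouping topic, the sketch below groups proteins that share at least one identified peptide and marks as TOP the protein that explains the most peptides. It is a minimal sketch under stated assumptions, not ProteoIQ's actual grouping algorithm: the "any shared peptide" rule, the function name, and the name "protein 3" (a stand-in for whichever protein contains peptide E) are all hypothetical.

```python
def group_proteins(protein_to_peptides):
    """Group proteins that share peptides; report the top protein per group.

    Assumption: two proteins belong to the same group if they share at
    least one identified peptide (directly or through another protein).
    """
    groups = []  # list of (set of proteins, set of peptides)
    for protein, peptides in protein_to_peptides.items():
        merged_proteins = {protein}
        merged_peptides = set(peptides)
        untouched = []
        for proteins_in_group, peptides_in_group in groups:
            if peptides_in_group & merged_peptides:
                # Shared peptide: fold the existing group into the new one.
                merged_proteins |= proteins_in_group
                merged_peptides |= peptides_in_group
            else:
                untouched.append((proteins_in_group, peptides_in_group))
        groups = untouched + [(merged_proteins, merged_peptides)]

    result = []
    for proteins_in_group, peptides_in_group in groups:
        # TOP = the protein that accounts for the most peptides.
        top = max(sorted(proteins_in_group),
                  key=lambda p: len(set(protein_to_peptides[p])))
        result.append({
            "top": top,
            "proteins": sorted(proteins_in_group),
            "peptides": sorted(peptides_in_group),
        })
    return result


if __name__ == "__main__":
    example = {
        "protein 1": ["A", "B", "C", "D"],
        "protein 2": ["C", "D"],
        "protein 3": ["E"],  # hypothetical name for the protein holding peptide E
    }
    for group in group_proteins(example):
        print(group)
```

For the example input, proteins 1 and 2 end up in one group with protein 1 as TOP, and the protein carrying peptide E forms its own group.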
54. [Figure residue: file assignment step of the New Project Wizard. Callouts: choose Plex and Replicate; assign files to Plex and Replicate; browse for database search files. Result File Type shown: Mascot (dat).]

Grouping database search results in ProteoIQ is performed as follows:
1. Select the search results format. Note: multiple search result formats can be imported into ProteoIQ within the same project.
2. Browse database search results. The name of the peak list used to produce the database search result is rendered in the Available Results Files viewer.
3. Select the files for grouping.
4. Once selected, the files appear as highlighted. Move the highlighted files to the Files Assigned to External Replicates viewer by selecting Add.
5. Select the appropriate result file(s) in the Files Assigned to External Replicates viewer.
6. Select the Replicate and Plex from the List of Samples drop-down menu.
7. Click Assign Files to Sample.

Manually group search result files to source: if selected, allows source files with different names to be grouped together as if from the same source MS/MS file.

Select Label Modification

The Select Label Modification window allows the user to select which modification is associated with the reporter ion being used for qu
55. Expression Values by comparing the individual spectral counts from each biological group relative to the average spectral counts across all biological groups. The Modify Label Free Quantitation Settings dialog box allows the user to select the biological group to be used as the reference group for quantitation. For example, in Figure 69 there are three biological groups (Patient 1-3). As described in the Quantitation Overview: Label Free section, the reference group is by default treated as the average spectral count for protein X across all patients. If patient one is selected as the reference group, then the spectral counts for protein X in patient one are treated as the reference and all Log ratios are recalculated and displayed as a new protein set. The following steps describe how to modify label free quantitation settings (Figure 72):

1. Select Edit > Modify Label Free Quant Settings.
2. Select the biological group to act as the reference value.
3. Click Apply.
4. To recalculate Log expression values using the selected biological group as a reference, you MUST CREATE A NEW PROTEIN SET.
5. To create a new protein set, either:
   a. Highlight all of the proteins, right click, and select Copy to New Protein Set, or
   b. Select the protein set in the Filter Previous Set drop-down menu and click Create New Protein Set.

Figure 72. Modify Label Free Quant Settings Dialogue Box. [Dialog title: Normalization Method. Dialog text: "Please choose one or more groups for the reference"]
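The effect of choosing a reference group can be sketched as follows. This is a hedged illustration only: the guide does not give the exact formula in this section, so the sketch assumes the Log value is log2 of (group count / reference count), where the reference is the mean across all groups by default or the mean of the selected reference group(s). The function name and the example counts are hypothetical.

```python
import math


def log2_relative_expression(avg_counts, reference_groups=None):
    """Return log2(count / reference) for one protein in each group.

    avg_counts: dict mapping biological group -> average spectral count.
    reference_groups: groups whose mean is the reference; if omitted,
    the mean across all groups is used (assumed default behaviour).
    """
    groups = list(reference_groups) if reference_groups else list(avg_counts)
    reference = sum(avg_counts[g] for g in groups) / len(groups)
    if reference <= 0:
        raise ValueError("reference spectral count must be positive")
    # A zero count has no defined log ratio; report None in that case.
    return {
        group: (math.log2(count / reference) if count > 0 else None)
        for group, count in avg_counts.items()
    }


if __name__ == "__main__":
    protein_x = {"Patient 1": 10.0, "Patient 2": 20.0, "Patient 3": 30.0}
    print(log2_relative_expression(protein_x))                 # reference = mean of all groups
    print(log2_relative_expression(protein_x, ["Patient 1"]))  # reference = Patient 1
```

Switching the reference changes every Log ratio, which is consistent with the guide's requirement to create a new protein set after applying a new reference group.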
56. Menu

From the View Menu (Figure 17), the Filter Panes and the Peptide and Protein Table Options are accessible. Table 3 lists the menu items available in the View Menu.

Figure 17. View Menu. [Screenshot: View menu with Basic Filtering Pane, Intermediate Filtering Pane, Advanced Filtering Pane, Peptide Table Options, and Protein Table Options; the table options open submenus with All Results, Top Scoring Results, and Select Columns.]

Table 3. View Menu Items

Menu Item | About | Access
Basic Filtering Pane | Toggle Basic Filtering Pane | View > Basic Filtering Pane
Intermediate Filtering Pane | Toggle Intermediate Filtering Pane | View > Intermediate Filtering Pane
Advanced Filtering Pane | Toggle Advanced Filtering Pane | View > Advanced Filtering Pane
Peptide Table Options | Adjust display options in the Peptide Set Viewer | View > Peptide Table Options
Protein Table Options | Adjust display options in the Protein Set Viewer | View > Protein Table Options

Plots Menu

Graphical displays are available in the Plots Menu (Figure 18). Table 4 lists the menu items available in the Plots Menu.

Figure 18. Plots Menu. [Screenshot: Plots menu with Proteome Validation, Experimental Reproducibility, Gel-based Proteomics, and Comparative Proteomics submenus.]

Table 4. Plots Menu Items

Menu Item: Peptide FDR; Protein FDR; Peptide Probability; Quantitation Sampling; Correlation Scatter; Replicate Analysis Scat
57. New Hyperlinks tutorial.

Figure 45. Specify Hyperlinks. [Screenshot of ProteoIQ Database Maintenance (steps: Describe Database, Specify Parse Rules, Test Parse Rules, Specify Hyperlinks, Save Database). Under "Add new hyperlink if desired": an Address field (enter the entire URL and type $id where the protein ID should go, e.g. http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=$id), a Hyperlink Name field (NCBI Protein Entrez), and Test and Add buttons. Under "Add existing hyperlinks if desired": Existing Hyperlinks and Selected Hyperlinks lists, including UniProtKB, IPI (International Protein Index), GeneDB, and NCBI Protein Entrez.]

Steps to add an existing hyperlink:
1. Select the hyperlink name in the Existing Hyperlinks field.
2. Select Add.
3. Click Next.

Note: For a detailed tutorial on creating new hyperlinks in ProteoIQ, please read the Create New Hyperlinks tutorial. Tutorials are located in the C:\Program Files\ProteoIQ\docs path.

Using the Filter Pane

The filter pane is designed to allow the user to select a set of parameters for filtering a peptide and protein set. Topics in this section are:
- Selecting Filter Panes
- How Filtering Works
- Selecting Protein Sets to Filter
- Description of Filter Parameters

Selecting Filter Panes

There are three f
58. Normalization is performed, see the Protein Normalization topic.

Table 26. Protein Column Selection Menu: Reporter Ion Quantification

Field | Description
Pseudo Spectral Count | Calculated by normalizing the reporter ion intensities for each peptide, then summing the normalized reporter ion intensities for all peptides matching a protein. For detailed information on how this calculation is performed, see topic Pseudo Spectral Count Quantitation.
Pseudo Spectral Count Std Deviation | If ≥2 replicates are assigned, ProteoIQ calculates the standard deviation for each protein identification using the Pseudo Spectral Counts and Average Pseudo Spectral Count in each replicate for a given Biological Group.
Pseudo Spectral Count %Std Deviation | The percent standard deviation, more commonly referred to as relative standard deviation (RSD), is calculated as (Avg Pseudo Spectral Count Std Deviation / Average Pseudo Spectral Count) × 100%.
Log Relative Expression | Calculated by taking the Log of the Relative Expression Value. For more detailed information see Quantitation Overview: Reporter Ion.
Log Relative Expression Std Deviation | If ≥2 replicates are assigned, ProteoIQ calculates the standard deviation of the Log Relative Expression Value across all replicates within a Biological Group.
No Normalization | Calculates Relative Expression and Log Relative Expression without normalization using the Pseudo Spectral Count.
Spectral Count | Total Pseudo S
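The standard deviation and percent standard deviation (RSD) columns of Table 26 can be illustrated with a short Python sketch. It assumes the sample standard deviation (n - 1 denominator), which this guide does not specify, and the function name and replicate values are hypothetical.

```python
import statistics


def pseudo_spectral_count_stats(replicate_counts):
    """Return (average, std deviation, percent RSD) across replicates.

    Assumption: sample standard deviation (n - 1); the guide does not say
    whether the sample or population form is used.
    """
    if len(replicate_counts) < 2:
        raise ValueError("at least 2 replicates are required")
    average = statistics.mean(replicate_counts)
    if average == 0:
        raise ValueError("average count is zero; RSD is undefined")
    std_dev = statistics.stdev(replicate_counts)
    percent_rsd = 100.0 * std_dev / average
    return average, std_dev, percent_rsd


if __name__ == "__main__":
    # Hypothetical pseudo spectral counts for one protein in three replicates.
    print(pseudo_spectral_count_stats([10.0, 12.0, 14.0]))
```

The check for at least two replicates mirrors the table's condition that the deviation columns are only calculated when ≥2 replicates are assigned.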
59. PROTEOIQ: BEYOND PROTEIN IDENTIFICATIONS, by BIOINQUIRE

Table of Contents
Getting Started
Tutorials and Quick Start Guide
System Requirements
Installing ProteoIQ
Help files and sample data
Licensing
Contact Information
[illegible entry]
Navigation Panel
New Project Wizard: Label Free Quantitation
New Project Wizard: Reporter Ion Quantitation
Modify Database Wizard
Peptide Protein Viewer
60. Peptides

Search Results Options

Select how peptides are used to assign proteins (Figure 29). Table 12 contains the options for selection in this section.

Figure 29. Search Results Options. [Dialog text: "Mascot and Sequest allow you to set the number of protein IDs to show for every peptide match. If you set this to the maximum AND you are using the same sequence database, it is probably unnecessary to re-cluster the parsed peptides to the protein sequences." Options: Believe the protein IDs in the search results files; Cluster parsed peptides to my sequence database.]

Table 12. Selections in Search Results Options

Field | Description
Believe the protein IDs in the search results files | Reports ONLY proteins assigned to peptides in the database search result. If this is selected, peptides are not mapped to the FASTA database. For more information see Believe The Protein IDs in the Search Results Files.
Cluster parsed peptides to my protein database | By selecting this option, ProteoIQ will align all peptide matches to the protein sequences within the specified database. This is the recommended option for most proteom
61. Score Filters

ProteoIQ sums the Peptide Scores for all peptides matching to a protein to create a Total Score. Two methods can be used to filter proteins by Total Score, as shown in Figure 63.

Figure 63. Protein Score Filters. [Screenshot callouts: sort by Total Score; set Total Protein Score filters; copy selected proteins to a new protein set. Filter Pane fields shown: Min Spectra, Min Peptides, Min Sequence Coverage, Min Total Protein Score, Min % of Replicates. Right-click menu: Copy Selected (Ctrl+C), Copy Selected to New Protein Set, View Protein Info.]

Validating protein identifications by Total Score in the Filter Pane:
1. Enter Min Total Protein Score in the Filter Pane.
2. Select Create New Protein Set.

Validating protein identifications by Total Score in the Protein Set Viewer:
1. Sort proteins by score by selecting the Total Score column header.
2. Highlight to select proteins of interest.
3. Right click and select Copy Selected to New Protein Set.

Protein Probability

ProteoIQ calculates probabilities that estimate the likelihood that the protein assignment is correct, based on the peptide probabilities for the identified peptides mapping to the protein. Probability thresholds can be applied using the Filter Pane or b
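The Total Score filter described under Score Filters can be sketched as follows: peptide scores are summed per protein, and proteins below a minimum total are dropped. This is an illustrative sketch, not ProteoIQ code; the function names, the protein names, and the scores are hypothetical.

```python
def total_protein_score(peptide_scores):
    """Total Score for a protein: the sum of its peptide scores."""
    return sum(peptide_scores)


def filter_by_total_score(protein_to_peptide_scores, min_total_score):
    """Keep proteins whose Total Score is at least min_total_score."""
    kept = {}
    for protein, scores in protein_to_peptide_scores.items():
        total = total_protein_score(scores)
        if total >= min_total_score:
            kept[protein] = total
    return kept


if __name__ == "__main__":
    # Hypothetical peptide scores for three proteins.
    proteins = {
        "protein 1": [45.2, 38.0, 51.3],
        "protein 2": [12.5],
        "protein 3": [30.0, 22.0],
    }
    print(filter_by_total_score(proteins, min_total_score=50.0))
```

This corresponds to entering a Min Total Protein Score in the Filter Pane; the surviving proteins would make up the new protein set.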
62. The Relative Expression Changes Cluster Dot Plot is used to compare Log Relative Expression Values between two or more Biological Samples (Figure 80). To open the Relative Expression Changes Cluster Dot Plot, select Plots > Comparative Proteomics > Relative Expression Changes Cluster Dot. The Cluster Dot Plot provides an overview of expression for all proteins from the selected Biological Samples. Biological Samples are listed on the X axis and the Y axis displays Log Relative Expression.

Figure 80. Relative Expression Changes Cluster Dot Plot. [Screenshot callouts: select Biological Samples to display; select metric; add connections; replicate data displayed for selected protein. Plot title: Relative Expression of Proteins Across Biological Groups.]

Table 39. Parameters in the Relative Expression Changes Cluster Dot Plot

Field | Description
Choose One or More Groups for Comparison | Selects Biological Samples to display. When a Biological Sample is selected, the Log Expression for proteins using either Non-Normalized Average Spectral Counts or Normalized Spectral Counts will be displayed as data points, with the selected Data Source on the Y axis.
Select Groups to use as Reference | Determines the Biological Samples that act as the reference for the Log2 Relative Expression calculations. Fo
63. [illegible entry]
Peptide False Discovery Rates
Validating MS/MS Spectra
Spectral Viewer Overview
Adjusting Peak Labeling
Mapping Neutral Losses
Validating Reporter Ion Quantitation
Protein Identification
Protein Score Filters
Protein Probability
Protein False Discovery Rate
Whole Proteome Visualization
[illegible entry]
Venn Diagram (Group Summary Venn)
Traditional Venn Diagram Labeling
Strict Venn Diagram Labeling
[illegible entry]
[illegible entry]
64. Workflow
Pseudo Spectral Counts Without Normalization
Pseudo Spectral Counts With Normalization
Analyzing Replicate Results
[illegible entry]
Replicate Plot
Visualizing Quantitative Results
Quantitative Expression Scatter
Quantitative Expression Heat Map
Relative Expression Changes Cluster Dot
Relative Expression Differences Dot
Other Topics
Protein Grouping
Source
Redundancy and Multiple Charge States
Discriminant Score (F-value)
Legal Notices
[illegible entry]
65. [Figure residue: Protein Sequence Viewer screenshot for heat shock protein 8 (Mus musculus); Protein Length (AA): 646; Protein Weight (kDa): 70.8092. Callouts: protein sequence with identified peptides displayed in red; link out to biological databases (NCBI Protein Entrez); identified peptides are displayed for each protein. The peptide rows show sequences such as GPAVGIDLGTTYSCVGVFQHGK with Carbamidomethyl (C14).]

The lower table provides information about each peptide identification. Double-clicking on the peptide sequence will access the MS/MS Spectral Viewer. Right clicking on th
66. [Figure residue: project folder listing showing the MSIonsData folder, the project piq file (validated Mouse Cell Proteome), and related project files (msHeader, normUsedGenes, pepToGene, proteinSet, targetPeptide). Callouts: select the piq file to open the project; related project files.]

Creating a New Project: Overview

Figure 23 shows the process taken to create a new ProteoIQ Project. The New Project Wizard and Modify Database Wizard provide step-by-step instructions for creating a new project.

Figure 23. The New Project Process for Label Free Quantitation (A) and Reporter Ion Quantitation (B). [Flowchart residue; the step labels below are recovered from the flattened figure, and their exact order is uncertain.]

A. Label Free Quantitation: Create Project Name and Description; Enter Number of Biological Groups and Replicates; Name Groups and Replicates; Select or Modify FASTA Database; Choose Search Result Format; Assign Database Search Results to Groups and Replicates; Enter Initial Statistical Thresholds for Filtering Peptide and Protein Identifications.

B. Reporter Ion Quantitation: Create Project Name and Description; Select Reporter Quantitation Method; Specify Replicates or Multiplicity; Apply Correction Factors; Name Groups and Replicates; Match Reporter Ions to Groups and Replicates; Select or Modify FASTA Database; Choose Search Result Format; Assign Database Search
67. antitation (i.e., in the case of iTRAQ the modification would be 144.1). To select the modification, simply highlight the correct modification mass and name, then select Next.

Figure 40. Select Label Modification. [Dialog text: "Please select the modification associated with the labeling reagent in this experiment. Note: if possible, ProteoIQ auto-selects the most likely modification." Columns: Modification Name, Modification Mass; masses shown: 144.098068, 15.994904, 0.984009. Entry fields: enter the name of modification; enter the mass of modification.]

Note: If the correct mass/name is not displayed, the user can enter the appropriate mass in the "enter the mass of modification" field.

Modify Database Wizard

To load or modify a FASTA database in ProteoIQ, first select the Modify Database Wizard from the Navigation Pane as shown in Figure 41. For a detailed description of the creation of new databases, please see the Database Wizard and Creating New Hyperlinks Tutorials.

Figure 41. Accessing the Modify Database Wizard. [Screenshot callout: add or modify FASTA databases and create hyperlinks.]
68. at appear in more than one protein.

Protein Groups per Peptide | Specifies the number of protein groups a peptide can match to. For proteomes with many homologous proteins, a single peptide can match to many protein groups. This means that the peptide is not unique to a single protein group. If the user restricts the number of protein groups per peptide to 1, they in effect display only unique peptides. In other words, only peptides that are uniquely assigned to a single protein group are displayed.
Charge state | If the filter by charge state is selected, ProteoIQ removes all peptides that were not identified with the specified charge states. Note: Filtering by charge state only requires that at least one occurrence of the peptide identification is a result of the assigned charge state. All other peptide identifications will also be displayed as long as one of them has the specified charge state.
Enzyme Specificity | Selecting enzyme specificity limits the displayed peptides to those identified in the database search results from which the given enzyme cleavage rules were applied.

Field | Description
Modification Name | Displays peptides identified with the user-selected modification.
DB Search Engine | Displays peptides identified from the selected DB search engine.
Remaining fields: Total Protein Score, Peptides, % Sequence Coverage, Spectra, % of Replicates, Gene Name Contains, Biological Sample.
Total Protein Score | Filters proteins based on total
69. atabase: 44
Describe Proteome: 43
Describe Samples: 43, 56
Description: 61, 63
Difference: 74
discriminant score: 141
Discriminant score: 72
Discriminate score: 71
Each result file represents an individual sample: 42
Enzyme specificity: 69
Export: 23, 30, 85, 86
Expression Ratios: 17
Extract ion data for peptides id'd from a single spectrum: 48, 61, 63, 68, 69, 70, 71, 72, 74, 76, 77, 78, 79, 80, 81, 82, 86, 87, 88, 96, 97, 105, 106, 107, 115, 118, 125, 126, 130, 134, 136
F value: 141
false discovery rate: 29, 67, 71, 72
False Discovery Rate: 29, 48, 89, 94, 100, 103, 104
False Discovery Rates: 42
FASTA Database: 37, 38
Fragment Ion Type: 21
Fragment mass tolerance: 87, 96
Fval: 71
Gene name contains: 70
Hyperlink: 65
Installing ProteoIQ: 8
intensity percentile: 87, 88, 96, 97, 105, 106
Intersection: 74, 76
Ion Series: 87, 96
Label peaks with mass to charge: 88, 97
Label regardless of intensity: 88, 97
Licensing: 8
Mascot: 10, 30, 37, 42, 47, 49, 69, 71, 77, 78, 86, 90, 141, 142
Max Peptide FDR: 71
Max Protein FDR: 71
Metric to use for FDR: 71
Min peptide length: 50
Min score: 48, 49
Min score differential: 49
Modification: 20, 70
Modification Name: 70
Modify Database Wizard: 10, 11, 14, 18, 36, 38, 60, 61
Modify Databases: 44
Neutral Losses: 21, 88, 95, 97, 98, 99
New Project Wizard: 12, 13
non target database: 42
NSAF: 119
OTHER: 139
Parse Files Progress: 50
Peptide Length: 69
70. ate New Hyperlinks tutorial.

Peptide Protein Viewer

Figure 5 is an overview of the Protein/Peptide Set Viewer (PSV). The PSV contains all of the protein and peptide identifications in the ProteoIQ project. Filters, Plots, and Comparative Functions are all accessible from the PSV interface. The following items can be accessed through the PSV:
- Protein Set Navigation Bar
- Protein Set Viewer
- Protein Sequence Viewer
- Peptide Set Viewer
- MS/MS Spectra Viewer
- Protein Set Parameters Field, Output and Note Field
- Menu Bar

Figure 5. Protein/Peptide Set Viewer. [Screenshot callouts: Virtual 2D Gel View; interactive plots and charts; Protein View; Peptide View; Protein Set Navigation (each set contains a unique protein/peptide set); compare protein sets using difference, intersection, and union functions; select filter panes; save and export options; access project properties; filter protein and peptide sets.]
71. ative Expression between two or more Biological Samples. To open the Relative Expression Differences Dot Plot, select Plots > Comparative Proteomics > Relative Expression Differences Dot.

Figure 81. Relative Expression Differences Dot Plot. [Screenshot callouts: select Biological Samples to display; select reference; select metric (e.g. Non-Normalized Average SC); order proteins by value; choose ordering (e.g. Ascending Order); shade by expression level; individual proteins listed on the X axis. Plot title: Protein Relative Expression Differences Between Biological Groups.]

Table 40. Parameters in the Relative Expression Differences Dot Plot

Fields: Choose Groups for Y Axis; Choose Groups for X Axis; Expression Level Data Source; Order Protein by Value; Show Proteins in All Groups; Shade by Expression Value.

Field | Description
Choose Groups for Y Axis | When a Biological Sample is selected, the Log Expression Difference for that sample's proteins, using either Non-Normalized Average Spectral Counts, Normalized Spectral Counts, or Reporter Ion Quantitation, will be displayed as data points.
Choose Groups for X Axis | Determines the Biological Samples that act as the reference for the Log2 Relative Expression Difference calculations. For example if
72. ator in the ratio calculations. For example, if an iTRAQ duplex experiment is performed and the 114.1 is set as the reference group, then quantitation will be performed by taking the ratio of reporter ion intensities as 117.1/114.1.

Map Ions to Groups

The Map Ions to Groups window is used to associate reporter ion masses with a Biological Group and Plexes (Figure 38).

Figure 38. Map Ions to Groups. [Screenshot callouts: reporter ion masses are assigned to Biological Group and Plex; select reporter ion mass and Plex; choose Biological Group; add selected reporter ion. Dialog text: "Please map the reporter ions to the biological groups/replicates". Lists shown: Available Reporter Ions; Ions Assigned to Groups; List of Samples (Control, Cell 1, Cell 2).]

Mapping reporter ions to biological groups in ProteoIQ is performed as follows:
1. Select the reporter ion mass and plex in the Available Reporter Ions viewer.
2. Once selected, the reporter ion masses appear as highlighted. Move the highlighted reporter ions to the Ions Assigned to Groups viewer
73. ave Settings to save the correction factors.

Describe Samples

Figure 37 shows the Describe Samples section of the New Project Wizard. Add sample names and select the sample to be used as the reference group.

Figure 37. Describe Samples. [Screenshot callouts: assign a name to each sample; assign reference group. Dialog text: "Please provide names for your biological groups and modify the number of replicates if necessary". Columns: SAMPLE NAMES, REPLICATES, REFERENCE GROUP; rows: Control (2), Cell 1 (2), Cell 2 (2).]

Sample Names: Inserting a unique name for each sample assists the user in identifying the individual sample sets in a complete proteome. This field is useful if biological replicates were analyzed (e.g. patient 1, 2, or 3). This field is optional, as default sample names will be generated based on the Reporter Method selected.

Reference Group: When a reference group is selected, the reporter ion corresponding to the reference group will be used as the control and will act as the denomin
74. [Figure residue: protein table screenshot. Callouts: columns can be sorted or moved; Total across all biological groups; Individual data for each biological group. Visible columns include Sequence Id, Sequence Name, Total Score, Spectral Count, and Log2 Relative Expression.]
75. bel. Lowering the minimum intensity percentile allows ions of lesser intensity to be considered for ion series annotation.
Specifies the error window for labeling of fragment ions with ion series annotations. Units are Da.

Field: Description
Label Reporter Ions: Adds the reporter ion label to the reporter ions in the MS/MS spectrum. Selecting Label Region Only adds only a single label to the entire reporter region.
Label Peaks with Mass to Charge: Adds the m/z of the ion along with the ion series label.
Selected Neutral Losses: In the Selected Neutral Losses table, masses of expected neutral losses can be entered and then selected for annotation in the MS/MS spectrum. The mass of the neutral loss will be adjusted based on the charge of the observed precursor ion and subtracted from the precursor m/z. For example, if the neutral loss mass is entered as 98 Da and the observed precursor ion has a charge state of [M+2H]2+, then ProteoIQ will search for an ion in the MS/MS spectrum, over the minimum intensity percentile, that is 49 m/z units below the m/z of the observed precursor ion.
Label Regardless of Intensity: Selecting Label Regardless of Intensity removes the minimum intensity percentile filter when labeling ions resulting from the selected neutral losses.

Mapping Neutral Losses

Several types of post translational or chemical modification produce intense ions in the MS/MS spectrum because of fragmentation mechanism occurr
76. ... 69
Comparing Protein Sets 73
Customizing Results Views 75
Moving and Sorting Columns 75
Peptide Table Options in Peptide Set Viewer 76
Protein Table Options in Protein Set Viewer 79
Customize Protein and Peptide Tables in Protein Sequence Viewer 83
Modifying Plot Views 84
Select Proteins in View 84
Exporting Results 85
Protein Sets to Export 85
... 86
Parameters for Creation of Spectral Images 87
... 89
Peptide Identification 89
Score Filters 90
Peptide Probabilities
77. cense code into the fields available and select submit.

Note: Your computer must have an active internet connection. For unlimited licenses, this will be the only time ProteoIQ has to confirm the license, so your computer can be removed from the internet once the license has been activated. If you are using the ProteoIQ 40 or 200 versions, the license key must be confirmed each time ProteoIQ is opened, so the computer must always have access to the internet.

If your license has expired or you are using the ProteoIQ viewer, select Open Viewer Only. This will allow you to access previously processed ProteoIQ data files without the need for an active license key.

Contact Information

For technical support and general inquiries:
Phone: 706 583 4000
Fax: 706 583 4037
Email: support@bioinquire.com
Online support: https://www.bioinquire.com/library.php

For sales or to obtain a license:
Phone: 706 583 4000
Fax: 706 583 4037
Email: sales@bioinquire.com
To purchase online: https www bioinquire com editions and pricing ph

ProteoIQ Basics

ProteoIQ version 1.5 supports the analysis of database search results from Mascot, SEQUEST, and X!Tandem. The software is built using a modular design in which all windows can be opened separately, and two wizards guide you through the process of importing the results into ProteoIQ. This chapter provides an overview of the interfaces and functionality in ProteoIQ.

Contents
Navigation panel
New
78. d spectral count is multiplied by the Normalization Factor according to Biological Sample (either 1 or 1.25).
5. Average spectral counts are calculated for every protein across the replicates (designated as Normalized Avg SpC).
Note: By default, expression ratios are calculated relative to the Normalized Avg SpC. The user has the option of changing the reference group using the Modifying Label Free Quantitation Settings.
6. Log2 transformations of the Relative Expression values are calculated for each protein across replicates. Relative Expression values are calculated as the spectral count for protein X divided by the average spectral count for that protein across all biological samples for the same replicate, or using the user-defined reference group.
7. To determine the Log2 Relative Expression for a Biological Sample, the replicate Log2 relative expression values are averaged across all replicates within a biological sample.
8. Standard deviations for the Log2 Relative Expression are then reported.

Quantitation Without Normalization (Label Free)

Relative Expression analysis can be performed using native spectral counts, or normalization can be applied. Table 32 contains descriptions of the data types specifically associated with native spectral count quantitation.

Table 32: Native Spectral Count Quantitation
Field: Description
Avg Spectral Count (A SC): The average spectral counts across all replicates within a Biological Sample.
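The normalization and Log2 Relative Expression steps listed on this page (multiply by the Normalization Factor, compare each replicate with the average across biological samples, then average and report the standard deviation of the log values) can be sketched for a single protein as follows. This is a rough illustration under stated assumptions, not ProteoIQ's code: the function names, sample names, and counts are hypothetical, every sample is assumed to have the same number of replicates, and zero spectral counts (for which the logarithm is undefined) are not handled.

```python
import math
from statistics import mean, stdev


def normalize(counts, factors):
    """Multiply each replicate's spectral count by its sample's Normalization Factor."""
    return {sample: [c * factors[sample] for c in reps]
            for sample, reps in counts.items()}


def log2_relative_expression(counts):
    """Return {sample: (mean log2 relative expression, standard deviation)} for one protein.

    For each replicate, the count is divided by the average count for that
    replicate across all biological samples; the log2 values are then
    averaged across replicates within each sample.
    """
    samples = list(counts)
    n_reps = len(counts[samples[0]])
    rep_avg = [mean(counts[s][r] for s in samples) for r in range(n_reps)]
    result = {}
    for s in samples:
        logs = [math.log2(counts[s][r] / rep_avg[r]) for r in range(n_reps)]
        spread = stdev(logs) if n_reps > 1 else 0.0
        result[s] = (mean(logs), spread)
    return result


# Hypothetical spectral counts for one protein, two replicates per sample.
raw = {"Control": [10, 20], "Treated": [24, 48]}
factors = {"Control": 1.0, "Treated": 1.25}  # factors of 1 and 1.25, as in the text

norm = normalize(raw, factors)            # Treated becomes [30.0, 60.0]
result = log2_relative_expression(norm)
print(result["Control"])                  # (-1.0, 0.0)
```

Here the Control count is half of the per-replicate average in both replicates, so its Log2 Relative Expression is -1 with zero standard deviation. Using a user-defined reference group instead would only change how `rep_avg` is computed.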
79. d Project 34 View Chart 96 Whole Project 34 X Tandem 10 30 37 38 47 49 69 71 77 78 90 BIOINQUIRE 220 Riverbend Rd Suite 103 Athens GA 30602 Phone 706 583 4000 Fax 706 583 4000 www bioinquire com
80. de Identification

In ProteoIQ there are three methods to assist in establishing a threshold of confidence for peptide identifications generated from database search results. These include traditional methods of filtering by scores or related metrics (Score Delta, Charge, etc.), Peptide Probabilities, and Peptide False Discovery Rates. Manual validation of individual MS/MS spectra is also supported in ProteoIQ.

Score Filters
Peptide Probability
Peptide False Discovery Rate
Validating MS/MS Spectra

Note: Peptide probability and False Discovery Rate estimations are HIGHLY dependent on the size of the data set. It is always recommended that you evaluate the probability and false discovery rate plots to determine if these calculations are valid.

Score Filters

For data sets with fewer than 500 peptides, native database search scores should be used to filter peptide identifications. Setting thresholds for confident peptide identification can be performed using native Mascot Ion Scores, SEQUEST Xcorrs, and X!Tandem Hyper Scores. Two methods can be used to remove peptides by score, as shown in Figure 55.

Figure 55: Score Filters. Callouts: Combining Peptide Scores with Charge State Filters; Sort by Score.
81. dicates that the protein sequence can account for all peptides within a Protein Group. Selecting Is Top hides the other members within the protein group. See Protein Group topic for more information.
In the default database parse settings for ProteoIQ, the Sequence ID represents the string (accession number, etc.) used to access a given protein in a public repository.
Description of the protein or gene in the database, as defined by the Protein Name Parse Rule.
Number of amino acids in the protein.
The sum of the molecular masses for all amino acids in the protein amino acid sequence.
Probability that the Protein Group is identified. Protein Group Probability does not penalize proteins if they contain peptides identified in multiple proteins within a group.
Probability that a protein within a Protein Group is identified. Protein Probability penalizes proteins if they contain peptides identified in multiple proteins within a group.
The number of peptides matching a protein.
Calculated as the number of amino acids in the identified peptides divided by the total number of amino acids in the protein sequence.
The sum of all spectra generating peptide identifications to a given protein (i.e., the number of redundant peptide spectra).

Table 25: Protein Column Selection Menu (Label Free Quantification)
Field
Average Spectral Count
Avg Spectral Count Std Deviation
Avg Spectral Count %Std Deviation
Log Relative Expression
82. ... 15
Protein Set Navigation Bar 16
Protein Set Viewer 17
Protein Sequence Viewer 18
Peptide Set Viewer 20
MS/MS Spectra Viewer 21
Protein Set Parameters Field and Output Notes 22
Menu Bar 23
File Menu 23
... 24
... 25
Plots Menu 26
Help Menu 28
... 28
Statistical Algorithms Overview 29
Data File Formats 30
... 31
Opening a ProteoIQ Project 32
Saving ProteoIQ Project
83. e
Score Delta
Discriminant Score (F value)
Peptide Probability
Peptide Sequence
Modifications
Total Spectral Count
Spectra Intensity
Total Intensity
Reporter Ion Intensity

Description
Theoretical molecular mass calculated from the sum of masses of the amino acids for each peptide.
Charge states are not calculated in ProteoIQ. The charge states are extracted from the database search results.
Mascot Ion Score, SEQUEST Xcorr, or X!Tandem Hyper Score.
Score difference between the 1st and 2nd best peptide assignments to a given MS/MS spectrum. The score delta is analogous to the Delta Cn in SEQUEST.
Score normalization performed prior to probability calculation. For a detailed description of how F values are calculated, see the Discriminant Score section.
The probability that the peptide assignment to a given spectrum is correct.
The amino acid sequence of the identified peptide.
Peptide modifications are displayed in a separate column and are reported by name and amino acid position.
If a peptide is identified multiple times, the Total Spectral Count will equal the number of times the peptide was identified.
For Mascot results, this value is the intensity of the precursor ion that generated the MS/MS spectrum. In SEQUEST results, this value is the total ion current (Total Inten) of the MS/MS scan.
The sum of the Spectra Intensities for all identical peptides.
Displays the intensity of each reporter ion. Note: for the intensity to be displayed, the peptide MUST contain the appropriate modification and the reporter ion MUST be within the specified mass window.
84. Field: Description
Reporter Quant Log Rel Expression: Displays the Log expression for each peptide, as calculated based on the intensity of the reporter ions.
Spectra Query: For Mascot results, Spectra Query is the query number. For SEQUEST results, Spectra Query equals the number of the OUT file. In X!Tandem results, the Spectra Query equals the Domain id.
Biological Group: Specifies the Biological Group the peptide identification resulted from.
Replicate: Specifies the Replicate the peptide identification resulted from.
Spectra Source File: Indicates the spectra file (pkl, mgf, DTA, xml, etc.) that was used to perform the database search which led to the peptide identification.
DB Search Result: Indicates the database search result files that the peptide identification originated from.

Note: When a ProteoIQ project is saved, the Peptide and Protein Views will also be saved.

Protein Table Options in Protein Set Viewer

Figure 51 shows the Protein Column selection menu that is accessible by selecting View > Protein Table Options > Select Columns. The Select Protein Columns window will be dependent on the type of quantification performed, either Label Free or Reporter Ion Quantitati
85. [Figure residue. Controls and callouts: Min Significance Value for Shading; Show Connection; Proteins with PTMs; Experimental Gel Band on X axis.]

Interpreting the 1D Gel View by Protein Plot

When gel slices (Biological Samples) are selected, they appear on the X axis. The theoretical molecular weight is displayed on the Y axis. Each spot is an individual protein, and the color intensity is based on the metric chosen in the Choose Data Source drop-down menu and the Min Significance Level setting. In the example shown in Figure 69, Total Spectral Counts is selected and the Min Significance Level was set to 50 spectral counts. The spots colored dark blue have spectral counts exceeding 50, and the lighter colored spots are proteins with fewer than 50 spectral counts. Double clicking on a data point opens the Protein Sequence Viewer, and single clicking selects the protein in the Protein Set Viewer.

Quantitation

Protein expression can be quantified in ProteoIQ using either the Spectral Counting method or Reporter Ion Quantitation. The following sections describe how protein quantitation is performed in ProteoIQ using the Label Free approach and provide an overview of how to interpret the results.

Quantitation Overview (Label Free)
Quantitation Without Normalization (Label Free)
Quantitation With Normalization (Label Free)
Modifying Label Free Q
86. e cell the more the spectral counts. By selecting a cell, ALL of the proteins appearing in that cell are selected in the Protein Set Viewer.

1D Gel View by Protein

The 1D Gel View by Protein (Figure 69) is used to display the results of a 1D GeLC analysis on a protein by protein basis. For a description of how a GeLC experiment should be set up in ProteoIQ, see the Experimental Approach for GeLC Analysis and the Preparing Results for GeLC Analysis in ProteoIQ sections.

To Display a 1D Gel View by Protein Plot:
1. Select the biological samples (Gel Slices) to display from the Choose One or More Groups for Comparison menu.
2. Choose the metric to display. Note: the darkness of the data point is dependent upon the selected value.
3. Select the shading level using the Min Significance Level setting.
4. Click View Chart.

Figure 69: 1D Gel View by Protein. Intensity is based on the selected metric; the theoretical weight range (Molecular Weight, kDa) is on the Y axis. Callouts: Select Gel Bands; Select Metric to Display; Select Level; likely truncated or degraded protein products; single click to select the proteins in the Protein Set Viewer.
87. e data sets.

Believe The Protein IDs In The Search Results Files: Reports ONLY proteins assigned to peptides in the database search result. If this is selected, peptides are not mapped to the FASTA database.

Database search software allows the user to set the number of protein identifications displayed for every peptide match. Since ProteoIQ arranges protein identifications into Protein Groups, each peptide match is assigned to all the proteins in the specified database that the database search software determines contain this sequence. If the maximum number of proteins displayed for each peptide match in the database search software is set too low, then protein group creation will not be accurate. Alternatively, if the user's database does not contain a large number of homologous protein families and the number of proteins displayed for each peptide match in the database search software is set to the maximum, then it may be unnecessary to perform the clustering.

Warning: If clustering is not performed, then the parse rules defined in ProteoIQ for your database MUST correlate with the rules used in the database search software. Otherwise, the protein identifications will not be properly extracted from the database search results. For this reason, we recommend ALWAYS clustering.

Select Files for Parsing

The Select Files for Parsing window is used to add database search results and assign them to Biological Groups and Replicates (Figure 30).
88. e expressed

Redundancy and Multiple Charge States

While spectra from multiple precursor ion charge states may exhibit distinct fragmentation patterns, these peptides are not considered independent peptide matches by ProteoIQ and are not allowed to contribute to the total score. We have adopted this approach because, during database searching, precursor ions of multiple charge states are converted to their singly charged precursor masses prior to comparison with the in silico derived peptides. The subset of in silico derived peptides passing within the precursor ion mass tolerance will essentially be identical for the singly charged and multiply charged species. Thus, identical assignments made from peptides with different precursor ion charge states are not unequivocally independent events. Conversely, peptide assignments with alternate modifications are matched through different parent mass filters and are treated as independent.

Discriminant Score (F value)

F value calculation for Mascot:

\[ F = c_0 + c_1 \, (\mathrm{IonScore} - \mathrm{IdentityScore} + \mathrm{AvgIdentityScore}) \]

Example:
Peptide: IAYNVEAAR
Ion score: 58.4
Identity score: 25
Average Identity score: 20

\[ F = -3.0 + 0.1 \times (58.4 - 25 + 20) = 2.34 \]

F value calculation for SEQUEST:

As described in Keller et al., ProteoIQ calculates a discriminant score (F value) for every peptide assignment to a given spectrum. This allows the assignment of a single score fo
89. e of how the Peptide FDR Plot should be used to evaluate if Peptide False Discovery Rate based peptide identification is reliable for your data set. The Peptide FDR Plot contains three data series:
1. Distribution of target peptides at each score (RED)
2. Calculated Peptide False Discovery Rate at each score threshold (GREEN)
3. Distribution of decoy peptides at each score threshold (BLUE)
For the Peptide FDRs to be accurate, the distributions should approximate a Gaussian distribution. There should also be good separation between the target and decoy peptide distributions (Figure 59).

Validating MS/MS Spectra

To view the MS/MS spectral assignment for any peptide identification, simply double click on the peptide sequence in the Peptide Set Viewer or Protein Sequence Viewer. The following sections describe how to use the Spectral Viewer.

Spectral Viewer Overview
Adjusting peak labeling
Mapping Neutral Losses
Validating Reporter Ion Quantitation

Spectral Viewer Overview

The spectral viewer contains three panels, as shown in Figure 60. The left-most panel is for selection of spectral labeling parameters that are rendered in the annotated MS/MS spectrum. The bottom panel contains a table with the predicted m/z of the selected ion series. The color codes indicate that the specified ion series was detected in the selected MS/MS spectrum.

Figure 60: MS/MS Spectral Viewer. Callout: Export spectra as images or spreadsheet format. Spec
90. e peptide sequence opens up the Update Protein Sequence dialogue box. Hyperlinks created in the Modify Database Wizard are displayed in the right panel.

One or more peptides can be selected by a single right click on the row of interest (Figure 10). Right clicking on the selected peptides allows the highlighted peptides to be displayed in the protein sequence coverage. The selected peptides are displayed on the protein sequence in red. Selecting Copy Selected places the peptide information on the clipboard.

Figure 10: Right click functionality in the Protein Sequence Viewer. Selected peptides are displayed on the protein sequence in red. Example shown: Sequence Id NP_001029249.1, Sequence Name histone cluster 2, H4b [Homo sapiens], Protein Length (AA) 103, Protein Weight (kDa) 11.3424. Callouts: Single clicking on rows highlights (selects) the peptides of interest; Copy Selected places peptide information onto the clipboard; Select Update Protein Sequence to modify the peptide coverage display.

Peptide Set Viewer

The Peptide Set Viewer displays the identified peptides (Figure 11). Information about the peptide identifications is displayed along with each peptide sequence and any PTMs predicted to be present on the peptide.
91. e scores are directly related to the number of matched fragment ions, then different score thresholds should be used for peptides of different size. To perform this analysis in ProteoIQ, simply enter the Score in Min Peptide Score and then select the Charge State in the Filter Pane.

Peptide Probabilities

ProteoIQ calculates probabilities that estimate the likelihood that the peptide assignment to a given MS/MS spectrum is correct. Probability thresholds can be applied using the Filter Pane, or by sorting peptides by probability in the Peptide Set Viewer and then copying the selected peptides to a new protein set, as shown in Figure 55.

Validating peptide identifications by probability in the Filter Pane:
1. Enter Min Peptide Probability in the Filter Pane.
2. Select Create New Protein Set.

Validating peptide identifications by Peptide Probability in the Peptide Set Viewer:
1. Sort peptides by Probability by selecting the Prob column header.
2. Highlight to select peptides of interest.
3. Right click and select Copy Selected to New Protein Set.

If probability thresholds are utilized, it is critically important to evaluate the distributions used to calculate the probability values.

Assessing the Quality of Probability Calculations

Discriminant Score (F value) Plots (Figure 56) should be used to evaluate the legitimacy of the probability calculations. To access the Discriminant Score Plots, select Plots > Proteome Validation > Peptide Probability. To view dist
92. earching for Multiple Neutral Losses

If more than one unique neutral loss is highlighted in the Selected Neutral Losses table, ProteoIQ will search for each neutral loss based on the mass value entered in the table. In cases where multiple neutral losses of the same type occur from a single peptide, these can be identified by adding a after the neutral loss mass. For example, in the figure above the peptide has two sites of phosphorylation. In the Selected Neutral Losses table, H3O4P (97.9) is entered, and the MS/MS spectrum is labeled with two sequential phosphate neutral losses, labeled H3O4P 1 and H3O4P 2.

Validating Reporter Ion Quantitation

Searching for fragment ions resulting from the cleavage of an isobaric tag in a peptide MS/MS spectrum is supported via the Label Reporter Ions check box, as shown in Figure 62. Two options are available for labeling reporter ions. The Label Reporter Ions option annotates each reporter ion with the name for the biological group. When Label Region Only is selected, the entire reporter region is labeled with the name of the isobaric tag used for the experiment.

To validate the quality of the reporter fragments:
Open the spectral viewer by double clicking on a peptide of interest.
Click the checkbox adjacent to Label Reporter Ions in the spectral viewer.
Zoom into the reporter region (typically from 110-140 m/z).
The intensities for each reporter ion will be displayed in the Peptide Set V
93. ed with native pseudo spectral counts.

Table 35: Native Pseudo Spectral Count Data Types
Field: Description
Pseudo Spectral Count (P SC): The average pseudo spectral counts across all replicates within a Biological Sample.
P SC Standard Deviation: The standard deviation of pseudo spectral counts between replicates within a Biological Sample.
P SC % Standard Deviation: The percent standard deviation of pseudo spectral counts between replicates within a Biological Sample.

Pseudo Spectral Counts With Normalization

Table 36 contains descriptions of the data types specifically associated with normalized pseudo spectral counts.

Table 36: Normalized Pseudo Spectral Counts
Field: Description
Normalized P SC (NP SC): The Average Pseudo Spectral Counts across all replicates within a Biological Sample that have been multiplied by a Normalization Factor.
NP SC Standard Deviation: The standard deviation of the normalized pseudo spectral counts between replicates within a Biological Sample.
NP SC % Standard Deviation: The percent standard deviation of normalized pseudo spectral counts between replicates within a Biological Sample.

Analyzing Replicate Results

Replicate analyses can be specified in the Describe Proteome window of the New Project Wizard. When replicates are included, ProteoIQ averages the spectral counts (Avg Spectral Count) across replicates and reports a standard deviation (A SC Standard Deviation) and percent s
94. en the Min Percent Replicates should be set to 66% (2/3). To identify proteins found in at least 1 of 3 replicates, set the Min Percent Replicates to 33% (1/3).

To filter protein sets by Standard Deviation:
1. Click the Standard Deviation column in the Protein Set Viewer to sort by Standard Deviation.
2. Highlight the proteins of interest.
3. Right click on the highlighted proteins and select Copy Selected to New Protein Set.

Replicate Plots

Figure 77 shows the Replicate Analysis Scatter Plot. To access the Replicate Analysis Scatter plot, select Plots > Experimental Reproducibility > Replicate Analysis Scatter.

Figure 77: Replicate Analysis Scatter Plot (Comparison of Protein Expression Across Replicates). Callouts: Select Biological Sample to Display; Select Metric (Expression Level Data Source); Add Trendline; Replicate Values Displayed for the Selected Biological Sample.

To display the plot:
1. Choose a Biological Sample to display in the Choose A Group for Comparison menu.
2. Select a data type in the Expression Level Data Source menu.
3. Select the check box to add a trend line.
4. Click View Chart.

For the selected Biological Sample, all replicates are displayed in the plot. Each data point represents a protein identified in one or more replicates within the selected Biological
95. [Figure residue: list of Biological Samples, including Patient2 and Patient3.]

Table 37: Parameters in the quantitative expression scatter plot
Field: Description
Choose One or More Groups for Comparison: Selects Biological Samples to display. When a Biological Sample is selected, the proteins' Data Source (Spectral Counts, Peptides, etc.) will be displayed as individual data points, with the selected Data Source on the Y axis.
Select Groups to use as Reference: Selects Biological Samples to display proteins' Data Source (Spectral Counts, Peptides, etc.) as a reference on the X axis. When more than one Biological Sample is selected as a reference, the Data Source is averaged for each protein between the selected Biological Samples.
Expression Level Data Source: The Data Source specifies what type of data to plot for the proteins. Options are Total Spectral Counts, Non Normalized Average Spectral Counts, Normalized Spectral Counts, Number of Peptides, and Percent Sequence Coverage. Note: for reporter quantitation, the options are Non Normalized Pseudo Spectral Counts and Normalized Pseudo Spectral Counts.
Show Trend line: Calculates a linear trend line through each data series for all Biological Samples.

Quantitative Expression Heat Map

The quantitative expression heat map is used to compare Spectral Counts, Peptides, or % Sequence Coverage between two or more Biological Samples (Figure 79). To open the quantitative expression heat map
96. ents.

Comparison functions are performed as follows:
1. Select protein sets to compare in the Protein Set Navigation Bar by checking the box adjacent to the Protein Set Name.
2. Select a Comparison Function from the On Selected Sets drop-down menu.
3. Assign a name to the new protein set in the Protein Set Name text field.
4. Select GO to perform the function.

Table 21: Comparison Functions in the On Selected Sets drop-down menu
Field: Description
Remove: Deletes the selected protein set, including proteins and peptides.
Rename: To rename a protein set, select the protein set for renaming, enter text in the Protein Set Name field, and select GO.
Intersection: Two or more protein sets must be selected. The intersection function renders a new protein set containing the proteins found in common between all of the selected protein sets.
Union: Two or more protein sets must be selected. The union function renders a new protein set that is the combined list of proteins from each of the selected protein sets.
Difference: Two or more protein sets must be selected. The difference function renders two or more protein sets that each contain the unique proteins found in each selected protein set. For example, if two protein sets are selected (protein set 1 and protein set 2) and the difference function is applied to both, two additional protein sets will be created (protein set 3 and protein set 4). Protein set 3 will have proteins
97. ... 53
Describe Correction Factor 55
Describe Samples 56
Map Ions to Groups 57
Select Files for Parsing 58
Select Label Modification 59
Modify Database Wizard 60
Describe Database 61
Specify Parse Rules 62
Test Parse Rules 64
Specify Hyperlinks 65
Using the Filter Pane 66
Selecting Filter Panes 66
How Filtering Works 67
Selecting Protein Sets to Filter 68
Description of Filter Parameters
98. erlinked protein files are generated that contain the protein sequence coverage information.
Saves a FASTA file containing the protein sequences for all identified proteins in the selected protein sets.
Creates PNG image files for the selected MS/MS spectra. If selected, the user must define how the spectra will be annotated using the Parameters for Creation of Spectral Images Menu.
Selecting Export According to ProteinSetViewer Views will create all tab delimited and web page exports using the column and row settings of the selected protein or peptide set.
Specifies the number of decimal places to report for the Peptide Mr expt, Mr calc, and Mr Observed in the Peptides tab delimited txt export.

Parameters for Creation of Spectral Images

When Generate Spectral Images is selected, the user must define how the spectra will be annotated using the Parameters for Creation of Spectral Images Menu. Table 28 describes the parameters in the Spectral Images Menu.

Table 28: Spectral Images Menu
Field
Fragment Charge States
Ion Series
Minimum Intensity Percentile
Fragment Mass Tolerance

Description
Four selections are available: 1, 2, 3, and >3. Selection of a charge state will label ions in the MS/MS spectrum that arise from the selected ion series (y, b, etc.) at the assigned charge state. For example, if a charge state of 2 and an ion series of y are selected, ProteoIQ will label the peaks in the MS/MS spectrum
99. et viewer.
Right-click the selected protein to open the Pop Up Menu.
Select Normalize SC From Selected Proteins. The Normalization Dialogue Box will appear.
Select Total Spectral Counts in Biological Groups and Replicates in the Normalization Dialogue Box.
6. The normalization factors for each replicate and biological sample will be displayed in the Normalization Dialogue Box (Figure 71).

To display native or normalized Spectral Count Quantitation, select View > Protein Table Options > Select Columns. Table 33 contains descriptions of the data types specifically associated with normalized spectral count quantitation.

Table 33: Normalized Spectral Count Quantitation
Normalized SC (N-SC): The Average Spectral Counts across all replicates within a Biological Sample that have been multiplied by a Normalization Factor.
N-SC Standard Deviation: The standard deviation of the normalized spectral counts between replicates within a Biological Sample.
N-SC % Standard Deviation: The percent standard deviation of normalized spectral counts between replicates within a Biological Sample.
N-SC Log2 Relative Expression: Log2 (Normalized Average SC for protein A in Biological Sample X / Sum of the Normalized Average SC for protein A in Biological Samples X, Y, Z).
N-SC Log2 Relative Expression Std Dev: The standard deviation of the N-SC Log2 Relative Expression values calculated for matched replicates between the
100. f stringency and can be adjusted once the protein set is created. The following steps should be followed to perform a Pro FDR analysis using the Filter Pane (Figure 58):
1. Once the protein set is created, select View > Intermediate Filtering Pane.
2. Set Max Protein FDR to 1.0 and leave all other parameters blank (Figure 58).
3. Select Score Metric from the Metric to Use for FDR drop-down menu.
4. Set the Starting Pep Coverage to 3.
5. Select No Filter in the Filter Previous Set drop-down menu.
6. Click Create New Protein Set.

Assessing the Quality of Protein False Discovery Rate Calculations

Protein False Discovery Rate Plots and the Output Notes Pane (Figure 64) should be used to evaluate the legitimacy of the protein false discovery rate calculations. To access the Protein False Discovery Rate Plot, select Plots > Protein FDR. To view distributions of proteins from target and decoy database search results for different Peptide Coverage Levels, select the Peptide Coverage from the Select the Plot to View drop-down menu.

Note: When evaluating a Pro FDR in the Protein FDR Plot, it is important to use the Maximum Starting Peptide Coverage Level to assess the distributions. Since ProteoIQ does not calculate Pro FDRs at EVERY score threshold, the distributions will not appear as smooth as in the Pep FDR and Probability Plots. Cross-referencing with the Output Notes Pane will aid in evaluati
101. g the protein false discovery plot.
Filters peptides based on discriminant score or F value.
Filters peptides based on peptide probability.
Filters proteins based on total protein probability. Protein Probability takes into account the presence of protein groups and penalizes proteins if they contain peptides identified in multiple proteins within a group.
Filters proteins based on total protein group probability. Protein Group Probability does not penalize proteins if they contain peptides identified in multiple proteins within a group.

Comparing Protein Sets

Comparing two or more protein sets in ProteoIQ is supported via the On Selected Sets drop-down menu. Figure 48 shows that the On Selected Sets menu is located in the top right of the Peptide/Protein Set Viewer. Table 21 describes the functions.

Figure 48: Comparison Functions. Select a Comparison Function and click GO. Select protein sets for comparison by clicking in the checkbox adjacent to each protein set name.
102. gical Groups. The Total Spectral Count for a group is used to perform Sample Normalization for spectral counting. Viewing the Total Spectral Count Pie Chart can be used to determine if a certain group was over- or under-sampled.
Number of Peptides: Compares the number of peptides for all proteins between Biological Groups.
Number of Proteins: Compares the total number of proteins between Biological Groups.
Number of Protein Groups: Compares the total number of protein groups between Biological Groups.
All of the Above: Generates four pie charts showing all of the above metrics.

Venn Diagram (Group Summary Venn)

The Venn Diagram (Figure 66) displays the relationship between selected biological groups in terms of protein groups, proteins, or peptides. To view the Venn Diagram, select Plots > Comparative Proteomics > Venn Group Summary Venn. Table 31 describes the metrics that can be displayed, and differences between the Traditional and Strict Venn Diagrams are discussed in the following sections:
Traditional Venn Diagram Labeling
Strict Venn Diagram Labeling

Figure 66: Group Summary Venn Diagram. Callouts: Choose Groups for Number of Protein Groups; Select Biological Group for A, B, and C (Patient 1, Patient 2, Patient 3 in the example); Select regions in the Venn Diagram to select proteins or peptides in the set viewer.
103. gical sample filter(s): Protein in Patient. Filtering previous protein set: Protein Set 1. Using Intermediate filter pane. DONE. Summary of results: Total number of peptides 678; Number of unique peptides 167; Number of protein groups 38; Number of proteins 47.
Callouts: Add your own text to help keep track of important information. For each protein set, the number of proteins, protein groups, and peptides are reported.

Menu Bar

The following Menus can be accessed via drop-down boxes from the Menu Bar and by right-clicking in the Protein/Peptide Set Viewer and Plots (Pop Up Menus):
File Menu
Edit Menu
View Menu
Plots Menu
Help Menu
Pop Up Menus

File Menu

The File Menu is used to access Save and Export functions or to review Project Properties (Figure 15). Items in the File Menu are described in Table 1.

Figure 15: File Menu

Table 1: File Menu Items
Save: Saves the ProteoIQ project. Access: File > Save. Short Cut Key: Ctrl S.
Save As: Saves the ProteoIQ project. Access: File > Save As. Short Cut Key: Ctrl A.
Validate Proteome: Saves the Validated Proteome. Access: File > Validate Proteome. Short Cut Key: Ctrl V.
Export: Exports proteins, peptides, and spectra. Access: File > Export.
Properties: Accesses Project Properties. Access: File > Project Properties.
Close: Closes the program. Access: File > Close.
104. Protein Sets to Export

Select the Protein Sets to export. The Protein Sets are listed in the Protein Set Navigation Bar. The check boxes below specify what Data Formats will be saved for each protein set selected.

Data to Export

Table 27 describes the export formats.

Table 27: Export Formats
Fields: All Parsed Peptides From Results Files, Peptides, Protein List, Protein list html, Protein Sequences, Generate Spectral Images, Export According to ProteinSetViewer Views, Significant Digits for Peptide Masses
All Parsed Peptides From Results Files: Saves a tab-delimited file containing all peptides extracted from the Mascot, SEQUEST, or X!Tandem results.
Peptides: Saves a tab-delimited file containing all peptides and related information from the selected protein sets.
Protein List: Saves a tab-delimited file containing all protein identifications, scores, and related information from the selected protein sets. If Include Peptides For Top Proteins is selected, the peptide identifications associated with each protein assignment will be included in the exported file.
Protein list html: Saves an html file containing all protein and peptide identifications. If Generate Individual Protein Files is selected, hyp
105. ie Chart
Access Venn Diagram: Plots > Comparative Proteomics > Venn Group Summary Venn
Access 2D Gel View: Plots > Gel Based Proteomics > 2D Gel View
Access 1D Gel View: Plots > Gel Based Proteomics > 1D Gel View
Access 1D Gel View by Protein: Plots > Gel Based Proteomics > Mol Wt Cluster Plot

Pie Chart

The Pie Chart (Figure 65) presents an overview of the entire proteome compared across Biological Groups. To view the Pie Chart, select Plots > Experimental Reproducibility > Pie Chart. Table 30 describes the metrics that can be displayed.

Figure 65: Pie Chart. Panels: Total Spectral Count (SC), Number of Peptides, Number of Proteins, and Number of Protein Groups, each compared across Cell Type 1, Cell Type 2, and Cell Type 3. Callouts: Select Biological Group; View Chart updates selections; Groups are Color Coded.

Table 30: Pie Chart Parameters
Total Spectral Count: Compares the Total Spectral Counts for all proteins between Biolo
106. iewer.

Figure 62: Reporter Ion validation in the MS/MS Spectral View. Callouts: Reporter Ion Intensities are displayed in the Peptide Set Viewer; Select Label Reporter Ions to annotate reporter ion fragments in the MS/MS spectrum. The example spectrum shows the iTRAQ4 Control, Experimental1, Experimental2, and Experimental3 reporter ions in the m/z 114.0 to 117.5 region (x axis: Mass (m/z)).

Protein Identification

In ProteoIQ there are three methods to assist in establishing a threshold of confidence for protein identifications generated from database search results. These include traditional methods of Filtering by Scores or related metrics (Score, Delta, Charge, etc.), Protein Probabilities, and Protein False Discovery Rates:
Protein Score Filters
Protein Probability
Protein False Discovery Rate

Note: Protein probability and False Discovery Rate estimations are HIGHLY dependent on the size of the data set. It is always recommended that you evaluate the probability and false discovery rate plots to determine if these calculations are valid.

Protein
107. ilter panes to choose from, as shown in Figure 46. To select a Filter Pane, choose View > Basic, Intermediate, or Advanced Filtering Pane.

Figure 46: ProteoIQ Filter Panes. The View menu lists the Basic Filtering Pane, Intermediate Filtering Pane, and Advanced Filtering Pane. The Basic Filter Pane and Intermediate Filter Pane are shown, each grouped into Peptide Filters, Protein Filters, False Discovery Rate Filters, and Probability Filters.
108. ing from the loss of a stable neutral molecule. Searching for ions resulting from a neutral loss in a peptide MS/MS spectrum is supported via the Selected Neutral Losses table, as shown in Figure 61. In the Selected Neutral Losses table, masses of expected neutral losses can be entered and then selected for annotation in the MS/MS spectrum. The mass of the neutral loss will be adjusted based on the charge of the observed precursor ion and subtracted from the precursor m/z.

To search for neutral losses in the displayed spectrum:
1. If using a default neutral loss, highlight the neutral loss name and mass in the Selected Neutral Losses table.
2. If the desired Neutral Loss mass is not listed, enter the name and the neutral mass (Mr) at the bottom of the table.
3. Select View Chart.

Figure 61: Neutral Loss Mapping in the MS/MS Spectra View. Callouts: Select Neutral Loss Mass; Neutral Loss of 1st Phosphate; Neutral Loss of 2nd Phosphate.
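The neutral loss adjustment described above (loss mass divided by the precursor charge, then subtracted from the precursor m/z) can be sketched in a few lines of Python. This is an illustrative calculation only, not ProteoIQ code; the function name `neutral_loss_mz` and the precursor m/z of 600.0 are hypothetical.

```python
def neutral_loss_mz(precursor_mz, loss_mass, charge):
    """Expected m/z of the precursor ion after losing a neutral molecule.

    The neutral loss mass is scaled by the precursor charge and the
    result is subtracted from the observed precursor m/z.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return precursor_mz - loss_mass / charge


# Hypothetical doubly charged precursor at m/z 600.0 losing 98 Da:
# the loss ion is expected 98 / 2 = 49 m/z units below the precursor.
print(neutral_loss_mz(600.0, 98.0, 2))  # 551.0
```

For a singly charged precursor the full neutral loss mass is subtracted, so the same 98 Da loss would appear 98 m/z units below the precursor.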
109. ion.
Adds the reporter ion label to the reporter ions in the MS/MS spectrum.
Adds the m/z of the ion along with the ion series label.
In the Selected Neutral Losses table, masses of expected neutral losses can be entered and then selected for annotation in the MS/MS spectrum. The mass of the neutral loss will be adjusted based on the charge of the observed precursor ion and subtracted from the precursor m/z. For example, if the neutral loss mass is entered as 98 Da and the observed precursor ion is doubly charged, (M+2H)2+, then ProteoIQ will search for an ion in the MS/MS spectrum, over the minimum intensity percentile, that is 49 m/z units below the m/z of the observed precursor ion.
Selecting Label Regardless of Intensity removes the minimum intensity percentile filter when labeling ions resulting from the selected neutral losses.
Exports spectral images for ALL spectra in the selected protein set.
Exports spectral images for ONLY the top scoring peptides in the selected protein set. Only one spectral image is created per peptide.
Exports spectral images for peptides present on proteins with ONLY a single peptide assignment.

Evaluating Results

The following sections provide an overview of how to interpret your results in ProteoIQ. The following contents are covered in this section:
Peptide Identification
Protein Identification
Whole Proteome Visualization
Quantitation

Pepti
110. ior to sharing data with collaborators, since the resulting file sizes are significantly smaller. See the "What is the Difference Between Save and Validate" discussion section for additional information.

What is the Difference Between Save and Validate

When you Save a ProteoIQ Project, you are saving a Whole Project. The key difference between Whole Project and Validated is in the spectra/peptides that are saved. The Whole Project data will contain ALL spectra/peptides that were parsed for both target and decoy data. This way you can always open the Whole Project and perform validation on your proteome. When you do a validation, however, you are discarding all decoy data and all target spectra/peptides that are not used in the protein sets that are shown at the time that you validate.

When you first create a new project, you are building the Whole Project data set. The spectra/peptides, peptide-to-protein mappings, etc. are saved automatically after parsing and clustering in the Whole Project folder. Then the Protein Set Viewer is opened and the initial protein set is generated. Again, this protein set does not necessarily have any bearing on other protein sets created on the Whole Project; e.g., you can do an initial protein set at 1% FDR and then do subsequent ones at 5% or 0.5% using all peptides parsed from target/decoy results. However, the initial protein set and subsequent ones are not automatically saved for you. You have to manually Sa
111. itation Workflow (Reporter Ion Quantitation)

1. For every peptide, the reporter ions are detected based upon the user-specified Mass Window and Reporter Method. Note: for a reporter ion to be detected, the ion must be within the Mass Window specified and the peptide must contain the modification associated with the reporter label.
2. Reporter ion intensities are then saved for each peptide.
3. Using the Reference Group specified by the user, ratios are calculated by dividing each reporter ion intensity by the intensity of the reference group reporter ion.
4. The ratios are then Log transformed.
5. If multiple peptides match a given protein within a single replicate, the Log ratios are averaged across all peptides within the replicate.
6. Average Log ratios are calculated for every protein across the replicates.
7. Standard Deviations and Percent Standard Deviations are calculated for each protein across all replicates within a Biological Sample.

Table 34: Reporter Ion Quantitation
Ratio: Log2 ratios calculated using reporter ion intensities, specific to the protein set viewer.
Log2 Relative Expression: Log2 ratios calculated using reporter ion intensities, specific to the protein sequence viewer.
Log2 Relative Expression Std Dev: The standard deviation of the Log2 Expression Values calculated for matched replicates between the Biological Samples.

Modifying Reporter Quantitation Settings

During the new
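Steps 3 to 7 of the reporter ion workflow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ProteoIQ's implementation: the function names and data layout are hypothetical, the log base is taken to be 2 (as Table 34 suggests), the sample standard deviation (n - 1) is assumed, and all intensities are assumed to be non-zero. The intensities in the example are illustrative values.

```python
import math
from statistics import mean, stdev


def log2_ratio(intensities, label, reference_label):
    """Steps 3-4: ratio to the reference reporter ion, then log2."""
    return math.log2(intensities[label] / intensities[reference_label])


def protein_log2_ratio(replicates, label, reference_label):
    """Steps 5-7 for one protein.

    replicates: list of replicates; each replicate is a list of peptides;
    each peptide is a dict mapping reporter label -> intensity.
    Returns (average log2 ratio, standard deviation across replicates).
    """
    # Step 5: average the log ratios of all peptides within each replicate.
    per_replicate = [
        mean(log2_ratio(peptide, label, reference_label) for peptide in peptides)
        for peptides in replicates
    ]
    # Step 6: average across replicates.
    average = mean(per_replicate)
    # Step 7: sample standard deviation (n - 1), an assumption of this sketch.
    spread = stdev(per_replicate) if len(per_replicate) > 1 else 0.0
    return average, spread


# Illustrative intensities for one protein: two peptides in replicate 1,
# one peptide in replicate 2, with reporter 114 as the reference group.
protein_x = [
    [{"114": 1000.0, "117": 1500.0}, {"114": 3000.0, "117": 4000.0}],
    [{"114": 500.0, "117": 1000.0}],
]
average, spread = protein_log2_ratio(protein_x, "117", "114")
print(round(average, 3), round(spread, 3))
```

Averaging in log space, as the steps describe, treats a two-fold increase and a two-fold decrease symmetrically, which is why the ratios are transformed before they are averaged.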
112. itative Expression Scatter
Quantitative Expression Heat Map
Quantitative Expression Bar Chart
Relative Expression Changes Cluster Dot
Relative Expression Differences Dot

About and Access:
Opens the Method Comparison Plot. Access: Plots > Experimental Reproducibility > Method Comparison Dot
Opens the 1D Gel View. Access: Plots > Gel based Proteomics > 1D Gel View
Opens the 1D Gel View by Protein. Access: Plots > Gel based Proteomics > 1D Gel View by Protein
Opens the 2D Gel View. Access: Plots > Gel based Proteomics > 2D Gel View
Opens the Venn Diagram. Access: Plots > Comparative Proteomics > Venn Diagram
Opens the Quantitative Expression Scatter Plot. Access: Plots > Comparative Proteomics > Quantitative Expression Scatter
Opens the Quantitative Expression Heat Map Plot. Access: Plots > Comparative Proteomics > Quantitative Expression Heat Map
Opens the Quantitative Expression Bar Chart Plot. Access: Plots > Comparative Proteomics > Quantitative Expression Bar Chart
Opens the Relative Expression Changes Cluster Dot Plot. Access: Plots > Comparative Proteomics > Relative Expression Changes Cluster Dot
Opens the Relative Expression Differences Dot Plot. Access: Plots > Comparative Proteomics > Relative Expression Differences Dot

Help Menu

Select the Help menu to access the onboard ProteoIQ User Guide and to view software versions (Figure 19).

Figure 19: Help Menu
113. le will be displayed.
7. Click Apply.
8. To recalculate Log expression values using the selected normalization factors, you MUST CREATE A NEW PROTEIN SET.
9. To create a new protein set, either:
a. Highlight all of the proteins, right-click, and select copy to new protein set, or
b. Select the protein set in the Filter Previous Set drop-down menu and click Create New Protein Set.

Figure 71: Normalization Dialogue Box. Callouts: Select Sample Normalization or Use Control Proteins for Normalization; Normalization factors are listed in the table; Select Apply to apply normalization. The dialogue offers two normalization methods (Total Spectral Counts in Biological Groups and Replicates; Spectral Counts of User Specified Control Protein(s)) and lists, for each selected protein and replicate, the spectral count and normalization factor, followed by the Average Normalization Factors.

To select native or normalized Spectral Count Quantitation, select View > Protein
114. logical samples and the number of replicate analyses performed on each sample. Results files from all replicates of the same biological sample are then combined for further analysis.

Field and Description:
Spectra were searched against a decoy database: This box allows the user to specify that a decoy database search was performed. Searching against a decoy database is used when ProteoIQ calculates False Discovery Rates (FDR) for the peptides and proteins identified. Specify if the database was concatenated or if separate target and decoy databases were used.
Spectra were searched against a non-target database: Specifies if a non-target database search was performed. This option is useful if one expects that the sample contained proteins from organisms other than the target organism (e.g. human keratins in protein preparations).
Combine All Results Files (MudPIT): Groups all database search results as if from a single biological sample. This is useful when peptides from a single biological sample are subjected to multidimensional chromatography tandem mass spectrometry with off-line fractionation between the chromatographic steps. For example, the peptides are separated into 10 fractions by strong cation exchange chromatography, each of which is analyzed by reversed-phase chromatography tandem mass spectrometry. This experiment would result in a series of 10 raw files, which would be processed into peak list format and searched separatel
115. lows the selection of a previously indexed database or the creation of a new database.
Database Name: Enter the name of the new database in the text field provided. Once indexed, this name will be available in the select database drop-down menu.
Database Type: Specify if the database is a separate target or decoy database or a concatenated target/decoy database. Decoy databases are used to calculate False Discovery Rates.
Description: A description of the database can be provided in the description text box. Text entered here will be stored with the indexed database.

Specify Parse Rules

The Specify Parse Rules section allows regular expressions to be created and saved to instruct ProteoIQ how to extract the accession numbers, protein names, and descriptions from the FASTA database (Figure 43). Table 19 describes the fields.

Figure 43: Specify Parse Rules. Prompt: Please specify the parse rules to extract the protein IDs and names from the sequence database. Fields shown: Select Existing (DEFAULT), Description (DEFAULT), Protein ID Parse Rule, Protein Name Parse Rule, Decoy Identifier Rule (REV). Also shown: Test Parse Rules, Specify Hyperlinks, Save Database.

ProteoIQ comes preloaded with parse rules for commonly used database structures, and custom parse rules can be created as follows.

Steps for creating custom parse rules
Assign a
116. Figure callouts: Parameters (in the example, Max Protein FDR 1.0 and Starting coverage level 3) are recorded to help keep track of how protein sets are created. Protein set data, such as the number of protein groups (1881) and proteins (3576), are recorded.

Protein Set Navigation Bar

The protein set navigation bar is accessible from the top of the Protein/Peptide Set Viewer, as shown in Figure 6. Each protein set is listed in the Navigation Bar, and the individual protein sets are displayed by clicking on the protein set name. To compare two or more protein sets, simply click on the check box located to the left of the protein set name, then select the Comparison Function.

Figure 6: Protein Set Navigation Bar. Callouts: Use check boxes to activate protein sets for comparison features such as difference or intersection; Rename or remove sets using the comparison panel; Protein sets can be displayed by clicking on the name of the set.

Protein Set Viewer

Figure 7 shows the protein set viewer. Identified proteins are organized by Protein Group, and all relevant data, such as Score, Spectral Count, and Expression Ratios, can be accessed within the protein set viewer. Create custom views using the Protein Table Options.

Figure 7: Protein Set Navigation Bar. Select proteins tab to All columns can
117. n assigned number which is generated when the search is performed. Often, determining which DAT file corresponds to a particular peak list may be difficult; therefore, we recommend copying the search results into another directory prior to running ProteoIQ. This will ensure that the data may be easily accessed for multiple analyses at a later date. There is also no need to rename your DAT files. When you select the file in the Available Results Files Viewer, ProteoIQ extracts the name of the peak list from the DAT file and renders it in the window, thus making it easy to determine which database results correspond to each experiment.

SEQUEST (OUT, DTA, PARAMS, and SRF)

ProteoIQ supports either the binary search results format (SRF) or the human-readable format of SEQUEST (OUT, DTA, and PARAMS). For the OUT format, each search result folder should contain the OUT, DTA, and PARAMS files.

Note: SEQUEST folders often contain very large numbers of files. For this reason, we recommend copying these folders to a separate directory prior to running ProteoIQ.

BioWorks versions 3.2 and greater support the creation of SRF files. ProteoIQ currently supports parsing of SRF files generated directly from the SEQUEST search or via conversion of the OUT and DTA results folders to SRF format.

X!Tandem (XML)

X!Tandem generates an XML results report that can be directly imported into ProteoIQ
118. n set has been created, open the 1D Gel View by selecting Plots > Gel Based Proteomics > 1D Gel View.

How to display a 1D Gel View

1. Select the biological samples (Gel Slices) to display from the Choose One or More Groups for Comparison menu.
2. Choose the metric to display. Note: the darkness of the cell is dependent upon the selected value.
3. Click View Chart.

Figure 68: 1D Gel View. Callouts: Intensity based on selected metric; Theoretical Weight Range on Y axis; Experimental Gel Band on X axis; Select Metric to Display (Total Spectral Count (SC) in the example); Enter # Gel Bands and Starting kDa; Single click to select the proteins in the Protein Set Viewer; Likely truncated or degraded protein products; Proteins with PTMs.

Interpreting the 1D Gel View

When gel slices (Biological Samples) are selected, they appear on the X axis. The theoretical gel bands created by selecting the # Gel Bands and Weight Range are displayed on the Y axis. Each cell is then color coded by the amount of the metric chosen in the Choose Data Source drop-down menu. In the example shown in Figure 68, Total Spectral Counts is selected. For each cell, the spectral counts for all proteins appearing in the theoretical molecular weight range and experimental gel slice are summed. The darker th
119. ne contains cell type one and cell type two, and experiment two contains cell type one and cell type three. With a multiplicity of two, ProteoIQ will compare both cell type two and cell type three to cell type one for quantitation. Note: The only restriction is that the control be the same for all experiments. Therefore, one can make comparisons across multiple Plexes on an apples-to-apples basis.
Indicates the number of replicates performed. These can be either internal or external.
Select whether replication was done internally or externally. Internal replication example: an iTRAQ 4 reagent is used to analyze two samples in duplicate. External replication indicates that multiple MS/MS experiments were performed for the same set of samples.
Indicates the mass window used for selection of the reporter ions. Note: A Mass Tolerance of 0.1 for a reporter ion mass of 114.1 indicates that ProteoIQ will search for the reporter ion from 114.0 to 114.2.

Describe Correction Factor

Correction factors are designed to adjust for batch-to-batch deviations in purity of the reporter reagents. Purity coefficients are typically supplied by the reagent manufacturer to indicate the percentages of each reporter ion that differ by -2, -1, +1, and +2 from the mass of each reporter ion. ProteoIQ will automatically adjust the peak intensities of each reporter ion based on the correction factors selected prior to quantitation (Figure 36).
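The Mass Tolerance note above (a tolerance of 0.1 around a reporter ion mass of 114.1 gives a search window of 114.0 to 114.2) can be illustrated with a short Python sketch. The function names and the peak list are hypothetical, and picking the most intense peak inside the window is an assumption of this sketch; the guide does not say how ProteoIQ chooses between several peaks in the window.

```python
def reporter_window(reporter_mz, tolerance):
    """Return the (low, high) m/z bounds searched for a reporter ion."""
    return reporter_mz - tolerance, reporter_mz + tolerance


def find_reporter_intensity(peaks, reporter_mz, tolerance):
    """Return the intensity of the reporter ion, or None if it is absent.

    peaks: iterable of (m/z, intensity) pairs from one MS/MS spectrum.
    Assumption: when several peaks fall in the window, the most intense
    one is taken as the reporter ion.
    """
    low, high = reporter_window(reporter_mz, tolerance)
    in_window = [intensity for mz, intensity in peaks if low <= mz <= high]
    return max(in_window) if in_window else None


# Hypothetical low-mass region of an MS/MS spectrum.
peaks = [(113.8, 50.0), (114.11, 1000.0), (114.3, 70.0), (117.12, 1500.0)]
print(find_reporter_intensity(peaks, 114.1, 0.1))  # 1000.0
print(find_reporter_intensity(peaks, 115.1, 0.1))  # None
```

A peak at 114.3 lies outside the 114.0 to 114.2 window and is ignored, which is the behaviour the note describes.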
120. ng the validity of the Pro FDR calculations.

Figure 64: Protein False Discovery Rate Plot. Callouts: Select Max Starting Peptide Coverage; Output Notes Showing Calculation of Pro FDR. Plot panel: Protein FDR for Coverage of 3 or More Peptides, showing Target Proteins and Decoy Proteins. For each starting coverage level, the Output Notes list every score threshold tried together with the resulting FDR and the number of target and decoy protein groups (PG), for example: Trying metric 36.47394, FDR 0.94786733, target PG 422, decoy PG 4. The notes end with the final FDR and minimum metric for each coverage level, followed by DONE.
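The values in the Output Notes of Figure 64 are consistent with a simple ratio of decoy to target protein groups, expressed as a percentage: 4 decoy and 422 target protein groups give an FDR of about 0.948. The Python sketch below reproduces that arithmetic. It is an inference from the figure rather than a documented ProteoIQ formula, and the function name is hypothetical.

```python
def protein_fdr_percent(target_groups, decoy_groups):
    """Protein FDR as a percentage: 100 * decoy groups / target groups.

    Assumption: this ratio is inferred from the Output Notes values
    (for example target PG 422, decoy PG 4, FDR 0.94786733).
    """
    if target_groups <= 0:
        raise ValueError("target_groups must be positive")
    return 100.0 * decoy_groups / target_groups


# Two score thresholds from the Output Notes, checked against a 1.0 limit.
for target, decoy in [(422, 4), (424, 5)]:
    fdr = protein_fdr_percent(target, decoy)
    print(target, decoy, round(fdr, 4), fdr <= 1.0)
```

Under this reading, the threshold that admits 424 target and 5 decoy protein groups exceeds a Max Protein FDR of 1.0, while the threshold that admits 422 and 4 satisfies it, which matches the final FDR reported in the notes.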
121. Select Proteins In View

The Select Proteins In View function enables the selection of proteins in the protein set view that are displayed within the viewable area of each plot. This function is particularly useful for selecting proteins within the Relative Expression Differences and Relative Expression Changes expression plots.

Exporting Results

To access the Export Menu, select File > Export. Figure 54 shows that the Export Menu is divided into three sections:
Protein Sets to Export
Data to Export
Parameters for Creation of Spectral Images

Figure 54: Export Menu. Callouts: Select protein set to export; Export peptides; Export proteins; Create web pages; Export FASTA DB; Export MS/MS spectra; Export files according to protein/peptide table. Check boxes shown include All Peptides Parsed from Results Files (tab delimited txt), Protein List (tab delimited txt), Include Peptides For Top Proteins, Generate Individual Protein Files, and Generate Spectral Images (png).
122. Figure (protein table screenshot listing Homo sapiens proteins with their spectral count columns). Callout: Filter by Replicates In or Not In a Biological Sample.

Setting the Minimum Percentage of Replicates is performed as follows: If three replicates are performed and the user wants to create a protein set containing proteins ONLY identified in 2 of the 3 replicates th
123. odification page of the New Project Wizard. The peptide must contain all of the reporter ions present. For example, in an iTRAQ 2-plex experiment, both the reporter ion at 114.1 and the reporter ion at 117.1 must be detected for the peptide to be considered a Valid Reporter Peptide.

Pseudo Spectral Counts

The concept of pseudo spectral counts is intended to provide a measure of sampling when reporter ion based quantitation is performed. Since reporter ion quantitation is a mixture-based technique (i.e., all samples are mixed together prior to analysis), a traditional spectral count value is not feasible. To circumvent this, the pseudo spectral count takes into account the number of peptides and the relative reporter ion intensities for each biological group, allowing a researcher to evaluate a protein's sampling. Figure 75 shows a flow chart of how pseudo spectral counts are calculated for a binary comparison of two biological samples with two replicates.

Figure 75. Calculation of Pseudo Spectral Counts
Example: iTRAQ 2-plex experiment
  Sample 1: label with 114. Sample 2: label with 117.
  Analyze in duplicate (Rep 1, Rep 2).
  Peptides matching Protein X: Peptide 1, Peptide 2, Peptide 1.
  Extract Reporter Ion Intensities:
    114.1: 1000, 3000, 500
    117.1: 1500, 4000, 1000
  Calculate Average Reporter Ion Intensity
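As a minimal sketch of the first stages of Figure 75, the Python below checks whether a spectrum contains every expected reporter ion within a mass tolerance and then averages the extracted intensities per reporter ion. The function names and the way the tolerance is applied are illustrative assumptions, not ProteoIQ's implementation, and the later stages of the pseudo spectral count calculation are not reproduced here.

```python
from statistics import mean

# Reporter ion intensities as read from the Figure 75 example
# (three peptide spectra matching Protein X), keyed by reporter ion m/z.
FIGURE_75_INTENSITIES = {
    114.1: [1000.0, 3000.0, 500.0],
    117.1: [1500.0, 4000.0, 1000.0],
}


def is_valid_reporter_peptide(observed_mz, expected_mz, tolerance_da):
    """Return True if every expected reporter ion has a peak within tolerance.

    Hypothetical helper: a peptide counts as valid only when all reporter
    ions are detected, as the text above requires.
    """
    return all(
        any(abs(obs - exp) <= tolerance_da for obs in observed_mz)
        for exp in expected_mz
    )


def average_reporter_intensity(intensities_by_ion):
    """Average the extracted intensities separately for each reporter ion."""
    return {mz: mean(values) for mz, values in intensities_by_ion.items()}


averages = average_reporter_intensity(FIGURE_75_INTENSITIES)
```

With the Figure 75 values, the 114.1 channel averages to 1500 and the 117.1 channel to roughly 2167.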
124. on. Select the columns to display by checking the box adjacent to each column name. Table 23, Table 24, Table 25, and Table 26 provide descriptions of each data type.

Figure 51. Protein Column Selection Menu
[Screenshot: Select Columns to Customize the Protein Set Viewer. View > Protein Table Options > Select Columns opens the Protein Columns for Label Free Quantitation dialog, with General Information (Protein Description, Quality Metrics), Label Free Quantification (Individual Column Options, Normalization Options), and Data Groups sections, and the Protein Columns for Reporter Ion Quantitation dialog.]
125. pectral Counts are first normalized by comparing the total pseudo spectral counts for all proteins identified in each biological group. The normalized Pseudo Spectral Counts are then used to calculate Relative Expression and Log2 Relative Expression. For more information, see the Sample Normalization topic.

Customize Protein and Peptide Tables in Protein Sequence Viewer

Figure 52 shows the Protein and Peptide Column Selection Menu in the Protein Sequence Viewer. The Column Selection Menu is accessible by choosing View > Customize Protein Table or View > Customize Peptide Table. Select columns to display by checking the box adjacent to each column name. Table 22 contains descriptions of the data types in the Customize Peptide Table Menu. Table 23, Table 24, Table 25, and Table 26 provide descriptions of the data types in the Customize Protein Table Menu.

Figure 52. Peptide and Protein Table Column Selection Menu in Protein Sequence Viewer
[Screenshot: Customize Protein Table and Customize Peptide Table dialogs. Options shown include Total Spectral Count, Spectra Intensity, Number of Peptides, Total Intensity, and Spectra Query.]
126. porter Ion Quantitation

The New Project Wizard is used to create a new ProteoIQ project. Step-by-step instructions are provided to assist the user in grouping and importing database search results. Navigating through the New Project Wizard occurs in seventeen steps:

1. Describe Project
2. Describe Proteome
3. Describe Experiment
4. Describe Correction Factor
5. Describe Samples
6. Map Ions to Groups
7. Choose Database
8. Search Result Options
9. Select Target Files for Parsing
10. Select Decoy Files for Parsing
11. Select Non Target Files for Parsing
12. Select Label Modification
13. Peptide Parsing Options
14. Parse Files
15. Parse MS Ion Data
16. Cluster Peptides
17. Peptide Protein Viewer

Figure 3. New Project Wizard: Reporter Ion Quantitation
[Screenshot callouts: Define parameters for project creation. Wizard provides step-by-step instructions for importing database search results into ProteoIQ. Dialog fields: Choose Reporter Method (iTRAQ, TMT, or Custom), Method Name, Number of Ions, Experiment Multiplicity, Number of Replicates, Replication Method (External: 1 group per label, replicates are separate results; Internal: 1 replicate per label, 1 set of results), Mass Tolerance (Da) for matching reporter ions in each spectrum: 0.1.]
127. pot that is twice the size and darkness of a protein with 100 spectral counts.

Note: Single-clicking on a protein spot selects the protein in the Protein Set Viewer. Double-clicking opens the Protein Sequence Viewer.

1D Gel View

The 1D Gel View (Figure 68) is used to display the results of a 1D GeLC analysis. The following sections discuss how to set up a ProteoIQ project for use with the 1D Gel View:

• Experimental Approach for GeLC Analysis
• Preparing Results for GeLC Analysis in ProteoIQ
• How to Display a 1D Gel View
• Interpreting the 1D Gel View

Experimental Approach for GeLC Analysis

The underlying assumption when using this plot is that the following experimental approach has been used:

1. Proteins were separated on a 1D gel.
2. The gel lane was sliced into sections.
3. Proteins in each gel slice were in-gel digested to produce peptides.
4. The peptides from each gel slice were independently analyzed by LC-MS/MS.
5. Each MS/MS file generated one or more database search results.

Preparing Results for GeLC Analysis in ProteoIQ

1. Set the number of Biological Samples equal to the number of gel slices in the Describe Proteome window of the New Project Wizard.
2. Name the Biological Samples according to gel slice. It is recommended that you number the groups using strict alphanumeric ordering (01, 02, ... 10).
3. Group the database search results by gel slice in the Select Files for Parsing window.
4. Once the protei
128. project creation, the user must specify settings for the selection of reporter ions for isobaric tag based quantitation. These settings include Mass Tolerance and Reference Group. After the project is created, the user may modify these settings by selecting Modify Reporter Quantitation Settings. The following steps describe how to modify reporter quantitation settings (Figure 74):

1. Select Edit > Modify Reporter Quant Settings.
2. Set the desired Mass Tolerance and Reference Group.
3. Click Apply.
4. To recalculate Log2 expression values using the new settings, you MUST CREATE A NEW PROTEIN SET.
5. To create a new protein set, either:
   a. Highlight all of the proteins, right-click, and select Copy to New Protein Set; or
   b. Select the protein set in the Filter Previous Set drop-down menu and click Create New Protein Set.

Figure 74. Modify Reporter Quant Settings Dialogue Box
[Screenshot: Please specify the reporter quantitation settings. Fragment Mass Tolerance: 0.05. Reference Group: Control. Total Number of Valid Reporter Peptides: 14012 of 14613. Dialog note: These settings only affect protein sets created after the settings are applied.]

Note: Total Number of Valid Reporter Peptides indicates the number of peptides that have all detectable reporter ions within the specified Fragment Mass Tolerance and that have the modification selected in the Select Label M
129. r peptide assignments from different database search engines and thus different scoring systems. For Mascot search results, F values are calculated using the Ion Score, Identity Score, and Average Identity Score. For SEQUEST results, the Xcorr values are first normalized (Xcorr') to account for the dependence of Xcorr on peptide size. F values are then calculated using Xcorr', ΔCn, SpRank, and ΔM. F values can be employed in ProteoIQ to calculate peptide probabilities and/or false discovery rates.

References

Keller A, Nesvizhskii AI, Kolker E, Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem. 2002;74:5383-5392.

Legal Notices

Copyright 2009 BIOINQUIRE LLC. All rights reserved. This manual and the software described in it are the exclusive property of BIOINQUIRE LLC. BIOINQUIRE LLC (including BIOINQUIRE LLC officers, employees, agents, directors, independent contractors, affiliates, distributors, and successors) has not made or granted any express warranties concerning the information contained in this manual. The contents of this manual are for information purposes only and are subject to change at any time. BIOINQUIRE LLC shall not be liable for any direct, indirect, lost profits, consequential, exemplary, incidental, or punitive damages, regardless of the form of action, as a result of any errors or inaccuracies in this manual. No pa
130. r Previous Set drop-down menu. Click Create New Protein Set.

Figure 58. Setting Peptide FDR in Filtering Pane
[Screenshot: False Discovery Rate Filters (Metric to Use for FDR: Mascot ions score; Max Peptide FDR %; Max Protein FDR %; Starting Pep Coverage for ProFDR) and Probability Filters (Min Peptide Probability, Min Protein Probability, Min Protein Group Probability).]

Assessing the Quality of Peptide False Discovery Rate Calculations

Peptide False Discovery Rate Plots (Figure 59) should be used to evaluate the validity of the peptide false discovery rate calculations. To access the Peptide False Discovery Rate Plot, select Plots > Proteome Validation > Peptide FDR. To view distributions of peptides from target and decoy database search results for different Score Metrics, select the Score Metric from the Select the Plot to View drop-down menu.

Figure 59. Peptide False Discovery Rate Plot
[Plot: Number of Peptides and Peptide FDR % versus Peptide Scores. Series: Target Peptides, Decoy Peptides, Peptide FDR %. Plot types: Line Plot, Bar Chart.]

Figure 59 shows an exampl
131. r ProteoIQ Project
[Figure callout: Choose auto protein set creation or manual]

Table 13. Peptide Parsing Options
  Field: Description
  Min Score: Removes peptides below the Mascot Ion Score, SEQUEST XCorr, and X!Tandem Hyper Score.
  Min Score Differential: Removes peptides below the difference in score between the most significant match and the second best match to a given spectrum.
  Min Peptide Length: Removes peptides having fewer amino acids than the setting.

Table 14. Initial Protein Set Options
  Field: Description
  Min Peptide Probability: Removes peptides below the specified Peptide Probability.
  Min Protein Probability: Removes proteins below the specified Protein Probability.
  Min Protein FDR: Removes proteins above the specified Protein False Discovery Rate.

Table 15. Options for MS/MS Spectral Creation and other parameters
  Fields: Create No Ion Data; Extract Ion Data For Peptides ID'd From a Single Spectrum; Create Ion Data For Top Scoring; Create Ion Data For All Peptides; Protein Set Name; Results Path; Auto Generate Initial Protein Set
  Field: Description
  Create No Ion Data: This is the default setting for ProteoIQ and will NOT create any MS/MS spectra. This setting will create the smallest possible ProteoIQ project.
  Extract Ion Data For Peptides ID'd From a Single Spectrum: Creates an MS/MS spectrum for a peptide identification that was generated from a single MS/MS event, i.e., with a spectral count of one.
  Create Ion Data For Top Scoring: Creates MS/MS spectra for top scoring peptides ONLY. The mo
132. r Score required to meet the user-specified false discovery rate.
2. Peptides are then removed based upon the assigned peptide filters. For example, if a minimum peptide ion score of 40 is selected, all peptides with a score below 40 are removed prior to protein assembly. If a peptide probability filter is assigned, peptides are removed from the dataset based on the filter parameters.
3. Once Protein Groups are assembled, protein groups are selected based upon the user-specified Protein False Discovery Rate.
4. Protein filters are then applied to the proteins passing the False Discovery Rate. Protein probability filters are applied at the same time as other protein filters, such as number of peptides or sequence coverage.
5. If the maximum number of protein groups matching a peptide is assigned, this filter is applied after the protein filtering.

Note: Filters are based on the TOTAL RESULTS. Selecting a Biological Sample does NOT restrict the filters to the selected sample. To filter specific Biological Samples, you must use the Copy Selected to New Protein Set feature.

Selecting Protein Sets to Filter

Filters can be applied to an existing protein set or to all of the peptides extracted from the database search results (No Filter), as shown in Figure 47. Table 20 describes the fields in the Selecting Protein Sets Filter window.

Figure 47. Protein Set Selection Window for Filtering
Select protein set from filter previo
133. r example, if Biological Sample 1 is selected as the Comparison Group and Biological Sample 2 is selected as the Reference Group, the Log2 value is calculated as follows:

    Log2(SpC in BioGroup 1 / SpC in BioGroup 2)

When more than one Biological Sample is selected as a reference, the Data Source is averaged for each protein across the selected Biological Samples.

  Expression Level Data Source: Specifies which type of data to use for the Log2 relative expression calculation. Options are Non-Normalized Average Spectral Counts or Normalized Spectral Counts. Note: for reporter quantitation, the Data Source field is deactivated and expression levels are calculated as described in Quantitation Overview: Reporter Ion Quantitation.
  Show Standard Deviations: Adds Log2 Standard Deviations to the plot.
  Show Proteins in All Groups: Hides proteins that are NOT present in all the selected Biological Samples.
  Show Connections: Adds lines between common proteins found in more than one group.

Relative Expression Differences Dot

The Relative Expression Differences Cluster Dot Plot is used to compare the DIFFERENCE in Log2 Relative Expression values between two or more Biological Samples (Figure 81). While the Relative Expression Changes Cluster Dot Plot displays the Log2 expression values for proteins in multiple Biological Samples, the Relative Expression Differences Plot asks the question: What is the difference in Log2 Rel
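The Log2 relative expression calculation described on this page can be sketched in Python as follows. This is an illustration under stated assumptions rather than ProteoIQ's code: the function name is hypothetical, the reference value is the plain mean of the selected reference samples (as the text describes), and returning None for zero or negative counts is a placeholder, since the guide does not say here how such cases are handled.

```python
import math


def log2_relative_expression(comparison_spc, reference_spcs):
    """Log2 of the comparison value over the mean of the reference values.

    comparison_spc: spectral count (SpC) for the comparison group.
    reference_spcs: SpC values for one or more reference groups; they are
        averaged when more than one reference sample is selected.
    """
    reference = sum(reference_spcs) / len(reference_spcs)
    if comparison_spc <= 0 or reference <= 0:
        return None  # ratio undefined; handling of zero counts is an assumption
    return math.log2(comparison_spc / reference)


# One reference group: 40 vs 10 spectral counts gives a 4-fold change.
single_reference = log2_relative_expression(40, [10])

# Two reference groups: 30 vs mean(10, 20) = 15 gives a 2-fold change.
averaged_reference = log2_relative_expression(30, [10, 20])
```

A value of 1 therefore corresponds to a two-fold increase relative to the reference, and a value of -1 to a two-fold decrease.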
134. re, SEQUEST Xcorr, and X!Tandem Hyper Score
  3. Peptide probability

Max Peptide FDR %: Peptide false discovery rates are calculated by comparing the frequency of peptide identifications at each score between a target and decoy (reversed or scrambled) database. The false discovery rate is defined as the number of decoy peptides divided by the number of target peptides, multiplied by 100%. When a max peptide FDR is defined, ProteoIQ selects peptide score thresholds such that the proportion of decoy peptide matches relative to the number of target peptide matches does not exceed the user-defined max peptide FDR.

Max Protein FDR %: ProteoIQ utilizes the ProValT algorithm to calculate protein false discovery rates, as described in Weatherly et al. ProValT assigns a protein false discovery rate based on the distribution of proteins with a minimum peptide coverage (c), identified by peptides in score bins (B), between a target and decoy database search. By setting a max protein FDR %, ProteoIQ selects peptide coverage levels and peptide score thresholds such that the proportion of decoy protein matches relative to the number of target protein matches does not exceed the user-defined max protein FDR.

Reference: Weatherly DB, Atwood JA, Minning TA, Cavola C, Tarleton RL, Orlando R. A heuristic method for assigning a false discovery rate for protein identifications from Mascot database search results. Mol Cell Proteomics. 2005;4:762-772.
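The peptide FDR definition above (decoy matches divided by target matches, times 100%) can be sketched in Python. The threshold search shown is an assumed, simplified strategy: it scans candidate scores from low to high and returns the first one whose FDR is within the limit. How ProteoIQ actually selects thresholds is not specified in this section, and the function names are hypothetical.

```python
def peptide_fdr_percent(target_scores, decoy_scores, threshold):
    """FDR (%) among matches scoring at or above the threshold."""
    targets = sum(1 for s in target_scores if s >= threshold)
    decoys = sum(1 for s in decoy_scores if s >= threshold)
    if targets == 0:
        # No target matches left: treat any surviving decoy as unbounded FDR.
        return float("inf") if decoys else 0.0
    return 100.0 * decoys / targets


def lowest_threshold_within_fdr(target_scores, decoy_scores, max_fdr_percent):
    """Lowest observed score whose FDR does not exceed max_fdr_percent."""
    for threshold in sorted(set(target_scores) | set(decoy_scores)):
        fdr = peptide_fdr_percent(target_scores, decoy_scores, threshold)
        if fdr <= max_fdr_percent:
            return threshold
    return None  # no threshold satisfies the requested FDR


# Illustrative scores only (not taken from the guide).
targets = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
decoys = [5, 15, 25]
threshold_at_5_percent = lowest_threshold_within_fdr(targets, decoys, 5.0)
```

In this toy data set, a 5% limit pushes the threshold up to 30, the first score above every decoy match.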
135. re 50. Peptide Column Selection Menu
[Screenshot: Select Columns to customize the Peptide Set View. View > Peptide Table Options > Select Columns opens the Select Peptide Columns dialog. Available columns: Is Top, Observed Mass, Mr (expt), Mr (calc), Charge State, Peptide Score, Score Delta, Discriminant Score (F Value), Peptide Prob, Peptide Sequence, Modifications, Total Spectral Count, Reporter Ion Intensity, Reporter Quant Log2 Rel Expression, Spectra Query, Biological Group, Replicate, Spectra Source File, DB Search Result File.]

Table 22. Data Types Available in the Peptide Column Selection Menu
  Field: Description
  Is Top: Indicates whether a peptide assignment has the top Peptide Score. The same peptide is often identified multiple times. To display only the top scoring peptides and remove redundancy, select Top Scoring Results in the Peptide Table Options.
  Observed Mass: The observed peptide m/z that was selected to generate the MS/MS spectrum, otherwise known as the precursor m/z.
  Mr (expt): Molecular mass calculated from the observed mass. For example, if Observed Mass = 1000.00 and Charge State = 3, then Mr (expt) = 3000.00.

  Field: Mr (calc), Charge State, Peptide Scor
136. re Groups: Selects the Biological Samples to display as columns in the heat map for comparison. When a Biological Sample is selected, the protein's Data Source value (Spectral Counts, Peptides, etc.) is displayed in tabular format underneath the Biological Sample column header.

  Expression Level Data Source: Specifies which type of data to display for the proteins. Options are Total Spectral Counts, Non-Normalized Average Spectral Counts, Normalized Spectral Counts, Number of Peptides, and Percent Sequence Coverage. Note: for reporter quantitation, the options are Non-Normalized Pseudo Spectral Counts and Normalized Pseudo Spectral Counts.
  Log2 Scale: Displays Log2 expression values calculated using the selected Data Source.
  Absolute Scale: Displays the native Total Spectral Counts, Non-Normalized Average Spectral Counts, Normalized Spectral Counts, Number of Peptides, or Percent Sequence Coverage. Note: for reporter quantitation, the options are Non-Normalized Pseudo Spectral Counts and Normalized Pseudo Spectral Counts.
  Number of Partitions: Sets the number of color gradients. For example, if the Number of Partitions is set to 4, the maximum spectral count is 100, and the minimum spectral count is 0, the color partitions are set as follows: (100 - 0) / 4 = 25. The first color ranges from 1 to 25, the second from 26 to 50, etc.

Relative Expression Changes Cluster Dot
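The Number of Partitions arithmetic can be sketched in Python as below. This is a minimal illustration, assuming equal-width partitions with inclusive upper bounds (so that 25 falls in the first color and 26 in the second, as in the example). Placing the minimum value itself in the first partition is an assumption, and the function names are hypothetical.

```python
import math


def partition_width(minimum, maximum, partitions):
    """Width of each color partition: (max - min) / number of partitions."""
    return (maximum - minimum) / partitions


def partition_index(value, minimum, maximum, partitions):
    """Zero-based index of the color partition that a value falls into."""
    width = partition_width(minimum, maximum, partitions)
    if width == 0:
        return 0  # all values identical; a single color applies
    index = math.ceil((value - minimum) / width) - 1
    # Clamp so the minimum lands in the first partition and the maximum in the last.
    return max(0, min(partitions - 1, index))


width = partition_width(0, 100, 4)          # 25.0, as in the example
first_color = partition_index(25, 0, 100, 4)
second_color = partition_index(26, 0, 100, 4)
```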
137. [Figure residue: Filter Pane fields (Min Score Differential, Min Peptide Length (AA), Min Spectra Per Peptide, Max Proteins Per Peptide, Max Pro Groups Per Peptide, Charge States) and the Peptide Set Viewer right-click menu (Copy Selected, Select Matching Proteins, Copy Selected to New Protein Set)]

Validating peptide identifications by Score in the Filter Pane

1. Enter the Min Peptide Score in the Filter Pane.
2. Select Create New Protein Set.

Validating peptide identifications by Score in the Peptide Set Viewer

1. Sort peptides by Score by selecting the Score column header.
2. Highlight to select the peptides of interest.
3. Right-click and select Copy Selected to New Protein Set.

Combining Score Filters with Charge State

A common approach for validating peptide identifications is to combine Score Filters with Charge State Filters. The assumption is that as charge increases, the size of the peptide will also increase. Sinc
138. resholds between a Target and Decoy (reversed or scrambled) database. The false discovery rate is defined as the number of decoy peptide or protein identifications divided by the number of target peptides or proteins, multiplied by 100%.

Reference: Weatherly DB, Atwood JA, Minning TA, Cavola C, Tarleton RL, Orlando R. A heuristic method for assigning a false discovery rate for protein identifications from Mascot database search results. Mol Cell Proteomics. 2005;4:762-772.

Data File Formats

Table 7 describes the file formats supported in ProteoIQ.

Table 7. File formats
  File Extension | About | Location and Use
  .dat | Mascot (Matrix Science) database search result format | Import in New Project Wizard
  .out | Thermo Fisher database search result format containing peptide information | Import in New Project Wizard
  .dta | Thermo Fisher MS/MS peak list format used to create the MS/MS spectral view in ProteoIQ | Import in New Project Wizard
  .srf | Thermo Fisher BioWorks search results file | Import in New Project Wizard
  .piq | ProteoIQ project extension | Saved when a project is created. Select this file to open a project.
  .fasta | Database format | Export
  .obj | ProteoIQ project file | Required to be present in the project folder, but not selected to open a project
  .inx | ProteoIQ index file | Required to be present in the project folder, but not selected to open a project
  .xml | X!Tandem database search results output format | Import in New Project Wizard

How to Use ProteoIQ
139. ributions for peptides of a given charge state. Select the Charge State of interest in the Select the Plot to View drop-down menu.

Figure 56. Peptide Probability Plot
[Plot: Discriminant Value Distributions for Charge 2; y axis: Number of Peptides. Callouts: Select charge to view Discriminant Plots for each charge state. Select Charge All to see an average of all charge states. View Sensitivity & Error Plots to help determine where to set a probability cut-off.]

Figure 57 shows an example of how the Discriminant Score Distributions should be used to evaluate whether probability-based peptide identification is reliable for your data set. The Discriminant Score Distribution Plot contains three data series:

1. Distribution of experimental discriminant scores (RED)
2. Predicted distribution of negative peptide assignments (GREEN)
3. Predicted distribution of positive peptide assignments (BLUE)

For the peptide probabilities to be accurate, the plot should show good separation between the predicted positive and predicted negative distributions, and both distributions should fit well with the distribution of experimental peptide discriminant scores.

Figure 57. Example Discriminant Score Distributions
[Panels: Poor, Good, Excellent]
140. [Table of contents residue]
Sharing and Moving a ProteoIQ Project
Creating a New Project: Overview
What You Need to Create a New Project
Creating a New Project: Label Free Quantitation
  Describe Project
  Describe Proteome
  Describe Samples
  Choose Database
  Search Results Options
  Select Files for Parsing
  Peptide Parsing Options
  Parse Files/MS Ions Data
  Cluster Peptides
Creating a New Project: Reporter Ion Quantitation
  Describe Experiment
141. rsing Options, Parse Files/MS Ions Data, Cluster Peptides, Peptide/Protein Viewer

Describe Experiment

The Describe Experiment section contains fields for specifying how the reporter quantitation experiment was performed: select the reporter method, specify a custom method, and set replicates and multiplicity. Figure 35 shows the Describe Experiment section of the New Project Wizard. Table 16 describes the reporter methods and Table 17 describes the fields.

Figure 35. Describe Experiment Section
[Screenshot callouts: Select commercial reporter ion labels for quantitation. Perform quantitation using custom labels. Set multiplicity level. Specify number of replicates. Assign internal or external replicates. Dialog fields: Choose Reporter Method (iTRAQ, TMT, or Custom), Method Name, Number of Ions, Experiment Multiplicity, Number of Replicates, Replication Method (External: 1 group per label, replicates are separate results; Internal: 1 replicate per label, 1 set of results).]
142. rt of this manual, including text and images, shall be copied without the prior written consent of BIOINQUIRE LLC.

Microsoft Windows and Vista are registered trademarks of Microsoft Corporation. Mascot is a registered trademark of Matrix Science Limited Corporation. SEQUEST is a registered trademark of the University of Washington. Java is a registered trademark of Sun Microsystems, Inc. iTRAQ is a registered trademark of Applied Biosystems. Tandem Mass Tags is a registered trademark of Proteome Sciences.

Index

peptide, 70
Spectra, 70
of Replicates, 70
Sequence Coverage, 70
dat, 37, 42
out, 42
accession number, 80
accession numbers, 62
All parsed peptides from results files, 86
Available Results Files viewer, 46, 57, 58
biological sample, 41, 42, 43, 70, 117
Biological Sample, 70
Charge state, 69
Charge State, 21, 76, 77, 90, 91
Clear Selected, 24
Cluster parsed peptides to my protein database, 45
Combine all results files (MudPIT), 42
Comparison Function, 16, 73
Copy, 24
Copy to New Protein Set, 24
Create ion data for ALL peptides, 48, 61, 63, 69, 70
Create ion data for top scoring, 48, 61, 63, 68, 69, 70, 71, 72, 74, 76, 77, 78, 80, 81, 82, 86, 87, 88, 96, 97, 105, 106
Create no ion data, 48, 61, 63, 68, 69, 70, 71, 72, 74, 76, 77, 78, 79, 80, 81, 82, 86, 87, 88, 96, 97, 105, 115, 118, 125, 126, 130, 132, 134, 136
Database name, 61
Database type, 61
decoy database, 42, 44, 61, 71, 72
Describe D
143. sent in the source folder. Select only the .piq file to open a ProteoIQ project.

Saving a ProteoIQ Project

There are two options for saving a ProteoIQ Project: Save and Validate Proteome. The following sections describe both saving options and discuss when each should be applied. For a detailed discussion of the difference between Save and Validate Proteome, see the "What is the difference between Save and Validate" discussion section. Figure 21 shows how to access the Save functions. Table 8 provides an overview of the Save functions.

Figure 21. Project Save
[Screenshot: File menu with Save (Ctrl+S), Save As (Ctrl+A), Validate Proteome (Ctrl+V), Properties, and Close.]

To Save a ProteoIQ Project

1. Select File > Save or Save As.
2. Browse to the directory where the file is to be saved.
3. Select Save.

Note: The Save and Save As functions create a file containing ALL of the peptide and protein identifications in the ProteoIQ Project, even if the peptides or proteins are not displayed in the Peptide/Protein Set Viewer. See the "What is the difference between Save and Validate" discussion section for additional information.

To Validate a ProteoIQ Project

1. Select File > Validate Proteome.
2. Browse to the directory where the file is to be saved.
3. Select Save.

Note: Validate Proteome DISCARDS ALL decoy and target peptides, proteins, and spectra that are not displayed in the Peptide or Protein Set Viewer. It is recommended that you Validate pr
144. shows the Describe Correction Factor section of the New Project Wizard.

Figure 36. Describe Correction Factor
[Screenshot callouts: Select previously saved correction factors. Edit settings. Name new settings. Save settings. Dialog text: Please choose from existing reagent correction factor settings or create settings by unlocking the table and entering correction factors for your reagent below. The table lists correction percentages for each label peak (114 to 117). Controls: Stored correction factor settings, Lock Settings, Settings Name.]

The following steps outline how to create a set of correction factors:

1. Deselect Lock Settings.
2. Enter a name for the new correction factor in the Settings Name field.
3. To edit a cell, double-click on the cell and type in the desired value.
4. Click S
145. st significant match by score for each peptide will be linked with an MS/MS spectrum.
  Create Ion Data For All Peptides: Generates an MS/MS spectrum for every peptide identification, regardless of score or rank.
  Protein Set Name: Defines the name of the first protein set.
  Results Path: Specifies the location of the ProteoIQ Project.
  Auto Generate Initial Protein Set: Automates the process of Peptide Parsing and Clustering. If not selected, the user will have to manually initiate Peptide Parsing and Clustering.

Min Score

The user may set a minimum score threshold for extraction of peptides from the database search results. For Mascot, the score threshold refers to Ion Score. For SEQUEST, score indicates XCorr, and when X!Tandem results are used, score refers to Hyper Score.

Min Score Differential %

The Min Score Differential function allows the user to select the degree of difference between the most significant match and the second best match to a given spectrum. For SEQUEST users, this setting is loosely analogous to the delta Cn score. If the top match passes the user-defined threshold, the assignment to that spectrum is included in the final peptide list. However, if the #1 match does not have a score (Ion Score, Xcorr, or Hyper Score) that is higher than the #2 match by the percentage set in Min Score Differential, the assignment to that spectrum is not included in the final peptide list. See Example 1 for how Min Score Differential is applied in ProteoIQ.

Min Score Differential and Multiple DB Search Resul
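A minimal Python sketch of the Min Score Differential check follows. It rests on an assumption that this page does not settle: the percentage difference is taken relative to the top score, by analogy with SEQUEST's delta Cn. The guide's own Example 1 is the authority on how ProteoIQ computes it, and the function name here is hypothetical.

```python
def passes_min_score_differential(top_score, second_score, min_differential_percent):
    """True if the #1 match beats the #2 match by at least the given percentage.

    Assumption: the differential is (top - second) / top * 100, i.e. measured
    relative to the top score. Works for Ion Score, Xcorr, or Hyper Score.
    """
    if top_score <= 0:
        return False  # no meaningful top match to compare against
    differential = 100.0 * (top_score - second_score) / top_score
    return differential >= min_differential_percent


# Top match 50, runner-up 40: a 20% differential passes a 10% setting.
clear_winner = passes_min_score_differential(50, 40, 10)

# Top match 50, runner-up 48: a 4% differential fails a 10% setting.
ambiguous_match = passes_min_score_differential(50, 48, 10)
```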
146. tandard deviation (ASC % Standard Deviation).

The following sections describe how to evaluate replicate results in ProteoIQ:

• Replicate Filters
• Replicate Plots

Replicate Filters

Protein identifications can be filtered based on Replicate results using two methods, as shown in Figure 76. Using the Filter Pane, protein sets can be filtered by Min Percent Replicates.

Figure 76. Filtering by Replicates
[Screenshot callouts: Sort by Standard Deviation or Percent Standard Deviation. Sort by Average Spectral Counts. Set Percent Replicate. The protein table lists Protein Group, Is Top, Sequence Id, Sequence Name, ASC, and ASC Standard Deviation for each protein (rows 1 to 10).]
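The replicate statistics and the Min Percent Replicates filter can be sketched in Python. The sketch assumes the sample standard deviation and treats a protein as identified in a replicate when its spectral count is above zero. Both are assumptions, although a row in Figure 76 with an average of 34.667 and a standard deviation of 0.577 is consistent with replicate counts of 34, 35, and 35 under the sample formula. Function names are hypothetical.

```python
from statistics import mean, stdev


def replicate_summary(spectral_counts):
    """Average spectral count (ASC), standard deviation, and % standard deviation."""
    avg = mean(spectral_counts)
    sd = stdev(spectral_counts) if len(spectral_counts) > 1 else 0.0
    percent_sd = 100.0 * sd / avg if avg else 0.0
    return avg, sd, percent_sd


def passes_min_percent_replicates(spectral_counts, min_percent):
    """True if the protein was identified in at least min_percent of replicates."""
    identified = sum(1 for count in spectral_counts if count > 0)
    return 100.0 * identified / len(spectral_counts) >= min_percent


# Replicate counts consistent with one row of Figure 76 (an inference, see above).
asc, asc_sd, asc_percent_sd = replicate_summary([34, 35, 35])
```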
147. ter; Proteome Summary (Pie); Sequence Coverage (Cluster Dot). [Plots menu residue: Peptide FDR (Line/Bar); Protein FDR (Line/Bar); Peptide Probability (Line/Bar); Quantitation Sampling Correlation (Scatter); Proteome Summary (Pie); Sequence Coverage (Cluster Dot); Replicate Analysis (Scatter); Method Comparison (Dot); 1D Gel View; 1D Gel View By Proteins; 2D Gel View; Group Summary (Venn); Quantitative Expression (Scatter); Quantitative Expression (Heat Map); Quantitative Expression (Bar); Relative Expression Changes (Cluster Dot); Relative Expression Differences (Dot); About.]
Menu Item | Description | Access
Peptide FDR | Opens the Peptide FDR Plot | Plots > Proteome Validation > Peptide FDR
Protein FDR | Opens the Protein FDR Plot | Plots > Proteome Validation > Protein FDR
Peptide Probability | Opens the Peptide Probability Plot | Plots > Proteome Validation > Peptide Probability
Quantitation Sampling Correlation Scatter | Opens the Quantitation Sampling Scatter Plot | Plots > Proteome Validation > Quantitation Sampling Correlation Scatter
Replicate Analysis Scatter | Opens the Replicate Analysis Scatter Plot | Plots > Experimental Reproducibility > Replicate Analysis Scatter
Proteome Summary Pie | Opens the Pie Chart | Plots > Experimental Reproducibility > Proteome Summary Pie
Sequence Coverage Cluster Dot | Opens the Sequence Coverage Cluster Dot Plot | Plots > Experimental Reproducibility > Sequence Coverage Cluster Dot
Menu Item: Method Comparison; 1D Gel View; 1D Gel View by Protein; 2D Gel View; Group Summary Venn Diagram; Quant
148. that correlate to the m/z of predicted doubly charged y ions from the identified peptide. By default, charge is considered [M+H]+, [M+2H]2+, and [M+3H]3+ for 1, 2, and 3, etc. From the peptide sequence, ProteoIQ predicts the m/z of the common fragment ions (y, b, a, c, x, and z). Selecting an ion series labels the MS/MS spectrum if there are ions present of the selected ion series that meet the criteria set in the left panel. ProteoIQ generates a distribution of intensities for the fragment ions present in each MS/MS spectrum. The intensities are binned to create a histogram based on the frequency of intensities observed in the MS/MS spectrum. The minimum intensity percentile requires that peaks in the MS/MS spectrum be above the setting in order to be labeled with an ion series annotation. For example, a minimum intensity percentile of 90% requires that a peak in the MS/MS spectrum have an intensity greater than 90% of all ions in the spectrum to be considered for annotation with an ion series label. Lowering the minimum intensity percentile allows ions of lesser intensity to be considered for ion series annotation. Specifies the error window for labeling of fragment ions with ion series annotations. Units are Da. Field: Label Reporter Ions; Label Peaks with Mass to Charge; Selected Neutral Losses; Label Regardless of Intensity; All Available Spectra; Top Scoring Peptides; Peptides to Single Hit Proteins. Descript
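The two labeling criteria described above, a minimum intensity percentile and an error window in Da, can be sketched as follows. This is only an illustration of the idea, not ProteoIQ's implementation: the function name `label_peaks`, the predicted fragment m/z values, and the decision to treat a peak that is stronger than exactly 90% of the peaks as passing a 90% setting are all assumptions. The sketch ranks peaks directly rather than building the binned histogram the guide mentions.

```python
from bisect import bisect_left


def label_peaks(peaks, predicted_ions, min_intensity_percentile=90.0, tolerance_da=0.5):
    """Return (m/z, label) pairs for peaks that pass both labeling criteria.

    peaks: list of (m/z, intensity) tuples from one MS/MS spectrum.
    predicted_ions: dict mapping an ion label to its predicted m/z (hypothetical values).
    """
    intensities = sorted(intensity for _, intensity in peaks)
    labeled = []
    for mz, intensity in peaks:
        # Percentage of peaks in the spectrum with a strictly lower intensity.
        weaker = bisect_left(intensities, intensity)
        percentile = 100.0 * weaker / len(intensities)
        if percentile < min_intensity_percentile:
            continue
        # Label the peak with every predicted ion inside the error window (Da).
        for label, ion_mz in predicted_ions.items():
            if abs(mz - ion_mz) <= tolerance_da:
                labeled.append((mz, label))
    return labeled


# Ten hypothetical peaks: m/z 100..1000 with intensities 1..10.
peaks = [(100.0 * k, float(k)) for k in range(1, 11)]
predicted_ions = {"b2": 500.1, "y1": 1000.2}

strict = label_peaks(peaks, predicted_ions, min_intensity_percentile=90.0)
relaxed = label_peaks(peaks, predicted_ions, min_intensity_percentile=40.0)
print(strict)
print(relaxed)
```

Lowering the percentile from 90 to 40 lets the weaker peak near m/z 500 receive its label, which mirrors the guide's remark that a lower setting allows ions of lesser intensity to be annotated.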
149. tions for more details on how to apply probabilities or false discovery rates in ProteoIQ. Probability: ProteoIQ calculates probabilities that peptide and protein assignments are correct by employing a Java-based implementation of the statistical models commonly known as Peptide Prophet and Protein Prophet, as developed by the Institute for Systems Biology. In the Peptide Set Viewer, probability refers to the probability that the peptide assignment to a given spectrum is correct. In the Protein Set Viewer, probability indicates the likelihood that the protein assignment is correct, taking into account the probability of the peptide assignments for all peptides apportioned to that protein. For a more detailed discussion of how the probabilities are calculated, see the journal articles below. References: Keller A, Nesvizhskii AI, Kolker E, Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem 2002, 74, 5383-5392. Nesvizhskii AI, Keller A, Kolker E, Aebersold R. A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem 2003, 75, 4646-4658. Peptide and Protein False Discovery Rates: ProteoIQ utilizes the ProValT algorithm to calculate protein and peptide false discovery rates, as described in Weatherly et al. False discovery rates are calculated by comparing the frequency of peptide and protein identifications at score th
150. tracted to a new protein set that is accessed via the Protein Set Navigation Bar. Selecting View Protein Info opens the Protein Sequence Viewer, as shown in Figure 9. Figure 8: Right-click functionality in the Protein Set Viewer. Single-clicking on rows highlights (selects) the proteins of interest. Right-click menu: Copy Selected (Ctrl+C); Copy Selected to New Protein Set; View Protein Info. Copy Selected to New Protein Set creates a new protein set containing only the selected proteins. View Protein Info opens the Protein Sequence Viewer. Protein Sequence Viewer: Detailed information about every identified protein can be accessed in the Protein Sequence Viewer, shown in Figure 9. The Protein Sequence Viewer is opened by double-clicking on the protein name in the Protein Set Viewer. Identified peptides are highlighted in red and overlaid on the full amino acid sequence for the protein. The top table contains information about the protein identification broken down by biological group and replicate. Figure 9: Protein Sequence Viewer. [Screenshot callouts: Select View to modify protein and peptide tables; all columns can be sorted or moved; biological groups and replicates displayed.]
151. tral View and Peptide Fragmentation Table. [Screenshot of the MS/MS spectral viewer: spectrum plotted against Mass (m/z), a left panel with Fragment Charge States, Ion Series, Tolerance, Peak Intensity, Label peaks with m/z, Vertical peak labels, Label Reporter Ions, Label Region Only, Selected Neutral Losses, and Label regardless of intensity, and the Peptide Fragmentation Table. Callouts: Select how MS/MS spectra are annotated; Specify thresholds for peak labeling; Annotate spectra with neutral losses (this table can be customized).] Adjusting Peak Labeling: To adjust labeling of the MS/MS spectrum, select parameters from the left panel, then click View Chart. The spectrum and ion series table will automatically be updated if there are ions in the MS/MS spectrum that meet the criteria set in the left panel. Table 29 describes the parameters. Table 29: Peak labeling parameters in MS/MS spectral viewer
152. ts for a Single Peak List: ProteoIQ supports an unlimited number of database search results for a single peak list. One common example is to perform two database searches, the first using fully tryptic enzyme specificity and the second using partially tryptic or no enzyme specificity. All database search results may be loaded simultaneously, and ProteoIQ will determine the best match for each query spectrum across all database search results. If a Min Score Differential is applied, ProteoIQ will compare the scores between each search result and determine whether the #1 match passes the user-defined Min Score Differential.
Example 1: Min Score Differential
Background: Query 399 was matched to two peptides, P1 and P2.
P1: IAYNVEAAR, Ion Score of 58.4
P2: TGGNKTEAAR, Ion Score of 18.4
Set Min Score Differential to 20%. This means that the #1 match (IAYNVEAAR) must have a score that is 20% higher than the #2 match (TGGNKTEAAR) to be included in the peptide list following search results parsing. ProteoIQ performs this calculation as follows:
T = the score of the #1 match
NT = the score of the #2 match
min Δ = min score differential
T ≥ NT + (T × min Δ)
So 18.4 + (58.4 × 0.2) = 30.08. Since 58.4 is ≥ 30.08, the peptide identification is included. Another way to look at this: the #2 match must score no higher than 46.72 (58.4 − 58.4 × 0.2) for the #1 match to be 20% higher than the #2 match.
Min Peptide Length
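The calculation in Example 1 can be checked with a few lines of code. This is a minimal sketch of the stated inequality only (the #1 score must be at least the #2 score plus the #1 score times the differential); the function name `passes_min_score_differential` is hypothetical and is not part of ProteoIQ.

```python
def passes_min_score_differential(top_score, second_score, min_differential):
    """Return True when top_score >= second_score + top_score * min_differential.

    min_differential is a fraction, so 20% is written as 0.2.
    """
    return top_score >= second_score + top_score * min_differential


# Values from Example 1: Ion Scores of 58.4 (#1 match) and 18.4 (#2 match), 20% differential.
threshold = 18.4 + 58.4 * 0.2
print(round(threshold, 2))                              # 30.08
print(passes_min_score_differential(58.4, 18.4, 0.2))   # True
```

With the same #1 score, a hypothetical #2 match scoring 50.0 would fail the test, because 50.0 + 11.68 exceeds 58.4.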
153. uantitation Settings; Quantitation Overview: Reporter Ion Quantitation; Modifying Reporter Quantitation Settings; Pseudo Spectral Counts; Pseudo Spectral Counts Without Normalization; Pseudo Spectral Counts With Normalization; Analyzing Replicate Results; Visualizing Quantitative Results.
Quantitation Overview: Label Free. Figure 70 shows a flow chart for how quantitation is performed for a binary comparison of two biological samples with three replicates.
Figure 70: Quantitation Flow Chart (Label Free)
Spectral count for protein X: Sample 1 Rep 1 = 100, Rep 2 = 75, Rep 3 = 110; Sample 2 Rep 1 = 50, Rep 2 = 30, Rep 3 = 45.
Total spectral counts: S1 Rep1 = 750, S1 Rep2 = 500, S1 Rep3 = 1000; S2 Rep1 = 800, S2 Rep2 = 400, S2 Rep3 = 600.
Normalize the spectral counts in each replicate for protein X by the total spectral counts for all proteins in each replicate. For example, the normalization factor for Sample 1 Replicate 1 is 1.333 (1000/750). Result: Sample 1 = 133, 150, 110; Sample 2 = 50, 60, 60.
Normalize the spectral count in each replicate for protein X by the Max Replicate SC for Sample 1 relative to Sample 2. Max Replicate Spectral Counts: Sample 1 = 1000, Sample 2 = 800. Normalization factor: 1 for Sample 1, 1.25 (1000/800) for Sample 2. Result: Sample 1 = 133 × 1, 150 × 1, 110 × 1; Sample 2 = 50 × 1.25 = 62.5, 60 × 1.25 = 75, 60 × 1.25 = 75.
Define Reference Group: in this example we use the average of all groups. Calculate Normalized Avg SpC for protein X 101 133
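The two normalization steps in the Figure 70 example can be reproduced with a short sketch. It follows the arithmetic shown in the figure (scale each replicate to its sample's largest replicate total, then scale each sample to the largest of those totals); the function name `normalize_label_free` and the generalization to more than two samples are assumptions, not ProteoIQ's documented implementation.

```python
def normalize_label_free(protein_counts, total_counts):
    """Apply the two-step spectral count normalization from the Figure 70 example.

    protein_counts: dict of sample name -> spectral counts for one protein, per replicate.
    total_counts: dict of sample name -> total spectral counts, per replicate.
    """
    # Largest replicate total within each sample (1000 and 800 in the example).
    max_total = {sample: max(totals) for sample, totals in total_counts.items()}
    # Assumption: samples are scaled to the largest of those maxima.
    overall_max = max(max_total.values())

    normalized = {}
    for sample, counts in protein_counts.items():
        # Step 1: normalize each replicate within its sample.
        within = [
            count * max_total[sample] / total
            for count, total in zip(counts, total_counts[sample])
        ]
        # Step 2: normalize the sample relative to the other samples.
        between_factor = overall_max / max_total[sample]
        normalized[sample] = [value * between_factor for value in within]
    return normalized


protein_x = {"Sample 1": [100, 75, 110], "Sample 2": [50, 30, 45]}
totals = {"Sample 1": [750, 500, 1000], "Sample 2": [800, 400, 600]}

result = normalize_label_free(protein_x, totals)
for sample, values in result.items():
    average = sum(values) / len(values)
    print(sample, [round(v, 1) for v in values], "avg:", round(average, 1))
```

Averaging the normalized replicates per sample gives the per-group values that a reference-group comparison would then use.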
154. ue Distributions for Charge 2: overlap between Predicted Positive and Predicted Negative distributions; too few peptides to predict the Positive Distribution. Discriminant Value Distributions for Charge 2: clear separation between Predicted Positive and Predicted Negative distributions. [Plot residue removed.]
Peptide False Discovery Rates: Peptide False Discovery Rates (pep FDR) can be calculated and utilized to filter large peptide sets in ProteoIQ. To perform pep FDR, you will need to have searched your MS/MS spectra against a target and decoy database prior to running ProteoIQ. The following steps should be followed to perform a pep FDR analysis:
1. Load target and decoy database searches in the Select Files for Parsing window of the New Project Wizard.
2. In the Peptide Parsing Options, enter the following parameters: Min Score 0; Min Peptide Length 5; Min Peptide Prob 0; Min Protein Prob 0; Max Protein FDR 10. Note: This value will be adjusted once the protein set is created.
Once the protein set is created, select View > Intermediate Filtering Pane. Set Min Peptide FDR 1.0 and leave all other parameters blank (Figure 56). Select Score Metric from the Metric to Use for FDR drop-down menu. Select No Filter in the Filte
155. us set drop-down menu. Enter the name of the new protein set, i.e. Protein Set Name. [Screenshot of the Create New Protein Set dialog: Protein Set Name field and Filter Previous Set drop-down listing No Filter, Protein Set 1, Protein Set 2, and Protein Set 3.]
Table 20: Fields in the Protein Set Section Window
Field | Description
Filtering an Existing Protein Set | To filter an existing protein set, select the protein set name from the Filter Previous Set drop-down menu.
No Filter | Selecting No Filter means that the filter will be applied to ALL peptides parsed from your database search results. This allows peptides not present in the original protein set to be displayed by loosening the filter conditions.
Protein Set Name | Text field for specifying the name of the new protein set. This name will appear in the Protein Set Navigation Bar.
Description of Filter Parameters
Field | Description
Peptide Score | Score refers to either Mascot Ion Score, SEQUEST XCorr, or X!Tandem Hyper Score.
Peptide Length (AA) | Number of amino acids contained in the peptide.
# Spectra per peptide | Number of MS/MS sequencing events for a unique peptide identification. Also known as spectral counts.
Proteins per Peptide | Specifies the number of proteins a peptide can match to. When performing spectral count quantification, peptides that match to more than one protein can affect the accuracy of the quantification. Setting the max number of proteins per peptide to 1 removes all peptides th
156. ve these. Validating a ProteoIQ Project: A Validated ProteoIQ Project discards all decoy data and all target spectra/peptides that are not used in the protein sets shown at the time that you validate. Validated projects contain only peptides and proteins viewable in the Protein Set Viewer. You CAN do further validation, but you will never be able to add data to the validated proteome.
Table 8: Summary of Save Features
Save Function | Description
Save | Saves the current protein sets shown in the Protein Set Viewer and all unused peptides so that validation can be performed at a later date. All data is saved in the project path directory specified in the New Project Wizard.
Save As | Saves all data to another folder, for either the Whole Project or the Validated project, depending on which one you are currently working on.
Validate | Saving a Validated Proteome removes all unused peptides/spectra from the Proteome dataset to save memory and then issues a Save As. Validated proteomes are small files that are excellent for sharing with collaborators.
Sharing and Moving a ProteoIQ Project: Figure 22 shows a typical ProteoIQ Project results folder. Each folder contains the piq file, which should be used to open a ProteoIQ Project. Index, object, and database files are also present. To share or move a ProteoIQ Project, the ENTIRE folder must be transferred. Figure 22: A typical ProteoIQ Project Folder. MS MS Spectr
157. y generating 10 Mascot .dat or SEQUEST results folders containing many .out files. By selecting this option, ProteoIQ assumes all of the peptides were derived from a single source, regardless of the fact that the peptides were present in different fractions and thus were identified in distinct search results. Each Result File Represents an Individual Sample: Performs protein identification in each database search result using ONLY peptides identified from that specific result file. The user should select this option, for example, when identifying spots from a 2-D gel where individual protein spots are excised, digested, and analyzed independently. In this case, all of the peptides mapping to a protein would reasonably be expected to be in the same result file. Describe Samples: Figure 27 shows the Describe Samples section of the New Project Wizard. Add sample names and modify the number of replicates. Figure 27: Described Samples. [Screenshot callout: If desired, give names for each of your biological samples and/or modify the number of replicates. The screenshot shows SAMPLE NAMES and REPLICATES columns alongside the wizard steps: Describe Project, Describe Samples, Choose Database(s), Search Result Options, Select Target Files For Parsing, Select Decoy Files For Parsing, Select NonTarget Files For Parsing, Peptide Parsing Options, Parse Files, Parse MS Ions Data, Cluster Peptides, Peptide/Protein Viewer.]
158. y sorting proteins by probability in the Protein Set Viewer, then copying the selected proteins to a new protein set, as shown in Figure 8.
Validating protein identifications by probability in the Filter Pane:
1. Enter Min Protein Probability in the Filter Pane.
2. Select Create New Protein Set.
Validating protein identifications by Protein Probability in the Protein Set Viewer:
1. Sort proteins by probability by selecting the Prob column header.
2. Highlight to select proteins of interest.
3. Right-click and select Copy Selected to New Protein Set.
If probability thresholds are utilized, it is critically important to evaluate the distributions used to calculate the probability values. For more details, see the Assessing the Quality of Probability Calculations section.
Protein False Discovery Rate: Protein False Discovery Rates (pro FDR) can be calculated and utilized to filter large protein sets in ProteoIQ. To perform pro FDR, you will need to have searched your MS/MS spectra against a target and decoy database prior to running ProteoIQ. The following steps should be followed to perform a pro FDR analysis:
1. Load target and decoy database searches in the Select Files for Parsing window of the New Project Wizard.
2. In the Peptide Parsing Options, enter the following parameters: Min Score 0; Min Peptide Length 5; Min Peptide Prob 0; Min Protein Prob 0; Max Protein FDR 1. Note: This value is dependent upon your desired level o
