MET-COFEI User Manual
Fig. 8: Select files to Batch Run (screenshot with the TIC plot and the spectrum at scan number 3834).
[Screenshot: Data Process page with the input GC-MS files, the configuration parameter file, and the GC-MS library selected; TIC plot of 05020208.CDF and spectrum at scan number 3934.]
Fig. 10: Content of the Job_Run_Config.csv file (job creation time Tue Apr 15 17:59:33 2014; configuration parameter path; data file path; output path in the Result folder; run mode Parallel; CPU number 2; processed and aligned files 05020207.CDF to 05020215.CDF).
Pa
Fig. 5: Visualization of raw GC-MS data in 2D mode (2D plot of 05020208.CDF; spectrum at retention time 2476 s and scan number 4448).
3. Select the configuration parameter file. A default configuration parameter file named config_para.csv is shown in Fig. 6. The parameters can be further configured in the Parameter Setup property page of the software.
4. Select the configured GC-MS library file (.msl). A default library file named COFEI_Test_Lib.msl is shown in Fig. 7.
5. Check the Process checkbox and/or the Align checkbox for the sample file(s) you want to run.
6. Click the Batch Run or Parallel Run button to run (see Fig. 8 and Fig. 9). A file named Job_Run_Config.csv is then generated automatically to record the information for this job (see Fig. 10).
7. The output files are created in the Result folder inside the folder from which the GC-MS CDF files were loaded.
Fig. 6 content (config_para.csv): Profile centroid 1; Para_Cutoff_Mode 0; Para_Cutoff_Intensity; Para_Cutoff_TopNumber 500; Para_Cutoff_LowPercent 0.1; Para_Start_Scan 100; Para_End_Scan 1960; Para_PPM; Para_Min_EIC;
Fig. 3: Visualization of raw GC-MS data, with the TIC plotted in scan mode (TIC plot of 05020208.CDF; spectrum at scan number 3892).
Fig. 4: Visualization of raw GC-MS data, with the TIC plotted in retention time mode (TIC plot of 05020208.CDF; spectrum at retention time 2265 s and scan number 3930).
Fig. 19: Visualization of the peaklist with the same Align_ID across different samples, displayed in RT_Corrected mode.
Parameter Explanation
1. Cutoff Intensity: the intensity cutoff threshold for each scan. Only data points whose intensity is larger than the threshold are used for mass trace extraction, so low-intensity noisy data points are filtered out.
2. Cutoff Top Number: for each scan, only the Top Number most intense data points are considered for mass trace extraction, so low-intensity noisy data points are filtered out.
3. Cutoff Low Percent: for each scan, the given percentage of lowest-intensity data points is filtered out.
4. Start Scan: specifies the start scan for mass trace extraction.
5. End Scan: specifies the end scan for mass trace extraction.
6. Trace PPM: specifies the PPM threshold of m/z variation for a valid mass trace.
7. Min Trace Length: specifies the minimum mass trace length.
8. Miss Allow Th: specifies the maximum allowed number of missing data points during mass trace extraction.
9. Inter Trace Distance Th: specifies the minimum distance between two neighboring mass traces.
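The per-scan cutoffs (parameters 1 to 3) and the tracing parameters (Trace PPM, Min Trace Length, Miss Allow Th) can be illustrated with the Python sketch below. It is not MET-COFEI's own code, and every function name in it is hypothetical. It makes several assumptions. Cutoff Low Percent is read as a percentage (0 to 100) of a scan's points. A trace is extended greedily with the closest point within Trace PPM of its last m/z. Miss Allow Th counts consecutive missed scans. Min Trace Length counts points. Inter Trace Distance Th is not modeled.

```python
from dataclasses import dataclass

# Per-scan noise filters. Each scan is a list of (mz, intensity) pairs.

def cutoff_by_intensity(points, cutoff_intensity):
    """Keep only points whose intensity is larger than Cutoff Intensity."""
    return [p for p in points if p[1] > cutoff_intensity]

def cutoff_by_top_number(points, top_number):
    """Keep the Top Number most intense points, returned in m/z order."""
    ranked = sorted(points, key=lambda p: p[1], reverse=True)
    return sorted(ranked[:top_number])

def cutoff_by_low_percent(points, low_percent):
    """Drop the given percentage (0-100, assumed) of lowest-intensity points."""
    n_drop = int(len(points) * low_percent / 100.0)
    ranked = sorted(points, key=lambda p: p[1])
    return sorted(ranked[n_drop:])

@dataclass
class Trace:
    scans: list
    mzs: list
    intensities: list
    misses: int = 0  # consecutive scans without a matching point (assumption)

def track_mass_traces(scans, trace_ppm, miss_allow, min_trace_length):
    """Greedy scan-by-scan mass tracing (illustrative, not MET-COFEI's algorithm)."""
    active, finished = [], []
    for scan_idx, points in enumerate(scans):
        unused = sorted(points)
        for tr in active:
            ref = tr.mzs[-1]
            tol = ref * trace_ppm * 1e-6  # Trace PPM converted to Da at this m/z
            candidates = [p for p in unused if abs(p[0] - ref) <= tol]
            if candidates:
                best = min(candidates, key=lambda p: abs(p[0] - ref))
                unused.remove(best)
                tr.scans.append(scan_idx)
                tr.mzs.append(best[0])
                tr.intensities.append(best[1])
                tr.misses = 0
            else:
                tr.misses += 1
        # Close traces that missed more than Miss Allow Th consecutive scans.
        finished += [tr for tr in active if tr.misses > miss_allow]
        active = [tr for tr in active if tr.misses <= miss_allow]
        # Every point not claimed by an existing trace starts a new trace.
        active += [Trace([scan_idx], [mz], [inten]) for mz, inten in unused]
    finished += active
    # Discard traces shorter than Min Trace Length (counted in points).
    return [tr for tr in finished if len(tr.scans) >= min_trace_length]
```

In a full pipeline, one of the cutoffs would be applied to every scan between Start Scan and End Scan before the filtered scans are passed to the tracer.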
Fig. 20: Illustration of some mass trace parameters (labels include Trace1, Trace2, TracePPM, Miss point num < MissAllow_Th, and InterTrace_Dist < InterTraceDistance).
10. Min Peak Width: the minimum width of a valid peak; very thin peaks whose width is smaller than Min Peak Width are filtered out.
11. Max Peak Width: the maximum width of a valid peak; very fat peaks whose width is larger than Max Peak Width are filtered out.
12. Peak Intensity Th: the minimum intensity of a valid peak; low-intensity peaks whose intensity is smaller than Peak Intensity Th are filtered out.
13. SNR Th: the minimum signal-to-noise ratio (SNR) of a valid peak; peaks whose SNR is smaller than SNR Th are filtered out. SNR is defined in the wavelet domain as the ratio of the CWT coefficient at the marker point to the 95th percentile of the absolute CWT coefficients at scale 1.
14. Peak Significance Th: the minimum peak significance level of a valid peak; peaks whose significance level is smaller than Peak Significance Th are filtered out. The peak significance level is defined as the ratio of the mean intensity of the data points near the peak apex to the mean intensity of the data points near the two boundaries.
15. TPASR Th: the maximum Triangle Peak Area Similarity Ratio (TPASR) of a valid peak; peaks whose TPASR is larger than TPASR Th are filtered out. TPASR is defined by the following formula:
TPA = 0.5 × Pe
Fig. 14: Content of the constructed spectrum exported from the peaklist with Group_ID 5 (m/z and Intensity columns).
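The sketch below roughly illustrates how a constructed spectrum such as the one in Fig. 14 relates to a peaklist. It collects the apex m/z and apex intensity of every peak sharing a Group_ID and writes them as tab-separated text. Several details are assumptions, not the documented MET-COFEI file format:

- the dictionary keys (group_id, apex_mz, apex_intensity);
- the tab-separated export layout;
- the rule of keeping the larger intensity when two peaks of one group share an m/z.

```python
from collections import defaultdict

def reconstruct_spectra(peaks):
    """Build one spectrum per Group_ID from the apex m/z and apex intensity of its peaks."""
    spectra = defaultdict(dict)
    for pk in peaks:
        spec = spectra[pk["group_id"]]
        mz = pk["apex_mz"]
        # Assumption: if two peaks of one group share an m/z, keep the larger apex intensity.
        spec[mz] = max(spec.get(mz, 0.0), pk["apex_intensity"])
    return {gid: sorted(spec.items()) for gid, spec in spectra.items()}

def spectrum_to_text(spectrum):
    """Hypothetical export layout: one 'mz<TAB>intensity' line per fragment."""
    return "\n".join(f"{mz}\t{intensity}" for mz, intensity in spectrum)
```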
[Screenshot: 05020207_identification_result.IDEN opened in Notepad, listing the extracted spectrum of Group_ID 5 as m/z-intensity pairs, its library match bis(trimethylsilyl)amine (match score 949.058) with the library spectrum, and the extracted spectrum of Group_ID 8.]
Fig. 15: Content of the identification result file.
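The match scores in the identification file compare a constructed spectrum with a library spectrum on a 0 to 1000 scale. Fragments are paired only when their m/z difference is smaller than Fragment Mass Tol. The sketch below shows one way to compute such a score: a squared, weighted cosine (dot-product) score with greedy one-to-one m/z matching.

The weighting exponents (m/z to the power 3, intensity to the power 0.6) follow the weighting commonly cited for NIST-style dot products. Those exponents, the greedy matching and the function names are all assumptions. This sketch is not claimed to reproduce the exact scores of MET-COFEI or NIST.

```python
def weight(mz, intensity, mz_power, intensity_power):
    """NIST-style peak weight; the exponents are assumptions."""
    return (mz ** mz_power) * (intensity ** intensity_power)

def match_fragments(query, library, fragment_mass_tol):
    """Greedy one-to-one pairing of fragments whose m/z differ by less than the tolerance (Da)."""
    pairs, used = [], set()
    for i, (mz_q, _) in enumerate(query):
        best_j, best_d = None, None
        for j, (mz_l, _) in enumerate(library):
            d = abs(mz_q - mz_l)
            if j not in used and d < fragment_mass_tol and (best_d is None or d < best_d):
                best_j, best_d = j, d
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs

def similarity_score(query, library, fragment_mass_tol=0.1, mz_power=3.0, intensity_power=0.6):
    """Squared weighted cosine scaled to 0..1000; only matched pairs add to the numerator."""
    wq = [weight(mz, it, mz_power, intensity_power) for mz, it in query]
    wl = [weight(mz, it, mz_power, intensity_power) for mz, it in library]
    norm = sum(w * w for w in wq) * sum(w * w for w in wl)
    if norm == 0:
        return 0.0
    dot = sum(wq[i] * wl[j] for i, j in match_fragments(query, library, fragment_mass_tol))
    return 1000.0 * dot * dot / norm
```

With this kind of score, Library Spectrum Score Th (default 750) is the cut that decides whether a library hit is reported as an identification.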
Fig. 22: CWT-based peak detection.
21. Group Scan Shift Tol: peaks whose apexes fall within the range of Group Scan Shift Tol can be considered one peak group.
22. Group Shape Angle Tol: peaks whose shape similarity is smaller than Group Shape Angle Tol can be considered a meaningful peak group. The shape similarity is expressed as the angle whose cosine is the normalized dot product of the two peak shapes. Hierarchical cluster analysis (HCA) is adopted here.
23. Fragment Mass Tol: m/z values may be stored as integers in one spectrum and as floats in the other. When calculating the similarity score between the constructed spectrum and a library spectrum, MET-COFEI therefore allows a difference in fragment m/z. An m/z pair (one value from the constructed spectrum, the other from the library spectrum) is considered a matched pair if its m/z difference is smaller than Fragment Mass Tol. Only matched m/z pairs contribute to the final spectrum similarity score.
24. Library Spectrum Score Th: a constructed spectrum is considered a matched identification only if its similarity score with the library spectrum is higher than Library Spectrum Score Th.
25. None/One/Two Phase Alignment: if you need to align the compound-associated peaklists across samples, choose the one-phase or two-phase alignment strategy. One-phase alignment is looser but fast, while two-phase alignment is more stringent but time-consuming alig
[Identification result text: the library match Trisiloxane, octamethyl- (match score 753.722) with its library spectrum, followed by the extracted spectrum of Group_ID 7 as m/z-intensity pairs.]
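Peak grouping (parameters 21 and 22 in the Parameter Explanation) decides which peaks share a Group_ID. The sketch below is a simplified stand-in for the hierarchical cluster analysis the manual mentions, not MET-COFEI's implementation. Two peaks are linked when both of the following hold:

- their apex scans differ by at most Group Scan Shift Tol;
- the angle between their intensity profiles (the arccos of the normalized dot product) is below Group Shape Angle Tol.

Each connected set of linked peaks becomes one group, in the spirit of single-linkage clustering. Representing each peak as a {scan: intensity} dictionary and the helper names are assumptions.

```python
import math

def shape_angle(a, b):
    """Angle in degrees between two peak profiles given as {scan: intensity}."""
    scans = sorted(set(a) | set(b))
    va = [a.get(s, 0.0) for s in scans]
    vb = [b.get(s, 0.0) for s in scans]
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(y * y for y in vb))
    if na == 0 or nb == 0:
        return 90.0  # treat an empty profile as dissimilar (assumption)
    cos = sum(x * y for x, y in zip(va, vb)) / (na * nb)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def apex_scan(profile):
    """Scan of the highest point of a profile."""
    return max(profile, key=profile.get)

def group_peaks(profiles, scan_shift_tol=3, angle_tol=20.0):
    """Return a group index per peak, linking peaks that meet both tolerances."""
    parent = list(range(len(profiles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            close = abs(apex_scan(profiles[i]) - apex_scan(profiles[j])) <= scan_shift_tol
            if close and shape_angle(profiles[i], profiles[j]) < angle_tol:
                parent[find(i)] = find(j)
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(profiles))]
```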
MET-COFEI User Manual (Beta version, last updated 04/16/2014)
The Samuel Roberts Noble Foundation, Inc.
MET-COFEI Description
MET-COFEI is a GC-MS data processing platform for METabolite COmpound Feature Extraction and Identification. It extracts the pure spectra associated with metabolite compounds from the input GC-MS files. It then identifies the compounds by searching against a user-specified GC-MS spectrum library.

MET-COFEI mainly includes 3 sequential modules (Figure 1): compound feature extraction, compound identification, and compound alignment. The compound feature extraction module includes 3 sequential sub-modules: EIC extraction, peak detection, and peak filtering. The compound identification module includes 3 sequential sub-modules: peak grouping, pure spectrum reconstruction, and library searching.

EIC extraction extracts the meaningful mass trace slices from the start scan to the end scan. Peak detection detects the local chromatographic peaks in each EIC. Peak filtering removes poor-quality peaks. Peak grouping clusters the detected peaks that have close retention times and similar peak shapes. Pure spectrum reconstruction builds a compound-related spectrum by combining the m/z-intensity pairs at the apex positions of all peaks with the same group ID. Library searching searches the constructed spectrum against a user-specified GC-MS library (.msl file). Compound al
Para_Misscount_Allow; Para_InterTrace_Dist_Min; Para_Min_PeakWidth; Para_Max_PeakWidth; Para_Branch_Gap_Allow; Para_Min_Branch_Length; Para_Min_Span; Para_MarkerBranch_Dif_Th; Para_PeakIntensity_Th; Para_SNR_Th; Para_Peak_Significance_Th; Para_TPASR_Th; Para_ZigZag_Index_Th; Para_Group_S_Shift_Tol; Para_Group_Shape_Angle_Th; Para_Library_Fragment_Mass_Tol; Para_Library_Matching_Score_Th; Para_Two_Phase_Align; Para_Align_Fragment_Mass_Tol; Para_Align_Matching_Score_Th; and two alignment window rows with the values 10 and 5.
Fig. 6: One configured parameter file for MET-COFEI.
Fig. 7: One configured GC-MS library file (.msl) for MET-COFEI (excerpt with NAME, FORM, CASNO, RI, RW, RT, and COMMENT fields; for example, NAME: 5-methoxy salicylic acid 2TMS, FORM: C14H24O4Si2, RT: 27.401 min).
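If you want to inspect a saved parameter file outside the GUI, a minimal Python reader is sketched below. It assumes config_para.csv stores one parameter per row as name,value with no header row, as the two-column layout of Fig. 6 suggests. The function name read_config is hypothetical. Values are converted to int or float when possible and otherwise kept as text.

```python
import csv
import io

def read_config(text):
    """Parse 'name,value' rows into a dict, converting numbers where possible."""
    params = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or malformed rows
        name, raw = row[0].strip(), row[1].strip()
        for convert in (int, float):
            try:
                value = convert(raw)
                break
            except ValueError:
                continue
        else:
            value = raw  # keep non-numeric values as text
        params[name] = value
    return params

# Usage with a file on disk:
# with open("config_para.csv", newline="") as f:
#     params = read_config(f.read())
```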
S data file (.CDF) from the loaded file name list and visualize the raw data. The visualization includes the TIC, the spectrum of the scan selected by the user's mouse click, and a 2D binarized view spanned by m/z and retention time at a specified cutoff threshold. In this property page the user can also select, from the loaded file name list, which files to run and align. After selecting the parameter file (or changing the processing parameters and clicking Apply in the Parameter Setup property page), the user can run the selected files. The following is the normal procedure for this property page:
1. Input the GC-MS data file(s). Click the Browse button to select the CDF file(s), and click the Load Data button to display the file name(s) in the table. See Fig. 2.
Fig. 2: Load data file names.
2. Once the data file names are loaded, select a file name in the table to visualize the TIC and the spectrum of each scan. The TIC can be displayed in scan number mode (see Fig. 3) or retention time mode (see Fig. 4); the retention time unit is seconds. Additionally, the raw data can be displayed in 2D mode if the user specifies a cutoff threshold (see Fig. 5).
ak_Width × Intensity(Peak_Apex)
$$RPA = \sum_{i=\text{Left\_Boundary}}^{\text{Right\_Boundary}} \text{Intensity}(i)$$
$$TPASR = \frac{|TPA - RPA|}{TPA}$$
Here TPA is the triangle peak area and RPA is the real peak area. TPASR measures how close the area of the detected peak is to that of a triangular peak: the closer TPASR is to 0, the better the peak quality.
Fig. 21: Illustration of the TPASR parameter for a fat peak and a thin peak (low TPASR: good peak; high TPASR: bad peak).
16. Zig Zag Index Th: the maximum Zig-Zag Index of a valid peak; peaks whose Zig-Zag Index is larger than Zig Zag Index Th are filtered out. The Zig-Zag Index evaluates the degree of zig-zag of a chromatographic peak and is calculated by the following procedure. Suppose the intensity array of a chromatographic peak is $I_1, I_2, \ldots, I_N$.
(1) Calculate the effective peak intensity (EPI) by subtracting the baseline at the peak apex:
$$EPI = \max(I_1, I_2, \ldots, I_N) - \text{Baseline}_{\text{Apex}}$$
(2) Calculate the first-order derivative of the peak to obtain the increment for each pair of neighboring data points:
$$d_n = I_n - I_{n-1}, \quad n = 2, \ldots, N$$
(3) Calculate the variance of each pair of neighboring increments, $d_n$ and $d_{n+1}$:
$$v = (d_n - d_{\text{mean}})^2 + (d_{n+1} - d_{\text{mean}})^2, \quad d_{\text{mean}} = \frac{d_n + d_{n+1}}{2}$$
After some simple algebra, the variance can be repres
e same job configuration file (Job_Run_Config.csv) and the same processing parameter file (config_para.csv) ensure the same result for the same data file set.
Fig. 11: Parameter configuring. The Parameter Setup page groups the parameters as follows (values as shown in the screenshot):
Data Type: Centroid_Data or Profile_Data.
Parameters for Mass Trace Extraction: Cutoff Intensity 70; Cutoff Top Number 500; Cutoff Low Percent 0.1; Start Scan 100; End Scan 1960; Trace PPM 80; Min Trace Length 6; Miss Allow Th 3; Inter Trace Distance Th 0.06 Da.
Parameters for Peak Detection: Min Peak Width 6; Max Peak Width 50; Branch Gap Allow; Min Branch Length 6; Min Search Span 2; Marker Branch Dif Th 8.
Parameters for Peak Filtering: Peak Intensity Th 500; SNR Th; Peak Significance Th 1; TPASR Th 0.7; Zig Zag Index Th.
Parameters for Peak Grouping and Library Searching: Group Scan Shift Tol 3 scans; Group Shape Angle Tol 20 degrees; Fragment Mass Tol 0.1 Da; Library Spectrum Score Th 750.
Parameters for Alignment: No Alignment, One Phase Alignment, or Two Phase Alignment; Align Window 1 10 seconds; Align Window 2 5 seconds; Align Fragment Mass Tol 0.08 Da; Align Spectrum Score Th 700.
Gentle reminders: (1) Have you specified the configuration folder in the Data Process panel? (2) Have you clicked the Apply button for the new parameter setup?
Identification Result
In this property page the user can view t
ed Spectrum (Up) and Lib Spectrum (Down) [panel of the Identification Result page; the hit list shows Trisiloxane, octamethyl- entries].
Fig. 16: Visualization of the specific peak and its corresponding extracted EIC.
Fig. 17: Visualization of the extracted EIC for the specific peak and the binning EIC from the raw CDF file (M/Z 56.90 Da, bin size 0.1).
Alignment Result
In this property page the user can visualize the peaklist with the same Align_ID across different samples. In the visualization (see Fig. 18 and Fig. 19), all peaks with the same Align_ID across different samples are plotted, and the peaks from the same sample are drawn in one color (right top panel). Additionally, the peak associated wit
ented as
$$v = 0.5\,(d_n - d_{n+1})^2 = 0.5\,(2I_n - I_{n-1} - I_{n+1})^2$$
Here $(2I_n - I_{n-1} - I_{n+1})^2$ indicates the local zig-zag degree at data point $I_n$.
(4) Sum all local zig-zag degrees:
$$\text{Sum\_zigzag} = \sum_{n=2}^{N-1} (2I_n - I_{n-1} - I_{n+1})^2$$
(5) Average and normalize Sum_zigzag to obtain the Zig-Zag Index:
$$\text{Zig-Zag Index} = \frac{\text{Sum\_zigzag}}{N \cdot EPI^2}$$
Based on tests with real data, the proposed Zig-Zag Index can evaluate the zig-zag degree of a chromatographic peak shape: the lower the Zig-Zag Index, the better the peak quality.
Parameters 17 to 20 concern peak branch pattern detection in the CWT domain. In MET-COFEI, each 1D mass trace (EIC) is first transformed into 2D CWT coefficients, and local maximum detection is then applied at each scale. A chain of meaningful local maximum points running continuously through the 2D scan-scale space is defined as a meaningful peak branch pattern. In general, a meaningful branch should be composed of local maximum points across several continuous scales, and it corresponds to one valid peak of the original EIC. A meaningful peak branch should be longer than a specific length, its search span should be limited to a specific value, and every gap in the branch should be smaller than a specific value.
17. Min Branch Length: the minimum length of a meaningful branch. A branch whose final length is smaller than Min Branch Length is not considered a valid peak branch pattern.
18. Min Search Span: The min
h the mouse click can also be plotted individually or together with the associated peaklist with the same Group_ID (right bottom panel). This property page therefore shows the relationship among the peaks associated with the same compound across different samples. It also shows the relationship between an individual peak and the peaklist with the same Group_ID.
The alignment procedure is based on the retention time and on the similarity score between each pair of constructed spectra. If the retention time difference and the constructed spectrum similarity score fall within the user-specified tolerances, the compound-related peaks are aligned together.
The following is the normal procedure for this property page:
1. Select the result file aligned_identified_grouped_chromatograph_peaklist.aligndb. This database file is generated only if you choose some files for alignment (see the Data Process property page).
2. Switch the retention time mode between RT_Original and RT_Corrected to check the alignment results.
3. Click a cell in the table to visualize the peaklist that has been aligned into the same Align_ID across different sample files.
Fig. 18: Visualization of the peaklist with the same Align_ID across different samples, displayed in RT_Original mode.
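The alignment rule above can be sketched as a one-phase pass that assigns Align_IDs. Two compounds are aligned when their retention times fall within the window and their spectrum similarity score exceeds the threshold. Everything below is an illustrative assumption, not MET-COFEI's implementation:

- the first compound of each Align_ID serves as its representative;
- the score function is passed in (for example, a dot-product spectrum score);
- retention-time correction and the second, narrower phase are not modeled;
- nothing prevents two compounds from one sample from receiving the same Align_ID.

```python
def can_align(rt_a, rt_b, score, align_window, score_th):
    """Retention times (s) within the window and a score above the threshold."""
    return abs(rt_a - rt_b) <= align_window and score > score_th

def one_phase_align(samples, align_window, score_th, score_fn):
    """samples[s] is a list of (rt_seconds, spectrum); returns (sample, compound, align_id) triples."""
    representatives = []  # (rt, spectrum) of the first member of each Align_ID (assumption)
    assignments = []
    for s_idx, compounds in enumerate(samples):
        for c_idx, (rt, spectrum) in enumerate(compounds):
            best_id, best_score = None, None
            for align_id, (rep_rt, rep_spectrum) in enumerate(representatives):
                score = score_fn(spectrum, rep_spectrum)
                if can_align(rt, rep_rt, score, align_window, score_th) and (
                        best_score is None or score > best_score):
                    best_id, best_score = align_id, score
            if best_id is None:  # nothing close enough: open a new Align_ID
                best_id = len(representatives)
                representatives.append((rt, spectrum))
            assignments.append((s_idx, c_idx, best_id))
    return assignments
```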
he chromatographic peak shape of each detected peak and the peaklist of each group with the same Group_ID. We assume that chromatographic peaks from the same metabolite have close retention times (taken at the peak apexes) and similar peak shapes. Peaks are therefore separated into different Group_IDs according to whether their retention times and peak shapes meet certain criteria.

The visualization on this page shows how an individual peak relates to the peaklist with the same Group_ID. This lets the user double-check the performance of the peak grouping module (called deconvolution in other GC-MS tools). The user only needs to click the Peak Plot or Group Plot radio button, and the individual peak or the peaklist with the same Group_ID is plotted in detail. The visualization also shows how the individual peak relates to the whole EIC extracted by either the mass trace method or the binning method (the binning method requires specifying the raw CDF file).

For the peaklist with the same Group_ID, the corresponding pure reconstructed spectrum can be generated and plotted, and the user can click the Export button to save it as a .txt file. This spectrum is separated from the raw mixed spectrum according to the retention time and peak shape criteria, so it is purer. The constructed spectrum is c
ignment is to align the same compound across different samples. Both library searching and alignment are based on a calculated similarity score between two spectra. The following Fig. 1 is the flow chart of MET-COFEI data processing.
Fig. 1: Flow chart of MET-COFEI data processing (boxes: Mass Trace Extraction, Peak Detection, Peak Filtering, Peak Grouping, Spectrum Reconstructing, Compound Alignment).
In the latest version, all processing algorithms are coded in C++ and all visualization parts are coded in C++/CLI. During data processing, 3 peaklist files are produced for each sample, where xxx is the sample name: xxx_chromatograph_peaklist.csv, xxx_grouped_chromatograph_peaklist.csv, and xxx_identified_grouped_chromatograph_peaklist.csv. If you choose some samples to align, an aligned peaklist file named aligned_identified_grouped_chromatograph_peaklist.csv is also generated at the end.

All final output peaklists and the intermediate extracted mass traces are also stored in SQLite databases. Each sample has a corresponding database file named xxx_identified_grouped_chromatograph_peaklist.db. All files chosen for alignment share one database file, aligned_identified_grouped_chromatograph_peaklist.aligndb. The database files contain the peak shape information. Therefore, it realizes the complete se
imum search span used to search for the next local maximum point in the neighboring scale.
19. Branch Gap Allow: the maximum gap across several continuous neighboring scales. Only branches whose maximum gap is smaller than Branch Gap Allow are considered meaningful branches and, finally, meaningful profile peaks.
20. Marker Branch Dif Th: among all data points of a branch, a point whose coefficient value is larger than those at its neighboring scales is defined as a marker point. A good-quality peak shape usually has only one marker point per branch, but overlapping peaks or poor peak shapes can produce several. If the distance between two marker points across scans is larger than Marker Branch Dif Th, the branch is split into two meaningful branches, and finally two peaks are identified.
Fig. 22 panels: (A) the original EIC signal; (B) CWT coefficients at different scales; (C) local maximum detection at each scale, where 3 meaningful branches can be recognized; branch 2 has two marker points whose distance is larger than Marker Branch Dif Th, so 4 meaningful peaks are detected; (D) the peak parameters, such as the apex and the left and right boundaries, are retrieved from the detected marker points.
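The branch-linking step controlled by Min Branch Length, Min Search Span and Branch Gap Allow can be sketched as follows. The sketch takes as input the local-maximum positions already found at each CWT scale. It makes these assumptions:

- tracing runs from the largest scale down to the smallest;
- Min Search Span is the largest scan offset allowed between points on consecutive scales;
- branch length counts points;
- Branch Gap Allow is the number of consecutive scales a branch may skip.

Marker-point splitting (Marker Branch Dif Th) is not modeled, and the function name is hypothetical.

```python
def link_branches(maxima_by_scale, search_span=2, gap_allow=1, min_branch_length=6):
    """Link CWT local maxima across scales into branches (ridge lines).

    maxima_by_scale[k] lists the scan indices of local maxima at scale k,
    where k = 0 is the smallest scale. Returns branches as lists of (scale, scan).
    """
    open_branches, closed = [], []
    # Trace from the largest scale down to the smallest (assumption).
    for scale in range(len(maxima_by_scale) - 1, -1, -1):
        free = sorted(maxima_by_scale[scale])
        for br in open_branches:
            near = [s for s in free if abs(s - br["last"]) <= search_span]
            if near:
                nxt = min(near, key=lambda s: abs(s - br["last"]))
                free.remove(nxt)
                br["points"].append((scale, nxt))
                br["last"], br["gap"] = nxt, 0
            else:
                br["gap"] += 1  # this scale was skipped
        # A branch that skipped more than Branch Gap Allow scales in a row is closed.
        closed += [br for br in open_branches if br["gap"] > gap_allow]
        open_branches = [br for br in open_branches if br["gap"] <= gap_allow]
        # Unclaimed maxima start new branches at this scale.
        open_branches += [{"points": [(scale, s)], "last": s, "gap": 0} for s in free]
    closed += open_branches
    # Keep branches with at least Min Branch Length points.
    return [br["points"] for br in closed if len(br["points"]) >= min_branch_length]
```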
method, and then configure the parameters for peak detection.
Parameter configuring for peak quality filtering. Bear in mind that MET-COFEI aims to output high-quality chromatographic peaks according to several criteria.
6a. Min Peak Width (default 26), Max Peak Width (default 250), and Peak Intensity Th (default 250) can be obtained from the mass spectrometry facility personnel. Alternatively, tentatively configure these parameters with the default values and run MET-COFEI; if some peaks are missing, loosen the parameters to find more peaks.
6b. SNR Th (default 2.0), Peak Significance Th (default 1.0), TPASR Th (default 0.7), Zig Zag Index Th (default 0.6): tentatively configure these parameters with the default values and run MET-COFEI; if some peaks are missing, loosen the parameters to find more peaks.
Parameter configuring for peak grouping, identification, and alignment.
7a. Group Scan Shift Tol (default 3), Group Shape Angle Tol (default 20), Fragment Mass Tol (default 0.1), Library Spectrum Score Th (default 750). The spectrum similarity score is calculated in the same way as in the NIST method; the maximum score is 1000.
7b. One/Two Phase Alignment, Align Window, Align Window2: usually use one-phase alignment; if you find some compounds misaligned, use the two-phase alignment strategy to refine the results. Keep in mind that AlignWindow2 for two-phase alignment should be narrower than AlignWindow1 for one-phase alignment.
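The sketch below makes the filtering thresholds concrete. For one peak, it computes TPASR, the Zig-Zag Index and a peak significance level, following the definitions in the Parameter Explanation, and then applies the default thresholds listed in 6b. Several details are assumptions:

- the intensity array spans exactly from the left to the right boundary, so the peak width is N - 1 scans;
- the apex baseline is a straight line between the two boundary intensities;
- "near the apex" means the apex and its two neighbors, and "near the boundaries" means the two outermost points on each side.

SNR is not computed here, and the function names are hypothetical.

```python
def apex_index(intensities):
    """Index of the highest point (the first one if tied)."""
    return max(range(len(intensities)), key=intensities.__getitem__)

def tpasr(intensities):
    """TPASR = |TPA - RPA| / TPA for a peak spanning Left_Boundary..Right_Boundary."""
    peak_width = len(intensities) - 1                              # in scans (assumption)
    tpa = 0.5 * peak_width * intensities[apex_index(intensities)]  # triangle peak area
    rpa = sum(intensities)                                         # real peak area
    return abs(tpa - rpa) / tpa

def zig_zag_index(intensities):
    """Sum of (2*I_n - I_(n-1) - I_(n+1))**2 over inner points, divided by N * EPI**2."""
    n = len(intensities)
    if n < 3:
        raise ValueError("need at least 3 points")
    apex = apex_index(intensities)
    first, last = intensities[0], intensities[-1]
    baseline_at_apex = first + (last - first) * apex / (n - 1)  # straight-line baseline (assumption)
    epi = intensities[apex] - baseline_at_apex                   # effective peak intensity
    if epi <= 0:
        return float("inf")
    local = [(2 * intensities[k] - intensities[k - 1] - intensities[k + 1]) ** 2
             for k in range(1, n - 1)]
    return sum(local) / (n * epi ** 2)

def peak_significance(intensities, edge_points=2):
    """Mean near the apex (apex and its two neighbors) over the mean near both boundaries."""
    apex = apex_index(intensities)
    near_apex = intensities[max(apex - 1, 0):apex + 2]
    near_edges = intensities[:edge_points] + intensities[-edge_points:]
    edge_mean = sum(near_edges) / len(near_edges)
    if edge_mean == 0:
        return float("inf")
    return (sum(near_apex) / len(near_apex)) / edge_mean

def passes_quality(intensities, tpasr_th=0.7, zig_zag_th=0.6, significance_th=1.0):
    """Keep a peak only if it passes all three filters (defaults from step 6b)."""
    return (tpasr(intensities) <= tpasr_th
            and zig_zag_index(intensities) <= zig_zag_th
            and peak_significance(intensities) >= significance_th)
```

For example, a clean triangular peak such as [0, 5, 10, 5, 0] has TPASR 0 and a Zig-Zag Index of 0.2. An alternating profile such as [0, 9, 1, 10, 1, 9, 0] also has TPASR 0, but its Zig-Zag Index is above 2, so only the zig-zag filter rejects it.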
n time mode (see Fig. 12 and Fig. 13). For Group Plot, if the user clicks the Export button, a constructed spectrum is generated from the m/z values and apex intensity values of the associated peaks. One constructed spectrum, built from the peaklist with the same Group_ID, is shown in Fig. 14. For more details about the matching between the constructed spectrum and the library spectra, the user can open the identification file xxx.IDEN to double-check. One identification result file is shown in Fig. 15.
From the extracted EIC, the user can see the shape information before and after the specific peak; Fig. 16 shows the specific peak together with the whole extracted EIC. The EIC is extracted by an object tracing method, and the chromatographic peaks are then detected by a CWT-based method. The user can therefore optimize the parameters of both the mass tracing module and the CWT-based peak detection module. If the raw CDF data file is selected, the binning EIC and TIC can also be displayed. By comparing the extracted EIC with the binning EIC (for a specific m/z and tolerance), the user can optimize the parameters for mass trace extraction. See Fig. 17.
[Screenshot: Identification Result page with 05020207_identified_grouped_chromatograph_peaklist.db, the CDF file 05020207.CDF, and the 05020207 identification result file selected.]
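The binning EIC used for comparison in Fig. 17 can be approximated by a simple rule: for each scan, sum the intensities whose m/z lies within the tolerance of the target m/z. In this sketch the tolerance is treated as a half-width in Da or ppm, and the function name binning_eic is hypothetical. The exact binning rule used by MET-COFEI is not documented here.

```python
def binning_eic(scans, target_mz, tol, unit="Da"):
    """Sum, per scan, the intensities within +/- tol of target_mz.

    scans: list of scans, each a list of (mz, intensity) pairs.
    tol is treated as a half-width in Da or ppm (assumption).
    """
    if unit == "Da":
        half_width = tol
    elif unit == "ppm":
        half_width = target_mz * tol * 1e-6
    else:
        raise ValueError("unit must be 'Da' or 'ppm'")
    return [sum(inten for mz, inten in scan if abs(mz - target_mz) <= half_width)
            for scan in scans]
```

A wide window merges neighboring masses into one EIC, while the mass-trace method follows a single m/z track. Comparing the two curves is what helps tune Trace PPM.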
nment.
26. Align Window: the alignment window across retention time for the first-phase alignment.
27. Align Window2: the alignment window across retention time for the second-phase alignment. The second phase is a more accurate alignment, so AlignWindow2 < AlignWindow1.
28. Align Fragment Mass Tol: when calculating the similarity score between two constructed spectra from two samples, MET-COFEI takes the observed m/z difference between the two spectra into account, so a difference in fragment m/z is allowed. An m/z pair (one m/z value from each constructed spectrum) is considered a matched pair if its m/z difference is smaller than Align Fragment Mass Tol. Only matched m/z pairs contribute to the final spectrum similarity score.
29. Align Spectrum Score Th: two constructed spectra are considered a matched alignment only if their similarity score is higher than Align Spectrum Score Th. The two components represented by the two constructed spectra are then considered the same metabolite compound.
Parameter Setting and Optimization Procedure
For a given GC-MS data sample, the parameter configuration greatly affects processing speed and performance. Looser parameters usually yield more peak information, but at a higher computational cost and with longer computation time, so the user needs to balance the two in real applications. If there is not too much data, the user usually wants to ac
onsidered a pure spectrum related to a potential compound, which can be searched against a library. If the user also specifies the library-based identification file (xxx.IDEN), the constructed spectrum (upward) and the matched library spectrum (downward) can be plotted together.
The following is the normal procedure for this property page:
1. Select the result file xxx_identified_grouped_chromatograph_peaklist.db from the result directory. After data processing, the results are usually generated in the same folder where the original GC-MS data files are located.
2. Click a cell in the table to visualize the individual peak (check the Peak Plot radio button) or the associated peaks with the same Group_ID (check the Group Plot radio button).
3. Export the constructed spectrum into a .txt file.
4. Select the library-based identification file (xxx.IDEN) to check the match between the constructed spectrum and the library spectrum.
5. Select the raw CDF file and manually configure the m/z tolerance (ppm or Da) to view the binning-based EIC.
To find a meaningful peak or peaklist, click the header of the Group_ID or Apex MZ column; the whole table in the left part is then sorted in ascending or descending order. The user can view the associated peaks in the lower right part in normalized mode (focusing on peak shape similarity) or standard mode, in Scan Number or Retentio
paration between data processing and result visualization. For the raw data (CDF), you can view graphics such as the TIC (total ion chromatogram), the spectrum of each scan, a 2D display of the raw data, and the binning-based EIC. For the output results, you only need to open the database file to view the EIC extracted by the mass trace method, the individual detected chromatographic peaks, and the peaklists that have been grouped and aligned.
The latest version of MET-COFEI can run multiple samples in Batch mode or in Parallel mode (MPI, Message Passing Interface), depending on the number of cores in your PC. Parallel mode saves data processing time but requires more memory. Because the MPI package differs between 32-bit and 64-bit CPUs, MET-COFEI is distributed as two separate packages; download the one that matches your CPU hardware.
MET-COFEI Application
The following screenshot shows the MET-COFEI software interface; all application operations and parameter configuration can be done within the software. There are 4 main parts, displayed as 4 property pages:
- Data Process: raw data visualization and processing.
- Parameter Setup: processing parameters.
- Identification Result: visualization of individual sample results.
- Alignment Result: visualization of multiple samples after alignment.
Data Process
This property page lets the user select the GC-M
26. acquire the optimal analysis results. Here I provide a typical procedure for parameter setting and optimization.
1. Open a representative CDF file (refer to the Data Process property page).
2. Determine the Start Scan and End Scan from the TIC curve.
3. Move the mouse to a representative scan and, from the corresponding spectrum, determine the intensity cutoff threshold. Bear in mind that only the higher-intensity points of a scan will be used for EIC extraction and peak detection. Still using the opened representative spectrum, zoom in and you can easily tell whether the data is centroid or profile data: profile data is displayed as many continuous spectral peaks, while centroid data is displayed as sticks.
4. Parameter configuration for EIC extraction. Bear in mind that MET-COFEI adopts a more advanced method to extract mass traces than the binning-based method; in principle it is the same as the object-tracking problem in video tracking. A meaningful mass trace (EIC) should therefore be a continuous trace of points across several consecutive scans, with the m/z value varying (shifting) within a specific tolerance.
4a. From the mass spectrometry instrument facility personnel, you should learn the accuracy tolerance of the mass spectrometer; you can then configure Trace PPM.
4b. You can also learn the minimum meaningful chromatographic peak width from the facility personnel and then configure Min Trace Length.
4c. To allow some points to be missing during EIC tracing, you
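The tracking view of EIC extraction described above can be illustrated with a simplified greedy tracker: points below the intensity cutoff are ignored, a point extends an open trace if its m/z is within the ppm tolerance of the trace's last m/z, a trace is closed once it has missed more than the allowed number of consecutive scans, and traces shorter than the minimum length are discarded. This is a hypothetical sketch for illustration, not MET-COFEI's actual algorithm; the parameter names only mirror Cutoff Intensity, Trace PPM, Min Trace Length and Miss Allow Th, and Inter Trace Distance Th is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Scan = Sequence[Tuple[float, float]]  # assumed layout: (m/z, intensity) pairs


@dataclass
class MassTrace:
    # Each point is (scan index, m/z, intensity).
    points: List[Tuple[int, float, float]] = field(default_factory=list)

    @property
    def last_scan(self) -> int:
        return self.points[-1][0]

    @property
    def last_mz(self) -> float:
        return self.points[-1][1]


def ppm_diff(mz: float, ref_mz: float) -> float:
    """Absolute m/z difference expressed in parts per million of ref_mz."""
    return abs(mz - ref_mz) / ref_mz * 1e6


def extract_mass_traces(scans: Sequence[Scan], cutoff_intensity: float,
                        trace_ppm: float, min_trace_length: int,
                        miss_allow: int) -> List[MassTrace]:
    active: List[MassTrace] = []
    finished: List[MassTrace] = []
    for s, scan in enumerate(scans):
        # Close traces that would exceed miss_allow consecutive missing scans.
        still_open = []
        for trace in active:
            if s - trace.last_scan - 1 > miss_allow:
                finished.append(trace)
            else:
                still_open.append(trace)
        active = still_open
        extended = set()  # indices of traces that already took a point in this scan
        # Strongest points first, so intense signals claim the closest trace.
        for mz, inten in sorted(scan, key=lambda p: p[1], reverse=True):
            if inten < cutoff_intensity:
                continue
            best, best_ppm = None, trace_ppm
            for idx, trace in enumerate(active):
                if idx in extended:
                    continue
                d = ppm_diff(mz, trace.last_mz)
                if d <= best_ppm:
                    best, best_ppm = idx, d
            if best is None:
                active.append(MassTrace([(s, mz, inten)]))
                extended.add(len(active) - 1)
            else:
                active[best].points.append((s, mz, inten))
                extended.add(best)
    finished.extend(active)
    return [t for t in finished if len(t.points) >= min_trace_length]


# One ion near m/z 100 is seen in scans 0, 1, 3 and 4 (scan 2 is empty).
example_scans = [
    [(100.0000, 500.0), (200.0000, 50.0)],
    [(100.0005, 800.0)],
    [],
    [(99.9998, 600.0), (300.0000, 900.0)],
    [(100.0003, 300.0)],
]
traces = extract_mass_traces(example_scans, cutoff_intensity=100.0, trace_ppm=10.0,
                             min_trace_length=3, miss_allow=1)
```

With miss_allow=1 the one-scan gap is bridged and a single four-point trace survives; with miss_allow=0 the same ion splits into two short traces that both fall below min_trace_length.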
27. Parameter Setup
In this property page, the user can configure the processing parameters (Fig 11). Different parameter settings will generate very different results. If the user wants to acquire optimal results by setting optimal parameters, please read the parameter explanation section and the parameter optimization section first. After configuring the parameters, click the Apply button; it saves the modified parameters to the user-specified parameter file (by default config_para.csv; see the Data Process property page). Only after the user clicks Apply are the newly configured parameters loaded into the parameter setup panel and used in subsequent data processing. To use the newly configured parameters, go back to the Data Process page and click Batch Run or Parallel Run.
The parameter configuration includes 6 parts:
1. Data type configuration
2. Parameter configuration for mass trace (EIC) extraction
3. Parameter configuration for CWT-based peak detection
4. Parameter configuration for peak quality filtering
5. Parameter configuration for peak grouping and library searching
6. Parameter configuration for compound alignment
Regarding data type configuration, you need to specify whether the input data is profile data or centroid data. For profile data, the centroiding module is called first. This data configuration must be correct; otherwise you cannot get correct, meaningful results. Th
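For profile data, a centroiding step turns each profile peak into a single stick. The manual does not describe how MET-COFEI's centroid module works, so the following is only a simple stand-in, assuming one profile spectrum given as parallel m/z and intensity lists (hypothetical function name): each local intensity maximum becomes a stick whose m/z is the intensity-weighted mean of the maximum and its two neighbours.

```python
from typing import List, Sequence, Tuple


def centroid_profile(mz: Sequence[float], intensity: Sequence[float],
                     min_intensity: float = 0.0) -> List[Tuple[float, float]]:
    """Turn one profile spectrum into (m/z, intensity) sticks.

    Each local maximum above min_intensity becomes one stick; its m/z is the
    intensity-weighted mean of the maximum and its two neighbours, and its
    intensity is the maximum's intensity. Intensities are assumed non-negative.
    """
    if len(mz) != len(intensity):
        raise ValueError("mz and intensity must have the same length")
    sticks = []
    for i in range(1, len(mz) - 1):
        left, here, right = intensity[i - 1], intensity[i], intensity[i + 1]
        if here > min_intensity and here > left and here >= right:
            window = range(i - 1, i + 2)
            total = sum(intensity[j] for j in window)
            center = sum(mz[j] * intensity[j] for j in window) / total
            sticks.append((center, here))
    return sticks


example_mz = [100.00, 100.01, 100.02, 100.03, 100.04, 100.05, 100.06]
example_int = [0.0, 10.0, 30.0, 10.0, 0.0, 20.0, 0.0]
sticks = centroid_profile(example_mz, example_int)  # two sticks, near m/z 100.02 and 100.05
```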
28. se alignment. Regarding the configured values for Align Window1 and Align Window2, you should obtain these parameters from the mass spectrometry instrument facility personnel, because they are related to the real dynamic range of retention time shift. Additionally, Start Scan, End Scan and Cutoff Intensity are the main factors affecting the computational burden.
Potential problems in application
The MET-COFEI development team has tried to find any potential bugs or problems in the application. If you find any problems, please contact us at pzhao@noble.org or wezhang@noble.org. We greatly appreciate your feedback and suggestions. During our tests, the potential problems found include:
1. MET-COFEI may fail in parallel processing mode if the folder name or file name contains any spaces. This problem is caused by the MPI command-line parsing module. Solution: remove any spaces from the folder name and file name.
2. On some Windows 7 machines, MPI-based parallel processing may fail. Solution: check the Windows OS and update it to the latest version to ensure that it supports MPI.NET.
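Because the first problem above comes from spaces in folder and file names, it can help to list the offending CDF paths before starting a Parallel Run. A small Python helper for that check is sketched below; the function names are hypothetical and it only reports paths, leaving the renaming to the user.

```python
import os
from typing import Iterable, Iterator, List


def cdf_files_under(root: str) -> Iterator[str]:
    """Yield every .cdf/.CDF file path found below root (nothing if root does not exist)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".cdf"):
                yield os.path.join(dirpath, name)


def paths_with_spaces(paths: Iterable[str]) -> List[str]:
    """Return the paths in which any folder or file name contains a space."""
    return [p for p in paths if " " in p]
```

For example, `paths_with_spaces(cdf_files_under(os.path.abspath(r"C:\GCMS\Data")))` lists every CDF file below that (hypothetical) folder whose full path contains a space; passing an absolute path also catches spaces in parent folder names.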
29. should configure Miss Allow Th (default value 3).
4d. To allow a tolerance between two neighboring mass traces, you should configure Inter Trace Distance Th.
You can refer to Fig 20 to understand the real physical meaning first, and then configure the parameters for mass trace extraction.
5. Parameter configuration for peak detection. Bear in mind that MET-COFEI adopts a more advanced method to detect meaningful profile peaks. Peak detection is implemented in the 2D CWT domain spanned by retention time (scan) and wavelet scale, by detecting the peak branch pattern. One meaningful peak corresponds to one meaningful branch pattern; a meaningful branch pattern should extend across several scales, and its continuity should not be too poor.
5a. Min Branch Length: the minimum branch length in the 2D CWT domain for a peak. Default value 6.
5b. Branch Gap Allow: allows the branch to have a limited gap across neighboring scales. Default value 22 3
5c. Min Search Span: specifies the branch search range in the neighboring scales. Default value 22 3
5d. Marker Branch Dif Th: for sharing (overlapping) peaks, the branch pattern is a little challenging, since you can consider it either one very wide peak or two sharing peaks. One peak branch pattern at a high scale will then split into two at some lower scale, giving two marker points. When the distance between the two marker points is larger than Marker Branch Dif Th, MET-COFEI will consider them as two sharing peaks. You can refer to Fig 22 to understand the real meaning of this
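The branch-pattern idea behind peak detection can be sketched in plain Python: compute Mexican-hat (Ricker) CWT coefficients at several scales, take the positive local maxima at each scale, and link them from the largest scale down into branches, allowing a limited gap and a limited search span between neighboring scales. This follows the general ridge-linking approach to CWT peak detection and is not MET-COFEI's implementation; the wavelet choice, the ±5a truncation and the gap/span defaults (2 and 3) are illustrative assumptions, only the Min Branch Length default of 6 comes from the text above, and the Marker Branch Dif Th split for sharing peaks is not modelled.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


def ricker(t: float, a: float) -> float:
    """Mexican-hat (Ricker) wavelet at offset t for scale a."""
    x = t / a
    return 2.0 / (math.sqrt(3.0 * a) * math.pi ** 0.25) * (1.0 - x * x) * math.exp(-0.5 * x * x)


def cwt_row(signal: Sequence[float], a: float) -> List[float]:
    """CWT coefficients at scale a; wavelet truncated at +/-5a, zeros outside the signal."""
    half = int(math.ceil(5 * a))
    kernel = [ricker(d, a) for d in range(-half, half + 1)]
    n = len(signal)
    return [sum(signal[i + d] * kernel[d + half]
                for d in range(-half, half + 1) if 0 <= i + d < n)
            for i in range(n)]


def positive_local_maxima(row: Sequence[float]) -> List[int]:
    return [i for i in range(1, len(row) - 1)
            if row[i] > 0 and row[i] > row[i - 1] and row[i] >= row[i + 1]]


@dataclass
class Branch:
    positions: List[int] = field(default_factory=list)  # one position per linked scale
    gap: int = 0          # consecutive scales without a linked maximum
    active: bool = True


def detect_peaks(signal: Sequence[float], scales: Sequence[float],
                 min_branch_length: int = 6, branch_gap_allow: int = 2,
                 min_search_span: int = 3) -> List[int]:
    """Return peak positions (scan indices); scales must be sorted ascending."""
    maxima = [positive_local_maxima(cwt_row(signal, a)) for a in scales]
    branches: List[Branch] = []
    for level in reversed(range(len(scales))):  # from the largest scale downwards
        free = set(maxima[level])
        for br in branches:
            if not br.active:
                continue
            last = br.positions[-1]
            near = [p for p in free if abs(p - last) <= min_search_span]
            if near:
                best = min(near, key=lambda p: (abs(p - last), p))
                br.positions.append(best)
                br.gap = 0
                free.discard(best)
            else:
                br.gap += 1
                if br.gap > branch_gap_allow:
                    br.active = False
        # Maxima not claimed by an existing branch start new branches.
        branches.extend(Branch([p]) for p in sorted(free))
    # A branch's peak position is where it ends at the smallest scale it reached.
    return sorted(br.positions[-1] for br in branches
                  if len(br.positions) >= min_branch_length)
```

For a single Gaussian peak centred at scan 50, `detect_peaks(signal, range(1, 11))` yields one branch spanning all ten scales and reports position 50; noise spikes tend to produce short branches that the length threshold removes.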
30. [Figure: Identification Result page for 05020207_identified_grouped_chromatograph_peaklist.db, with Peak Plot / Group Plot and Standard / Normalized options; upper panel: Intensity of the grouped peaks; lower panel: Constructed Spectrum (Up) and Lib Spectrum (Down) if Match, Trisiloxane Octame]
Fig 12 Visualization of the associated un-normalized peaks with the same Group ID, and the constructed and library matching spectra
[Figure: the same page with Select Output Result (.db), Select GC-MS CDF File and Select Identification (IDEN) File controls and the Normalized option selected; axes: Intensity vs Scan Number / Retention Time; lower panel: Constructed Spectrum (Up) and Lib Spectrum (Down) if Match, Trisiloxane Octame]
Fig 13 Visualization of the associated normalized peaks with the same Group ID, and the constructed and library matching spectra
[Exported constructed spectrum: pseudospectrum_group_id5.txt]
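The matching degree between a constructed spectrum and a library spectrum can be illustrated with two small functions: one scales a spectrum so its base peak is 100 (a common convention for mass spectra), and one computes the cosine similarity of two spectra keyed by nominal m/z. Cosine similarity is a common choice for spectrum matching, but the manual does not say which score MET-COFEI uses, so treat this as an assumption-based sketch with hypothetical function names.

```python
import math
from typing import Dict

Spectrum = Dict[int, float]  # assumed layout: nominal m/z -> intensity


def normalize_to_base_peak(spectrum: Spectrum) -> Spectrum:
    """Scale intensities so that the most intense ion (base peak) equals 100."""
    if not spectrum:
        return {}
    top = max(spectrum.values())
    if top <= 0:
        raise ValueError("spectrum has no positive intensity")
    return {mz: 100.0 * inten / top for mz, inten in spectrum.items()}


def cosine_match(a: Spectrum, b: Spectrum) -> float:
    """Cosine similarity in [0, 1] for non-negative spectra; 1.0 means identical shape."""
    dot = sum(a[mz] * b[mz] for mz in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Because cosine similarity ignores overall scale, normalizing to the base peak first does not change the score; normalization matters only for displaying the two spectra on a common axis.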