mavolcanoplot

1. PDB Database Record    Field in the MATLAB Structure
   CRYST1                 Cryst1
   ORIGXn                 OriginX
   SCALEn                 Scale
   MTRIXn                 Matrix
   TVECT                  TranslationVector
   MODEL                  Model
   ATOM                   Atom
   SIGATM                 AtomSD
   ANISOU                 AnisotropicTemp
   SIGUIJ                 AnisotropicTempSD
   TER                    Terminal
   HETATM                 HeterogenAtom
   CONECT                 Connectivity

   PDBStruct = getpdb(PDBid, ...'PropertyName', PropertyValue, ...) calls getpdb with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

   PDBStruct = getpdb(PDBid, ...'ToFile', ToFileValue, ...) saves the data returned from the database to a PDB-formatted file, ToFileValue.

   Tip: After you save the protein structure record to a local PDB-formatted file, you can use the pdbread function to read the file into MATLAB offline or use the molviewer function to display and manipulate a 3-D image of the structure.

   PDBStruct = getpdb(PDBid, ...'SequenceOnly', SequenceOnlyValue, ...) controls the return of the protein sequence only. Choices are true or false (default). If there is one sequence, it is returned as a character array. If there are multiple sequences, they are returned as a cell array.

   The Sequence Field
   The Sequence field is also a structure containing sequence information
2. [Figure: sequence display; content not recoverable]

   See Also
   Bioinformatics Toolbox functions: aa2nt, aacount, aminolookup, basecount, baselookup, dimercount, emblread, fastaread, fastawrite, genbankread, geneticcode, genpeptread, getembl, getgenbank, getgenpept, nt2aa, proteinplot, seqcomplement, seqdisp, seqrcomplement, seqreverse, seqshoworfs, seqshowwords, seqwordcount

   seqwordcount

   Purpose
   Count number of occurrences of word in sequence

   Syntax
   seqwordcount(Seq, Word)

   Arguments
   Seq     Enter a nucleotide or amino acid sequence of characters. You can also enter a structure with the field Sequence.
   Word    Enter a short sequence of characters.

   Description
   seqwordcount(Seq, Word) counts the number of times that a word appears in a sequence, and then returns the number of occurrences of that word.

   If Word contains nucleotide or amino acid symbols that represent multiple possible symbols (ambiguous characters), then seqwordcount counts all matches. For example, the symbol R represents either G or A (purines). For another example, if Word equals 'ART', then seqwordcount counts occurrences of both AAT and AGT. seqwordcoun
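The ambiguous-character matching described for seqwordcount can be sketched in plain Python. This is an illustration of the idea, not the toolbox implementation: the function name `count_word`, the DNA-only IUPAC table, and the choice to count non-overlapping matches are all assumptions of the sketch.

```python
import re

# IUPAC nucleotide ambiguity codes, DNA only (an assumption of this sketch).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def count_word(seq, word):
    """Count non-overlapping occurrences of word in seq.

    Each ambiguous symbol in word is expanded to a character class,
    so 'ART' matches both 'AAT' and 'AGT'.
    """
    pattern = "".join(IUPAC_DNA[ch] for ch in word.upper())
    return len(re.findall(pattern, seq.upper()))

# 'AAT', 'AGT', and a second 'AAT' all match the word 'ART'.
print(count_word("AATAGTCCAAT", "ART"))
```

Expanding the word into a regular expression keeps the sketch short; ambiguous symbols inside the sequence itself are not handled here.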
3. 8. Evaluate the performance of the classifier.

      classperf(cp, classes, test);
      cp.CorrectRate

      ans =
          0.9867

   9. Use a one-norm, hard margin support vector machine classifier by changing the boxconstraint property.

      figure
      svmStruct = svmtrain(data(train,:), groups(train), ...
                           'showplot', true, 'boxconstraint', 1e6);
      classes = svmclassify(svmStruct, data(test,:), 'showplot', true);

      [Figure 2: classification plot; legend: 0 (training), 0 (classified), 1 (training), 1 (classified), Support Vectors]

   10. Evaluate the performance of the classifier.

      classperf(cp, classes, test);
      cp.CorrectRate

      ans =
          0.9867

   References
   [1] Kecman, V. (2001). Learning and Soft Computing. Cambridge, MA: MIT Press.
   [2] Suykens, J.A.K., Van Gestel, T., De Brabanter, J., De Moor, B., and Vandewalle, J. (2002). Least Squares Support Vector Machines. Singapore: World Scientific.
   [3] Scholkopf, B., and Smola, A.J. (2002). Learning with Kernels. Cambridge, MA: MIT Press.
   [4] Cristianini, N., and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, First Edition. Cambridge: Cambridge University Press. http www support
4. 3. Shift a spectrum by the difference between RP (the known reference mass of 4000 m/z) and SP (the experimental mass of 4051.14 m/z).

      RP = 4000;
      SP = 4051.14;
      YOut = interp1(MZ, Y, MZ-(RP-SP));

   4. Plot the original spectrum in red and the shifted spectrum in blue, and zoom in on the reference peak.

      plot(MZ, Y, 'r', MZ, YOut, 'b')
      xlabel('Mass/Charge (M/Z)')
      ylabel('Relative Intensity')
      legend('Y', 'YOut')
      axis([3600 4800 -2 60])

      [Figure 1: original and shifted spectra; x-axis Mass/Charge (M/Z), 3600 to 4800; y-axis Relative Intensity]

   References
   [1] Monchamp, P., Andrade-Cetto, L., Zhang, J.Y., and Henson, R. (2007). Signal Processing Methods for Mass Spectrometry. In Systems Bioinformatics: An Engineering Case-Based Approach, G. Alterovitz and M.F. Ramoni, eds. (Artech House Publishers).

   See Also
   Bioinformatics Toolbox functions: msbackadj, msheatmap, mspalign, mspeaks, msresample, msviewer

   msbackadj

   Purpose
   Correct baseline of mass spectrum

   Syntax
   Yout = msbackadj(MZ, Y)
   msbackadj(..., 'PropertyName', PropertyValue, ...)
   msbackadj(..., 'WindowSize', WindowSizeValue, ...)
   msbackadj(..., 'StepSize'

   Arguments
   MZ

   Description
   Yout = msbackadj(MZ, Y) adjusts the variable baseline of a raw mass
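The single-peak shift in step 3 amounts to re-sampling the spectrum at offset positions. The following plain-Python sketch mirrors the interp1 call under stated assumptions: the helper names `interp_linear` and `shift_spectrum` are hypothetical, interpolation is linear, and points that fall outside the sampled range are returned as NaN.

```python
import math

def interp_linear(xs, ys, x):
    """Linear interpolation on increasing xs; NaN outside the sampled range."""
    if x < xs[0] or x > xs[-1]:
        return math.nan
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    return ys[-1]  # only reached for a single-point spectrum

def shift_spectrum(mz, y, rp, sp):
    """Shift y so that the peak observed at sp lands on the reference mass rp.

    The m/z vector is not rescaled; each output sample is read from the
    input at mz - (rp - sp).
    """
    return [interp_linear(mz, y, m - (rp - sp)) for m in mz]

# Toy spectrum: a single peak observed at m/z 7 that should sit at m/z 5.
mz = [float(i) for i in range(11)]
y = [0.0] * 11
y[7] = 10.0
shifted = shift_spectrum(mz, y, rp=5.0, sp=7.0)
print(shifted[5])
```

Because only an offset is applied, this approach suits the one-reference-peak case; with several reference peaks a scaling term would also be needed.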
5. [Figure: scatter plots of data versus baseline, "Before normalization" and "After normalization", showing the invariant set and the smooth curve]

   Examples
   1. Load a MAT file, included with Bioinformatics Toolbox, which contains Affymetrix data variables, including pmMatrix, a matrix of PM probe intensity values from multiple CEL files.

      load prostatecancerrawdata

   2. Normalize the data in pmMatrix, using the affyinvarsetnorm function.

      NormMatrix = affyinvarsetnorm(pmMatrix);

   The prostatecancerrawdata.mat file used in the previous example contains data from Best et al., 2005.

   References
   [1] Li, C., and Wong, W.H. (2001). Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biology 2(8): research0032.1-0032.11.
   [2] http://biosun1.harvard.edu/complab/dchip/normalizing%20arrays.htm#isn
   [3] Best, C.J.M., Gillespie, J.W., Yi, Y., Chandramouli, G.V.R., Perlmutter, M.A., Gathright, Y., Erickson, H.S., Georgevich, L., Tangrea, M.A., Duray, P.H., Gonzalez, S., Velasco, A., Linehan, W.M., Matusik, R.J., Price, D.K., Figg, W.D., Emmert-Buck, M.R., and Chuaqui, R.F. (2005). Molecular alterations in primary prostate cancer after androgen ablation therapy. Clinical Cancer Research 11, 6823-6834.

   See Also
   affyread, celintensityread, mainvarsetnorm, malowess, ma
6. GeneticCode              Genetic code for translating nucleotide codons to amino acids. Enter a code number or code name from the table Genetic Code. If you use a code name, you can truncate the name to the first two characters of the name.
   AlphabetValue            Property to select the nucleotide alphabet. Enter either 'dna' or 'rna'. The default value is 'dna'.
   ThreeLetterCodesValue    Property to select one- or three-letter amino acid codes. Enter true for three-letter codes or false for one-letter codes.

   Genetic Code
   Code Number    Code Name
   1              Standard
   2              Vertebrate Mitochondrial
   3              Yeast Mitochondrial
   4              Mold, Protozoan, Coelenterate Mitochondrial, and Mycoplasma/Spiroplasma
   5              Invertebrate Mitochondrial
   6              Ciliate, Dasycladacean, and Hexamita Nuclear
   9              Echinoderm Mitochondrial
   10             Euplotid Nuclear
   11             Bacterial and Plant Plastid
   12             Alternative Yeast Nuclear
   13             Ascidian Mitochondrial
   14             Flatworm Mitochondrial
   15             Blepharisma Nuclear
   16             Chlorophycean Mitochondrial
   21             Trematode Mitochondrial
   22             Scenedesmus Obliquus Mitochondrial
   23             Thraustochytrium Mitochondrial

   Description
   map = revgeneticcode returns a structure containing the reverse mapping for the standard genetic code.

   revgeneticcode(GeneticCode) returns a structure containing the reverse mapping for an alternate genetic code.

   revgeneticcode
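A reverse mapping of the kind revgeneticcode returns can be illustrated in plain Python by inverting a codon table. The sketch below hard-codes only the standard code (code number 1) in the usual T, C, A, G ordering; the function names `standard_code` and `reverse_code` are hypothetical and the structure layout of the toolbox output is not reproduced.

```python
BASES = "TCAG"
# Amino acids for the 64 codons of the standard code, in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def standard_code():
    """Forward mapping: codon -> one-letter amino acid."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    return dict(zip(codons, AMINO))

def reverse_code(code):
    """Reverse mapping: amino acid -> list of codons that encode it."""
    reverse = {}
    for codon, aa in code.items():
        reverse.setdefault(aa, []).append(codon)
    return reverse

rev = reverse_code(standard_code())
print(sorted(rev["S"]))
```

An alternate genetic code would be handled the same way: supply a different forward table and invert it.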
7. hmmprofstruct

   Field Name       Description
   MatchEmission    Symbol emission probabilities in the MATCH states. Size is [ModelLength x AlphaLength]. Defaults to uniform distributions. May accept a structure with residue counts (see aacount or basecount).
   InsertEmission   Symbol emission probabilities in the INSERT state. Size is [ModelLength x AlphaLength]. Defaults to uniform distributions. May accept a structure with residue counts (see aacount or basecount).
   NullEmission     Symbol emission probabilities in the MATCH and INSERT states for the NULL model. NULL model size is [1 x AlphaLength]. Defaults to a uniform distribution. May accept a structure with residue counts (see aacount or basecount). The NULL model is used to compute the log-odds ratio at every state and avoid overflow when propagating the probabilities through the model.
   BeginX           BEGIN state transition probabilities. Format is
                    [B->D1 B->M1 B->M2 B->M3 ... B->Mend]
                    Notes: sum(S.BeginX) = 1. For fragment profiles, sum(S.BeginX(3:end)) = 0. Default is [0.01 0.9900 ... 0].
   MatchX           MATCH state transition probabilities. Format is
                    [M1->M2 M2->M3 ... M[end-1]->Mend;
                     M1->I1 M2->I2 ... M[end-1]->I[end-1];
                     M1->D2 M2->D3 ... M[end-1]->Dend;
                     M1->E  M2->E  ... M[end-1]->E]
                    Notes: sum(S.MatchX) = [1 1 ... 1]. For fragm
8. maxflow(BGObj, SNode, TNode, ...'Method', MethodValue, ...)

   Arguments
   BGObj            biograph object created by biograph (object constructor).
   SNode            Node in a directed graph represented by an N-by-N adjacency matrix extracted from biograph object, BGObj.
   TNode            Node in a directed graph represented by an N-by-N adjacency matrix extracted from biograph object, BGObj.
   CapacityValue    Column vector that specifies custom capacities for the edges in the N-by-N adjacency matrix. It must have one entry for every nonzero value (edge) in the N-by-N adjacency matrix. The order of the custom capacities in the vector must match the order of the nonzero values in the N-by-N adjacency matrix when it is traversed column-wise. By default, maxflow gets capacity information from the nonzero entries in the N-by-N adjacency matrix.
   MethodValue      String that specifies the algorithm used to find the maximum flow. Choices are:
                    - 'Edmonds': Uses the Edmonds and Karp algorithm, the implementation of which is based on a variation called the labeling algorithm. Time complexity is O(N*E^2), where N and E are the number of nodes and edges respectively.
                    - 'Goldberg': Default algorithm. Uses the Goldberg algorithm, which uses the generic method known as preflow-push. Time complexity is O(N^2*sqrt(E)), where N and E are the number of nodes and edges respectively.

   Description
   Tip: For introducto
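The 'Edmonds' choice refers to the Edmonds and Karp algorithm, which augments flow along shortest paths found by breadth-first search. A minimal plain-Python sketch follows; it is not the toolbox code, and the function name `edmonds_karp` and the dictionary-based edge representation are assumptions made for illustration.

```python
from collections import deque

def edmonds_karp(capacity, source, sink):
    """Maximum flow from source to sink.

    capacity maps a directed edge (u, v) to a nonnegative capacity.
    Flow is augmented along shortest (fewest-edge) residual paths.
    """
    if source == sink:
        return 0
    residual, neighbors = {}, {}
    for (u, v), cap in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + cap
        residual.setdefault((v, u), 0)          # reverse edge for undoing flow
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    max_flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbors.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return max_flow

        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        max_flow += bottleneck

caps = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
print(edmonds_karp(caps, "s", "t"))
```

Each breadth-first search costs O(E) and at most O(N*E) augmentations are needed, which gives the O(N*E^2) bound quoted above.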
9. [Figure: dot plot from seqdotplot(moufflon, takin, 11, 7); x-axis Sequence 1, y-axis Sequence 2]

      Matches = seqdotplot(moufflon, takin, 11, 7)

      Matches =
          5552

      [Matches, Matrix] = seqdotplot(moufflon, takin, 11, 7)

   See Also
   Bioinformatics Toolbox functions: nwalign, swalign

   seqinsertgaps

   Purpose
   Insert gaps into nucleotide or amino acid sequence

   Syntax
   NewSeq = seqinsertgaps(Seq, Positions)
   NewSeq = seqinsertgaps(Seq, GappedSeq)
   NewSeq = seqinsertgaps(Seq, GappedSeq, Relationship)

   Arguments
   Seq             Either of the following:
                   - String specifying a nucleotide or amino acid sequence
                   - MATLAB structure containing a Sequence field
   Positions       Vector of integers to specify the positions in Seq before which to insert a gap.
   GappedSeq       Either of the following:
                   - String specifying a nucleotide or amino acid sequence
                   - MATLAB structure containing a Sequence field
   Relationship    Integer specifying the relationship between Seq and GappedSeq. Choices are:
                   - 1: Both sequences use the same alphabet, that is, both are nucleotide sequences or both are amino acid sequences.
                   - 3: Seq contains nucleotides representing codons and GappedSeq contains amino acids (default).

   Return Values
   NewSeq          Sequence with gaps inserted, represented by a string specifying a nucleotide or amino acid sequence.

   Description
   NewSeq s
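The two calling forms of seqinsertgaps can be pictured with a short plain-Python sketch. The function names `insert_gaps` and `insert_gaps_like` are hypothetical, positions are taken to be 1-based, and the behavior for the Relationship value of 3 (one gapped amino acid position maps to three nucleotides) is an assumption based on the argument description rather than on the toolbox source.

```python
from collections import Counter

def insert_gaps(seq, positions, gap="-"):
    """Insert a gap before each 1-based position listed in positions."""
    counts = Counter(positions)
    out = []
    for i, ch in enumerate(seq, start=1):
        out.append(gap * counts.get(i, 0))
        out.append(ch)
    return "".join(out)

def insert_gaps_like(seq, gapped_seq, relationship=3, gap="-"):
    """Copy the gap pattern of gapped_seq onto seq.

    relationship=1: both sequences use the same alphabet.
    relationship=3: seq holds codons and gapped_seq holds amino acids,
    so every gap becomes three gap characters.
    """
    out, pos = [], 0
    for ch in gapped_seq:
        if ch == gap:
            out.append(gap * relationship)
        else:
            out.append(seq[pos:pos + relationship])
            pos += relationship
    return "".join(out)

print(insert_gaps("ACGT", [1, 3]))
print(insert_gaps_like("ATGCCCAAA", "M-PK"))
```

Copying a gap pattern from an aligned protein onto its coding sequence is the typical use of the codon form, since it keeps the reading frame intact.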
10. BGObj            biograph object created by biograph (object constructor).
    DirectedValue    Property that indicates whether the graph is directed or undirected. Enter false for an undirected graph. This results in the upper triangle of the sparse matrix being ignored. Default is true.
    WeightsValue     Column vector that specifies custom weights for the edges in the N-by-N adjacency matrix extracted from a biograph object, BGObj. It must have one entry for every nonzero value (edge) in the matrix. The order of the custom weights in the vector must match the order of the nonzero values in the matrix when it is traversed column-wise. This property lets you use zero-valued weights. By default, allshortestpaths gets weight information from the nonzero entries in the matrix.

    Tip: For introductory information on graph theory functions, see "Graph Theory Functions" in the Bioinformatics Toolbox documentation.

    [dist] = allshortestpaths(BGObj) finds the shortest paths between every pair of nodes in a graph represented by an N-by-N adjacency matrix extracted from a biograph object, BGObj, using Johnson's algorithm. Nonzero entries in the matrix represent the weights of the edges.

    Output dist is an N-by-N matrix where dist(S,T) is the distance of the shortest path from node S to node T. A 0 in this matrix indicates the source node; an Inf is an unreachable node.

    Johnson's algorithm has a time
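The shape of the dist output (0 on the diagonal, Inf for unreachable nodes) can be reproduced with a short plain-Python sketch. Note the substitution: this sketch uses the Floyd-Warshall algorithm because it fits in a few lines, not Johnson's algorithm, which the text says the toolbox uses. The function name `all_shortest_paths` and the 0-based node numbering are assumptions.

```python
import math

def all_shortest_paths(n, edges):
    """All-pairs shortest path distances for a directed graph.

    n is the number of nodes (numbered 0..n-1) and edges maps (s, t) to a
    weight. Uses Floyd-Warshall, O(n^3), as a stand-in for Johnson's algorithm.
    """
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (s, t), w in edges.items():
        dist[s][t] = min(dist[s][t], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Node 3 has no edges, so every distance to or from it is infinite.
d = all_shortest_paths(4, {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 5.0})
print(d[0][2], d[2][0])
```

For sparse graphs Johnson's algorithm scales better than this cubic sketch, which is presumably why it is the documented choice.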
11. [Figure: heat map of the LC/MS data; x-axis Mass/Charge (M/Z), y-axis Retention Time, color Relative Intensity]

    3. Plot the reconstructed profile spectra between two retention times.

       figure
       t1 = 3370;
       t2 = 3390;
       h = find(ret_time>t1 & ret_time<t2);
       [MZ,Y] = msppresample(peaks(h),10000);
       plot3(repmat(MZ,1,numel(h)),repmat(ret_time(h)',10000,1),Y)
       xlabel('Mass/Charge (M/Z)')
       ylabel('Retention Time')
       zlabel('Relative Intensity')

       [Figure: 3-D plot of the reconstructed spectra; axes Mass/Charge (M/Z), Retention Time, Relative Intensity]

    4. Resample the data to plot the Total Ion Chromatogram (TIC).

       figure
       [MZ,Y] = msppresample(peaks,5000);
       plot(ret_time,sum(Y))
       title('Total Ion Chromatogram (TIC)')
       xlabel('Retention Time')
       ylabel('Relative Intensity')

       [Figure: Total Ion Chromatogram (TIC); x-axis Retention Time, y-axis Relative Intensity]

    5. Resample the data to plot the Extracted Ion Chromatogram (XIC) in the 450 to 500 m/z range.

       figure
       [MZ,Y] = msppresample(peaks,5000,'Range',[450 500]);
       plot(ret_time
12. proteinpropplot(SeqAA, ...'Startat', StartatValue, ...)
    proteinpropplot(SeqAA, ...'Endat', EndatValue, ...)
    proteinpropplot(SeqAA, ...'Smoothing', SmoothingValue, ...)
    proteinpropplot(SeqAA, ...'EdgeWeight', EdgeWeightValue, ...)
    proteinpropplot(SeqAA, ...'WindowLength', WindowLengthValue, ...)

    Arguments
    SeqAA                 Amino acid sequence. Enter any of the following:
                          - Character string of letters representing an amino acid
                          - Vector of integers representing an amino acid, such as returned by aa2int
                          - Structure containing a Sequence field that contains an amino acid sequence, such as returned by getembl, getgenpept, or getpdb
    PropertyTitleValue    String that specifies the property to plot. Default is Hydrophobicity (Kyte & Doolittle). To display a list of properties to plot, enter an empty string for PropertyTitleValue. For example, type:

                          proteinpropplot(sequence, 'propertytitle', '')

                          Tip: To access references for the properties, view the proteinpropplot m-file.
    StartatValue          Integer that specifies the starting point for the plot from the N-terminal end of the amino acid sequence SeqAA. Default is 1.
    EndatValue            Integer that specifies the ending point for the plot from the N-terminal end of the amino acid sequence SeqAA. Default is length(SeqAA).
    SmoothingValue        String that specifies the smoothing method. Choices are:
                          - linear (default)
                          - exponential
                          - lowess
13. aacount(..., 'Others', OthersValue, ...)
    aacount(..., 'Structure', StructureValue, ...)

    SeqAA             Amino acid sequence. Enter a character string or vector of integers from the table. Examples: 'ARN' or [1 2 3]. You can also enter a structure with the field Sequence.
    ChartValue        Property to select a type of plot. Enter either 'pie' or 'bar'.
    OthersValue       Property to control the counting of ambiguous characters individually. Enter either 'full' or 'bundle' (default).
    StructureValue    Property to control blocking the unknown characters warning and to not count unknown characters.

    Amino = aacount(SeqAA) counts the type and number of amino acids in an amino acid sequence (SeqAA) and returns the counts in a 1-by-1 structure (Amino) with fields for the standard 20 amino acids (A R N D C Q E G H I L K M F P S T W Y V).

    - If a sequence contains amino acids with ambiguous characters (B, Z, X), the stop character (*), or gaps indicated with a hyphen (-), the field Others is added to the structure and a warning message is displayed.

      Warning: Symbols other than the standard 20 amino acids appear in the sequence.

    - If a sequence contains any characters other than the 20 standard amino acids, ambiguous characters, stop, and gap characters, the characters are counted in the field Others and a warning message is displayed.

      Warning: Sequence contains unknown characters. These will be ignored.

    - If the property Others = 'full', this function lists the ambiguous ch
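The default ('bundle') counting behavior described above can be sketched in plain Python. This is an illustration only: the function name `count_amino_acids` is hypothetical, warnings are omitted, and unknown characters are simply skipped, which is one reading of the second bullet.

```python
from collections import Counter

STANDARD = "ARNDCQEGHILKMFPSTWYV"   # the 20 standard amino acids
OTHERS = "BZX*-"                    # ambiguous symbols, stop, and gap

def count_amino_acids(seq):
    """Count standard amino acids; bundle B, Z, X, '*', '-' into 'Others'.

    The 'Others' field is added only when such symbols are present.
    Characters outside both sets are ignored in this sketch.
    """
    tally = Counter(seq.upper())
    counts = {aa: tally.get(aa, 0) for aa in STANDARD}
    others = sum(tally.get(ch, 0) for ch in OTHERS)
    if others:
        counts["Others"] = others
    return counts

result = count_amino_acids("MARN*-B")
print(result["M"], result["Others"])
```

Returning a field for every standard residue, even when its count is zero, matches the fixed 20-field structure the text describes.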
14. lets you select the column index N from Data to be the baseline column. Default is the index of the column whose median intensity is the median of all the columns.

    affyinvarsetnorm(..., 'Thresholds', ThresholdsValue, ...) sets the thresholds for the lowest average rank and the highest average rank, which are used to determine the invariant set. The rank invariant set is a set of data points whose proportional rank difference is smaller than a given threshold. The threshold for each data point is determined by interpolating between the threshold for the lowest average rank and the threshold for the highest average rank. Select these two thresholds empirically to limit the spread of the invariant set, but allow enough data points to determine the normalization relationship.

    ThresholdsValue is a 1-by-2 vector [LT, HT], where LT is the threshold for the lowest average rank and HT is the threshold for the highest average rank. Values must be between 0 and 1. Default is [0.05, 0.005].

    affyinvarsetnorm(..., 'StopPrctile', StopPrctileValue, ...) stops the iteration process when the number of data points in the invariant set reaches N percent of the total number of data points. Default is 1.

    Note: If you do not use this property, the iteration process continues until no more data points are eliminated.

    affyinvarsetnorm(..., 'RayPrctile', RayPrctileValue, ...) selects the N percentage of the highest ra
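One selection pass of the rank-invariant set can be sketched in plain Python. Several details are assumptions of this sketch and may differ from the toolbox: the proportional rank difference is taken as |rank difference| / n, the threshold is interpolated linearly in the average rank, ties are ranked in order of appearance, and only a single pass is shown even though the text describes an iterative process. The names `ranks` and `invariant_set` are hypothetical.

```python
def ranks(values):
    """1-based ranks of values (ties broken by order of appearance)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = [0] * len(values)
    for position, index in enumerate(order, start=1):
        rank[index] = position
    return rank

def invariant_set(baseline, data, thresholds=(0.05, 0.005)):
    """Indices of points whose proportional rank difference is below a
    threshold interpolated between LT (lowest average rank) and HT
    (highest average rank)."""
    n = len(baseline)
    low_t, high_t = thresholds
    rb, rd = ranks(baseline), ranks(data)
    keep = []
    for i in range(n):
        average_rank = (rb[i] + rd[i]) / 2
        fraction = (average_rank - 1) / (n - 1) if n > 1 else 0.0
        threshold = low_t + (high_t - low_t) * fraction
        if abs(rb[i] - rd[i]) / n < threshold:
            keep.append(i)
    return keep

baseline = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data = [10, 2, 3, 4, 5, 6, 7, 8, 9, 1]   # first and last points swap ranks
print(invariant_set(baseline, data))
```

In a full procedure this pass would be repeated on the surviving points until the set stops shrinking or the StopPrctile limit is reached.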
15. goannotread

    See Also
    - Bioinformatics Toolbox functions: geneont (object constructor), num2goid
    - geneont object methods: getancestors, getdescendants, getmatrix, getrelatives

    gonnet

    Purpose
    Gonnet scoring matrix

    Syntax
    gonnet

    Description
    gonnet returns the Gonnet matrix. The Gonnet matrix is the recommended mutation matrix for initially aligning protein sequences. Matrix elements are ten times the logarithm of the probability that the residues are aligned divided by the probability that the residues are aligned by chance, and then matrix elements are normalized to 250 PAM units.

    Expected score = -0.6152, Entropy = 1.6845 bits
    Lowest score = -8, Highest score = 14.2

    Order: A R N D C Q E G H I L K M F P S T W Y V B Z X *

    References
    [1] Gonnet, G.H., Cohen, M.A., and Benner, S.A. (1992). Exhaustive matching of the entire protein sequence database. Science 256, 1443-1445.

    See Also
    Bioinformatics Toolbox functions: blosum, dayhoff, pam

    gprread

    Purpose
    Read microarray data from GenePix Results (GPR) file

    Syntax
    GPRData = gprread(File)
    gprread(..., 'PropertyName', PropertyValue, ...)
    gprread(..., 'CleanColNames', CleanColNamesValue, ...)

    Arguments
    File                  GenePix Results formatted file (file extension GPR). Enter a file name or a path and file name.
    CleanColNamesValue    Property to control creating column names that MATLAB can use
16. Genetic Code
    Code Number    Code Name
    1              Standard
    2              Vertebrate Mitochondrial
    3              Yeast Mitochondrial
    4              Mold, Protozoan, Coelenterate Mitochondrial, and Mycoplasma/Spiroplasma
    5              Invertebrate Mitochondrial
    6              Ciliate, Dasycladacean, and Hexamita Nuclear
    9              Echinoderm Mitochondrial
    10             Euplotid Nuclear
    11             Bacterial and Plant Plastid
    12             Alternative Yeast Nuclear
    13             Ascidian Mitochondrial
    14             Flatworm Mitochondrial
    15             Blepharisma Nuclear
    16             Chlorophycean Mitochondrial
    21             Trematode Mitochondrial
    22             Scenedesmus Obliquus Mitochondrial
    23             Thraustochytrium Mitochondrial

    Description
    Map = geneticcode returns a structure with a mapping of nucleotide codons to amino acids for the standard genetic code.

    geneticcode(GeneticCode) returns a structure of the mapping for alternate genetic codes, where GeneticCode is either of the following:
    - The transl_table (code) number from the NCBI Genetics Web page: http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c
    - One of the supported names in the table above

    Examples
    List the mapping of nucleotide codons to amino acids for a specific genetic code.

       wormcode = geneticcode('Flatworm Mitochondrial');

    See Also
    Bioinformatics Toolbox functions: aa2nt, aminolookup, baselookup, codonbias, dnds, dndsml, nt2aa, revgeneticcode, seqshoworfs, seqtool
17. Note: For a uniformly spaced MZ vector, a nonrobust smoothing with Order equal to 0 is equivalent to filtering the signal with the kernel vector.

    mslowess(..., 'ShowPlot', ShowPlotValue, ...) plots the smoothed spectrum over the original spectrum. When mslowess is called without output arguments, the spectra are plotted unless ShowPlotValue is false. When ShowPlotValue is true, only the first spectrum in Y is plotted. ShowPlotValue can also contain an index to one of the spectra in Y.

    Examples
    1. Load sample data.

       load sample_lo_res

    2. Smooth spectrum and draw figure with unsmoothed and smoothed spectra.

       YS = mslowess(MZ_lo_res, Y_lo_res(:,1), 'Showplot', true);

       [Figure 1: Spectrogram ID 1, original and smoothed spectrogram; x-axis Mass/Charge (M/Z), y-axis Ion Intensity]
       [Figure 1, zoomed to 7350-7500 M/Z: original and smoothed spectrogram]

    See Also
    Bioinformatics Toolbox functions: msalign, msbackadj, msheatmap, msnorm, mspeaks, msresample, mssgolay, msviewer

    msnorm

    Purpose
    Normalize set of mass spectra

    Syntax
    Yout = msnorm(MZ, Y)
    [Yout, NormParameters] = msnorm(...)
    msnorm(MZ, NewY, NormParameters)
18.    msheatmap(MZ_lo_res, YA, 'markers', R, 'range', [3000 10000])
       title('after alignment')

       [Figure 2: heat map titled "after alignment"; x-axis Mass/Charge (M/Z), 3000 to 10000; y-axis Spectrogram Indices; color Relative Intensity]

    Aligning Mass Spectrum with One Reference Peak
    It is not recommended to use the msalign function if you have only one reference peak. Instead, use the following procedure, which shifts the MZ vector, but does not scale it.

    1. Load sample data and view the first sample spectrum.

       load sample_lo_res
       MZ = MZ_lo_res;
       Y = Y_lo_res(:,1);
       msviewer(MZ, Y)

       [Figure: Mass Spectra Viewer showing the full spectrum; x-axis M/Z, 0 to 18000; y-axis Relative Intensity]

    2. Use the tall peak around 4000 m/z as the reference peak. To determine the reference peak's m/z value, click the zoom button, and then click-drag to zoom in on the peak. Right-click in the center of the peak, and then click Add Marker to label the peak with its m/z value.

       [Figure: Mass Spectra Viewer zoomed to 3700-4400 M/Z with a marker at 4051.14]
19. hmmprofgenerate(Model, ...'Align', AlignValue, ...)
    hmmprofgenerate(Model, ...'Flanks', FlanksValue, ...)
    hmmprofgenerate(Model, ...'Signature', SignatureValue, ...)

    Model             Hidden Markov model created with the hmmprofstruct function.
    AlignValue        Property to control using uppercase letters for matches and lowercase letters for inserted letters. Enter either true or false. Default is false.
    FlanksValue       Property to control including the symbols generated by the FLANKING INSERT states in the output sequence. Enter either true or false. Default is false.
    SignatureValue    Property to control returning the most likely path and symbols. Enter either true or false. Default is false.

    Sequence = hmmprofgenerate(Model) returns the string Sequence showing a sequence of amino acids or nucleotides drawn from the profile Model. The length, alphabet, and probabilities of the Model are stored in a structure. For more information about this structure, see hmmprofstruct.

    [Sequence, Profptr] = hmmprofgenerate(Model) returns a vector of the same length as the profile model pointing to the respective states in the output sequence. Null pointers (0) mean that such states do not exist in the output sequence, either because they are never touched (i.e., jumps from the BEGIN state to MATCH states or from MATCH states to the END state), or because DELETE states are not in the output sequence
20. msnorm(..., 'PropertyName', PropertyValue, ...)
    msnorm(..., 'Quantile', QuantileValue, ...)
    msnorm(..., 'Limits', LimitsValue, ...)
    msnorm(..., 'Consensus', ConsensusValue, ...)
    msnorm(..., 'Method', MethodValue, ...)
    msnorm(..., 'Max', MaxValue, ...)

    MZ    Mass/charge vector with the range of ions in the spectra.
    Y     Ion intensity vector with the same length as the mass/charge vector (MZ). Y can also be a matrix with several spectra that share the same mass/charge (MZ) range.

    Yout = msnorm(MZ, Y) normalizes a group of mass spectra by standardizing the area under the curve (AUC) to the group median.

    [Yout, NormParameters] = msnorm(...) returns a structure with the parameters to normalize another group of spectra.

    msnorm(MZ, NewY, NormParameters) uses the parameter information from a previous normalization (NormParameters) to normalize a new set of spectra (NewY) with the MZ positions and output scale from the previous normalization. NormParameters is a structure created by msnorm. If a consensus proportion (ConsensusValue) was given in the previous normalization, no new MZ positions are selected, and normalization is performed using the same MZ positions.

    msnorm(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs.

    msnorm(..., 'Quantile', QuantileValue, ...) specifies a 1-by-2 vector with the quantile limits for reducing the set of MZ values. For example, wh
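The default normalization, standardizing each spectrum's area under the curve (AUC) to the group median, can be sketched in plain Python. The function names `auc` and `normalize_to_median_auc` are hypothetical, the area is computed with the trapezoid rule, and every spectrum is assumed to have a positive area; the toolbox's quantile, limits, and consensus options are not modeled.

```python
from statistics import median

def auc(mz, y):
    """Area under the curve by the trapezoid rule."""
    return sum((mz[i + 1] - mz[i]) * (y[i] + y[i + 1]) / 2
               for i in range(len(mz) - 1))

def normalize_to_median_auc(mz, spectra):
    """Rescale each spectrum so its AUC equals the median AUC of the group.

    spectra is a list of intensity lists that share the same mz vector.
    """
    areas = [auc(mz, y) for y in spectra]
    target = median(areas)
    return [[value * target / area for value in y]
            for y, area in zip(spectra, areas)]

mz = [0.0, 1.0, 2.0]
normalized = normalize_to_median_auc(mz, [[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]])
print(normalized)
```

Keeping the target area (here the median) is what allows a later group of spectra to be put on the same scale, which is the role NormParameters plays in the text.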
21. CTA CTG AGT AGC

    revgeneticcode
    http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c

    See Also
    Bioinformatics Toolbox functions: aa2nt, aminolookup, baselookup, geneticcode, nt2aa

    rmabackadj

    Purpose
    Perform background adjustment on Affymetrix microarray probe-level data using Robust Multi-array Average (RMA) procedure

    Syntax
    BackgroundAdjustedMatrix = rmabackadj(PMData)
    BackgroundAdjustedMatrix = rmabackadj(..., 'Method', MethodValue, ...)
    BackgroundAdjustedMatrix = rmabackadj(..., 'Truncate', TruncateValue, ...)
    BackgroundAdjustedMatrix = rmabackadj(..., 'Showplot', ShowplotValue, ...)

    Arguments
    PMData           Matrix of intensity values where each row corresponds to a perfect match (PM) probe and each column corresponds to an Affymetrix CEL file. (Each CEL file is generated from a separate chip. All chips should be of the same type.)
    MethodValue      Property to control the estimation method for the background adjustment model parameters. Enter either 'RMA' (to use estimation method described by Bolstad, 2005) or 'MLE' (to estimate the parameters using maximum likelihood). Default is 'RMA'.
    TruncateValue    Property to control the background noise model. Enter either true (use a truncated Gaussian distribution) or false (use a nontruncated Gaussian distribution). Default is true.
    ShowplotValue    Property to control the plott
22. Explore mass spectrum or set of mass spectra

    msviewer(MZ, Y)
    msviewer(..., 'Markers', MarkersValue, ...)
    msviewer(..., 'Group', GroupValue, ...)

    MZ    Mass/charge vector with the range of ions in the spectra.
    Y     Ion intensity vector with the same length as the mass/charge vector (MZ). Y can also be a matrix with several spectra that share the same mass/charge (MZ) range.

    msviewer(MZ, Y) creates a GUI to display and explore a mass spectrum (Y).

    msviewer(..., 'Markers', MarkersValue, ...) specifies a list of marker positions from the mass/charge vector (MZ) for exploration and easy navigation. Enter a column vector with MZ values.

    msviewer(..., 'Group', GroupValue, ...) specifies a class label for every spectrum with a different color for every class. Enter a column vector of size [numSpectra x 1] with integers. The default value is numSpectra.

    MSViewer GUI features include the following:
    - Plot mass spectra. The spectra are plotted with different colors according to their class labels.
    - An overview displays a full spectrum, and a box indicates the region that is currently displayed in the main window.
    - Five different zoom in options, one zoom out option, and a reset view option resize the spectrum.
    - Add/focus/move/delete marker operations
    - Import/Export markers from/to MATLAB workspace
    - Print and preview the spectra plot
    - Print the spectra plot to a MATLAB figure window

    MSViewer has five components: Menu
23. Purpose
    Retrieve protein structure data from Protein Data Bank (PDB) database

    Syntax
    PDBStruct = getpdb(PDBid)
    PDBStruct = getpdb(PDBid, ...'ToFile', ToFileValue, ...)
    PDBStruct = getpdb(PDBid, ...'SequenceOnly', SequenceOnlyValue, ...)

    Arguments
    PDBid                String specifying a unique identifier for a protein structure record in the PDB database.
                         Note: Each structure in the PDB database is represented by a four-character alphanumeric identifier. For example, 4hhb is the identifier for hemoglobin.
    ToFileValue          String specifying a file name or a path and file name for saving the PDB-formatted data. If you specify only a file name, that file will be saved in the MATLAB Current Directory.
                         Tip: After you save the protein structure record to a local PDB-formatted file, you can use the pdbread function to read the file into MATLAB offline or use the molviewer function to display and manipulate a 3-D image of the structure.
    SequenceOnlyValue    Controls the return of the protein sequence only. Choices are true or false (default). If there is one sequence, it is returned as a character array. If there are multiple sequences, they are returned as a cell array.

    Return Values
    PDBStruct            MATLAB structure containing a field for each PDB record.

    Description
    The Protein Data Bank (PDB) database is an archive of experimentally determined 3-D biological macromolecular structure data. For more
24. Purpose
    Filter genes with small profile variance

    Syntax
    Mask = genevarfilter(Data)
    [Mask, FData] = genevarfilter(Data)
    [Mask, FData, FNames] = genevarfilter(Data, Names)
    genevarfilter(..., 'PropertyName', PropertyValue, ...)
    genevarfilter(..., 'Percentile', PercentileValue)
    genevarfilter(..., 'AbsValue', AbsValValue)

    Arguments
    Data          Matrix where each row corresponds to a gene. The first column is the names of the genes, and each additional column is the results from an experiment.
    Names         Cell array with the name of a gene for each row of experimental data. Names has the same number of rows as Data, with each row containing the name or ID of the gene in the data set.
    Percentile    Property to specify a percentile below which gene expression profiles are removed. Enter a value from 0 to 100.
    AbsValue      Property to specify an absolute value below which gene expression profiles are removed.

    Description
    Gene profiling experiments have genes that exhibit little variation in the profile and are generally not of interest in the experiment. These genes are commonly removed from the data.

    Mask = genevarfilter(Data) calculates the variance for each gene expression profile in Data and then identifies the expression profiles with a variance less than the 10th percentile.

    Mask is a logical vector with one element for each row in Data. The elements of Mask corresponding to rows with a variance greater than the threshold ha
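The default filtering rule (keep profiles whose variance exceeds the 10th percentile) can be sketched in plain Python. The function name `variance_mask` is hypothetical, the sample variance is used, and the percentile is computed with the nearest-rank method; the toolbox may define the percentile differently, so treat the threshold as an assumption.

```python
import math
from statistics import variance

def variance_mask(data, percentile=10):
    """Return one boolean per row: True when the row's variance is greater
    than the given percentile of all row variances (nearest-rank method)."""
    variances = [variance(row) for row in data]
    ordered = sorted(variances)
    k = max(math.ceil(percentile * len(ordered) / 100) - 1, 0)
    threshold = ordered[k]
    return [v > threshold for v in variances]

# Ten toy expression profiles with steadily increasing spread.
profiles = [[0.0, float(i)] for i in range(1, 11)]
mask = variance_mask(profiles)
filtered = [row for row, keep in zip(profiles, mask) if keep]
print(mask)
```

Applying the mask to the data rows, as in the last two lines, corresponds to the FData output in the syntax above.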
25. nwalign(Seq1, Seq2, ...'ScoringMatrix', ScoringMatrixValue, ...)
    nwalign(Seq1, Seq2, ...'Scale', ScaleValue, ...)
    nwalign(Seq1, Seq2, ...'GapOpen', GapOpenValue, ...)
    nwalign(Seq1, Seq2, ...'ExtendGap', ExtendGapValue, ...)
    nwalign(Seq1, Seq2, ...'Showscore', ShowscoreValue, ...)

    Seq1, Seq2            Amino acid or nucleotide sequences. Enter any of the following:
                          - Character string of letters representing amino acids or nucleotides, such as returned by int2aa or int2nt
                          - Vector of integers representing amino acids or nucleotides, such as returned by aa2int or nt2int
                          - Structure containing a Sequence field
                          Tip: For help with letter and integer representations of amino acids and nucleotides, see Amino Acid Lookup Table on page 2-42 or Nucleotide Lookup Table on page 2-52.
    AlphabetValue         String specifying the type of sequence. Choices are 'AA' (default) or 'NT'.
    ScoringMatrixValue    String specifying the scoring matrix to use for the global alignment. Choices for amino acid sequences are:
                          - 'PAM40'
                          - 'PAM250'
                          - 'DAYHOFF'
                          - 'GONNET'
                          - 'BLOSUM30' increasing by 5 up to 'BLOSUM90'
                          - 'BLOSUM62'
                          - 'BLOSUM100'
                          Default is:
                          - 'BLOSUM50' (when AlphabetValue equals 'AA')
                          - 'NUC44' (when AlphabetValue equals 'NT')
                          Note: All of the above scoring matrices have a built-in scale factor that returns Score in bits.
    ScaleValue            Positive value that specifies the scale factor used to return Score
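The global alignment that nwalign performs follows the Needleman-Wunsch dynamic program. The plain-Python sketch below computes only the optimal score, and it simplifies the scoring: a fixed match/mismatch value replaces the scoring matrices listed above, and a single linear gap penalty replaces the separate GapOpen and ExtendGap values. The function name `nw_score` and its default parameters are assumptions of the sketch.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b (Needleman-Wunsch).

    Only two rows of the dynamic-programming table are kept, since each
    cell depends on its left, upper, and upper-left neighbors.
    """
    prev = [j * gap for j in range(len(b) + 1)]   # aligning "" with b[:j]
    for i in range(1, len(a) + 1):
        curr = [i * gap]                          # aligning a[:i] with ""
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

print(nw_score("GATTACA", "GCATGCU"))
```

Recovering the alignment itself, rather than just the score, would require keeping the full table and tracing back from the last cell.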
26. getmatrix (geneont). Purpose: Convert geneont object into relationship matrix. Syntax: [Matrix, ID, Relationship] = getmatrix(GeneontObj). Arguments: GeneontObj: geneont object created by the geneont object constructor. Description: [Matrix, ID, Relationship] = getmatrix(GeneontObj) converts a geneont object, GeneontObj, into Matrix, a matrix of relationship values between nodes (row and column indices), in which 0 indicates no relationship, 1 indicates an is_a relationship, and 2 indicates a part_of relationship. ID is a column vector listing Gene Ontology IDs that correspond to the rows and columns of Matrix. Relationship is a cell array of strings defining the types of relationships. Examples: GO = geneont('LIVE', true); [MATRIX, ID, REL] = getmatrix(GO); See Also: Bioinformatics Toolbox functions: geneont (object constructor), goannotread, num2goid. Bioinformatics Toolbox object: geneont object. Bioinformatics Toolbox methods of geneont object: getancestors, getdescendants, getmatrix, getrelatives. getmatrix (phytree). Purpose: Convert phytree object into relationship matrix. Syntax: [Matrix, ID, Distances] = getmatrix(PhytreeObj). Arguments: PhytreeObj: phytree object created by the phytree object constructor. Description: [Matrix, ID, Distances] = getmatrix(PhytreeObj) converts a phytree object, PhytreeObj, into a logical sparse matrix, Matrix, in which 1 indicates that a branch node (row index) is connected to its child (column index)
27. UG = (4,1) 0.4500; (5,1) 0.2100; (6,1) 0.9900; (3,2) 0.5100; (5,2) 0.3200; (6,2) 0.4100; (4,3) 0.1500; (5,3) 0.3200; (6,3) 0.2900; (5,4) 0.3600; (6,4) 0.3800. h = view(biograph(UG, [], 'ShowArrows', 'off', 'ShowWeights', 'on')) displays: Biograph object with 6 nodes and 11 edges. [Figure: Biograph Viewer showing the undirected graph.] 2. Find the shortest path in the graph from node 1 to node 6: [dist, path, pred] = graphshortestpath(UG, 1, 6, 'directed', false) returns dist = 0.8200 together with path and pred. 3. Mark the nodes and edges of the shortest path by coloring them red and increasing the line width: set(h.Nodes(path), 'Color', [1 0.4 0.4]); fowEdges = getedgesbynodeid(h, get(h.Nodes(path), 'ID')); revEdges = getedgesbynodeid(h, get(h.Nodes(fliplr(path)), 'ID')); edges = [fowEdges; revEdges]; set(edges, 'LineColor', [1 0 0]); set(edges, 'LineWidth', 1.5). [Figure: Biograph Viewer with the shortest path highlighted.] References: [1] Dijkstra, E.W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik 1, 269-271. [2] Bellman, R. (1958). On a Routing Problem. Quarterly of Applied Mathematics 16(1), 87-90. [3] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002). The Boost Graph Library User Guide and Reference Manual (Upper Saddle River, NJ: Pearson Education). See Also: Bioinformatics Toolbox
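The shortest-path result in item 27 can be checked by hand with a small Dijkstra sketch in base MATLAB. The edge weights are copied from the UG listing above; everything else (variable names, the dense weight matrix, the simple O(N^2) selection loop) is an assumption for illustration and is not how graphshortestpath is implemented.

```matlab
% Edge list [row, column, weight] copied from the UG listing in item 27.
edgeList = [4 1 0.45; 5 1 0.21; 6 1 0.99; 3 2 0.51; 5 2 0.32; 6 2 0.41; ...
            4 3 0.15; 5 3 0.32; 6 3 0.29; 5 4 0.36; 6 4 0.38];
n = 6;

% Dense symmetric weight matrix; Inf marks "no edge".
W = inf(n);
for k = 1:size(edgeList, 1)
    W(edgeList(k, 1), edgeList(k, 2)) = edgeList(k, 3);
    W(edgeList(k, 2), edgeList(k, 1)) = edgeList(k, 3);
end

source = 1;
target = 6;
dist = inf(1, n);
dist(source) = 0;
pred = zeros(1, n);          % 0 means "no predecessor"
visited = false(1, n);

for step = 1:n
    candidates = dist;
    candidates(visited) = inf;
    [dmin, u] = min(candidates);       % closest unvisited node
    if isinf(dmin)
        break;                         % remaining nodes are unreachable
    end
    visited(u) = true;
    for v = find(~visited & isfinite(W(u, :)))
        if dist(u) + W(u, v) < dist(v)
            dist(v) = dist(u) + W(u, v);   % relax edge u-v
            pred(v) = u;
        end
    end
end

% Walk the predecessor chain back from the target (assumes it is reachable).
path = target;
while path(1) ~= source
    path = [pred(path(1)), path];
end
```

The distance to node 6 comes out as 0.21 + 0.32 + 0.29 = 0.82, matching the dist value shown in the example, and the path runs through nodes 1, 5, 3, and 6.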
28. msalign(..., 'WidthOfPulses', WidthOfPulsesValue, ...); msalign(..., 'WindowSizeRatio', WindowSizeRatioValue, ...); msalign(..., 'Iterations', IterationsValue, ...); msalign(..., 'GridSteps', GridStepsValue, ...); msalign(..., 'SearchSpace', SearchSpaceValue, ...); msalign(..., 'ShowPlot', ShowPlotValue, ...); [IntensitiesOut, RefMZOut] = msalign(..., 'Group', GroupValue, ...). Arguments: MZ: Vector of mass/charge (m/z) values for a spectrum or set of spectra. The number of elements in the vector equals n, or the number of rows in the matrix Intensities. Intensities: Either of the following: a column vector of intensity values for a spectrum, where each row corresponds to an m/z value; or a matrix of intensity values for a set of mass spectra that share the same m/z range, where each row corresponds to an m/z value and each column corresponds to a spectrum. The number of rows equals n, or the number of elements in vector MZ. RefMZ: Vector of m/z values of known reference masses in a sample spectrum. Tip: For reference peaks, select compounds that do not undergo structural transformation, such as phosphorylation. Doing so will increase the accuracy of your alignment and allow you to detect compounds that do exhibit structural transformations among the sample spectra. WeightsValue: Vector of positive values with the same number of elements as RefMZ. The default vector
29. 8. Evaluate the performance of the classifier: classperf(cp, classes, test); cp.CorrectRate returns ans = 0.9867. 9. Use a one-norm, hard margin support vector machine classifier by changing the boxconstraint property: figure; svmStruct = svmtrain(data(train,:), groups(train), 'showplot', true, 'boxconstraint', 1e6); classes = svmclassify(svmStruct, data(test,:), 'showplot', true); [Figure 2: training and classified points for groups 0 and 1, with support vectors marked.] 10. Evaluate the performance of the classifier: classperf(cp, classes, test); cp.CorrectRate returns ans = 0.9867. References: [1] Kecman, V., Learning and Soft Computing, MIT Press, Cambridge, MA, 2001. [2] Suykens, J.A.K., Van Gestel, T., De Brabanter, J., De Moor, B., and Vandewalle, J., Least Squares Support Vector Machines, World Scientific, Singapore, 2002. [3] Scholkopf, B., and Smola, A.J., Learning with Kernels, MIT Press, Cambridge, MA, 2002. [4] Cristianini, N., and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, First Edition (Cambridge: Cambridge University Press). http://www.support-vector.net/ See Also: Bioinformatics Toolbox functions: classperf, cr
30. mzxmlread. Purpose: Read mzXML file into MATLAB as structure. Syntax: mzXMLStruct = mzxmlread(File). Arguments: File: String containing a file name, or a path and file name, of an mzXML file that conforms to the mzXML 2.1 specification. Description: mzXMLStruct = mzxmlread(File) reads an mzXML file, File, and then creates a MATLAB structure, mzXMLStruct. File can be a file name, or a path and file name, of an mzXML file. The file must conform to the mzXML 2.1 specification at http://sashimi.sourceforge.net/schema_revision/mzXML_2.1/Doc/mzXML_2.1_tutorial.pdf. mzXMLStruct includes the following fields: scan, offset, mzXML. Tip: If you receive any errors related to memory or Java heap space, try increasing your Java heap space as described at http://www.mathworks.com/support/solutions/data/1-1812C.html. Examples: out = mzxmlread('results.mzxml'); % view a scan: m = out.scan(1).peaks.mz(1:2:end); z = out.scan(1).peaks.mz(2:2:end); bar(m, z). Note: The file results.mzxml is not provided. Sample mzXML files can be found at http://sashimi.sourceforge.net/repository.html. See Also: Bioinformatics Toolbox functions: jcampread, msdotplot, mslowess, msppresample, mssgolay, msviewer, mzxml2peaks. nmercount. Purpose: Count number of n-mers in nucleotide or amino acid sequence. Syntax: nmercount(Seq, Length); nmercount(Seq, Length, C). Arguments: Seq
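The mzxmlread example in item 30 separates m/z values from intensities with the strides 1:2:end and 2:2:end, which implies the peak list is stored as interleaved pairs. A minimal base-MATLAB sketch of that indexing, using a made-up vector in place of out.scan(1).peaks.mz (the numbers are hypothetical):

```matlab
% Hypothetical interleaved peak list: [mz1 int1 mz2 int2 mz3 int3].
interleaved = [100.1 5 200.2 12 300.3 7];

m = interleaved(1:2:end);   % odd positions: m/z values
z = interleaved(2:2:end);   % even positions: intensities

% bar(m, z) would then plot intensity against m/z, as in the example.
```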
31. Graph Theory Functions in the Bioinformatics Toolbox documentation. [dist, path, pred] = shortestpath(BGObj, S) determines the single-source shortest paths from node S to all other nodes in the graph represented by an N-by-N adjacency matrix extracted from a biograph object, BGObj. Weights of the edges are all nonzero entries in the N-by-N adjacency matrix. dist are the N distances from the source to every node (using Infs for nonreachable nodes and 0 for the source node). path contains the winning paths to every node. pred contains the predecessor nodes of the winning paths. [dist, path, pred] = shortestpath(BGObj, S, T) determines the single-source, single-destination shortest path from node S to node T. shortestpath(..., 'PropertyName', PropertyValue, ...) calls shortestpath with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotes and is case insensitive. These property name/property value pairs are as follows: shortestpath(..., 'Directed', DirectedValue, ...) indicates whether the graph represented by the N-by-N adjacency matrix extracted from a biograph object, BGObj, is directed or undirected. Set DirectedValue to false for an undirected graph. This results in the upper triangle of the sparse matrix being ignored. Default is true. shortestpath(..., 'Method', MethodValue, ...) lets you sp
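Because the upper triangle is ignored when DirectedValue is false, an undirected graph only needs its lower triangle filled in. The sketch below shows one way to fold a directed weight matrix into that form in base MATLAB. The matrix is hypothetical, and taking the larger of the two directional weights with max is an assumption made for this example, not a rule stated by the toolbox.

```matlab
% Hypothetical directed weights: 1->3 (0.5), 2->1 (0.2), 3->2 (0.7).
DG = [0   0   0.5;
      0.2 0   0;
      0   0.7 0];

% Symmetrize, then keep only the lower triangle.
UG = tril(max(DG, DG'));
```

Every edge now appears exactly once below the diagonal, so nothing is lost when the upper triangle is discarded.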
32. Property to specify a reading frame. Choices are 1, 2, 3, or 'all'. Default is 1. If FrameValue is 'all', then SeqAA is a 3-by-1 cell array. GeneticCodeValue: Property to specify a genetic code. Enter a Code Number or a string with a Code Name from the table Genetic Code. If you use a Code Name, you can truncate it to the first two characters. Default is 1 or 'Standard'. AlternativeStartCodonsValue: Property to control the translation of alternative codons. Choices are true or false. Default is true. Genetic Code (Code Number: Code Name): 1: Standard; 2: Vertebrate Mitochondrial; 3: Yeast Mitochondrial; 4: Mold, Protozoan, Coelenterate Mitochondrial, and Mycoplasma/Spiroplasma; 5: Invertebrate Mitochondrial; 6: Ciliate, Dasycladacean, and Hexamita Nuclear; 9: Echinoderm Mitochondrial; 10: Euplotid Nuclear; 11: Bacterial and Plant Plastid; 12: Alternative Yeast Nuclear; 13: Ascidian Mitochondrial; 14: Flatworm Mitochondrial; 15: Blepharisma Nuclear; 16: Chlorophycean Mitochondrial; 21: Trematode Mitochondrial; 22: Scenedesmus Obliquus Mitochondrial; 23: Thraustochytrium Mitochondrial. Return Values: SeqAA: String specifying an amino acid sequence. Description: SeqAA = nt2aa(SeqNT) converts a nucleotide sequence to an amino acid sequence using the standard genetic code. SeqAA = nt2aa(SeqNT, ..., 'PropertyName'
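A rough sketch of what translating one reading frame with the standard genetic code (code number 1) involves, written in base MATLAB. The 64-letter amino acid string is the NCBI standard table in T, C, A, G base order; the input sequence and variable names are assumptions for the example, and unlike nt2aa this sketch handles neither ambiguous bases nor alternative start codons nor the other genetic codes.

```matlab
% Standard genetic code, one letter per codon, bases ordered T, C, A, G.
aminoAcids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
bases = 'TCAG';

seqNT = 'ATGGCCATTGTAATGGGCCGCTGA';   % hypothetical nucleotide sequence
frame = 1;                            % reading frame: 1, 2, or 3

sub = upper(seqNT(frame:end));
nCodons = floor(numel(sub) / 3);
seqAA = repmat(' ', 1, nCodons);

for k = 1:nCodons
    codon = sub(3 * k - 2 : 3 * k);
    [~, idx] = ismember(codon, bases);    % position of each base in 'TCAG'
    % Codon index into the 64-entry table (fails on non-ACGT symbols).
    seqAA(k) = aminoAcids(16 * (idx(1) - 1) + 4 * (idx(2) - 1) + idx(3));
end
```

Changing frame to 2 or 3 simply shifts where the first codon starts, which is why requesting all frames yields three separate translations.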
33. revgeneticcode(..., 'PropertyName', PropertyValue, ...) calls revgeneticcode with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotes and is case insensitive. These property name/property value pairs are as follows: revgeneticcode(..., 'Alphabet', AlphabetValue, ...) defines the nucleotide alphabet to use in the map. revgeneticcode(..., 'ThreeLetterCodes', ThreeLetterCodesValue, ...) returns the mapping structure with three-letter amino acid codes as field names instead of the default single-letter codes if ThreeLetterCodes is true. Examples: moldcode = revgeneticcode(4, 'Alphabet', 'rna'); wormcode = revgeneticcode('Flatworm Mitochondrial', 'ThreeLetterCodes', true); map = revgeneticcode returns map = Name: 'Standard'; A: GCT GCC GCA GCG; R: CGT CGC CGA CGG AGA AGG; N: AAT AAC; D: GAT GAC; C: TGT TGC; Q: CAA CAG; E: GAA GAG; G: GGT GGC GGA GGG; H: CAT CAC; I: ATT ATC ATA; L: TTA TTG CTT CTC CTA CTG; K: AAA AAG; M: ATG; F: TTT TTC; P: CCT CCC CCA CCG; S: TCT TCC TCA TCG AGT AGC; T: ACT ACC ACA ACG; W: TGG; Y: TAT TAC; V: GTT GTC GTA GTG; Stops: TAA TAG TGA; Starts: TTG CTG ATG. References: [1] NCBI Web page describing genetic codes.
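The reverse mapping shown in item 33 can be rebuilt from the forward codon table by grouping codons under the amino acid they encode. The following base-MATLAB sketch does this for the standard code only. It is an illustration under assumptions (the struct layout with a Stops field follows the example output above; the loop and variable names are made up) and it omits the Name and Starts fields.

```matlab
% Standard genetic code, one letter per codon, bases ordered T, C, A, G.
aminoAcids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
bases = 'TCAG';

map = struct();
for i = 1:4
    for j = 1:4
        for k = 1:4
            codon = bases([i j k]);
            aa = aminoAcids(16 * (i - 1) + 4 * (j - 1) + k);
            if aa == '*'
                key = 'Stops';          % '*' is not a valid field name
            else
                key = aa;
            end
            if ~isfield(map, key)
                map.(key) = {};
            end
            map.(key){end + 1} = codon; % append codon to its amino acid
        end
    end
end
```

Fields such as map.R and map.Stops then list the same codons as the example output, for instance six codons for arginine and three stop codons.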
34. clustergram(Data, ..., 'RowLabels', RowLabelsValue, ...); clustergram(Data, ..., 'ColumnLabels', ColumnLabelsValue, ...); clustergram(Data, ..., 'Pdist', PdistValue, ...); clustergram(Data, ..., 'Linkage', LinkageValue, ...); clustergram(Data, ..., 'Dendrogram', DendrogramValue, ...); clustergram(Data, ..., 'OptimalLeafOrder', OptimalLeafOrderValue, ...); clustergram(Data, ..., 'ColorMap', ColorMapValue, ...); clustergram(Data, ..., 'SymmetricRange', SymmetricRangeValue, ...); clustergram(Data, ..., 'Dimension', DimensionValue, ...); clustergram(Data, ..., 'Ratio', RatioValue, ...). Arguments: Data: Matrix in which each row corresponds to a gene and each column corresponds to a single experiment or microarray. RowLabelsValue: Vector of numbers or cell array of text strings to label the rows in Data. ColumnLabelsValue: Vector of numbers or cell array of text strings to label the columns in Data. PdistValue: String to specify the distance metric to pass to the pdist function (Statistics Toolbox) to use to calculate the pairwise distances between observations. For information on choices, see the pdist function. Default is 'euclidean'. Note: If the distance metric requires extra arguments, then PdistValue is a cell array. For example, to use the Minkowski distance with exponent P, you would use {'minkowski', P}. LinkageValue: String to specify the linkage method to pass to the linkage function (Statistics Toolbox) to use to create the hierarchical c
35. Set DirectedValue to false for an undirected graph. This results in the upper triangle of the sparse matrix being ignored. Default is true. traverse(BGObj, S, ..., 'Method', MethodValue, ...) lets you specify the algorithm used to traverse the graph represented by the N-by-N adjacency matrix extracted from a biograph object, BGObj. Choices are: 'BFS': Breadth-first search. Time complexity is O(N+E), where N and E are number of nodes and edges respectively. 'DFS' (default algorithm): Depth-first search. Time complexity is O(N+E), where N and E are number of nodes and edges respectively. References: [1] Sedgewick, R. (2002). Algorithms in C++, Part 5: Graph Algorithms (Addison-Wesley). [2] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002). The Boost Graph Library User Guide and Reference Manual (Upper Saddle River, NJ: Pearson Education). See Also: Bioinformatics Toolbox functions: biograph (object constructor), graphtraverse. Bioinformatics Toolbox object: biograph object. Bioinformatics Toolbox methods of a biograph object: allshortestpaths, conncomp, isdag, isomorphism, isspantree, maxflow, minspantree, shortestpath, topoorder. view (biograph). Purpose: Draw figure from biograph object. Syntax: view(BGobj); BGobjHandle = view(BGobj). Arguments: BGobj: Biograph object created with the function biograph. Description: view(BGobj) opens
36. ..., 'StepSize', StepSizeValue, 'RegressionMethod', RegressionMethodValue, 'EstimationMethod', EstimationMethodValue, 'SmoothMethod', SmoothMethodValue, 'QuantileValue', QuantileValueValue, 'PreserveHeights', PreserveHeightsValue, 'ShowPlot', ShowPlotValue. Arguments: MZ: Range of mass/charge ions. Enter a vector with the range of ions in the spectra. Y: Ion intensity vector with the same length as the mass/charge vector (MZ). Y can also be a matrix with several spectra that share the same mass/charge (MZ) range. Description: msbackadj adjusts the baseline of a spectrum by following three steps: 1. Estimates the baseline within multiple shifted windows of width 200 m/z. 2. Regresses the varying baseline to the window points using a spline approximation. 3. Adjusts the baseline of the spectrum (Y). msbackadj(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. msbackadj(..., 'WindowSize', WindowSizeValue) specifies the width for the shifting window. WindowSizeValue can also be a function handle. The function is evaluated at the respective MZ values and returns a variable width for the windows. This option is useful for cases where the resolution of the signal is dissimilar at different regions of the spectrogram. The default value is 200 (baseline point estimated for windows with a width of 200 m/z). Note: The result of this algorithm depends on carefully choosing the window size and the step size. C
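The three steps listed in item 36 (estimate the baseline in shifted windows, regress it to the window points, subtract it) can be sketched in base MATLAB as follows. This is a simplified stand-in, not the msbackadj algorithm: the synthetic spectrum is made up, the per-window estimate is a plain minimum, and linear interpolation with interp1 replaces the spline approximation named in the text.

```matlab
% Hypothetical spectrum: drifting baseline plus three peaks.
mz = (0:10:1000)';
signal = 5 + 0.01 * mz;                 % slowly rising baseline
peakIdx = [26 56 86];                   % m/z 250, 550, 850
signal(peakIdx) = signal(peakIdx) + 50;

windowSize = 200;                       % window width in m/z units
stepSize = 200;                         % shift between window centers

% Step 1: estimate one baseline point per window (minimum is an assumption).
centers = (windowSize / 2 : stepSize : mz(end))';
est = zeros(size(centers));
for k = 1:numel(centers)
    inWindow = abs(mz - centers(k)) <= windowSize / 2;
    est(k) = min(signal(inWindow));
end

% Step 2: regress the window estimates back onto every m/z value.
baseline = interp1(centers, est, mz, 'linear', 'extrap');

% Step 3: subtract the estimated baseline from the spectrum.
adjusted = signal - baseline;
```

Because the minimum of a rising baseline sits at the left edge of each window, this crude estimator underestimates the baseline by a constant here, which is one way to see why the note above stresses choosing the window size and step size carefully.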
37. Structure of each column's intensity median before and after normalization, and the index of the column chosen as the baseline. Property to control the selection of the column index N from Data to be used as the baseline column. Default is the column index whose median intensity is the median of all the columns. ThresholdsValue: Property to set the thresholds for the lowest average rank and the highest average rank, which are used to determine the invariant set. The rank invariant set is a set of data points whose proportional rank difference is smaller than a given threshold. The threshold for each data point is determined by interpolating between the threshold for the lowest average rank and the threshold for the highest average rank. Select these two thresholds empirically to limit the spread of the invariant set, but allow enough data points to determine the normalization relationship. ThresholdsValue is a 1-by-2 vector [LT, HT], where LT is the threshold for the lowest average rank and HT is the threshold for the highest average rank. Values must be between 0 and 1. Default is [0.05, 0.005]. StopPrctileValue: Property to stop the iteration process when the number of data points in the invariant set reaches N percent of the total number of data points. Default is 1. Note: If you do not use this property, the iteration process continues until no more data points are eliminated
38. getembl(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. getembl(..., 'ToFile', ToFileValue) returns a structure containing information about the sequence and saves the information in a file using an EMBL data format. If you do not give a location or path to the file, the file is stored in the MATLAB current directory. Read an EMBL-formatted file back into MATLAB using the function emblread. getembl(..., 'SequenceOnly', SequenceOnlyValue), if SequenceOnlyValue is true, returns the sequence information without the metadata. Examples: Retrieve data for the rat liver apolipoprotein A-I: emblout = getembl('X00558'). Retrieve data for the rat liver apolipoprotein and save in the file rat_protein. If a file name is given without a path, the file is stored in the current directory: Seq = getembl('X00558', 'ToFile', 'c:\project\rat_protein.txt'). Retrieve only the sequence for the rat liver apolipoprotein: Seq = getembl('X00558', 'SequenceOnly', true). See Also: Bioinformatics Toolbox functions: emblread, getgenbank, getgenpept, getpdb, seqtool. getgenbank. Purpose: Sequence information from GenBank database. Syntax: Data = getgenbank(AccessionNumber); getgenbank(AccessionNumber); getgenbank(..., 'PropertyName', PropertyValue, ...); getgenbank(..., 'ToFile', ToFileValue); getgenbank(..., 'FileFormat', FileFormatV
39. not aligned output (see below). hmmprofgenerate(Model, ..., 'PropertyName', PropertyValue, ...) calls hmmprofgenerate with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotes and is case insensitive. These property name/property value pairs are as follows: hmmprofgenerate(Model, ..., 'Align', AlignValue, ...): if Align is true, the output sequence is aligned to the model as follows: uppercase letters and dashes correspond to MATCH and DELETE states respectively (the combined count is equal to the number of states in the model). Lowercase letters are emitted by the INSERT or FLANKING INSERT states. If AlignValue is false, the output is a sequence of uppercase symbols. The default value is true. hmmprofgenerate(Model, ..., 'Flanks', FlanksValue, ...): if Flanks is true, the output sequence includes the symbols generated by the FLANKING INSERT states. The default value is false. hmmprofgenerate(Model, ..., 'Signature', SignatureValue, ...): if SignatureValue is true, returns the most likely path and symbols. The default value is false. Examples: load('hmm_model_examples', 'model_7tm_2') % load a model example; rand_sequence = hmmprofgenerate(model_7tm_2). See Also: Bioinformatics Toolbox functions: hmmprofalign, hmmprofstruct, showhmmprof. hmmprofmerge. Purpose: Concatenate prealigned strings of several
40. randfeatures(..., 'SubsetSize', SS) sets the number of features considered in every subset. Default is 20. randfeatures(..., 'PoolSize', PS) sets the targeted number of accepted subsets for the final pool. Default is 1000. randfeatures(..., 'NumberOfIndices', N) sets the number of output indices in IDX. Default is the same as the number of features. randfeatures(..., 'CrossNorm', CN) applies independent normalization across the observations for every feature. Cross-normalization ensures comparability among different features, although it is not always necessary because the selected classifier properties might already account for this. Options are: 'none' (default): Intensities are not cross-normalized. 'meanvar': x_new = (x - mean(x))/std(x). 'softmax': x_new = (1 + exp((mean(x) - x)/std(x)))^-1. 'minmax': x_new = (x - min(x))/(max(x) - min(x)). randfeatures(..., 'Verbose', VerboseValue): VerboseValue controls verbosity. Default is true. Examples: Find a reduced set of genes that is sufficient for classification of all the cancer types in the t-matrix NCI60 data set. Load sample data: load NCI60tmatrix. Select features: I = randfeatures(X, GROUP, 'SubsetSize', 15, 'Classifier', 'da'); Test features with a linear discriminant classifier: C = classify(X(I(1:25),:)', X(I(1:25),:)', GROUP); cp = classperf(GROUP, C); cp.CorrectRate. See Also: Bioinformatics Toolbox functions: classperf, crossvalind, knncla
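The three CrossNorm options in item 40 are simple element-wise transforms of one feature's values. The base-MATLAB sketch below applies each of them to a made-up feature vector. The formulas follow the reading of the flattened source text given here, which is an assumption, and this is not the randfeatures implementation.

```matlab
% Hypothetical intensities of one feature across five observations.
x = [2 4 6 8 10];

% 'meanvar': zero mean, unit standard deviation.
xMeanVar = (x - mean(x)) ./ std(x);

% 'softmax': logistic squashing of the standardized values into (0, 1).
xSoftmax = 1 ./ (1 + exp((mean(x) - x) ./ std(x)));

% 'minmax': linear rescaling onto [0, 1].
xMinMax = (x - min(x)) ./ (max(x) - min(x));
```

All three put features with different raw scales onto comparable ranges, which is the comparability the text refers to.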
41. 011906. [2] Wu, Z., Irizarry, R.A., Gentleman, R., Murillo, F.M., and Spencer, F. (2004). A Model Based Background Adjustment for Oligonucleotide Expression Arrays. Journal of the American Statistical Association 99(468), 909-917. [3] Best, C.J.M., Gillespie, J.W., Yi, Y., Chandramouli, G.V.R., Perlmutter, M.A., Gathright, Y., Erickson, H.S., Georgevich, L., Tangrea, M.A., Duray, P.H., Gonzalez, S., Velasco, A., Linehan, W.M., Matusik, R.J., Price, D.K., Figg, W.D., Emmert-Buck, M.R., and Chuaqui, R.F. (2005). Molecular alterations in primary prostate cancer after androgen ablation therapy. Clinical Cancer Research 11, 6823-6834. See Also: Bioinformatics Toolbox functions: affyprobeseqread, affyread, celintensityread, probelibraryinfo. affyprobeseqread. Purpose: Read data file containing probe sequence information for Affymetrix GeneChip array. Syntax: Struct = affyprobeseqread(SeqFile, CDFFile); Struct = affyprobeseqread(SeqFile, CDFFile, ..., 'SeqPath', SeqPathValue, ...); Struct = affyprobeseqread(SeqFile, CDFFile, ..., 'CDFPath', CDFPathValue, ...); Struct = affyprobeseqread(SeqFile, CDFFile, ..., 'SeqOnly', SeqOnlyValue, ...). Arguments: SeqFile: String specifying a file name of a sequence file (tab-separated or FASTA) that contains the following information for a specific type of Affymetrix GeneChip array: Probe set IDs; Probe x c
42. [1x4723 char] [1x105 char] [1x95 char]. See Also: Bioinformatics Toolbox functions: genbankread, getembl, getgenpept, getpdb, seqtool. getgenpept. Purpose: Retrieve sequence information from GenPept database. Syntax: Data = getgenpept(AccessionNumber); getgenpept(AccessionNumber); getgenpept(..., 'PropertyName', PropertyValue, ...); getgenpept(..., 'ToFile', ToFileValue); getgenpept(..., 'FileFormat', FileFormatValue); getgenpept(..., 'SequenceOnly', SequenceOnlyValue). Arguments: AccessionNumber: Unique identifier for a sequence record. Enter a combination of letters and numbers. ToFileValue: Property to specify the location and file name for saving data. Enter either a file name, or a path and file name supported by your system (ASCII text file). FileFormatValue: Property to select the format for the file specified with the property ToFileValue. Enter either 'GenBank' or 'FASTA'. SequenceOnlyValue: Property to control getting the sequence without metadata. Enter either true or false. Description: getgenpept retrieves a protein (amino acid) sequence and sequence information from the GenPept database. This database is a translation of the nucleotide sequences in GenBank and is maintained by the National Center for Biotechnology Information (NCBI). Note: NCBI has changed the name of their protein search engine from GenPept to Entrez Protein. However, the function names
43. mavolcanoplot. The plot shows: Two vertical fold change lines at a fold change level of 2, which corresponds to a ratio of 1 and -1 on a log2 (ratio) scale. (Lines will be at different fold change levels if you used the Foldchange property.) One horizontal line at the 0.05 p-value level, which is equivalent to 1.3010 on the -log10 (p-value) scale. (The line will be at a different p-value level if you used the PCutoff property.) Data points for genes that are considered both statistically significant (above the p-value line) and differentially expressed (outside of the fold change lines) appear in orange. After you display the volcano scatter plot, you can interactively: Adjust the vertical fold change lines by click-dragging one line or entering a value in the Fold Change text box. Adjust the horizontal p-value cutoff line by click-dragging or entering a value in the p-value Cutoff text box. Display labels for data points by clicking a data point. Select a gene from the Up Regulated or Down Regulated list to highlight the corresponding data point in the plot. Press and hold Ctrl or Shift to select multiple genes. Zoom the plot by selecting Tools > Zoom In or Tools > Zoom Out. View lists of significantly up-regulated and down-regulated genes and their associated p-values, and optionally export the labels, p-values, and fold changes to a structure in the MATLAB Workspace by clicking Export. Examples: Load
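The cutoff lines described in item 43 are just the fold change and p-value thresholds moved onto logarithmic axes. A short base-MATLAB sketch of that arithmetic, and of how a gene ends up in the up-regulated or down-regulated list, is given below. The fold changes and p-values are invented for the example, and the code does not call mavolcanoplot.

```matlab
% Hypothetical expression ratios and p-values for five genes.
foldChange = [4.0; 0.2; 1.1; 3.0; 0.4];
pValues    = [0.001; 0.01; 0.5; 0.2; 0.04];

foldCutoff = 2;      % fold change level of the vertical lines
pCutoff = 0.05;      % p-value level of the horizontal line

logRatio = log2(foldChange);     % x-axis of the volcano plot
negLogP = -log10(pValues);       % y-axis of the volcano plot

xLine = log2(foldCutoff);        % vertical lines at +xLine and -xLine
yLine = -log10(pCutoff);         % horizontal line

significant   = negLogP > yLine;
upRegulated   = significant & (logRatio > xLine);
downRegulated = significant & (logRatio < -xLine);
```

Here xLine equals 1 and yLine is about 1.3010, which are the positions quoted above for the default lines.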
44. isspantree (biograph). Purpose: Determine if tree created from biograph object is spanning tree. Syntax: TF = isspantree(BGObj). Arguments: BGObj: biograph object created by the biograph object constructor. Description: Tip: For introductory information on graph theory functions, see Graph Theory Functions in the Bioinformatics Toolbox documentation. TF = isspantree(BGObj) returns logical 1 (true) if the N-by-N adjacency matrix extracted from a biograph object, BGObj, is a spanning tree, and logical 0 (false) otherwise. A spanning tree must touch all the nodes and must be acyclic. The lower triangle of the N-by-N adjacency matrix represents an undirected graph, and all nonzero entries indicate the presence of an edge. References: [1] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002). The Boost Graph Library User Guide and Reference Manual (Upper Saddle River, NJ: Pearson Education). See Also: Bioinformatics Toolbox functions: biograph (object constructor), graphisspantree. Bioinformatics Toolbox object: biograph object. Bioinformatics Toolbox methods of a biograph object: allshortestpaths, conncomp, isdag, isomorphism, maxflow, minspantree, shortestpath, topoorder, traverse. maxflow (biograph). Purpose: Calculate maximum flow and minimum cut in biograph object. Syntax: [MaxFlow, FlowMatrix, Cut] = maxflow(BGObj, SNode, TNode); maxflow(BGObj, SNode, TNode, ..., 'Capacity', CapacityValue
45. Enter false for an undirected graph. This results in the upper triangle of the sparse matrix being ignored. Default is true. String that specifies the algorithm used to traverse the graph. Choices are: 'BFS': Breadth-first search. Time complexity is O(N+E), where N and E are number of nodes and edges respectively. 'DFS' (default algorithm): Depth-first search. Time complexity is O(N+E), where N and E are number of nodes and edges respectively. Description: Tip: For introductory information on graph theory functions, see Graph Theory Functions in the Bioinformatics Toolbox documentation. [disc, pred, closed] = graphtraverse(G, S) traverses graph G starting from the node indicated by integer S. G is an N-by-N sparse matrix that represents a directed graph. Nonzero entries in matrix G indicate the presence of an edge. disc is a vector of node indices in the order in which they are discovered. pred is a vector of predecessor node indices (listed in the order of the node indices) of the resulting spanning tree. closed is a vector of node indices in the order in which they are closed. graphtraverse(G, S, ..., 'PropertyName', PropertyValue, ...) calls graphtraverse with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotes and is case insensitive. These property name/property v
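To illustrate what the disc and pred outputs in item 45 contain, here is a minimal breadth-first traversal in base MATLAB. The five-node directed graph is hypothetical and the code is a sketch of the BFS option only; it does not reproduce graphtraverse, its default DFS method, or the closed output.

```matlab
% Hypothetical directed graph: 1->3, 1->4, 3->2, 4->5, 2->5.
G = zeros(5);
G(1, 3) = 1; G(1, 4) = 1; G(3, 2) = 1; G(4, 5) = 1; G(2, 5) = 1;

S = 1;                              % start node
n = size(G, 1);
discovered = false(1, n);
discovered(S) = true;
pred = nan(1, n);                   % NaN marks nodes never reached
pred(S) = 0;                        % the start node has no predecessor

queue = S;                          % first-in, first-out frontier
disc = [];                          % nodes in discovery order
while ~isempty(queue)
    u = queue(1);
    queue(1) = [];
    disc(end + 1) = u;
    for v = find(G(u, :) ~= 0 & ~discovered)
        discovered(v) = true;
        pred(v) = u;                % u is the parent of v in the search tree
        queue(end + 1) = v;
    end
end
```

Replacing the queue with a stack (taking the most recently added node first) would turn the same loop into a depth-first traversal and change the discovery order.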
46. R., Eddy, S., Krogh, A., and Mitchison, G. (1998). Biological Sequence Analysis (Cambridge University Press). [2] Smith, T., and Waterman, M. (1981). Identification of common molecular subsequences. Journal of Molecular Biology 147, 195-197. See Also: Bioinformatics Toolbox functions: blosum, nt2aa, nwalign, pam, seqdotplot, showalignment. traceplot. Purpose: Draw nucleotide trace plots. Syntax: traceplot(TraceStructure); traceplot(A, C, G, T); h = traceplot(...). Description: traceplot(TraceStructure) creates a trace plot from data in a structure with fields A, C, G, T. traceplot(A, C, G, T) creates a trace plot from data in vectors A, C, G, T. h = traceplot(...) returns a structure with the handles of the lines corresponding to A, C, G, T. Examples: tstruct = scfread('sample.scf'); traceplot(tstruct). See Also: Bioinformatics Toolbox function: scfread. Methods By Category. Phylogenetic Tree: Select, modify, and plot phylogenetic trees using phytree object methods. Graph Visualization: View relationships between data visually with interactive maps, hierarchy plots, and pathways using biograph object methods. Gene Ontology: Explore and analyze Gene Ontology data using geneont object methods. Phylogenetic Tree: Following are methods for use with a phytree object: get (phytree), getbyname (phytree), getcanonical (phytree), getmatrix (phytree), ge
47. ShowplotValue: Controls the display of a plot of the probe affinity base profile. Choices are true or false (default). Examples: 1. Load the MAT file, included with Bioinformatics Toolbox, that contains Affymetrix data from a prostate cancer study. The variables in the MAT file include seqMatrix, a matrix containing sequence information for PM probes, mmMatrix, a matrix containing MM probe intensity values, and probeIndices, a column vector containing probe indexing information: load prostatecancerrawdata. 2. Compute the Affymetrix PM and MM probe affinities from their sequences and MM probe intensities, and also plot the affinity values of each of the four bases (A, C, G, and T) for each of the 25 sequence positions, for all probes on the Affymetrix GeneChip array: [apm, amm] = affyprobeaffinities(seqMatrix, mmMatrix(:,1), 'ProbeIndices', probeIndices, 'showplot', true); [Figure: Position-dependent Affinity Base Profile.] The prostatecancerrawdata.mat file used in this example contains data from Best et al., 2005. References: [1] Naef, F., and Magnasco, M.O. (2003). Solving the Riddle of the Bright Mismatches: Labeling and Effective Binding in Oligonucleotide Arrays. Physical Review E 68,
48. Solve shortest path problem in biograph object. Perform topological sort of directed acyclic graph extracted from biograph object. Traverse biograph object by following adjacent nodes. Draw figure from biograph object. Following are methods of a node object: getancestors (biograph): Find ancestors in biograph object. getdescendants (biograph): Find descendants in biograph object. getrelatives (biograph): Find relatives in biograph object. A biograph object contains two objects, node objects and edge objects, that have their own properties. For a list of the properties of node objects and edge objects, see the following tables. Properties of a Biograph Object (Property: Description): ID: String to identify the biograph object. Default is ''. (This information is for bookkeeping purposes only.) Label: String to label the biograph object. Default is ''. (This information is for bookkeeping purposes only.) Description: String that describes the biograph object. Default is ''. (This information is for bookkeeping purposes only.) LayoutType: String that specifies the algorithm for the layout engine. Choices are 'hierarchical' (default), 'equilibrium', and 'radial'. EdgeType: String that specifies how edges display. Choices are 'straight', 'curved' (default), and 'segmented'. Note: Curved or segmented edges occur only when necessary to avoid obst
49. The Boost Graph Library User Guide and Reference Manual (Upper Saddle River, NJ: Pearson Education). See Also: Bioinformatics Toolbox functions: biograph (object constructor), graphconncomp. Bioinformatics Toolbox object: biograph object. Bioinformatics Toolbox methods of a biograph object: allshortestpaths, isdag, isomorphism, isspantree, maxflow, minspantree, shortestpath, topoorder, traverse. dolayout (biograph). Purpose: Calculate node positions and edge trajectories. Syntax: dolayout(BGobj); dolayout(BGobj, 'Paths', PathsOnlyValue). Arguments: BGobj: Biograph object created by the biograph function (object constructor). PathsOnlyValue: Controls the calculation of only the edge paths, leaving the nodes at their current positions. Choices are true or false (default). Description: dolayout(BGobj) calls the layout engine to calculate the optimal position for each node so that its 2-D rendering is clean and uncluttered, and then calculates the best curves to represent the edges. The layout engine uses the following properties of the biograph object: LayoutType: Specifies the layout engine as 'hierarchical', 'equilibrium', or 'radial'. LayoutScale: Rescales the sizes of the node before calling the layout engine. This gives more space to the layout and reduces the overlapping of nodes. NodeAutoSize: Controls precalculating the node size before calling the layout engine. When NodeAutoSize is set to 'on', the layout engine
50. The default color is magenta. The following options are only available when showing pairwise alignments: showalignment(Alignment, ..., 'StartPointers', StartPointersValue, ...) specifies the starting indices in the original sequences of a local alignment. showalignment(Alignment, ..., 'Columns', ColumnsValue, ...) specifies how many columns per line to use in the output, and labels the start of each row with the sequence positions. Examples: Enter two amino acid sequences and show their alignment: [Score, Alignment] = nwalign('VSPAGMASGYD', 'IPGKASYD'); showalignment(Alignment). The displayed alignment reports Identities = 6/11 (55%) and Positives = 7/11 (64%). Enter a multiply aligned set of sequences and show their alignment: gag = multialignread('aagag.aln'); showalignment(gag). See Also: Bioinformatics Toolbox functions: nwalign, swalign. showhmmprof. Purpose: Plot Hidden Markov Model (HMM) profile. Syntax: showhmmprof(Model); showhmmprof(..., 'PropertyName', PropertyValue, ...); showhmmprof(..., 'Scale', ScaleValue); showhmmprof(..., 'Order', OrderValue). Arguments: Model: Hidden Markov model created by the function gethmmprof or pfamhmmread. ScaleValue: Property to select a probability scale. Enter one of the following values: 'logprob' (log probabilities), 'prob' (probabilities), 'logodds' (log-odd ratios). Ord
51. ThermoAlpha: [4x3 double]

3. List the thermodynamic calculations for the sequence.
    S1.Thermo
    ans =
      178.5000  477.5700   36.1125
      182.1000  497.8000   33.6809
      190.2000  522.9000   34.2974
      191.9000  516.9000   37.7863

Calculating Properties for a DNA Sequence with Ambiguous Characters

1. Calculate sequence properties of the sequence ACGTAGAGGACGTN.
    S2 = oligoprop('ACGTAGAGGACGTN')
    S2 =
                GC: 53.5714
           GCAlpha: 3.5714
          Hairpins: 'ACGTagaggACGTn'
            Dimers: [3x14 char]
         MolWeight: 4.3329e+003
    MolWeightAlpha: 20.0150
                Tm: [38.8357 42.2958 57.7880 52.4180 49.9633 55.1330]
           TmAlpha: [1.4643 1.4643 10.3885 3.4633 0.2829 3.8074]
            Thermo: [4x3 double]
       ThermoAlpha: [4x3 double]

2. List the potential dimers for the sequence.
    S2.Dimers
    ans =
    ACGTagaggacgtn
    ACGTagaggACGTn
    acgtagagGACGTN

References
[1] Breslauer, K.J., Frank, R., Blocker, H., and Marky, L.A. (1986). Predicting DNA duplex stability from the base sequence. Proceedings of the National Academy of Science USA 83, 3746-3750.
[2] Chen, S.H., Lin, C.Y., Cho, C.S., Lo, C.Z., and Hsiung, C.A. (2003). Primer Design Assistant (PDA): A web-based primer design tool. Nucleic Acids Research 31(13), 3751-3754.
[3] Howley, P.M., Israel, M.A., Law, M., and Martin, M.A. (1979). A rapid method for detecting and mapping homology between heterologous DNAs. Evaluation of polyomavirus genomes. The Journal of Biological Chemistry 254(11), 4876-4883.
[4] Marmur
52. a DNA oligonucleotide.

oligoprop(SeqNT) returns the sequence properties for a DNA oligonucleotide as a structure with the following fields:

GC: Percent GC content for the DNA oligonucleotide. Ambiguous N characters in SeqNT are considered to potentially be any nucleotide. If SeqNT contains ambiguous N characters, GC is the midpoint value, and its uncertainty is expressed by GCdelta.

GCdelta: The difference between GC (midpoint value) and either the maximum or minimum value GC could assume. The maximum and minimum values are calculated by assuming all N characters are G/C or not G/C, respectively. Therefore, GCdelta defines the possible range of GC content.

Hairpins: H-by-length(SeqNT) matrix of characters displaying all potential hairpin structures for the sequence SeqNT. Each row is a potential hairpin structure of the sequence, with the hairpin-forming nucleotides designated by capital letters. H is the number of potential hairpin structures for the sequence. Ambiguous N characters in SeqNT are considered to potentially complement any nucleotide.

Dimers: D-by-length(SeqNT) matrix of characters displaying all potential dimers for the sequence SeqNT. Each row is a potential dimer of the sequence, with the self-dimerizing nucleotides designated by capital letters. D is the number of potential dimers for the sequence. Ambiguous N characters in SeqNT
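The GC midpoint and GCdelta definitions above can be worked through by hand. The following is a plain-MATLAB sketch of that arithmetic only, not the toolbox implementation; the variable names are illustrative. It uses the sequence ACGTAGAGGACGTN, which contains one ambiguous N.

```matlab
seq = 'ACGTAGAGGACGTN';

nGC = sum(seq == 'G' | seq == 'C');   % unambiguous G/C count
nN  = sum(seq == 'N');                % ambiguous positions
len = numel(seq);

gcMin = 100 * nGC / len;              % all N assumed not G/C
gcMax = 100 * (nGC + nN) / len;       % all N assumed G/C

gcMid   = (gcMin + gcMax) / 2;        % reported as GC
gcDelta = (gcMax - gcMin) / 2;        % reported as GCdelta

fprintf('GC = %.4f, GCdelta = %.4f\n', gcMid, gcDelta);
```

With 7 unambiguous G/C bases out of 14, the range is 50% to 57.14%, so the midpoint is 53.5714 and the half-width is 3.5714.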
54. calls randseq with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotes and is case insensitive. These property name/property value pairs are as follows:

Seq = randseq(SeqLength, 'Alphabet', AlphabetValue) generates a sequence from a specific alphabet.

Seq = randseq(SeqLength, 'Weights', WeightsValue) creates a weighted random sequence where the ith letter of the sequence alphabet is selected with weight W(i). The weight vector is usually a probability vector or a frequency count vector. Note that the ith element of the nucleotide alphabet is given by int2nt(i), and the ith element of the amino acid alphabet is given by int2aa(i).

Seq = randseq(SeqLength, 'FromStructure', FromStructureValue) creates a weighted random sequence with weights given by the output structure from basecount, dimercount, codoncount, or aacount.

Seq = randseq(SeqLength, 'Case', CaseValue) specifies the case for a letter sequence.

Seq = randseq(SeqLength, 'DataType', DataTypeValue) specifies the data type for the sequence array.

Examples
Generate a random DNA sequence.
    randseq(20)
    ans =
    TAGCTGGCCAAGCGAGCTTG

Generate a random RNA sequence.
    randseq(20, 'alphabet', 'rna')
    ans =
    GCUGCGGCGGUUGUAUCCUG

Generate a random protein sequence.
    randseq(20, 'alphabet', 'amino')
    ans =
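The idea behind the Weights property, where the ith letter is selected with weight W(i), can be illustrated in plain MATLAB. This is a sketch of weighted sampling in general, not the randseq implementation; the alphabet order, the weight values, and the normalization of frequency counts to probabilities are assumptions made for the example.

```matlab
alphabet  = 'ACGT';        % illustrative alphabet order (assumption)
weights   = [1 4 4 1];     % frequency counts; C and G favored 4:1
seqLength = 20;

% Normalize the weights into a cumulative distribution.
cdf = cumsum(weights) / sum(weights);
cdf(end) = 1;              % guard against floating-point round-off

% Draw uniform numbers and pick the first letter whose cumulative
% probability reaches each draw.
u   = rand(1, seqLength);
idx = arrayfun(@(x) find(x <= cdf, 1, 'first'), u);
seq = alphabet(idx);

disp(seq)
```

Because the draws are random, the sequence differs on every run; only its length and alphabet are fixed.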
55. If you compare this plot with the one in Example 2, you see that some of the data points are classified differently using three nearest neighbors.

References
[1] Mitchell, T. (1997). Machine Learning. McGraw-Hill.

See Also
Bioinformatics Toolbox functions: knnimpute, classperf, crossvalind, svmclassify, svmtrain
Statistics Toolbox functions: classify

knnimpute

Purpose: Impute missing data using nearest-neighbor method

Syntax
    knnimpute(Data)
    knnimpute(Data, k)
    knnimpute(..., 'PropertyName', PropertyValue, ...)
    knnimpute(..., 'Distance', DistanceValue)
    knnimpute(..., 'DistArgs', DistArgsValue)
    knnimpute(..., 'Weights', WeightsValues)
    knnimpute(..., 'Median', MedianValue)

Arguments
Data
k

Description
knnimpute(Data) replaces NaNs in Data with the corresponding value from the nearest-neighbor column. The nearest-neighbor column is the closest column in Euclidean distance. If the corresponding value from the nearest-neighbor column is also NaN, the next nearest column is used.

knnimpute(Data, k) replaces NaNs in Data with a weighted mean of the k nearest-neighbor columns. The weights are inversely proportional to the distances from the neighboring columns.

knnimpute(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs.

knnimpute(..., 'Distance', Di
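The single-neighbor behavior of knnimpute(Data) described above can be sketched in plain MATLAB. This is an illustration of the idea, not the toolbox code. One detail is an assumption that the text does not state: distances between columns are computed here using only the rows that contain no NaN in any column. The data matrix is made up.

```matlab
% Illustrative data: column 2 is missing a value in row 3.
Data = [1 2   1.1;
        4 5   4.2;
        7 NaN 6.9;
        3 1   3.1];

completeRows = ~any(isnan(Data), 2);  % assumption: distance uses these rows
nCols = size(Data, 2);
Imputed = Data;

for j = 1:nCols
    missing = find(isnan(Data(:, j)))';
    if isempty(missing), continue; end

    % Euclidean distance from column j to every column.
    diffs = Data(completeRows, :) - repmat(Data(completeRows, j), 1, nCols);
    dist  = sqrt(sum(diffs .^ 2, 1));
    dist(j) = Inf;                    % a column is not its own neighbor
    [~, order] = sort(dist);

    for r = missing
        for c = order                 % nearest column first
            if c ~= j && ~isnan(Data(r, c))
                Imputed(r, j) = Data(r, c);
                break
            end
        end
    end
end

disp(Imputed)
```

In this example column 3 is slightly closer to column 2 than column 1 is, so the missing entry takes the value from column 3.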
56. get, set

getancestors (geneont)

Purpose: Numeric IDs for ancestors of Gene Ontology term

Syntax
    AncestorIDs = getancestors(GeneontObj, ID)
    AncestorIDs = getancestors(..., 'Height', HeightValue, ...)

Description
AncestorIDs = getancestors(GeneontObj, ID) returns the numeric IDs (AncestorIDs) for the ancestors of a term (ID), including the ID for the term. ID is a nonnegative integer or a numeric vector with a set of IDs.

AncestorIDs = getancestors(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs.

AncestorIDs = getancestors(..., 'Height', HeightValue, ...) searches up through a specified number of levels (HeightValue) in the Gene Ontology database. HeightValue is a positive integer. Default is Inf.

Examples
1. Download the Gene Ontology database from the Web into MATLAB.
    GO = geneont('LIVE', true);
MATLAB creates a geneont object and displays the number of terms in the database.
    Gene Ontology object with 20005 Terms.

2. Get the ancestors for a Gene Ontology term.
    ancestors = getancestors(GO, 46680)
    ancestors =
        8150
        9628
        9636
       17085
       42221
       46680
       50896

3. Create a sub Gene Ontology.
    subontology = GO(ancestors)
    Gene Ontology object with 7 Terms.

4. View relationships using the biograph functions.
    [cm acc rels] = getmatrix(subontology);
    BG = biograph(cm, get(subontolo
57. graphshortestpath, graphtopoorder
Bioinformatics Toolbox method of biograph object: traverse

hmmprofalign

Purpose: Align query sequence to profile using hidden Markov model alignment

Syntax
    Alignment = hmmprofalign(Model, Seq)
    [Alignment, Score] = hmmprofalign(Model, Seq)
    [Score, Alignment, Pointer] = hmmprofalign(Model, Seq)
    hmmprofalign(..., 'PropertyName', PropertyValue, ...)
    hmmprofalign(..., 'ShowScore', ShowScoreValue)
    hmmprofalign(..., 'Flanks', FlanksValue)
    hmmprofalign(..., 'ScoreFlanks', ScoreFlanksValue)
    hmmprofalign(..., 'ScoreNullTransitions', ScoreNullTransitionValue)

Arguments
Model: Hidden Markov model created with the function hmmprofstruct.
Seq: Amino acid or nucleotide sequence. You can also enter a structure with the field Sequence.
ShowScoreValue: Property to control displaying the scoring space and the winning path. Enter either true or false (default).
FlanksValue: Property to control including the symbols generated by the FLANKING INSERT states in the output sequence. Enter either true or false (default).
ScoreFlanksValue: Property to control including the transition probabilities for the flanking states in the raw score. Enter either true or false (default).
ScoreNullTransitionValue: Property to control adjusting the raw score using the null model for transitions (Model.NullX). Enter either true or false (default).

Description
Alignm
58. in the following subfields:
- NumOfResidues
- ChainID
- ResidueNames: Contains the three-letter codes for the sequence residues.
- Sequence: Contains the single-letter codes for the sequence residues.

Note: If the sequence has modified residues, then the ResidueNames subfield might not correspond to the standard three-letter amino acid codes. In this case, the Sequence subfield will contain the modified residue code in the position corresponding to the modified residue. The modified residue code is provided in the ModifiedResidues field.

The Model Field
The Model field is also a structure or an array of structures containing coordinate information. If the MATLAB structure contains one model, the Model field is a structure containing coordinate information for that model. If the MATLAB structure contains multiple models, the Model field is an array of structures containing coordinate information for each model. The Model field contains the following subfields:
- Atom
- AtomSD
- AnisotropicTemp
- AnisotropicTempSD
- Terminal
- HeterogenAtom

The Atom Field
The Atom field is also an array of structures containing the following subfields:
- AtomSerNo
- AtomName
- altLoc
- resName
- chainID
- resSeq
- iCode
- X
- Y
- Z
- occupancy
- tempFactor
- segID
- element
- charge
- AtomNameStruct: Contains three subfields: chemSymbol, remoteInd, and branch.

Examples
Retrieve the structure information fo
59. maimage(..., 'ColorBar', ColorBarValue): when ColorBarValue is true, a color bar is shown. If ColorBarValue is false, no color bar is shown. The default is for the color bar to be shown.

maimage(..., 'HandleGraphicsPropertyName', PropertyValue) allows you to pass optional Handle Graphics property name/value pairs to the function. For example, a name/value pair for color could be maimage(..., 'color', 'r').

Examples
    madata = gprread('mouse_a1wt.gpr');
    maimage(madata, 'F635 Median');
    figure;
    maimage(madata, 'F635 Median - B635', 'Title', 'Cy5 Channel FG - BG');
    colormap hot

See Also
Bioinformatics Toolbox functions: maboxplot, magetfield, mairplot, maloglog, malowess
MATLAB function: imagesc

mainvarsetnorm

Purpose: Perform rank invariant set normalization on gene expression values from two experimental conditions or phenotypes

Syntax
    NormDataY = mainvarsetnorm(DataX, DataY)
    NormDataY = mainvarsetnorm(..., 'Thresholds', ThresholdsValue, ...)
    NormDataY = mainvarsetnorm(..., 'Exclude', ExcludeValue, ...)
    NormDataY = mainvarsetnorm(..., 'Prctile', PrctileValue, ...)
    NormDataY = mainvarsetnorm(..., 'Iterate', IterateValue, ...)
    NormDataY = mainvarsetnorm(..., 'Method', MethodValue, ...)
    NormDataY = mainvarsetnorm(..., 'Span', SpanValue, ...)
    NormDataY = mainvarsetnorm(..., 'Showplot', ShowplotValue, ...)

Arguments
DataX: Vector of gene expression values fro
61. procedure originally introduced by Benjamini and Hochberg (1995) to compute an FDR-adjusted p-value for each value in PValues. Choices are true or false (default).

Note: If BHFDRValue is set to true, the Lambda and Method properties are ignored.

mafdr(PValues, ..., 'Lambda', LambdaValue, ...) specifies lambda, the tuning parameter used to estimate the true null hypotheses, π̂0(λ). LambdaValue can be either:
- A single value that is > 0 and < 1
- A series of values. Each value must be > 0 and < 1. There must be at least four values in the series.

Tip: The series of values can be expressed by a colon operator with the form [first:incr:last], where first is the first value in the series, incr is the increment, and last is the last value in the series.

Default LambdaValue is the series of values [0.01:0.01:0.95].

Note: If LambdaValue is set to a single value, the Method property is ignored.

mafdr(PValues, ..., 'Method', MethodValue, ...) specifies a method to calculate the true null hypothesis, π̂0, from the tuning parameter, LambdaValue, when LambdaValue is a series of values. Choices are 'bootstrap' (default) or 'polynomial'.

mafdr(PValues, ..., 'Showplot', ShowplotValue, ...) controls the display of two plots:
- Plot of the estimated true null hypotheses, π̂0(λ), versus the tuning parameter, lambda, with a cubic polynomial fitting curve
- Plot of q-values versus p-values
Choices
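For readers unfamiliar with the Benjamini and Hochberg adjustment mentioned above, the following plain-MATLAB sketch shows the standard step-up calculation of FDR-adjusted p-values. It illustrates the general technique under the usual textbook definition and is not taken from the mafdr source; the p-values are made up, and cummin requires a reasonably recent MATLAB release.

```matlab
p = [0.01 0.04 0.03 0.005 0.2];     % illustrative p-values
m = numel(p);

% Sort ascending and scale each p-value by m / rank.
[pSorted, order] = sort(p);
scaled = pSorted .* m ./ (1:m);

% Enforce monotonicity from the largest p-value downward, cap at 1.
adjSorted = min(1, fliplr(cummin(fliplr(scaled))));

% Put the adjusted values back in the original order.
pAdj = zeros(size(p));
pAdj(order) = adjSorted;

disp(pAdj)
```

For these inputs the adjusted values are 0.025, 0.05, 0.05, 0.025, and 0.2, in the original order of p.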
62. seqpdist(Seqs, ..., 'PairwiseAlignment', PairwiseAlignmentValue, ...)
    D = seqpdist(Seqs, ..., 'JobManager', JobManagerValue, ...)
    D = seqpdist(Seqs, ..., 'WaitInQueue', WaitInQueueValue, ...)
    D = seqpdist(Seqs, ..., 'SquareForm', SquareFormValue, ...)
    D = seqpdist(Seqs, ..., 'Alphabet', AlphabetValue, ...)
    D = seqpdist(Seqs, ..., 'ScoringMatrix', ScoringMatrixValue, ...)
    D = seqpdist(Seqs, ..., 'Scale', ScaleValue, ...)
    D = seqpdist(Seqs, ..., 'GapOpen', GapOpenValue, ...)
    D = seqpdist(Seqs, ..., 'ExtendGap', ExtendGapValue, ...)

Arguments
Seqs: Any of the following:
- Cell array containing nucleotide or amino acid sequences
- Vector of structures containing a Sequence field
- Matrix of characters, in which each row corresponds to a nucleotide or amino acid sequence
MethodValue: String that specifies the method for calculating pairwise distances. Default is Jukes-Cantor.
IndelsValue: String that specifies how to treat sites with gaps. Default is score.
OptargsValue: String or cell array specifying one or more input arguments required or accepted by the distance method specified by the Method property.
PairwiseAlignmentValue: Controls the global pairwise alignment of input sequences (using the nwalign function), while ignoring the multiple alignment of the input sequences (if any). Choices are true or false. Default is:
- true: When all input sequences do not have the same length.
- false: When all input se
63. structure in the PDB is represented by a 4-character alphanumeric identifier. For example, 4hhb is the identification code for hemoglobin.
Distance: Threshold distance in Angstroms shown on a spy plot. Default value is 7.

Description
pdbdistplot displays the distances between atoms and amino acids in a PDB structure.

pdbdistplot('PDBid') retrieves the entry PDBid from the Protein Data Bank (PDB) database and creates a heat map showing interatom distances and a spy plot showing the residues where the minimum distances apart are less than 7 Angstroms. PDBid can also be the name of a variable or a file containing a PDB MATLAB structure.

pdbdistplot('PDBid', Distance) specifies the threshold distance shown on a spy plot.

Examples
Show spy plot at 7 Angstroms of the protein cytochrome C from albacore tuna.
    pdbdistplot('5CYT');
Now take a look at 10 Angstroms.
    pdbdistplot('5CYT', 10);

See Also
Bioinformatics Toolbox functions: getpdb, molviewer, pdbread, proteinplot, ramachandran

pdbread

Purpose: Read data from Protein Data Bank (PDB) file

Syntax
    PDBStruct = pdbread(File)
    PDBStruct = pdbread(File, 'ModelNum', ModelNumValue)

Arguments
File: Either of the following:
- String specifying a file name, a path and file name, or a URL pointing to a file. The referenced file is a Protein Data Bank (PDB)-formatted file (ASCII text file). If yo
64. which each pair of sequences are different (ignoring gaps). The guide tree is calculated by the neighbor-joining method, assuming equal variance and independence of evolutionary distance estimates.

SeqsMultiAligned = multialign(Seqs, Tree) uses a tree (Tree) as a guide for the progressive alignment. The sequences (Seqs) should have the same order as the leaves in the tree (Tree), or use a field (Header or Name) to identify the sequences.

multialign(..., 'PropertyName', PropertyValue, ...) enters optional arguments as property name/value pairs.

multialign(..., 'Weights', WeightsValue) selects the sequence weighting method. Weights emphasize highly divergent sequences by scaling the scoring matrix and gap penalties. Closer sequences receive smaller weights. Values of the property Weights:
- 'THG' (default): Thompson-Higgins-Gibson method using the phylogenetic tree branch distances weighted by their thickness.
- 'equal': Assigns the same weight to every sequence.

multialign(..., 'ScoringMatrix', ScoringMatrixValue) selects the scoring matrix (ScoringMatrixValue) for the progressive alignment. Match and mismatch scores are interpolated from the series of scoring matrices by considering the distances between the two profiles or sequences being aligned. The first matrix corresponds to the smallest distance, and the last matrix to the largest distance. Intermediate distances are calculated using linear interpolation.

multialign S
65. 2 LABELS maStruct Names

See Also
Bioinformatics Toolbox functions: maboxplot, magetfield, mainvarsetnorm, maimage, mairplot, malowess, manorm, mattest, mavolcanoplot
MATLAB function: loglog

malowess

Purpose: Smooth microarray data using Lowess method

Syntax
    YSmooth = malowess(X, Y)
    malowess(..., 'PropertyName', PropertyValue, ...)
    malowess(..., 'Order', OrderValue)
    malowess(..., 'Robust', RobustValue)
    malowess(..., 'Span', SpanValue)

Arguments
X, Y: Scatter data.
OrderValue: Property to select the order of the algorithm. Enter either 1 (linear fit) or 2 (quadratic fit). The default order is 1.
RobustValue: Property to select a robust fit. Enter either true or false.
SpanValue: Property to specify the window size. The default value is 0.05 (5% of total points in X).

Description
YSmooth = malowess(X, Y) smooths scatter data (X, Y) using the Lowess smoothing method. The default window size is 5% of the length of X.

malowess(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs.

malowess(..., 'Order', OrderValue) chooses the order of the algorithm. Note that Curve Fitting Toolbox refers to Lowess smoothing of order 2 as Loess smoothing.

malowess(..., 'Robust', RobustValue) uses a robust fit when RobustValue is set to true. This option can take a long time to calculate.

malowess(..., 'Span', SpanValue) modifies the window size f
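A minimal usage sketch of the malowess properties listed above, assuming Bioinformatics Toolbox is installed. The noisy sine data is synthetic and chosen only for illustration; the property names and values are the ones documented in this entry.

```matlab
% Synthetic scatter data (illustration only).
x = linspace(0, 10, 200)';
y = sin(x) + 0.2 * randn(size(x));

% Default smoothing: linear fit, window of 5% of the points.
ySmooth = malowess(x, y);

% Wider window (10% of the points) with a quadratic local fit.
ySmooth2 = malowess(x, y, 'Span', 0.1, 'Order', 2);

plot(x, y, '.', x, ySmooth, '-', x, ySmooth2, '--');
legend('data', 'default', 'Span 0.1, Order 2');
```

A larger span averages over more neighbors and gives a smoother curve, at the cost of flattening sharp features.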
66. 6823-6834.
[2] Storey, J.D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society 64(3), 479-498.
[3] Storey, J.D., and Tibshirani, R. (2003). Statistical significance for genomewide studies. Proc Nat Acad Sci 100(16), 9440-9445.
[4] Storey, J.D., Taylor, J.E., and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. Journal of the Royal Statistical Society 66, 187-205.
[5] Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society 57, 289-300.

See Also
Bioinformatics Toolbox functions: gcrma, mairplot, maloglog, mapcaplot, mattest, mavolcanoplot, rmasummary

magetfield

Purpose: Extract data from microarray structure

Syntax
    magetfield(MAStruct, FieldName)

Arguments
MAStruct
FieldName

Description
magetfield(MAStruct, FieldName) extracts data for a column (FieldName) from a microarray structure (MAStruct). The benefit of this function is to hide the details of extracting a column of data from a structure created with one of the microarray reader functions (gprread, agferead, sptread, imageneread).

Examples
    maStruct = gprread('mouse_a1wt.gpr');
    cy5data = magetfield(maStruct, 'F635 Median');
    cy3data = magetfield(maStruct, 'F532 Median');
68. Acid Integers to Letters

Amino Acid | Integer | Code
Alanine | 1 | A
Arginine | 2 | R
Asparagine | 3 | N
Aspartic acid (Aspartate) | 4 | D
Cysteine | 5 | C
Glutamine | 6 | Q
Glutamic acid (Glutamate) | 7 | E
Glycine | 8 | G
Histidine | 9 | H
Isoleucine | 10 | I
Leucine | 11 | L
Lysine | 12 | K
Methionine | 13 | M
Phenylalanine | 14 | F
Proline | 15 | P
Serine | 16 | S
Threonine | 17 | T
Tryptophan | 18 | W
Tyrosine | 19 | Y
Valine | 20 | V
Aspartic acid or Asparagine | 21 | B
Glutamic acid or glutamine | 22 | Z
Any amino acid | 23 | X
Translation stop | 24 | *
Gap of indeterminate length | 25 | -
Unknown or any integer not in table | 0 | ?

Description
SeqChar = int2aa(SeqInt) converts a 1-by-N array of integers specifying an amino acid sequence to a character string of single-letter codes specifying the same amino acid sequence. See the table Mapping Amino Acid Integers to Letters for valid integers.

SeqChar = int2aa(SeqInt, 'Case', CaseValue) specifies the case of the returned character string representing an amino acid sequence. Choices are 'upper' (default) or 'lower'.

Examples
Convert an amino acid sequence from integer to letter representation.
    s = int2aa([13 1 17 11 1 21])
    s =
    MATLAB

See Also
Bioinformatics Toolbox functions: aa2int, aminolookup, int2nt, nt2int

int2nt

Purpose Sy
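The integer-to-letter table above amounts to an index lookup. The sketch below reproduces that lookup in plain MATLAB for illustration; it is not the int2aa source, and the variable names are made up. Integers outside 1 to 25 map to '?', following the last row of the table.

```matlab
% Letters in table order: position i holds the code for integer i.
letters = 'ARNDCQEGHILKMFPSTWYVBZX*-';

seqInt  = [13 1 17 11 1 21];

% Start with the "unknown" symbol, then fill in valid positions.
seqChar = repmat('?', size(seqInt));
valid   = seqInt >= 1 & seqInt <= numel(letters);
seqChar(valid) = letters(seqInt(valid));

% Lowercase output, analogous to the 'Case' option.
seqLower = lower(seqChar);

disp(seqChar)
```

For the input above the lookup yields MATLAB, matching the example in the manual.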
69. Amino Acid | Code | Integer
Aspartic acid or Asparagine | B | 21
Glutamic acid or glutamine | Z | 22
Any amino acid | X | 23
Translation stop | * | 24
Gap of indeterminate length | - | 25
Unknown or any character or symbol not in table | ? | 0

SeqInt = aa2int(SeqChar) converts SeqChar, a string of single-letter codes specifying an amino acid sequence, to SeqInt, a 1-by-N array of integers specifying the same amino acid sequence. See the table Mapping Amino Acid Letters to Integers for valid codes.

Examples

Converting a Simple Sequence
Convert the sequence of letters MATLAB to integers.
    SeqInt = aa2int('MATLAB')
    SeqInt =
        13     1    17    11     1    21

Converting a Random Sequence
Convert a random amino acid sequence of letters to integers.
1. Create a random character string to represent an amino acid sequence.
    SeqChar = randseq(20, 'alphabet', 'amino')
    SeqChar =
    dwcztecakfuecvifchds
2. Convert the amino acid sequence from letter to integer representation.
    SeqInt = aa2int(SeqChar)
    SeqInt =
      Columns 1 through 13
         4    18     5    22    17     7     5     1    12    14     0     7     5
      Columns 14 through 20
        20    10    14     5     9     4    16

See Also
Bioinformatics Toolbox functions: aminolookup, int2aa, int2nt, nt2int

aa2nt

Purpose: Convert amino acid sequence to nucleotide sequence

Syntax
    SeqNT = aa2nt(SeqAA)
    aa2nt(..., 'PropertyName', PropertyValue, ...)
    aa2nt(..., 'GeneticCode', GeneticCodeValue)
    aa2nt(..., 'Alphabet', AlphabetValue)

Arguments
SeqAA: Amino acid sequence
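The letter-to-integer direction can be sketched with ismember, whose second output is the position of each character in a reference string and 0 when there is no match. This plain-MATLAB illustration is not the aa2int source; it reuses the sequence from the random-sequence example above, in which the letter u is not in the table.

```matlab
% Letters in table order: position i holds the code for integer i.
letters = 'ARNDCQEGHILKMFPSTWYVBZX*-';

seqChar = 'dwcztecakfuecvifchds';

% Case-insensitive lookup; characters not in the table map to 0.
[~, seqInt] = ismember(upper(seqChar), letters);

disp(seqInt)
```

The result matches the integers listed in the manual's example, including the 0 for the unrecognized letter u.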
70. B
N, X | 15 | Any nucleotide | G, A, T, C | N
- | 16 | Gap of indeterminate length | Gap | -

baselookup('Complement', SeqNT) displays the complementary nucleotide sequence.

baselookup('Code', CodeValue) displays the corresponding letter code, meaning, and name. For ambiguous nucleotide letters (R Y K M S W B D H V N X), the name is replaced by a descriptive name.

baselookup('Integer', IntegerValue) displays the corresponding letter code, meaning, and nucleotide name.

baselookup('Name', NameValue) displays the corresponding letter code and meaning.

Examples
    baselookup('Complement', 'TAGCTGRCCAAGGCCAAGCGAGCTTN')
    baselookup('Name', 'cytosine')

See Also
Bioinformatics Toolbox functions: basecount, codoncount, dimercount, geneticcode, nt2aa, nt2int, revgeneticcode, seqtool

biograph

Purpose: Create biograph object

Syntax
    BGobj = biograph(CMatrix)
    BGobj = biograph(CMatrix, NodeIDs)
    BGobj = biograph(CMatrix, NodeIDs, ...)
    BGobj = biograph(CMatrix, NodeIDs, ..., DescriptionValue, ...)
    BGobj = biograph(CMatrix, NodeIDs, ..., LayoutTypeValue, ...)
    BGobj = biograph(CMatrix, NodeIDs, ..., EdgeTypeValue, ...)
    BGobj = biograph(CMatrix, NodeIDs, ...)
    BGobj = biograph(CMatrix, NodeIDs, ..., LayoutScaleValue, ...)
    BGobj = biograph(CMatrix, NodeIDs, ..., EdgeTextColorValue, ...)
    BGobj = biograph(CMatrix, NodeIDs, ..., EdgeFontSizeValue, ...)
    BGobj = biograph(CMatrix, NodeIDs, ..., Show
71. Consensus
    cp = classperf(species, c);
    get(cp)

    % 10-fold cross-validation on the fisheriris data using linear
    % discriminant analysis and the third column as only feature for
    % classification
    load fisheriris
    indices = crossvalind('Kfold', species, 10);
    cp = classperf(species);   % initializes the CP object
    for i = 1:10
        test = (indices == i); train = ~test;
        class = classify(meas(test,3), meas(train,3), species(train));
        % updates the CP object with the current classification results
        classperf(cp, class, test)
    end
    cp.CorrectRate   % queries for the correct classification rate

    cp =
    biolearning.classperformance
                          Label:
                    Description:
                    ClassLabels: {3x1 cell}
                    GroundTruth: [150x1 double]
           NumberOfObservations: 150
                 ControlClasses: [2x1 double]
                  TargetClasses: 1
              ValidationCounter: 1
             SampleDistribution: [150x1 double]
              ErrorDistribution: [150x1 double]
      SampleDistributionByClass: [3x1 double]
       ErrorDistributionByClass: [3x1 double]
                 CountingMatrix: [4x3 double]
                    CorrectRate:
                      ErrorRate:
               InconclusiveRate: 0.0733
                 ClassifiedRate: 0.9267
                    Sensitivity:
                    Specificity: 0.8900
        PositivePredictiveValue: 0.8197
        NegativePredictiveValue: 1
             PositiveLikelihood: 9.0909
             NegativeLikelihood: 0
                     Prevalence: 0.3333
                DiagnosticTable: [2x2 double]

    ans =
        0.9467

See Also
Bioinformatics Toolbox functions: knnclassify, svmclassify, crossvalind
Statistics Toolbox functio
72. DataX and DataY intensities when Type is IR
- 1/2 log of the product of the DataX and DataY intensities when Type is MA
Ratio: Vector containing ratios of the microarray gene expression data, calculated as log2(DataX./DataY).
H: Handle of the plot.

mairplot(DataX, DataY) creates a scatter plot that plots log of the product of the DataX and DataY intensities versus log of the intensity ratios.

[Intensity, Ratio] = mairplot(DataX, DataY) returns the intensity and ratio values. If you set Normalize to true, the returned ratio values are normalized.

[Intensity, Ratio, H] = mairplot(DataX, DataY) returns the handle of the plot.

mairplot(..., 'PropertyName', PropertyValue, ...) calls mairplot with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

mairplot(..., 'Type', TypeValue, ...) specifies the plot type. Choices are 'IR' (plots log of the product of the DataX and DataY intensities versus log of the intensity ratios) or 'MA' (plots 1/2 log of the product of the DataX and DataY intensities versus log of the intensity ratios). Default is 'IR'.

mairplot(..., 'LogTrans', LogTransValue, ...) controls the conversion of data in X and Y from natural to log scale. Set LogTr
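The ratio and intensity quantities above can be computed directly. The following plain-MATLAB sketch works through the MA-type values for a few made-up intensities. The ratio uses log2 as stated in the text; using log2 for the intensity term as well is an assumption, because the base of that logarithm is not legible in this excerpt.

```matlab
% Illustrative intensities for two channels (not real data).
DataX = [100 200 400 800];
DataY = [100 100 100 100];

% Log ratio, as defined for the Ratio output.
Ratio = log2(DataX ./ DataY);

% MA-type intensity: half the log of the product (base 2 assumed).
Intensity = 0.5 * log2(DataX .* DataY);

disp([Intensity; Ratio])
```

Each doubling of DataX relative to DataY raises the ratio by one unit, which is why log ratios are convenient for reading fold changes.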
73. G 119 T 82 SPAC 16 25 PRIM 0 MACH Arkansas_SN312 DYEP DT3700POP5 BD v2 mob NAME HCIUP1D61207 LANE 6 GELN PROC RTRK CONV phred version 0 990722 h COMM SRCE ABI 373A or 377

See Also
Bioinformatics Toolbox functions: genbankread, traceplot

seq2regexp

Purpose: Convert sequence with ambiguous characters to regular expression

Syntax
    seq2regexp(Seq)
    seq2regexp(..., 'PropertyName', PropertyValue, ...)
    seq2regexp(..., 'Alphabet', AlphabetValue)
    seq2regexp(..., 'Ambiguous', AmbiguousValue)

Arguments
Seq: Amino acid or nucleotide sequence as a string of characters. You can also enter a structure with the field Sequence.
AlphabetValue: Property to select the sequence alphabet. Enter either 'AA' for amino acids or 'NT' for nucleotides. The default value is 'NT'.
AmbiguousValue: Property to control returning ambiguous characters in the regular expression. Enter either true (include ambiguous characters) or false (return only unambiguous characters). The default value is true.

Nucleotide Conversions

Nucleotide Letter | Nucleotide
A | A (Adenosine)
C | C (Cytosine)
G | G (Guanine)
T | T (Thymidine)
U | U (Uridine)
R | G, A (Purine)
Y | T, C (Pyrimidine)
S | G, C (Strong)
W | A, T (Weak)
B | G, T, C
D | G, A, T
H | A, C, T
V | G, C, A
N | A, G, C, T (Any nucleotide)

Description
Nucleotide Letter Nucleoti
74. Compare sets of nucleotide or amino acid sequences; progressively align sequences using phylogenetic tree for guidance.
Standard scoring matrices, such as PAM and BLOSUM families of matrices, that alignment functions use.
Read phylogenetic tree files, calculate pairwise distances between sequences, and build a phylogenetic tree.
Graph Theory: Apply basic graph theory algorithms to sparse matrices.
Gene Ontology: Read Gene Ontology formatted files.
Protein Analysis: Determine protein characteristics and simulate enzyme cleavage reactions.
Profile Hidden Markov Models: Get profile hidden Markov model data from the PFAM database or create your own profiles from a set of sequences.
Microarray File Formats: Read data from common microarray file formats, including Affymetrix GeneChip, ImaGene results, and SPOT files; read GenePix GPR and GAL files.
Microarray Utility: Using Affymetrix and GeneChip data sets, get library information for a probe, gene information from a probe set, and probe set values from CEL and CDF information; show probe set information from NetAffx and plot probe set values.

Constructor
Microarray Data Analysis and Visualization
Microarray Normalization and Filtering
Statistical Learning
Mass Spectrometry File Formats, Preprocessing, and Visualization

Constructor
biograph, geneont, phyt
75. [Figure: Histograms of t-test Results, with two panels titled "t-scores" and "p-values"; horizontal axes: t-score, p-value]

mattest(..., 'Showplot', ShowplotValue, ...) controls the display of a normal t-score quantile plot. When ShowplotValue is true, mattest displays a quantile-quantile plot. Default is false. In the t-score quantile plot, the black diagonal line represents the sample quantile being equal to the theoretical quantile. Data points of genes considered to be differentially expressed lie farther away from this line. Specifically, data points with t-scores > (1 - 1/(2N)) or < 1/(2N) display with red circles. N is the total number of genes.

[Figure: Normal Quantile Plot of t; axes: Theoretical quantile, Sample quantile]

mattest(..., 'Labels', LabelsValue, ...) controls the display of labels when you click a data point in the t-score quantile plot. LabelsValue is a cell array of labels (typically gene names or probe set IDs) for each row in DataX and DataY.

Examples
1. Load the MAT file, included with Bioinformatics Toolbox, that contains Affymetrix data from a prostate cancer study, specifically pro
76. IUPAC letters. If the property ACGTOnly is true, you can only enter the characters A, C, T, G, and U.
UnknownValue: Property to select the integer for unknown characters. Enter an integer. Maximum value is 255. Default value is 0.
ACGTOnlyValue: Property to control the use of ambiguous nucleotides. Enter either true or false. Default value is false.

Mapping Nucleotide Letters to Integers
Base                          Letter  Code
Adenosine                     A       1
Cytidine                      C       2
Guanine                       G       3
Thymidine                     T       4
Uridine                       U       4
A, G (purine)                 R       5
T, C (pyrimidine)             Y       6
G, T (keto)                   K       7
A, C (amino)                  M       8
G, C (strong)                 S       9
A, T (weak)                   W       10
T, G, C (not A)               B       11
A, T, G (not C)               D       12
A, T, C (not G)               H       13
A, G, C (not T)               V       14
A, T, G, C (any)              N       15
Gap of indeterminate length   -       16
Unknown                               0 (default) and >= 17

Description
SeqInt = nt2int(SeqChar, 'PropertyName', PropertyValue) converts a character string of nucleotides to a 1-by-N array of integers using the table Mapping Nucleotide Letters to Integers above. Unknown characters (characters not in the table) are mapped to 0. Gaps represented with hyphens are mapped to 16.
nt2int(..., 'Unknown', UnknownValue) defines the number used to represent unknown nucleotides. The default value is 0.
nt2int(..., 'ACGTOnly', ACGTOnlyValue): if ACGTOnly is true, the ambiguous nucleotide characters (N, R, Y, K, M, S, W, B, D, H, and
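As an illustration only, the lookup in the mapping table above can be mimicked in base MATLAB without the toolbox. This is a sketch, not the toolbox implementation of nt2int: the variable names (lut, unknownValue, acgtOnly) and the uppercase normalization are assumptions made for the example.

```matlab
% Hypothetical re-creation of the letter-to-integer table (not toolbox code).
letters = 'ACGTURYKMSWBDHVN-';
codes   = [1 2 3 4 4 5 6 7 8 9 10 11 12 13 14 15 16];

unknownValue = 0;      % documented default for characters outside the table
acgtOnly     = false;  % set to true to treat ambiguous letters as unknown

lut = unknownValue * ones(1, 256);   % every character starts as "unknown"
lut(double(letters)) = codes;        % fill in the letters from the table

seq = upper('ACTGCTAGC');            % assumption: compare in uppercase
s = lut(double(seq));                % vectorized table lookup

if acgtOnly
    s(s >= 5 & s <= 15) = unknownValue;   % codes 5 to 15 are the ambiguity codes
end
disp(s)
```

A 256-entry lookup vector keeps the conversion to a single indexing operation, which is why it is used here instead of a loop over characters.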
77. If there are multiple features with the same FeatureValue, then FeatStruct is an array of structures.
SequenceValue: Property to control the extraction, when possible, of the sequences respective to each feature, joining and complementing pieces of the source sequence and storing them in the Sequence field of the returned structure, FeatStruct. When extracting the sequence from an incomplete CDS feature, featuresparse uses the codon_start qualifier to adjust the frame of the sequence. Choices are true or false (default).
Return Values
FeatStruct: Output structure containing a field for every database feature. Each field name in FeatStruct matches the corresponding feature name in the GenBank, GenPept, or EMBL database, with the exceptions listed in the table below. Fields in FeatStruct contain substructures with feature qualifiers as fields. In the GenBank, GenPept, and EMBL databases, for each feature, the only mandatory qualifier is its location, which featuresparse translates to the field Location. When possible, featuresparse also translates this location to numeric indices, creating an Indices field.
Note: If you use the Indices field to extract sequence information, you may need to complement the sequences.
Description
FeatStruct = featuresparse(Features) parses the features from Features, which contains GenBank, GenPept, or EMBL features. Features can be a String containing GenBank
78. NUMBRANCHES X 1, which contains the distance from each branch to the leaves. In ultrametric trees, all of the leaves are at the same location (same distance to the root).
    b = [1 2; 3 4]; c = [1 4]';
    view(phytree(b,c))
Tree = phytree(BC) creates an ultrametric phylogenetic binary tree object with branch pointers in BC(:,[1 2]) and branch coordinates in BC(:,3). Same as phytree(B,C).
Tree = phytree(..., N) specifies the names for the leaves and/or the branches. N is a cell of strings. If NUMEL(N) == NUMLEAVES, then the names are assigned chronologically to the leaves. If NUMEL(N) == NUMBRANCHES, the names are assigned to the branch nodes. If NUMEL(N) == NUMLEAVES + NUMBRANCHES, all the nodes are named. Unassigned names default to 'Leaf #' and/or 'Branch #' as required.
Tree = phytree creates an empty phylogenetic tree object.
Examples
Create a phylogenetic tree for a set of multiply aligned sequences.
    Sequences = multialignread('aagag.aln')
    distances = seqpdist(Sequences)
    tree = seqlinkage(distances)
    phytreetool(tree)
See Also
Bioinformatics Toolbox functions: phytreeread, phytreetool, phytreewrite, seqlinkage, seqneighjoin, seqpdist
Bioinformatics Toolbox object: phytree object
Bioinformatics Toolbox methods of phytree object: get, getbyname, getcanonical, getmatrix, getnewickstr, pdist, plot, prune, reroot, select, subtree, view, weights
phytreeread
79. Node: Node index returned by the phytree object method getbyname.
Distance: Distance from the reference branch.
Tree2 = reroot(Tree1) changes the root of a phylogenetic tree (Tree1) using a midpoint method. The midpoint is the location where the mean values of the branch lengths on either side of the tree are equalized. The original root is deleted from the tree.
Tree2 = reroot(Tree1, Node) changes the root of a phylogenetic tree (Tree1) to a branch node using the node index (Node). The new root is placed at half the distance between the branch node and its parent.
Tree2 = reroot(Tree1, Node, Distance) changes the root of a phylogenetic tree (Tree1) to a new root at a given distance (Distance) from the reference branch node (Node) toward the original root of the tree.
Note: The new branch representing the root in the new tree (Tree2) is labeled 'Root'.
Examples
1. Create an ultrametric tree.
    tr_1 = phytree([5 7; 8 9; 6 11; 1 2; 3 4; 10 12; 14 16; 15 17; 13 18])
    plot(tr_1, 'branchlabels', true)
MATLAB draws a figure with the phylogenetic tree.
[Figure: phylogenetic tree with branches labeled Branch 1 through Branch 8]
2. Place the root at 'Branch 7'.
    sel = getbyname(tr_1, 'Branch 7');
    tr_2 = reroot(tr_1, sel)
    plot(tr_2, 'branchlabels', true)
MATLAB draws a tree with the root moved to the center of branch 7.
80. NodeIDs) gets the handles for the specified nodes (NodeIDs) in a biograph object.
Examples
1. Create a biograph object.
    species = {'Homosapiens','Pan','Gorilla','Pongo','Baboon','Macaca','Gibbon'};
    cm = magic(7)>25 & 1-eye(7);
    bg = biograph(cm, species);
2. Find the handles to members of the Cercopithecidae family and members of the Hominidae family.
    Cercopithecidae = {'Macaca','Baboon'};
    Hominidae = {'Homosapiens','Pan','Gorilla','Pongo'};
    CercopithecidaeNodes = getnodesbyid(bg, Cercopithecidae);
    HominidaeNodes = getnodesbyid(bg, Hominidae);
3. Color the families differently and draw a graph.
See Also
Bioinformatics Toolbox function: biograph (object constructor)
Bioinformatics Toolbox object: biograph object
Bioinformatics Toolbox methods of a biograph object: dolayout, getancestors, getdescendants, getedgesbynodeid, getnodesbyid, getrelatives, view
MATLAB functions: get, set

getrelatives (biograph)
Purpose: Find relatives in biograph object
Syntax
    Nodes = getrelatives(BiographNode)
    Nodes = getrelatives(BiographNode, NumGenerations)
Arguments
BiographNode: Node in a biograph object.
NumGenerations: Number of generations. Enter a positive integer.
Description
Nodes = getrelatives(BiographNode) finds all the direct relatives for a given node (BiographNode).
Nodes = getrelatives(BiographNode, NumGenerations) finds the direct relatives for a
81. NullXx
HMMStruct = gethmmprof(PFAMNumber) determines a protein family accession number from PFAMNumber (an integer), searches the PFAM database for the associated record, retrieves the HMM profile information, and stores it in HMMStruct, a MATLAB structure.
HMMStruct = gethmmprof(PFAMAccessNumber) searches the PFAM database for the record represented by PFAMAccessNumber (a protein family accession number), retrieves the HMM profile information, and stores it in HMMStruct, a MATLAB structure.
Note: While this is the most efficient way to query the PFAM database, version numbers can change, making your input invalid.
HMMStruct = gethmmprof(..., 'PropertyName', PropertyValue, ...) calls gethmmprof with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
HMMStruct = gethmmprof(..., 'ToFile', ToFileValue, ...) saves the data returned from the PFAM database in a file specified by ToFileValue.
Note: You can read an HMM-formatted file back into MATLAB using the pfamhmmread function.
HMMStruct = gethmmprof(..., 'Mode', ModeValue, ...) specifies the returned alignment mode. Choices are:
• 'ls' (default): Global alignment mode.
• 'fs': Local alignment mode.
HMMStruct
82. (..., 'Polyorder', PolyorderValue, ...) specifies the order of a polynomial kernel. PolyorderValue must be a positive number. Default is 3.
SVMStruct = svmtrain(..., 'Mlp_Params', Mlp_ParamsValue, ...) specifies the scale and bias parameters of the multilayer perceptron (mlp) kernel as a two-element vector, [p1, p2]. K = tanh(p1*U*V' + p2), where p1 must be > 0 and p2 must be < 0. Default is [1, -1].
SVMStruct = svmtrain(..., 'Method', MethodValue, ...) specifies the method to find the separating hyperplane. Choices are:
• 'QP': Quadratic Programming (requires Optimization Toolbox). The classifier is a two-norm, soft-margin support vector machine.
• 'SMO': Sequential Minimal Optimization. The classifier is a one-norm, soft-margin support vector machine.
• 'LS': Least Squares.
If you installed Optimization Toolbox, the QP method is the default. Otherwise, the SMO method is the default.
SVMStruct = svmtrain(..., 'QuadProg_Opts', QuadProg_OptsValue, ...) specifies an options structure created by the optimset function (Optimization Toolbox). This structure specifies options used by the QP method. For more information on creating this structure, see the optimset and quadprog functions.
SVMStruct = svmtrain(..., 'SMO_Opts', SMO_OptsValue, ...) specifies an options structure created by sv
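The mlp kernel formula quoted above, K = tanh(p1*U*V' + p2), can be evaluated directly in base MATLAB. The sketch below only illustrates the formula; U and V are made-up data matrices with one observation per row, and nothing here reproduces the internals of svmtrain.

```matlab
% Sketch of the mlp kernel K = tanh(p1*U*V' + p2) on made-up data.
U = [1 0; 0 1; 1 1];   % three hypothetical observations
V = [1 0; 2 2];        % two hypothetical observations
p1 = 1;                % scale, must be > 0
p2 = -1;               % bias, must be < 0

K = tanh(p1 * (U * V') + p2);   % 3-by-2 matrix, one value per (row of U, row of V) pair
disp(K)
```

Because tanh is bounded, every entry of K lies strictly between -1 and 1, whatever the scale of the inputs.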
83. ProbeIndices, SequenceMatrix
Description
Struct = affyprobeseqread(SeqFile, CDFFile) reads the data from files SeqFile and CDFFile, and stores the data in the MATLAB structure Struct, which contains the following fields.
Field            Description
ProbeSetIDs      Cell array containing the probe set IDs from the Affymetrix CDF library file.
ProbeIndices     Column vector containing probe indexing information. Probes within a probe set are numbered 0 through N - 1, where N is the number of probes in the probe set.
SequenceMatrix   An N-by-25 matrix of sequence information for the perfect match (PM) probes on the Affymetrix GeneChip array, where N is the number of probes on the array. Each row corresponds to a probe, and each column corresponds to one of the 25 sequence positions. Nucleotides in the sequences are represented by one of the following integers:
                 • 0: None
                 • 1: A
                 • 2: C
                 • 3: G
                 • 4: T
Note: Probes without sequence information are represented in SequenceMatrix as a row containing all 0s.
Tip: You can use the int2nt function to convert the nucleotide sequences in SequenceMatrix to letter representation.
Struct = affyprobeseqread(SeqFile, CDFFile, ...'PropertyName', PropertyValue, ...) calls affyprobeseqread with optional properties that use property name/property value pairs. You can specify one
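To make the integer coding of SequenceMatrix concrete, here is a small base-MATLAB sketch that turns such a matrix back into letters by indexing into a character vector. It is not the int2nt function; the 2-by-4 matrix is a made-up excerpt (real rows have 25 columns), and using '*' as the placeholder for 0 is an assumption of this example.

```matlab
% Made-up excerpt standing in for SequenceMatrix (real rows have 25 columns).
SequenceMatrix = [1 2 3 4;    % a probe reading A C G T
                  0 0 0 0];   % a probe without sequence information

% Position k+1 of the alphabet holds the letter for integer k.
% The '*' placeholder for 0 ("None") is an assumption of this sketch.
alphabet = '*ACGT';

letters = alphabet(SequenceMatrix + 1);   % result has the same shape as SequenceMatrix
disp(letters)
```

Indexing with the whole matrix at once converts every probe in a single step, with one row of characters per probe.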
84. Property to select the N percentage of the highest-ranked invariant set of data points to fit a straight line through, while the remaining data points are fitted to a running median curve. The final running median curve is a piecewise linear curve. Default is 1.5.
MethodValue: Property to select the smoothing method used to normalize the data. Enter 'lowess' or 'runmedian'. Default is 'lowess'.
ShowplotValue: Property to control the plotting of two pairs of scatter plots (before and after normalization). The first pair plots baseline data versus data from a specified column (chip) from the matrix Data. The second is a pair of M-A scatter plots, which plots M (ratio between baseline and sample) versus A (the average of the baseline and sample). Enter either 'all' (plot a pair of scatter plots for each column or chip) or specify a subset of columns (chips) by entering the column number(s) or a range of numbers. For example:
• ..., 'Showplot', 3, ...) plots data from column 3.
• ..., 'Showplot', [3,5,7], ...) plots data from columns 3, 5, and 7.
• ..., 'Showplot', 3:9, ...) plots data from columns 3 to 9.
Description
NormData = affyinvarsetnorm(Data) normalizes the values in each column (chip) of probe intensities in Data to a baseline reference, using the invariant set method. NormData is a matrix of normalized probe intensities from Data. Specifically, affyinvarsetnorm:
• Selects a baseline in
85. The child can be either another branch node or a leaf node. ID is a column vector of strings listing the labels that correspond to the rows and columns of Matrix, with the labels from 1 to Number of Leaves being the leaf nodes, then the labels from Number of Leaves + 1 to Number of Leaves + Number of Branches being the branch nodes, and the label for the last branch node also being the root node. Distances is a column vector with one entry for every nonzero entry in Matrix, traversed column-wise and representing the distance between the branch node and the child.
Examples
    T = phytreeread('pf00002.tree')
    [MATRIX, ID, DIST] = getmatrix(T)
See Also
Bioinformatics Toolbox functions: phytree (object constructor), phytreetool
Bioinformatics Toolbox object: phytree object
Bioinformatics Toolbox methods of phytree object: get, pdist, prune

getnewickstr (phytree)
Purpose: Create Newick-formatted string
Syntax
    String = getnewickstr(Tree)
    getnewickstr(..., 'PropertyName', PropertyValue, ...)
    getnewickstr(..., 'Distances', DistancesValue)
    getnewickstr(..., 'BranchNames', BranchNamesValue)
Arguments
Tree: Phytree object created with the function phytree.
DistancesValue: Property to control including or excluding distances in the output. Enter either true (include distances) or false (exclude distances). Default is true.
BranchNamesValue: Property to control including or excluding branch names i
86. Toolbox, which contains Affymetrix probe-level data, including pmMatrix, a matrix of PM probe intensity values from multiple CEL files.
    load prostatecancerrawdata
2. Perform background adjustment on the PM probe intensity values in the matrix, pmMatrix, creating a new matrix, BackgroundAdjustedMatrix.
    BackgroundAdjustedMatrix = rmabackadj(pmMatrix);
3. Perform background adjustment on the PM probe intensity values in only column 3 of the matrix, pmMatrix, creating a new matrix, BackgroundAdjustedChip3.
    BackgroundAdjustedChip3 = rmabackadj(pmMatrix(:,3));
The prostatecancerrawdata.mat file used in the previous example contains data from Best et al., 2005.
References
[1] Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U., Speed, T.P. (2003). Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data. Biostatistics 4, 249-264.
[2] Bolstad, B. (2005). "affy: Built-in Processing Methods". http://www.bioconductor.org/repository/devel/vignette/builtinMethods.pdf
[3] Best, C.J.M., Gillespie, J.W., Yi, Y., Chandramouli, G.V.R., Perlmutter, M.A., Gathright, Y., Erickson, H.S., Georgevich, L., Tangrea, M.A., Duray, P.H., Gonzalez, S., Velasco, A., Linehan, W.M., Matusik, R.J., Price, D.K., Figg, W.D., Emmert-Buck, M.R., and Chuaqui, R.F. (2005). Molecular alterations in primary prostate cancer after androgen ablation therapy. Clinical Can
87. V) are represented by the unknown nucleotide number.
Examples
Convert a nucleotide sequence with letters to integers.
    s = nt2int('ACTGCTAGC')
    s =
        1 2 4 3 2 4 1 3 2
See Also
Bioinformatics Toolbox functions: aa2int, baselookup, int2aa, int2nt

ntdensity
Purpose: Plot density of nucleotides along sequence
Syntax
    Density = ntdensity(SeqNT, 'PropertyName', PropertyValue)
    ntdensity(..., 'Window', WindowValue)
    [Density, HighCG] = ntdensity(..., 'CGThreshold', CGThresholdValue)
Description
ntdensity(SeqNT) plots the density of nucleotides A, T, C, G in sequence SeqNT.
Density = ntdensity(SeqNT, 'PropertyName', PropertyValue) returns a MATLAB structure with the density of nucleotides A, C, G, and T.
ntdensity(..., 'Window', WindowValue) uses a window of length Window for the density calculation. The default value is length(SeqNT)/20.
[Density, HighCG] = ntdensity(..., 'CGThreshold', CGThresholdValue) returns indices for regions where the CG content of SeqNT is greater than CGThreshold. The default value for CGThreshold is 5.
Examples
    s = randseq(1000, 'alphabet', 'dna');
    ndensity = ntdensity(s)
[Figure: Nucleotide density; x-axis: sequence position 0 to 1000; panel: A-T, C-G density]
See Also
Bioinformatics Toolbox functions: basecount, codoncount, cpgisland, dimercount
MATLAB function: filter

nuc44
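A windowed nucleotide density of the kind described for ntdensity can be approximated in base MATLAB with the filter function listed under See Also. This is a sketch under stated assumptions: it uses a trailing (causal) moving average, which may not match how the toolbox positions its window, and the short sequence and window length are made up for the example.

```matlab
% Sketch: per-base density along a sequence using a moving average.
seq = 'AACCGGTTAA';        % made-up sequence
w   = 2;                   % window length (hypothetical; the text gives length(SeqNT)/20 as default)
kernel = ones(1, w) / w;   % averaging kernel

bases   = 'ATCG';
density = zeros(numel(bases), numel(seq));
for k = 1:numel(bases)
    indicator     = double(seq == bases(k));      % 1 where the base occurs, 0 elsewhere
    density(k, :) = filter(kernel, 1, indicator); % trailing average over the last w positions
end
disp(density(1, :))   % density of A along the sequence
```

Each row of density is the fraction of the last w positions occupied by one base; the first w - 1 columns are averaged against implicit zeros because the filter has no earlier samples to use.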
88. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
FeatStruct = featuresparse(Features, ...'Feature', FeatureValue, ...) returns only the substructure that corresponds to FeatureValue, the name of a feature contained in Features. If there are multiple features with the same FeatureValue, then FeatStruct is an array of structures.
FeatStruct = featuresparse(Features, ...'Sequence', SequenceValue, ...) controls the extraction, when possible, of the sequences respective to each feature, joining and complementing pieces of the source sequence and storing them in the field Sequence. When extracting the sequence from an incomplete CDS feature, featuresparse uses the codon_start qualifier to adjust the frame of the sequence. Choices are true or false (default).
Examples
Obtaining All Features from a GenBank File
The following example obtains all the features stored in the GenBank file nm175642.txt:
    gbkStruct = genbankread('nm175642.txt');
    features = featuresparse(gbkStruct)
    features =
        source: [1x1 struct]
          gene: [1x1 struct]
           CDS: [1x1 struct]
Obtaining a Subset of Features from a GenBank Record
The following example obtains only the coding sequences (CDS) feature of the Caenorhabditis elegans cosmid record (accession number Z92777) from the GenBank database:
    worm ge
89. a positive real value. Default is 0.
Peaks = mspeaks(MZ, Intensities, ...'ShowPlot', ShowPlotValue, ...) controls the display of a plot of the original and the smoothed signal, with the peaks included in the output matrix Peaks marked. Choices are true, false, or I, an integer specifying the index of a spectrum in Intensities. If set to true, the first spectrum in Intensities is plotted. Default is:
• false: When return values are specified.
• true: When return values are not specified.
Examples
1. Load a MAT-file, included with Bioinformatics Toolbox, which contains mass spectrometry data variables, including MZ_lo_res, a vector of m/z values for a set of spectra, and Y_lo_res, a matrix of intensity values for a set of mass spectra that share the same m/z range.
    load sample_lo_res
2. Adjust the baseline of the eight spectra stored in Y_lo_res.
    YB = msbackadj(MZ_lo_res, Y_lo_res);
3. Convert the raw mass spectrometry data to a peak list by finding the relevant peaks in each spectrum.
    P = mspeaks(MZ_lo_res, YB);
4. Plot the third spectrum in YB, the matrix of baseline-corrected intensity values, with the detected peaks marked.
    P = mspeaks(MZ_lo_res, YB, 'SHOWPLOT', 3);
[Figure: mspeaks plot, Spectrogram ID: 3; legend: Original spectrogram, Denoised spectrogram, Peaks]
90. and order: blue, green, red, cyan, magenta, yellow, brown, light green, orange, purple, gold, and silver. In the matrix, each row corresponds to a color, and each column specifies red, green, and blue intensity respectively. Valid values for the RGB intensities are 0.0 to 1.0.
QualifiersValue: Cell array of strings to specify an ordered list of qualifiers to search for in the structure and use as annotations. For each feature, the first matching qualifier found from the list is used for its annotation. If a feature does not include any of the qualifiers, no annotation displays for that feature. By default, QualifiersValue = {'gene', 'product', 'locus_tag', 'note', 'db_xref', 'protein_id'}. Provide your own QualifiersValue to limit or expand the list of qualifiers or change the search order.
Tip: Set QualifiersValue = {} to create a map with no annotations.
Tip: To determine all qualifiers available for a given feature, do either of the following:
• Create the map, and then click a feature or its annotation to list all qualifiers for that feature.
• Use the featuresparse command to parse all the features into a new structure, and then use the fieldnames command to list the qualifiers for a specific feature. See Determining Qualifiers for a Specific Feature on page 2-150.
ShowPositionsValue: Property to add the sequence position to the annotation label for each feature. Enter true to add
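The "first matching qualifier wins" rule described for QualifiersValue can be sketched in a few lines of base MATLAB. This is an illustration of the selection rule only, not code from featuresmap; the feature structure and its values are invented for the example.

```matlab
% Ordered qualifier list, as given for the default QualifiersValue.
qualifiers = {'gene', 'product', 'locus_tag', 'note', 'db_xref', 'protein_id'};

% Hypothetical feature substructure with some qualifiers present (invented values).
feature = struct('product', 'hypothetical protein', ...
                 'note',    'example note', ...
                 'db_xref', 'GI:0');

label = '';   % stays empty when no qualifier matches (no annotation)
for k = 1:numel(qualifiers)
    if isfield(feature, qualifiers{k})
        label = feature.(qualifiers{k});   % first match in list order is used
        break
    end
end
disp(label)
```

Reordering the list changes which qualifier is picked: here 'gene' is absent, so 'product' is the first match even though 'note' and 'db_xref' are also present.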
91. are considered to potentially complement any nucleotide.
Field            Description
MolWeight        Molecular weight of the DNA oligonucleotide. Ambiguous N characters in SeqNT are considered to potentially be any nucleotide. If SeqNT contains ambiguous N characters, MolWeight is the midpoint value, and its uncertainty is expressed by MolWeightdelta.
MolWeightdelta   The difference between MolWeight (midpoint value) and either the maximum or minimum value MolWeight could assume. The maximum and minimum values are calculated by assuming all N characters are G or C respectively. Therefore, MolWeightdelta defines the possible range of molecular weight for SeqNT.
Tm               A vector with melting temperature values, in degrees Celsius, calculated by six different methods, listed in the following order:
                 • Basic (Marmur et al., 1962)
                 • Salt adjusted (Howley et al., 1979)
                 • Nearest-neighbor (Breslauer et al., 1986)
                 • Nearest-neighbor (SantaLucia Jr. et al., 1996)
                 • Nearest-neighbor (SantaLucia Jr., 1998)
                 • Nearest-neighbor (Sugimoto et al., 1996)
                 Ambiguous N characters in SeqNT are considered to potentially be any nucleotide. If SeqNT contains ambiguous N characters, Tm is the midpoint value, and its uncertainty is expressed by Tmdelta.
Tmdelta          A vector containing the differences between Tm (midpoint value) and either the maximum or minimum value Tm could assume for each of the six methods. Theref
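The midpoint and delta convention described for MolWeight and MolWeightdelta amounts to simple arithmetic on a minimum and a maximum. The sketch below uses made-up minimum and maximum weights (mwMin and mwMax are hypothetical names and values, not output of oligoprop) to show how the two reported numbers relate to the range.

```matlab
% Hypothetical extreme molecular weights for an oligo containing N characters.
mwMin = 3000.0;   % made-up value for the lightest possible sequence
mwMax = 3100.0;   % made-up value for the heaviest possible sequence

MolWeight      = (mwMax + mwMin) / 2;   % midpoint of the possible range
MolWeightdelta = (mwMax - mwMin) / 2;   % distance from the midpoint to either extreme

% The possible range is recovered as midpoint minus/plus delta.
range = [MolWeight - MolWeightdelta, MolWeight + MolWeightdelta];
disp(range)
```

The same relationship would apply element by element to Tm and Tmdelta, one pair per melting-temperature method.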
92. are its children. Leaf nodes are numbered from 1 to NUMLEAVES, and branch nodes are numbered from NUMLEAVES + 1 to NUMLEAVES + NUMBRANCHES. Note that because only binary trees are allowed, NUMLEAVES = NUMBRANCHES + 1. Branches are defined in chronological order (for example, B(i,:) < NUMLEAVES + i). As a consequence, the first row can only have pointers to leaves, and the last row must represent the root branch. Parent-child distances are set to 1, unless the child is a leaf and, to satisfy the ultrametric condition of the tree, its distance is increased.
Given a tree with three leaves and two branches as an example:
[Figure: example tree with three leaves and two branches]
In the MATLAB Command Window, type:
    B = [1 2; 3 4]
    tree = phytree(B)
    view(tree)
[Figure: Phylogenetic Tree Viewer showing Leaf 3, Leaf 2, and Leaf 1]
Tree = phytree(B, D) creates an additive (ultrametric or nonultrametric) phylogenetic tree object with branch distances defined by D. D is a numeric array of size NUMNODES X 1 with the distances of every child node (leaf or branch) to its parent branch, where NUMNODES = NUMLEAVES + NUMBRANCHES. The last distance in D is the distance of the root node and is meaningless.
    b = [1 2; 3 4]; d = [1 2 1.5 1 0]';
    view(phytree(b,d))
Tree = phytree(B, C) creates an ultrametric phylogenetic tree object with distances between branches and leaves defined by C. C is a numeric array of size
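The numbering rules for the branch pointer matrix B can be checked with a short base-MATLAB sketch. It assumes that each row may only point to nodes defined earlier (which is what makes the first row point to leaves only) and that every non-root node appears exactly once as a child; the variable names are hypothetical, and this is not the validation performed by phytree itself.

```matlab
% Three-leaf, two-branch example: row i defines branch node numLeaves + i.
B = [1 2; 3 4];

numBranches = size(B, 1);
numLeaves   = numBranches + 1;          % binary tree: one more leaf than branches
numNodes    = numLeaves + numBranches;

branchIdx = numLeaves + (1:numBranches)';   % node index defined by each row

% Chronological order: both children of a row must have smaller indices than the row's node.
orderOK = all(B(:, 1) < branchIdx & B(:, 2) < branchIdx);

% Assumption: every node except the root (the last one) is a child exactly once.
childOK = isequal(sort(B(:))', 1:(numNodes - 1));

disp([orderOK childOK])
```

For the example, row 1 defines node 4 from leaves 1 and 2, and row 2 defines the root, node 5, from leaf 3 and branch 4, so both checks pass.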
93. array. SeqsMultiAligned is a char array with the output alignment, following the same order as the input.
Tree: Phylogenetic tree calculated with either of the functions seqlinkage or seqneighjoin.
WeightsValue: Property to select the sequence weighting method. Enter either 'THG' (default) or 'equal'.
ScoringMatrixValue: Property to select or specify the scoring matrix. Enter an [MxM] matrix or [MxMxN] array of matrices with N user-defined scoring matrices. ScoringMatrixValue may also be a cell array of strings with matrix names. The default is the BLOSUM80 to BLOSUM30 series for amino acids or a fixed matrix, NUC44, for nucleotides. When passing your own series of scoring matrices, make sure all of them share the same scale.
SMInterpValue: Property to specify whether linear interpolation of the scoring matrices is on or off. When false, the scoring matrix is assigned to a fixed range depending on the distances between the two profiles (or sequences) being aligned. Default is true.
GapOpenValue: Scalar or a function specified using @. If you enter a function, multialign passes four values to the function: the average score for two matched residues (sm), the average score for two mismatched residues (sx), and the length of both profiles or sequences (len1, len2). Default is @(sm,sx,len1,len2) 5*sm.
ExtendGapValue, DelayCutoffValue, JobManagerValue, WaitInQueueValue, Verbo
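The function form of GapOpenValue is an ordinary anonymous function of four inputs, so it can be tried out in base MATLAB before being passed to multialign. In the sketch below the default form is taken from the text, while the input scores, the lengths, and the custom alternative are hypothetical values chosen for the example.

```matlab
% Default form quoted in the text: five times the average match score.
gapOpen = @(sm, sx, len1, len2) 5 * sm;

% Hypothetical inputs: average match score, average mismatch score, two lengths.
sm = 4;  sx = -2;  len1 = 120;  len2 = 95;
penalty = gapOpen(sm, sx, len1, len2);

% A made-up alternative that scales with the match/mismatch spread instead.
customGapOpen = @(sm, sx, len1, len2) sm - sx;
customPenalty = customGapOpen(sm, sx, len1, len2);

disp([penalty customPenalty])
```

Any function with this four-argument signature can be supplied, even if, like the default, it ignores some of its inputs.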
94. as follows:
NormDataY = mainvarsetnorm(..., 'Thresholds', ThresholdsValue, ...) sets the thresholds for the lowest average rank and the highest average rank, which are used to determine the invariant set. The rank invariant set is a set of data points whose proportional rank difference is smaller than a given threshold. The threshold for each data point is determined by interpolating between the threshold for the lowest average rank and the threshold for the highest average rank. Select these two thresholds empirically to limit the spread of the invariant set, but allow enough data points to determine the normalization relationship. ThresholdsValue is a 1-by-2 vector [LT, HT], where LT is the threshold for the lowest average rank and HT is the threshold for the highest average rank. Values must be between 0 and 1. Default is [0.03, 0.07].
NormDataY = mainvarsetnorm(..., 'Exclude', ExcludeValue, ...) filters the invariant set of data points by excluding the data points whose average rank (between DataX and DataY) is in the highest N ranked averages or lowest N ranked averages.
NormDataY = mainvarsetnorm(..., 'Prctile', PrctileValue, ...) stops the iteration process when the number of data points in the invariant set reaches N percent of the total number of input data points. Default is 1.
Note: If you do not use this property, the iteration process continues until no more data points are eliminated.
Nor
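One pass of the rank-invariant selection described above can be sketched in base MATLAB. Several details are assumptions of this sketch rather than statements about mainvarsetnorm: the proportional rank difference is taken as |rank(x) - rank(y)| / n, the per-point threshold is a linear interpolation over the average rank, and the data vectors are made up.

```matlab
% Made-up intensities for five genes on two channels.
x = [10 200 35 120 80]';
y = [12 180 95 100 50]';
LT = 0.03;  HT = 0.07;        % thresholds for lowest and highest average rank (documented defaults)

n = numel(x);
[~, ix] = sort(x);  rx = zeros(n, 1);  rx(ix) = (1:n)';   % ranks of x
[~, iy] = sort(y);  ry = zeros(n, 1);  ry(iy) = (1:n)';   % ranks of y

avgRank  = (rx + ry) / 2;
rankDiff = abs(rx - ry) / n;  % assumed definition of "proportional rank difference"

% Assumed linear interpolation of the threshold between LT and HT over the average rank.
thr = interp1([min(avgRank) max(avgRank)], [LT HT], avgRank);

invariant = rankDiff < thr;   % points kept in the invariant set for this pass
disp(invariant')
```

In this example genes 3 and 5 swap ranks between the two channels, so they fall outside the invariant set, while the three genes whose ranks agree are kept.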
95. as variable names.
GPRData = gprread(File) reads GenePix results data from File and creates a MATLAB structure (GPRData) with the following fields:
Field: Header, Data, Blocks, Columns, Rows, Names, IDs, ColumnNames, Indices, Shape
gprread(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs.
gprread(..., 'CleanColNames', CleanColNamesValue) A GPR file may contain column names with spaces and some characters that MATLAB cannot use in MATLAB variable names. If CleanColNamesValue is true, gprread returns names in the field ColumnNames that are valid MATLAB variable names and names that you can use in functions. By default, CleanColNamesValue is false, and the field ColumnNames may contain characters that are invalid for MATLAB variable names.
The field Indices of the structure contains MATLAB indices that can be used for plotting heat maps of the data.
For more details on the GPR format, see
    http://www.moleculardevices.com/pages/software/gn_genepix_file_formats.html#gpr
    http://www.moleculardevices.com/pages/software/gn_gpr_format_history.html
For a list of supported file format versions, see
    http://www.moleculardevices.com/pages/software/gn_genepix_file_formats.html
GenePix is a registered trademark of Molecular Devices Corporation.
Examples
Read in a sample GPR file and plot the median foreground intensity for the 635 nm channel.
    gprStruct = gprr
96. bar: File, Tools, Window, and Help
• Toolbar: Zoom XY, Zoom X, Zoom Y, Reset view, Zoom out, and Help
• Main window: display the spectra
• Overview window: display the overview of a full spectrum (the average of all spectra in display)
• Marker control panel: a list of markers, Add marker, Delete marker, up and down buttons
Examples
1. Load and plot sample data.
    load sample_lo_res
    msviewer(MZ_lo_res, Y_lo_res)
2. Add a marker by pointing to a mass peak, right-clicking, and then clicking Add Marker.
3. From the File menu, select:
• Import Markers from Workspace: Opens the Import Markers From MATLAB Workspace dialog. The dialog should display a list of double Mx1 or 1xM variables. If the selected variable is out of range, the viewer displays an error message.
• Export Markers to Workspace: Opens the Export Markers to MATLAB Workspace dialog. You can enter a variable name for the markers. All markers are saved. If there is no marker available, this menu item should be disabled.
• Print to Figure: Prints the spectra plot in the main display to a MATLAB figure window.
4. From the Tools menu, click:
• Add Marker: Opens the Add Marker dialog. Enter an m/z marker.
• Delete Marker: Removes the currently selected m/z marker from the Markers (m/z) list.
• Next Marker or Previous Marker: Moves the selection up and down the Markers (m/z) list.
• Zoom XY, Zoom X, Zoom Y, or Zoom Out: Changes the cursor from an
97. be enclosed in single quotes and is case insensitive. These property name/property value pairs are as follows:
[S, C] = graphconncomp(G, ...'Directed', DirectedValue, ...) indicates whether the graph is directed or undirected. Set DirectedValue to false for an undirected graph. This results in the upper triangle of the sparse matrix being ignored. Default is true. A DFS-based algorithm computes the connected components. Time complexity is O(N+E), where N and E are the number of nodes and edges respectively.
[S, C] = graphconncomp(G, ...'Weak', WeakValue, ...) indicates whether to find weakly connected components or strongly connected components. A weakly connected component is a maximal group of nodes that are mutually reachable by violating the edge directions. Set WeakValue to true to find weakly connected components. Default is false, which finds strongly connected components. The state of this parameter has no effect on undirected graphs because weakly and strongly connected components are the same in undirected graphs. Time complexity is O(N+E), where N and E are the number of nodes and edges respectively.
Note: By definition, a single node can be a strongly connected component.
Note: A directed acyclic graph (DAG) cannot have any strongly connected components larger than one.
Examples
1. Create and view a directed graph with 10 nodes and 17 edges.
    DG sparse 1 1122334567789 9 99 268
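To illustrate what finding weakly connected components by depth-first search involves, here is a minimal base-MATLAB sketch. It is not the graphconncomp implementation: it ignores edge directions by symmetrizing the matrix, uses an explicit stack, and runs on a small made-up graph; the outputs are named S and C only to mirror the syntax above.

```matlab
% Made-up directed graph on 5 nodes: 1->2, 2->3, 4->5.
G = sparse([1 2 4], [2 3 5], true, 5, 5);

A = G | G';            % ignore edge directions (weak connectivity)
n = size(A, 1);
C = zeros(1, n);       % component label of each node, 0 = not visited yet
S = 0;                 % number of components found so far

for start = 1:n
    if C(start) == 0
        S = S + 1;
        C(start) = S;
        stack = start;                     % explicit DFS stack
        while ~isempty(stack)
            v = stack(end);
            stack(end) = [];
            nbrs = find(A(v, :));          % neighbors of v
            nbrs = nbrs(C(nbrs) == 0);     % keep only unvisited ones
            C(nbrs) = S;                   % label them before pushing
            stack = [stack nbrs];
        end
    end
end
disp(S); disp(C)
```

Each node is labeled once and each edge is examined a bounded number of times, which is where the O(N+E) behaviour of a DFS-based approach comes from.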
98. computations and their amino acid translations.
    [dn, ds] = dndsml(hk01_aligned, vt04_aligned, 'verbose', true)
References
[1] Tamura, K., and Nei, M. (1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution 10, 512-526.
[2] Yang, Z., and Nielsen, R. (2000). Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Molecular Biology and Evolution 17, 32-43.
See Also
Bioinformatics Toolbox functions: dnds, featuresparse, geneticcode, nt2aa, nwalign, seqinsertgaps, seqpdist

emblread
Purpose: Read data from EMBL file
Syntax
    EMBLData = emblread(File)
    EMBLSeq = emblread(File, 'SequenceOnly', SequenceOnlyValue)
Arguments
File: EMBL-formatted file (ASCII text file). Enter a file name, a path and file name, or a URL pointing to a file. File can also be a MATLAB character array that contains the text for a file name.
SequenceOnlyValue: Property to control reading EMBL file information. If SequenceOnlyValue is true, emblread returns only the sequence (EMBLSeq).
EMBLData: MATLAB structure with fields corresponding to EMBL data.
EMBLSeq: MATLAB character string without metadata for the sequence.
Description
EMBLData = emblread(File) reads data from an EMBL-formatted file (File) and creates a MATLAB structure (EMBLData) with fields corresponding to the EMBL two-character line ty
99. created either by the NodeIDs input argument or internally by the biograph constructor function. Each node object's ID is unique and used internally to identify the node.
Property      Description
Label         String for labeling a node when you display a biograph object using the view method. Default is the ID property of the node object.
Description   String that describes the node. Default is ''. This information is for bookkeeping purposes only.
Position      Two-element numeric vector of x- and y-coordinates, for example, [150, 150]. If you do not specify this property, default is initially [], then when the layout algorithms are executed, it becomes a two-element numeric vector of x- and y-coordinates computed by the layout engine.
Shape         String that specifies the shape of the nodes. Choices are:
              • 'box' (default)
              • 'ellipse'
              • 'circle'
              • 'rectangle'
              • 'diamond'
              • 'trapezium'
              • 'invtrapezium'
              • 'house'
              • 'inverse'
              • 'parallelogram'
Size          Two-element numeric vector calculated before calling the layout engine using the actual font size and shape of the node. Default is [10, 10].
Color         Three-element numeric vector of RGB values that specifies the fill color of the node. Default is [1, 1, 0.7], which defines yellow.
LineWidth     Positive number. Default is 1.
LineColor     Three-element numeric vector of RGB values that specifies the outl
100. curve • Plot of q values versus p values Choices are true or false default FDR Column vector of positive FDR pFDR values Q Column vector of q values Pi0 Estimated true null hypothesis π0 R2 Square of the correlation coefficient FDR mafdr PValues computes a positive FDR pFDR value for each value in PValues a column vector of p values for each gene in two microarray data sets using a procedure introduced by Storey 2002 FDR is a column vector of positive FDR pFDR values FDR Q mafdr PValues also returns a q value for each p value in PValues Q is a column vector FDR Q Pi0 mafdr PValues also returns Pi0 the estimated true null hypothesis π0 if using the procedure introduced by Storey 2002 FDR Q Pi0 R2 mafdr PValues also returns R2 the square of the correlation coefficient if using the procedure introduced by Storey 2002 and the polynomial method to calculate the true null hypothesis π0 from the tuning parameter lambda λ mafdr PValues PropertyName PropertyValue calls mafdr with optional properties that use property name property value pairs You can specify one or more properties in any order Each PropertyName must be enclosed in single quotation marks and is case insensitive These property name property value pairs are as follows mafdr PValues BHFDR BHFDRValue controls the use of the linear step up LSU
101. database for the HMM profile record represented by PFAMAccessNumber a protein family accession number retrieves the multiple sequence alignment associated with the HMM profile and returns AlignStruct a MATLAB structure AlignStruct gethmmalignment PropertyName PropertyValue calls gethmmalignment with optional properties that use property name property value pairs You can specify one or more properties in any order Each PropertyName must be enclosed in single quotation marks and is case insensitive These property name property value pairs are as follows AlignStruct gethmmalignment ToFile ToFileValue saves the data returned from the PFAM database to a file specified by ToFileValue Note You can read a FASTA formatted file containing PFAM data back into MATLAB using the fastaread function AlignStruct gethmmalignment Type TypeValue specifies the set of alignments returned Choices are • full Default Returns all sequences that fit the HMM profile • seed Returns only the sequences used to generate the HMM profile AlignStruct gethmmalignment Mirror MirrorValue specifies a Web database Choices are • Sanger default • Janelia You can reach other mirror sites by passing the complete URL to the fastaread function Note These mirror sites are maintained separately and may have slight variations For more inf
102. distance value Nodes with distances below this value are selected Property to remove exclude branch or leaf nodes from the output Enter none branches or leaves The default value is none Property to select propagating nodes toward the leaves or the root S select Tree N returns a logical vector S of size NumNodes x 1 indicating the N closest nodes to the root node of a phytree object Tree where NumNodes = NumLeaves + NumBranches The first criterion select uses is branch levels then patristic distance also known as tree distance By default select uses inf as the value of N and select Tree returns a vector with values of true S Selleaves Selbranches select returns two additional logical vectors one for the selected leaves and one for the selected branches select PropertyName PropertyValue defines optional properties using property name value pairs select Reference ReferenceValue changes the reference point s to measure the closeness Reference can be the root default or leaves When using leaves a node can have multiple distances to its descendant leaves nonultrametric tree If this is the case select considers the minimum distance to any descendant leaf select Criteria CriteriaValue changes the criteria select uses to measure closeness If C levels default the first criterion is branch levels and th
103. each base in the sequence being a C prob_G Column vector containing the probability of each base in the sequence being a G prob_T Column vector containing the probability of each base in the sequence being a T base Column vector containing the called bases for the sequence Sample Probability Comments scfread File also returns the comment information from the SCF file in a character array Comments A C T G scfread File returns the sample data for the four bases in separate variables A C T G ProbA ProbC ProbG ProbT scfread File also returns the probabilities data for the four bases in separate variables A C T G ProbA ProbC ProbG ProbT Comments PkIndex Base scfread File also returns the peak indices and called bases in separate variables SCF files store data from DNA sequencing instruments Each file includes sample data sequence information and the relative probabilities of each of the four bases For more information on SCF files see http://www.mrc-lmb.cam.ac.uk/pubseq/manual/formats_unix_2.html Examples sampleStruct probStruct Comments scfread sample.scf sampleStruct A 10827x1 double C 10827x1 double G 10827x1 double T 10827x1 double probStruct peak_index 742x1 double prob_A 742x1 double prob_C 742x1 double prob_G 742x1 double prob_T 742x1 double base 742x1 char Comments SIGN A 121 C 103
104. experimental conditions Cell array of labels typically gene names or probe set IDs for the data After creating the plot you can click a data point to display the label associated with it If you do not provide a LabelsValue data points are labeled with row numbers from DataX and DataY Property to control the conversion of data in DataX and DataY from natural scale to log2 scale Enter true to convert data to log2 scale or false Default is false which assumes data is already log2 scale PCutoffValue Lets you specify a cutoff p value to define data points that are statistically significant This value is displayed graphically as a horizontal line on the plot Default is 0.05 which is equivalent to 1.3010 on the -log10 p value scale Note You can also change the p value cutoff interactively after creating the plot FoldchangeValue Lets you specify a ratio fold change to define data points that are differentially expressed Default is 2 which corresponds to a ratio of 1 and -1 on a log2 ratio scale Note You can also change the fold change interactively after creating the plot mavolcanoplot DataX DataY PValues creates a scatter plot of gene expression data plotting significance versus fold change of gene expression ratios It uses the average gene expression values from two data sets DataX and DataY for each gene in the data sets It plots significance as
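The default cutoffs quoted in the mavolcanoplot entry above can be checked with plain MATLAB arithmetic. The sketch below assumes that significance is drawn as -log10 of the p value and fold change as log2 of the ratio, which is what the quoted figures 1.3010 and 1 / -1 imply. The variable names are illustrative only and are not mavolcanoplot arguments.

```matlab
% Illustrative names; none of these are mavolcanoplot arguments.
pCutoff    = 0.05;   % default p-value cutoff quoted in the entry
foldChange = 2;      % default fold-change ratio quoted in the entry

% Assumed scale: significance = -log10(p value).
pLine = -log10(pCutoff);                           % about 1.3010

% Assumed scale: fold change = log2(ratio); the two lines are symmetric.
fcLines = [-log2(foldChange), log2(foldChange)];   % [-1 1]

fprintf('p-value line at %.4f, fold-change lines at %g and %g\n', ...
        pLine, fcLines(1), fcLines(2));
```

Changing pCutoff or foldChange in this sketch shows where the corresponding lines would move on the assumed scales.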
105. extract this column vector from the MMIntensities matrix returned by the celintensityread function ProbeIndicesValue Column vector containing probe indexing information Probes within a probe set are numbered 0 through N - 1 where N is the number of probes in the probe set Tip You can use the affyprobeseqread function to generate this column vector ShowplotValue Controls the display of a plot showing the affinity values of each of the four bases A C G and T for each of the 25 sequence positions for all probes on the Affymetrix GeneChip array Choices are true or false default Return Values AffinPM Column vector of PM probe affinities computed from their probe sequences and MM probe intensities AffinMM Column vector of MM probe affinities computed from their probe sequences and MM probe intensities Description AffinPM AffinMM affyprobeaffinities SequenceMatrix MMIntensity returns a column vector of PM probe affinities and a column vector of MM probe affinities computed from their probe sequences and MM probe intensities Each row in AffinPM and AffinMM corresponds to a probe NaN is returned for probes with no sequence information Each probe affinity is the sum of position dependent base affinities For a given base type the positional effect is modeled as a polynomial of degree 3 AffinPM AffinMM BaseProf affyprobeaffinities SequenceMatri
106. fields A C G T • For sequences with the character U the number of U characters is added to the number of T characters • If a sequence contains ambiguous nucleotide characters R Y K M S W B D H V N or gaps indicated with a hyphen this function creates a field Others and displays a warning message Warning Ambiguous symbols symbol list appear in the sequence These will be in Others • If a sequence contains undefined nucleotide characters E F H I J L O P Q X Z the characters are counted in the field Others and a warning message is displayed Warning Unknown symbols symbol list appear in the sequence These will be ignored • If the property Others is full ambiguous characters are listed separately and hyphens are counted in a new field Gaps basecount PropertyName PropertyValue defines optional properties using property name value pairs basecount Chart ChartValue creates a chart showing the relative proportions of the nucleotides basecount Others OthersValue when OthersValue is full counts all the ambiguous nucleotide symbols individually instead of bundling them together into the Others field of the output structure basecount Structure StructureValue when StructureValue is full blocks the unknown characters warning and ignores counting unknown characters • basecount SeqNT Display four nucleotides and onl
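The counting rules in the basecount entry above (U counted as T, ambiguous symbols and gaps bundled into Others) can be illustrated in base MATLAB. This is a minimal sketch of those rules only, not the toolbox implementation. The input sequence and the variable names are hypothetical, and it issues no warnings.

```matlab
% Hypothetical input; not taken from the toolbox examples.
seq = upper('ACGUTTRN-A');
seq(seq == 'U') = 'T';              % U is counted together with T

counts.A = sum(seq == 'A');
counts.C = sum(seq == 'C');
counts.G = sum(seq == 'G');
counts.T = sum(seq == 'T');

% Ambiguous symbols and gaps go into a single Others field,
% mirroring the default behavior described in the entry.
counts.Others = sum(ismember(seq, 'RYKMSWBDHVN-'));

disp(counts)
```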
107. geneont GeneontObj geneont File FileValue GeneontObj geneont Live LiveValue GeneontObj geneont Live LiveValue ToFile ToFileValue FileValue file name of an OBO formatted file that is on the MATLAB search path LiveValue Property to create the most up to date geneont object Enter true to create a geneont object GeneontObj from the most recent version of the Gene Ontology database Default is false ToFileValue file name to which to save the geneont object from the Gene Ontology database GeneontObj geneont searches for the file gene_ontology.obo in the MATLAB Current Directory and creates a geneont object GeneontObj geneont File FileValue creates a geneont object GeneontObj from an OBO formatted file that is on the MATLAB search path GeneontObj geneont Live LiveValue when LiveValue is true creates a geneont object GeneontObj from the most recent version of the Gene Ontology database which is the file at http://www.geneontology.org/ontology/gene_ontology.obo Note The full Gene Ontology database may take several minutes to download when you run this function using the Live property GeneontObj geneont Live LiveValue ToFile ToFileValue when LiveValue is true creates a geneont object GeneontObj from the file at http://www.geneontology.org/ontology/gene_ontology.obo and saves the file to a local file ToFileValue Examples 1 Download
108. gethmmprof Mirror MirrorValue specifies a Web database Choices are • Sanger default • Janelia You can reach other mirror sites by passing the complete URL to the pfamhmmread function Note These mirror sites are maintained separately and may have slight variations For more information about the PFAM database see http://www.sanger.ac.uk/Software/Pfam/ http://pfam.janelia.org/ Examples To retrieve a hidden Markov model HMM profile for the global alignment of the 7 transmembrane receptor protein in the secretin family enter either of the following hmm gethmmprof 2 hmm gethmmprof 7tm_2 hmm Name 7tm_2 PfamAccessionNumber PF00002.14 ModelDescription 1x42 char ModelLength 296 Alphabet AA MatchEmission 296x20 double InsertEmission 296x20 double NullEmission 1x20 double BeginX 297x1 double MatchX 295x4 double InsertX 295x2 double DeleteX 295x2 double FlankingInsertX 2x2 double LoopX 2x2 double NullX 2x1 double See Also Bioinformatics Toolbox functions gethmmalignment hmmprofalign hmmprofstruct pfamhmmread showhmmprof gethmmtree Purpose Syntax Arguments Description Examples Phylogenetic tree data from PFAM database Tree gethmmtree AccessionNumber gethmmtree PropertyName PropertyValue gethmmtree ToFile ToFileValue gethmm
109. getrelatives Depth DepthValue includes terms that are related down through a specified number of levels DepthValue in the Gene Ontology database DepthValue is a positive integer Default is 1 1 Download the Gene Ontology database from the Web into MATLAB GO geneont LIVE true MATLAB creates a geneont object and displays the number of terms in the database Gene Ontology object with 20005 Terms 2 Get the relatives for a Gene Ontology term subontology getrelatives GO 46680 See Also Bioinformatics Toolbox • functions geneont object constructor goannotread num2goid • geneont object methods getancestors getdescendants getmatrix isdag biograph Purpose Syntax Arguments Description References See Also Test for cycles in biograph object isdag BGObj BGObj biograph object created by biograph object constructor Tip For introductory information on graph theory functions see Graph Theory Functions in the Bioinformatics Toolbox documentation isdag BGObj returns logical 1 true if an N by N adjacency matrix extracted from a biograph object BGObj is a directed acyclic graph DAG and logical 0 false otherwise In the N by N sparse matrix all nonzero entries indicate the presence of an edge 1 Siek J G Lee L Q and Lumsdaine A 2002 The Boost Graph Library User Guide and Reference Manual Uppe
110. given node BiographNode up to a specified number of generations NumGenerations 1 Create a biograph object cm = [0 1 1 0 0; 1 0 0 1 1; 1 0 0 0 0; 0 0 0 0 1; 1 0 1 0 0] bg = biograph(cm) 2 Find all nodes interacting with node 1 intNodes = getrelatives(bg.nodes(1)) set(intNodes, 'Color', [0.7 0.7 1]) bg.view Bioinformatics Toolbox function biograph object constructor Bioinformatics Toolbox object biograph object Bioinformatics Toolbox methods of a biograph object dolayout getancestors getdescendants getedgesbynodeid getnodesbyid getrelatives view MATLAB functions get set getrelatives geneont Purpose Syntax Arguments Description Examples Numeric IDs for relatives of Gene Ontology term RelativeIDs getrelatives GeneontObj ID getrelatives PropertyName PropertyValue getrelatives Height HeightValue getrelatives Depth DepthValue GeneontObj ID RelativeIDs getrelatives GeneontObj ID returns the numeric IDs RelativeIDs for the relatives of a term ID including the ID for the term ID is a nonnegative integer or a numeric vector with a set of IDs getrelatives PropertyName PropertyValue defines optional properties using property name value pairs getrelatives Height HeightValue includes terms that are related up through a specified number of levels HeightValue in the Gene Ontology database HeightValue is a positive integer Default is 1
111. hmmprofalign multialign nwalign seqprofile seqconsensus proteinplot Purpose Syntax Arguments Description Characteristics for amino acid sequences proteinplot SeqAA SeqAA Amino acid sequence or a structure with a field Sequence containing an amino acid sequence proteinplot SeqAA loads an amino acid sequence into the protein plot GUI proteinplot is a tool for analyzing a single amino acid sequence You can use the results from proteinplot to compare the properties of several amino acid sequences It displays smoothed line plots of various properties such as the hydrophobicity of the amino acids in the sequence Importing Sequences into proteinplot 1 In the MATLAB Command Window type proteinplot Seq_AA The proteinplot interface opens and the sequence Seq_AA is shown in the Sequence text box 2 Alternatively type or paste an amino acid sequence into the Sequence text box You can import a sequence with the Import dialog box 1 Click the Import Sequence button The Import dialog box opens 2 From the Import From list select a variable in the MATLAB workspace ASCII text file FASTA formatted file GenPept formatted file or accession number in the GenPept database Information About the Properties You can also access information about the properties from the Help menu 1 From the Help menu click References The Help Browser opens with a list of properties and
112. in Bioinformatics Toolbox getgenpept and genpeptread are unchanged representing the still used GenPept report format For more details about the GenBank database see http://www.ncbi.nlm.nih.gov/Genbank/ Data getgenpept AccessionNumber searches for the accession number in the GenPept database and returns a MATLAB structure containing information for the sequence If an error occurs while retrieving the GenBank formatted information then an attempt is made to retrieve the FASTA formatted data getgenpept displays the information to the screen without returning data to a variable The displayed information includes hyperlinks to the URLs used to search for and retrieve the data getgenpept PropertyName PropertyValue defines optional properties using property name value pairs getgenpept ToFile ToFileValue saves the information in a file If you do not give a location or path to the file the file is stored in the MATLAB current directory Read a GenPept formatted file back into MATLAB using the function genpeptread getgenpept FileFormat FileFormatValue returns the sequence in the specified format FileFormatValue getgenpept SequenceOnly SequenceOnlyValue returns only the sequence information without the metadata if SequenceOnlyValue is true When the properties SequenceOnly and ToFile are used together the output file is in the FASTA format Examples To retr
113. in MMMatrix which specifies a chip This chip intensity data is used to compute probe affinities assuming no affinity data is provided Default is 1 Controls the use of optical background correction on the PM and MM intensity values in PMMatrix and MMMatrix Choices are true default or false CorrConstValue Value that specifies the correlation constant rho for background intensity for each PM MM probe pair Choices are any value > 0 and < 1 Default is 0.7 MethodValue String that specifies the method to estimate the signal Choices are MLE a faster ad hoc Maximum Likelihood Estimate method or EB a slower more formal empirical Bayes method Default is MLE TuningParamValue Value that specifies the tuning parameter used by the estimate method This tuning parameter sets the lower bound of signal values with positive probability Choices are a positive value Default is 5 (MLE) or 0.5 (EB) Tip For information on determining a setting for this parameter see Wu et al 2004 GSBCorrValue Controls whether gene specific binding GSB correction is performed on the non specific binding NSB data Choices are true default or false NormalizeValue Controls whether quantile normalization is performed on background adjusted data Choices are true default or false VerboseValue Controls the display of a progress report showing the number of each chip as it is completed Choices
114. in a structure with the fields AAA AAC AAG ... TTG TTT • For sequences that have codons with the character U the U characters are added to codons with T characters • If the sequence contains ambiguous nucleotide characters R Y K M S W B D H V N or gaps indicated with a hyphen this function creates a field Others and displays a warning message Warning Ambiguous symbols symbol appear in the sequence These will be in Others • If the sequence contains undefined nucleotide characters E F H I J L O P Q X Z codoncount ignores the characters and displays a warning message Warning Unknown symbols symbol appear in the sequence These will be ignored Codons CodonArray codoncount SeqNT returns a 4x4x4 array CodonArray with the raw count data for each codon The three dimensions correspond to the three positions in the codon For example the element 2 3 4 of the array gives the number of CGT codons where A <=> 1 C <=> 2 G <=> 3 and T <=> 4 codoncount PropertyName PropertyValue defines optional properties using property name value pairs codoncount Frame FrameValue counts the codons in a specific reading frame codoncount Reverse ReverseValue when ReverseValue is true counts the codons for the reverse complement of the sequence codoncount Figure FigureValue when FigureValue is t
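The 4x4x4 indexing described in the codoncount entry above can be reproduced in base MATLAB. This is a sketch of the indexing convention only (A = 1, C = 2, G = 3, T = 4, one dimension per codon position), not the toolbox function. The sequence and variable names are hypothetical, and codons containing any other symbol are simply skipped here.

```matlab
% Index order taken from the entry: A = 1, C = 2, G = 3, T = 4.
bases = 'ACGT';
seq   = 'CGTCGTAAA';                 % hypothetical sequence, read in frame 1

codonArray = zeros(4, 4, 4);
for k = 1:3:numel(seq) - 2
    [~, idx] = ismember(seq(k:k+2), bases);    % for example 'CGT' -> [2 3 4]
    if all(idx > 0)                             % skip codons with other symbols
        codonArray(idx(1), idx(2), idx(3)) = ...
            codonArray(idx(1), idx(2), idx(3)) + 1;
    end
end

nCGT = codonArray(2, 3, 4);   % number of CGT codons
nAAA = codonArray(1, 1, 1);   % number of AAA codons
```

Element (2,3,4) holds the CGT count because C, G, and T map to 2, 3, and 4 in the first, second, and third codon positions.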
115. in arbitrary units other than bits For example if you enter log(2) for ScaleValue then nwalign returns Score in nats Positive integer specifying the penalty for opening a gap in the alignment Default is 8 ExtendGapValue Positive integer specifying the penalty for extending a gap Default is equal to GapOpenValue ShowscoreValue Controls the display of the scoring space and the winning path of the alignment Choices are true or false default Return Values Score Optimal global alignment score in bits Alignment 3 by N character array showing the two sequences Seq1 and Seq2 in the first and third rows and symbols representing the optimal global alignment for them in the second row Start 2 by 1 vector of indices indicating the starting point in each sequence for the alignment Because this is a global alignment Start is always [1;1] Description Score nwalign Seq1 Seq2 returns the optimal global alignment score in bits The scale factor used to calculate the score is provided by the scoring matrix Score Alignment nwalign Seq1 Seq2 returns a 3 by N character array showing the two sequences Seq1 and Seq2 in the first and third rows and symbols representing the optimal global alignment for them in the second row The symbol | indicates amino acids or nucleotides that match exactly The symbol : indicates amino acids or nucleotides that are related as defined by the scori
116. in the current directory If you set CELFiles to then it opens the Select CEL Files dialog box from which you select the CEL files From this dialog box you can press and hold Ctrl or Shift while clicking to select multiple CEL files If you set CDFFile to then it opens the Select CDF File dialog box from which you select the CDF file ProbeStructure celintensityread PropertyName PropertyValue calls celintensityread with optional celintensityread properties that use property name property value pairs You can specify one or more properties in any order Each PropertyName must be enclosed in single quotation marks and is case insensitive These property name property value pairs are as follows ProbeStructure celintensityread CELPath CELPathValue specifies a path and directory where the files specified in CELFiles are stored ProbeStructure celintensityread CDFPath CDFPathValue specifies a path and directory where the file specified in CDFFile is stored ProbeStructure celintensityread PMOnly PMOnlyValue includes or excludes the mismatch MM probe intensity values When PMOnlyValue is true celintensityread returns only perfect match PM probe intensities When PMOnlyValue is false celintensityread returns both PM and MM probe intensities Default is true ProbeStructure contains the following fields Field Description CDFName File name of t
117. in the vector Nodes Nodes that do not have a predecessor become leaves in the list Nodes In this case pruning is the process of reducing a tree by turning some branch nodes into leaf nodes and removing the leaf nodes under the original branch Load a phylogenetic tree created from a protein family tr phytreeread pf00002.tree view tr Remove all the mouse proteins ind getbyname tr mouse tr prune tr ind view tr Remove potential outliers in the tree sel sel_leaves select tr criteria distance threshold 3 reference leaves exclude leaves propagate toleaves tr prune tr sel_leaves view tr See Also Bioinformatics Toolbox • functions phytree object constructor phytreetool • phytree object methods select get reorder phytree Purpose Syntax Arguments Return Values Description Reorder leaves of phylogenetic tree Tree1Reordered reorder Tree1 Order Tree1Reordered OptimalOrder reorder Tree1 Order Approximate ApproximateValue Tree1Reordered OptimalOrder reorder Tree1 Tree2 Tree1 Tree2 Phytree objects Order Vector with position indices for each leaf ApproximateValue Controls the use of the optimal leaf ordering calculation to find the closest order possible to the suggested one without dividing the clades or producing crossing branches En
118. information about the PDB format see http://www.rcsb.org/pdb/file_formats/pdb/pdbguide2.2/guide2.2_frame.html getpdb retrieves protein structure data from the Protein Data Bank PDB database which contains 3 D biological macromolecular structure data PDBStruct getpdb PDBid searches the PDB database for the protein structure record specified by the identifier PDBid and returns the MATLAB structure PDBStruct which contains a field for each PDB record The following table summarizes the possible PDB records and the corresponding fields in the MATLAB structure PDBStruct PDB Database Record Field in the MATLAB Structure HEADER Header OBSLTE Obsolete TITLE Title CAVEAT Caveat COMPND Compound SOURCE Source KEYWDS Keywords EXPDTA ExperimentData AUTHOR Authors REVDAT RevisionDate SPRSDE Superseded JRNL Journal REMARK 1 Remark1 REMARK N (N equals 2 through 999) Remarkn (n equals 2 through 999) DBREF DBReferences SEQADV SequenceConflicts SEQRES Sequence FTNOTE Footnote MODRES ModifiedResidues HET Heterogen HETNAM HeterogenName HETSYN HeterogenSynonym FORMUL Formula HELIX Helix SHEET Sheet TURN Turn SSBOND SSBond LINK Link HYDBND HydrogenBond SLTBRG SaltBridge CISPEP CISPeptides SITE Site getpdb
119. intensity for the 635 nm channel Note that the example file spotdata txt is not provided with Bioinformatics Toolbox spotStruct sptread spotdata txt maimage spotStruct Rmedian 2 Alternatively create a similar plot using more basic graphics commands Rmedian magetfield spotStruct Rmedian imagesc Rmedian spotStruct Indices colormap bone colorbar Bioinformatics Toolbox functions affyread agferead celintensityread geosoftread gprread imageneread maboxplot magetfield svmclassify Purpose Syntax Description Examples Classify data using support vector machine Group svmclassify SVMStruct Sample Group svmclassify SVMStruct Sample Showplot ShowplotValue Group svmclassify SVMStruct Sample classifies each row of the data in Sample using the information in a support vector machine classifier structure SVMStruct created using the svmtrain function Sample must have the same number of columns as the data used to train the classifier in svmtrain Group indicates the group to which each row of Sample has been assigned Group svmclassify SVMStruct Sample Showplot ShowplotValue controls the plotting of the sample data in the figure created using the Showplot property with the svmtrain function 1 Load the sample data which includes Fisher s iris data of 5 measurements on a sample of 150 irises load fisheriris 2 Create data a two column matrix containing
120. is ones(size(RefMZ)) Two element vector in which the first element is negative and the second element is positive that specifies the lower and upper limits of a range in m/z units relative to each peak No peak will shift beyond these limits Default is [-100 100] Positive value that specifies the width in m/z units for all the Gaussian pulses used to build the correlating synthetic spectrum The point of the peak where the Gaussian pulse reaches 60.65% of its maximum is set to the width specified by WidthOfPulsesValue Default is 10 WindowSizeRatioValue Positive value that specifies a scaling factor that determines the size of the window around every alignment peak The synthetic spectrum is compared to the sample spectrum only within these regions which saves computation time The size of the window is given in m/z units by WidthOfPulsesValue * WindowSizeRatioValue Default is 2.5 which means at the limits of the window the Gaussian pulses have a value of 4.39% of their maximum IterationsValue Positive integer that specifies the number of refining iterations At every iteration the search grid is scaled down to improve the estimates Default is 5 GridStepsValue Positive integer that specifies the number of steps for the search grid At every iteration the search area is divided by GridStepsValue/2 Default is 20 SearchSpaceValue String that specifies the type of search space Choic
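The two percentages in the msalign entry above (60.65% at the pulse width, 4.39% at the window limits) are consistent with a Gaussian pulse whose full width at exp(-1/2) of its maximum equals WidthOfPulsesValue. The sketch below checks this under that assumption. It is not msalign's implementation. The pulse formula and the variable names are assumptions chosen to reproduce the quoted figures.

```matlab
% Assumed pulse shape: exp(-x.^2 / (2*sigma^2)), centered on a peak at x = 0.
widthOfPulses   = 10;    % default WidthOfPulsesValue, in m/z units
windowSizeRatio = 2.5;   % default WindowSizeRatioValue

% Assumption: the width is the full width of the pulse at exp(-1/2) of its
% maximum, so that level is reached at x = +/- widthOfPulses/2 = +/- sigma.
sigma = widthOfPulses / 2;
pulse = @(x) exp(-x.^2 / (2 * sigma^2));

% The window spans widthOfPulses * windowSizeRatio, centered on the peak.
windowSize = widthOfPulses * windowSizeRatio;   % 25 m/z units
atWidth    = pulse(widthOfPulses / 2);          % exp(-0.5),   about 0.6065
atLimit    = pulse(windowSize / 2);             % exp(-3.125), about 0.0439

fprintf('%.2f%% at the pulse width, %.2f%% at the window limits\n', ...
        100 * atWidth, 100 * atLimit);
```

With the defaults, the window limits sit 2.5 sigma from the peak, which is why the pulse has decayed to roughly 4.39% there.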
121. mairplot cy3data cy5data title R vs G IR plot Bioinformatics Toolbox functions agferead gprread imageneread maboxplot mairplot maloglog malowess sptread maimage Purpose Syntax Arguments Description Spatial image for microarray data maimage X FieldName H maimage H HLines maimage maimage PropertyName PropertyValue maimage Title TitleValue maimage ColorBar ColorBarValue maimage HandleGraphicsPropertyName PropertyValue X A microarray data structure FieldName A field in the microarray data structure X TitleValue A string to use as the title for the plot The default title is FieldName ColorBarValue Property to control displaying a color bar in the figure window Enter either true or false The default value is false maimage X FieldName displays an image of field FieldName from microarray data structure X Microarray data can be GenePix Results GPR format After creating the image click a data point to display the value and ID if known H maimage returns the handle of the image H HLines maimage returns the handles of the lines used to separate the different blocks in the image maimage PropertyName PropertyValue defines optional properties using property name value pairs maimage Title TitleValue allows you to specify the title of the plot The default title is FieldName
122. mass spectrum with least squares polynomial msviewer Explore mass spectrum or set of mass spectra mzxml2peaks Convert mzXML structure to peak list mzxmlread Read mzXML file into MATLAB as structure Functions Alphabetical List aa2int Purpose Convert amino acid sequence from letter to integer representation Syntax SeqInt aa2int SeqChar Arguments SeqChar Either of the following • Character string of single letter codes specifying an amino acid sequence See the table Mapping Amino Acid Letters to Integers on page 2-2 for valid codes Unknown characters are mapped to 0 Integers are arbitrarily assigned to IUB IUPAC letters • Structure containing a Sequence field that contains an amino acid sequence such as returned by fastaread getembl getgenpept or getpdb Return Values SeqInt Row vector of integers specifying an amino acid sequence Mapping Amino Acid Letters to Integers Amino Acid Code Integer Alanine A 1 Arginine R 2 Asparagine N 3 Aspartic acid Aspartate D 4 Cysteine C 5 Glutamine Q 6 Glutamic acid Glutamate E 7 Glycine G 8 Histidine H 9 Isoleucine I 10 Leucine L 11 Lysine K 12 Methionine M 13 Phenylalanine F 14 Proline P 15 Serine S 16 Threonine T 17 Tryptophan W 18 Tyrosine Y 19 Valine V 20
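The letter-to-integer table in the aa2int entry above can be mimicked in base MATLAB with a lookup string. This is a simplified sketch, not the toolbox function: it covers only the 20 standard residues listed in the table and maps every other character to 0, whereas the entry notes that the real function also assigns integers to IUB/IUPAC letters. The input string and variable names are hypothetical.

```matlab
% Standard residues in table order, so position in the string = integer code.
letters = 'ARNDCQEGHILKMFPSTWYV';

seqChar = 'mkvw?';                                    % hypothetical input
[~, seqInt] = ismember(upper(seqChar), letters);      % 0 for anything not listed

disp(seqInt)
```

Here M, K, V, and W map to 13, 12, 20, and 18, and the unrecognized character maps to 0.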
123. [Figure residue: list of gene names from the mairplot window, including Mus musculus mRNA for Sid394 and several EST entries] Following is an M A plot of unnormalized data [Figure 1: MAIRPlot window showing the R vs G IR plot with Up Regulated and Down Regulated gene lists and the Normalize, Show smooth curve, Show factor lines, Fold change, Update, Reset, Clear, and Export controls] The intensity versus ratio scatter plot displays the following • log Intensity versus log Ratio scatter plot of genes • Two horizontal fold change lines at a fold change level of 2 which corresponds to a ratio of 1 and -1 on a log2 Ratio scale Lines will be at different fold change levels if you used the FactorLines property • Data points for genes that are considered differentially expressed outside of the fold change lines appear in orange After you display the intensity versus ratio scatter plo
124. new featuresmap structure features It then uses the fieldnames function to return all qualifiers for one of the features D_loop GenBankStructure getgenbank J01415 features featuresparse GenBankStructure features source 1x1 struct D_loop 1x2 struct rep_origin 1x3 struct repeat_unit 1x4 struct misc_signal 1x1 struct misc_RNA 1x1 struct variation 1x17 struct tRNA 1x22 struct rRNA 1x2 struct mRNA 1x10 struct CDS 1x13 struct conflict 1x1 struct fieldnames features D_loop ans Location Indices note citation See Also featuresparse genbankread getgenbank seqtool featuresparse Purpose Parse features from GenBank GenPept or EMBL data Syntax FeatStruct featuresparse Features FeatStruct featuresparse Features Feature FeatureValue FeatStruct featuresparse Features Sequence SequenceValue Arguments Features Any of the following • String containing GenBank GenPept or EMBL features • MATLAB character array including text describing GenBank GenPept or EMBL features • MATLAB structure with fields corresponding to GenBank GenPept or EMBL data such as those returned by genbankread genpeptread emblread getgenbank getgenpept or getembl FeatureValue Name of a feature contained in Features When specified featuresparse returns only the substructure that corresponds to this feature
125. of pair-wise distances, such as returned by the seqpdist function. Tree = seqlinkage(Dist, Method) creates a phylogenetic tree object using a specified patristic distance method. The available methods are: 'single' (nearest distance, single linkage method); 'complete' (furthest distance, complete linkage method); 'average' (default; Unweighted Pair Group Method Average, UPGMA, group average); 'weighted' (Weighted Pair Group Method Average, WPGMA); 'centroid' (Unweighted Pair Group Method Centroid, UPGMC); 'median' (Weighted Pair Group Method Centroid, WPGMC). Tree = seqlinkage(Dist, Method, Names) passes a list of names to label the leaf nodes (for example, species or products) in a phylogenetic tree object. Examples: Load a multiple alignment of amino acids: seqs = fastaread('pf00002.fa'); Measure the Jukes-Cantor pairwise distances: dist = seqpdist(seqs, 'method', 'jukes-cantor', 'indels', 'pair'); Build the phylogenetic tree with the single linkage method and pass the names of the sequences: tree = seqlinkage(dist, 'single', seqs); view(tree). See Also: Bioinformatics Toolbox functions: phytree (object constructor), phytreewrite, seqpdist, seqneighjoin. Bioinformatics Toolbox methods of phytree object: plot, view. seqlogo. Purpose: Display sequence logo for nucleotide or amino acid sequences. Syntax: seqlogo(Seqs), seqlogo(Profile)
126. peak list is the peak that is closest to the common peak's m/z value. 'shortest-path': For each common peak in the CMZ vector, its counterpart in each peak list is selected using the shortest path algorithm. Return Values: CMZ: Vector of common mass/charge (m/z) values estimated by the mspalign function. AlignedPeaks: Cell array of peak lists, with the same form as Peaks, but with corrected m/z values in the first column of each matrix. Description: [CMZ, AlignedPeaks] = mspalign(Peaks) aligns mass spectra from multiple peak lists (centroided data) by first estimating CMZ, a vector of common mass/charge (m/z) values estimated by considering the peaks in all spectra in Peaks, a cell array of peak lists, where each element corresponds to a spectrum or retention time. It then aligns the peaks in each spectrum to the values in CMZ, creating AlignedPeaks, a cell array of aligned peak lists. [CMZ, AlignedPeaks] = mspalign(Peaks, ..., 'PropertyName', PropertyValue, ...) calls mspalign with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows: [CMZ, AlignedPeaks] = mspalign(Peaks, ..., 'Quantile', QuantileValue, ...) determines which peaks are selected by the estimation method to create CMZ, the vector of common m/z
127. profalign(..., 'GapOpen', {G1Value, G2Value}, ...) sets the penalties for opening a gap in the first and second profiles, respectively. G1Value and G2Value can be either scalars or vectors. When using a vector, the number of elements is one more than the length of the input profile. Every element indicates the position-specific penalty for opening a gap between two consecutive symbols in the sequence. The first and the last elements are the gap penalties used at the ends of the sequence. The default gap open penalties are {10, 10}. profalign(..., 'ExtendGap', {E1Value, E2Value}, ...) sets the penalties for extending a gap in the first and second profile, respectively. E1Value and E2Value can be either scalars or vectors. When using a vector, the number of elements is one more than the length of the input profile. Every element indicates the position-specific penalty for extending a gap between two consecutive symbols in the sequence. The first and the last elements are the gap penalties used at the ends of the sequence. If ExtendGap is not specified, then extensions to gaps are scored with the same value as GapOpen. profalign(..., 'ExistingGapAdjust', ExistingGapAdjustValue, ...), if ExistingGapAdjustValue is false, turns off the automatic adjustment, based on existing gaps, of the position-specific penalties for opening a gap. When ExistingGapAdjustValue is true, for every profile position, profalign proportionally lowers the penal
128. ratios. Default is 'IR'. LogTransValue: Controls the conversion of data in X and Y from natural scale to log scale. Set LogTransValue to false when the data is already log scale. Default is true, which assumes the data is natural scale. FactorLinesValue: Adds lines to the plot showing a factor of N change. Default is 2, which corresponds to a level of 1 and -1 on a log2 scale. Tip: You can also change the factor lines interactively, after creating the plot. TitleValue: String that specifies a title for the plot. LabelsValue: Cell array of labels for the data. If labels are defined, then clicking a point on the plot shows the label corresponding to that point. NormalizeValue: Controls the display of lowess-normalized ratio values. Enter true to display lowess-normalized ratio values. Default is false. Tip: You can also normalize the data from the MAIR Plot window, after creating the plot. LowessOptionsValue: Cell array of one, two, or three property name/value pairs, in any order, that affect the lowess normalization. Choices for property name/value pairs are: 'Order', OrderValue; 'Robust', RobustValue; 'Span', SpanValue. For more information on the preceding property name/value pairs, see malowess. Return Values: Intensity: Vector containing intensity values for the microarray gene expression data, calculated as: log of the product of the
129. same mass/charge (m/z) range, where each row corresponds to an m/z value, and each column corresponds to a spectrum. The intensity values represent a shifting and scaling of the data. RefMZOut: Vector of m/z values of reference masses, calculated from RefMZ and the sample data from multiple spectra in Intensities, when GroupValue is set to true. IntensitiesOut = msalign(MZ, Intensities, RefMZ) aligns the peaks in a raw mass spectrum or spectra, represented by Intensities and MZ, to reference peaks, provided by RefMZ. First, it creates a synthetic spectrum from the reference peaks using Gaussian pulses centered at the m/z values specified by RefMZ. Then, it shifts and scales the m/z scale to find the maximum alignment between the input spectrum or spectra and the synthetic spectrum. (It uses an iterative multiresolution grid search until it finds the best scale and shift factors for each spectrum.) Once the new m/z scale is determined, the corrected spectrum or spectra are created by resampling their intensities at the original m/z values, creating IntensitiesOut, a vector or matrix of corrected intensity values. The resampling method preserves the shape of the peaks. Note: The msalign function works best with three to five reference peaks (marker masses) that you know will appear in the spectrum. If you use a single reference peak (internal standard), there is a possibility of aligning sample peaks to the i
130. selects a method (Method) to compute the distances of the new nodes to all other nodes at every iteration. The general expression to calculate the distances between the new node n, after joining i and j, and all other nodes k is given by D(n,k) = a*D(i,k) + (1-a)*D(j,k) - a*D(n,i) - (1-a)*D(n,j). This expression is guaranteed to find the correct tree with additive data (minimum variance reduction). The following table describes the values for Method: 'equivar' (default): Assumes equal variance and independence of evolutionary distance estimates (a = 1/2), such as in Studier and Keppler, JMBE (1988). 'firstorder': Assumes a first-order model of the variances and covariances of evolutionary distance estimates; a is adjusted at every iteration to a value between 0 and 1, such as in Gascuel, JMBE (1997). 'average': New distances are the weighted average of previous distances while the branch distances are ignored: D(n,k) = [D(i,k) + D(j,k)]/2, as in the original neighbor-joining algorithm by Saitou and Nei, JMBE (1987). Tree = seqneighjoin(Dist, Method, Names) passes a list of names (Names) to label the leaf nodes (e.g., species or products) in the phylogenetic tree object. seqneighjoin(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. seqneighjoin(..., 'Reroot', RerootValue), when RerootValue is false, excludes rerooting the resultin
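The distance update described in the entry above can be sketched in base MATLAB. This is only an illustration: it assumes the general expression reads D(n,k) = a*D(i,k) + (1-a)*D(j,k) - a*D(n,i) - (1-a)*D(n,j) (the signs are reconstructed from the standard neighbor-joining update, since the excerpt lost its operators), and njUpdate is a hypothetical helper name, not part of seqneighjoin.

```matlab
% Hypothetical helper: distance from the new node n (joining i and j) to node k.
% a = 1/2 corresponds to the 'equivar' choice described above.
njUpdate = @(Dik, Djk, Dni, Dnj, a) a*Dik + (1-a)*Djk - a*Dni - (1-a)*Dnj;

% Additive toy tree (illustrative numbers only): leaves i and j hang off n
% with branch lengths 2 and 3, and leaf k is 4 away from n.
Dni = 2; Dnj = 3; Dnk = 4;
Dik = Dni + Dnk;   % path i -> n -> k
Djk = Dnj + Dnk;   % path j -> n -> k

% With additive distances the update recovers the true n-to-k distance.
Dnk_est = njUpdate(Dik, Djk, Dni, Dnj, 1/2);
```

With additive data the result does not depend on a, which is one way to read the statement that the expression finds the correct tree with additive data; the choice of a matters only when the distance estimates are noisy.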
131. sepal length and sepal width measurements for 150 irises. data = [meas(:,1), meas(:,2)]; 3. From the species vector, create a new column vector, groups, to classify data into two groups: Setosa and non-Setosa. groups = ismember(species, 'setosa'); 4. Randomly select training and test sets. [train, test] = crossvalind('holdOut', groups); cp = classperf(groups); 5. Use the svmtrain function to train an SVM classifier using a linear kernel function and plot the grouped data. svmStruct = svmtrain(data(train,:), groups(train), 'showplot', true); [Figure: plot of the grouped training data] 6. Add a title to the plot, using the KernelFunction field from the svmStruct structure as the title. title(sprintf('Kernel Function: %s', func2str(svmStruct.KernelFunction)), 'interpreter', 'none'); [Figure: titled plot] 7. Classify the test set using a support vector machine. classes = svmclassify(svmStruct, data(test,:), 'showplot', true); [Figure: Kernel Function: linear_kernel, with legend entries 0 (training), 0 (classified), 1 (training), 1 (classified), and Support Vectors] 8 t
132. sequences to profile Hidden Markov Model (HMM). Syntax: hmmprofmerge(Sequences); hmmprofmerge(Sequences, Names); hmmprofmerge(Sequences, Names, Scores). Arguments: Sequences: Array of sequences. Sequences can also be a structured array with the aligned sequences in a field Aligned or Sequences, and the optional names in a field Header or Name. Names: Names for the sequences. Enter a vector of names. Scores: Pairwise alignment scores from the function hmmprofalign. Enter a vector of values with the same length as the number of sequences in Sequences. Description: hmmprofmerge(Sequences) displays a set of prealigned sequences to an HMM model profile. The output is aligned corresponding to the HMM states: Match states: uppercase letters. Insert states: lowercase letters or asterisks. Delete states: dashes. Periods are added at positions corresponding to inserts in other sequences. The input sequences must have the same number of profile states, that is, the joint count of capital letters and dashes must be the same. hmmprofmerge(Sequences, Names) labels the sequences with Names. hmmprofmerge(Sequences, Names, Scores) sorts the displayed sequences using Scores. Examples: load('hmm_model_examples', 'model_7tm_2') % load model; load('hmm_model_examples', 'sequences') % load sequences; for ind = 1:length(sequences), [scores(ind), sequences(ind).Aligned] = hmmprofalign(model_7tm_2, sequences(ind).Sequence); end hmmprof
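The requirement that all input sequences share the same number of profile states (the joint count of capital letters and dashes) can be checked before calling hmmprofmerge. The sketch below uses base MATLAB only; the aligned strings are made-up examples, not toolbox data.

```matlab
% Made-up prealigned strings: uppercase = match, dash = delete,
% lowercase or period = insert positions (not counted as profile states).
aligned = {'AC-GTa..', 'A-CG-.tt', 'ACGGT...'};

% Joint count of capital letters and dashes in each string.
nStates = cellfun(@(s) sum(isstrprop(s, 'upper') | s == '-'), aligned);

% True when every string has the same number of profile states.
sameStates = all(nStates == nStates(1));
```

A check like this fails early with a clear condition instead of relying on whatever error the merge step would raise for inconsistent input.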
133. sum Y title('Extracted Ion Chromatogram (XIC) from 450 to 500 M/Z') xlabel('Retention Time') ylabel('Relative Intensity') [Figure] See Also: Bioinformatics Toolbox functions: msdotplot, mspeaks, mspalign, msresample, mzxml2peaks, mzxmlread. msresample. Purpose: Resample mass spectrometry signal. Syntax: [MZout, Yout] = msresample(MZ, Y, N); msresample(..., 'PropertyName', PropertyValue, ...); msresample(..., 'Uniform', UniformValue); msresample(..., 'Range', RangeValue); msresample(..., 'Missing', MissingValue); msresample(..., 'Window', WindowValue); msresample(..., 'Cutoff', CutoffValue); msresample(..., 'ShowPlot', ShowPlotValue). Arguments: MZ: Mass/charge vector with the range of ions in the spectra. Y: Ion intensity vector with the same length as the mass/charge vector (MZ). Y can also be a matrix with several spectra that share the same mass/charge (MZ) range. N: Total number of samples. Description: [MZout, Yout] = msresample(MZ, Y, N) resamples a raw mass spectrum (Y). The output spectrum will have N samples with a spacing that increases linearly within the range [min(MZ) max(MZ)]. MZ can be a linear or a quadratic function of its index. When input arguments are set such that down-sampling takes place, msresample applies a lowpass filter before resampling to minimize aliasing. For the antialias filter, msresample uses a linear phase
134. that maps the training data into kernel space. Kernel_FunctionValue can be one of the following strings or a function handle: 'linear' (default): Linear kernel or dot product. 'quadratic': Quadratic kernel. 'rbf': Gaussian Radial Basis Function kernel with a default scaling factor, sigma, of 1. 'polynomial': Polynomial kernel with a default order of 3. 'mlp': Multilayer Perceptron kernel with default scale and bias parameters of [1, -1]. @functionname: Handle to a kernel function specified using @ and the functionname, for example @kfun, or an anonymous function. A kernel function must be of the following form: function K = kfun(U, V). Input arguments U and V are matrices with m and n rows respectively. Return value K is an m-by-n matrix. If kfun is parameterized, you can use anonymous functions to capture the problem-dependent parameters. For example, suppose that your kernel function is: function K = kfun(U, V, P1, P2), K = tanh(P1*(U*V') + P2); You can set values for P1 and P2 and then use an anonymous function as follows: @(U,V) kfun(U, V, P1, P2). For more information on the types of functions that can be used as kernel functions, see Cristianini and Shawe-Taylor, 2000. SVMStruct = svmtrain(..., 'RBF_Sigma', RBFSigmaValue, ...) specifies the scaling factor, sigma, in the radial basis function kernel. RBFSigmaValue must be a positive number. Default is 1. SVMStruct = svmtrain
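The parameterized-kernel pattern in the entry above can be tried without the toolbox. The sketch assumes the kernel body is K = tanh(P1*(U*V') + P2), which is a reconstruction of the flattened excerpt, and uses arbitrary values for P1 and P2; it defines kfun as an anonymous function so the snippet runs as a single script.

```matlab
% Parameterized kernel of the form described above (body is an assumption).
kfun = @(U, V, P1, P2) tanh(P1 * (U * V') + P2);

% Fix the problem-dependent parameters, then capture them in a
% two-argument handle of the form a kernel function must have.
P1 = 1; P2 = -1;
kernel = @(U, V) kfun(U, V, P1, P2);

U = rand(4, 3);    % m = 4 observations, 3 features
V = rand(2, 3);    % n = 2 observations, 3 features
K = kernel(U, V);  % should be an m-by-n matrix
```

The captured values of P1 and P2 are frozen when the anonymous function is created, so changing P1 afterward does not alter kernel; create a new handle to use new parameters.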
135. the Gene Ontology database from the Web into MATLAB. GO = geneont('LIVE', true) MATLAB creates a geneont object and displays the number of terms in the database: Gene Ontology object with 20005 Terms. 2. Display information about the geneont object. get(GO) default_namespace: 'gene_ontology', format_version: '1.0', date: '01:11:2005 16:51', Terms: [20005x1 geneont.term]. 3. Search for all GO terms in the geneont object that contain the string ribosome in the property field name, and create a structure of those terms. comparison = regexpi(get(GO.Terms, 'name'), 'ribosome'); indices = find(~cellfun('isempty', comparison)); terms_with_ribosmome = GO.Term(indices) 23x1 struct array with fields: id, name, ontology, definition, synonym, is_a, part_of, obsolete. See Also: Bioinformatics Toolbox functions: goannotread, num2goid. Bioinformatics Toolbox object: geneont object. Bioinformatics Toolbox methods of geneont object: getancestors, getdescendants, getmatrix, getrelatives. generangefilter. Purpose: Remove gene profiles with small profile ranges. Syntax: Mask = generangefilter(Data); [Mask, FData] = generangefilter(Data); [Mask, FData, FNames] = generangefilter(Data, Names); generangefilter(..., 'PropertyName', PropertyValue, ...); generangefilter(..., 'Percentile', PercentileValue); generangefilter(..., 'AbsValue', AbsValueValue); generangefilter(..., 'LOGPercentile', LOGPercentileValue); generange
136. the sample spectrum only within these regions, which saves computation time. The size of the window is given in m/z units by WidthOfPulsesValue * WindowSizeRatioValue. Choices are any positive value. Default is 2.5, which means at the limits of the window, the Gaussian pulses have a value of 4.39% of their maximum. msalign(..., 'Iterations', IterationsValue, ...) specifies the number of refining iterations. At every iteration, the search grid is scaled down to improve the estimates. Choices are any positive integer. Default is 5. msalign(..., 'GridSteps', GridStepsValue, ...) specifies the number of steps for the search grid. At every iteration, the search area is divided by GridStepsValue^2. Choices are any positive integer. Default is 20. msalign(..., 'SearchSpace', SearchSpaceValue, ...) specifies the type of search space. Choices are: 'regular' (default): Evenly spaced lattice. 'latin': Random Latin hypercube with GridStepsValue^2 samples. msalign(..., 'ShowPlot', ShowPlotValue, ...) controls the display of a plot of an original and aligned spectrum over the reference masses specified by RefMZ. Choices are true, false, or I, an integer specifying the index of a spectrum in Intensities. If set to true, the first spectrum in Intensities is plotted. Default is: false when return values are specified; true when return values are not specified. IntensitiesOut RefMZ
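The figure quoted for the default window size ratio can be checked with a one-line calculation. The sketch assumes a Gaussian pulse of the form exp(-x^2/(2*w^2)), with w standing for WidthOfPulsesValue and the window limit lying at a distance of ratio*w from the pulse center; that parameterization is an assumption chosen because it reproduces the quoted value.

```matlab
% Assumed pulse shape: exp(-x.^2 / (2*w^2)), evaluated at x = ratio*w,
% so the width w cancels and only the ratio matters.
ratio = 2.5;                                   % default WindowSizeRatioValue
fractionAtLimit = exp(-ratio^2 / 2);           % pulse height relative to its maximum
percentAtLimit = round(fractionAtLimit * 10000) / 100;   % two decimals, in percent
```

Under these assumptions percentAtLimit comes out at 4.39, matching the documented value, and a larger ratio widens the window while pushing the truncated tail closer to zero.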
137. the sequence position. Default is false. Description: featuresmap(GBStructure) creates a linear or circular map of all features from a GenBank structure, typically created using the getgenbank or the genbankread function. featuresmap(GBStructure, FeatList) creates a linear or circular map of a subset of features from a GenBank structure. FeatList lets you specify features (from the list of all features in the GenBank structure) to include in or exclude from the map. If FeatList is a cell array of features, these features are mapped. Any features in FeatList not found in the GenBank structure are ignored. If FeatList includes '-' as the first string in the cell array, then the remaining strings (features) are not mapped. By default, FeatList is a list of all features in the GenBank structure. featuresmap(GBStructure, FeatList, Levels) or featuresmap(GBStructure, Levels) indicates which level on the map each feature is drawn. Level 1 is the left-most (linear map) or inner-most (circular map) level, and level N is the right-most (linear map) or outer-most (circular map) level, where N is the number of features. Levels is a vector of N integers, where N is the number of features. Each integer represents the level in the map for the corresponding feature. For example, if Levels = [1, 1, 2, 3, 3], the first two features would appear on level 1, the third feature on level 2, and the fourth and fifth features on lev
138. then read back into MATLAB. getembl('X00558', 'ToFile', 'rat_protein.txt') EMBLData = emblread('rat_protein.txt') See Also: Bioinformatics Toolbox functions: fastaread, genbankread, getembl, seqtool. evalrasmolscript. Purpose: Send RasMol script commands to Molecule Viewer window. Syntax: evalrasmolscript(FigureHandle, Command); evalrasmolscript(FigureHandle, 'File', FileValue). Arguments: FigureHandle: Figure handle to a molecule viewer returned by the molviewer function. Command: Either of the following: a string specifying one or more RasMol script commands (use a ; to separate commands); a character array or cell array containing strings specifying RasMol script commands. Note: For a complete list of RasMol script commands, see http://www.stolaf.edu/academics/chemapps/jmol/docs/. FileValue: String specifying a file name or a path and file name of a text file containing Jmol script commands. If you specify only a file name, that file must be on the MATLAB search path or in the MATLAB Current Directory. Description: evalrasmolscript(FigureHandle, Command) sends the RasMol script commands specified by Command to FigureHandle, the figure handle of a Molecule Viewer window created using the molviewer function. evalrasmolscript(FigureHandle, 'File', FileValue) sends the RasMol script commands specified by FileValue to FigureHandle, the figure handl
139. to create the Peaks cell array. Times: Vector of retention times associated with an LC/MS or GC/MS data set. The number of elements in Times equals the number of elements in the cell array Peaks. Tip: You can use the mzxml2peaks function to create the Times vector. FigHandle: Handle to an open Figure window, such as one created by the msheatmap function. QuantileValue: Value that specifies a percentage. When peaks are ranked by intensity, only those that rank above this percentage are plotted. Choices are any value ≥ 0 and ≤ 1. Default is 0. For example, setting QuantileValue = 0 plots all peaks, and setting QuantileValue = 0.8 plots only the 20% most intense peaks. Return Values: PlotHandle: Handle to the line series object (figure plot). Description: msdotplot(Peaks, Times) plots a set of peak lists from a liquid chromatography/mass spectrometry (LC/MS) or gas chromatography/mass spectrometry (GC/MS) data set represented by Peaks, a cell array of peak lists, where each element is a two-column matrix with m/z values in the first column and ion intensity values in the second column, and Times, a vector of retention times associated with the spectra. Peaks and Times have the same number of elements. The data is plotted into any existing figure generated by the msheatmap function; otherwise, the data is plotted into a new Figure window. msdotplot(FigHandle, Peaks, Times) plots the set of peak lis
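The effect of QuantileValue can be illustrated with a small rank-based filter in base MATLAB. This is a sketch of the idea only, not the msdotplot implementation; the peak list and the variable names are made up for illustration.

```matlab
% Made-up peak list: column 1 is m/z, column 2 is ion intensity.
peaks = [100 1; 101 2; 102 3; 103 4; 104 5; 105 6; 106 7; 107 8; 108 9; 109 10];
q = 0.8;                              % hypothetical QuantileValue

sortedInt = sort(peaks(:, 2));        % intensities in ascending order
k = floor(q * numel(sortedInt));      % number of lowest-ranked peaks to drop
if k >= 1
    keep = peaks(:, 2) > sortedInt(k);
else
    keep = true(size(peaks, 1), 1);   % q = 0 keeps every peak
end
kept = peaks(keep, :);                % peaks that would be plotted
```

With ten distinct intensities and q = 0.8, only the two most intense peaks survive, which is the 20% described above. Tied intensities at the cutoff would all be dropped by this strict comparison, so a real implementation may treat ties differently.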
140. type and number of atoms in an amino acid sequence (SeqAA) and returns the counts in a 1-by-1 structure (NumberAtoms) with fields C, H, N, O, and S. Examples: 1. Get an amino acid sequence from the NCBI GenPept database. rhodopsin = getgenpept('NP_000530'); 2. Count the atoms in a sequence. rhodopsinAC = atomiccomp(rhodopsin) rhodopsinAC = C: 1814, H: 2725, N: 423, O: 477, S: 25. 3. Retrieve the number of carbon atoms in the sequence. rhodopsinAC.C ans = 1814. See Also: Bioinformatics Toolbox functions: aacount, molweight, proteinplot. basecount. Purpose: Count nucleotides in sequence. Syntax: NumberBases = basecount(SeqNT); basecount(..., 'PropertyName', PropertyValue, ...); basecount(..., 'Chart', ChartValue); basecount(..., 'Others', OthersValue); basecount(..., 'Structure', StructureValue). Arguments: SeqNT: Nucleotide sequence. Enter a character string with the letters A, T, U, C, and G. The count for U characters is included with the count for T characters. You can also enter a structure with the field Sequence. ChartValue: Property to select a type of plot. Enter either 'pie' or 'bar'. OthersValue: Property to control counting ambiguous characters individually. Enter either 'full' or 'bundle' (default). Description: NumberBases = basecount(SeqNT) counts the number of bases in a nucleotide sequence (SeqNT) and returns the base counts in a 1-by-1 structure (Bases) with the
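The counting rule for basecount, in particular that U is counted together with T, can be mimicked in a few lines of base MATLAB. The sequence is made up and the struct layout is only an illustration; basecount itself is not called here.

```matlab
% Made-up nucleotide sequence, converted to uppercase before counting.
seq = upper('aatUcgGTua');

counts.A = sum(seq == 'A');
counts.C = sum(seq == 'C');
counts.G = sum(seq == 'G');
counts.T = sum(seq == 'T' | seq == 'U');   % U is counted with T
```

Because the four counts partition the sequence, their sum equals the sequence length whenever the input contains only A, C, G, T, and U; any shortfall indicates ambiguous or unexpected symbols.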
141. uses the node properties FontSize and Shape, and the biograph object property LayoutScale, to precalculate the actual size of each node. When NodeAutoSize is set to 'off', the layout engine uses the node property Size. For more information on the above properties, see Properties of a Biograph Object on page 5-4. For information on accessing and specifying the above properties of a biograph object, see the get and set functions listed under See Also. dolayout(BGobj, 'Paths', PathsOnlyValue) controls the calculation of only the edge paths, leaving the nodes at their current positions. Choices are true or false (default). Examples: 1. Create a biograph object. cm = [0 1 1 0 0; 1 0 0 1 1; 1 0 0 0 0; 0 0 0 0 1; 1 0 1 0 0]; bg = biograph(cm) Biograph object with 5 nodes and 9 edges. bg.nodes(1).Position ans = [] (nodes do not have a position yet). 2. Call the layout engine and render the graph. dolayout(bg) bg.nodes(1).Position ans = 112 224. view(bg) 3. Manually modify a node position and recalculate the paths only. bg.nodes(1).Position = [150 150]; dolayout(bg, 'Pathsonly', true) view(bg) See Also: Bioinformatics Toolbox function: biograph (object constructor). Bioinformatics Toolbox object: biograph object. Bioinformatics Toolbox methods of a biograph object: dolayout, getancestors, getdescendants, getedgesbynodeid, getnodesbyid, getrelatives, view. MATLAB functions: get, set. Purpose Syntax Arguments Descript
142. using the view function to display the biograph in the Biograph Viewer, you can double-click an edge to activate the first callback, or right-click and select a callback to activate. Default is @(edge) inspect(edge), which displays the Property Inspector dialog box. CustomNodeDrawFcnValue: Function handle to customized function to draw nodes. Default is []. Description: BGobj = biograph(CMatrix) creates a biograph object, BGobj, using a connection matrix, CMatrix. All nondiagonal and positive entries in the connection matrix, CMatrix, indicate connected nodes; rows represent the source nodes, and columns represent the sink nodes. BGobj = biograph(CMatrix, NodeIDs) specifies the node identification strings. NodeIDs can be: a cell array of strings with the number of strings equal to the number of rows or columns in the connection matrix CMatrix (each string must be unique); a character array with the number of rows equal to the number of nodes (each row in the array must be unique); a string with the number of characters equal to the number of nodes (each character must be unique). Default values are the row or column numbers. Note: If you want to specify property name/value pairs, you must specify NodeIDs. Set NodeIDs to [] to use the default values of the row/column numbers. BGobj = biograph(..., 'PropertyName', PropertyValue, ...) calls biograph with optional properties that use property name/property valu
143. values. Choices are a scalar between 0 and 1. Default is 0.95. [CMZ, AlignedPeaks] = mspalign(Peaks, ..., 'EstimationMethod', EstimationMethodValue, ...) specifies the method used to estimate CMZ, the vector of common mass/charge (m/z) values. Choices are: 'histogram' (default method): Peak locations are clustered using a kernel density estimation approach. The peak ion intensity is used as a weighting factor. The centers of all the clusters form the CMZ vector. 'regression': Takes a sample of the distances between observed significant peaks and regresses the inter-peak distance to create the CMZ vector with similar inter-element distances. [CMZ, AlignedPeaks] = mspalign(Peaks, ..., 'CorrectionMethod', CorrectionMethodValue, ...) specifies the method used to align each peak list to the CMZ vector. Choices are: 'nearest-neighbor' (default method): For each common peak in the CMZ vector, its counterpart in each peak list is the peak that is closest to the common peak's m/z value. 'shortest-path': For each common peak in the CMZ vector, its counterpart in each peak list is selected using the shortest path algorithm. Examples: Load a MAT-file, included with Bioinformatics Toolbox, which contains liquid chromatography/mass spectrometry (LC/MS) data variables, including peaks and ret_time. peaks is a cell array of peak lists, where each element is a two-column matrix of m/z values and ion i
144. variations and assessment of gene effects. Nucleic Acids Research 29, 2549-2557. [2] Hoffmann, R., Seidl, T., and Dugas, M. (2002). Profound effect of normalization on detection of differentially expressed genes in oligonucleotide microarray data analysis. Genome Biology 3(7), research 0033.1-0033.11. See Also: affyinvarsetnorm, malowess, manorm, quantilenorm. mairplot. Purpose: Create intensity versus ratio scatter plot of microarray data. Syntax: mairplot(DataX, DataY); [Intensity, Ratio] = mairplot(DataX, DataY); [Intensity, Ratio, H] = mairplot(DataX, DataY); mairplot(..., 'Type', TypeValue, ...); mairplot(..., 'LogTrans', LogTransValue, ...); mairplot(..., 'FactorLines', FactorLinesValue, ...); mairplot(..., 'Title', TitleValue, ...); mairplot(..., 'Labels', LabelsValue, ...); mairplot(..., 'Normalize', NormalizeValue, ...); mairplot(..., 'LowessOptions', LowessOptionsValue, ...). Arguments: DataX, DataY: Vectors of gene expression values where each row corresponds to a gene. For example, in a two-color microarray experiment, DataX could be cy3 intensity values and DataY could be cy5 intensity values. TypeValue: String that specifies the plot type. Choices are 'IR' (plots log of the product of the DataX and DataY intensities versus log of the intensity ratios) or 'MA' (plots (1/2) log of the product of the DataX and DataY intensities versus log of the intensity
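The 'MA' quantities and the fold change test can be computed by hand in base MATLAB. The sketch below is not a call to mairplot: the intensities are made up, the ratio orientation (DataX over DataY) and the base-2 logarithms are assumptions, and the variable names M, A, and outside are illustrative.

```matlab
% Made-up two-channel intensities on the natural scale, one row per gene.
X = [100; 400; 50; 800];
Y = [100; 100; 200; 790];

M = log2(X ./ Y);          % log ratio (assumed orientation: X over Y)
A = 0.5 * log2(X .* Y);    % half the log of the intensity product

% A fold change factor of 2 corresponds to +1 and -1 on the log2 ratio scale.
factor = 2;
outside = abs(M) > log2(factor);   % genes beyond the fold change lines
```

Here the second and third genes have ratios of 4 and 1/4, so they fall outside the factor-2 lines, while the first and fourth stay inside.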
145. view the minimal spanning tree of the undirected graph. [ST, pred] = graphminspantree(UG) ST = (6,1) 0.2900, (6,2) 0.2900, (5,3) 0.3200, (5,4) 0.3600, (6,5) 0.2100. pred = 0 6 5 5 6 1. view(biograph(ST, [], 'ShowArrows', 'off', 'ShowWeights', 'on')) [Figure: Biograph Viewer showing the minimal spanning tree] References: [1] Kruskal, J.B. (1956). On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem. Proceedings of the American Mathematical Society 7, 48-50. [2] Prim, R. (1957). Shortest Connection Networks and Some Generalizations. Bell System Technical Journal 36, 1389-1401. [3] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002). The Boost Graph Library User Guide and Reference Manual (Upper Saddle River, NJ: Pearson Education). See Also: Bioinformatics Toolbox functions: graphallshortestpaths, graphconncomp, graphisdag, graphisomorphism, graphisspantree, graphmaxflow, graphpred2path, graphshortestpath, graphtopoorder, graphtraverse. Bioinformatics Toolbox method of biograph object: minspantree. graphpred2path. Purpose: Convert predecessor indices to paths. Syntax: path = graphpred2path(pred, D). Arguments: pred: Row vector or matrix of predecessor node indices. The value of the root (or source) node in pred must be 0. D: Destination node in pred. Description: Tip: For introductory information on graph theory functions, see Graph Theory Functions
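Converting a predecessor vector into a path amounts to walking backward from the destination until the root (marked by 0) is reached. The base MATLAB sketch below uses the pred vector printed in the entry above; the destination node is chosen arbitrarily, and the loop illustrates the idea rather than reproducing graphpred2path.

```matlab
pred = [0 6 5 5 6 1];   % predecessor of each node; 0 marks the root (node 1)
D = 3;                  % destination node, chosen for illustration

% Walk from D back to the root, prepending each predecessor.
path = D;
while pred(path(1)) ~= 0
    path = [pred(path(1)), path];
end
```

For D = 3 the walk visits 3, 5, 6, and 1, so the resulting path from the root is 1, 6, 5, 3, which agrees with the spanning tree edges (6,1), (6,5), and (5,3) listed above.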
146. with all the flow values for every edge. FlowMatrix(X,Y) is the flow from node X to node Y. Output Cut is a logical row vector indicating the nodes connected to SNode after calculating the minimum cut between SNode and TNode. If several solutions to the minimum cut problem exist, then Cut is a matrix. graphmaxflow(G, SNode, TNode, ..., 'PropertyName', PropertyValue, ...) calls graphmaxflow with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotes and is case insensitive. These property name/property value pairs are as follows: graphmaxflow(G, SNode, TNode, ..., 'Capacity', CapacityValue, ...) lets you specify custom capacities for the edges. CapacityValue is a column vector having one entry for every nonzero value (edge) in matrix G. The order of the custom capacities in the vector must match the order of the nonzero values in matrix G when it is traversed column-wise. By default, graphmaxflow gets capacity information from the nonzero entries in matrix G. graphmaxflow(G, SNode, TNode, ..., 'Method', MethodValue, ...) lets you specify the algorithm used to find the maximum flow. Choices are: 'Edmonds': Uses the Edmonds and Karp algorithm, the implementation of which is based on a variation called the labeling algorithm. Time complexity is O(N*E^2), where N and
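The column-wise ordering required for CapacityValue is the same order in which the base MATLAB function find lists the nonzero entries of a sparse matrix. The sketch below uses a made-up three-node graph to show that order; it does not call graphmaxflow.

```matlab
% Made-up directed graph: edges 1->2, 1->3, 2->3 with capacities 2, 3, 1.
G = sparse([1 1 2], [2 3 3], [2 3 1], 3, 3);

% find returns the nonzero entries in column-major (column-wise) order.
[src, dst, cap] = find(G);
edgeOrder = [src, dst];
```

A custom capacity vector would then need one entry per row of edgeOrder, in that same order: first the edge stored in column 2, then the two edges stored in column 3.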
147. [Figure: plot of Relative Intensity versus Mass/Charge (M/Z)] References: [1] Morris, J.S., Coombes, K.R., Koomen, J., Baggerly, K.A., and Kobayashi, R. (2005). Feature extraction and quantification for mass spectrometry in biomedical applications using the mean spectrum. Bioinformatics 21(9), 1764-1775. [2] Yasui, Y., Pepe, M., Thompson, M.L., Adam, B.L., Wright, G.L., Qu, Y., Potter, J.D., Winget, M., Thornquist, M., and Feng, Z. (2003). A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional proteomic data for cancer detection. Biostatistics 4(3), 449-463. [3] Donoho, D.L., and Johnstone, I.M. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Am. Statist. Asso. 90, 1200-1224. [4] Strang, G., and Nguyen, T. (1996). Wavelets and Filter Banks (Wellesley: Cambridge Press). [5] Coombes, K.R., Tsavachidis, S., Morris, J.S., Baggerly, K.A., Hung, M.C., and Kuerer, H.M. (2005). Improved peak detection and quantification of mass spectrometry data acquired from surface-enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform. Proteomics 5(16), 4107-4117. See Also: Bioinformatics Toolbox functions: msbackadj, msdotplot, mslowess, mspalign, msppresample, mssgolay. msppresample. Purpose Synta
148. 0 A. Note that A(3,1) = NaN. Because column 2 is the closest column to column 1 in Euclidean distance, knnimpute imputes the (3,1) entry of column 1 to be the corresponding entry of column 2, which is 1. knnimpute(A) ans = [1 2 5; 4 5 7; 1 1 8; 7 6 0]. Example 2: The following example loads the data set yeastdata and imputes missing values in the array yeastvalues. load yeastdata % Remove data for empty spots: emptySpots = strcmp('EMPTY', genes); yeastvalues(emptySpots,:) = []; genes(emptySpots) = []; % Impute missing values: imputedValues = knnimpute(yeastvalues); References: [1] Speed, T. (2003). Statistical Analysis of Gene Expression Microarray Data (Chapman & Hall/CRC). [2] Hastie, T., Tibshirani, R., Sherlock, G., Eisen, M., Brown, P., and Botstein, D. (1999). Imputing missing data for gene expression arrays. Technical Report, Division of Biostatistics, Stanford University. [3] Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., and Altman, R. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics 17(6), 520-525. See Also: Bioinformatics Toolbox function: knnclassify. MATLAB function: isnan. Statistics Toolbox functions: nanmean, nanmedian, pdist. maboxplot. Purpose: Box plot for microarray data. Syntax: maboxplot(MAData); maboxplot(MAData, ColumnName); maboxplot(MAStruct, FieldName); H = maboxplot(...); H
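The nearest-column imputation described in the first example can be reproduced in base MATLAB for the single-neighbor case. This is a sketch of the idea, not the knnimpute implementation. The original definition of A is cut off in the excerpt, so the matrix below is an assumption chosen to be consistent with the printed result, with the entries taken at face value.

```matlab
% Assumed input, consistent with the result printed in the excerpt.
A = [1 2 5; 4 5 7; NaN 1 8; 7 6 0];
imputed = A;

complete = ~any(isnan(A), 2);              % rows with no missing entries
[rows, cols] = find(isnan(A));
for t = 1:numel(rows)
    r = rows(t); c = cols(t);
    others = setdiff(1:size(A, 2), c);
    others = others(~isnan(A(r, others))); % candidates need a value in row r
    % Euclidean distance from column c to each candidate, over complete rows.
    diffs = bsxfun(@minus, A(complete, others), A(complete, c));
    d = sqrt(sum(diffs .^ 2, 1));
    [~, nearest] = min(d);
    imputed(r, c) = A(r, others(nearest)); % copy the nearest column's entry
end
```

Over the complete rows, column 2 is much closer to column 1 than column 3 is, so the missing (3,1) entry takes the value of A(3,2).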
149. [Figure: phylogenetic tree with leaf labels] 3. From the phytree object, create a connection matrix to represent the phylogenetic tree. [CM, labels, dist] = getmatrix(tr); 4. Find the nodes from the root to one leaf in the phylogenetic tree created from the phylogenetic tree file for the GLR_HUMAN protein. root_loc = size(CM, 1) root_loc = 65. glr_loc = strmatch('GLR', labels) glr_loc = 28. [T, PRED] = graphminspantree(CM, root_loc); PATH = graphpred2path(PRED, glr_loc) PATH = 65 64 53 52 46 45 44 43 28. References: [1] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002). The Boost Graph Library User Guide and Reference Manual (Upper Saddle River, NJ: Pearson Education). See Also: Bioinformatics Toolbox functions: graphallshortestpaths, graphconncomp, graphisdag, graphisomorphism, graphisspantree, graphmaxflow, graphminspantree, graphshortestpath, graphtopoorder, graphtraverse. graphshortestpath. Purpose: Solve shortest path problem in graph. Syntax: [dist, path, pred] = graphshortestpath(G, S); [dist, path, pred] = graphshortestpath(G, S, T); graphshortestpath(..., 'Directed', DirectedValue, ...); graphshortestpath(..., 'Method', MethodValue, ...); graphshortestpath(..., 'Weights', WeightsValue, ...). Arguments: G: N-by-N sparse matrix that represents a graph. Nonzero ent
150. g1, 'directed', false)
References
[1] Fortin, S. (1996). The Graph Isomorphism Problem. Technical Report 96-20, Dept. of Computer Science, University of Alberta, Edmonton, Alberta, Canada.
[2] McKay, B.D. (1981). Practical Graph Isomorphism. Congressus Numerantium 30, 45-87.
[3] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002). The Boost Graph Library User Guide and Reference Manual (Upper Saddle River, NJ: Pearson Education).
See Also
Bioinformatics Toolbox functions: graphallshortestpaths, graphconncomp, graphisdag, graphisspantree, graphmaxflow, graphminspantree, graphpred2path, graphshortestpath, graphtopoorder, graphtraverse. Bioinformatics Toolbox methods of biograph object: isomorphism.
graphisspantree
Purpose: Determine if tree is spanning tree.
Syntax: TF = graphisspantree(G)
Arguments
G: N-by-N sparse matrix whose lower triangle represents an undirected graph. Nonzero entries in matrix G indicate the presence of an edge.
Description
Tip: For introductory information on graph theory functions, see Graph Theory Functions in the Bioinformatics Toolbox documentation.
TF = graphisspantree(G) returns logical 1 (true) if G is a spanning tree, and logical 0 (false) otherwise. A spanning tree must touch all the nodes and must be acyclic. G is an N-by-N sparse matrix whose lower triangle represents an undirected graph. Nonzero entries in matrix G indicate the presence of an edge.
Examples
1 Creat
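The spanning tree condition stated for graphisspantree in item 150 (touches all nodes, acyclic) can be tested with a union-find pass over the edge list. The Python sketch below is an illustration only, not the toolbox implementation: the function name and the edge-list input (instead of a lower-triangular sparse matrix) are assumptions.

```python
def is_spanning_tree(n_nodes, edges):
    """Return True if the undirected edges form a spanning tree on n_nodes.

    An acyclic edge set with exactly n_nodes - 1 edges is necessarily
    connected, so it touches every node.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Follow parent links to the set representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this edge would close a cycle
        parent[root_u] = root_v
    return len(edges) == n_nodes - 1

tree_edges = [(0, 1), (1, 2), (1, 3)]
```

Adding any further edge to tree_edges creates a cycle, and removing one leaves a node unreached, so both changes make the check fail.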
151. [Figure: phylogenetic tree plot with labeled branches.]
3. Move the root to a branch that makes the tree as ultrametric as possible.
   tr_3 = reroot(tr_2);
   plot(tr_3,'branchlabels',true)
MATLAB draws the new tree with the root moved from the center of branch 7 to branch 8.
[Figure: rerooted tree plot with branches labeled Branch 1 through Branch 8.]
See Also
Bioinformatics Toolbox
• functions: phytree (object constructor), seqneighjoin
• phytree object methods: get, getbyname, prune, select
select (phytree)
Purpose: Select tree branches and leaves in phytree object.
Syntax:
   S = select(Tree, N)
   [S, Selleaves, Selbranches] = select(...)
   select(..., 'PropertyName', PropertyValue, ...)
   select(..., 'Reference', ReferenceValue)
   select(..., 'Criteria', CriteriaValue)
   select(..., 'Threshold', ThresholdValue)
   select(..., 'Exclude', ExcludeValue)
   select(..., 'Propagate', PropagateValue)
Arguments
Tree: Phylogenetic tree (phytree object) created with the function phytree.
N: Number of closest nodes to the root node.
ReferenceValue: Property to select a reference point for measuring distance.
CriteriaValue: Property to select a criteria for measuring distance.
ThresholdValue: Property to select a
152. 2rna. MATLAB functions: strrep, regexp.
scfread
Purpose: Read trace data from SCF file.
Syntax:
   Sample = scfread(File)
   [Sample, Probability] = scfread(File)
   [Sample, Probability, Comments] = scfread(File)
   [A, C, T, G] = scfread(File)
   [A, C, T, G, ProbA, ProbC, ProbG, ProbT] = scfread(File)
   [A, C, T, G, ProbA, ProbC, ProbG, ProbT, Comments, PkIndex, Base] = scfread(File)
Arguments
File: SCF formatted file. Enter a file name or a path and file name.
Description
scfread reads data from an SCF formatted file into MATLAB structures.
Sample = scfread(File) reads an SCF formatted file and returns the sample data in the structure Sample, which contains the following fields:
   Field   Description
   A       Column vector containing intensity of A fluorescence tag
   C       Column vector containing intensity of C fluorescence tag
   G       Column vector containing intensity of G fluorescence tag
   T       Column vector containing intensity of T fluorescence tag
[Sample, Probability] = scfread(File) also returns the probability data in the structure Probability, which contains the following fields:
   Field        Description
   peak_index   Column vector containing the position in the SCF file for the start of the data for each peak
   prob_A       Column vector containing the probability of each base in the sequence being an A
   prob_C       Column vector containing the probability of
153. 8 10 5 3],true,10,10)
DG =
   [sparse matrix display; entries lost in extraction except (8,9) 1 and (9,10) 1]
   h = view(biograph(DG));
[Figure: Biograph Viewer 1 showing the directed graph with nodes Node 1 through Node 10.]
2. Find the number of strongly connected components in the directed graph and determine to which component each of the 10 nodes belongs.
   [S,C] = graphconncomp(DG)
   [output partly lost in extraction: 4 4 4 1 1 2 2 3]
3. Color the nodes for each component with a different color.
   colors = jet(S);
   for i = 1:numel(h.nodes)
      h.Nodes(i).Color = colors(C(i),:);
   end
[Figure: Biograph Viewer 1 with nodes colored by component.]
References
[1] Tarjan, R.E. (1972). Depth first search and linear graph algorithms. SIAM Journal on Computing 1(2), 146-160.
[2] Sedgewick, R. (2002). Algorithms in C++, Part 5: Graph Algorithms (Addison-Wesley).
[3] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002). The Boost Graph Library User Guide and Reference Manual (Upper Saddle River, NJ: Pearson Education).
See Also
Bioinformatics Toolbox functions: graphallshortestpaths, graphisdag, graphisomorphism, graphisspantree, graphmaxflow, graphminspantree, graphpred2path, graphshortestpath, graphtopoord
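Step 2 of item 153 labels each node of a directed graph with its strongly connected component. The Python sketch below shows the idea on a small hypothetical graph, because the graph in the example is not fully recoverable. It uses Kosaraju's two-pass depth-first search rather than the Tarjan algorithm cited in the references, and the function name and return layout are assumptions, not the toolbox API.

```python
def strongly_connected_components(n_nodes, edges):
    """Return (count, labels) for the strongly connected components.

    Kosaraju's method: record DFS finishing order on the graph, then
    sweep the reversed graph in decreasing finishing order.
    """
    forward = [[] for _ in range(n_nodes)]
    backward = [[] for _ in range(n_nodes)]
    for u, v in edges:
        forward[u].append(v)
        backward[v].append(u)

    order, seen = [], [False] * n_nodes

    def visit(u):
        seen[u] = True
        for v in forward[u]:
            if not seen[v]:
                visit(v)
        order.append(u)  # u is finished after all its descendants

    for u in range(n_nodes):
        if not seen[u]:
            visit(u)

    labels = [-1] * n_nodes

    def assign(u, component):
        labels[u] = component
        for v in backward[u]:
            if labels[v] == -1:
                assign(v, component)

    count = 0
    for u in reversed(order):
        if labels[u] == -1:
            assign(u, count)
            count += 1
    return count, labels

# Hypothetical graph: a 3-cycle feeding into a 2-cycle.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3)]
count, labels = strongly_connected_components(5, edges)
```

The recursion keeps the sketch short; for large graphs an iterative traversal would avoid Python's recursion limit.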
154. [Table: Mapping Nucleotide Letters to Integers; most rows lost in extraction. Recoverable entries: T (U) = 4; R (A or G, purine) = 5; W (A or T, weak) = 10; B (T, G, or C; not A) = 11; gap of indeterminate length = 16; unknown (default) = 0 and >= 17.]
int2nt(SeqNT) converts a 1-by-N array of integers to a character string using the table Mapping Nucleotide Letters to Integers above.
int2nt(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs.
int2nt(..., 'Alphabet', AlphabetValue) selects the nucleotide alphabet to use. The default value is 'DNA', which uses the symbols A, T, C, and G. If AlphabetValue is set to 'RNA', int2nt uses the symbols A, C, U, G instead.
int2nt(..., 'Unknown', UnknownValue) specifies the character to represent an unknown nucleotide base.
int2nt(..., 'Case', CaseValue) selects the output case of the nucleotide string.
Examples
Enter a sequence of integers as a MATLAB vector (space- or comma-separated list with square brackets).
   s = int2nt([1 2 4 3 2 4 1 3 2])
   s =
   ACTGCTAGC
Define a symbol for unknown numbers 16 and greater.
   si = [1 2 4 20 2 4 40 3 2];
   s = int2nt(si, 'unknown', '#')
   s =
   ACT#CT#GC
See Also
Bioinformatics Toolbox functions: aa2int, int2aa, nt2int.
isoelectric
Purpose: Estimate isoelectric point for amino acid sequence.
Syntax:
   pI = isoelectric(SeqAA)
   [pI, Charge] = isoelectric(SeqAA)
   isoelectric(..., 'PropertyName', PropertyValue, ...)
   isoelectric(..., 'PKVals', PKVals
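The integer-to-letter conversion described for int2nt in item 154 amounts to a table lookup with a fallback character. The Python sketch below illustrates it. It is not toolbox code: the function name is hypothetical, the code order A, C, G, T, R, Y, K, M, S, W, B, D, H, V, N, gap for 1 through 16 is an assumption based on the standard nucleotide code table (the table in this copy is damaged), and the default unknown character '*' is likewise an assumption.

```python
# Letters for codes 1..16; index 0 of the string holds code 1 (assumed order).
DNA_LETTERS = "ACGTRYKMSWBDHVN-"

def int_to_nt(codes, alphabet="DNA", unknown="*"):
    """Map integer codes to nucleotide letters.

    Codes outside 1..16 become the unknown character. With alphabet
    'RNA', code 4 is written as U instead of T.
    """
    letters = DNA_LETTERS if alphabet == "DNA" else DNA_LETTERS.replace("T", "U")
    return "".join(letters[c - 1] if 1 <= c <= 16 else unknown for c in codes)

plain = int_to_nt([1, 2, 4, 3, 2, 4, 1, 3, 2])
flagged = int_to_nt([1, 2, 4, 20, 2, 4, 40, 3, 2], unknown="#")
```

Both calls mirror the two examples in the item: the first uses only codes 1 through 4, and the second marks the out-of-range codes 20 and 40 with the chosen symbol.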
155. [Figure: spectrum plot; x-axis Mass/Charge (M/Z), 0 to 18000.]
5. Smooth the signal using the mslowess function. Then convert the smoothed data to a peak list by finding relevant peaks and plot the third spectrum.
   YS = mslowess(MZ_lo_res,YB,'SHOWPLOT',3);
[Figure: Spectrogram ID 3, original spectrogram and smoothed spectrogram; x-axis Mass/Charge (M/Z), y-axis Relative Intensity.]
   P = mspeaks(MZ_lo_res,YS,'DENOISING',false,'SHOWPLOT',3);
[Figure: Spectrogram ID 3, original spectrogram and peaks; x-axis Mass/Charge (M/Z), y-axis Relative Intensity.]
6. Use the cellfun function to remove all peaks with m/z values less than 2000 from the eight peaks lists in output P. Then plot the peaks of the third spectrum (in red) over its smoothed signal (in blue).
   Q = cellfun(@(p) p(p(:,1)>2000,:),P,'UniformOutput',false);
   figure
   plot(MZ_lo_res,YS(:,3),'b',Q{3}(:,1),Q{3}(:,2),'rx')
   xlabel('Mass/Charge (M/Z)')
   ylabel('Relative Intensity')
   axis([0 20000 -5 95])
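Step 6 of item 155 applies the same row filter to every peak list. A minimal Python sketch of that filtering step follows. It is an illustration, not toolbox code: the function name and the representation of each peak as an (m/z, intensity) pair are assumptions, and the sample values are invented.

```python
def keep_peaks_above(peak_lists, min_mz=2000):
    """Drop peaks whose m/z value is not greater than min_mz.

    peak_lists holds one list of (mz, intensity) pairs per spectrum;
    the result keeps one (possibly empty) list per spectrum.
    """
    return [[(mz, intensity) for mz, intensity in peaks if mz > min_mz]
            for peaks in peak_lists]

# Invented values for three spectra.
peak_lists = [
    [(1500.0, 10.0), (2500.0, 30.0)],
    [(1999.9, 5.0)],
    [(2000.0, 1.0), (8000.0, 12.5)],
]
filtered = keep_peaks_above(peak_lists)
```

As in the cellfun call, the comparison is strict, so a peak at exactly 2000 is removed, and a spectrum with no surviving peaks keeps its place as an empty list.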
156. 532 Median
3. Create an intensity versus ratio scatter plot of the cy3 and cy5 data. Normalize the data and add a title and labels.
   mairplot(cy3data,cy5data,'Normalize',true,...
      'Title','Normalized R vs G IR plot',...
      'Labels',maStruct.Names)
4. Return intensity values and ratios without displaying the plot.
   [intensities, ratios] = mairplot(cy3data,cy5data,'Showplot',false);
References
[1] Quackenbush, J. (2002). Microarray Data Normalization and Transformation. Nature Genetics Suppl. 32, 496-501.
[2] Dudoit, S., Yang, Y.H., Callow, M.J., and Speed, T.P. (2002). Statistical Methods for Identifying Differentially Expressed Genes in Replicated cDNA Microarray Experiments. Statistica Sinica 12, 111-139.
See Also
Bioinformatics Toolbox functions: maboxplot, magetfield, maimage, mainvarsetnorm, maloglog, malowess, manorm, mattest, mavolcanoplot.
maloglog
Purpose: Create loglog plot of microarray data.
Syntax:
   maloglog(X, Y, 'PropertyName', PropertyValue, ...)
   maloglog(..., 'FactorLines', N)
   maloglog(..., 'Title', TitleValue)
   maloglog(..., 'Labels', LabelsValues)
   maloglog(..., 'HandleGraphicsName', HGValue)
   H = maloglog(...)
Arguments
X: A numeric array of microarray expression values from a single experimental condition.
Y: A numeric array of microarray expression values from a single experimental condition.
N: Property to add two lines to the plot showing a factor of N change.
TitleValue: A string to use as the title for the pl
158. 917.
[2] Wu, Z., and Irizarry, R.A. (2005). Stochastic Models Inspired by Hybridization Theory for Short Oligonucleotide Arrays. Proceedings of RECOMB 2004. J Comput Biol. 12(6), 882-93.
[3] Wu, Z., and Irizarry, R.A. (2005). A Statistical Framework for the Analysis of Microarray Probe-Level Data. Johns Hopkins University, Biostatistics Working Papers 73.
[4] Wu, Z., and Irizarry, R.A. (2003). A Model Based Background Adjustment for Oligonucleotide Expression Arrays. RSS Workshop on Gene Expression, Wye, England. http://biosun01.biostat.jhsph.edu/%7Eririzarr/Talks/gctalk.pdf
[5] Abd Rabbo, N.A., and Barakat, H.M. (1979). Estimation Problems in Bivariate Lognormal Distribution. Indian J. Pure Appl. Math 10(7), 815-825.
[6] Best, C.J.M., Gillespie, J.W., Yi, Y., Chandramouli, G.V.R., Perlmutter, M.A., Gathright, Y., Erickson, H.S., Georgevich, L., Tangrea, M.A., Duray, P.H., Gonzalez, S., Velasco, A., Linehan, W.M., Matusik, R.J., Price, D.K., Figg, W.D., Emmert-Buck, M.R., and Chuaqui, R.F. (2005). Molecular alterations in primary prostate cancer after androgen ablation therapy. Clinical Cancer Research 11, 6823-6834.
See Also
Bioinformatics Toolbox functions: affyprobeseqread, affyread, celintensityread, probelibraryinfo.
genbankread
Purpose: Read data from GenBank file.
Syntax: GenBankData = genbankread(File)
Arguments
File: Either of the following:
• String specifying a file name, a path and file
159. MATLAB search path or in the MATLAB Current Directory, and that its associated library file is stored at D:\Affymetrix\LibFiles\DrosGenome1.
1. Read the contents of a CEL file into a MATLAB structure.
   celStruct = affyread('Drosophila.CEL');
2. Display a spatial plot of the probe intensities.
   maimage(celStruct, 'Intensity')
3. Read the contents of a DAT file into a MATLAB structure, and then display the raw image data.
   datStruct = affyread('Drosophila.dat');
   imagesc(datStruct.Image)
   axis image
4. Read the contents of a CHP file into a MATLAB structure, and then plot the probe values for a probe set. The CHP files require the library files. Your file may be in a different location than this example.
   chpStruct = affyread('Drosophila.chp','D:\Affymetrix\LibFiles\DrosGenome1');
   geneName = probesetlookup(chpStruct,'14317_at')
   probesetplot(chpStruct,'142417_at');
See Also
Bioinformatics Toolbox functions: agferead, celintensityread, gprread, probelibraryinfo, probesetlink, probesetlookup, probesetplot, probesetvalues, sptread.
agferead
Purpose: Read Agilent Feature Extraction Software file.
Syntax: AGFEData = agferead(File)
Arguments
File: Microarray data file generated with the Agilent Feature Extraction Software.
Description
AGFEData = agferead(File) reads files generated with Feature Extraction Software from Agilent microarray scanners and creates a structur
160. Bioinformatics Toolbox functions: multialignread, seqconsensus, seqlogo, seqprofile, seqshoworfs, seqshowwords, seqtool, getgenbank.
seqdotplot
Purpose: Create dot plot of two sequences.
Syntax:
   seqdotplot(Seq1, Seq2)
   seqdotplot(Seq1, Seq2, Window, Number)
   Matches = seqdotplot(...)
   [Matches, Matrix] = seqdotplot(...)
Arguments
Seq1, Seq2: Nucleotide or amino acid sequences. Enter two character strings. Do not enter a vector of integers. You can also enter a structure with the field Sequence.
Window: Enter an integer for the size of a window.
Number: Enter an integer for the number of characters within the window that match.
Description
seqdotplot(Seq1, Seq2) plots a figure that visualizes the match between two sequences.
seqdotplot(Seq1, Seq2, Window, Number) plots sequence matches when there are at least Number matches in a window of size Window. When plotting nucleotide sequences, start with a Window of 11 and Number of 7.
Matches = seqdotplot(...) returns the number of dots in the dot plot matrix.
[Matches, Matrix] = seqdotplot(...) returns the dot plot as a sparse matrix.
Examples
This example shows the similarities between the prion protein (PrP) nucleotide sequences of two ruminants, the moufflon and the golden takin.
   moufflon = getgenbank('AB060288','Sequence',true);
   takin = getgenbank('AB060290','Sequence',true);
   seqdotplot(moufflon,takin,11,7)
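The Window and Number rule described for seqdotplot in item 160 can be sketched in a few lines of Python. This is an illustration only, not toolbox code: the function name is hypothetical, and anchoring each window at its first position (rather than its center) is an assumption, so dot positions may be offset from what the toolbox draws.

```python
def dot_matrix(seq1, seq2, window=1, number=1):
    """Return a boolean matrix with a dot at (i, j) when at least
    `number` of the `window` aligned characters starting at seq1[i]
    and seq2[j] match.
    """
    rows, cols = len(seq1), len(seq2)
    dots = [[False] * cols for _ in range(rows)]
    for i in range(rows - window + 1):
        for j in range(cols - window + 1):
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            dots[i][j] = matches >= number
    return dots

# Comparing a short sequence with itself gives dots on the diagonal.
identity = dot_matrix("ACGT", "ACGT")
windowed = dot_matrix("ACGT", "ACGT", window=2, number=2)
n_identity = sum(map(sum, identity))
n_windowed = sum(map(sum, windowed))
```

A larger window with a threshold below the window size, such as the suggested 11 and 7 for nucleotides, suppresses isolated chance matches while keeping diagonals that tolerate a few mismatches.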
161. ArrowsValue, ...)
   BGobj = biograph(CMatrix, NodeIDs, ...'ArrowSize', ArrowSizeValue, ...)
   BGobj = biograph(CMatrix, NodeIDs, ...'ShowWeights', ShowWeightsValue, ...)
   BGobj = biograph(CMatrix, NodeIDs, ...'ShowTextInNodes', ShowTextInNodesValue, ...)
   BGobj = biograph(CMatrix, NodeIDs, ...'NodeAutoSize', NodeAutoSizeValue, ...)
   BGobj = biograph(CMatrix, NodeIDs, ...'NodeCallback', NodeCallbackValue, ...)
   BGobj = biograph(CMatrix, NodeIDs, ...'EdgeCallback', EdgeCallbackValue, ...)
   BGobj = biograph(CMatrix, NodeIDs, ...'CustomNodeDrawFcn', CustomNodeDrawFcnValue, ...)
Property names listed in the syntax: ID, Label, Description, LayoutType, EdgeType, Scale, LayoutScale, EdgeTextColor, EdgeFontSize, ShowArrows, ArrowSize, ShowWeights, ShowTextInNodes, NodeAutoSize, NodeCallback, EdgeCallback, CustomNodeDrawFcn.
Arguments
CMatrix: Full or sparse square matrix that acts as a connection matrix. That is, a value of 1 indicates a connection between nodes while a 0 indicates no connection. The number of rows/columns is equal to the number of nodes.
NodeIDs: Node identification strings. Enter any of the following:
• Cell array of strings with the number of strings equal to the number of rows or columns in the connection matrix CMatrix. Each string must be unique.
• Character array with the number of rows equal to the number of nodes. Each row in the array must be unique.
• String with the number of characters equal t
162. Bioinformatics Toolbox 2 Reference
MATLAB. The MathWorks: Accelerating the pace of engineering and science.
How to Contact The MathWorks
   www.mathworks.com                    Web
   comp.soft-sys.matlab                 Newsgroup
   www.mathworks.com/contact_TS.html    Technical Support
   suggest@mathworks.com                Product enhancement suggestions
   bugs@mathworks.com                   Bug reports
   doc@mathworks.com                    Documentation error reports
   service@mathworks.com                Order status, license renewals, passcodes
   info@mathworks.com                   Sales, pricing, and general information
   508-647-7000 (Phone)
   508-647-7001 (Fax)
   The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098
For contact information about worldwide offices, see the MathWorks Web site.
Bioinformatics Toolbox Reference
COPYRIGHT 2003-2007 by The MathWorks, Inc.
The software described in this document is furnished under a license agreement. The software may be used or copied only under the terms of the license agreement. No part of this manual may be photocopied or reproduced in any form without prior written consent from The MathWorks, Inc.
FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by, for, or through the federal government of the United States. By accepting delivery of the Program or Documentation, the government hereby agrees that this software or documentation qualifies as commercial computer software or commercial computer software documentation as such terms are used or de
163. Constraint property as small as possible will help the SMO algorithm run faster.
Examples
1. Load the sample data, which includes Fisher's iris data of 4 measurements on a sample of 150 irises.
   load fisheriris
2. Create data, a two-column matrix containing sepal length and sepal width measurements for 150 irises.
   data = [meas(:,1), meas(:,2)];
3. From the species vector, create a new column vector, groups, to classify data into two groups: Setosa and non-Setosa.
   groups = ismember(species,'setosa');
4. Randomly select training and test sets.
   [train, test] = crossvalind('holdOut',groups);
   cp = classperf(groups);
5. Train an SVM classifier using a linear kernel function and plot the grouped data.
   svmStruct = svmtrain(data(train,:),groups(train),'showplot',true);
[Figure 1: grouped training data with the linear separating boundary.]
6. Add a title to the plot, using the KernelFunction field from the svmStruct structure as the title.
   title(sprintf('Kernel Function: %s',...
         func2str(svmStruct.KernelFunction)),...
         'interpreter','none');
[Figure 1: the same plot with the title added.]
7. Use the svmclassify function to classify the test set.
   classes = svmclassify(svmStruct,data(test,:),'showplot',true);
[Figure: plot titled "Kernel Function: linear_kernel" with legend entries 0 (training), 0 (classified), 1 (training), 1 (classified), Support Vectors.]
164. DYKMCLYEFGMFGHFTGHKK
See Also
Statistics Toolbox functions: hmmgenerate, randsample. MATLAB functions: rand, randperm.
rankfeatures
Purpose: Rank key features by class separability criteria.
Syntax:
   [IDX, Z] = rankfeatures(X, Group)
   [IDX, Z] = rankfeatures(X, Group, ...'Criterion', CriterionValue, ...)
   [IDX, Z] = rankfeatures(X, Group, ...'CCWeighting', ALPHA, ...)
   [IDX, Z] = rankfeatures(X, Group, ...'NWeighting', BETA, ...)
   [IDX, Z] = rankfeatures(X, Group, ...'NumberOfIndices', N, ...)
   [IDX, Z] = rankfeatures(X, Group, ...'CrossNorm', CN, ...)
Description
[IDX, Z] = rankfeatures(X, Group) ranks the features in X using an independent evaluation criterion for binary classification. X is a matrix where every column is an observed vector and the number of rows corresponds to the original number of features. Group contains the class labels.
IDX is the list of indices to the rows in X with the most significant features. Z is the absolute value of the criterion used (see below).
Group can be a numeric vector or a cell array of strings; numel(Group) is the same as the number of columns in X, and numel(unique(Group)) is equal to 2.
[IDX, Z] = rankfeatures(X, Group, ...'PropertyName', PropertyValue, ...) calls rankfeatures with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is c
165. Syntax:
   DisplayInfo = seqlogo(Seqs)
   seqlogo(..., 'Displaylogo', DisplaylogoValue, ...)
   seqlogo(..., 'Alphabet', AlphabetValue, ...)
   seqlogo(..., 'Startat', StartatValue, ...)
   seqlogo(..., 'Endat', EndatValue, ...)
   seqlogo(..., 'SSCorrection', SSCorrectionValue, ...)
Arguments
Seqs: Set of pair-wise or multiply aligned nucleotide or amino acid sequences, represented by any of the following:
• Character array
• Cell array of strings
• Array of structures containing a Sequence field
Profile: Sequence profile distribution matrix with the frequency of nucleotides or amino acids for every column in the multiple alignment, such as returned by the seqprofile function. The size of the frequency distribution matrix is:
• For nucleotides: [4 x sequence length]
• For amino acids: [20 x sequence length]
If gaps were included, Profile may have 5 rows (for nucleotides) or 21 rows (for amino acids), but seqlogo ignores gaps.
DisplaylogoValue: Controls the display of a sequence logo. Choices are true (default) or false.
AlphabetValue: String specifying the type of sequence (nucleotide or amino acid). Choices are 'NT' (default) or 'AA'.
StartatValue: Positive integer that specifies the starting position for the sequences in Seqs. Default starting position is 1.
EndatValue: Positive integer that specifies the ending position for the se
166. E are the number of nodes and edges, respectively.
• 'Goldberg': Default algorithm. Uses the Goldberg algorithm, which uses the generic method known as preflow-push. Time complexity is O(N^2*sqrt(E)), where N and E are the number of nodes and edges, respectively.
Examples
1. Create a directed graph with six nodes and eight edges.
   cm = sparse([1 1 2 2 3 3 4 5],[2 3 4 5 4 5 6 6],...
        [2 3 3 1 1 1 2 3],6,6);
2. Calculate the maximum flow in the graph from node 1 to node 6.
   [M,F,K] = graphmaxflow(cm,1,6)
   M =
       4
   F =
       (1,2)  2
       (1,3)  2
       (2,4)  1
       (3,4)  1
       (2,5)  1
       (3,5)  1
       (4,6)  2
       (5,6)  2
   K =
       1 1 1 1 0 0
       1 0 1 0 0 0
Notice that K is a two-row matrix because there are two possible solutions to the minimum cut problem.
3. View the graph with the original capacities.
   h = view(biograph(cm,[],'ShowWeights','on'))
[Figure: Biograph Viewer 1 showing the graph with the original capacities.]
4. View the graph with the calculated maximum flows.
   view(biograph(F,[],'ShowWeights','on'))
[Figure: Biograph Viewer 2 showing the graph with the calculated maximum flows.]
5. Show one solution to the minimum cut problem in the original graph.
   set(h.Nodes(K(1,:)),'Color',[1 0 0])
[Figure: Biograph Viewer 1 with the source-side nodes colored red.]
Notice that in the three edges that connect the source nodes (red) to the destination nodes (yellow), the original capacities and the calculated maximu
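The maximum flow value M = 4 in the graphmaxflow example of item 166 can be reproduced with a short augmenting-path computation. The Python sketch below uses the Edmonds-Karp method (breadth-first augmenting paths), which is a different algorithm from the preflow-push default described above. The function name and the dense-matrix representation are assumptions for illustration; only the edge list and capacities come from the example.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dense capacity matrix.

    Returns (total, flow), where flow[u][v] is the net flow on u -> v.
    """
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]
    total = 0
    while True:
        # Breadth-first search for a shortest path with spare capacity.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and capacity[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            return total, flow  # no augmenting path remains
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while v != source:
            u = parent[v]
            bottleneck = min(bottleneck, capacity[u][v] - flow[u][v])
            v = u
        v = sink
        while v != source:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck  # negative entry acts as residual capacity
            v = u
        total += bottleneck

# Edges (from, to, capacity) from the example, using 1-based node numbers.
edges = [(1, 2, 2), (1, 3, 3), (2, 4, 3), (2, 5, 1),
         (3, 4, 1), (3, 5, 1), (4, 6, 2), (5, 6, 3)]
capacity = [[0] * 6 for _ in range(6)]
for u, v, c in edges:
    capacity[u - 1][v - 1] = c
total, flow = max_flow(capacity, 0, 5)
```

The total matches M in the example. The per-edge flows are not compared, because a maximum flow is generally not unique and a different algorithm may route it differently.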
167. Enter a character string or a vector of integers from the table. Examples: 'ARN' or [1 2 3].
GeneticCodeValue: Property to select a genetic code. Enter a code number or code name from the Genetic Code table below. If you use a code name, you can truncate the name to the first two characters of the name.
AlphabetValue: Property to select a nucleotide alphabet. Enter either 'DNA' or 'RNA'. The default value is 'DNA', which uses the symbols A, C, T, G. The value 'RNA' uses the symbols A, C, U, G.
Genetic Code
   Code Number   Code Name
   1             Standard
   2             Vertebrate Mitochondrial
   3             Yeast Mitochondrial
   4             Mold, Protozoan, Coelenterate Mitochondrial, and Mycoplasma/Spiroplasma
   5             Invertebrate Mitochondrial
   6             Ciliate, Dasycladacean, and Hexamita Nuclear
   9             Echinoderm Mitochondrial
   10            Euplotid Nuclear
   11            Bacterial and Plant Plastid
   12            Alternative Yeast Nuclear
   13            Ascidian Mitochondrial
   14            Flatworm Mitochondrial
   15            Blepharisma Nuclear
   16            Chlorophycean Mitochondrial
   21            Trematode Mitochondrial
   22            Scenedesmus Obliquus Mitochondrial
   23            Thraustochytrium Mitochondrial
Description
SeqNT = aa2nt(SeqAA) converts an amino acid sequence (SeqAA) to a nucleotide sequence (SeqNT) using the standard genetic code. In general, the mapping from an amino acid to a nucleotide codon is not
168. FIR filter with a least-squares error minimization. The cutoff frequency is set by the largest down-sampling ratio when comparing the same regions in the MZ and MZout vectors.
Note: msresample is particularly useful when you have spectra with different mass/charge vectors and you want to match the scales.
msresample(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs.
msresample(..., 'Uniform', UniformValue), when UniformValue is true, forces the vector MZ to be uniformly spaced. The default value is false.
msresample(..., 'Range', RangeValue) specifies a 1-by-2 vector with the mass/charge range for the output spectrum (Yout). RangeValue must be within [min(MZ) max(MZ)]. The default value is the full range [min(MZ) max(MZ)].
msresample(..., 'Missing', MissingValue), when MissingValue is true, analyzes the mass/charge vector (MZ) for dropped samples. The default value is false. If the down-sample factor is large, checking for dropped samples might not be worth the extra computing time. Dropped samples can only be recovered if the original MZ values follow a linear or a quadratic function of the MZ vector index.
msresample(..., 'Window', WindowValue) specifies the window used when calculating parameters for the lowpass filter. Enter 'Flattop', 'Blackman', 'Hamming', or 'Hanning'. The default value is 'Flattop'.
msresample(..., 'Cutoff', CutoffVal
169. PDB Database Record and corresponding Field in the MATLAB Structure:
   HET       Heterogen
   HETNAM    HeterogenName
   HETSYN    HeterogenSynonym
   FORMUL    Formula
   HELIX     Helix
   SHEET     Sheet
   TURN      Turn
   SSBOND    SSBond
   LINK      Link
   HYDBND    HydrogenBond
   SLTBRG    SaltBridge
   CISPEP    CISPeptides
   SITE      Site
   CRYST1    Cryst1
   ORIGXn    OriginX
   SCALEn    Scale
   MTRIXn    Matrix
   TVECT     TranslationVector
   MODEL     Model
   ATOM      Atom
   SIGATM    AtomSD
   ANISOU    AnisotropicTemp
   SIGUIJ    AnisotropicTempSD
   TER       Terminal
   HETATM    HeterogenAtom
   CONECT    Connectivity
PDBStruct = pdbread(File, 'ModelNum', ModelNumValue) reads only the model specified by ModelNumValue from the PDB formatted text file File and stores the data in the MATLAB structure PDBStruct. If ModelNumValue does not correspond to an existing model number in File, then pdbread reads the coordinate information of all the models.
The Sequence Field
The Sequence field is also a structure containing sequence information in the following subfields:
• NumOfResidues
• ChainID
• ResidueNames: Contains the three-letter codes for the sequence residues.
• Sequence: Contains the single-letter codes for the sequence residues.
Note: If the sequence has modified residues, then the ResidueNames subfield might not correspond to the standard three-letter amino acid codes. In this case, the Sequence subfield will contain the m
170. GenPept, or EMBL features
• MATLAB character array including text describing GenBank, GenPept, or EMBL features
• MATLAB structure with fields corresponding to GenBank, GenPept, or EMBL data, such as those returned by genbankread, genpeptread, emblread, getgenbank, getgenpept, or getembl
FeatStruct is the output structure containing a field for every database feature. Each field name in FeatStruct matches the corresponding feature name in the GenBank, GenPept, or EMBL database, with the following exceptions:
   Feature Name in GenBank, GenPept, or EMBL Database   Field Name in MATLAB Structure
   -10_signal                                           minus_10_signal
   -35_signal                                           minus_35_signal
   3'UTR                                                three_prime_UTR
   3'clip                                               three_prime_clip
   5'UTR                                                five_prime_UTR
   5'clip                                               five_prime_clip
   D-loop                                               D_loop
Fields in FeatStruct contain substructures with feature qualifiers as fields. In the GenBank, GenPept, and EMBL databases, for each feature, the only mandatory qualifier is its location, which featuresparse translates to the field Location. When possible, featuresparse also translates this location to numeric indices, creating an Indices field.
Note: If you use the Indices field to extract sequence information, you may need to complement the sequences.
FeatStruct = featuresparse(Features, ...'PropertyName', PropertyValue, ...) calls featuresparse with optional properties that use property name/property value pairs.
171. [Figure: feature map with gene labels including HHV3gp01 through HHV3gp06.]
After creating a map:
• Click a feature or annotation to display a list of all qualifiers for that feature.
• Zoom the plot by clicking the zoom buttons on the toolbar.
Examples
Creating a Circular Map with Legend
The following example creates a circular map of five different features mapped on three levels. It also uses outputs from the featuresmap function as inputs to the legend function to add a legend to the map.
   GBStructure = getgenbank('J01415');
   [Handles, OutFeatList] = featuresmap(GBStructure, ...
       {'CDS','D_loop','mRNA','tRNA','rRNA'}, [1 2 2 2 3])
   legend(Handles, OutFeatList, 'interpreter', 'none', ...
       'location','bestoutside')
   title('Human Mitochondrion, Complete Genome')
Creating a Linear Map with Sequence Position Labels and Changed Font Size
The following example creates a linear map showing only the gene feature. It changes the font of the labels to seven points and includes the sequence position in the labels.
   herpes = getgenbank('NC_001348');
   featuresmap(herpes,{'gene'},'fontsize',7,'showpositions',true)
   title('Genes in Human herpesvirus 3 (strain Dumas)')
Determining Qualifiers for a Specific Feature
The following example uses the getgenbank function to create a GenBank structure, GBStructure. It then uses the featuresparse function to parse the features in the GenBank structure into a
172. HLines] = maboxplot(...)
   maboxplot(..., 'PropertyName', PropertyValue, ...)
   maboxplot(..., 'Title', TitleValue)
   maboxplot(..., 'Notch', NotchValue)
   maboxplot(..., 'Symbol', SymbolValue)
   maboxplot(..., 'Orientation', OrientationValue)
   maboxplot(..., 'WhiskerLength', WhiskerLengthValue)
Arguments
MAData: A numeric array or a structure containing a field called Data. The values in the columns of MAData will be used to create box plots.
ColumnName: An array of column names corresponding to the data in MAData.
MAStruct: A microarray data structure.
FieldName: A field within the microarray data structure, MAStruct. The values in the field FieldName will be used to create box plots.
TitleValue: A string to use as the title for the plot. The default title is FieldName.
NotchValue: Property to control the type of boxes drawn. Enter either true for notched boxes, or false for square boxes. Default is false.
OrientationValue: Property to specify the orientation of the box plot. Enter 'Vertical' or 'Horizontal'. Default is 'Horizontal'.
WhiskerLengthValue: Property to specify the maximum length of the whiskers as a function of the interquartile range (IQR). The whisker extends to the most extreme data value within WhiskerLengthValue*IQR of the box. Default = 1.5. If WhiskerLengthValue equals 0, then maboxplot displays all data values outside the box, using the p
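The WhiskerLengthValue rule in item 172 (the whisker reaches the most extreme data value within WhiskerLengthValue*IQR of the box) can be made concrete with a short Python sketch. It is an illustration, not toolbox code: the function name is hypothetical, and the quartiles come from Python's statistics.quantiles with the 'inclusive' method, which may differ from the quartile definition the toolbox uses.

```python
import statistics

def whisker_ends(data, whisker_length=1.5):
    """Return (low, high): the most extreme data values that lie within
    whisker_length * IQR of the box edges (first and third quartiles).
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_limit = q1 - whisker_length * iqr
    high_limit = q3 + whisker_length * iqr
    inside = [x for x in data if low_limit <= x <= high_limit]
    return min(inside), max(inside)

# Invented data with one obvious outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
ends = whisker_ends(data)
```

For this data the quartiles are 3.25 and 7.75, so the upper limit is 14.5 and the value 100 falls outside the whisker and would be drawn as an individual point.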
173. Height_FilterValue: Positive real value that specifies the minimum height for reported peaks. Default is 0.
ShowPlotValue: Controls the display of a plot of the original and the smoothed signal, with the peaks included in the output matrix Peaks marked. Choices are true, false, or I, an integer specifying the index of a spectrum in Intensities. If set to true, the first spectrum in Intensities is plotted. Default is:
• false: When return values are specified.
• true: When return values are not specified.
Return Values
Peaks: Two-column matrix where each row corresponds to a peak. The first column contains mass/charge (m/z) values, and the second column contains ion intensity values.
Description
Peaks = mspeaks(MZ, Intensities) finds relevant peaks in raw mass spectrometry data, and creates Peaks, a two-column matrix containing the m/z value and ion intensity for each peak.
mspeaks finds peaks by first smoothing the signal using undecimated wavelet transform with Daubechies coefficients, then assigning peak locations, and lastly, eliminating peaks that do not satisfy specified criteria.
Peaks = mspeaks(MZ, Intensities, ...'PropertyName', PropertyValue, ...) calls mspeaks with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs ar
174. J., Scherf, U., Speed, T.P. (2003). Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data. Biostatistics 4, 249-264.
[2] Mosteller, F., and Tukey, J. (1977). Data Analysis and Regression (Reading, Massachusetts: Addison-Wesley Publishing Company), pp. 165-202.
[3] Best, C.J.M., Gillespie, J.W., Yi, Y., Chandramouli, G.V.R., Perlmutter, M.A., Gathright, Y., Erickson, H.S., Georgevich, L., Tangrea, M.A., Duray, P.H., Gonzalez, S., Velasco, A., Linehan, W.M., Matusik, R.J., Price, D.K., Figg, W.D., Emmert-Buck, M.R., and Chuaqui, R.F. (2005). Molecular alterations in primary prostate cancer after androgen ablation therapy. Clinical Cancer Research 11, 6823-6834.
See Also
affyinvarsetnorm, celintensityread, mainvarsetnorm, malowess, manorm, quantilenorm, rmabackadj.
rna2dna
Purpose: Convert RNA sequence of nucleotides to DNA sequence.
Syntax: SeqDNA = rna2dna(SeqRNA)
Arguments
SeqRNA: Nucleotide sequence for RNA. Enter a character string with the characters A, C, U, G, and the ambiguous nucleotide bases N, R, Y, K, M, S, W, B, D, H, and V.
Description
SeqDNA = rna2dna(SeqRNA) converts any uracil nucleotides in an RNA sequence into thymine (U --> T), and returns the result in the same format as the input. For example, if the RNA sequence is an integer sequence, then so is SeqDNA.
Example
   rna2dna('ACGAUGAGUCAUGCUU')
   ans =
   ACGATGAGTCATGCTT
See Also
Bioinformatics Toolbox function: dna
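For character input, the conversion described for rna2dna in item 174 is a single substitution. The Python sketch below shows it for strings only. The function name is hypothetical, and handling of lowercase letters is an assumption not stated in the text above; integer-coded sequences are not covered.

```python
def rna_to_dna(seq_rna):
    """Replace uracil with thymine, leaving every other symbol unchanged."""
    return seq_rna.replace("U", "T").replace("u", "t")

converted = rna_to_dna("ACGAUGAGUCAUGCUU")
```

Ambiguous bases such as N or R pass through untouched, since only U differs between the two alphabets.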
175. J., and Doty, P. (1962). Determination of the base composition of deoxyribonucleic acid from its thermal denaturation temperature. Journal Molecular Biology 5, 109-118.
[5] Panjkovich, A., and Melo, F. (2005). Comparison of different melting temperature calculation methods for short DNA sequences. Bioinformatics 21(6), 711-722.
[6] SantaLucia Jr., J., Allawi, H.T., and Seneviratne, P.A. (1996). Improved Nearest-Neighbor Parameters for Predicting DNA Duplex Stability. Biochemistry 35, 3555-3562.
[7] SantaLucia Jr., J. (1998). A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proceedings of the National Academy of Science USA 95, 1460-1465.
[8] Sugimoto, N., Nakano, S., Yoneyama, M., and Honda, K. (1996). Improved thermodynamic parameters and helix initiation factor to predict stability of DNA duplexes. Nucleic Acids Research 24(22), 4501-4505.
[9] http://www.basic.northwestern.edu/biotools/oligocalc.html for weight calculations.
See Also
Bioinformatics Toolbox functions: isoelectric, molweight, ntdensity, palindromes, randseq.
optimalleaforder
Purpose: Determine optimal leaf ordering for hierarchical binary cluster tree.
Syntax:
   Order = optimalleaforder(Tree, Dist)
   Order = optimalleaforder(Tree, Dist, ...'Criteria', CriteriaValue, ...)
   Order = optimalleaforder(Tree, Dist, ...'Transformation', TransformationValue, ...)
Arguments
Tree: Hierarchical binary cluster
176. M1 enter a positive value for M7 and enter Inf for M2 Read the sequence for the human p53 tumor gene p53nt fastaread p53nt txt Read the sequence for the human p53 tumor protein fastaread p53aa fastaread p53aa txt Read the human mitochondrion genome in FASTA format entrezSite http www ncbi nlm nih gov entrez viewer fcgi textOptions amp txt on amp view fasta genbankID amp list_uids NC_001807 mitochondrion fastaread entrezSite textOptions genbankID See Also Bioinformatics Toolbox functions emblread fastawrite genbankread genpeptread multialignread seqprofile seqtool 2 139 fastawrite Purpose Syntax Arguments Description Examples 2 140 Write to file using FASTA format fastawrite File Data fastawrite File Header Sequence File Data Header Sequence String specifying either a file name or a path and file name supported by your operating system If you specify only a file name the file is saved to the MATLAB Current Directory Any of the following e String with a FASTA format e Sequence object e MATLAB structure containing the fields Header and Sequence e GenBank GenPept structure String containing information about the sequence This text will be included in the header of the FASTA formatted file File String or name of variable containing an amino acid or nucleotide sequence using the standard IUB IUPAC letter or integer co
177. MInterp SMInterpValue when SMInterpValue is false turns off the linear interpolation of the scoring matrices Instead each supplied scoring matrix is assigned to multialign a fixed range depending on the distances between the two profiles or sequences being aligned multialign GapOpen GapOpenValue specifies the initial penalty for opening a gap multialign ExtendGap ExtendGapValue specifies the initial penalty for extending a gap multialign DelayCutoff DelayCutoffValue specifies a threshold to delay the alignment of divergent sequences whose closest neighbor is farther than DelayCutoffValue median patristic distance between sequences multialign JobManager JobManagerValue distributes pair wise alignments into a cluster of computers using Distributed Computing Toolbox multialign WaitInQueue WaitInQueueValue when WaitInQueueValue is true waits in the job manager queue for an available worker When WaitInQueueValue is false default and there are no workers immediately available multialign errors out Use this property with Distributed Computing Toolbox and the multialign property WaitInQueue multialign Verbose VerboseValue when VerboseValue is true turns on verbosity The remaining input optional arguments are analogous to the function profalign and are used through every step of the progressive alignment of profiles multialign ExistingGapAdjus
178. NG INSERT states in the output sequence hmmprofalign ScoreFlanks ScoreFlanksValue when ScoreFlanksValue is true includes the transition probabilities for the flanking states in the raw score hmmprofalign ScoreNullTransitions ScoreNullTransitionValue when ScoreNullTransitionsValue is true adjusts the raw score using the null model for transitions Model Nul1X hmmprofalign Examples See Also Note Multiple target alignment is not supported in this implementation All the Mode1l LoopX probabilities are ignored load hmm_model_examples model_7tm_2 load a model example load hmm_model_examples sequences load a sequence example SCCR_RABIT sequences 2 Sequence a s hmmprofalign model_7tm_2 SCCR_RABIT showscore true Bioinformatics Toolbox functions gethmmprof hmmprofestimate hmmprofgenerate hmmprofgenerate hmmprofstruct pfamhmmread showhmmprof multialign profalign 2 309 hmmprofestimate Purpose Syntax Arguments 2 310 Estimate profile Hidden Markov Model HMM parameters using pseudocounts hmmprofestimate Model MultipleAlignment PropertyName PropertyValue hmmprofestimate 33 hmmprofestimate hmmprofestimate hmmprofestimate Model MultipleAlignment AX BE BMx BDx J 3 A AValue Ax AxValue BE BEValue BDx BDxValue Hidden Markov model created with the function hmmprofstr
179. Note that each row in Data corresponds to a perfect match (PM) probe and each column corresponds to an Affymetrix CEL file. (Each CEL file is generated from a separate chip. All chips should be of the same type.)
Note that the column vector ProbeIndices designates probes within each probe set by labeling each probe 0 to N - 1, where N is the number of probes in the probe set.
Note that each row in ExpressionMatrix corresponds to a gene (probe set) and each column in ExpressionMatrix corresponds to an Affymetrix CEL file, which represents a single chip.
For a given probe set n with J probe pairs, let $Y_{ijn}$ denote the background-adjusted, base 2 log transformed, and quantile-normalized PM probe intensity value of chip i and probe j. $Y_{ijn}$ follows a linear additive model:
$$Y_{ijn} = U_{in} + A_{jn} + E_{ijn}, \quad i = 1,\dots,I,\; j = 1,\dots,J,\; n = 1,\dots,N$$
where:
$U_{in}$ = gene expression of the probe set n on chip i
$A_{jn}$ = probe affinity effect for the jth probe in the probe set
$E_{ijn}$ = residual for the jth probe on the ith chip
The RMA method assumes $A_1 + A_2 + \dots + A_J = 0$ for all probe sets. A robust procedure, median polish, is used to estimate $U_{in}$ as the log scale measure of expression.
Note: There is no column in ExpressionMatrix that contains probe set or gene information.
ExpressionMatrix = rmasummary(..., 'PropertyName', PropertyValue, ...) defines optional properties that use property name/value pairs in a
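The additive model above can be fitted with Tukey's median polish, the robust procedure this entry names. The following is a minimal sketch for one probe set with made-up intensities, assuming rows are probes and columns are chips; the variable names and the fixed number of sweeps are assumptions, and this is not the rmasummary code. Note that median polish centers the probe effects on a zero median rather than enforcing a zero sum.

```matlab
% Sketch only: median polish for one probe set (hypothetical data).
% Rows = probes j, columns = chips i; values are log2 PM intensities.
Y = [8.1 8.4 7.9; 9.0 9.5 8.8; 7.5 7.9 7.2];
[m, n] = size(Y);
R = Y;                                   % residuals, updated in place
probeEff = zeros(m, 1);                  % probe affinity effects (A_j)
chipEff  = zeros(1, n);                  % chip effects
overall  = 0;                            % common level
for sweep = 1:10
    rm = median(R, 2);                   % sweep out row medians
    R = R - repmat(rm, 1, n);
    probeEff = probeEff + rm;
    d = median(chipEff);  chipEff = chipEff - d;  overall = overall + d;
    cm = median(R, 1);                   % sweep out column medians
    R = R - repmat(cm, m, 1);
    chipEff = chipEff + cm;
    d = median(probeEff); probeEff = probeEff - d; overall = overall + d;
end
expr = overall + chipEff;                % log-scale expression per chip (U_i)
```

Each sweep only moves values between the residuals and the effects, so overall + probeEff + chipEff + R always reproduces Y.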
180. Nucleotide or amino acid sequence. Enter a character string or a structure with the field Sequence.
Length  Length of n-mer to count. Enter an integer.
nmercount(Seq, Length) counts the number of n-mers, or patterns of a specific length, in a sequence.
nmercount(Seq, Length, C) returns only the n-mers with cardinality at least C.
Count the number of n-mers in an amino acid sequence and display the first six rows in the cell array.
    S = getgenpept('AAA59174', 'SequenceOnly', true)
    nmers = nmercount(S, 4)
    nmers(1:6,:)
    ans =
        'apes'    [2]
        'dfrd'    [2]
        'eslk'    [2]
        'frdl'    [2]
        'gnys'    [2]
        'lkel'    [2]
See Also: Bioinformatics Toolbox functions basecount, codoncount, dimercount

nt2aa
Purpose: Convert nucleotide sequence to amino acid sequence
Syntax:
    SeqAA = nt2aa(SeqNT)
    SeqAA = nt2aa(..., 'Frame', FrameValue, ...)
    SeqAA = nt2aa(..., 'GeneticCode', GeneticCodeValue, ...)
    SeqAA = nt2aa(..., 'AlternativeStartCodons', AlternativeStartCodonsValue, ...)
Arguments:
SeqNT  Either of the following:
• String specifying a nucleotide sequence
• MATLAB structure containing the field Sequence
Valid characters include A, C, G, T, U, and hyphen (-).
Note: Hyphens are valid only if the codon to which they belong represents a gap, that is, the codon contains all hyphens. Example: ACT---TGA
Tip: Do not use a sequence with hyphens if you specify 'all' for FrameValue.
FrameValue
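The n-mer counting that nmercount performs, as described earlier in this entry, amounts to sliding a window of the given length along the sequence and tallying each distinct word. The sketch below does this with base MATLAB on a toy sequence; the sequence and variable names are illustrative, and the output layout differs from the cell array that nmercount returns.

```matlab
% Sketch only: count overlapping n-mers in a character sequence.
seq = 'abcabcab';                                % hypothetical toy sequence
n = 3;                                           % n-mer length
nWin = numel(seq) - n + 1;                       % number of sliding windows
idx = bsxfun(@plus, (1:nWin)', 0:n-1);           % index matrix, one window per row
words = seq(idx);                                % nWin-by-n char matrix of n-mers
[uniqueWords, ~, group] = unique(words, 'rows'); % distinct n-mers, sorted
counts = accumarray(group(:), 1);                % occurrences of each distinct n-mer
```

A cardinality threshold like the C argument can then be applied with uniqueWords(counts >= C, :).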
181. OFF
• GONNET
• BLOSUM30 increasing by 5 up to BLOSUM90
• BLOSUM62
• BLOSUM100
Default is:
• NUC44 when AlphabetValue equals 'NT'
• BLOSUM50 when AlphabetValue equals 'AA'
Positive value that specifies the scale factor used to return the score in arbitrary units. If the scoring matrix information also provides a scale factor, then both are used.
Positive integer specifying the penalty for opening a gap in the alignment. Default is 8.
Positive integer specifying the penalty for extending a gap. Default is equal to GapOpenValue.

seqpdist
Return Values: D  Vector containing biological distances between each pair of sequences stored in the M elements of Seqs.
Description: D = seqpdist(Seqs) returns D, a vector containing biological distances between each pair of sequences stored in the M sequences of Seqs, a cell array of sequences, a vector of structures, or a matrix of sequences.
D is a 1-by-(M*(M-1)/2) row vector corresponding to the M*(M-1)/2 pairs of sequences in Seqs. The output D is arranged in the order ((2,1), (3,1), ..., (M,1), (3,2), ..., (M,2), ..., (M,M-1)). This is the lower-left triangle of the full M-by-M distance matrix. To get the distance between the Ith and the Jth sequences for I > J, use the formula D((J-1)*(M-J/2)+I-J).
D = seqpdist(Seqs, ...'PropertyName', PropertyValue, ...) calls seqpdist with optional properties that use property name/property value pairs.
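The index formula for the pairwise distance vector can be checked by enumerating the pairs in the stated order. This sketch uses base MATLAB only and an arbitrary M; the helper name pairIndex is illustrative and not a toolbox function.

```matlab
% Sketch only: verify D((J-1)*(M-J/2)+I-J) against the pair ordering
% (2,1),(3,1),...,(M,1),(3,2),...,(M,M-1) for a hypothetical M.
M = 5;
pairIndex = @(I, J) (J - 1) .* (M - J ./ 2) + I - J;   % valid for I > J
k = 0;
allMatch = true;
for J = 1:M-1
    for I = J+1:M
        k = k + 1;                        % position of pair (I,J) in D
        allMatch = allMatch && (pairIndex(I, J) == k);
    end
end
```

After the loops, k equals M*(M-1)/2 and allMatch stays true, so the formula and the ordering agree.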
182. OptimalOrder is a vector of position indices for each leaf in Tree1Reordered determined by the optimal leaf ordering calculation Reordering Leaves Using a Valid Order 1 Create and view a phylogenetic tree b 1 2 3 4 5 6 7 8 9 10 tree phytree b Phylogenetic tree object with 6 leaves 5 branches view tree 2 Reorder the leaves on the phylogenetic tree and then view the reordered tree treeReordered reorder tree 5 6 3 4 1 2 view treeReordered Finding Best Approximate Order When Using an Invalid Order 1 Create a phylogenetic tree by reading a Newick formatted tree file ASCII text file tree phytreeread pf00002 tree Phylogenetic tree object with 33 leaves 32 branches 2 Create a row vector of the leaf names in alphabetical order dummy order sort get tree LeafNames reorder phytree 3 Reorder the phylogenetic tree to match as closely as possible the row vector of alphabetically ordered leaf names without dividing the clades or having crossing branches treeReordered reorder tree order approximate true Phylogenetic tree object with 33 leaves 32 branches 4 View the original and the reordered phylogenetic trees view tree view treeReordered Reordering Leaves to Match Leaf Order in Another Phylogenetic Tree 1 Create a phylogenetic tree by reading sequence data from a FASTA file calculating the pair wise distances between sequences and then using t
183. Out msalign Group GroupValue controls the creation of RefMZOut a new vector of m z values to be used as reference masses for aligning the peaks This vector is created by adjusting the values in RefMZ based on the sample data from multiple spectra in Intensities such that the overall shifting and scaling of the peaks is minimized Choices are true or false default msalign Tip Set GroupValue to true only if Intensities contains data for a large number of spectra and you are not confident of the m z values used for your reference peaks in RefMZ Leave GroupValue set to false if you are confident of the m z values used for your reference peaks in Ref MZ Examples Aligning Mass Spectrum with Three or More Reference Peaks 1 Load sample data reference masses and parameter data for synthetic peak width load sample_lo_res R 3991 4 4598 7964 9160 w 60 100 60 100 2 Display a color image of the mass spectra before alignment msheatmap MZ_lo_res Y_lo_res markers R range 3000 10000 title before alignment 2 419 msalign io x File Edit View Insert Tools Desktop Window Help a DSHS eana eane na before alignment Spectrogram Indices Relative Intensity 3000 4000 5000 6000 7000 8000 9000 10000 Mass Charge M Z 3 Align spectra with reference masses and display a color image of mass spectra after alignment YA msalign MZ_lo_res Y_lo_res R weights
184. Peaks mspeaks MZ Intensities PeakLocationValue Peaks mspeaks MZ Intensities FWHH_FilterValue Peaks mspeaks MZ Intensities OverSegmentation Filter OverSegmentation_FilterValue Peaks mspeaks MZ Intensities Height_FilterValue Peaks mspeaks MZ Intensities ShowPlotValue Base BaseValue Levels LevelsValue NoiseEstimator Multiplier Denoising PeakLocation FWHH Filter Height_Filter ShowPlot 2 465 mspeaks Arguments 2 466 MZ Intensities BaseValue LevelsValue Vector of mass charge m z values for a set of spectra The number of elements in the vector equals n or the number of rows in matrix Intensities Matrix of intensity values for a set of mass spectra that share the same mass charge m z range Each row corresponds to an m z value and each column corresponds to a spectrum or retention time The number of rows equals n or the number of elements in vector MZ An integer between 2 and 20 that specifies the wavelet base Default is 4 An integer between 1 and 12 that specifies the number of levels for the wavelet decomposition Default is 10 mspeaks NoiseEstimatorValue MultiplierValue DenoisingValue String or scalar that specifies the method to estimate the threshold T to filter out noisy components in the first high band decomposition y_h Choices are e mad D
185. PropertyValue, ...) calls nt2aa with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
SeqAA = nt2aa(..., 'Frame', FrameValue, ...) converts a nucleotide sequence for a specific reading frame to an amino acid sequence. Choices are 1, 2, 3, or 'all'. Default is 1. If FrameValue is 'all', then output SeqAA is a 3-by-1 cell array.
SeqAA = nt2aa(..., 'GeneticCode', GeneticCodeValue, ...) converts a nucleotide sequence to an amino acid sequence using a specific genetic code.
SeqAA = nt2aa(..., 'AlternativeStartCodons', AlternativeStartCodonsValue, ...) controls the translation of alternative start codons. By default, AlternativeStartCodonsValue is set to true, and if the first codon of a sequence is a known alternative start codon, the codon is translated to methionine.
If this option is set to false, then an alternative start codon at the start of a sequence is translated to its corresponding amino acid in the genetic code that you specify, which might not necessarily be methionine. For example, in the human mitochondrial genetic code, AUA and AUU are known to be alternative start codons. For more details of alternative start codons, see www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?m
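The effect of the Frame choice can be pictured as shifting the start position before splitting the sequence into codons. The sketch below shows only that splitting step in base MATLAB; it does not translate codons, and the sequence and variable names are illustrative assumptions rather than anything nt2aa exposes.

```matlab
% Sketch only: split a nucleotide sequence into codons for one reading frame.
seq = 'ATGGCCTAA';                         % hypothetical sequence
frame = 2;                                 % reading frame: 1, 2, or 3
s = seq(frame:end);                        % drop the bases before the frame start
s = s(1:3*floor(numel(s)/3));              % trim any incomplete trailing codon
codons = cellstr(reshape(s, 3, [])');      % one codon per cell
```

For frame 2 this sequence yields the codons TGG and CCT; the trailing AA is discarded because it does not form a full codon.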
186. PropertyValue returns a PAM scoring matrix for amino acid sequences ScoringMatrix MatrixInfo pam N returns a structure with information about the PAM matrix The fields in the structure are Name Scale Entropy Expected and Order ScoringMatrix pam Extended ExtendedValue if Extended is true returns a scoring matrix with the 20 amino acid characters the ambiguous characters and stop character B Z X If Extended is false only the standard 20 amino acids are included in the matrix ScoringMatrix pam Order OrderValue returns a PAM matrix ordered by the amino acid sequence in Order If Order does not contain the extended characters B Z X and then these characters are not returned 2 546 pam PAM50 substitution matrix in 1 2 bit units Expected score 3 70 Entropy 2 00 bits Lowest score 13 Highest score 13 PAM250 substitution matrix in 1 3 bit units Expected score 0 844 Entropy 0 354 bits Lowest score 8 Highest score 17 Examples Get the PAM matrix with N 50 PAM50 pam 50 PAM250 pam 250 Order CSTPAGNDEQHRKMILVFYW See Also Bioinformatics Toolbox functions blosum dayhoff gonnet nwalign swalign 2 547 pdbdistplot Purpose Visualize intermolecular distances in Protein Data Bank PDB file Syntax pdbdistplot PDBid pdbdistplot PDBid Distance Arguments PDBid Unique identifier for a protein structure record Each
187. blastncbi
Examples:
    % Get a sequence from the Protein Data Bank and create a MATLAB structure.
    S = getpdb('1CIV')
    % Use the structure as input for a BLAST search with an expectation of 1e-10.
    blastncbi(S, 'blastp', 'expect', 1e-10)
    % Click the URL link (Link to NCBI BLAST Request) to go directly to the NCBI request.
    % You can also try a search directly with an accession number and an alternative scoring matrix.
    RID = blastncbi('AAA59174', 'blastp', 'matrix', 'PAM70', 'expect', 1e-10)
    % The results based on the RID are at http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi
    % or pass the RID to BLASTREAD to parse the report and load it into a MATLAB structure.
    blastread(RID)
See Also: Bioinformatics Toolbox functions blastread, getblast

blastread
Purpose: Read data from NCBI BLAST report file
Syntax: Data = blastread(File)
Arguments: File  NCBI BLAST formatted report file. Enter a file name
188. TA TA seqshowwords GCTATAACGTATATATATA TA TA TA ans Start 3 10 Stop 6 19 000001 GCTATAACGTATATATATA See Also Bioinformatics Toolbox functions palindromes cleave restrict seqdisp seqtool seqwordcount MATLAB functions strfind regexp 2 677 seqtool Purpose Syntax Arguments Description Example 2 678 Open tool to interactively explore biological sequences seqtool Seq seqtool PropertyName PropertyValue seqtool Alphabet AlphabetValue Seq Struct with a field Sequence a character array or a file name with an extension of gbk gpt fasta fa or ebi seqtool Seq loads a sequence Seq into the seqtool GUI seqtool PropertyName PropertyValue defines optional properties using property name value pairs seqtool Alphabet AlphabetValue specifies an alphabet AlphabetValue for the sequence Seq Default is AA except when all of the symbols in the sequence are A C G T and then AlphabetValue is set to NT Use AA when you want to force an amino acid sequence alphabet 1 Get a sequence from Genbank S getgenbank M10051 2 Open the sequence tool window with the sequence seqtool S seqtool See Also Sequence Viewer HUMINSR Fie Edit Sequence Display Window Help a RRS H 2 Line length 60 M10051 Human insulin receptor mRNA complete cds Sequenc
189. [Figure: ion intensity versus Mass/Charge (M/Z), 0 to 15000]
3. Plot the estimated baseline for the fourth spectrum in Y_lo_res using an anonymous function to describe an m/z dependent parameter.
    wf = @(mz) 200 + .001 .* mz;
    msbackadj(MZ_lo_res, Y_lo_res(:,4), 'STEPSIZE', wf);
[Figure: Spectrogram ID 1, showing the original spectrogram, the regressed baseline, and the estimated baseline points; x-axis Mass/Charge (M/Z), 0 to 15000]
See Also: Bioinformatics Toolbox functions msalign, mslowess, msheatmap, msnorm, mspeaks, msresample, mssgolay, msviewer

msdotplot
Purpose: Plot set of peak lists from LC/MS or GC/MS data set
Syntax:
    msdotplot(Peaks, Times)
    msdotplot(FigHandle, Peaks, Times)
    msdotplot(..., 'Quantile', QuantileValue)
    PlotHandle = msdotplot(...)
Arguments:
Peaks  Cell array of peak lists, where each element is a two-column matrix with m/z values in the first column and ion intensity values in the second column. Each element corresponds to a spectrum or retention time.
Tip: You can use the mzxml2peaks function
190. UG and stop codons UAA UAG and UGA seqshoworfs SeqnT PropertyName PropertyValue calls seqshoworfs with optional properties that use property name property value pairs You can specify one or more properties in any order Each PropertyName must be enclosed in single quotes and is case insensitive These property name property value pairs are as follows 2 671 seqshoworfs seqshoworfs SeqNT Frames FramesValue specifies the reading frames to display The default is to display the first second and third reading frames with ORFs highlighted in each frame seqshoworfs SeqnT GeneticCode GeneticCodeValue specifies the genetic code to use for finding open reading frames seqshoworfs SeqnT MinimumLength_ MinimumLengthValue sets the minimum number of codons for an ORF to be considered valid The default value is 10 seqshoworfs SeqnT AlternativeStartCodons AlternativeStartCodonsValue uses alternative start codons if AlternativeStartCodons is set to true For example in the human mitochondrial genetic code AUA and AUU are known to be alternative start codons For more details on alternative start codons see http www ncbi nlm nih gov Taxonomy Utils wprintgc cgi mode t SG1 seqshoworfs SeqNT Color ColorValue selects the color used to highlight the open reading frames in the output display The default color scheme is blue for the first reading fr
191. User Manual ImaGene is a registered trademark of BioDiscovery Inc 1 Read in a sample ImaGene Results file Note the file cy3 txt is not provided with Bioinformatics Toolbox cy3Data imageneread cy3 txt 2 Plot the signal mean maimage cy3Data Signal Mean 3 Read in a sample ImaGene Results file Note the file cy5 txt is not provided with Bioinformatics Toolbox cy5Data imageneread cy5 txt imageneread 4 Create a loglog plot of the signal median from two ImaGene Results files sigMedianCol find strcmp Signal Median cy3Data ColumnNames cy3Median cy3Data Data sigMedianCol cy5Median cy5Data Data sigMedianCol maloglog cy3Median cy5Median title Signal Median See Also Bioinformatics Toolbox functions gprread maboxplot maimage sptread 2 325 int2aa Purpose Syntax Arguments Return Values 2 326 Convert amino acid sequence from integer to letter representation SeqChar int2aa SeqInt SeqChar int2aa SeqIint Case CaseValue SeqInt Row vector of integers specifying an amino acid sequence See the table Mapping Amino Acid Integers to Letters on page 2 326 for valid integers Integers are arbitrarily assigned to IUB IUPAC letters CaseValue String that specifies the case of the returned character string Choices are upper default or lower SeqChar Character string of single letter codes specifying an amino acid sequence Mapping Amino
192. Value isoelectric(..., 'Charge', ChargeValue) isoelectric(..., 'Chart', ChartValue)
SeqAA  Amino acid sequence. Enter a character string or a vector of integers from the table. Examples: 'ARN' or [1 2 3].
PKValsValue  Property to provide alternative pK values.
ChargeValue  Property to select a specific pH for estimating charge. Enter a number between 0 and 14. The default value is 7.2.
ChartValue  Property to control plotting a graph of charge versus pH. Enter true or false.
pI = isoelectric(SeqAA) returns the estimated isoelectric point (pI) for an amino acid sequence. The isoelectric point is the pH at which the protein has a net charge of zero.
[pI, Charge] = isoelectric(SeqAA) returns the estimated isoelectric point (pI) for an amino acid sequence and the estimated charge for a given pH (default is typical intracellular pH 7.2).
The estimates are skewed by the underlying assumptions that all amino acids are fully exposed to the solvent, that neighboring peptides have no influence on the pK of any given amino acid, and that the constitutive amino acids, as well as the N- and C-termini, are unmodified. Cysteine residues participating in disulfide bridges also affect the true pI and are not considered here. By default, isoelectric uses the EMBOSS amino acid pK table, or you can substitute other values using the property PKVals.
• If the sequence contains ambiguous amino acid characters b z isoelectric
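The definition of the isoelectric point as the pH of zero net charge can be illustrated with a Henderson-Hasselbalch charge model and a bisection search. This is a sketch under stated assumptions, not the isoelectric implementation: the pK values below are illustrative placeholders (not necessarily the table the toolbox uses), the sequence is made up, and every ionizable group is treated as independent and fully solvent exposed.

```matlab
% Sketch only: estimate pI by bisection on a simple net-charge model.
seq = 'ACDKRH';                            % hypothetical peptide
posRes = 'KRH';  posPK = [10.8 12.5 6.5];  % assumed side-chain pK values
negRes = 'DECY'; negPK = [3.9 4.1 8.5 10.1];
nTermPK = 8.6;   cTermPK = 3.6;            % assumed terminal pK values
nPos = arrayfun(@(c) sum(seq == c), posRes);   % counts of K, R, H
nNeg = arrayfun(@(c) sum(seq == c), negRes);   % counts of D, E, C, Y
netCharge = @(pH) 1 ./ (1 + 10.^(pH - nTermPK)) ...
    + sum(nPos ./ (1 + 10.^(pH - posPK))) ...
    - 1 ./ (1 + 10.^(cTermPK - pH)) ...
    - sum(nNeg ./ (1 + 10.^(negPK - pH)));
lo = 0; hi = 14;                           % charge is positive at lo, negative at hi
for k = 1:60
    mid = (lo + hi) / 2;
    if netCharge(mid) > 0, lo = mid; else, hi = mid; end
end
pI = (lo + hi) / 2;                        % pH where the modeled net charge is zero
chargeAtDefault = netCharge(7.2);          % modeled charge at pH 7.2
```

Because the modeled charge decreases monotonically with pH, bisection between pH 0 and 14 converges to a single crossing.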
193. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
D = seqpdist(Seqs, ...'Method', MethodValue, ...) specifies a method to compute distances between every pair of sequences. Choices are shown in the following tables.

Methods for Nucleotides and Amino Acids

Method: p-distance
Description: Proportion of sites at which the two sequences are different. p is close to 1 for poorly related sequences, and p is close to 0 for similar sequences.
$$d = p$$

Method: Jukes-Cantor (default)
Description: Maximum likelihood estimate of the number of substitutions between two sequences. p is described with the method p-distance.
For nucleotides: $$d = -\tfrac{3}{4}\,\log\!\left(1 - \tfrac{4}{3}\,p\right)$$
For amino acids: $$d = -\tfrac{19}{20}\,\log\!\left(1 - \tfrac{20}{19}\,p\right)$$

Method: alignment-score
Description: Distance d between two sequences (1, 2) is computed from the pairwise alignment score between the two sequences (score12) and the pairwise alignment score between each sequence and itself (score11, score22) as follows:
$$d = \left(1 - \frac{\mathrm{score12}}{\mathrm{score11}}\right)\left(1 - \frac{\mathrm{score12}}{\mathrm{score22}}\right)$$
This option does not imply that prealigned input sequences will be realigned; it only scores them. Use with care: this distance method does not comply with the ultrametric condition. In the rare case where the score between sequences is greater than the score when aligning a sequence w
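The p-distance and the Jukes-Cantor correction for nucleotides can be worked through on a small prealigned pair. This base MATLAB sketch assumes two gap-free sequences of equal length; the sequences and variable names are illustrative, and seqpdist itself is not called.

```matlab
% Sketch only: p-distance and Jukes-Cantor distance for two aligned
% nucleotide sequences of equal length (hypothetical data, no gaps).
s1 = 'ACGTACGTAC';
s2 = 'ACGTTCGTAA';
p = mean(s1 ~= s2);                 % proportion of differing sites
dJC = -3/4 * log(1 - 4*p/3);        % Jukes-Cantor estimate for nucleotides
```

Here two of the ten sites differ, so p is 0.2 and the corrected distance is about 0.233. The correction is always at least p and grows without bound as p approaches 3/4.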
194. _lo_res
    Y = Y_lo_res(:,[1 2 5 6]);
    MZ = MZ_lo_res;
    plot(MZ, Y(:,4));
[Figure: fourth spectrum before normalization]
2. Normalize the AUC of every spectrum to its median, eliminating low-mass noise, and post-rescaling such that the maximum intensity is 100.
    Y1 = msnorm(MZ, Y, 'Limits', [1000 inf], 'Max', 100);
    plot(MZ, Y1(:,4));
[Figure: fourth spectrum after normalization; intensity axis 0 to 100]
3. Normalize the ion intensity of every spectrum to the maximum intensity of the single highest peak from any of the spectra in the range above 1000 m/z.
    Y2 = msnorm(MZ, Y, 'QUANTILE', [1 1], 'LIMITS', [1000 inf]);
Example 2
1. Select MZ regions where the intensities are within the third quartile in at least 90% of the spectrograms.
    [Y3, S] = msnorm(MZ, Y, 'Quantile', [0.5 0.75], 'Consensus', 0.9);
2. Use the same MZ regions to normalize another set of spectrograms.
    Y4 = msnorm(MZ, Y, S);
See Also: Bioinformatics Toolbox functions msalign, msbackadj, msheatmap, mslowess, msresample, mssgolay, msviewer

mspalign
Purpose: Align mass spectra from multiple peak lists from LC/MS or GC/MS data set
Syntax:
    [CMZ, AlignedPeaks] = mspalign(Peaks)
    [CMZ, AlignedPeaks] = mspalign(Peaks, ...'Quantile', QuantileValue, ...)
    [CMZ, AlignedPeaks] = mspalign(Peaks, ...'EstimationMethod', EstimationMethodValue, ...)
    [CMZ, AlignedPeaks] = mspalign(Peaks, ...'Correcti
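Normalizing the area under the curve (AUC) of every spectrum to the median AUC, as in step 2 of the msnorm example, can be sketched with trapezoidal integration in base MATLAB. The data and variable names below are made up for illustration; the m/z limits and the final rescaling to a maximum of 100 are left out, and this is not the msnorm code.

```matlab
% Sketch only: scale each spectrum (column) so its AUC equals the median AUC.
MZ = (1000:1000:5000)';                       % hypothetical m/z vector
Y  = [1 2; 2 4; 3 6; 2 4; 1 2];               % two hypothetical spectra
auc = trapz(MZ, Y);                           % AUC of each column
scale = median(auc) ./ auc;                   % per-spectrum scaling factor
Yn = Y .* repmat(scale, size(Y, 1), 1);       % normalized spectra
aucAfter = trapz(MZ, Yn);                     % every entry equals median(auc)
```

With these values the two spectra have AUCs of 8000 and 16000, and both are brought to the median of 12000.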
195. [Figure: dot plot of the LC/MS data; x-axis Mass/Charge (M/Z), y-axis Retention Time]
3. Resample the data, then create a heat map and a dot plot of the LC/MS data.
    [MZ, Y] = msppresample(peaks, 5000);
    msheatmap(MZ, ret_time, log(Y))
[Figure: heat map; x-axis Mass/Charge (M/Z), y-axis Retention Time, color scale Relative Intensity]
    msdotplot(peaks, ret_time)
[Figure: heat map with the dot plot overlaid; x-axis Mass/Charge (M/Z), y-axis Retention Time]
4. Zoom in on the heat map to see the detail.
    axis([470 520 3200 3600])
[Figure: zoomed heat map; Mass/Charge (M/Z) 470 to 520, Retention Time 3200 to 3600]
See Also: Bioinformatics Toolbox functions msheatmap, mspalign, mspeaks, msppresample, mzxml2peaks, mzxmlread

msheatmap
Purpose: Create pseudocolor image of set of mass spectra
Syntax:
    msheatmap(MZ, Intensities)
    msheatmap(MZ, Ti
196. a MAT file included with Bioinformatics Toolbox which contains Affymetrix data variables including dependentData and mavolcanoplot References See Also independentData two matrices of gene expression values from two experimental conditions load prostatecancerexpdata 2 Use the mattest function to calculate p values for the gene expression values in the two matrices pvalues mattest dependentData independentData 3 Using the two matrices the pvalues calculated by mattest and the probesetIDs column vector of labels provided use mavolcanoplot to create a significance versus gene expression ratio scatter plot of the microarray data from the two experimental conditions mavolcanoplot dependentData independentData pvalues Labels probesetIDs The prostatecancerexpdata mat file used in the previous example contains data from Best et al 2005 1 Cui X Churchill G A 2003 Statistical tests for differential expression in cDNA microarray experiments Genome Biology 4 210 2 Best C J M Gillespie J W Yi Y Chandramouli G V R Perlmutter M A Gathright Y Erickson H S Georgevich L Tangrea M A Duray P H Gonzalez S Velasco A Linehan W M Matusik R J Price D K Figg W D Emmert Buck M R and Chuaqui R F 2005 Molecular alterations in primary prostate cancer after androgen ablation therapy Clinical Cancer Research 11 6823 6834 Bioinformatics Toolbox functi
197. a figure window and draws a graph represented by a biograph object BGobj When the biograph object is already drawn in the figure window this function only updates the graph properties BGobjHandle view 8Gobj returns a handle to a deep copy of the biograph object BGobj in the figure window When updating an existing figure you can use the returned handle to change object properties programmatically or from the command line When you close the figure window the handle is no longer valid The original biograph object BGobj is left unchanged 1 Create a biograph object cm 0 1100 10011 10000 00001 1 010 0 bg biograph cm 2 Render the biograph object into a Handles Graphic figure and get back a handle h view bg 3 Change the color of all nodes and edges set h Nodes Color 5 7 1 set h Edges LineColor 0 0 0 Bioinformatics Toolbox function biograph object constructor Bioinformatics Toolbox object biograph object view biograph Bioinformatics Toolbox methods of a biograph object dolayout getancestors getdescendants getedgesbynodeid getnodesbyid getrelatives view MATLAB functions get set 4 81 view phytree 4 82 Purpose Syntax Arguments Description Example See Also View phylogenetic tree view Tree view Tree IntNodes Tree Phylogenetic tree phytree object created with the function phytree IntNodes Nodes from the phytree object to initiall
198. a guiding tree dist seqpdist p53 ScoringMatrix gonnet tree seqlinkage dist UPGMA p53 Phylogenetic tree object with 7 leaves 6 branches 3 Score the progressive alignment with the PAM family ma multialign p53 tree ScoringMatrix pam150 pam200 pam250 showalignment ma 2 501 multialign Aligned Sequences a jol x amp E P53_XENLA 69 264 CAVPSTDD YAGKYGLOLDFQQNG TAKSVTCTYSPELNKLFCOLAKT P53_ONCMY 83 278 STVPTTSD YPGALGFOLRFLOS5 TAKSVTCT YS PDLNKLFCQOLAK P53_BRARE 63 257 STVPETSD YPGDHGFRLRF POSG TAKSVTCT YS PDLNKLFCQOLAR P53 HUMAN 95 289 SSVPSOKT YOGS YGFRLGFLHSG TAKSVTCT YS PALNKMFCOLAR P53 _ORYLA 80 270 TTVPVTTD YPGS YELELRFQKS5G TAKSVTSTYSETLNKL YCOLAK P73 HUMAN 113 309 PVIPSNTD YPGPHHFEVTF QOS 5 TAKSATWT YS PLLKEKL YCQTAR 927937 LOLFO 120 314 PSVPS NIK YPGEYVFEMSFAOPSKETKSTTWT YSEKLDKLYVRMAT Example 2 1 Enter an array of sequences seqs CACGTAACATCTC ACGACGTAACATCTTCT AAACGTAACATCTCGC 2 Promote terminations with gaps in the alignment multialign seqs terminalGapAdjust true ans CACGTAACATCTC ACGACGTAACATCTTCT AAACGTAACATCTCGC 2 502 multialign 3 Compare alignment without termination gap adjustment multialign seqs ans CA CGTAACATCT C ACGACGTAACATCTTCT AA ACGTAACATCTCGC See Also Bioinformatics Toolbox functions hmmprofalign multialignread nwalign profalign seqprofile seqcon
199. a one-to-one mapping. For amino acids with more than one possible nucleotide codon, this function randomly selects a codon corresponding to that particular amino acid. For the ambiguous characters B and Z, one of the amino acids corresponding to the letter is selected randomly, and then a codon sequence is selected randomly. For the ambiguous character X, a codon sequence is selected randomly from all possibilities.
aa2nt(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs.
aa2nt(..., 'GeneticCode', GeneticCodeValue) selects a genetic code (GeneticCodeValue) to use when converting an amino acid sequence (SeqAA) to a nucleotide sequence (SeqNT).
aa2nt(..., 'Alphabet', AlphabetValue) selects a nucleotide alphabet (AlphabetValue).

Standard Genetic Code

Amino Acid                          Codons
Alanine (A)                         GCT GCC GCA GCG
Arginine (R)                        CGT CGC CGA CGG AGA AGG
Asparagine (N)                      AAT AAC
Aspartic acid, Aspartate (D)        GAT GAC
Cysteine (C)                        TGT TGC
Glutamine (Q)                       CAA CAG
Glutamic acid, Glutamate (E)        GAA GAG
Glycine (G)                         GGT GGC GGA GGG
Phenylalanine (F)                   TTT TTC
Proline (P)                         CCT CCC CCA CCG
Serine (S)                          TCT TCC TCA TCG AGT AGC
Threonine (T)                       ACT ACC ACA ACG
Tryptophan (W)                      TGG
Tyrosine (Y)                        TAT TAC
Valine (V)                          GTT GTC GTA GTG
Aspartic acid or Asparagine (B)     Random codon from D and N
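The random codon selection described for aa2nt can be sketched with a small lookup structure and randi. The table below covers only three amino acids taken from the Standard Genetic Code listing, the struct layout and variable names are illustrative assumptions, and ambiguous characters (B, Z, X) are not handled.

```matlab
% Sketch only: back-translate an amino acid string by picking one codon
% at random for each residue (partial, hypothetical lookup table).
codonTable.A = {'GCT', 'GCC', 'GCA', 'GCG'};   % Alanine
codonTable.F = {'TTT', 'TTC'};                 % Phenylalanine
codonTable.W = {'TGG'};                        % Tryptophan
seqAA = 'AFW';
seqNT = '';
for k = 1:numel(seqAA)
    options = codonTable.(seqAA(k));           % codons for this residue
    seqNT = [seqNT options{randi(numel(options))}]; %#ok<AGROW>
end
```

Repeated runs can give different nucleotide sequences for the same input, which is why the mapping from amino acids to nucleotides is not one-to-one.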
200. a path and file name, or a URL pointing to a file. File can also be a MATLAB character array that contains the text for an NCBI BLAST report.
BLAST (Basic Local Alignment Search Tool) reports offer a fast and powerful comparative analysis of interesting protein and nucleotide sequences against known structures in existing online databases. BLAST reports can be lengthy, and parsing the data from the various formats can be cumbersome.
Data = blastread(File) reads a BLAST report from an NCBI formatted file (File) and returns a data structure (Data) containing fields corresponding to the BLAST keywords. blastread parses the basic BLAST reports BLASTN, BLASTP, BLASTX, TBLASTN, and TBLASTX.
Data contains the following fields:
RID
Algorithm
Query
Database
Hits.Name
Hits.Length
Hits.HSP.Score
Hits.HSP.Expect
Hits.HSP.Identities
Hits.HSP.Positives
Hits.HSP.Gaps
Hits.HSP.Frame
Hits.HSP.Strand
Hits.HSP.Alignment
Hits.HSPs.QueryIndices
Hits.HSPs.SubjectIndicies
Statistics
Examples:
1. Create a BLAST request with a GenPept accession number.
    RID = blastncbi('AAA59174', 'blastp', 'expect', 1e-10)
2. Pass the RID to getblast, download the report, and save the report to a text file.
    getblast(RID, 'ToFile', 'AAA59174_BLAST.rpt')
3. Using the saved file, read
201. abel for the corresponding spectrum. These values or strings are used to label the y-axis of the heat map. Note: If input Times is provided, it is assumed that Intensities contains LC/MS or GC/MS data, and SpecIdxValue is ignored.
GroupValue: Either of the following:
    - Vector of values with the same number of elements as rows in the matrix Intensities
    - Cell array of strings with the same number of elements as rows (spectra) in the matrix Intensities
Each value or string specifies a group to which the corresponding spectrum belongs. The spectra are sorted and combined into groups along the y-axis in the heat map. Note: If input Times is provided, it is assumed that Intensities contains LC/MS or GC/MS data, and GroupValue is ignored.
ResolutionValue: Value specifying the horizontal resolution of the heat map image. Increase this value to enhance details. Decrease this value to reduce memory usage. Default is:
    - 0.5 when MZ contains > 2,500 elements
    - 0.05 when MZ contains < 2,500 elements
Description
msheatmap(MZ, Intensities) displays a pseudocolor heat map image of the intensities for the spectra in matrix Intensities.
msheatmap(MZ, Times, Intensities) displays a pseudocolor heat map image of the intensities for the spectra in matrix Intensities, using the retention times in vector Times to label the y-axis.
msheatmap(..., 'PropertyName', PropertyValue, ...) cal
202. ageneread(File)
imagenedata = imageneread(..., 'CleanColNames', CleanColNamesValue, ...)
Arguments
File: ImaGene Results formatted file. Enter a file name, or a path and file name.
CleanColNamesValue: Property to control creating column names that MATLAB can use as variable names.
Description
imagenedata = imageneread(File) reads ImaGene results data from File and creates a MATLAB structure (imagenedata) containing the following fields: HeaderAA, Data, Blocks, Rows, Columns, Fields, IDs, ColumnNames, Indices, Shape.
imagenedata = imageneread(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs, described as follows.
imagenedata = imageneread(..., 'CleanColNames', CleanColNamesValue, ...). An ImaGene file may contain column names with spaces and some characters that MATLAB cannot use in MATLAB variable names. If CleanColNamesValue is true, imageneread returns, in the field ColumnNames, names that are valid MATLAB variable names and names that you can use in functions. By default, CleanColNamesValue is false and the field ColumnNames may contain characters that are not valid for MATLAB variable names.
The field Indices of the structure contains MATLAB indices that you can use for plotting heat maps of the data with the function image or imagesc.
For more details on the ImaGene format and example data, see the ImaGene
203. agment respective to the original sequence.
[Fragments, CuttingSites, Lengths] = cleave(...) returns a numeric vector with the lengths of every fragment.
cleave(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs.
cleave(..., 'PartialDigest', PartialDigestValue) simulates a partial digestion, where PartialDigest is the probability of a cleavage site being cut.
The following table lists some common proteases and their cleavage sites.
    Protease           Peptide Pattern   Position
    Trypsin            [KR](?!P)         1
    Chymotrypsin       [WYF](?!P)        1
    Glutamine C        [ED](?!P)         1
    Lysine C           [K](?!P)          1
    Aspartic acid N    D                 1
Examples
1 Get a protein sequence from the GenPept database.
    S = getgenpept('AAA59174')
2 Cleave the sequence using trypsin. Trypsin cleaves after K or R when the next residue is not P.
    [parts, sites, lengths] = cleave(S.Sequence, '[KR](?!P)', 1);
    for i = 1:10
        fprintf('%5d%5d   %s\n', sites(i), lengths(i), parts{i})
    end
        0    6   MGTGGR
        6    1   R
        7   34   GAAAAPLLVAVAALLLGAAGHLYPGEVCPGMDIR
       41    5   NNLTR
       46   21   LHELENCSVIEGHLQILLMFK
       67    7   TRPEDFR
       74    6   DLSFPK
       80   12   LIMITDYLLLFR
       92    8   VYGLESLK
      100   10   DLFPNLTVIR
See Also
Bioinformatics Toolbox functions: rebasecuts, restrict, seqshowwords
MATLAB function: regexp
clustergram
Purpose: Create dendrogram and heat map
Syntax
clustergram(Data)
clustergram(Data, ..., 'RowLabels'
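The trypsin rule in the example (cut after K or R unless the next residue is P) maps onto a regular expression with a negative lookahead. The following Python sketch reproduces the first four fragments of the example output; the function name `cleave` mirrors the toolbox function only for readability, and its signature and return layout here are assumptions.

```python
import re


def cleave(seq, pattern, position):
    """Split seq at every pattern match, cutting `position` residues
    after the start of the match. Returns (fragments, sites, lengths)."""
    cuts = [m.start() + position for m in re.finditer(pattern, seq)]
    # Ignore cuts at the very ends, which would produce empty fragments.
    cuts = [c for c in cuts if 0 < c < len(seq)]
    bounds = [0] + cuts + [len(seq)]
    fragments = [seq[a:b] for a, b in zip(bounds, bounds[1:])]
    sites = bounds[:-1]  # start offset of each fragment
    lengths = [len(f) for f in fragments]
    return fragments, sites, lengths


# First 46 residues of the example sequence, taken from the output above.
prefix = "MGTGGRRGAAAAPLLVAVAALLLGAAGHLYPGEVCPGMDIRNNLTR"
parts, sites, lengths = cleave(prefix, r"[KR](?!P)", 1)
for site, length, part in zip(sites, lengths, parts):
    print(f"{site:5d}{length:5d}   {part}")
```

The lookahead `(?!P)` checks the next residue without consuming it, so two adjacent cleavage residues (the `RR` at positions 6 and 7) each produce their own cut.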
204. ale factor used to calculate the score is provided by the scoring matrix.
[Score, Alignment] = swalign(Seq1, Seq2) returns a 3-by-N character array showing the two sequences, Seq1 and Seq2, in the first and third rows, and symbols representing the optimal local alignment between them in the second row. The symbol | indicates amino acids or nucleotides that match exactly. The symbol : indicates amino acids or nucleotides that are related as defined by the scoring matrix (nonmatches with a zero or positive scoring matrix value).
[Score, Alignment, Start] = swalign(Seq1, Seq2) returns a 2-by-1 vector of indices indicating the starting point in each sequence for the alignment.
... = swalign(Seq1, Seq2, ..., 'PropertyName', PropertyValue, ...) calls swalign with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
... = swalign(Seq1, Seq2, ..., 'Alphabet', AlphabetValue) specifies the type of sequences. Choices are 'AA' (default) or 'NT'.
... = swalign(Seq1, Seq2, ..., 'ScoringMatrix', ScoringMatrixValue, ...) specifies the scoring matrix to use for the local alignment. Default is:
    - 'BLOSUM50' when AlphabetValue equals 'AA'
    - 'NUC44' when AlphabetValue equals 'NT'
... = swalign(Seq1, Seq2, ..., 'Scale', ScaleValue, ...) specifies the scale factor u
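To make the three return values concrete, here is a minimal Smith-Waterman sketch in Python. It is not the swalign implementation: it assumes a flat match/mismatch score and a linear gap penalty instead of a scoring matrix such as BLOSUM50, and the parameter names are hypothetical. In this sketch `|` marks exact matches only, because a flat score has no notion of "related" residues.

```python
def smith_waterman(seq1, seq2, match=2, mismatch=-1, gap=-2):
    """Return (score, 3-row alignment, 1-based start indices)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    # Fill the score matrix; local alignment never drops below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    top, mid, bottom = [], [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(seq1[i - 1])
            bottom.append(seq2[j - 1])
            mid.append("|" if seq1[i - 1] == seq2[j - 1] else " ")
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(seq1[i - 1])
            bottom.append("-")
            mid.append(" ")
            i -= 1
        else:
            top.append("-")
            bottom.append(seq2[j - 1])
            mid.append(" ")
            j -= 1
    alignment = ["".join(reversed(row)) for row in (top, mid, bottom)]
    start = (i + 1, j + 1)  # 1-based start in seq1 and seq2
    return best, alignment, start


score, alignment, start = smith_waterman("TTACGTAA", "GGACGTCC")
print(score, start)
print("\n".join(alignment))
```

With a real substitution matrix, the per-pair score `s` would be looked up in the matrix and multiplied by the scale factor rather than chosen from two constants.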
205. alue
getgenbank(..., 'SequenceOnly', SequenceOnlyValue)
Arguments
AccessionNumber: Unique identifier for a sequence record. Enter a unique combination of letters and numbers.
ToFileValue: Property to specify the location and file name for saving data. Enter either a file name, or a path and file name supported by your system (ASCII text file).
FileFormatValue: Property to select the format for the file specified with the property ToFileValue. Enter either 'GenBank' or 'FASTA'.
SequenceOnlyValue: Property to control getting the sequence only. Enter either true or false.
Description
getgenbank retrieves nucleotide and amino acid sequence information from the GenBank database. This database is maintained by the National Center for Biotechnology Information (NCBI). For more details about the GenBank database, see http://www.ncbi.nlm.nih.gov/Genbank/
Data = getgenbank(AccessionNumber) searches for the accession number in the GenBank database and returns a MATLAB structure containing information for the sequence. If an error occurs while retrieving the GenBank formatted information, then an attempt is made to retrieve the FASTA formatted data.
getgenbank(AccessionNumber) displays information in the MATLAB Command Window without returning data to a variable. The displayed information includes hyperlinks to the URLs for searching and retrieving data.
getgenbank(..., 'PropertyName', PropertyValue, ...) defines optional properti
206. alue pairs are as follows:
graphtraverse(G, S, ..., 'Depth', DepthValue, ...) specifies the depth of the search. DepthValue is an integer indicating a node in graph G. Default is Inf (infinity).
graphtraverse(G, S, ..., 'Directed', DirectedValue, ...) indicates whether the graph is directed or undirected. Set DirectedValue to false for an undirected graph. This results in the upper triangle of the sparse matrix being ignored. Default is true.
graphtraverse(G, S, ..., 'Method', MethodValue, ...) lets you specify the algorithm used to traverse the graph. Choices are:
    - 'BFS': Breadth-first search. Time complexity is O(N+E), where N and E are the number of nodes and edges respectively.
    - 'DFS': Default algorithm. Depth-first search. Time complexity is O(N+E), where N and E are the number of nodes and edges respectively.
Examples
1 Create a directed graph with 10 nodes and 12 edges.
    DG = sparse([1 2 3 4 5 5 5 6 7 8 8 9], [2 4 1 5 3 6 7 9 8 1 10 2], true, 10, 10)
    h = view(biograph(DG))
    Biograph object with 10 nodes and 12 edges.
    [Figure: Biograph Viewer window showing the graph]
2 Traverse the graph to find the depth-first search (DFS) discovery order starting at node 4.
    order = graphtraverse(DG, 4)
3 Label the nodes with the DFS discovery order.
    for i = 1:10 h.Nodes(order
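A compact Python sketch of the two traversal methods may help when reading the example. It is not the graphtraverse implementation. The edge list is an assumed reading of the 10-node, 12-edge example graph, neighbors are visited in ascending node order, and `depth` is interpreted as the maximum number of edges followed from the start node; all three are assumptions.

```python
from collections import deque

# Assumed edge list for the 10-node, 12-edge directed example graph.
EDGES = [(1, 2), (2, 4), (3, 1), (4, 5), (5, 3), (5, 6),
         (5, 7), (6, 9), (7, 8), (8, 1), (8, 10), (9, 2)]


def build_adjacency(edges, directed=True):
    """Return {node: sorted list of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set())
        if not directed:
            adj[v].add(u)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def traverse(adj, start, method="DFS", depth=float("inf")):
    """Return nodes in discovery order; each node and edge is handled
    once, which gives the O(N+E) cost quoted above."""
    order, seen = [], {start}
    if method == "BFS":
        queue = deque([(start, 0)])
        while queue:
            node, d = queue.popleft()
            order.append(node)
            if d < depth:
                for nxt in adj[node]:
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append((nxt, d + 1))
        return order

    def visit(node, d):
        order.append(node)
        if d < depth:
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    visit(nxt, d + 1)

    visit(start, 0)
    return order


adj = build_adjacency(EDGES)
print(traverse(adj, 4))                # depth-first discovery order
print(traverse(adj, 4, method="BFS"))  # breadth-first discovery order
```

The recursive DFS is fine for small graphs like this one; a graph with long chains would need an explicit stack to stay within Python's recursion limit.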
207. ame. If you set CDFFile to ' ', then it opens the Select CDF File dialog box, from which you select the CDF file.
CELPathValue: String specifying the path and directory where the files specified in CELFiles are stored.
CDFPathValue: String specifying the path and directory where the file specified in CDFFile is stored.
PMOnlyValue: Property to include or exclude the mismatch (MM) probe intensity values in the returned structure. Enter true to return only perfect match (PM) probe intensities. Enter false to return both PM and MM probe intensities. Default is true.
VerboseValue: Controls the display of a progress report showing the name of each CEL file as it is read. When VerboseValue is false, no progress report is displayed. Default is true.
Return Values
ProbeStructure: MATLAB structure containing information from the CEL files, including probe intensities, probe indices, and probe set IDs.
Description
Note: This function is supported on the Windows 32 platform only.
ProbeStructure = celintensityread(CELFiles, CDFFile) reads the specified Affymetrix CEL files and the associated CDF library file, and then creates ProbeStructure, a structure containing information from the CEL files, including probe intensities, probe indices, and probe set IDs. CELFiles is a cell array of CEL file names. CDFFile is a string specifying a CDF file name. If you set CELFiles to '*', then it reads all CEL files
208. ame for each block or print-tip by dividing each block by the mean column intensity. The output is a matrix with each column corresponding to the normalized data for each block.
    - MAStruct: Microarray structure.
[XNorm, ColVal] = manorm(...) returns the values used to normalize the data.
manorm(..., 'Method', MethodValue) allows you to choose the method for scaling or centering the data. MethodValue can be 'Mean' (default), 'Median', 'STD' (standard deviation), 'MAD' (median absolute deviation), or a function handle. If you pass a function handle, then the function should ignore NaNs and must return a single value per column of the input data.
manorm(..., 'Extra_Args', Extra_ArgsValue) allows you to pass extra arguments to the function MethodValue. Extra_ArgsValue must be a cell array.
manorm(..., 'LogData', LogDataValue), when LogDataValue is true, works with log ratio data, in which case the mean (or MethodValue) of each column is subtracted from the values in the columns, instead of dividing the column by the normalizing value.
manorm(..., 'Percentile', PercentileValue) only uses the percentile (PercentileValue) of the data, preventing large outliers from skewing the normalization. If PercentileValue is a vector containing two values, then the range from the PercentileValue(1) percentile to the PercentileValue(2) percentile is used. The default value is 100, that is, to use all th
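The difference between the default scaling and the 'LogData' behavior can be sketched in a few lines of Python. This is an illustration rather than the manorm implementation; the function `normalize_columns` and its parameters are hypothetical, and the data is a plain list of rows.

```python
import math
from statistics import mean, median


def normalize_columns(data, method=mean, log_data=False):
    """Normalize each column by a single value computed from that column.

    Natural-scale data is divided by the column value; log-ratio data has
    the column value subtracted. NaN entries are skipped when computing
    the column value and stay NaN in the output.
    """
    n_cols = len(data[0])
    col_vals = []
    for j in range(n_cols):
        column = [row[j] for row in data if not math.isnan(row[j])]
        col_vals.append(method(column))
    if log_data:
        normalized = [[x - col_vals[j] for j, x in enumerate(row)]
                      for row in data]
    else:
        normalized = [[x / col_vals[j] for j, x in enumerate(row)]
                      for row in data]
    return normalized, col_vals


data = [[1.0, 10.0], [3.0, 30.0]]
print(normalize_columns(data))                 # divide by column means
print(normalize_columns(data, log_data=True))  # subtract column means
print(normalize_columns(data, method=median))  # median instead of mean
```

Subtracting on the log scale is the same operation as dividing on the natural scale, which is why the log-data branch changes the operator instead of the statistic.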
209. ame, red for the second, and green for the third frame.
seqshoworfs(SeqNT, ..., 'Columns', ColumnsValue) specifies how many columns per line to use in the output. The default value is 64.
Examples
Look for the open reading frames in a random nucleotide sequence.
    s = randseq(200, 'alphabet', 'dna');
    seqshoworfs(s);
Open Reading Frames
Frame 1
000001 TAGCTTCATCGTTGACTTCTACTAAAAGCAAGCTCCTGAGTAGCTGGCCAAGCGAGCTTGCTTG
000065 TGCCCGGCTGCGGCGGTTGTATCCTGAATACGCCATGCGCCAGTGGACTGCGTAGACCTATTTT
000129 CCAGCTGCGCCTGATGAAGGCGCAACACGAAGGAAAGACGGGACCCAGGGCGACGTCCTATTAA
000193 AAGATAAT
Frame 2
000001 TAGCTTCATCGTTGACTTCTACTAAAAGCAAGCTCCTGAGTAGCTGGCCAAGCGAGCTTGCTTG
000065 TGCCCGGCTGCGGCGGTTGTATCCTGAATACGCCATGCGCCAGTGGACTGCGTAGACCTATTTT
000129 CCAGCTGCGCCTGATGAAGGCGCAACACGAAGGAAAGACGGGACCCAGGGCGACGTCCTATTAA
000193 AAGATAAT
Frame 3
000001 TAGCTTCATCGTTGACTTCTACTAAAAGCAAGCTCCTGAGTAGCTGGCCAAGCGAGCTTGCTTG
000065 TGCCCGGCTGCGGCGGTTGTATCCTGAATACGCCATGCGCCAGTGGACTGCGTAGACCTATTTT
000129 CCAGCTGCGCCTGATGAAGGCGCAACACGAAGGAAAGACGGGACCCAGGGCGACGTCCTATTAA
000193 AAGATAAT
Identify the open reading frames in a GenBank sequence.
    HLA_DQB1 = getgenbank('NM_002123');
    seqshoworfs(HLA_DQB1.Sequence);
See Also
Bioinformatics Toolbox functions: codoncount, cpgisland, geneticcode, seqdisp, seqshowwords, seqtool, seqwordcount
MATLAB function: regexp
seqshowword
210. ample and Training must be matrices with the same number of columns. Group is a vector whose distinct values define the grouping of the rows in Training. Each row of Training belongs to the group whose value is the corresponding entry of Group. knnclassify assigns each row of Sample to the group for the closest row of Training. Group can be a numeric vector, a string array, or a cell array of strings. Training and Group must have the same number of rows. knnclassify treats NaNs or empty strings in Group as missing values and ignores the corresponding rows of Training. Class indicates which group each row of Sample has been assigned to, and is of the same type as Group.
Class = knnclassify(Sample, Training, Group, k) enables you to specify k, the number of nearest neighbors used in the classification. Default is 1.
Class = knnclassify(Sample, Training, Group, k, distance) enables you to specify the distance metric. Choices for distance are:
    'euclidean'     Euclidean distance (default)
    'cityblock'     Sum of absolute differences
    'cosine'        One minus the cosine of the included angle between points (treated as vectors)
    'correlation'   One minus the sample correlation between points (treated as sequences of values)
    'hamming'       Percentage of bits that differ (only suitable for binary data)
Class = knnclassify(Sample, Training, Group, k, distance, rule) enables you to specify the rule used to decide how to classify the sample. C
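The classification step described above can be sketched in plain Python. This is not the knnclassify implementation: it covers only Euclidean distance, treats `None` and empty strings in `group` as missing, and breaks voting ties in favor of the nearest neighbor, which is an assumption about the rule.

```python
import math
from collections import Counter


def knn_classify(sample, training, group, k=1):
    """Assign each row of sample to the majority group of its k nearest
    training rows (Euclidean distance)."""
    classes = []
    for point in sample:
        # Distance to every training row whose group label is present.
        dists = sorted(
            (math.dist(point, row), idx)
            for idx, row in enumerate(training)
            if group[idx] is not None and group[idx] != ""
        )
        nearest = [group[idx] for _, idx in dists[:k]]
        votes = Counter(nearest)
        top = max(votes.values())
        # Assumed tie-break: the closest neighbor among the tied groups wins.
        classes.append(next(g for g in nearest if votes[g] == top))
    return classes


training = [(0, 0), (0, 1), (5, 5), (6, 5)]
group = ["a", "a", "b", "b"]
sample = [(0.2, 0.1), (5.5, 5.0)]
print(knn_classify(sample, training, group))       # k = 1
print(knn_classify(sample, training, group, k=3))  # majority of 3
```

Swapping `math.dist` for another metric (city block, cosine, and so on) changes only the key used for sorting; the voting step stays the same.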
211. ans
Testing for Cycles in a Very Large Graph (Greater Than 20,000 Nodes and 30,000 Edges)
1 Download the Gene Ontology database to a geneont object.
    GO = geneont('live', true);
2 Convert the geneont object to a matrix.
    CM = getmatrix(GO);
3 Test for cycles in the graph.
    graphisdag(CM)
Creating a Random DAG
1 Create and view a random directed acyclic graph (DAG) with 15 nodes and 20 edges.
    g = sparse([], [], true, 15, 15);
    while nnz(g) < 20
        edge = randsample(15*15, 1); % get a random edge
        g(edge) = true;
        g(edge) = graphisdag(g);
    end
    view(biograph(g))
2 Test for cycles in the graph.
    graphisdag(g)
References
[1] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002). The Boost Graph Library User Guide and Reference Manual, (Upper Saddle River, NJ: Pearson Education).
See Also
Bioinformatics Toolbox functions: graphallshortestpaths, graphconncomp, graphisomorphism, graphisspantree, graphmaxflow, graphminspantree, graphpred2path, graphshortestpath, graphtopoorder, graphtraverse
Bioinformatics Toolbox method of biograph object: isdag
graphisomorphism
Purpose: Find isomorphism between two graphs
Syntax
[Isomorphic, Map] = graphisomorphism(G1, G2)
[Isomorphic, Map] = graphisomorphism(G1, G2, 'Directed', DirectedValue)
Arguments
G1: N-by-N sparse matrix that represents a directed or undirected graph. Nonzero entries in matrix G1
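The random-DAG recipe above (add a random edge, keep it only if the graph is still acyclic) can be mirrored in Python. The acyclicity test below uses Kahn's algorithm (repeatedly removing nodes with no incoming edges); this is a swapped-in technique chosen for brevity, not a claim about how graphisdag works. Function names and the 0-based node numbering are assumptions.

```python
import random


def is_dag(n_nodes, edges):
    """Kahn's algorithm: a directed graph is acyclic exactly when every
    node can be removed in topological order."""
    indegree = [0] * n_nodes
    children = [[] for _ in range(n_nodes)]
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    ready = [n for n in range(n_nodes) if indegree[n] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return removed == n_nodes


def random_dag(n_nodes, n_edges, rng):
    """Add random edges one at a time, rejecting any edge that closes a cycle."""
    edges = set()
    while len(edges) < n_edges:
        edge = (rng.randrange(n_nodes), rng.randrange(n_nodes))
        if edge not in edges and is_dag(n_nodes, edges | {edge}):
            edges.add(edge)
    return edges


dag = random_dag(15, 20, random.Random(1))
print(len(dag), is_dag(15, dag))
```

Self-loops are rejected automatically, because a node with an edge to itself never reaches an indegree of zero.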
212. ansValue to false when the data is already log scale. Default is true, which assumes the data is natural scale.
mairplot(..., 'FactorLines', FactorLinesValue, ...) adds lines to the plot showing a factor of N change. Default is 2, which corresponds to a level of 1 and -1 on a log2 scale. Tip: You can also change the factor lines interactively, after creating the plot.
mairplot(..., 'Title', TitleValue, ...) specifies a title for the plot.
mairplot(..., 'Labels', LabelsValue, ...) specifies a cell array of labels for the data. If labels are defined, then clicking a point on the plot shows the label corresponding to that point.
mairplot(..., 'Normalize', NormalizeValue, ...) controls the display of lowess normalized ratio values. Enter true to display lowess normalized ratio values. Default is false. Tip: You can also normalize the data from the MAIR Plot window, after creating the plot.
mairplot(..., 'LowessOptions', LowessOptionsValue, ...) lets you specify up to three property name/value pairs (in any order) that affect the lowess normalization. Choices for property name/value pairs are:
    - 'Order', OrderValue
    - 'Robust', RobustValue
    - 'Span', SpanValue
For more information on the previous three property name/value pairs, see the malowess function.
Following is an IR plot of normalized data.
[Figure: MAIR Plot window showing an IR plot of normalized data]
213. ant set normalization on probe intensities from multiple Affymetrix CEL or DAT files
    Compute Affymetrix probe affinities from their sequences and MM probe intensities
    Calculate range of gene expression profiles
    Calculate variance of gene expression profiles
    Perform GC Robust Multi-array Average (GCRMA) background adjustment, quantile normalization, and median-polish summarization on Affymetrix microarray probe-level data
    Perform GC Robust Multi-array Average (GCRMA) background adjustment on Affymetrix microarray probe-level data using sequence information
    Remove genes with low entropy expression values
    Remove gene profiles with low absolute values
    Remove gene profiles with small profile ranges
    Filter genes with small profile variance
    mainvarsetnorm    Perform rank invariant set normalization on gene expression values from two experimental conditions or phenotypes
    malowess          Smooth microarray data using Lowess method
    manorm            Normalize microarray data
    quantilenorm      Quantile normalization over multiple arrays
    rmabackadj        Perform background adjustment on Affymetrix microarray probe-level data using Robust Multi-array Average (RMA) procedure
    rmasummary        Calculate gene (probe set) expression values from Affymetrix microarray probe-level d
Statistical Learning
    classperf, crossvalind, knnclassify, knnimpute, optimalleaforder, randfeatures
214. aph Library User Guide and Reference Manual, (Upper Saddle River, NJ: Pearson Education).
See Also
Bioinformatics Toolbox functions: biograph (object constructor), graphallshortestpaths
Bioinformatics Toolbox object: biograph object
Bioinformatics Toolbox methods of a biograph object: conncomp, isdag, isomorphism, isspantree, maxflow, minspantree, shortestpath, topoorder, traverse
conncomp (biograph)
Purpose: Find strongly or weakly connected components in biograph object
Syntax
[S, C] = conncomp(BGObj)
[S, C] = conncomp(BGObj, ..., 'Directed', DirectedValue, ...)
[S, C] = conncomp(BGObj, ..., 'Weak', WeakValue, ...)
Arguments
BGObj: biograph object created by biograph (object constructor).
DirectedValue: Property that indicates whether the graph is directed or undirected. Enter false for an undirected graph. This results in the upper triangle of the sparse matrix being ignored. Default is true. A DFS-based algorithm computes the connected components. Time complexity is O(N+E), where N and E are the number of nodes and edges respectively.
WeakValue: Property that indicates whether to find weakly connected components or strongly connected components. A weakly connected component is a maximal group of nodes that are mutually reachable by violating the edge directions. Set WeakValue to true to find weakly connected components. Default is false, which finds s
215. aracters separately, asterisks are counted in a new field (Stop), and hyphens are counted in a new field (Gap).
aacount(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs.
aacount(..., 'Chart', ChartValue) creates a chart showing the relative proportions of the amino acids.
aacount(..., 'Others', OthersValue), when OthersValue is 'full', counts the ambiguous amino acid characters individually instead of adding them together in the field Others.
aacount(..., 'Structure', StructureValue), when StructureValue is 'full', blocks the unknown characters warning and ignores counting unknown characters.
    - aacount(SeqAA): Display 20 amino acids, and only if there are ambiguous and unknown characters, add an Others field with the counts.
    - aacount(SeqAA, 'Others', 'full'): Display 20 amino acids, 3 ambiguous amino acids, stops, gaps, and only if there are unknown characters, add an Others field with the unknown counts.
    - aacount(SeqAA, 'Structure', 'full'): Display 20 amino acids and always display an Others field. If there are ambiguous and unknown characters, add counts to the Others field; otherwise display 0.
    - aacount(SeqAA, 'Others', 'full', 'Structure', 'full'): Display 20 amino acids, 3 ambiguous amino acids, stops, gaps, and Others field. If there are unknown characters, add counts to the Others field; otherwise display 0.
Examples
1 C
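The first two display modes in the list above can be sketched as follows. This is a hedged Python illustration, not the aacount implementation: the function `aa_count` and its `others_full` flag are hypothetical, the three ambiguous characters are taken to be B, Z, and X, and folding stops and gaps into Others in the default mode is an assumption.

```python
STANDARD = "ARNDCQEGHILKMFPSTWYV"  # the 20 standard amino acid letters


def aa_count(seq, others_full=False):
    """Count residues; report ambiguous letters, stops, and gaps either
    lumped into 'Others' (default) or as separate fields (others_full)."""
    counts = {aa: 0 for aa in STANDARD}
    extra = {"B": 0, "Z": 0, "X": 0, "Stop": 0, "Gap": 0}
    unknown = 0
    for ch in seq.upper():
        if ch in counts:
            counts[ch] += 1
        elif ch in "BZX":
            extra[ch] += 1
        elif ch == "*":
            extra["Stop"] += 1
        elif ch == "-":
            extra["Gap"] += 1
        else:
            unknown += 1
    if others_full:
        counts.update(extra)  # separate fields for B, Z, X, Stop, Gap
        others = unknown
    else:
        others = unknown + sum(extra.values())
    if others:
        counts["Others"] = others  # only present when there is something to report
    return counts


print(aa_count("AARN*-BX?"))
print(aa_count("AARN*-BX?", others_full=True))
```

The 'Structure' option described above would correspond to always emitting the Others key, even when its count is 0.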
216. are the number of nodes and edges respectively.
    - 'BFS': Breadth-first search. Assumes all weights to be equal, and nonzero entries in the N-by-N adjacency matrix to represent edges. Time complexity is O(N+E), where N and E are the number of nodes and edges respectively.
    - 'Acyclic': Assumes the graph represented by the N-by-N adjacency matrix extracted from a biograph object, BGObj, to be a directed acyclic graph and that weights of the edges are nonzero entries in the N-by-N adjacency matrix. Time complexity is O(N+E), where N and E are the number of nodes and edges respectively.
    - 'Dijkstra': Default algorithm. Assumes weights of the edges to be positive values in the N-by-N adjacency matrix. Time complexity is O(log(N)*E), where N and E are the number of nodes and edges respectively.
Column vector that specifies custom weights for the edges in the N-by-N adjacency matrix extracted from a biograph object, BGObj. It must have one entry for every nonzero value (edge) in the N-by-N adjacency matrix. The order of the custom weights in the vector must match the order of the nonzero values in the N-by-N adjacency matrix when it is traversed column-wise. This property lets you use zero-valued weights. By default, shortestpath gets weight information from the nonzero entries in the N-by-N adjacency matrix.
Description
Tip: For introductory information on graph theory functions, see
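As a reference point for the 'Dijkstra' option, here is a minimal Python sketch of Dijkstra's algorithm with a binary heap, which is where the log(N) factor in the quoted complexity comes from. It is an illustration, not the toolbox method: the adjacency format (`{node: {neighbor: weight}}`) and the function name are assumptions.

```python
import heapq


def dijkstra(adj, source, target):
    """Return (distance, path) from source to target.

    adj maps each node to {neighbor: weight}; all weights must be positive.
    Returns (inf, []) when target is unreachable.
    """
    dist = {source: 0}
    pred = {}
    done = set()
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if node == target:
            break
        for nxt, weight in adj.get(node, {}).items():
            if weight <= 0:
                raise ValueError("Dijkstra requires positive edge weights")
            candidate = d + weight
            if candidate < dist.get(nxt, float("inf")):
                dist[nxt] = candidate
                pred[nxt] = node
                heapq.heappush(heap, (candidate, nxt))
    if target not in dist:
        return float("inf"), []
    # Walk the predecessor links back from the target.
    path = [target]
    while path[-1] != source:
        path.append(pred[path[-1]])
    return dist[target], path[::-1]


adj = {1: {2: 7, 3: 2}, 2: {4: 1}, 3: {2: 3, 4: 8}, 4: {}}
print(dijkstra(adj, 1, 4))
```

The positive-weight requirement is what lets the algorithm finalize a node the first time it is popped; zero or negative weights need a different method.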
217. are true (default) or false.
Return Values
ExpressionMatrix: Matrix of log expression values where each row corresponds to a gene (probe set) and each column corresponds to an Affymetrix CEL file, which represents a single chip.
Description
ExpressionMatrix = gcrma(PMMatrix, MMMatrix, ProbeIndices, AffinPM, AffinMM) performs GCRMA background adjustment, quantile normalization, and median-polish summarization on Affymetrix microarray probe-level data using probe affinity data. ExpressionMatrix is a matrix of log expression values where each row corresponds to a gene (probe set) and each column corresponds to an Affymetrix CEL file, which represents a single chip. Note: There is no column in ExpressionMatrix that contains probe set or gene information.
ExpressionMatrix = gcrma(PMMatrix, MMMatrix, ProbeIndices, SequenceMatrix) performs GCRMA background adjustment, quantile normalization, and Robust Multi-array Average (RMA) summarization on Affymetrix microarray probe-level data using probe sequence data to compute probe affinity data. ExpressionMatrix is a matrix of log expression values where each row corresponds to a gene (probe set) and each column corresponds to an Affymetrix CEL file, which represents a single chip. Note: If AffinPM and AffinMM affinity data and SequenceMatrix sequence data are not available, you can still use the gcrma function by entering an empty matrix for these inputs in
218. are true or false (default).
[Figure: plot window with a cubic polynomial fit]
Examples
1 Load the MAT file, included with Bioinformatics Toolbox, that contains Affymetrix data from a prostate cancer study, specifically probe intensity data from Affymetrix HG-U133A GeneChip arrays. The two variables in the MAT file, dependentData and independentData, are two matrices of gene expression values from two experimental conditions.
    load prostatecancerexpdata
2 Use the mattest function to calculate p-values for the gene expression values in the two matrices.
    pvalues = mattest(dependentData, independentData, 'permute', true);
3 Use the mafdr function to calculate positive FDR values and q-values for the gene expression values in the two matrices and plot the data.
    [fdr, q] = mafdr(pvalues, 'showplot', true);
The prostatecancerexpdata.mat file used in this example contains data from Best et al., 2005.
References
[1] Best, C.J.M., Gillespie, J.W., Yi, Y., Chandramouli, G.V.R., Perlmutter, M.A., Gathright, Y., Erickson, H.S., Georgevich, L., Tangrea, M.A., Duray, P.H., Gonzalez, S., Velasco, A., Linehan, W.M., Matusik, R.J., Price, D.K., Figg, W.D., Emmert-Buck, M.R., and Chuaqui, R.F. (2005). Molecular alterations in primary prostate cancer after androgen ablation therapy. Clinical Cancer Research 11
219. ariant set reaches N percent of the total number of input data points. Default is 1. Note: If you do not use this property, the iteration process continues until no more data points are eliminated.
IterateValue: Property to control the iteration process for determining the invariant set of data points. Enter true to repeat the process until either no more data points are eliminated, or a predetermined percentage of data points (StopPrctileValue) is reached. Enter false to perform only one iteration of the process. Default is true. Tip: Select false for smaller data sets, typically less than 200 data points.
MethodValue: Property to select the smoothing method used to normalize the data. Enter 'lowess' or 'runmedian'. Default is 'lowess'.
SpanValue: Property to set the window size for the smoothing method. If SpanValue is less than 1, the window size is that percentage of the number of data points. If SpanValue is equal to or greater than 1, the window size is of size SpanValue. Default is 0.05, which corresponds to a window size equal to 5% of the total number of data points in the invariant set.
ShowplotValue: Property to control the plotting of a pair of M-A scatter plots (before and after normalization). M is the ratio between DataX and DataY. A is the average of DataX and DataY. Enter true to create the pair of M-A scatter plots. Default is false.
Description
NormDataY = mainvar
220. arkers', MarkersValue, ...) places markers along the top horizontal axis of the heat map for the m/z values specified in the vector MarkersValue. Default is [].
msheatmap(..., 'SpecIdx', SpecIdxValue, ...) labels the spectra along the y-axis in the heat map. The labels are specified by SpecIdxValue, a vector of values or cell array of strings. The number of values or strings is the same as the number of columns (spectra) in the matrix Intensities. Each value or string specifies a label for the corresponding spectrum.
msheatmap(..., 'Group', GroupValue, ...) sorts and combines spectra into groups along the y-axis in the heat map. The groups are specified by GroupValue, a vector of values or cell array of strings. The number of values or strings is the same as the number of rows in the matrix Intensities. Each value or string specifies a group to which the corresponding spectrum belongs.
msheatmap(..., 'Resolution', ResolutionValue, ...) specifies the horizontal resolution of the heat map image. Increase this value to enhance details. Decrease this value to reduce memory usage. Default is:
    - 0.5 when MZ contains > 2,500 elements
    - 0.05 when MZ contains < 2,500 elements
Examples
SELDI-TOF Data
1 Load SELDI-TOF sample data.
    load sample_lo_res
2 Create a vector of four m/z values to mark along the top horizontal axis of the heat map.
    M = [3991.4 4598 7964 9160];
3 Display the he
221. arrow to crosshairs. Left-click and drag a rectangle box over an area, and then release the mouse button. The display zooms the area covered by the box.
5 Move the cursor to the range window at the bottom. Click and drag the view box to a new location.
See Also
Bioinformatics Toolbox functions: msalign, msbackadj, mslowess, msnorm, msheatmap, msresample, mssgolay
multialign
Purpose: Align multiple sequences using progressive method
Syntax
SeqsMultiAligned = multialign(Seqs)
SeqsMultiAligned = multialign(Seqs, Tree)
multialign(..., 'PropertyName', PropertyValue, ...)
multialign(..., 'Weights', WeightsValue)
multialign(..., 'ScoringMatrix', ScoringMatrixValue)
multialign(..., 'SMInterp', SMInterpValue)
multialign(..., 'GapOpen', GapOpenValue)
multialign(..., 'ExtendGap', ExtendGapValue)
multialign(..., 'DelayCutoff', DelayCutoffValue)
multialign(..., 'JobManager', JobManagerValue)
multialign(..., 'WaitInQueue', WaitInQueueValue)
multialign(..., 'Verbose', VerboseValue)
multialign(..., 'ExistingGapAdjust', ExistingGapAdjustValue)
multialign(..., 'TerminalGapAdjust', TerminalGapAdjustValue)
Arguments
Seqs: Vector of structures with the fields Sequence for the residues and Header or Name for the labels. Seqs may also be a cell array of strings or a char array.
SeqsMultiAligned: Vector of structures (same as Seqs) but with the field Sequence updated with the alignment. When Seqs is a cell or char
222. ase insensitive. These property name/property value pairs are as follows:
[IDX, Z] = rankfeatures(X, Group, ..., 'Criterion', CriterionValue, ...) sets the criterion used to assess the significance of every feature for separating two labeled groups. Choices are:
    'ttest' (default)   Absolute value two-sample t-test with pooled variance estimate
    'entropy'           Relative entropy, also known as Kullback-Leibler distance or divergence
    'bhattacharyya'     Minimum attainable classification error or Chernoff bound
    'roc'               Area between the empirical receiver operating characteristic (ROC) curve and the random classifier slope
    'wilcoxon'          Absolute value of the u-statistic of a two-sample unpaired Wilcoxon test, also known as Mann-Whitney
Note: 'ttest', 'entropy', and 'bhattacharyya' assume normal distributed classes, while 'roc' and 'wilcoxon' are nonparametric tests. All tests are feature independent.
[IDX, Z] = rankfeatures(X, Group, ..., 'CCWeighting', ALPHA, ...) uses correlation information to outweigh the Z value of potential features using Z * (1 - ALPHA * RHO), where RHO is the average of the absolute values of the cross-correlation coefficient between the candidate feature and all previously selected features. ALPHA sets the weighting factor. It is a scalar value between 0 and 1. When ALPHA is 0 (default), potential features are not weighted. A large value of RHO (close to 1) outweighs the significance statistic; this means th
223. at features that are highly correlated with the features already picked are less likely to be included in the output list.
[IDX, Z] = rankfeatures(X, Group, ..., 'NWeighting', BETA, ...) uses regional information to outweigh the Z value of potential features using Z * (1 - exp(-(DIST/BETA).^2)), where DIST is the distance (in rows) between the candidate feature and previously selected features. BETA sets the weighting factor. It is greater than or equal to 0. When BETA is 0 (default), potential features are not weighted. A small DIST (close to 0) outweighs the significance statistics of only close features. This means that features that are close to already picked features are less likely to be included in the output list. This option is useful for extracting features from time series with temporal correlation.
BETA can also be a function of the feature location, specified using @ or an anonymous function. In both cases, rankfeatures passes the row position of the feature to BETA and expects back a value greater than or equal to 0.
Note: You can use 'CCWeighting' and 'NWeighting' together.
[IDX, Z] = rankfeatures(X, Group, ..., 'NumberOfIndices', N, ...) sets the number of output indices in IDX. Default is the same as the number of features when ALPHA and BETA are 0, or 20 otherwise.
[IDX, Z] = rankfeatures(X, Group, ..., 'CrossNorm', CN, ...) applies independent normalization across the observations for every feature. Cross-no
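The two weighting rules are easy to check numerically. The sketch below reads them as Z * (1 - ALPHA * RHO) for correlation weighting and Z * (1 - exp(-(DIST/BETA)^2)) for distance weighting; that reading, the function names, and treating BETA = 0 as "no weighting" via an explicit branch are assumptions, and the code is an illustration rather than the rankfeatures implementation.

```python
import math


def cc_weight(z, rho, alpha=0.0):
    """Down-weight z by the average absolute correlation rho with
    already selected features; alpha = 0 leaves z unchanged."""
    return z * (1.0 - alpha * rho)


def n_weight(z, dist, beta=0.0):
    """Down-weight z for candidates that sit close (small dist, in rows)
    to already selected features; beta = 0 leaves z unchanged."""
    if beta == 0:
        return z
    return z * (1.0 - math.exp(-((dist / beta) ** 2)))


print(cc_weight(4.0, rho=0.5, alpha=0.5))  # moderately correlated candidate
print(n_weight(2.0, dist=0.0, beta=1.0))   # right next to a selected feature
print(n_weight(2.0, dist=100.0, beta=1.0)) # far away: essentially unweighted
```

Both factors lie between 0 and 1, so applying them together simply multiplies the two penalties.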
224. at map with m/z markers and a limited m/z range.
    msheatmap(MZ_lo_res, Y_lo_res, 'markers', M, 'range', [3000 10000])
[Figure: heat map; x-axis Mass/Charge (M/Z) from 3000 to 10000, y-axis Spectrogram Indices, color scale Relative Intensity]
4 Display the heat map again, grouping each spectrum into one of two groups.
    TwoGroups = [1 1 2 2 1 1 2 2];
    msheatmap(MZ_lo_res, Y_lo_res, 'markers', M, 'group', TwoGroups)
[Figure: heat map; x-axis Mass/Charge (M/Z), y-axis Spectrogram Groups, color scale Relative Intensity]
Liquid Chromatography/Mass Spectrometry (LC/MS) Data
1 Load LC/MS sample data.
    load lcmsdata
2 Resample the peak lists to create a vector of m/z values and a matrix of intensity values.
    [MZ, Intensities] = msppresample(peaks, 5000);
3 Display the heat map showing mass spectra at different retention times.
    msheatmap(MZ, ret_time, log(Intensities))
[Figure: heat map; x-axis Mass/Charge (M/Z), y-axis Retention Time, color scale Relative Intensity]
See Also
Bioinformatics Toolbox functions: msalign, msbackadj, msdotplot, mslowess, msnorm, mspalign, msresample, mssgolay, msviewer
msl
225. ata that you want to associate with the edge. The edge does not use this property, but you can access and specify it using the get and set functions. Default is []. Examples: Accessing Properties of a Biograph Object. You can access properties of a biograph object, BGobj, by using either of the following syntaxes: PropertyValue = get(BGobj, 'PropertyName') or PropertyValue = BGobj.PropertyName. Accessing Allowed Values of Biograph Object Properties. You can access allowed values for any property that has a finite set of choices by using the following syntax: set(BGobj, 'PropertyName'). Specifying Properties of a Biograph Object. You can specify properties of a biograph object, BGobj, by using any of the following syntaxes: set(BGobj, 'PropertyName', PropertyValue) or BGobj.PropertyName = PropertyValue. See Also: Bioinformatics Toolbox function biograph (object constructor). Bioinformatics Toolbox methods of a biograph object: allshortestpaths, conncomp, dolayout, getancestors, getdescendants, getedgesbynodeid, getmatrix, getnodesbyid, getrelatives, isdag, isomorphism, isspantree, maxflow, minspantree, shortestpath, topoorder, traverse, view. MATLAB functions: get, set. geneont object. Purpose: Data structure containing Gene Ontology (GO) information. Description: A geneont object is a data structure containing Gene Ontology information. Gene Ontology terms
226. ata using Robust Multi-array Average (RMA) procedure. Evaluate performance of classifier. Generate cross-validation indices. Classify data using nearest neighbor method. Impute missing data using nearest neighbor method. Determine optimal leaf ordering for hierarchical binary cluster tree. Generate randomized subset of features. rankfeatures: Rank key features by class separability criteria. svmclassify: Classify data using support vector machine. svmsmoset: Create or edit Sequential Minimal Optimization (SMO) options structure. svmtrain: Train support vector machine classifier. Mass Spectrometry File Formats, Preprocessing, and Visualization. jcampread: Read JCAMP-DX formatted files. msalign: Align peaks in mass spectrum to reference peaks. msbackadj: Correct baseline of mass spectrum. msdotplot: Plot set of peak lists from LC/MS or GC/MS data set. msheatmap: Create pseudocolor image of set of mass spectra. mslowess: Smooth mass spectrum using nonparametric method. msnorm: Normalize set of mass spectra. mspalign: Align mass spectra from multiple peak lists from LC/MS or GC/MS data set. mspeaks: Convert raw mass spectrometry data to peak list (centroided data). msppresample: Resample mass spectrometry signal while preserving peaks. msresample: Resample mass spectrometry signal. mssgolay: Smooth
227. atch (MM) probe and each column corresponds to an Affymetrix CEL file. (Each CEL file is generated from a separate chip. All chips should be of the same type.) Tip: You can use the MMIntensities matrix returned by the celintensityread function. Column vector of PM probe affinities, such as returned by the affyprobeaffinities function. Each row corresponds to a probe. Column vector of MM probe affinities, such as returned by the affyprobeaffinities function. Each row corresponds to a probe. Controls the use of optical background correction on the PM and MM probe intensity values in PMMatrix and MMMatrix. Choices are true (default) or false. CorrConstValue: Value that specifies the correlation constant, rho, for log background intensity for each PM/MM probe pair. Choices are any value ≥ 0 and ≤ 1. Default is 0.7. MethodValue: String that specifies the method to estimate the signal. Choices are 'MLE', a faster, ad hoc Maximum Likelihood Estimate method, or 'EB', a slower, more formal, empirical Bayes method. Default is 'MLE'. TuningParamValue: Value that specifies the tuning parameter used by the estimate method. This tuning parameter sets the lower bound of signal values with positive probability. Choices are a positive value. Default is 5 (MLE) or 0.5 (EB). Tip: For information on determining a setting for this parameter, see Wu et al., 2004. AddVarianceValue: Controls whet
228. ates a dendrogram and heat map from the gene expression data in the matrix Data. It uses hierarchical clustering with euclidean distance metric and average linkage to generate the hierarchical tree. The clustering is performed on the rows in matrix Data, in which the rows correspond to genes and the columns correspond to different microarrays. To cluster the columns instead of the rows, transpose the data using the transpose operator. clustergram(Data, ...'PropertyName', PropertyValue, ...) calls clustergram with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows: clustergram(Data, ...'RowLabels', RowLabelsValue, ...) uses the contents of RowLabelsValue, a vector of numbers or cell array of text strings, as labels for the rows in Data. clustergram(Data, ...'ColumnLabels', ColumnLabelsValue, ...) uses the contents of ColumnLabelsValue, a vector of numbers or cell array of text strings, as labels for the columns in Data. clustergram(Data, ...'Pdist', PdistValue, ...) specifies the distance metric to pass to the pdist function (Statistics Toolbox) to use to calculate the pairwise distances between observations. PdistValue is a string. For information on choices, see the pdist function. Default is 'euclidean'. Note: If th
229. ause NCBI purges reports after 24 hours. getblast(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. getblast(..., 'Descriptions', DescriptionsValue, ...) includes the specified number of descriptions (DescriptionsValue) in the report. getblast(..., 'Alignments', AlignmentsValue, ...) includes the specified number of alignments in the report. getblast(..., 'ToFile', ToFileValue, ...) saves the data returned from the NCBI BLAST report to a file (ToFileValue). The default format for the file is text, but you can specify HTML with the property FileFormat. getblast(..., 'FileFormat', FileFormatValue, ...) returns the report in the specified format (FileFormatValue). getblast(..., 'WaitTime', WaitTimeValue, ...) pauses MATLAB and waits a specified time (minutes) for a report from the NCBI Web site. If the report is still unavailable after the wait time, getblast returns an error message. The default behavior is to not wait for a report. For more information about reading and interpreting BLAST reports, see http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/tut1.html. Examples: 1 Run a BLAST search with an NCBI accession number. RID = blastncbi('AAA59174', 'blastp', 'expect', 1e-10) 2 Pass the RID to getblast to parse the report, load it into a MATLAB structure, and save a copy as a text file. report = getblast(RID, 'TOFILE', 'Report.txt') See Also: Bioinformatics Toolbox functions blastncbi, bla
230. ay (FNames), where Names is a cell array with the names of the genes corresponding to each row in Data. You can also create FNames using FNames = Names(Mask). generangefilter(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. generangefilter(..., 'Percentile', PercentileValue) removes from the experimental data (Data) gene expression profiles with ranges less than a specified percentile (PercentileValue). generangefilter(..., 'AbsValue', AbsValueValue) removes from Data gene expression profiles with ranges less than AbsValueValue. generangefilter(..., 'LOGPercentile', LOGPercentileValue) filters genes with profile ranges in the lowest percent of the log range (LOGPercentileValue). generangefilter(..., 'LOGValue', LOGValueValue) filters genes with profile log ranges lower than LOGValueValue. Examples: load yeastdata; [mask, fyeastvalues, fgenes] = generangefilter(yeastvalues, genes); References: [1] Kohane, I.S., Kho, A.T., Butte, A.J. (2003). Microarrays for an Integrative Genomics. Cambridge, MA: MIT Press. See Also: Bioinformatics Toolbox functions exprprofrange, exprprofvar, geneentropyfilter, genelowvalfilter, genevarfilter. geneticcode. Purpose: Nucleotide codon to amino acid mapping. Syntax: Map = geneticcode; geneticcode(GeneticCode). Arguments: GeneticCode: Enter a code number or code name from the table. If you use a code name, you can truncate the name to the first two characters of the name.
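The AbsValue filter in excerpt 230 amounts to a per-row range computation followed by a logical mask. The base MATLAB sketch below mimics that idea on a small made-up matrix; the data, gene names, and threshold are assumptions for illustration, and the code is not the toolbox implementation of generangefilter.

```matlab
% Made-up expression matrix: one row per gene, one column per sample.
Data  = [1 2 3; 5 5 6; 0 10 20; 4 4 4.5];
Names = {'geneA'; 'geneB'; 'geneC'; 'geneD'};   % hypothetical gene names

AbsValue = 1.5;                                  % assumed range threshold
ranges = max(Data, [], 2) - min(Data, [], 2);    % range of each profile
Mask   = ranges >= AbsValue;                     % true for profiles that are kept
FData  = Data(Mask, :);                          % filtered data
FNames = Names(Mask);                            % same indexing as FNames = Names(Mask)
```

Here the profiles with ranges 1 and 0.5 fall below the assumed threshold and are removed, so FData keeps the first and third rows.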
231. be intensity data from Affymetrix HG-U133A GeneChip arrays. The two variables in the MAT-file, dependentData and independentData, are two matrices of gene expression values from two experimental conditions. load prostatecancerexpdata 2 Calculate the p-values and t-scores for the gene expression values in the two matrices and display a normal t-score quantile plot. [pvalues, tscores] = mattest(dependentData, independentData, 'showplot', true); 3 Calculate the p-values and t-scores again using permutation tests (1000 permutations) and displaying histograms of t-score distributions and p-value distributions. [pvalues, tscores] = mattest(dependentData, independentData, 'permute', true, 'showhist', true, 'showplot', true); The prostatecancerexpdata.mat file used in this example contains data from Best et al., 2005. References: [1] Huber, W., von Heydebreck, A., Sültmann, H., Poustka, A., and Vingron, M. (2002). Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 18 (Suppl. 1), S96-S104. [2] Best, C.J.M., Gillespie, J.W., Yi, Y., Chandramouli, G.V.R., Perlmutter, M.A., Gathright, Y., Erickson, H.S., Georgevich, L., Tangrea, M.A., Duray, P.H., Gonzalez, S., Velasco, A., Linehan, W.M., Matusik, R.J., Price, D.K., Figg, W.D., Emmert-Buck, M.R., and Chuaqui, R.F. (2005). Molecular alterations in primary prosta
232. be set numbers for a CHP file use 0-based indexing, while MATLAB uses 1-based indexing. CHPStruct.ProbeSets(1) has ProbeSetNumber 0. probesetplot(..., 'GeneName', GeneNameValue), when GeneName is true, uses the gene name rather than the probe set name for the title. probesetplot(..., 'Field', FieldValue) shows the data for a field (FieldValue). Valid field names are Background, Intensity, StdDev, Pixels, and Outlier. probesetplot(..., 'ShowStats', ShowStatsValue), when ShowStats is true, adds mean and standard deviation lines to the plot. Examples: 1 Get the file Drosophila-121502.chp from http://www.affymetrix.com/support/technical/sample_data/demo_data.affx 2 Read the data into MATLAB. chpStruct = affyread('Drosophila-121502.chp', 'D:\Affymetrix\LibFiles\DrosGenome1') 3 Plot PM and MM intensity values. probesetplot(chpStruct, 'AFFX-YEL018w/_at', 'showstats', true); See Also: Bioinformatics Toolbox functions affyread, celintensityread, probesetlink, probesetlookup. probesetvalues. Purpose: Probe set values from probe results. Syntax: PSValues = probesetvalues(CELStruct, CDFStruct, PS). Description: PSValues = probesetvalues(CELStruct, CDFStruct, PS) creates a table of values for a probe set (PS) from the probe data in a CEL file structure (CELStruct). PS is a probe set index or probe set name from the CDF library file structure (CDFStruct). PSValues is a matrix with 18 columns and one row for each probe pair in the pr
233. bed by Bolstad, 2005. When MethodValue is 'MLE', rmabackadj estimates the parameters using maximum likelihood. Default is 'RMA'. BackgroundAdjustedMatrix = rmabackadj(..., 'Truncate', TruncateValue, ...) controls the background noise model used. When TruncateValue is false, rmabackadj uses nontruncated Gaussian as the background noise model. Default is true. BackgroundAdjustedMatrix = rmabackadj(..., 'Showplot', ShowplotValue, ...) lets you plot a histogram showing the distribution of PM probe intensity values (blue) and the convoluted probability distribution function (red), with estimated parameters. When ShowplotValue is 'all', rmabackadj plots a histogram for each column or chip. When ShowplotValue is a number, list of numbers, or range of numbers, rmabackadj plots a histogram for the indicated column number (chip). For example: 'Showplot', 3 plots the intensity values in column 3 of Data; 'Showplot', [3,5,7] plots the intensity values in columns 3, 5, and 7 of Data; 'Showplot', 3:9 plots the intensity values in columns 3 to 9 of PMData. [Figure: Histogram of log2(PM) and estimated BG density of sample 3; x-axis log2(PM Intensities), y-axis Frequency; curve: Estimated background density] Examples: 1 Load a MAT file, included with Bioinformatics
234. blue components, or enter a character from the following list: 'b' blue, 'g' green, 'r' red, 'c' cyan, 'm' magenta, or 'y' yellow. The default color is red, 'r'. Property to select the color to highlight similar characters. Enter a 1-by-3 RGB vector or color character. The default color is magenta. StartPointersValue: Property to specify the starting indices of the aligned sequences. StartPointers is the two-element vector returned as the third output of the function swalign. ColumnsValue: Property to specify the number of characters in a line. Enter the number of characters to display in one row. The default value is 64. Description: showalignment(Alignment) displays an alignment in a MATLAB figure window. showalignment(Alignment, ...'PropertyName', PropertyValue, ...) calls showalignment with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotes and is case insensitive. These property name/property value pairs are as follows: showalignment(Alignment, ...'MatchColor', MatchColorValue, ...) selects the color to highlight the matches in the output display. The default color is red. For example, to use cyan, enter 'c' or [0 255 255]. showalignment(Alignment, ...'SimilarColor', SimilarColorValue, ...) selects the color to highlight similar residues that are not exact matches
235. by the Method property. Use a string or cell array to pass one or multiple input arguments. For example, you can provide the nucleotide frequencies for the Tajima-Nei distance method, instead of computing them from the input sequences. D = seqpdist(Seqs, ...'PairwiseAlignment', PairwiseAlignmentValue, ...) controls the global pairwise alignment of input sequences (using the nwalign function), while ignoring the multiple alignment of the input sequences (if any). Default is: true, when all input sequences do not have the same length; false, when all input sequences have the same length. Tip: If your input sequences have the same length, seqpdist assumes they are aligned. If they are not aligned, do one of the following: align the sequences before passing them to seqpdist, for example, using the multialign function; or set PairwiseAlignment to true when using seqpdist. D = seqpdist(Seqs, ...'JobManager', JobManagerValue, ...) distributes pairwise alignments into a cluster of computers using Distributed Computing Toolbox. JobManagerValue is a jobmanager object, such as returned by the Distributed Computing Toolbox function findResource, that represents an available distributed MATLAB resource. You must have Distributed Computing Toolbox to use this property. D = seqpdist(Seqs, ...'WaitInQueue', WaitInQueueValue, ...) controls whether seqpdist waits for a distributed MATLAB resource to be available
236. c', PrimerconcValue, ...) SeqProperties = oligoprop(SeqNT, ...'HPBase', HPBaseValue, ...) SeqProperties = oligoprop(SeqNT, ...'HPLoop', HPLoopValue, ...) SeqProperties = oligoprop(SeqNT, ...'Dimerlength', DimerlengthValue, ...) Arguments: SeqNT: DNA oligonucleotide sequence represented by any of the following: a character string containing the letters A, C, G, T, or N; a vector of integers containing the integers 1, 2, 3, 4, or 15; a structure containing a Sequence field that contains a nucleotide sequence. SaltValue: Value that specifies a salt concentration in moles/liter for melting temperature calculations. Default is 0.05 moles/liter. TempValue: Value that specifies the temperature in degrees Celsius for nearest-neighbor calculations of free energy. Default is 25 degrees Celsius. PrimerconcValue: Value that specifies the concentration in moles/liter for melting temperature calculations. Default is 50e-6 moles/liter. HPBaseValue: Value that specifies the minimum number of paired bases that form the neck of the hairpin. Default is 4 base pairs. HPLoopValue: Value that specifies the minimum number of bases that form the loop of a hairpin. Default is 2 bases. DimerlengthValue: Value that specifies the minimum number of aligned bases between the sequence and its reverse. Default is 4 bases. Return Values: SeqProperties: Structure containing the sequence properties for
237. cal. maboxplot(..., 'WhiskerLength', WhiskerLengthValue, ...) allows you to specify the whisker length for the box plot. WhiskerLengthValue defines the maximum length of the whiskers as a function of the interquartile range (IQR) (default = 1.5). The whisker extends to the most extreme data value within WhiskerLength*IQR of the box. If WhiskerLengthValue equals 0, then maboxplot displays all data values outside the box, using the plotting symbol Symbol. Examples: load yeastdata; maboxplot(yeastvalues, times); xlabel('Sample Times'); Using a structure: geoStruct = getgeodata('GSM1768'); maboxplot(geoStruct); For block-based data: madata = gprread('mouse_a1wt.gpr'); maboxplot(madata, 'F635 Median'); figure; maboxplot(madata, 'F635 Median - B635', 'TITLE', 'Cy5 Channel FG - BG'); See Also: Bioinformatics Toolbox functions magetfield, maimage, mairplot, maloglog, malowess, manorm, mavolcanoplot. Statistics Toolbox function boxplot. mafdr. Purpose: Estimate false discovery rate (FDR) of differentially expressed genes from two experimental conditions or phenotypes. Syntax: FDR = mafdr(PValues); [FDR, Q] = mafdr(PValues); [FDR, Q, Pi0] = mafdr(PValues); [FDR, Q, Pi0, R2] = mafdr(PValues); ... = mafdr(PValues, ...'BHFDR', BHFDRValue, ...); ... = mafdr(PValues, ...'Lambda', LambdaValue, ...); ... = mafdr(PValues, ...'Method', MethodValue, ...); ... = mafdr(PValues, ...'Showplot', ShowplotValue, ...). Arguments: PValues: Column vector of p-values for eac
238. cally significant. This value displays graphically as a horizontal line on the plot. Default is 0.05, which is equivalent to 1.3010 on the -log10 (p-value) scale. Note: You can also change the p-value cutoff interactively after creating the plot. mavolcanoplot(..., 'Foldchange', FoldchangeValue, ...) lets you specify a ratio fold change to define data points that are differentially expressed. Fold changes display graphically as two vertical lines on the plot. Default is 2, which corresponds to a ratio of 1 and -1 on a log2 (ratio) scale. Note: You can also change the fold change interactively after creating the plot. [Figure: Volcano Plot window; scatter plot with x-axis log2(ratio) and y-axis -log10(p-values); lists of Up Regulated and Down Regulated genes with p-values; Cutoff Values panel with -log10(p-value) and Fold change fields and Update, Reset, Clear, and Export buttons] The volcano plot displays the following: -log10 (p-value) versus log2 (ratio) scatter plot of genes
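The correspondence between the default cutoffs and the plotted lines in excerpt 238 is plain logarithm arithmetic, checked below in base MATLAB. The variable names are chosen here for illustration and are not mavolcanoplot parameters.

```matlab
pCutoff    = 0.05;   % default p-value cutoff
foldChange = 2;      % default fold change

yLine  = -log10(pCutoff);             % height of the horizontal line, about 1.3010
xLines = [-1 1] * log2(foldChange);   % positions of the two vertical lines: -1 and 1
fprintf('horizontal line at %.4f, vertical lines at %g and %g\n', yLine, xLines(1), xLines(2));
```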
239. can be explored and traversed through is_a and part_of relationships. Following are methods of a geneont object: getancestors (geneont): Numeric IDs for ancestors of Gene Ontology term. getdescendants (geneont): Numeric IDs for descendants of Gene Ontology term. getmatrix (geneont): Convert geneont object into relationship matrix. getrelatives (geneont): Numeric IDs for relatives of Gene Ontology term. Properties of a geneont Object: default_namespace: Read-only string containing the namespace to which terms are assigned. format_version: Read-only string containing the version of the encoding of the OBO flat format file. date: Read-only string containing the date the OBO file was last updated. Terms: Read-only column vector with handles to term objects of a geneont object. For properties of term objects, see Properties of Terms Objects on page 5-12. Properties of Terms Objects: id: Numeric value that corresponds to the GO ID of the GO term. Tip: You can use the num2goid function to convert id to a GO ID string. name: String representing the name of the GO term. ontology: String limited to 'molecular function', 'biological process', or 'cellular component'. definition: String that defines the GO term. synonym: Numeric array containing GO IDs of GO terms that are synonyms o
240. ce. Index entries (function reference): malowess, manorm, mapcaplot, mattest, mavolcanoplot, molviewer, molweight, msalign, msbackadj, msdotplot, msheatmap, mslowess, msnorm, mspalign, mspeaks, msppresample, msresample, mssgolay, msviewer. Index entries (method reference): maxflow, minspantree. Index entries under methods: allshortestpaths, conncomp, dolayout, get, getancestors (biograph), getancestors (geneont), getbyname, getcanonical, getdescendants (biograph), getdescendants (geneont), getedgesbynodeid, getmatrix (biograph), getmatrix (geneont), getmatrix (phytree), getnewickstr, getnodesbyid, getrelatives (biograph), getrelatives (geneont), isdag, isomorphism, isspantree, maxflow, minspantree, pdist, plot, prune, reorder, reroot, select, shortestpath, subtree, topoorder, traverse, view (biograph), view (phytree), weights.
241. cer Research 11, 6823-6834. See Also: affyinvarsetnorm, affyread, celintensityread, probelibraryinfo, probesetlink, probesetlookup, probesetvalues, quantilenorm, rmasummary. rmasummary. Purpose: Calculate gene (probe set) expression values from Affymetrix microarray probe-level data using Robust Multi-array Average (RMA) procedure. Syntax: ExpressionMatrix = rmasummary(ProbeIndices, Data); ExpressionMatrix = rmasummary(..., 'Output', OutputValue). Arguments: ProbeIndices: Column vector of probe indices. The convention for probe indices is, for each probe set, to label each probe 0 to N - 1, where N is the number of probes in the probe set. Data: Matrix of natural-scale intensity values where each row corresponds to a perfect match (PM) probe and each column corresponds to an Affymetrix CEL file. (Each CEL file is generated from a separate chip. All chips should be of the same type.) OutputValue: Property to control the scale of the returned gene expression values. OutputValue can be 'log', 'log2', 'log10', 'natural', or @functionname. In the last instance, the data is transformed as defined by the function functionname. Default is 'log2'. Description: ExpressionMatrix = rmasummary(ProbeIndices, Data) returns gene (probe set) expression values after calculating them from natural-scale probe intensities in the matrix Data, using the column vector of probe indices, ProbeIndices
242. ces are true or false (default). Tip: Specify true to use this display to manually verify the codon alignment of the two input sequences, SeqNT1 and SeqNT2. The presence of stop codons in the amino acid translation can indicate that SeqNT1 and SeqNT2 are not codon-aligned. Examples: Estimating Synonymous and Nonsynonymous Substitution Rates Between the gag Genes of Two HIV Viruses. 1 Retrieve two sequences from the GenBank database for the gag genes of two HIV viruses. gag1 = getgenbank('L11768'); gag2 = getgenbank('L11770'); 2 Estimate the synonymous and nonsynonymous substitution rates between the two sequences. [dn, ds, vardn, vards] = dnds(gag1, gag2) dn = 0.0241, ds = 0.0739, vardn = 2.2785e-005, vards = 2.6447e-004. Estimating Synonymous and Nonsynonymous Substitution Rates Between Two Nucleotide Sequences That Are Not Codon-Aligned. 1 Retrieve two nucleotide sequences from the GenBank database for the neuraminidase (NA) protein of two strains of the Influenza A virus (H5N1). hk01 = getgenbank('AF509094'); vt04 = getgenbank('DQ094287'); 2 Extract the coding region from the two nucleotide sequences. hk01_cds = featuresparse(hk01, 'feature', 'CDS', 'Sequence', true); vt04_cds = featuresparse(vt04, 'feature', 'CDS', 'Sequence', true); 3 Align the amino acid sequences converted from the nucleotide sequences. [sc, al] = nwalign(nt2aa(hk01_cds), nt2aa(vt04_cds), 'extendgap', 1); 4 Use the seqinsertgaps fun
243. ch we could not do in the directed graph DG. References: [1] Johnson, D.B. (1977). Efficient algorithms for shortest paths in sparse networks. Journal of the ACM 24(1), 1-13. [2] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002). The Boost Graph Library User Guide and Reference Manual. Upper Saddle River, NJ: Pearson Education. See Also: Bioinformatics Toolbox functions graphconncomp, graphisdag, graphisomorphism, graphisspantree, graphmaxflow, graphminspantree, graphpred2path, graphshortestpath, graphtopoorder, graphtraverse. Bioinformatics Toolbox method of biograph object: allshortestpaths. graphconncomp. Purpose: Find strongly or weakly connected components in graph. Syntax: [S, C] = graphconncomp(G); [S, C] = graphconncomp(G, ...'Directed', DirectedValue, ...); [S, C] = graphconncomp(G, ...'Weak', WeakValue, ...). Arguments: G: N-by-N sparse matrix that represents a graph. Nonzero entries in matrix G indicate the presence of an edge. DirectedValue: Property that indicates whether the graph is directed or undirected. Enter false for an undirected graph. This results in the upper triangle of the sparse matrix being ignored. Default is true. A DFS-based algorithm computes the connected components. Time complexity is O(N+E), where N and E are number of nodes and edges respectively. WeakValue: Property that indicates whether to find weakly connected components or strongly connected components. A weakly connected component i
244. cid. Z, 22, Glx: Glutamine or Glutamic acid; CAA CAG GAA GAG. X, 23, Xaa: Any amino acid; All codons. *, 24, END: Termination (translation stop); UAA UAG UGA. -, 25, GAP: Gap of unknown length. Description: aminolookup displays a table of amino acid codes, integers, abbreviations, names, and codons. aminolookup(SeqAA) converts between three-letter abbreviations and single-letter codes for an amino acid sequence. If the input is a character string of three-letter abbreviations, then the output is a character string of the corresponding single-letter codes. If the input is a character string of single-letter codes, then the output is a character string of three-letter abbreviations. If you enter one of the ambiguous single-letter codes B, Z, or X, this function displays the corresponding abbreviation for the ambiguous amino acid character. aminolookup('abc') ans = AlaAsxCys. aminolookup('Code', CodeValue) displays the corresponding amino acid three-letter abbreviation and name. aminolookup('Integer', IntegerValue) displays the corresponding amino acid single-letter code, three-letter abbreviation, and name. aminolookup('Abbreviation', AbbreviationValue) displays the corresponding amino acid single-letter code and name. aminolookup('Name', NameValue) displays the corresponding amino acid single-letter code and three-letter abbreviation. Examples: 1 Convert an amino acid sequence in single-letter codes t
245. cid sequence SeqAA. Default is 1. proteinpropplot(SeqAA, ...'Endat', EndatValue, ...) specifies the ending point for the plot from the N-terminal end of the amino acid sequence SeqAA. Default is length(SeqAA). proteinpropplot(SeqAA, ...'Smoothing', SmoothingValue, ...) specifies the smoothing method. Choices are: 'linear' (default), 'exponential', 'lowess'. proteinpropplot(SeqAA, ...'EdgeWeight', EdgeWeightValue, ...) specifies the edge weight used for linear and exponential smoothing methods. Decreasing this value emphasizes peaks in the plot. Choices are any value ≥ 0 and ≤ 1. Default is 1. proteinpropplot(SeqAA, ...'WindowLength', WindowLengthValue, ...) specifies the window length for the smoothing method. Increasing this value gives a smoother plot that shows less detail. Default is 11. Examples: Plotting Hydrophobicity. 1 Use the getpdb function to retrieve a protein sequence. prion = getpdb('1HJM', 'SEQUENCEONLY', true); 2 Plot the hydrophobicity (Kyte and Doolittle, 1982) of the residues in the sequence. proteinpropplot(prion) [Figure: Hydrophobicity (Kyte & Doolittle)] Plotting Parallel Beta Strand. 1 Use the getgenpept function to retrieve a protein sequence. getgenpept('aad50640') 2 Plot the conformational preference for parallel beta strand for the resi
246. cid sequences using the BLOSUM50 (default) scoring matrix and the default values for the GapOpen and ExtendGap properties. Return the optimal global alignment score in bits and the alignment character array. [Score, Alignment] = nwalign('VSPAGMASGYD', 'IPGKASYD') Score = 7.3333 Alignment = VSPAGMASGYD ILI I P GKAS YD 2 Globally align two amino acid sequences specifying the PAM250 scoring matrix and a gap open penalty of 5. [Score, Alignment] = nwalign('IGRHRYHIGG', 'SRYIGRG', 'scoringmatrix', 'pam250', 'gapopen', 5) Score = 2.3333 Alignment = IGRHRYHIG G I Tl S RY IGRG 3 Globally align two amino acid sequences returning the Score in nat units (nats) by specifying a scale factor of log(2). [Score, Alignment] = nwalign('HEAGAWGHEE', 'PAWHEAE', 'Scale', log(2)) Score = 0.2310 Alignment = HEAGAWGHE E I Ut P AW HEAE References: [1] Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. (1998). Biological Sequence Analysis (Cambridge University Press). See Also: Bioinformatics Toolbox functions blosum, multialign, nt2aa, pam, profalign, seqdotplot, showalignment, swalign. oligoprop. Purpose: Calculate sequence properties of DNA oligonucleotide. Syntax: SeqProperties = oligoprop(SeqNT); SeqProperties = oligoprop(SeqNT, ...'Salt', SaltValue, ...); SeqProperties = oligoprop(SeqNT, ...'Temp', TempValue, ...); SeqProperties = oligoprop(SeqNT, ...'Primercon
247. class(2) = 1. Row 3 of sample is closest to row 2 of Training, so class(3) = 2. Classifying Rows into One of Two Groups. The following example classifies each row of the data in sample into one of the two groups in training. The following commands create the matrix training and the grouping variable group, and plot the rows of training in two groups. training = [mvnrnd([1 1], eye(2), 100); mvnrnd([-1 -1], 2*eye(2), 100)]; group = [repmat(1, 100, 1); repmat(2, 100, 1)]; gscatter(training(:,1), training(:,2), group, 'rb', '+x'); legend('Training group 1', 'Training group 2'); hold on; [Figure: scatter plot of the training data; legend: Training group 1, Training group 2] The following commands create the matrix sample, classify its rows into two groups, and plot the result. sample = unifrnd(-5, 5, 100, 2); % Classify the sample using the nearest neighbor classification c = knnclassify(sample, training, group); gscatter(sample(:,1), sample(:,2), c, 'mc'); hold on; legend('Training group 1', 'Training group 2', 'Data in gr
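The nearest neighbor rule referred to in excerpt 247 can be sketched without the toolbox. The three training rows, their groups, and the three sample rows below are assumed values chosen so that the result agrees with the statements class(2) = 1 and class(3) = 2; the loop is a minimal 1-nearest-neighbor illustration with Euclidean distance, not the knnclassify implementation.

```matlab
% Assumed illustrative data (not taken from the excerpt).
training = [0 0; 0.5 0.5; 1 1];      % one training point per group
group    = [1; 2; 3];                % group label of each training row
sample   = [0.9 0.8; 0.1 0.3; 0.2 0.6];

class = zeros(size(sample, 1), 1);
for i = 1:size(sample, 1)
    % Squared Euclidean distance from sample row i to every training row.
    d = sum(bsxfun(@minus, training, sample(i, :)).^2, 2);
    [~, nearest] = min(d);           % index of the closest training row
    class(i) = group(nearest);       % assign that row's group
end
disp(class');
```

Row 2 of sample is closest to row 1 of training and row 3 is closest to row 2, so the second and third labels come out as 1 and 2.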
248. cluster tree from the distance matrix Dist. Tree = linkage(Dist, 'average'); 4 Use the optimalleaforder function to determine the optimal leaf ordering for the hierarchical binary cluster tree represented by Tree, using the distance matrix Dist. order = optimalleaforder(Tree, Dist) References: [1] Bar-Joseph, Z., Gifford, D.K., and Jaakkola, T.S. (2001). Fast optimal leaf ordering for hierarchical clustering. Bioinformatics 17, Suppl 1:S22-9. PMID: 11472989. See Also: Bioinformatics Toolbox function clustergram. Statistics Toolbox functions linkage, pdist. palindromes. Purpose: Find palindromes in sequence. Syntax: [Position, Length] = palindromes(SeqNT, 'PropertyName', PropertyValue); [Position, Length, Pal] = palindromes(SeqNT); palindromes(..., 'Length', LengthValue); palindromes(..., 'Complement', ComplementValue). Description: [Position, Length] = palindromes(SeqNT, 'PropertyName', PropertyValue) finds all palindromes in sequence SeqNT with a length greater than or equal to 6, and returns the starting indices, Position, and the lengths of the palindromes, Length. [Position, Length, Pal] = palindromes(SeqNT) also returns a cell array Pal of the palindromes. palindromes(..., 'Length', LengthValue) finds all palindromes longer than or equal to Length. The default value is 6. palindromes(..., 'Complement', ComplementValue) finds complementary palindromes
249. [Figure: featuresmap output (Figure 2), a feature map of a 124884 bp sequence with gene features labeled by locus tag from HHV3gp07 to HHV3gp72 and their start and end positions]
250. complexity of O(N*log(N)+N*E), where N and E are the number of nodes and edges respectively. allshortestpaths(BGObj, ...'PropertyName', PropertyValue, ...) calls allshortestpaths with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotes and is case insensitive. These property name/property value pairs are as follows: dist = allshortestpaths(BGObj, ...'Directed', DirectedValue, ...) indicates whether the graph is directed or undirected. Set DirectedValue to false for an undirected graph. This results in the upper triangle of the sparse matrix being ignored. Default is true. dist = allshortestpaths(BGObj, ...'Weights', WeightsValue, ...) lets you specify custom weights for the edges. WeightsValue is a column vector having one entry for every nonzero value (edge) in the N-by-N adjacency matrix extracted from a biograph object, BGObj. The order of the custom weights in the vector must match the order of the nonzero values in the N-by-N adjacency matrix when it is traversed column-wise. This property lets you use zero-valued weights. By default, allshortestpaths gets weight information from the nonzero entries in the N-by-N adjacency matrix. References: [1] Johnson, D.B. (1977). Efficient algorithms for shortest paths in sparse networks. Journal of the ACM 24(1), 1-13. [2] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002). The Boost Gr
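The column-wise ordering required for a custom weights vector can be checked with the base MATLAB find function, which returns the nonzero entries of a matrix in column-major order. The sketch below uses a hypothetical 3-node graph and hypothetical variable names; it only builds the vector and does not call allshortestpaths.

```matlab
% Hypothetical 3-node directed graph stored as a sparse adjacency matrix.
G = sparse([1 2 3], [2 3 1], [5 6 7], 3, 3);

% find returns the nonzero entries in column-major (column-wise) order.
[rowIdx, colIdx, oldWeights] = find(G);

% Build a custom weight vector in that same order, one entry per edge.
% Here every edge gets weight 1, except edge 1 -> 2, which gets weight 0.
newWeights = ones(numel(oldWeights), 1);
newWeights(rowIdx == 1 & colIdx == 2) = 0;

% Columns: source node, target node, stored weight, custom weight.
disp([rowIdx colIdx oldWeights newWeights])
```

Because the vector is aligned with the edges rather than stored in the matrix, it can carry zero-valued weights that a sparse matrix would otherwise drop.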
251. creates and updates the CP object with the first validation This form is useful when you want to know the performance of a single validation cp cClassperf Positive PositiveValue Negative NegativeValue sets the positive and negative labels to identify the target disorder and the control classes These labels are used to compute clinical diagnostic test performance p and n must consist of disjoint sets of the labels used in groundtruth For example if groundtruth 12213441338 2 you could set p 1 2 n 3 4 If groundtruth is a cell array of strings p and n can either be cell arrays of strings or numeric vectors whose entries are subsets of grp2idx groundtruth PositiveValue defaults to the first class returned by grp2idx groundtruth while NegativeValue defaults to all the others In clinical tests inconclusive values or NaN are counted as false negatives for the computation of the specificity and as false positives for the computation of the sensitivity that is inconclusive results may decrease the diagnostic value of the test Tested observations for which true class is not within the union of PositiveValue and NegativeValue are not considered However tested observations that result in a class not covered by the vector groundtruth are counted as inconclusive Classify the fisheriris data with a K Nearest Neighbor classifier load fisheriris c knnclassify meas meas species 4 euclidean
252. cs Toolbox functions affyread geosoftread gprread imageneread sptread gcrma Purpose Syntax Perform GC Robust Multi array Average GCRMA background adjustment quantile normalization and median polish summarization on Affymetrix microarray probe level data ExpressionMatrix ExpressionMatrix SequenceMatrix ExpressionMatrix ChipIndexValue ExpressionMatrix OpticalCorrValue ExpressionMatrix ExpressionMatrix ExpressionMatrix TuningParamValue ExpressionMatrix ExpressionMatrix ExpressionMatrix Il I 1 gcrma PMMatrix MMMatrix ProbeIndices AffinPM AffinMM gcrma PMMatrix MMMatrix ProbeIndices gcrma gcrma gcrma gcrma gcrma gcrma gcrma gcrma ChipIndex OpticalCorr CorrConst CorrConstValue Method MethodValue TuningParam GSBCorr GSBCorrValue Normalize NormalizeValue Verbose VerboseValue 2 159 gcrma Arguments PMMatrix MMMatrix ProbeIndices 2 160 Matrix of intensity values where each row corresponds to a perfect match PM probe and each column corresponds to an Affymetrix CEL file Each CEL file is generated from a separate chip All chips should be of the same type Tip You can use the PMIntensities matrix returned by the celintensityread function Matrix of intensity values where each row corresponds to a mismatch MM probe and each column corresponds
253. ction to copy the gaps from the aligned amino acid sequences to their corresponding nucleotide sequences thus codon aligning them 2 123 dnds References See Also 2 124 hkO1_aligned seqinsertgaps hk01_cds al 1 vt04_aligned seqinsertgaps vt04_cds al 3 5 Estimate the synonymous and nonsynonymous substitutions rates of the codon aligned nucleotide sequences and also display the codons considered in the computations and their amino acid translations dn ds dnds hk01_aligned vt04_ aligned verbose true 1 Li W Wu C and Luo C 1985 A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes Molecular Biology and Evolution 2 2 150 174 2 Nei M and Gojobori T 1986 Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions Molecular Biology and Evolution 3 5 418 426 3 Nei M and Jin L 1989 Variances of the average numbers of nucleotide substitutions within and between populations Molecular Biology and Evolution 6 3 290 300 4 Nei M and Kumar S 2000 Synonymous and nonsynonymous nucleotide substitutions in Molecular Evolution and Phylogenetics Oxford University Press 5 Pamilo P and Bianchi N 1993 Evolution of the Zfx And Zfy genes rates and interdependence between the genes Molecular Biology and Evoluti
254. d. Default is true. dist = graphallshortestpaths(G, ...'Weights', WeightsValue, ...) lets you specify custom weights for the edges. WeightsValue is a column vector having one entry for every nonzero value (edge) in matrix G. The order of the custom weights in the vector must match the order of the nonzero values in matrix G when it is traversed column-wise. This property lets you use zero-valued weights. By default, graphallshortestpaths gets weight information from the nonzero entries in matrix G. Finding All Shortest Paths in a Directed Graph. 1. Create and view a directed graph with 6 nodes and 11 edges.
W = [.41 .99 .51 .32 .15 .45 .38 .32 .36 .29 .21];
DG = sparse([6 1 2 2 3 4 4 5 5 6 1],[2 6 3 5 4 1 6 3 4 3 5],W)
DG =
   (4,1)    0.4500
   (6,2)    0.4100
   (2,3)    0.5100
   (5,3)    0.3200
   (6,3)    0.2900
   (3,4)    0.1500
   (5,4)    0.3600
   (1,5)    0.2100
   (2,5)    0.3200
   (1,6)    0.9900
   (4,6)    0.3800
view(biograph(DG,[],'ShowWeights','on'))
[Figure: Biograph Viewer window showing the directed graph with edge weights.]
2. Find all the shortest paths between every pair of nodes in the directed graph.
graphallshortestpaths(DG)
ans =
        0    1.3600    0.5300    0.5700    0.2100    0.9500
   1.1100         0    0.5100    0.6600    0.3200    1.0400
   0.6000    0.9400         0    0.1500    0.8100    0.5300
   0.4500    0.7900    0.6700         0    0.6600    0.3800
   0.8100    1.1500    0.3200    0.3600         0    0.7400
   0.8900    0.4100    0.2900    0.4400    0.7300         0
The resulting ma
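The distances in the example above can be cross-checked without the toolbox. The sketch below uses the Floyd-Warshall algorithm in base MATLAB, a simpler substitute chosen for brevity and not the algorithm used by graphallshortestpaths; variable names other than W are hypothetical.

```matlab
% Edge list of the 6-node, 11-edge directed graph from the example.
W   = [.41 .99 .51 .32 .15 .45 .38 .32 .36 .29 .21];
src = [6 1 2 2 3 4 4 5 5 6 1];
dst = [2 6 3 5 4 1 6 3 4 3 5];
n = 6;

% Distance matrix: Inf where there is no edge, 0 on the diagonal.
D = inf(n);
D(1:n+1:end) = 0;
for k = 1:numel(W)
    D(src(k), dst(k)) = W(k);
end

% Floyd-Warshall: allow each node k in turn as an intermediate stop.
for k = 1:n
    for i = 1:n
        for j = 1:n
            if D(i, k) + D(k, j) < D(i, j)
                D(i, j) = D(i, k) + D(k, j);
            end
        end
    end
end
disp(D)
```

For example, the entry D(1,6) comes out as 0.95 rather than the direct edge weight 0.99, because the route 1 -> 5 -> 4 -> 6 (0.21 + 0.36 + 0.38) is shorter.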
255. d DELETE states agreeing with Model ModelLength If model state annotation is missing but MultipleAlignment is space aligned then a maximum entropy criteria is used to select Model ModelLength states Note Insert and flank insert transition probabilities are not estimated but can be modified afterwards using hmmprof struct hmmprofestimate A AValue sets the pseudocount weight A Avalue when estimating the symbol emission probabilities Default value is 20 hmmprofestimate Ax AxValue sets the pseudocount weight Ax Axvalue when estimating the transition probabilities Default value is 20 hmmprofestimate BE BEValue sets the background symbol emission probabilities Default values are taken from Model NullEmission hmmprofestimate BMx BMxValue sets the background transition probabilities from any MATCH state M gt M M gt I M gt D Default values are taken from hmmprofstruct 2 311 hmmprofestimate hmmprofestimate BDx BDxValue sets the background transition probabilities from any DELETE state D gt M D gt D Default values are taken from hmmprofstruct See Also Bioinformatics Toolbox functions hmmprofalign hmmprofstruct showhmmprof 2 312 hmmprofgenerate Purpose Syntax Arguments Description Generate random sequence drawn from profile Hidden Markov Model HMM Sequence hmmprofgenerate Model Sequence Profptr hmmprofgenerate Model
256. d genbankread getgenpept pdbread seqtool 2 194 geosoftread Purpose Read Gene Expression Omnibus GEO SOFT format data Syntax GEOSOFTData geosoftread File Arguments File Gene Expression Omnibus GEO SOFT format Sample file GSM or Data Set file GDS Enter a file name a path and file name or a URL pointing to a file Note File can also be a MATLAB character array that contains the text of a GEO file Description GEOSOFTData geosoftread File reads a Gene Expression Omnibus GEO SOFT format Sample file GSM or Data Set file GDS and then creates a MATLAB structure GEOSOFTdata with the following fields Fields Scope Accession Header ColumnDescriptions ColumnNames Data Identifier GDS files only IDRef GDS files only Fields correspond to the GenBank keywords Each separate entry listed in File is stored as a separate element of the structure 2 195 geosoftread Examples Get data from the GEO Web site and save it to a file geodata getgeodata GSM3258 ToFile GSM3258 txt Use geosoftread to access a local copy of a GEO file instead of accessing it from the GEO Web site geodata geosoftread GSM3258 txt See Also Bioinformatics Toolbox functions galread getgeodata gprread sptread 2 196 getblast Purpose Syntax Arguments BLAST report from NCBI Web site Data getblast RID getblast PropertyName PropertyValue getbla
257. database For example 2 is the protein family number for the protein family PF0002 PFAMAccessNumber String specifying a protein family accession ToFileValue TypeValue number of an HMM profile record in the PFAM database For example PF00002 String specifying a file name or a path and file name for saving the data If you specify only a file name that file will be saved in the MATLAB Current Directory String that specifies the set of alignments returned Choices are e full Default Returns all alignments that fit the HMM profile e seed Returns only the alignments used to generate the HMM profile 2 211 gethmmalignment Return Values Description 2 212 MirrorValue String that specifies a Web database Choices are e Sanger default e Janelia IgnoreGapsValue Controls the removal of the symbols and from the sequence Choices are true or false default AlignStruct MATLAB structure containing the multiple sequence alignment associated with an HMM profile AlignStruct gethmmalignment PFAMNumber determines a protein family accession number from PFAMNumber an integer searches the PFAM database for the associated HMM profile record retrieves the multiple sequence alignment associated with the HMM profile and returns AlignStruct a MATLAB structure containing the following fields Field Header Sequence AlignStruct gethmmalignment PFAMAccessNumber searches the PFAM
258. dd partial counts to the standard symbols. Limits: Property to specify using part of the sequences. Enter a 1x2 vector with the first position and the last position to include in the profile. The default value is [1 SeqLength]. Profile = seqprofile(Seqs, 'PropertyName', PropertyValue, ...) returns a matrix (Profile) of size [20 (or 4) x SequenceLength] with the frequency of amino acids (or nucleotides) for every column in the multiple alignment. The order of the rows is given by: 4 nucleotides: A C G T/U; 20 amino acids: A R N D C Q E G H I L K M F P S T W Y V. [Profile, Symbols] = seqprofile(Seqs) returns a unique symbol list (Symbols) where every symbol in the list corresponds to a row in the profile (Profile). seqprofile(..., 'Alphabet', AlphabetValue) selects a nucleotide alphabet, amino acid alphabet, or no alphabet. seqprofile(..., 'Counts', CountsValue), when Counts is true, returns the counts instead of the frequency. seqprofile(..., 'Gaps', GapsValue) appends a row to the bottom of a profile (Profile) with the count for gaps. seqprofile(..., 'Ambiguous', AmbiguousValue), when Ambiguous is 'count', counts the ambiguous amino acid symbols (B Z X) and nucleotide symbols (R Y K M S W B D H V N) with the standard symbols. For example, the amino acid X adds a 1/20 count to every row, while the amino acid B counts as 1/2 at the D and N rows. seqprofile(..., 'Limits', LimitsValue) specifies the start and end positions for the profile relati
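To make the profile layout concrete, here is a minimal base MATLAB sketch that computes counts and frequencies for a small nucleotide alignment. The alignment and variable names are hypothetical, and the sketch ignores gaps and ambiguous symbols, which seqprofile handles through its properties.

```matlab
% Hypothetical gap-free nucleotide alignment, one sequence per row.
seqs = ['ACGT'; 'ACGA'; 'AGGT'];
alphabet = 'ACGT';   % row order of the profile

counts = zeros(numel(alphabet), size(seqs, 2));
for k = 1:numel(alphabet)
    % Count how often symbol k occurs in each alignment column.
    counts(k, :) = sum(seqs == alphabet(k), 1);
end

% Frequencies: divide each column by the number of sequences.
freqProfile = counts / size(seqs, 1);
disp(freqProfile)
```

Each column of freqProfile sums to 1, and each row corresponds to one symbol of the alphabet, matching the 4 x SequenceLength layout described above.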
259. de Nucleotide Letter / Nucleotide / Description: K: [GT] (Keto); -: - (Gap of indeterminate length); M: [AC] (Amino); ?: ? (Unknown). Amino Acid Conversion. Amino Acid Letter / Amino Acid / Description: B: [DN] (Aspartic acid or asparagine); Z: [EQ] (Glutamic acid or glutamine); X: [ARNDCQEGHILKMFPSTWYV] (Any amino acid). seq2regexp(Seq) converts ambiguous nucleotide or amino acid symbols in a sequence into a regular expression format using IUB/IUPAC codes. seq2regexp(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. seq2regexp(..., 'Alphabet', AlphabetValue) selects the sequence alphabet for nucleotide sequences or amino acid sequences. seq2regexp(..., 'Ambiguous', AmbiguousValue), when AmbiguousValue is false, removes the ambiguous characters from the output regular expressions. For example: If Seq = 'ACGTK' and AmbiguousValue is true (default), MATLAB returns ACGT[GTK], with the unambiguous characters G and T and the ambiguous character K. If Seq = 'ACGTK' and AmbiguousValue is false, MATLAB returns ACGT[GT], with only the unambiguous characters. Example: 1. Convert a nucleotide sequence into a regular expression. seq2regexp('ACWTMAN') ans = AC[ATW]T[ACM]A[ACGTRYKMSWBDHVN] 2. Remove ambiguous characters from the regular expression. seq2regexp('ACWTMAN', 'ambiguous', false) ans = AC[AT]T[AC]A[ACGT] See Also: Bioinformatics Toolbox functions: restrict
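The conversion in the second example above (ambiguous characters removed from the output) can be sketched in base MATLAB with a lookup of the standard IUPAC nucleotide codes. This is an illustration only, not the seq2regexp implementation; the variable names are hypothetical, and the order of letters inside each bracket is one reasonable choice.

```matlab
seq = 'ACWTMAN';

% IUPAC ambiguity codes and the unambiguous bases each one stands for.
codes = 'RYKMSWBDHVN';
expansions = {'AG', 'CT', 'GT', 'AC', 'CG', 'AT', ...
              'CGT', 'AGT', 'ACT', 'ACG', 'ACGT'};

pattern = '';
for k = 1:numel(seq)
    idx = find(codes == seq(k), 1);
    if isempty(idx)
        % Unambiguous symbol: copy it through unchanged.
        pattern = [pattern seq(k)];
    else
        % Ambiguous symbol: replace it with a character class.
        pattern = [pattern '[' expansions{idx} ']'];
    end
end
disp(pattern)
```

The resulting string can be passed to the MATLAB regexp function to locate every concrete sequence that the ambiguous pattern stands for.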
260. de codon to amino acid mapping Join two sequences to produce shortest supersequence Display and manipulate 3 D molecule structure Calculate sequence properties of DNA oligonucleotide Find palindromes in sequence Visualize intermolecular distances in Protein Data Bank PDB file Characteristics for amino acid sequences 1 7 T Functions By Category proteinpropplot ramachandran randseq rebasecuts restrict revgeneticcode seqconsensus seqdisp seqinsertgaps seqlogo seqmatch seqprofile seqshoworfs Sequence Statistics aacount aminolookup Plot properties of amino acid sequence Draw Ramachandran plot for Protein Data Bank PDB data Generate random sequence from finite alphabet Find restriction enzymes that cut protein sequence Split nucleotide sequence at restriction site Reverse mapping for genetic code Calculate consensus sequence Format long sequence output for easy viewing Insert gaps into nucleotide or amino acid sequence Display sequence logo for nucleotide or amino acid sequences Find matches for every string in library Calculate sequence profile from set of multiply aligned sequences Display open reading frames in sequence Count amino acids in sequence Find amino acid codes integers abbreviations names and codons Sequence Visualization basecount baselookup codonbias codoncount cpgisland dimercount isoelectric molweig
261. default 2 703 svmtrain Return Values Description 2 704 SVMStruct Structure containing information about the trained SVM classifier including the following fields SupportVectors Alpha Bias KernelFunction KernelFunctionArgs GroupNames SupportVectorIndices ScaleData FigureHandles Tip You can use SVMStruct as input to the svmclassify function to use for classification SVMStruct svmtrain Training Group trains a support vector machine SVM classifier using Training a matrix of training data taken from two groups specified by Group svmtrain treats NaNs or empty strings in Group as missing values and ignores the corresponding rows of Training Information about the trained SVM classifier is returned in SVMStruct a structure with the following fields SupportVectors e Alpha e Bias e KernelFunction svmtrain KernelFunctionArgs e GroupNames SupportVectorIndices e ScaleData e FigureHandles SVMStruct svmtrain Training Group PropertyName PropertyValue calls svmtrain with optional properties that use property name property value pairs You can specify one or more properties in any order Each PropertyName must be enclosed in single quotation marks and is case insensitive These property name property value pairs are as follows SVMStruct svmtrain Kernel Function Kernel_FunctionValue specifies the kernel function Kernel_FunctionValue
262. des. For a list of valid characters, see Amino Acid Lookup Table on page 2-42 or Nucleotide Lookup Table on page 2-52. fastawrite(File, Data) writes the contents of Data to a FASTA-formatted file (ASCII text file). fastawrite(File, Header, Sequence) writes the specified header and sequence information to a FASTA-formatted file (ASCII text file).
% get the sequence for the human p53 gene from GenBank.
seq = getgenbank('NM_000546')
% find the CDS line in the FEATURES information.
cdsline = strmatch('CDS', seq.Features)
% read the coordinates of the coding region.
[start, stop] = strread(seq.Features(cdsline,:), '%*s%d..%d')
% extract the coding region.
codingSeq = seq.Sequence(start:stop)
% write just the coding region to a FASTA file.
fastawrite('p53coding.txt', 'Coding region for p53', codingSeq);
Save multiple sequences.
data(1).Sequence = 'ACACAGGAAA'
data(1).Header = 'First sequence'
data(2).Sequence = 'ACGTCAGGTC'
data(2).Header = 'Second sequence'
fastawrite('my_sequences.txt', data)
type('my_sequences.txt')
>First sequence
ACACAGGAAA
>Second sequence
ACGTCAGGTC
See Also: Bioinformatics Toolbox functions: fastaread, seqtool. featuresmap. Purpose: Draw linear or circular map of features from GenBank structure. Syntax: featuresmap(GBStructure); featuresmap(GBStructure, FeatList); featuresmap(GBStructure, FeatList, Levels); featuresmap(GBStructure, Levels); [Handles, OutFeatList]
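The FASTA layout shown in the multiple-sequence example above (a header line starting with >, followed by the sequence) can be reproduced in base MATLAB. The sketch below builds the text in memory and is only an illustration: it does not wrap long sequences or write a file, both of which fastawrite may handle differently, and fastaText is a hypothetical name.

```matlab
% Same two records as in the example above.
data(1).Header = 'First sequence';
data(1).Sequence = 'ACACAGGAAA';
data(2).Header = 'Second sequence';
data(2).Sequence = 'ACGTCAGGTC';

% One '>' header line plus one sequence line per record.
fastaText = '';
for k = 1:numel(data)
    fastaText = [fastaText ...
        sprintf('>%s\n%s\n', data(k).Header, data(k).Sequence)];
end
fprintf('%s', fastaText);
```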
263. dex typically the column whose median intensity is the median of all the columns 2 16 affyinvarsetnorm e For each column determines the proportional rank difference prd for each pair of ranks RankX and RankY from the sample column and the baseline reference prd abs RankX RankY e For each column determines the invariant set of data points by selecting data points whose proportional rank differences prd are below threshold which is a predetermined threshold for a given data point defined by the ThresholdsValue property It repeats the process until either no more data points are eliminated or a predetermined percentage of data points is reached The invariant set is data points with a prd lt threshold e For each column uses the invariant set of data points to calculate the lowess or running median smoothing curve which is used to normalize the data in that column NormData MedStructure affyinvarsetnorm Data also returns a structure of the index of the column chosen as the baseline and each column s intensity median before and after normalization Note If Data contains NaN values then NormData will also contain NaN values at the corresponding positions affyinvarsetnorm PropertyName PropertyValue defines optional properties that use property name value pairs in any order These property name value pairs are as follows affyinvarsetnorm Baseline BaselineValue
264. dgeFontSize EdgeFontSizeValue sets the size of the edge font in points Default is 8 BGobj biograph CMatrix NodeIDs ShowArrows ShowArrowsValue controls the display of arrows for the edges Choices are on default or off BGobj biograph CMatrix NodeIDs ArrowSize ArrowSizeValue sets the size of the arrows in points Default is 8 BGobj biograph CMatrix NodeIDs Showveights ShowWeightsValue controls the display of text indicating the weight of the edges Choices are on default or off BGobj biograph CMatrix NodeIDs ShowTextInNodes ShowTextInNodesValue specifies the node property used to label nodes when you display a biograph object using the view method BGobj biograph CMatrix NodeIDs NodeAutoSize NodeAutoSizeValue controls precalculating the node size before calling the layout engine Choices are on default or off BGobj biograph CMatrix NodeIDs NodeCallback NodeCallbackValue specifies user callback for all nodes BGobj biograph CMatrix NodeIDs EdgeCallback EdgeCallbackValue specifies user callback for all edges 2 61 biograph BGobj biograph CMatrix NodeIDs CustomNodeDrawFcn CustomNodeDrawFcnValue specifies function handle to customized function to draw nodes Default is Examples 1 Create a biograph object with default node IDs and then use the get function to display
265. dges that connects all the nodes in the undirected graph G and for which the total weight is minimized. Weights of the edges are all nonzero entries in the lower triangle of the N-by-N sparse matrix G. Output Tree is a spanning tree represented by a sparse matrix. Output pred is a vector containing the predecessor nodes of the minimal spanning tree (MST), with the root node indicated by 0. The root node defaults to the first node in the largest connected component. This computation requires an extra call to the graphconncomp function. [Tree, pred] = graphminspantree(G, R) sets the root of the minimal spanning tree to node R. [Tree, pred] = graphminspantree(..., 'PropertyName', PropertyValue, ...) calls graphminspantree with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotes and is case insensitive. These property name/property value pairs are as follows: [Tree, pred] = graphminspantree(..., 'Method', MethodValue, ...) lets you specify the algorithm used to find the minimal spanning tree (MST). Choices are: 'Kruskal': Grows the minimal spanning tree (MST) one edge at a time by finding an edge that connects two trees in a spreading forest of growing MSTs. Time complexity is O(E+X*log(N)), where X is the number of edges no longer than the longest edge in the MST, and N and E are the number
266. dues in the sequence. proteinpropplot(s, 'propertytitle', 'Parallel beta strand') [Figure: plot titled "Parallel beta strand"; y-axis Value, x-axis Residue.] References: [1] Kyte, J., and Doolittle, R.F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157(1), 105-132. See Also: Bioinformatics Toolbox functions: aacount, atomiccomp, molviewer, molweight, pdbdistplot, proteinplot, ramachandran, seqtool. MATLAB function: plotyy. quantilenorm. Purpose: Quantile normalization over multiple arrays. Syntax: NormData = quantilenorm(Data); NormData = quantilenorm(..., 'MEDIAN', true); NormData = quantilenorm(..., 'DISPLAY', true). Description: NormData = quantilenorm(Data), where the columns of Data correspond to separate chips, normalizes the distributions of the values in each column. Note: If Data contains NaN values, then NormData will also contain NaN values at the corresponding positions. NormData = quantilenorm(..., 'MEDIAN', true) takes the median of the ranked values instead of the mean. NormData = quantilenorm(..., 'DISPLAY', true) plots the distributions of the columns and of the normalized data. Examples: load yeastdata; normYeastValues = quantilenorm(yeastvalues, 'display', 1); See Also: malowess, manorm
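The idea behind quantile normalization, replacing each value by the mean (or median) of the values that share its rank across columns, can be sketched in base MATLAB. This is a simplified illustration under stated assumptions, not the quantilenorm implementation: the data matrix and variable names are hypothetical, and ties and NaN values are not given any special treatment.

```matlab
% Hypothetical data: rows are probes, columns are separate chips.
data = [5 4 3; 2 1 4; 3 5 6; 4 2 8];
useMedian = false;   % true: use the median of the ranked values

% Sort each column and remember where each sorted value came from.
[sortedData, order] = sort(data, 1);

% One reference value per rank, taken across the columns.
if useMedian
    rankValues = median(sortedData, 2);
else
    rankValues = mean(sortedData, 2);
end

% Put the reference values back in each column's original row order.
normData = zeros(size(data));
for c = 1:size(data, 2)
    normData(order(:, c), c) = rankValues;
end
disp(normData)
```

After this step every column contains the same set of values, so the column distributions are identical while the within-column ranking of each probe is preserved.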
267. e 2 is the protein family number for the protein family PF0002 PFAMAccessNumber String specifying a protein family accession ToFileValue number of an HMM profile record in the PFAM database The string must include a version number appended at the end of the accession number For example PF00002 14 Note While this is the most efficient way to query the PFAM database version numbers can change making your input invalid String specifying a file name or a path and file name for saving the data If you specify only a file name that file will be saved in the MATLAB Current Directory 2 215 gethmmprof Return Values Description 2 216 ModeValue String that specifies the returned alignment mode Choices are e 1s Default Global alignment mode e fs Local alignment mode MirrorValue String that specifies a Web database Choices are e Sanger default e Janelia HMMStruct MATLAB structure containing information retrieved from the PFAM database HMMStruct gethmmprof PFAMName searches the PFAM database for the record represented by PFAMName a protein family name retrieves the HMM profile information and stores it in HMMStruct a MATLAB structure with the following fields Field Name PfamAccessionNumber ModelDescription ModelLength Alphabet MatchEmission InsertEmission NullEmission BeginxX Matchx gethmmprof Field Insertx Deletex FlankingInsertx Loopx
268. e AGFEData containing the following fields e Header e Stats e Columns e Rows e Names e IDs e Data e ColumnNames e TextData e TextColumnNames Feature Extraction Software takes an image from an Agilent microarray scanner and generates raw intensity data for each spot on the plate For more information about this software see a description on their Web site at http www chem agilent com scripts pds asp lpage 2547 1 Read in a sample Agilent Feature Extraction Software file Note that the file fe_sample txt is not provided with Bioinformatics Toolbox 2 39 agferead agfeStruct agferead fe_sample txt 2 Plot the median foreground maimage agfeStruct gMedianSignal maboxplot agfeStruct gMedianSignal See Also Bioinformatics Toolbox functions affyread celintensityread galread geosoftread gprread imageneread magetfield sptread 2 40 aminolookup Purpose Syntax Arguments Find amino acid codes integers abbreviations names and codons aminolookup aminolookup SeqgAA aminolookup Code CodeValue aminolookup Integer IntegerValue aminolookup Abbreviation AbbreviationValue aminolookup Name NameValue SeqAA Character string of single letter codes or three letter abbreviations representing an amino acid sequence See the Amino Acid Lookup Table on page 2 42 for valid codes and abbreviations CodeValue String specifying a single letter representing an amin
269. e html Examples: 1. Enter a nucleotide sequence.
Seq = 'AGAGGGGTACGCGCTCTGAAAAGCGGGAACCTCGTGGCGCTTTATTAA';
2. Use the recognition pattern (sequence) GCGC with the point of cleavage at position 3 to cleave a nucleotide sequence.
fragmentsPattern = restrict(Seq, 'GCGC', 3)
fragmentsPattern =
    'AGAGGGGTACGCG'
    'CTCTGAAAAGCGGGAACCTCGTGGCG'
    'CTTTATTAA'
3. Use the restriction enzyme HspAI (recognition sequence GCGC with the point of cleavage at position 1) to cleave a nucleotide sequence.
fragmentsEnzyme = restrict(Seq, 'HspAI')
fragmentsEnzyme =
    'AGAGGGGTACG'
    'CGCTCTGAAAAGCGGGAACCTCGTGG'
    'CGCTTTATTAA'
4. Use a regular expression for the enzyme pattern.
fragmentsRegExp = restrict(Seq, 'GCG[^C]', 3)
fragmentsRegExp =
    'AGAGGGGTACGCGCTCTGAAAAGCG'
    'GGAACCTCGTGGCGCTTTATTAA'
5. Capture the cutting sites and fragment lengths with the fragments.
[fragments, cut_sites, lengths] = restrict(Seq, 'HspAI')
fragments =
    'AGAGGGGTACG'
    'CGCTCTGAAAAGCGGGAACCTCGTGG'
    'CGCTTTATTAA'
cut_sites = 0 11 37
lengths = 11 26 11
See Also: Bioinformatics Toolbox functions: cleave, rebasecuts, seq2regexp, seqshowwords. MATLAB function: regexp. revgeneticcode. Purpose: Reverse mapping for genetic code. Syntax: map = revgeneticcode; revgeneticcode(GeneticCode); revgeneticcode(..., 'Alphabet', AlphabetValue, ...); revgeneticcode(..., 'ThreeLetterCodes', ThreeLetterCodesValue, ...)
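The arithmetic behind cutting at a recognition pattern, as in step 2 above, can be reproduced with the base MATLAB strfind function. This is a minimal sketch rather than the restrict implementation: it handles a literal pattern only (no enzyme names or regular expressions), assumes the matches do not overlap, and uses hypothetical variable names.

```matlab
seq = 'AGAGGGGTACGCGCTCTGAAAAGCGGGAACCTCGTGGCGCTTTATTAA';
pattern = 'GCGC';   % literal recognition sequence
cutPos = 3;         % cut after the 3rd base of each match

% Start index of every occurrence of the pattern.
starts = strfind(seq, pattern);

% Index of the last base that stays in the fragment before each cut.
cutSites = starts + cutPos - 1;

% Fragment boundaries: sequence start, every cut, sequence end.
bounds = [0 cutSites numel(seq)];
fragments = cell(1, numel(bounds) - 1);
for k = 1:numel(fragments)
    fragments{k} = seq(bounds(k) + 1:bounds(k + 1));
end
lengths = diff(bounds);
disp(fragments)
disp(lengths)
```

The pattern occurs at positions 11 and 37, so cutting after its third base gives cuts after bases 13 and 39 and fragments of length 13, 26 and 9, matching fragmentsPattern above.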
270. e mssgolay msviewer 2 489 mssgolay Purpose Syntax Arguments Description 2 490 Smooth mass spectrum with least squares polynomial Yout mssgolay MZ Y mssgolay PropertyName PropertyValue mssgolay Span SpanValue mssgolay Degree DegreeValue mssgolay ShowPlot ShowPlotValue MZ Mass charge vector with the range of ions in the spectra Y Ion intensity vector with the same length as the mass charge vector MZ Y can also be a matrix with several spectra that share the same mass charge MZ range Yout mssgolay MZ Y smoothes a raw mass spectrum Y using a least squares digital polynomial filter Savitzky and Golay filters The default span or frame is 15 samples mssgolay PropertyName PropertyValue defines optional properties using property name value pairs mssgolay Span SpanValue modifies the frame size for the smoothing function If SpanValue is greater than 1 the window is the size of SpanValue in samples independent of the MZ vector Higher values will smooth the signal more with an increase in computation time If SpanValue is less than 1 the window size is a fraction of the number of points in the data MZ For example if SpanValue is 0 05 the window size is equal to 5 of the number of points in MZ mssgolay Examples See Also Note 1 The original algorithm by Savitzky and Golay assumes a uniformly spaced mas
271. e 2 492 multialign function reference 2 495 multialignread function reference 2 504 multialignviewer function reference 2 506 mzxml2peaks function reference 2 507 mzxmlread function reference 2 510 nmercount function reference 2 512 nt2aa function reference 2 513 nt2int function reference 2 518 ntdensity function reference 2 520 nuc44 function reference 2 522 num2 goid function reference 2 523 nwalign function reference 2 524 oO objects biograph 5 2 geneont 5 11 phytree 5 13 oligoprop function reference 2 531 optimalleaforder function reference 2 540 P palindromes function Index 7 Index reference 2 544 pam function reference 2 546 pdbdistplot function reference 2 548 pdbread function reference 2 550 pdbwrite function reference 2 557 pdist method reference 4 52 pfamhmmread function reference 2 560 phytree constructor reference 2 561 phytree object reference 5 13 phytreeread function reference 2 565 phytreetool function reference 2 566 phytreewrite function reference 2 568 plot method reference 4 54 probelibraryinfo function reference 2 570 probesetlink function reference 2 572 probesetlookup function reference 2 574 probesetplot function reference 2 575 probesetvalues function reference 2 576 profalign function reference 2 578 proteinplot function reference 2 581 proteinpropplot function Index 8 reference 2 584 prune method reference 4 57 Q quantilenorm function re
272. e Also 1 Enter a nucleotide sequence seq AGAGGGGTACGCGCTCTGAAAAGCGGGAACCTCGTGGCGCTTTATTAA 2 Look for all possible cleavage sites in the sequence seq enzymes sites rebasecuts seq 3 Find where restriction enzymes CfoI and Tru9I cut the sequence enzymes sites rebasecuts seq CfolI Tru9I 4 Search for any possible enzymes that cut after base 7 enzymes rebasecuts seq 7 5 Get the subset of enzymes that cut between base 11 and 37 enzymes rebasecuts seq 11 37 Bioinformatics Toolbox functions cleave restrict seq2regexp seqshowwords MATLAB function regexp 2 605 redgreencmap Purpose Syntax Arguments Description 2 606 Create red and green color map redgreencmap Length redgreencmap Interpolation InterpolationValue Length Length of the color map Enter either 256 or 64 Default is the length of the color map of the current figure InterpolationValue Property that lets you set the algorithm for color interpolation Choices are e linear e quadratic e cubic e sigmoid default Note The sigmoid interpolation is tanh redgreencmap Length returns an Length by 3 matrix containing a red and green color map Low values are bright green values in the center of the map are black and high values are red Enter either 256 or 64 for Length If Length is empty the length of the map will be the same as the length of the color map of the c
273. e Also: Bioinformatics functions: aacount, molweight. jcampread. Purpose: Read JCAMP-DX-formatted files. Syntax: JCAMPData = jcampread(File). Arguments: File: JCAMP-DX-formatted file (ASCII text file). Enter a file name, a path and file name, or a URL pointing to a file. File can also be a MATLAB character array that contains the text of a JCAMP-DX-formatted file. Description: JCAMP-DX is a file format for infrared, NMR, and mass spectrometry data from the Joint Committee on Atomic and Molecular Physical Data (JCAMP). jcampread supports reading data from files saved with Versions 4.24 and 5 of the JCAMP-DX format. For more details, see http://www.jcamp.org/index.html. JCAMPData = jcampread(File) reads data from a JCAMP-DX-formatted file (File) and creates a MATLAB structure (JCAMPData) containing the following fields: Title, DataType, Origin, Owner, Blocks, Notes. The Blocks field of the structure is an array of structures corresponding to each set of data in the file. These structures have the following fields: XData, YData, XUnits, YUnits, Notes. Examples: 1. Download test data in the file isas_ms1.dx from http://www.jcamp.org/testdata.html/testdata.zip. 2. Read a JCAMP-DX file (isas_ms1.dx) into MATLAB and plot the mass spectrum. jcampStruct = jcampread('isas_ms1.dx') data = jcampStruct.Blocks(1); stem(data.XData, data.YData, 'MarkerEdgeColo
274. e Newick format for describing trees The Newick tree format can be found at http evolution genetics washington edu phylip newicktree html phytreewrite Tree opens the Save Phylogenetic Tree As dialog box for you to enter or select a file name Examples Read tree data from a Newick formatted file tr phytreeread pf00002 tree Remove all the mouse proteins ind getbyname tr mouse tr prune tr ind 2 568 phytreewrite See Also view tr Write pruned tree data to a file phytreewrite newtree tree tr Bioinformatics Toolbox functions phytree object constructor phytreeread phytreetool seqlinkage Bioinformatics Toolbox object phytree object Bioinformatics Toolbox methods of phytree object getnewickstr 2 569 probelibraryinfo Purpose Syntax Description Examples 2 570 Probe set library information for probe results ProbeInfo probelibraryinfo CELStruct CDFStruct ProbeInfo probelibraryinfo CELStruct CDFStruct creates a table of information linking the probe data in a CEL file structure with probe set information from a CDF file structure ProbeInfo is a matrix with three columns and the same number of rows as the probes field of the CELStruct The first column is the probe set ID number to which the probe belongs Probes that do not belong to a probe set in the CDF library file have probe set ID equal to 0 The second column contains the probe pair number The thi
275. e a phytree object from a phylogenetic tree file.
    tr = phytreeread('pf00002.tree')
    Phylogenetic tree object with 33 leaves (32 branches)
2 Create a connection matrix from the phytree object.
    [CM, labels, dist] = getmatrix(tr);
3 Determine if the connection matrix is a spanning tree.
    graphisspantree(CM)
4 Add an edge between the root and the first leaf in the connection matrix.
    CM(end, 1) = 1;
5 Determine if the modified connection matrix is a spanning tree.
    graphisspantree(CM)
References
[1] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002). The Boost Graph Library User Guide and Reference Manual (Upper Saddle River, NJ: Pearson Education).
See Also
Bioinformatics Toolbox functions: graphallshortestpaths, graphconncomp, graphisdag, graphisomorphism, graphmaxflow, graphminspantree, graphpred2path, graphshortestpath, graphtopoorder, graphtraverse
Bioinformatics Toolbox methods of biograph object: isspantree
graphmaxflow
Purpose
Calculate maximum flow and minimum cut in directed graph.
Syntax
    [MaxFlow, FlowMatrix, Cut] = graphmaxflow(G, SNode, TNode)
    graphmaxflow(G, SNode, TNode, 'Capacity', CapacityValue)
    graphmaxflow(G, SNode, TNode, 'Method', MethodValue)
Arguments (G, SNode, TNode, CapacityValue, MethodValue)
G: N-by-N sparse matrix that represents a directed graph. Nonzero entries in matrix G represent the capacities of the
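The spanning-tree test in steps 3 to 5 of the graphisspantree example can be illustrated without the toolbox. The following is a minimal sketch, not the toolbox implementation: it assumes a small hypothetical five-node tree rather than the pf00002 data, and the helper isTree is a made-up name. It treats edges as undirected and calls a graph a spanning tree when it has exactly N-1 edges and every node can reach every other node.

```matlab
% Hypothetical 5-node tree: leaves 1, 2, 4; branch node 3; root 5.
% Edges point from parent to child: 3->1, 3->2, 5->3, 5->4.
N  = 5;
CM = sparse([3 3 5 5], [1 2 3 4], 1, N, N);

% Sketch of a spanning-tree test (an assumption, not graphisspantree itself):
% symmetrize the matrix, then require N-1 edges and full connectivity.
% (I + A)^(N-1) has a positive (i,j) entry exactly when j is reachable from i.
isTree = @(M) nnz(triu(full(M | M'), 1)) == size(M, 1) - 1 && ...
              all(all((eye(size(M, 1)) + full(M | M'))^(size(M, 1) - 1) > 0));

before = isTree(CM);   % the tree as built

CM(end, 1) = 1;        % add an edge between the root and the first leaf
after  = isTree(CM);   % the extra edge closes a cycle
```

Here before is true and after is false, because the added edge gives the graph N edges instead of N-1.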
276. e as follows:
Peaks = mspeaks(MZ, Intensities, 'Base', BaseValue) specifies the wavelet base. BaseValue must be an integer between 2 and 20. Default is 4.
Peaks = mspeaks(MZ, Intensities, 'Levels', LevelsValue) specifies the number of levels for the wavelet decomposition. LevelsValue must be an integer between 1 and 12. Default is 10.
Peaks = mspeaks(MZ, Intensities, 'NoiseEstimator', NoiseEstimatorValue) specifies the method to estimate the threshold, T, to filter out noisy components in the first high-band decomposition (y_h). Choices are:
- mad (default): Median absolute deviation, which calculates T = sqrt(2*log(n))*mad(y_h)/0.6745, where n = the number of rows in the Intensities matrix.
- std: Standard deviation, which calculates T = std(y_h).
- A positive real value.
Peaks = mspeaks(MZ, Intensities, 'Multiplier', MultiplierValue) specifies the threshold multiplier constant. MultiplierValue must be a positive real value. Default is 1.0.
Peaks = mspeaks(MZ, Intensities, 'Denoising', DenoisingValue) controls the use of wavelet denoising to smooth the signal. Choices are true (default) or false.
Note: If your data has previously been smoothed, for example, with the mslowess or mssgolay function, it is not necessary to use wavelet denoising. Set this property to false.
Peaks = mspeaks(MZ, Intensities, 'PeakLocation', PeakLocationValue) specifies the proportio
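The two NoiseEstimator formulas above can be written out with base MATLAB functions. This is a minimal sketch under stated assumptions: y_h holds made-up coefficients standing in for the first high-band decomposition, n is a made-up row count for Intensities, and the median absolute deviation is computed explicitly instead of calling a toolbox mad function.

```matlab
% Hypothetical high-band coefficients (assumed values, for illustration only).
y_h = [0.2 -0.1 0.4 -0.3 0.05 2.5 -0.15 0.1]';
n   = 16;   % hypothetical number of rows in the Intensities matrix

% Median absolute deviation, spelled out with base functions.
madValue = median(abs(y_h - median(y_h)));

% 'mad' choice: T = sqrt(2*log(n)) * mad(y_h) / 0.6745
T_mad = sqrt(2 * log(n)) * madValue / 0.6745;

% 'std' choice: T = std(y_h)
T_std = std(y_h);
```

The single large coefficient (2.5) inflates T_std much more than T_mad, which is one reason a median-based estimate is a reasonable default for a noise threshold.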
277. e correction. This correction works better when the number of sequences is greater than 50.
Displaying a Sequence Logo for a Nucleotide Sequence
1 Create a series of aligned nucleotide sequences.
    S = {'ATTATAGCAAACTA', 'AACATGCCAAAGTA', 'ATCATGCAAAAGGA'};
2 Display the sequence logo.
    seqlogo(S)
[Figure: sequence logo window; x-axis: Sequence Position]
3 Notice that correction for small samples prevents you from seeing columns with information equal to log2(4) = 2 bits, but you can turn this adjustment off.
    seqlogo(S, 'sscorrection', false)
Displaying a Sequence Logo for an Amino Acid Sequence
1 Create a series of aligned amino acid sequences.
    S2 = {'LSGGQRQRVAIARALAL', 'LSGGEKQRVAIARALMN', 'LSGGQTQRVLLARALAA', 'LSGGERRRLEIACVLAL', 'FSGGEKKKNELWQMLAL', 'LSGGERRRLEIACVLAL'};
2 Display the sequence logo, specifying an amino acid sequence and limiting the logo to sequence positions 2 through 10.
    seqlogo(S2, 'alphabet', 'aa', 'startAt', 2, 'endAt', 10)
[Figure: sequence logo window; x-axis: Sequence Position]
References
[1] Schneider, T.D., and Stephens, R.M. (1990). Sequence Logos: A new way to display consensus sequences. Nucleic Acids Research 18, 6097-6100.
See Also
Bioinformatics Toolbox functions: seqconsensus, seqdisp, seqprofile
seqmatch
Purpose Syntax Desc
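The remark in the nucleotide example about columns carrying log2(4) = 2 bits can be checked by hand. The sketch below is an illustration, not the seqlogo implementation: it assumes the usual definition of column information as log2(4) minus the Shannon entropy of the observed base frequencies, and it applies no small-sample correction. The variable names are made up.

```matlab
% The three aligned sequences from the nucleotide example, as a char matrix.
S = ['ATTATAGCAAACTA'; 'AACATGCCAAAGTA'; 'ATCATGCAAAAGGA'];
alphabet = 'ACGT';
nCols = size(S, 2);
info  = zeros(1, nCols);

for k = 1:nCols
    % Count how often each base occurs in column k.
    counts = sum(bsxfun(@eq, S(:, k), alphabet), 1);
    p = counts / sum(counts);
    p = p(p > 0);                  % skip zero frequencies: 0*log2(0) is taken as 0
    H = -sum(p .* log2(p));        % Shannon entropy of the column, in bits
    info(k) = log2(4) - H;         % uncorrected information content
end
```

Column 1 contains only A, so its entropy is 0 and info(1) is 2 bits, the maximum for a four-letter alphabet.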
278. e data in the data set.
manorm(..., 'Global', GlobalValue), when GlobalValue is true, normalizes the values in the data set by the global mean (or MethodValue) of the data, as opposed to normalizing each column or block of the data independently.
manorm(..., 'StructureOutput', StructureOutputValue), when StructureOutputValue is true and the input data is a structure, returns the input structure with an additional data field for the normalized data.
manorm(..., 'NewColumnName', NewColumnNameValue), when using StructureOutput, allows you to specify the name of the column that is appended to the list of ColumnNames in the structure. The default behavior is to prefix 'Block Normalized' to the FieldName string.
Examples
    maStruct = gprread('mouse_a1wt.gpr');
    % Extract some data of interest.
    Red = magetfield(maStruct, 'F635 Median');
    Green = magetfield(maStruct, 'F532 Median');
    % Create a log-log plot.
    maloglog(Red, Green, 'factorlines', true)
    % Center the data.
    normRed = manorm(Red);
    normGreen = manorm(Green);
    % Create a log-log plot of the centered data.
    figure
    maloglog(normRed, normGreen, 'title', 'Normalized', 'factorlines', true)
    % Alternatively, you can work directly with the structure.
    normRedBs = manorm(maStruct, 'F635 Median - B635');
    normGreenBs = manorm(maStruct, 'F532 Median - B532');
    % Create a log-log plot of the centered data. This includes some
    % zero values, so turn off the warning.
    figure
    w = warning('off', 'Bioinf
279. e distance metric requires extra arguments, then PdistValue is a cell array. For example, to use the Minkowski distance with exponent P, you would use {'minkowski', P}.
clustergram(Data, 'Linkage', LinkageValue) specifies the linkage method to pass to the linkage function (Statistics Toolbox) to use to create the hierarchical cluster tree. LinkageValue is a string. For information on choices, see the linkage function. Default is average.
clustergram(Data, 'Dendrogram', DendrogramValue) specifies property name/property value pairs to pass to the dendrogram function (Statistics Toolbox) to create the dendrogram plot. DendrogramValue is a cell array of property name/property value pairs. For information on choices, see the dendrogram function.
clustergram(Data, 'OptimalLeafOrder', OptimalLeafOrderValue) enables or disables the optimal leaf ordering calculation, which determines the leaf order that maximizes the similarity between neighboring leaves. Choices are true (enable) or false (disable). Default depends on the size of Data. If the number of rows or columns in Data is greater than 1000, default is false; otherwise, default is true.
Note: Disabling the optimal leaf ordering calculation can be useful when working with large data sets, because this calculation uses a large amount of memory and can be very time consuming.
clustergram(Data, 'ColorMap', ColorMapValue) specifies the color map to use to creat
280. e function get or as fields in structures. Some of these performance parameters are ErrorRate, CorrectRate, ErrorDistributionByClass, Sensitivity, and Specificity. classperf, without input arguments, displays all the available performance parameters.
cp = classperf(groundtruth) creates and initializes an empty object. CP is the handle to the object. groundtruth is a vector containing the true class labels for every observation. groundtruth can be a numeric vector or a cell array of strings. When used in a cross-validation design experiment, groundtruth should have the same size as the total number of observations.
classperf(cp, classout) updates the CP object with the classifier output classout. classout is the same size and type as groundtruth. When classout is numeric and groundtruth is a cell array of strings, the function grp2idx is used to create the index vector that links classout to the class labels. When classout is a cell array of strings, an empty string ('') represents an inconclusive result of the classifier. For numeric arrays, NaN represents an inconclusive result.
classperf(cp, classout, testidx) updates the CP object with the classifier output classout. classout has smaller size than groundtruth, and testidx is an index vector or a logical index vector of the same size as groundtruth, which indicates the observations that were used in the current validation.
Examples
    cp = classperf(groundtruth, classout
281. e is returned in the same format as the DNA sequence. For example, if SeqDNA is a vector of integers, then so is SeqRNA.
Examples
Convert a DNA sequence to an RNA sequence.
    rna = dna2rna('ACGATGAGTCATGCTT')
    rna =
    ACGAUGAGUCAUGCUU
See Also
Bioinformatics Toolbox function: rna2dna
MATLAB functions: regexp, strrep
dnds
Purpose
Estimate synonymous and nonsynonymous substitution rates.
Syntax
    [Dn, Ds, Vardn, Vards] = dnds(SeqNT1, SeqNT2)
    [Dn, Ds, Vardn, Vards] = dnds(SeqNT1, SeqNT2, 'GeneticCode', GeneticCodeValue)
    [Dn, Ds, Vardn, Vards] = dnds(SeqNT1, SeqNT2, 'Method', MethodValue)
    [Dn, Ds, Vardn, Vards] = dnds(SeqNT1, SeqNT2, 'Window', WindowValue)
    [Dn, Ds, Vardn, Vards] = dnds(SeqNT1, SeqNT2, 'Verbose', VerboseValue)
Arguments
SeqNT1, SeqNT2: Nucleotide sequences. Enter either a string or a structure with the field Sequence.
GeneticCodeValue: Property to specify a genetic code. Enter a Code Number or a string with a Code Name from the table. If you use a Code Name, you can truncate it to the first two characters. Default is 1 or Standard.
MethodValue: String specifying the method for calculating substitution rates. Choices are:
- NG (default): Nei-Gojobori method (1986) uses the number of synonymous and nonsynonymous substitutions and the number of potentially synonymous and nonsynonymous sites. Based on the Jukes-Cantor model.
- LWL: Li W
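For character input, the dna2rna conversion shown at the start of this entry amounts to replacing T with U, which is why strrep appears in its See Also list. A minimal sketch with the base function strrep, assuming a plain character sequence and handling both cases explicitly; it does not cover the integer or structure inputs that dna2rna accepts.

```matlab
dna = 'ACGATGAGTCATGCTT';

% Replace thymine with uracil, preserving the case of each letter.
rna = strrep(strrep(dna, 'T', 'U'), 't', 'u');
```

For this input, rna is 'ACGAUGAGUCAUGCUU', matching the toolbox example.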
282. e of a Molecule Viewer window created using the molviewer function.
1 Use the molviewer function to create a figure handle to a Molecule Viewer window.
    FH = molviewer('2DHB')
2 Use the evalrasmolscript function to send script commands to the molecule viewer that change the background to white and spin the molecule.
    evalrasmolscript(FH, 'background white; spin')
See Also
Bioinformatics Toolbox functions: getpdb, molviewer, pdbread, pdbwrite
exprprofrange
Purpose
Calculate range of gene expression profiles.
Syntax
    Range = exprprofrange(Data)
    [Range, LogRange] = exprprofrange(Data)
    exprprofrange(..., 'PropertyName', PropertyValue, ...)
    exprprofrange(..., 'ShowHist', ShowHistValue)
Arguments
Data: Matrix where each row corresponds to a gene.
ShowHistValue: Property to control displaying a histogram with range data. Enter either true (include range data) or false. The default value is false.
Description
Range = exprprofrange(Data) calculates the range of each expression profile in a data set (Data).
[Range, LogRange] = exprprofrange(Data) returns the log range, that is, log(max(prof)) - log(min(prof)), of each expression profile. If you do not specify output arguments, exprprofrange displays a histogram bar plot of the range.
exprprofrange(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs.
exprprofrange(..., 'ShowHist', ShowHistValue), whe
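The Range and LogRange definitions in the exprprofrange description can be reproduced with base functions. This is a minimal sketch on a made-up data matrix, assuming strictly positive expression values so that the logarithms are defined; it is not the toolbox implementation and draws no histogram.

```matlab
% Hypothetical expression data: each row is one gene's profile.
Data = [ 1  2  4;
         3  3  9;
        10 20 40];

% Range of each profile: max minus min along each row.
Range = max(Data, [], 2) - min(Data, [], 2);

% Log range of each profile: log(max(prof)) - log(min(prof)).
LogRange = log(max(Data, [], 2)) - log(min(Data, [], 2));
```

Rows 1 and 3 differ in Range (3 versus 30) but share the same LogRange, log(4), because the log range depends only on the ratio of the largest to the smallest value.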
283. e pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
BGobj = biograph(CMatrix, NodeIDs, 'ID', IDValue) specifies an ID for the biograph object. Default is ''. This information is for bookkeeping purposes only.
BGobj = biograph(CMatrix, NodeIDs, 'Label', LabelValue) specifies a label for the biograph object. Default is ''. This information is for bookkeeping purposes only.
BGobj = biograph(CMatrix, NodeIDs, 'Description', DescriptionValue) specifies a description of the biograph object. Default is ''. This information is for bookkeeping purposes only.
BGobj = biograph(CMatrix, NodeIDs, 'LayoutType', LayoutTypeValue) specifies the algorithm for the layout engine.
BGobj = biograph(CMatrix, NodeIDs, 'EdgeType', EdgeTypeValue) specifies how edges display.
BGobj = biograph(CMatrix, NodeIDs, 'Scale', ScaleValue) post-scales the node coordinates. Default is 1.
BGobj = biograph(CMatrix, NodeIDs, 'LayoutScale', LayoutScaleValue) scales the size of the nodes before calling the layout engine. Default is 1.
BGobj = biograph(CMatrix, NodeIDs, 'EdgeTextColor', EdgeTextColorValue) specifies a three-element numeric vector of RGB values. Default is [0, 0, 0], which defines black.
BGobj = biograph(CMatrix, NodeIDs, 'E
284. e the clustergram. This controls the colors used to display the heat map. ColorMapValue is either an M-by-3 matrix of RGB values or the name or function handle of a function that returns a color map. Default is redgreencmap.
clustergram(Data, 'SymmetricRange', SymmetricRangeValue) controls whether the color range of the heat map is symmetric around zero. SymmetricRangeValue can be true (default) or false.
clustergram(Data, 'Dimension', DimensionValue) specifies whether to create a one-dimensional or two-dimensional clustergram. Choices are 1 (default) or 2. The one-dimensional clustergram clusters the rows of the data. The two-dimensional clustergram creates the one-dimensional clustergram, and then clusters the columns of the row-clustered data.
clustergram(Data, 'Ratio', RatioValue) specifies the ratio of the space that the dendrogram(s) use in the X and Y directions, relative to the size of the heat map. If RatioValue is a scalar, it is used as the ratio for both directions. If RatioValue is a two-element vector, the first element is used for the X ratio, and the second element is used for the Y ratio. The Y ratio is ignored for one-dimensional clustergrams. Default ratio is 1/5.
Tip: Click and hold the mouse button on the heat map to display the intensity value, column label, and row label for that area of the heat map. View row labels by using the zoom icon to zoom the right side of the cl
285. eIndices probeIndices
3 Perform GCRMA background adjustment, quantile normalization, and Robust Multi-array Average (RMA) summarization on the Affymetrix microarray probe-level data and create a matrix of expression values.
    expdata = gcrma(pmMatrix, mmMatrix, probeIndices, seqMatrix);
The prostatecancerrawdata.mat file used in this example contains data from Best et al., 2005.
References
[1] Wu, Z., Irizarry, R.A., Gentleman, R., Murillo, F.M., and Spencer, F. (2004). A Model Based Background Adjustment for Oligonucleotide Expression Arrays. Journal of the American Statistical Association 99(468), 909-917.
[2] Wu, Z., and Irizarry, R.A. (2005). Stochastic Models Inspired by Hybridization Theory for Short Oligonucleotide Arrays. Proceedings of RECOMB 2004. J Comput Biol. 12(6), 882-93.
[3] Wu, Z., and Irizarry, R.A. (2005). A Statistical Framework for the Analysis of Microarray Probe-Level Data. Johns Hopkins University, Biostatistics Working Papers 73.
[4] Speed, T. (2006). Background models and GCRMA. Lecture 10, Statistics 246, University of California Berkeley. http://www.stat.berkeley.edu/users/terry/Classes/s246.2006/Week10/Week
[5] Best, C.J.M., Gillespie, J.W., Yi, Y., Chandramouli, G.V.R., Perlmutter, M.A., Gathright, Y., Erickson, H.S., Georgevich, L., Tangrea, M.A., Duray, P.H., Gonzalez, S., Velasco, A., Linehan, W.M., Matusik, R.J., Price, D.K., Figg, W.D., Emmert-Buck, M.R., and Chuaqui, R.F. (2005). M
286. ead('mouse_a1pd.gpr')
    maimage(gprStruct, 'F635 Median')
Alternatively, you can create a similar plot using more basic graphics commands.
    F635Median = magetfield(gprStruct, 'F635 Median');
    imagesc(F635Median(gprStruct.Indices))
    colormap bone
    colorbar
See Also
Bioinformatics Toolbox functions: affyread, agferead, celintensityread, galread, geosoftread, imageneread, magetfield, sptread
graphallshortestpaths
Purpose
Find all shortest paths in graph.
Syntax
    [dist] = graphallshortestpaths(G)
    [dist] = graphallshortestpaths(G, 'Directed', DirectedValue)
    [dist] = graphallshortestpaths(G, 'Weights', WeightsValue)
Arguments
G: N-by-N sparse matrix that represents a graph. Nonzero entries in matrix G represent the weights of the edges.
DirectedValue: Property that indicates whether the graph is directed or undirected. Enter false for an undirected graph. This results in the upper triangle of the sparse matrix being ignored. Default is true.
WeightsValue: Column vector that specifies custom weights for the edges in matrix G. It must have one entry for every nonzero value (edge) in matrix G. The order of the custom weights in the vector must match the order of the nonzero values in matrix G when it is traversed column-wise. This property lets you use zero-valued weights. By default, graphallshortestpaths gets weight information from the nonzero entries in
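To make the input convention of graphallshortestpaths concrete (a sparse N-by-N matrix whose nonzero entries are edge weights), the sketch below computes all-pairs shortest distances on a small hypothetical directed graph. It uses the Floyd-Warshall recurrence in base MATLAB as a stand-in chosen for brevity; it is not presented as the algorithm the toolbox function uses. Like the default behavior described above, it reads weights from the nonzero entries, so zero-valued weights are not representable in this sketch.

```matlab
% Hypothetical 4-node directed graph; nonzero entries are edge weights.
% Edges: 1->2 (1), 1->3 (4), 2->4 (2), 3->4 (1).
G = sparse([1 1 2 3], [2 3 4 4], [1 4 2 1], 4, 4);
N = size(G, 1);

dist = full(G);
dist(dist == 0) = Inf;     % a zero entry means "no edge"
dist(1:N+1:end) = 0;       % distance from every node to itself is 0

% Floyd-Warshall: allow node k as an intermediate stop, for k = 1..N.
for k = 1:N
    dist = min(dist, bsxfun(@plus, dist(:, k), dist(k, :)));
end
```

After the loop, dist(1,4) is 3 (path 1->2->4), and dist(4,1) is Inf because node 4 has no outgoing edges.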
287. ecify the algorithm used to find the shortest path. Choices are:
- Bellman-Ford: Assumes weights of the edges to be nonzero entries in the N-by-N adjacency matrix. Time complexity is O(N*E), where N and E are the number of nodes and edges respectively.
- BFS: Breadth-first search. Assumes all weights to be equal, and nonzero entries in the N-by-N adjacency matrix to represent edges. Time complexity is O(N+E), where N and E are the number of nodes and edges respectively.
- Acyclic: Assumes the graph represented by the N-by-N adjacency matrix extracted from a biograph object, BGObj, to be a directed acyclic graph and that weights of the edges are nonzero entries in the N-by-N adjacency matrix. Time complexity is O(N+E), where N and E are the number of nodes and edges respectively.
- Dijkstra: Default algorithm. Assumes weights of the edges to be positive values in the N-by-N adjacency matrix. Time complexity is O(log(N)*E), where N and E are the number of nodes and edges respectively.
shortestpath(..., 'Weights', WeightsValue) lets you specify custom weights for the edges. WeightsValue is a column vector having one entry for every nonzero value (edge) in the N-by-N adjacency matrix extracted from a biograph object, BGObj. The order of the custom weights in the vector must match the order of the nonzero values in the N-by-N adjacency matrix when it is trave
288. ect also stores information, such as color properties and text label characteristics, used to create a 2-D visualization of the graph. You create a biograph object using the object constructor function biograph. You can view a graphical representation of a biograph object using the view method.
Method Summary
Following are methods of a biograph object:
allshortestpaths (biograph)   Find all shortest paths in biograph object
conncomp (biograph)           Find strongly or weakly connected components in biograph object
dolayout (biograph)           Calculate node positions and edge trajectories
getancestors (biograph)       Find ancestors in biograph object
getdescendants (biograph)     Find descendants in biograph object
getedgesbynodeid (biograph)   Get handles to edges in biograph object
getmatrix (biograph)          Get connection matrix from biograph object
getnodesbyid (biograph)       Get handles to nodes
getrelatives (biograph)       Find relatives in biograph object
isdag (biograph)              Test for cycles in biograph object
isomorphism (biograph)        Find isomorphism between two biograph objects
isspantree (biograph)         Determine if tree created from biograph object is spanning tree
maxflow (biograph)            Calculate maximum flow and minimum cut in biograph object
minspantree (biograph)        Find minimal spanning tree in biograph object
shortestpath (biograph)
topoorder (biograph)
traverse (biograph)
view (biograph)
Property Summary
289. edges.
SNode: Node in G.
TNode: Node in G.
CapacityValue: Column vector that specifies custom capacities for the edges in matrix G. It must have one entry for every nonzero value (edge) in matrix G. The order of the custom capacities in the vector must match the order of the nonzero values in matrix G when it is traversed column-wise. By default, graphmaxflow gets capacity information from the nonzero entries in matrix G.
MethodValue: String that specifies the algorithm used to find the maximum flow. Choices are:
- Edmonds: Uses the Edmonds and Karp algorithm, the implementation of which is based on a variation called the labeling algorithm. Time complexity is O(N*E^2), where N and E are the number of nodes and edges respectively.
- Goldberg: Default algorithm. Uses the Goldberg algorithm, which uses the generic method known as preflow-push. Time complexity is O(N^2*sqrt(E)), where N and E are the number of nodes and edges respectively.
Description
Tip: For introductory information on graph theory functions, see Graph Theory Functions in the Bioinformatics Toolbox documentation.
[MaxFlow, FlowMatrix, Cut] = graphmaxflow(G, SNode, TNode) calculates the maximum flow of directed graph G from node SNode to node TNode. Input G is an N-by-N sparse matrix that represents a directed graph. Nonzero entries in matrix G represent the capacities of the edges. Output MaxFlow is the maximum flow, and FlowMatrix is a sparse matrix
290. efault): Median absolute deviation, which calculates T = sqrt(2*log(n))*mad(y_h)/0.6745, where n = the number of rows in the Intensities matrix.
- std: Standard deviation, which calculates T = std(y_h).
- A positive real value.
MultiplierValue: A positive real value that specifies the threshold multiplier constant. Default is 1.0.
DenoisingValue: Controls the use of wavelet denoising to smooth the signal. Choices are true (default) or false.
Note: If your data has previously been smoothed, for example, with the mslowess or mssgolay function, it is not necessary to use wavelet denoising. Set this property to false.
PeakLocationValue: Value that specifies the proportion of the peak height that selects the points used to compute the centroid mass of the respective peak. The value must be between 0 and 1. Default is 1.0.
FWHH_FilterValue: Positive real value that specifies the minimum full width at half height (FWHH), in m/z units, for reported peaks. Peaks with FWHH below this value are not included in the output list Peaks. Default is 0.
OverSegmentation_FilterValue: Positive real value that specifies the minimum distance, in m/z units, between neighboring peaks. When the signal is not smoothed appropriately, multiple maxima can appear to represent the same peak. By increasing this filter value, oversegmented peaks are joined into a single peak. Default is 0.
Return Values
Description
291. efines the position on Pattern where the sequence is cut. Position = 0 corresponds to the 5' end of the Pattern.
PartialDigestValue: Property to specify a probability for partial digestion. Enter a value from 0 to 1.
Description
Fragments = restrict(SeqNT, Enzyme) cuts a sequence (SeqNT) into fragments at the restriction sites of a restriction enzyme (Enzyme). The returned values are stored in a cell array of sequences (Fragments).
Fragments = restrict(SeqNT, Pattern, Position) cuts a sequence (SeqNT) into fragments at restriction sites specified by a nucleotide pattern (Pattern).
[Fragments, CuttingSites] = restrict(...) returns a numeric vector with the indices representing the cutting sites. A 0 (zero) is added to the list, so numel(Fragments) == numel(CuttingSites). You can use CuttingSites + 1 to point to the first base of every fragment respective to the original sequence.
[Fragments, CuttingSites, Lengths] = restrict(...) returns a numeric vector with the lengths of every fragment.
restrict(..., 'PartialDigest', PartialDigestValue) simulates a partial digest where each restriction site in the sequence has a probability (PartialDigestValue) of being cut.
REBASE, the restriction enzyme database, is a collection of information about restriction enzymes and related proteins. For more information about REBASE or to search REBASE for the name of a restriction enzyme, go to the REBASE Web site at http://rebase.neb.com/rebase/rebas
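The relationship between Fragments, CuttingSites, and Lengths described for restrict can be traced on a small example. The following is a minimal sketch in base MATLAB, not the toolbox implementation: the sequence, pattern, and position are made up, the pattern is matched literally (no ambiguous nucleotide codes or enzyme names), and no partial digest is simulated.

```matlab
seq      = 'AAGCGCTTGCGCAA';   % hypothetical nucleotide sequence
pattern  = 'GCGC';             % hypothetical recognition pattern
position = 1;                  % cut after the first base of each match

% Start index of every literal match of the pattern.
starts = strfind(seq, pattern);

% Index of the last base before each cut, with a leading 0 so that
% cuttingSites + 1 points at the first base of every fragment.
cuttingSites = [0, starts + position - 1];

% Slice the sequence between consecutive cutting sites.
bounds    = [cuttingSites, numel(seq)];
fragments = cell(1, numel(cuttingSites));
for k = 1:numel(cuttingSites)
    fragments{k} = seq(bounds(k) + 1 : bounds(k + 1));
end
lengths = cellfun(@numel, fragments);
```

With these inputs the pattern matches at positions 3 and 9, so cuttingSites is [0 3 9] and the fragments are 'AAG', 'CGCTTG', and 'CGCAA'. The fragment count equals the number of cutting sites, as the description states.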
292. eights are the result of normalizing to unity the new patristic distances between every leaf and the root.
Examples
1 Create an ultrametric tree with specified branch distances.
    bd = [1 2 3]';
    tr_1 = phytree([1 2; 3 4; 5 6], bd)
2 View the tree.
    view(tr_1)
[Figure: Phylogenetic Tree window showing tr_1]
3 Display the calculated weights.
    weights(tr_1)
    ans =
        1.0000
        1.0000
        0.8000
        0.8000
References
[1] Thompson, J.D., Higgins, D.G., and Gibson, T.J. (1994). CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 22(22), 4673-4680.
[2] Henikoff, S., and Henikoff, J.G. (1994). Position-based sequence weights. Journal Molecular Biology 243(4), 574-578.
See Also
Bioinformatics Toolbox functions: multialign, phytree (object constructor), profalign, seqlinkage
Objects: Alphabetical List
biograph object
Purpose
Data structure containing generic interconnected data used to implement directed graph.
Description
A biograph object is a data structure containing generic interconnected data used to implement a directed graph. Nodes represent proteins, genes, or any other biological entity, and edges represent interactions, dependences, or any other relationship between the nodes. A biograph obj
293. el 3. By default, Levels = [1:N].
[Handles, OutFeatList] = featuresmap(...) returns a list of handles for each feature in OutFeatList. It also returns OutFeatList, which is a cell array of the mapped features.
Tip: Use Handles and OutFeatList with the legend command to create a legend of features.
featuresmap(..., 'PropertyName', PropertyValue, ...) defines optional properties that use property name/value pairs in any order. These property name/value pairs are as follows:
featuresmap(..., 'FontSize', FontSizeValue) sets the font size (points) for the annotations of the features. Default FontSizeValue is 9.
featuresmap(..., 'ColorMap', ColorMapValue) specifies a list of colors to use for each feature. This matrix replaces the default matrix, which specifies the following colors and order: blue, green, red, cyan, magenta, yellow, brown, light green, orange, purple, gold, and silver. ColorMapValue is a three-column matrix, where each row corresponds to a color, and each column specifies red, green, and blue intensity respectively. Valid values for the RGB intensities are 0.0 to 1.0.
featuresmap(..., 'Qualifiers', QualifiersValue) lets you specify an ordered list of qualifiers to search for and use as annotations. For each feature, the first matching qualifier found from the list is used for its annotation. If a feature does not include any of the qualifiers, no annotation displays fo
294. ement SeqNT, Code CodeValue, Integer IntegerValue, Name NameValue
Arguments
SeqNT: Nucleotide sequence. Enter a character string of single-letter codes from the Nucleotide Lookup Table below. In addition to a single nucleotide sequence, SeqNT can be a cell array of sequences or a two-dimensional character array of sequences. The complement for each sequence is determined independently.
CodeValue: Nucleotide letter code. Enter a single character from the Nucleotide Lookup Table below. Code can also be a cell array or a two-dimensional character array.
IntegerValue: Nucleotide integer. Enter an integer from the Nucleotide Lookup Table below. Integers are arbitrarily assigned to IUB/IUPAC letters.
NameValue: Nucleotide name. Enter a nucleotide name from the Nucleotide Lookup Table below. NameValue can also be a single name, a cell array, or a two-dimensional character array.
Nucleotide Lookup Table
Code  Integer  Base Name                       Meaning  Complement
A     1        Adenine                         A        T
C     2        Cytosine                        C        G
G     3        Guanine                         G        C
T     4        Thymine                         T        A
U     4        Uracil                          U        A
R     5        Purine                          G|A      Y
Y     6        Pyrimidine                      T|C      R
K     7        Keto                            G|T      M
M     8        Amino                           A|C      K
S     9        Strong interaction (3 H bonds)  G|C      S
W     10       Weak interaction (2 H bonds)    A|T      W
B     11       Not A                           G|T|C    V
D     12       Not C                           G|A|T    H
H     13       Not G                           A|T|C    D
V     14       Not T or U                      G|A|C
295. en QuantileValue is [0.9 1], only the largest 10% of ion intensities in every spectrum are used to compute the AUC. When QuantileValue is a scalar, the scalar value represents the lower quantile limit, and the upper quantile limit is set to 1. The default value is [0 1] (use the whole area under the curve, AUC).
msnorm(..., 'Limits', LimitsValue) specifies a 1-by-2 vector with an MZ range for picking normalization points. This parameter is useful to eliminate low-mass noise from the AUC calculation. The default value is [1, max(MZ)].
msnorm(..., 'Consensus', ConsensusValue) selects MZ positions with a consensus rule. To include an MZ position in the AUC, its ion intensity must be within the quantile limits of at least the proportion ConsensusValue of the spectra in Y. The same MZ positions are used to normalize all the spectra. Enter a scalar between 0 and 1. Use the Consensus property to eliminate low-intensity peaks and noise from the normalization.
msnorm(..., 'Method', MethodValue) selects a method for normalizing the AUC of every spectrum. Enter either 'Median' (default) or 'Mean'.
msnorm(..., 'Max', MaxValue), after individually normalizing every spectrum, scales each spectrum to an overall maximum intensity (Max). Max is a scalar; if omitted, no postscaling is performed. If QuantileValue is [1 1], then a single point (peak height of the tallest peak) is normalized to Max.
Examples
1 Load sample data and plot one of the spectra.
    load sample
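The interplay of the Limits and Method properties of msnorm can be pictured with a few lines of base MATLAB. This is a rough sketch under stated assumptions, not the msnorm implementation: the data are made up, the AUC is approximated by a plain sum of intensities inside the MZ limits, and each spectrum is rescaled so that its AUC equals the median AUC of the group. Quantile, Consensus, and Max handling are left out.

```matlab
% Hypothetical data: 3 MZ positions (rows), 3 spectra (columns).
MZ = [100; 200; 300];
Y  = [1 2  4;
      2 4  8;
      3 6 12];

limits = [150 300];                        % hypothetical LimitsValue
mask   = MZ >= limits(1) & MZ <= limits(2);

% AUC of every spectrum, approximated by summing intensities within limits.
auc = sum(Y(mask, :), 1);

% 'Median' method: rescale each spectrum so its AUC matches the group median.
target = median(auc);
Ynorm  = bsxfun(@times, Y, target ./ auc);
```

After rescaling, all three spectra have the same AUC within the limits (10 in this example), while the low-mass row excluded by Limits is scaled along with the rest of its spectrum but does not influence the scale factor.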
296. en patristic distance. If C is 'distance', the first criterion is patristic distance, and then branch levels.
select(..., 'Threshold', ThresholdValue) selects all the nodes where closeness is less than or equal to the threshold value (ThresholdValue). Notice, you can also use either of the properties 'criteria' or 'reference'. If N is not specified, then N = inf; otherwise you can limit the number of selected nodes by N.
select(..., 'Exclude', ExcludeValue), when ExcludeValue is 'branches', sets a postfilter that excludes all the branch nodes from S, or, when ExcludeValue is 'leaves', all the leaf nodes. The default is 'none'.
select(..., 'Propagate', PropagateValue) activates a postfunctionality that propagates the selected nodes to the leaves when P is 'toleaves', or toward the root, finding a common ancestor, when P is 'toroot'. The default value is 'none'. P may also be 'both'. The 'Propagate' property acts after the 'Exclude' property.
Examples
Load a phylogenetic tree created from a protein family.
    tr = phytreeread('pf00002.tree');
To find close products for a given protein (e.g., vips_human):
    ind = getbyname(tr, 'vips_human');
    [sel, sel_leaves] = select(tr, 'criteria', 'distance', 'threshold', 0.6, 'reference', ind);
    view(tr, sel_leaves)
To find potential outliers in the tree, use
    [sel, sel_leaves] = select(tr, 'criteria', 'distance', 'threshold', 3, 'reference', 'leave
297. enePix Results (GPR) file
Read microarray data from ImaGene Results file
Read data from SPOT file
Extract data from microarray structure
Probe set library information for probe results
Link to NetAffx Web site
probesetlookup     Gene name for probe set
probesetplot       Plot values for Affymetrix CHP file probe set
probesetvalues     Probe set values from probe results
Microarray Data Analysis and Visualization
clustergram        Create dendrogram and heat map
maboxplot          Box plot for microarray data
mafdr              Estimate false discovery rate (FDR) of differentially expressed genes from two experimental conditions or phenotypes
maimage            Spatial image for microarray data
mairplot           Create intensity versus ratio scatter plot of microarray data
maloglog           Create loglog plot of microarray data
mapcaplot          Create Principal Component Analysis plot of microarray data
mattest            Perform two-tailed t-test to evaluate differential expression of genes from two experimental conditions or phenotypes
mavolcanoplot      Create significance versus gene expression ratio (fold change) scatter plot of microarray data
redgreencmap       Create red and green color map
Microarray Normalization and Filtering
affyinvarsetnorm, affyprobeaffinities, exprprofrange, exprprofvar, gcrma, gcrmabackadj, geneentropyfilter, genelowvalfilter, generangefilter, genevarfilter
Perform rank invari
298. ent = hmmprofalign(Model, Seq) returns the score for the optimal alignment of the query amino acid or nucleotide sequence (Seq) to the profile hidden Markov model (Model). Scores are computed using log-odds ratios for emission probabilities and log probabilities for state transitions.
[Score, Alignment] = hmmprofalign(Model, Seq) returns a string showing the optimal profile alignment. Uppercase letters and dashes correspond to MATCH and DELETE states respectively (the combined count is equal to the number of states in the model). Lowercase letters are emitted by the INSERT states. For more information about the HMM profile, see hmmprofstruct.
[Score, Alignment, Pointer] = hmmprofalign(Model, Seq) returns a vector of the same length as the profile model with indices pointing to the respective symbols of the query sequence. Null pointers (NaN) mean that such states did not emit a symbol in the aligned sequence because they represent model jumps from the BEGIN state to a MATCH state, model jumps from a MATCH state to the END state, or because the alignment passed through DELETE states.
hmmprofalign(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs.
hmmprofalign(..., 'ShowScore', ShowScoreValue), when ShowScoreValue is true, displays the scoring space and the winning path.
hmmprofalign(..., 'Flanks', FlanksValue), when FlanksValue is true, includes the symbols generated by the FLANKI
299. ent profiles
Note: sum(S.MatchX) = [1 1 ... 1]
Default is repmat([0.998 0.001 0.001 0], profLength-1, 1)'.

InsertX    INSERT state transition probabilities. Format is
    [I1->M2  I2->M3  ...  I[end-1]->Mend;
     I1->I1  I2->I2  ...  I[end-1]->I[end-1]]
Note: sum(S.InsertX) = [1 1 ... 1]
Default is repmat([0.5 0.5], profLength-1, 1)'.

DeleteX    DELETE state transition probabilities. The format is
    [D1->M2  D2->M3  ...  D[end-1]->Mend;
     D1->D2  D2->D3  ...  D[end-1]->Dend]
Note: sum(S.DeleteX) = [1 1 ... 1]
Default is repmat([0.5 0.5], profLength-1, 1)'.

FlankingInsertX    Flanking insert states (N and C) used for LOCAL profile alignment. The format is
    [N->B  C->T;
     N->N  C->C]
Note: sum(S.FlankingInsertsX) = [1 1]
To force global alignment, use S.FlankingInsertsX = [1 1; 0 0].
Default is [0.01 0.01; 0.99 0.99].

LoopX    Loop states transition probabilities used for multiple hits alignment. The format is
    [E->C  J->B;
     E->J  J->J]
Note: sum(S.LoopX) = [1 1]
Default is [0.5 0.01; 0.5 0.99].

NullX    Null transition probabilities used to provide scores with log-odds values also for state transitions. The format is
    [G->F; G->G]
Note: sum(S.NullX) = 1
Default is [0.01; 0.99].

Annotation Fields (Optional)
Name    Model Name
IDNumbe
300. eq2regexp seqcomplement

Read trace data from SCF file
Draw nucleotide trace plots
Convert amino acid sequence from letter to integer representation
Convert amino acid sequence to nucleotide sequence
Find amino acid codes, integers, abbreviations, names, and codons
Nucleotide codes, abbreviations, and names
Convert DNA sequence to RNA sequence
Convert amino acid sequence from integer to letter representation
Convert nucleotide sequence from integer to letter representation
Convert nucleotide sequence to amino acid sequence
Convert nucleotide sequence from letter to integer representation
Convert RNA sequence of nucleotides to DNA sequence
Convert sequence with ambiguous characters to regular expression
Calculate complementary strand of nucleotide sequence

seqrcomplement    Calculate reverse complement of nucleotide sequence
seqreverse        Reverse letters or numbers in nucleotide sequence

Sequence Utilities

aminolookup        Find amino acid codes, integers, abbreviations, names, and codons
baselookup         Nucleotide codes, abbreviations, and names
blastncbi          Generate remote BLAST request
cleave             Cleave amino acid sequence with enzyme
evalrasmolscript   Send RasMol script commands to Molecule Viewer window
featuresparse      Parse features from GenBank, GenPept, or EMBL data
geneticcode        Nucleoti
301. eqinsertgaps(Seq, Positions) inserts gaps in the sequence Seq before the positions specified by the integers in the vector Positions.

NewSeq = seqinsertgaps(Seq, GappedSeq) finds the gap positions in the sequence GappedSeq, then inserts gaps in the corresponding positions in the sequence Seq.

NewSeq = seqinsertgaps(Seq, GappedSeq, Relationship) specifies the relationship between Seq and GappedSeq. Enter 1 for Relationship when both sequences use the same alphabet, that is, both are nucleotide sequences or both are amino acid sequences. Enter 3 for Relationship when Seq contains nucleotides representing codons and GappedSeq contains amino acids. Default is 3.

Examples
1 Retrieve two nucleotide sequences from the GenBank database for the neuraminidase (NA) protein of two strains of the Influenza A virus (H5N1).
    hk01 = getgenbank('AF509094');
    vt04 = getgenbank('DQ094287');
2 Extract the coding region from the two nucleotide sequences.
    hk01_cds = featuresparse(hk01, 'feature', 'CDS', 'Sequence', true);
    vt04_cds = featuresparse(vt04, 'feature', 'CDS', 'Sequence', true);
3 Align the amino acid sequences converted from the nucleotide sequences.
    [sc, al] = nwalign(nt2aa(hk01_cds), nt2aa(vt04_cds), 'extendgap', 1);
4 Use the seqinsertgaps function to copy the gaps from the aligned amino acid sequences to their corresponding nucleotide sequences, thus codon-aligning them.
    hk01_aligned = seqinsertgaps(hk01_cds, al(1,:))
    vt04_aligned seqi
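The codon-alignment step described above (Relationship of 3) can be illustrated outside MATLAB. The following is a minimal Python sketch, not the toolbox implementation: the function name insert_codon_gaps is hypothetical, and the handling of trailing nucleotides (for example a stop codon absent from the protein) is an assumption.

```python
def insert_codon_gaps(nt_seq, gapped_aa, gap="-"):
    """Copy gaps from an aligned amino acid sequence into its coding sequence.

    Each gap symbol in gapped_aa becomes three gap symbols; each residue
    consumes the next codon (three nucleotides) of nt_seq.
    """
    pieces = []
    pos = 0  # index of the next unread nucleotide
    for symbol in gapped_aa:
        if symbol == gap:
            pieces.append(gap * 3)
        else:
            pieces.append(nt_seq[pos:pos + 3])
            pos += 3
    # Assumption: nucleotides left over after the last residue are kept as-is.
    pieces.append(nt_seq[pos:])
    return "".join(pieces)


if __name__ == "__main__":
    print(insert_codon_gaps("ATGGCCAAA", "M-AK"))
```

With the inputs shown, the gap after the first residue is expanded to three gap symbols so that the nucleotide sequence stays in frame with the aligned protein.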
302. er graphtraverse
Bioinformatics Toolbox method of biograph object: conncomp

graphisdag

Purpose
Test for cycles in directed graph

Syntax
graphisdag(G)

Arguments
G    N-by-N sparse matrix that represents a directed graph. Nonzero entries in matrix G indicate the presence of an edge.

Description
Tip: For introductory information on graph theory functions, see Graph Theory Functions in the Bioinformatics Toolbox documentation.

graphisdag(G) returns logical 1 (true) if the directed graph represented by matrix G is a directed acyclic graph (DAG) and logical 0 (false) otherwise. G is an N-by-N sparse matrix that represents a directed graph. Nonzero entries in matrix G indicate the presence of an edge.

Examples
Testing for Cycles in Directed Graphs
1 Create and view a directed acyclic graph (DAG) with six nodes and eight edges.
    DG = sparse([1 1 1 2 2 3 4 6], [2 4 6 3 5 4 6 5], true, 6, 6)
    view(biograph(DG))
2 Test for cycles in the DAG.
    graphisdag(DG)
    ans = 1
3 Add an edge to the DAG to make it cyclic, and then view the directed graph.
    DG(5,1) = true
    view(biograph(DG))
4 Test for cycles in the new graph.
    graphisdag(DG)
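For readers who want to see what a cycle test involves, here is a minimal Python sketch using depth-first search with three node colors. It is an illustration only and is not claimed to be how graphisdag is implemented; the function name is_dag and the edge-list input format are assumptions. The edge list reproduces the six-node example above.

```python
def is_dag(num_nodes, edges):
    """Return True if the directed graph has no cycle.

    Nodes are numbered 1..num_nodes; edges is a list of (source, target).
    """
    adjacency = {node: [] for node in range(1, num_nodes + 1)}
    for source, target in edges:
        adjacency[source].append(target)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in adjacency}

    def visit(node):
        color[node] = GRAY
        for nxt in adjacency[node]:
            if color[nxt] == GRAY:  # back edge: a cycle exists
                return False
            if color[nxt] == WHITE and not visit(nxt):
                return False
        color[node] = BLACK
        return True

    return all(color[n] != WHITE or visit(n) for n in adjacency)


example_edges = [(1, 2), (1, 4), (1, 6), (2, 3), (2, 5), (3, 4), (4, 6), (6, 5)]

if __name__ == "__main__":
    print(is_dag(6, example_edges))             # acyclic graph
    print(is_dag(6, example_edges + [(5, 1)]))  # edge 5 -> 1 closes a cycle
```

Adding the edge from node 5 to node 1 closes the path 1, 2, 5, 1, which is why the second call reports a cycle.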
303. er of the nonzero values in the matrix when it is traversed column-wise. By default, graphmaxflow gets capacity information from the nonzero entries in the matrix.

maxflow(BGObj, SNode, TNode, ..., 'Method', MethodValue, ...) lets you specify the algorithm used to find the maximum flow. Choices are:
• 'Edmonds': Uses the Edmonds and Karp algorithm, the implementation of which is based on a variation called the labeling algorithm. Time complexity is O(N*E^2), where N and E are the number of nodes and edges respectively.
• 'Goldberg': Default algorithm. Uses the Goldberg algorithm, which uses the generic method known as preflow-push. Time complexity is O(N^2*sqrt(E)), where N and E are the number of nodes and edges respectively.

References
[1] Edmonds, J., and Karp, R.M. (1972). Theoretical improvements in the algorithmic efficiency for network flow problems. Journal of the ACM 19, 248-264.
[2] Goldberg, A.V. (1985). A New Max-Flow Algorithm. MIT Technical Report MIT/LCS/TM-291, Laboratory for Computer Science, MIT.
[3] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002). The Boost Graph Library User Guide and Reference Manual (Upper Saddle River, NJ: Pearson Education).

See Also
Bioinformatics Toolbox functions: biograph (object constructor), graphmaxflow
Bioinformatics Toolbox object: biograph object
Bioinformatics Toolbox methods of a biograph object: allshortest
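The Edmonds and Karp approach named above repeatedly finds a shortest augmenting path with breadth-first search and pushes the bottleneck capacity along it. The following Python sketch illustrates that idea under stated assumptions: the function name max_flow and the dictionary input format are hypothetical, source and sink are assumed to be distinct, and this is not the toolbox or Boost implementation.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow.

    capacity maps (u, v) edge tuples to nonnegative capacities.
    Assumes source != sink.
    """
    residual = {source: {}, sink: {}}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + c
        residual[v].setdefault(u, 0)  # reverse edge for undoing flow

    total = 0
    while True:
        # Breadth-first search for the shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path is left

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck flow and update residual capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


example_capacity = {(1, 2): 2, (1, 3): 3, (2, 3): 1, (2, 4): 2, (3, 4): 2}

if __name__ == "__main__":
    print(max_flow(example_capacity, 1, 4))
```

In the small example, the two edges entering node 4 have a combined capacity of 4, which bounds the flow from node 1 to node 4.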
304. er the graph is directed or undirected. Set DirectedValue to false for an undirected graph. This results in the upper triangle of the sparse matrix being ignored. Default is true.

graphshortestpath(..., 'Method', MethodValue, ...) lets you specify the algorithm used to find the shortest path. Choices are:
• 'Bellman-Ford': Assumes weights of the edges to be nonzero entries in sparse matrix G. Time complexity is O(N*E), where N and E are the number of nodes and edges respectively.
• 'BFS': Breadth-first search. Assumes all weights to be equal, and nonzero entries in sparse matrix G to represent edges. Time complexity is O(N+E), where N and E are the number of nodes and edges respectively.
• 'Acyclic': Assumes G to be a directed acyclic graph and that weights of the edges are nonzero entries in sparse matrix G. Time complexity is O(N+E), where N and E are the number of nodes and edges respectively.
• 'Dijkstra': Default algorithm. Assumes weights of the edges to be positive values in sparse matrix G. Time complexity is O(log(N)*E), where N and E are the number of nodes and edges respectively.

graphshortestpath(..., 'Weights', WeightsValue, ...) lets you specify custom weights for the edges. WeightsValue is a column vector having one entry for every nonzero value (edge) in matrix G. The order of the custom weights in the vector must match the order of the nonzero values in mat
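To make the default choice concrete, the following Python sketch shows Dijkstra's algorithm with a binary heap, which is the structure behind the O(log(N)*E) bound quoted above. It is an illustration under assumptions: the function name shortest_path and the (source, target, weight) edge format are hypothetical, weights are assumed positive, and this is not the toolbox code.

```python
import heapq


def shortest_path(edges, source, target):
    """Dijkstra's algorithm on a directed graph with positive weights.

    edges is a list of (u, v, weight). Returns (distance, path); the
    distance is infinity and the path is empty if target is unreachable.
    """
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, [])

    dist = {source: 0}
    pred = {source: None}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, w in adjacency.get(u, []):
            candidate = d + w
            if v not in dist or candidate < dist[v]:
                dist[v] = candidate
                pred[v] = u
                heapq.heappush(heap, (candidate, v))

    if target not in dist:
        return float("inf"), []
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = pred[node]
    return dist[target], path[::-1]


example_edges = [(1, 2, 4), (1, 3, 1), (3, 2, 2), (2, 4, 5), (3, 4, 8)]

if __name__ == "__main__":
    print(shortest_path(example_edges, 1, 4))
```

The predecessor map plays the same role as the predecessor output that graphpred2path converts into paths: following it backward from the target recovers the route.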
305. erValue    Property to specify the order of the amino acid alphabet. Enter a character string with the 20 standard amino acid characters A R N D C Q E G H I L K M F P S T W Y V. The ambiguous characters B Z X are not allowed.

showhmmprof(Model) plots a profile hidden Markov model described by the structure Model.

showhmmprof(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs.

showhmmprof(..., 'Scale', ScaleValue) specifies the scale to use: log probabilities (ScaleValue = 'logprob'), probabilities (ScaleValue = 'prob'), or log-odds ratios (ScaleValue = 'logodds'). To compute the log-odds ratios, the null model probabilities are used for symbol emission, and equally distributed transitions are used for the null transition probabilities. The default ScaleValue is 'logprob'.

showhmmprof(..., 'Order', OrderValue) specifies the order in which the symbols are arranged along the vertical axis. This option allows you to reorder the alphabet and group the symbols according to their properties.

Examples
1 Load a model example.
    model = pfamhmmread('pf00002.ls')
2 Plot the profile.
    showhmmprof(model, 'Scale', 'logodds')
3 Order the alphabet by hydrophobicity.
    hydrophobic = 'IVLFCMAGTSWYPHNDQEKR'
4 Plot the profile.
    showhmmprof(model, 'Order', hydrophobic)

See Also
Bioinformatics Toolbox functions: gethmmprof hmmprofalign hmmprofestimate hmmprofgene
306. es are:
• regular (default): Evenly spaced lattice.
• latin: Random Latin hypercube with GridStepsValue^2 samples.

ShowPlotValue    Controls the display of a plot of an original and aligned spectrum over the reference masses specified by RefMZ. Choices are true, false, or I, an integer specifying the index of a spectrum in Intensities. If set to true, the first spectrum in Intensities is plotted. Default is:
• false: When return values are specified.
• true: When return values are not specified.

GroupValue    Controls the creation of RefMZOut, a new vector of m/z values to be used as reference masses for aligning the peaks. This vector is created by adjusting the values in RefMZ, based on the sample data from multiple spectra in Intensities, such that the overall shifting and scaling of the peaks is minimized. Choices are true or false (default).

Tip: Set GroupValue to true only if Intensities contains data for a large number of spectra, and you are not confident of the m/z values used for your reference peaks in RefMZ. Leave GroupValue set to false if you are confident of the m/z values used for your reference peaks in RefMZ.

Return Values
IntensitiesOut    Either of the following:
• Column vector of intensity values for a spectrum, where each row corresponds to an m/z value.
• Matrix of intensity values for a set of mass spectra that share the
307. es in the dendrogram whose linkage is less than a threshold of 5.
    clustergram(yeastvalues, 'RowLabels', genes, 'Dendrogram', {'colorthreshold', 5})

References
[1] Bar-Joseph, Z., Gifford, D.K., and Jaakkola, T.S. (2001). Fast optimal leaf ordering for hierarchical clustering. Bioinformatics 17, Suppl 1:S22-9. PMID: 11472989.
[2] Eisen, M.B., Spellman, P.T., Brown, P.O., and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95, 14863-8.
[3] DeRisi, J.L., Iyer, V.R., and Brown, P.O. (1997). Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278, 680-686.

See Also
Bioinformatics Toolbox function: redgreencmap
Statistics Toolbox functions: cluster dendrogram linkage pdist

codonbias

Purpose
Calculate codon frequency for each amino acid in DNA sequence

Syntax
codonbias(SeqDNA)
codonbias(..., 'PropertyName', PropertyValue, ...)
codonbias(..., 'GeneticCode', GeneticCodeValue)
codonbias(..., 'Frame', FrameValue)
codonbias(..., 'Reverse', ReverseValue)
codonbias(..., 'Pie', PieValue)

Arguments
SeqDNA    Nucleotide sequence (DNA or RNA). Enter a character string with the letter
308. es using property name/value pairs.

getgenbank(..., 'ToFile', ToFileValue) saves the data returned from GenBank in a file. If you do not give a location or path to the file, the file is stored in the MATLAB current directory. Read a GenBank-formatted file back into MATLAB using the function genbankread.

getgenbank(..., 'FileFormat', FileFormatValue) returns the sequence in the specified format FileFormatValue.

getgenbank(..., 'SequenceOnly', SequenceOnlyValue), when SequenceOnly is true, returns only the sequence as a character array. When the properties SequenceOnly and ToFile are used together, the output file is in the FASTA format.

Examples
To retrieve the sequence from chromosome 19 that codes for the human insulin receptor and store it in a structure S, in the MATLAB Command Window, type
    S = getgenbank('M10051')

    LocusName: HUMINSR
    LocusSequenceLength: 4723
    LocusNumberofStrands:
    LocusTopology: linear
    LocusMoleculeType: mRNA
    LocusGenBankDivision: PRI
    LocusModificationDate: 06-JAN-1995
    Definition: Human insulin receptor mRNA, complete cds.
    Accession: M10051
    Version: M10051.1

Remaining fields, in display order: GI Project Keywords Segment Source SourceOrganism Reference Comment Features CDS Sequence SearchURL RetrieveURL
Remaining values, in display order: 186439 insulin receptor tyrosine kinase Homo sapiens human 4x65 char 1x1 struct 14x67 char 51x74 char 1x1 struct
309. etween two graphs
Determine if tree is spanning tree
Calculate maximum flow and minimum cut in directed graph
Find minimal spanning tree in graph
Convert predecessor indices to paths
Solve shortest path problem in graph
Perform topological sort of directed acyclic graph
Traverse graph by following adjacent nodes

Gene Ontology

goannotread    Annotations from Gene Ontology annotated file
num2goid       Convert numbers to Gene Ontology IDs

Protein Analysis

aacount            Count amino acids in sequence
aminolookup        Find amino acid codes, integers, abbreviations, names, and codons
atomiccomp         Calculate atomic composition of protein
cleave             Cleave amino acid sequence with enzyme
evalrasmolscript   Send RasMol script commands to Molecule Viewer window
isoelectric        Estimate isoelectric point for amino acid sequence
molviewer          Display and manipulate 3-D molecule structure
molweight          Calculate molecular weight of amino acid sequence
pdbdistplot        Visualize intermolecular distances in Protein Data Bank (PDB) file
proteinplot        Characteristics for amino acid sequences
proteinpropplot    Plot properties of amino acid sequence
ramachandran rebasecuts

Profile Hidden Markov Models

gethmmalignment gethmmprof gethmmtree hmmprofalign hmmprofestimate hmmprofgenerate hmmprofmerge hmmprofstruct pfamhmmread showhmmprof

Draw Ramacha
310. See Also
Bioinformatics Toolbox functions: classperf crossvalind randfeatures svmclassify
Statistics Toolbox function: classify

rebasecuts

Purpose
Find restriction enzymes that cut nucleotide sequence

Syntax
[Enzymes, Sites] = rebasecuts(SeqNT)
rebasecuts(SeqNT, Group)
rebasecuts(SeqNT, [Q, R])
rebasecuts(SeqNT, S)

Arguments
SeqNT      Nucleotide sequence.
Enzymes    Cell array with the names of restriction enzymes from REBASE Version 412.
Sites      Vector of cut sites with the base number before every cut relative to the sequence.
Group      Cell array with the names of valid restriction enzymes.
Q, R, S    Base positions.

Description
[Enzymes, Sites] = rebasecuts(SeqNT) finds all the restriction enzymes that cut a nucleotide sequence (SeqNT).

rebasecuts(SeqNT, Group) limits the search to a specified list of enzymes (Group).

rebasecuts(SeqNT, [Q, R]) limits the search to those enzymes that cut after a specified base position (Q) and before a specified base position (R) relative to the sequence.

rebasecuts(SeqNT, S) limits the search to those enzymes that cut just after a specified base position (S).

REBASE, the Restriction Enzyme Database, is a collection of information about restriction enzymes and related proteins. For more information about REBASE, see http://rebase.neb.com/rebase/rebase.html

Example
Se
311. f this GO term
Numeric array containing GO IDs of GO terms that have an is_a relationship with this GO term
Numeric array containing GO IDs of GO terms that have a part_of relationship with this GO term
Boolean value that indicates if the GO term is obsolete (1) or not obsolete (0)

See Also
Bioinformatics Toolbox functions: geneont (object constructor), goannotread, num2goid
Bioinformatics Toolbox methods of geneont object: getancestors, getdescendants, getmatrix, getrelatives

phytree object

Purpose
Data structure containing phylogenetic tree

Description
A phytree object is a data structure containing a phylogenetic tree. Phylogenetic trees are binary rooted trees, which means that each branch is the parent of two other branches, two leaves, or one branch and one leaf. A phytree object can be ultrametric or nonultrametric.

Method Summary
Following are methods of a phytree object:

get (phytree)             Information about phylogenetic tree object
getbyname (phytree)       Branches and leaves from phytree object
getcanonical (phytree)    Calculate canonical form of phylogenetic tree
getmatrix (phytree)       Convert phytree object into relationship matrix
getnewickstr (phytree)    Create Newick-formatted string
pdist (phytree)           Calculate pair-wise patristic distances in phytree object
plot (phytree)            Dra
prune (phytree), reorder (phytree), reroot (phytree), select (phytree), subtree (phytree)
312. featuresmap(..., 'FontSize', FontSizeValue, ...)
featuresmap(..., 'ColorMap', ColorMapValue, ...)
featuresmap(..., 'Qualifiers', QualifiersValue, ...)
featuresmap(..., 'ShowPositions', ShowPositionsValue, ...)

Arguments
GBStructure    GenBank structure, typically created using the getgenbank or the genbankread function.

FeatList    Cell array of features (from the list of all features in the GenBank structure) to include in or exclude from the map.
• If FeatList is a cell array of features, these features are mapped. Any features in FeatList not found in the GenBank structure are ignored.
• If FeatList includes '-' as the first string in the cell array, then the remaining strings (features) are not mapped.
By default, FeatList is a list of all features in the GenBank structure.

Levels    Vector of N integers, where N is the number of features. Each integer represents the level in the map for the corresponding feature. For example, if Levels = [1, 1, 2, 3, 3], the first two features would appear on level 1, the third feature on level 2, and the fourth and fifth features on level 3. By default, Levels = [1:N].

FontSizeValue    Scalar that sets the font size (points) for the annotations of the features. Default is 9.

ColorMapValue    Three-column matrix to specify a list of colors to use for each feature. This matrix replaces the default matrix, which specifies the following colors
313. ference 2 590 ramachandran function reference 2 591 randfeatures function reference 2 593 randseq function reference 2 596 rankfeatures function reference 2 599 rebasecuts function reference 2 604 redgreencmap function reference 2 606 reorder method reference 4 59 reroot method reference 4 63 restrict function reference 2 608 revgeneticcode function reference 2 611 rmabackadj function reference 2 615 rmasummary function reference 2 620 rna2dna function reference 2 624 S scfread function Index reference 2 625 select method reference 4 67 seq2regexp function reference 2 628 seqcomplement function reference 2 631 seqconsensus function reference 2 632 seqdisp function reference 2 634 seqdotplot function reference 2 636 seqinsertgaps function reference 2 638 seqlinkage function reference 2 641 seqlogo function reference 2 643 seqmatch function reference 2 650 seqneighjoin function reference 2 651 seqpdist function reference 2 654 seqprofile function reference 2 665 seqrcomplement function reference 2 668 seqreverse function reference 2 669 seqshoworfs function reference 2 670 seqshowwords function reference 2 675 seqtool function reference 2 678 seqwordcount function reference 2 680 shortestpath method reference 4 70 showalignment function reference 2 682 showhmmprof function reference 2 685 sptread function reference 2 687 subtree method reference 4 75 svmclassify function reference 2 6
314. filter(..., 'LOGValue', LOGValueValue)

Arguments
Data    Matrix where each row corresponds to the experimental results for one gene. Each column is the results for all genes from one experiment.
Names    Cell array with the name of a gene for each row of experimental data. Names has the same number of rows as Data, with each row containing the name or ID of the gene in the data set.
PercentileValue    Property to specify a percentile below which gene expression profiles are removed. Enter a value from 0 to 100.
AbsValueValue    Property to specify an absolute value below which gene expression profiles are removed.
LOGPercentileValue    Property to specify the LOG of a percentile.
LOGValueValue    Property to specify the LOG of an absolute value.

Description
Mask = generangefilter(Data) calculates the range for each gene expression profile in the experimental data (Data), and then identifies the expression profiles with ranges less than the 10th percentile.

Mask is a logical vector with one element for each row in Data. The elements of Mask corresponding to rows with a range greater than the threshold have a value of 1, and those with a range less than the threshold are 0.

[Mask, FData] = generangefilter(Data) returns a filtered data matrix (FData). FData can also be created using FData = Data(find(Mask), :).

[Mask, FData, FNames] = generangefilter(Data, Names) returns a filtered names arr
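The masking rule described above (compute each profile's range, then flag profiles below a percentile of those ranges) can be sketched in a few lines of Python. This is a hedged illustration and not the toolbox code: the function name range_filter_mask is hypothetical, the percentile is computed by linear interpolation between order statistics (the toolbox may use a different definition), and profiles exactly at the threshold are kept, which is an assumption.

```python
def range_filter_mask(data, percentile=10.0):
    """Return a list of booleans, one per row (gene) of data.

    True means the row's range (max - min) is at or above the given
    percentile of all row ranges, so the profile is kept.
    """
    ranges = [max(row) - min(row) for row in data]
    ordered = sorted(ranges)

    # Percentile by linear interpolation between order statistics
    # (assumption: other percentile definitions give slightly different cutoffs).
    rank = (percentile / 100.0) * (len(ordered) - 1)
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    threshold = ordered[low] + (rank - low) * (ordered[high] - ordered[low])

    return [r >= threshold for r in ranges]


if __name__ == "__main__":
    # Ten hypothetical genes whose ranges are 1, 2, ..., 10.
    profiles = [[0.0, float(k)] for k in range(1, 11)]
    mask = range_filter_mask(profiles)
    filtered = [row for row, keep in zip(profiles, mask) if keep]
    print(mask)
    print(len(filtered))
```

Selecting the rows where the mask is true corresponds to building the filtered data matrix from the mask, as the description above does with FData.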
315. fined in FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014. Accordingly, the terms and conditions of this Agreement and only those rights specified in this Agreement, shall pertain to and govern the use, modification, reproduction, release, performance, display, and disclosure of the Program and Documentation by the federal government (or other entity acquiring for or through the federal government) and shall supersede any conflicting contractual terms or conditions. If this License fails to meet the government's needs or is inconsistent in any respect with federal procurement law, the government agrees to return the Program and Documentation, unused, to The MathWorks, Inc.

Trademarks
MATLAB, Simulink, Stateflow, Handle Graphics, Real-Time Workshop, and xPC TargetBox are registered trademarks, and SimBiology, SimEvents, and SimHydraulics are trademarks of The MathWorks, Inc. Other product or brand names are trademarks or registered trademarks of their respective holders.

Patents
The MathWorks products are protected by one or more U.S. patents. Please see www.mathworks.com/patents for more information.

Revision History
May 2005, September 2005, November 2005, March 2006, May 2006, September 2006, March 2007: Online only
New for Version 2.1 (Release 14SP2)
Revised for Version 2.1.1 (Release 14SP3)
Revised for Version 2.2 (Release 14SP3)
Revised for Versi
316. fo probesetlink probesetlookup probesetplot probesetvalues

affyread

Purpose
Read microarray data from Affymetrix GeneChip file (Windows 32)

Syntax
AffyStruct = affyread(File)
AffyStruct = affyread(File, LibraryPath)

Arguments
File    String specifying a file name or a path and file name of one of the following Affymetrix file types:
• DAT: Data file containing raw image data.
• CEL: Data file containing information about the expression levels of the individual probes.
• CHP: Data file containing information about probe sets.
• EXP: Data file containing information about experimental conditions and protocols.
• CDF: Library file containing information about which probes belong to which probe set.
• GIN: Library file containing information about the probe sets, such as the gene name with which the probe set is associated.
If you specify only a file name, that file must be on the MATLAB search path or in the MATLAB Current Directory.

LibraryPath    String specifying the path and directory where the library file (CDF or GIN) associated with File is stored.
Note: This input argument is needed only if File is a CHP file.

Return Values
AffyStruct    MATLAB structure containing information from the Affymetrix data or library file.

Description
Note: This function is supported on the Windows 32 platform only.

AffyStruct affyread Fi
317. formatics Toolbox functions: exprprofrange generangefilter genevarfilter

fastaread

Purpose
Read data from FASTA file

Syntax
FASTAData = fastaread(File)
[Header, Sequence] = fastaread(File)
fastaread(..., 'PropertyName', PropertyValue, ...)
fastaread(..., 'IgnoreGaps', IgnoreGapsValue)
fastaread(..., 'Blockread', BlockreadValue)

Arguments
File    FASTA-formatted file (ASCII text file). Enter a file name, a path and file name, or a URL pointing to a file. File can also be a MATLAB character array that contains the text for a file name.
FASTAData    MATLAB structure with the fields Header and Sequence.
IgnoreGapsValue    Property to control removing gap symbols. Enter either true or false (default).
BlockreadValue    Property to control reading a single entry or block of entries from a file containing multiple sequences. Enter a scalar N to read the Nth entry in the file. Enter a 1-by-2 vector [M1, M2] to read the block of entries starting at entry M1 and ending at entry M2. To read all remaining entries in the file starting at entry M1, enter a positive value for M1 and enter Inf for M2.

Description
fastaread reads data from a FASTA-formatted file into a MATLAB structure with the following fields:
Field: Header, Sequence

A file with a FASTA format begins with a right angle bracket (>) and a single line description. Following this desc
318. functions: graphallshortestpaths graphconncomp graphisdag graphisomorphism graphisspantree graphmaxflow graphminspantree graphpred2path graphtopoorder graphtraverse
Bioinformatics Toolbox method of biograph object: shortestpath

graphtopoorder

Purpose
Perform topological sort of directed acyclic graph

Syntax
order = graphtopoorder(G)

Arguments
G    N-by-N sparse matrix that represents a directed acyclic graph. Nonzero entries in matrix G indicate the presence of an edge.

Description
Tip: For introductory information on graph theory functions, see Graph Theory Functions in the Bioinformatics Toolbox documentation.

order = graphtopoorder(G) returns an index vector with the order of the nodes sorted topologically. In topological order, an edge can exist between a source node u and a destination node v, if and only if u appears before v in the vector order. G is an N-by-N sparse matrix that represents a directed acyclic graph (DAG). Nonzero entries in matrix G indicate the presence of an edge.

Examples
1 Create and view a directed acyclic graph (DAG) with six nodes and eight edges.
    DG = sparse([6 6 6 2 2 3 5 1], [2 5 1 3 4 5 1 4], true, 6, 6)
    view(biograph(DG))
2 Find the topological order of the DAG.
    order = graphtopoorder(DG)
    order = 6 2 3 5 1 4
3 Permute the nodes so that
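A topological sort like the one described above can be sketched with Kahn's algorithm: repeatedly emit a node that has no remaining incoming edges. The Python below is an illustration, not the toolbox implementation. The function name topological_order is hypothetical, and breaking ties by smallest node number is an assumption; a DAG can have several valid orders, and with this tie-break the sketch happens to produce the same order as the example above.

```python
import heapq


def topological_order(num_nodes, edges):
    """Kahn's algorithm; nodes are numbered 1..num_nodes.

    edges is a list of (source, target). Raises ValueError on a cycle.
    """
    successors = {node: [] for node in range(1, num_nodes + 1)}
    indegree = {node: 0 for node in successors}
    for source, target in edges:
        successors[source].append(target)
        indegree[target] += 1

    # Min-heap of nodes whose predecessors have all been emitted.
    ready = [node for node in successors if indegree[node] == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        node = heapq.heappop(ready)
        order.append(node)
        for target in successors[node]:
            indegree[target] -= 1
            if indegree[target] == 0:
                heapq.heappush(ready, target)

    if len(order) != num_nodes:
        raise ValueError("graph contains a cycle; no topological order exists")
    return order


example_edges = [(6, 2), (6, 5), (6, 1), (2, 3), (2, 4), (3, 5), (5, 1), (1, 4)]

if __name__ == "__main__":
    print(topological_order(6, example_edges))
```

Every edge in the example points from an earlier position in the returned list to a later one, which is the defining property of a topological order.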
319. g to select commands: Select, Render, Labels, Color, Zoom, Spin, Animate, Measurements, Crystal, Options, Console, About Jmol.
• Display the RasMol Scripts console by clicking Console.

Examples
View the acetylsalicylic acid (aspirin) molecule, whose structural information is contained in the Elsevier MDL molecule file aspirin.mol.
    molviewer('aspirin.mol')

View the H5N1 influenza virus hemagglutinin molecule, whose structural information is located at www.rcsb.org/pdb/files/2FK0.pdb.gz.
    molviewer('http://www.rcsb.org/pdb/files/2FK0.pdb.gz')

View the molecule with a PDB identifier of 2DHB.
    molviewer('2DHB')

View the molecule with a PDB identifier of 4hhb, and create a figure handle for the molecule viewer.
    FH = molviewer('4hhb')

Use the getpdb function to retrieve protein structure data from the PDB database and create a MATLAB structure. Then view the protein molecule.
    pdbstruct = getpdb('1vqx')
    molviewer(pdbstruct)

See Also
Bioinformatics Toolbox functions: evalrasmolscript getpdb pdbread pdbwrite

msalign

Purpose
Align peaks in mass spectrum to reference peaks

Syntax
IntensitiesOut = msalign(MZ, Intensities, RefMZ)
msalign(..., 'Weights', WeightsValue, ...)
msalign(..., 'Range', RangeValue, ...)
msalign WidthOfPulses
320. g tree. This is useful for observing the original linkage order followed by the algorithm. By default, seqneighjoin reroots the resulting tree using the midpoint method.

Examples
1 Load a multiple alignment of amino acids.
    seqs = fastaread('pf00002.fa');
2 Measure the Jukes-Cantor pair-wise distances.
    dist = seqpdist(seqs, 'method', 'jukes-cantor', 'indels', 'pair');
3 Build the phylogenetic tree using the neighbor-joining algorithm.
    tree = seqneighjoin(dist, 'equivar', seqs)
    view(tree)

References
[1] Saitou, N., and Nei, M. (1987). The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4(4), 406-425.
[2] Gascuel, O. (1997). BIONJ: An improved version of the NJ algorithm based on a simple model of sequence data. Molecular Biology and Evolution 14, 685-695.
[3] Studier, J.A., Keppler, K.J. (1988). A note on the neighbor-joining algorithm of Saitou and Nei. Molecular Biology and Evolution 5(6), 729-731.

See Also
Bioinformatics Toolbox functions: multialign, phytree (object constructor), seqlinkage (alternative method to create a phylogenetic tree), seqpdist
Methods of phytree object: reroot, view

seqpdist

Purpose
Calculate pair-wise distance between sequences

Syntax
D = seqpdist(Seqs)
D = seqpdist(Seqs, ..., 'Method', MethodValue, ...)
D = seqpdist(Seqs, ..., 'Indels', IndelsValue, ...)
D = seqpdist(Seqs, ..., 'Optargs', OptargsValue, ...)
D
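The Jukes-Cantor distance measured in step 2 above corrects the observed fraction of differing sites, p, for multiple substitutions: d = -b ln(1 - p/b), with b = 3/4 for nucleotides and b = 19/20 for amino acids. The Python sketch below illustrates the formula only. The function name jukes_cantor_distance is hypothetical, and skipping any site where either sequence has a gap is an assumption about what pairwise indel handling means; the toolbox may differ in detail.

```python
import math


def jukes_cantor_distance(seq1, seq2, states=4, gap="-"):
    """Jukes-Cantor distance between two aligned sequences of equal length.

    states is the alphabet size: 4 for nucleotides, 20 for amino acids.
    Sites where either sequence has a gap are ignored (assumption).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    mismatches = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        return float("nan")  # no comparable sites

    p = mismatches / compared
    b = (states - 1) / states
    if p >= b:
        return float("inf")  # saturation: the correction is undefined
    return -b * math.log(1.0 - p / b)


if __name__ == "__main__":
    print(jukes_cantor_distance("ACGT", "ACGA"))
    print(jukes_cantor_distance("MKVL", "MKIL", states=20))
```

A matrix of such pairwise distances is the kind of input a neighbor-joining routine consumes.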
321. ger. Default is equal to GapOpenValue.

nwalign(Seq1, Seq2, ..., 'Showscore', ShowscoreValue, ...) controls the display of the scoring space and winning path of the alignment. Choices are true or false (default).

[Figure: Scoring Space and Winning Path; axis label: Sequence 2]

The scoring space is a heat map displaying the best scores for all the partial alignments of two sequences. The color of each (n1, n2) coordinate in the scoring space represents the best score for the pairing of subsequences Seq1(1:n1) and Seq2(1:n2), where n1 is a position in Seq1 and n2 is a position in Seq2. The best score for a pairing of specific subsequences is determined by scoring all possible alignments of the subsequences by summing matches and gap penalties.

The winning path is represented by black dots in the scoring space and represents the pairing of positions in the optimal global alignment. The color of the last point (lower right) of the winning path represents the optimal global alignment score for the two sequences and is the Score output returned by nwalign.

Tip: The scoring space visually indicates if there are potential alternate winning paths, which is useful when aligning sequences with big gaps. Visual patterns in the scoring space can also indicate a possible sequence rearrangement.

Examples
1 Globally align two amino a
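The scoring space described above is the dynamic-programming table of the Needleman-Wunsch algorithm: entry (n1, n2) holds the best score for aligning the first n1 symbols of one sequence with the first n2 symbols of the other. The Python sketch below fills that table under simplifying assumptions that differ from nwalign: a flat match/mismatch score replaces the substitution matrix, and a single linear gap penalty replaces separate gap-open and gap-extend values. The function name nw_score is hypothetical.

```python
def nw_score(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Fill the Needleman-Wunsch table and return (score, table).

    table[i][j] is the best score for aligning seq1[:i] with seq2[:j].
    """
    rows, cols = len(seq1) + 1, len(seq2) + 1
    table = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per symbol.
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + pair,  # align the two symbols
                table[i - 1][j] + gap,       # gap in seq2
                table[i][j - 1] + gap,       # gap in seq1
            )

    # The lower-right entry is the optimal global alignment score.
    return table[-1][-1], table


if __name__ == "__main__":
    score, table = nw_score("GATTACA", "GATACA")
    print(score)
```

The lower-right entry of the table corresponds to the last point of the winning path described above; tracing back which of the three choices produced each entry recovers the path itself.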
322. groups but adds partial resubstitution when there are not enough observations. You cannot set Min when using K-fold cross-validation.

Examples
Create a 10-fold cross-validation to compute classification error.
    load fisheriris
    indices = crossvalind('Kfold', species, 10);
    cp = classperf(species);
    for i = 1:10
        test = (indices == i); train = ~test;
        class = classify(meas(test,:), meas(train,:), species(train,:));
        classperf(cp, class, test)
    end
    cp.ErrorRate

Approximate a leave-one-out prediction error estimate.
    load carbig
    x = Displacement; y = Acceleration;
    N = length(x);
    sse = 0;
    for i = 1:100
        [train, test] = crossvalind('LeaveMOut', N, 1);
        yhat = polyval(polyfit(x(train), y(train), 2), x(test));
        sse = sse + sum((yhat - y(test)).^2);
    end
    CVerr = sse / 100

Divide cancer data 60/40 without using the 'Benign' observations. Assume groups are the true labels of the observations.
    labels = {'Cancer', 'Benign', 'Control'};
    groups = labels(ceil(rand(100,1)*3));
    [train, test] = crossvalind('holdout', groups, 0.6, 'classes', {'Control', 'Cancer'});
    sum(test)   % Total groups allocated for testing
    sum(train)  % Total groups allocated for training

See Also
Bioinformatics Toolbox functions: classperf knnclassify svmclassify
Statistics Toolbox functions: classify grp2idx

dayhoff

Purpose
Dayhoff scoring matrix

Syntax
ScoringMatrix = dayhoff

Description
ScoringMatrix = dayhoff returns a PAM250 type sc
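The K-fold indices used in the first example above assign every observation to one of K folds of nearly equal size. A minimal Python sketch of that assignment follows. It is an illustration only: the function name kfold_indices is hypothetical, and unlike the group-based call in the example it takes just the number of observations and does not balance class labels across folds.

```python
import random


def kfold_indices(n, k, seed=None):
    """Assign each of n observations to a fold numbered 1..k.

    Fold sizes differ by at most one. Pass seed for a reproducible split.
    """
    rng = random.Random(seed)
    labels = [(i % k) + 1 for i in range(n)]  # balanced labels 1..k
    rng.shuffle(labels)                       # random assignment
    return labels


if __name__ == "__main__":
    indices = kfold_indices(150, 10, seed=0)
    for fold in range(1, 11):
        test = [i for i, f in enumerate(indices) if f == fold]
        train = [i for i, f in enumerate(indices) if f != fold]
        print(fold, len(test), len(train))
```

Each pass of the loop holds out one fold for testing and trains on the rest, mirroring the test and train logical vectors in the example.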
323. guments, specifies the number of short descriptions returned to the quantity specified.

blastncbi(..., 'Alignments', AlignmentsValue, ...), when the function is called without output arguments, specifies the number of sequences for which high-scoring segment pairs (HSPs) are reported.

blastncbi(..., 'Filter', FilterValue, ...) selects the filter to be applied to the query sequence.

blastncbi(..., 'Expect', ExpectValue, ...) provides a statistical significance threshold for matches against database sequences. You can learn more about the statistics of local sequence comparison at http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html#head2

blastncbi(..., 'Word', WordValue, ...) selects a word size for amino acid sequences.

blastncbi(..., 'Matrix', MatrixValue, ...) selects the substitution matrix for amino acid sequences only. This matrix assigns the score for a possible alignment of two amino acid residues.

blastncbi(..., 'GapOpen', GapOpenValue, ...) selects a gap penalty for amino acid sequences. Allowable values for a gap penalty vary with the selected substitution matrix. For information about allowed gap penalties for matrices other than the BLOSUM62 matrix, see http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastcgihelp_new.html

blastncbi(..., 'ExtendGap', ExtendGapValue, ...) defines the penalty for extending a gap greater than one space.

blastncbi(..., 'Inclusion', InclusionValue, ...), for PSI-BLAST only, defines the statistical
324. gy Terms name view BG

See Also: Bioinformatics Toolbox functions: geneont (object constructor), goannotread, num2goid. geneont object methods: getdescendants, getmatrix, getrelatives.

getbyname (phytree)

Purpose: Branches and leaves from phytree object

Syntax:
    S = getbyname(Tree, Expression)
    S = getbyname(Tree, String, 'Exact', true)

Arguments:
Tree: phytree object created by the phytree function (object constructor).
Expression: Regular expression. When Expression is a cell array of strings, getbyname returns a matrix in which each column corresponds to one query in Expression. For information about the symbols that you can use in a matching regular expression, see the MATLAB function regexp.
String: String or cell array of strings.

Description:
S = getbyname(Tree, Expression) returns a logical vector S of size NumNodes-by-1 marking the node names of the phylogenetic tree Tree that match the regular expression Expression, regardless of letter case.

S = getbyname(Tree, String, 'Exact', true) looks for exact string matches and ignores case. When String is a cell array of char strings, getbyname returns a vector with indices.

Examples:
1. Load a phylogenetic tree created from a protein family.
    tr = phytreeread('pf00002.tree');
2. Select all the mouse and human proteins.
    sel getbyname tr mo
325. h gene in two microarray data sets, such as returned by mattest.

BHFDRValue: Property to control the use of the linear step-up (LSU) procedure originally introduced by Benjamini and Hochberg, 1995. Choices are true or false (default). Note: If BHFDRValue is set to true, the Lambda and Method properties are ignored.

LambdaValue: Input that specifies lambda, λ, the tuning parameter used to estimate the true null hypotheses, π̂0(λ). LambdaValue can be either:
- A single value that is > 0 and < 1
- A series of values. Each value must be > 0 and < 1. There must be at least four values in the series.
Tip: The series of values can be expressed by a colon operator with the form [first:incr:last], where first is the first value in the series, incr is the increment, and last is the last value in the series.
Default LambdaValue is the series of values [0.01:0.01:0.95]. Note: If LambdaValue is set to a single value, the Method property is ignored.

MethodValue: String that specifies a method to calculate the true null hypothesis, π̂0(λ), from the tuning parameter LambdaValue, when LambdaValue is a series of values. Choices are:
- bootstrap (default)
- polynomial

ShowplotValue: Property to display two plots:
- Plot of the estimated true null hypotheses, π̂0(λ), versus the tuning parameter, lambda, λ, with a cubic polynomial fitting
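The linear step-up procedure mentioned for BHFDRValue can be illustrated with a few lines of base MATLAB. This is a minimal sketch under stated assumptions, not the mafdr implementation: the p-values are invented, and the variable names (p, adj, fdr) are hypothetical. Each sorted p-value is scaled by m/i, and the result is then made monotone starting from the largest p-value.

```matlab
% Sketch of a Benjamini-Hochberg (linear step-up) adjustment, base MATLAB only.
% The p-values below are made up for illustration.
p = [0.001 0.008 0.039 0.041 0.042 0.060 0.074 0.205 0.212 0.216];
m = numel(p);

[ps, order] = sort(p);            % ascending p-values and their original positions
adj = ps .* m ./ (1:m);           % p(i) * m / i for the i-th smallest p-value

% Enforce monotonicity: an adjusted value may not exceed the one after it.
for i = m-1:-1:1
    adj(i) = min(adj(i), adj(i+1));
end
adj = min(adj, 1);                % adjusted values are capped at 1

fdr = zeros(size(p));
fdr(order) = adj;                 % restore the original ordering
disp(fdr)
```

A gene would then be called significant at level q when its adjusted value is at most q.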
326. h is represented by black dots in the scoring space and represents the pairing of positions in the optimal local alignment. The color of the last point (lower right) of the winning path represents the optimal local alignment score for the two sequences and is the Score output returned by swalign.

Tip: The scoring space visually shows tandem repeats, small segments that potentially align, and partial alignments of domains from rearranged sequences.

Examples

1. Locally align two amino acid sequences using the BLOSUM50 (default) scoring matrix and the default values for the GapOpen and ExtendGap properties. Return the optimal local alignment score in bits and the alignment character array.

    [Score, Alignment] = swalign('VSPAGMASGYD', 'IPGKASYD')

    Score =
        8.6667
    Alignment =
    PAGMASGYD
    | | || ||
    P-GKAS-YD

2. Locally align two amino acid sequences specifying the PAM250 scoring matrix and a gap open penalty of 5.

    [Score, Alignment] = swalign('HEAGAWGHEE', 'PAWHEAE', 'ScoringMatrix', 'pam250', 'GapOpen', 5)

    Score =
        8
    Alignment =
    GAWGHE
    :|| ||
    PAW-HE

3. Locally align two amino acid sequences returning the Score in nat units (nats) by specifying a scale factor of log(2).

    [Score, Alignment] = swalign('HEAGAWGHEE', 'PAWHEAE', 'Scale', log(2))

    Score =
        6.4694
    Alignment =
    AWGHE
    || ||
    AW-HE

References

[1] Durbin
327. he Affymetrix library CDF file

Field: Description
CELNames: Cell array of names of the Affymetrix CEL files.
NumProbeSets: Number of probe sets in each CEL file.
ProbeSetIDs: Cell array of the probe set IDs from the Affymetrix CDF library file.
ProbeIndices: Column vector containing probe indexing information. Probes within a probe set are numbered 0 through N - 1, where N is the number of probes in the probe set.
PMIntensities: Matrix containing PM probe intensity values. Each row corresponds to a probe, and each column corresponds to a CEL file. The rows are ordered the same as in ProbeIndices, and the columns are ordered the same as in the CELFiles input argument.
MMIntensities: Matrix containing MM probe intensity values. Each row corresponds to a probe, and each column corresponds to a CEL file. The rows are ordered the same as in ProbeIndices, and the columns are ordered the same as in the CELFiles input argument.

ProbeStructure = celintensityread(..., 'Verbose', VerboseValue) controls the display of a progress report showing the name of each CEL file as it is read. When VerboseValue is false, no progress report is displayed. Default is true.

Examples

The following example assumes that you have the HG_U95Av2.CDF library file stored at D:\Affymetrix\LibFiles\HGGenome, and that your Current Directory points to a location containing CEL files ass
328. he neighbor-joining method.

    seqs = fastaread('pf00002.fa')

    seqs =
    33x1 struct array with fields:
        Header
        Sequence

    dist = seqpdist(seqs,'method','jukes-cantor','indels','pair');
    NJtree = seqneighjoin(dist,'equivar',seqs)

    Phylogenetic tree object with 33 leaves (32 branches)

2. Create another phylogenetic tree from the same sequence data and pair-wise distances between sequences, using the single linkage method.

    HCtree = seqlinkage(dist,'single',seqs)

    Phylogenetic tree object with 33 leaves (32 branches)

3. Use the optimal leaf ordering calculation to reorder the leaves in HCtree such that it matches the order of leaves in NJtree as closely as possible without dividing the clades or having crossing branches.

    HCtree_reordered = reorder(HCtree,NJtree)

    Phylogenetic tree object with 33 leaves (32 branches)

4. View the reordered phylogenetic tree and the tree used to reorder it.

    view(HCtree_reordered)
    view(NJtree)

See Also: Bioinformatics Toolbox function: phytree (object constructor). Bioinformatics Toolbox object: phytree object. Bioinformatics Toolbox methods of a phytree object: get, getbyname, prune.

reroot (phytree)

Purpose: Change root of phylogenetic tree

Syntax:
    Tree2 = reroot(Tree1)
    Tree2 = reroot(Tree1, Node)
    Tree2 = reroot(Tree1, Node, Distance)

Arguments:
Tree1: Phylogenetic tree (phytree object) created with the function phytree
329. her the signal variance is added to the weight function for smoothing low signal edge. Choices are true or false (default).

ShowplotValue: Controls the display of a plot showing the log of probe intensity values from a specified column (chip) in MMMatrix, versus probe affinities in AffinMM. Choices are true, false, or I, an integer specifying a column in MMMatrix. If set to true, the first column in MMMatrix is plotted. Default is:
- false, when return values are specified
- true, when return values are not specified

VerboseValue: Controls the display of a progress report showing the number of each chip as it is completed. Choices are true (default) or false.

Return Values

PMMatrix_Adj: Matrix of background adjusted PM (perfect match) intensity values.
nsbStruct: Structure containing nonspecific binding background parameters, estimated from the intensities and affinities of probes on an Affymetrix GeneChip array. nsbStruct includes the following fields:
- sigma
- mu_pm
- mu_mm

Description

PMMatrix_Adj = gcrmabackadj(PMMatrix, MMMatrix, AffinPM, AffinMM) performs GCRMA background adjustment (including optical background correction and nonspecific binding correction) on Affymetrix microarray probe-level data using probe sequence information, and returns PMMatrix_Adj, a matrix of background adjusted PM (perfect match) intensity values.

Note: If AffinPM and Affinm
330. hoices for rule are:
- 'nearest': Majority rule with nearest point tie-break (default)
- 'random': Majority rule with random point tie-break
- 'consensus': Consensus rule

The default behavior is to use majority rule. That is, a sample point is assigned to the class that the majority of the k nearest neighbors are from. Use 'consensus' to require a consensus, as opposed to majority rule. When using the 'consensus' option, points where not all of the k nearest neighbors are from the same class are not assigned to one of the classes. Instead, the output Class for these points is NaN for numerical groups or '' for string-named groups.

When classifying to more than two groups, or when using an even value for k, it might be necessary to break a tie in the number of nearest neighbors. Options are 'random', which selects a random tiebreaker, and 'nearest', which uses the nearest neighbor among the tied groups to break the tie. The default behavior is majority rule with nearest tie-break.

Examples

Classifying Rows

The following example classifies the rows of the matrix sample:

    sample = [.9 .8;.1 .3;.2 .6]

    sample =
        0.9000    0.8000
        0.1000    0.3000
        0.2000    0.6000

    training = [0 0;.5 .5;1 1]

    training =
             0         0
        0.5000    0.5000
        1.0000    1.0000

    group = [1;2;3]

    class = knnclassify(sample, training, group)

Row 1 of sample is closest to row 3 of training, so class(1) = 3. Row 2 of sample is closest to row 1 of training, so
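The row-by-row reasoning in this example can be checked with base MATLAB alone. The following is a minimal sketch, assuming a single neighbor (k = 1) and Euclidean distance; it reuses the sample, training, and group values from the example and is not the knnclassify implementation.

```matlab
% Sketch: 1-nearest-neighbor classification with Euclidean distance.
% Assumes k = 1; the data are the values shown in the example.
sample   = [.9 .8; .1 .3; .2 .6];
training = [0 0; .5 .5; 1 1];
group    = [1; 2; 3];

class = zeros(size(sample, 1), 1);
for r = 1:size(sample, 1)
    % Difference between every training row and the current sample row
    diffs = training - repmat(sample(r, :), size(training, 1), 1);
    d2 = sum(diffs .^ 2, 2);        % squared distance to each training row
    [~, nearest] = min(d2);         % index of the closest training row
    class(r) = group(nearest);      % take over that row's group label
end
disp(class')
```

With k greater than 1, the loop would instead collect the k smallest distances and apply the majority or consensus rule described above.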
331. ht
Count nucleotides in sequence. Nucleotide codes, abbreviations, and names. Calculate codon frequency for each amino acid in DNA sequence. Count codons in nucleotide sequence. Locate CpG islands in DNA sequence. Count dimers in sequence. Estimate isoelectric point for amino acid sequence. Calculate molecular weight of amino acid sequence.
nmercount: Count number of n-mers in nucleotide or amino acid sequence
ntdensity: Plot density of nucleotides along sequence
seqshowwords: Graphically display words in sequence
seqwordcount: Count number of occurrences of word in sequence

Sequence Visualization
featuresmap: Draw linear or circular map of features from GenBank structure
seqtool: Open tool to interactively explore biological sequences

Pair-wise Sequence Alignment
fastaread: Read data from FASTA file
nwalign: Globally align two sequences using Needleman-Wunsch algorithm
seqdotplot: Create dot plot of two sequences
showalignment: Sequence alignment with color
swalign: Locally align two sequences using Smith-Waterman algorithm

Multiple Sequence Alignment
fastaread: Read data from FASTA file
multialign: Align multiple sequences using progressive method
multialignread: Read multiple sequence alignment file
multialignviewer: Open viewer for multiple sequence alignments
profalign: Align two profiles u
332. hylogenetic Tree
Graph Visualization
Gene Ontology
Methods: Alphabetical List
Objects: Alphabetical List
Index

Functions By Category

Constructor: Create objects.
Data Formats and Databases: Get data into MATLAB from Web databases; read and write to files using specific sequence data formats.
Trace Tools: Read data from SCF file and draw nucleotide trace plots.
Sequence Conversion: Convert nucleotide and amino acid sequences between character and integer formats, reverse and complement order of nucleotide bases, and translate nucleotides (codons) to amino acids.
Sequence Utilities: Calculate consensus sequence from set of multiply aligned sequences, run BLAST search from MATLAB, and search sequences using regular expressions.
Sequence Statistics: Determine base counts, nucleotide density, codon bias, and CpG islands; search for words and identify open reading frames (ORFs).
Sequence Visualization: Visualize sequence data.
Pair-wise Sequence Alignment: Compare nucleotide or amino acid sequences using pair-wise sequence alignment functions.
Multiple Sequence Alignment
Scoring Matrices
Phylogenetic Tree Tools
333. i)).Label = sprintf('%s:%d',h.Nodes(order(i)).ID,i);
    end
    h.ShowTextInNodes = 'label'
    dolayout(h)

[Figure: Biograph Viewer window showing the labeled graph]

4. Traverse the graph to find the breadth-first search (BFS) discovery order starting at node 4.

    order = graphtraverse(DG,4,'Method','BFS')

    order =
         4     5     3     6     7     1     9     8     2    10

5. Label the nodes with the BFS discovery order.

    for i = 1:10
        h.Nodes(order(i)).Label = sprintf('%s:%d',h.Nodes(order(i)).ID,i);
    end
    h.ShowTextInNodes = 'label'
    dolayout(h)

[Figure: Biograph Viewer window showing the nodes labeled with the BFS discovery order]

6. Find and color nodes that are close to (within two edges of) node 4.

    node_idxs = graphtraverse(DG,4,'depth',2)

    node_idxs =
         4     5     3     6     7

    set(h.nodes(node_idxs),'Color',[1 0 0])

[Figure: Biograph Viewer window showing the colored nodes]

References

[1] Sedgewick, R. (2002). Algorithms in C++, Part 5: Graph Algorithms (Addison-Wesley).
[2] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002). The Boost Graph Library User Guide and Reference Manual (Upper Saddle River, NJ: Pearson Education).

See Also: Bioinformatics Toolbox functions: graphallshortestpaths, graphconncomp, graphisdag, graphisomorphism, graphisspantree, graphmaxflow, graphminspantree, graphpred2path
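To make the notion of a BFS discovery order concrete, here is a minimal base-MATLAB sketch that uses a queue over an adjacency matrix. The six-node graph and the variable names are invented for illustration; it is not the DG graph from the example and not how graphtraverse is implemented.

```matlab
% Sketch: breadth-first discovery order on a small directed graph.
% The edge list is hypothetical.
A = zeros(6);
edges = [1 2; 1 3; 2 4; 3 4; 4 5; 5 6];
for e = 1:size(edges, 1)
    A(edges(e, 1), edges(e, 2)) = 1;        % edge from first to second column
end

start = 1;
visited = false(1, size(A, 1));
visited(start) = true;
queue = start;                 % discovered nodes waiting to be expanded
order = [];                    % discovery order
while ~isempty(queue)
    node = queue(1);           % take the oldest discovered node
    queue(1) = [];
    order(end + 1) = node;
    next = find(A(node, :) & ~visited);     % undiscovered successors
    visited(next) = true;
    queue = [queue, next];     % successors are expanded after all older nodes
end
disp(order)
```

Replacing the queue with a stack (taking the newest node instead of the oldest) would give a depth-first order instead.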
334. ics Toolbox object: phytree object. Bioinformatics Toolbox method of a phytree object: pdist.

seqprofile

Purpose: Calculate sequence profile from set of multiply aligned sequences

Syntax:
    Profile = seqprofile(Seqs, 'PropertyName', PropertyValue, ...)
    [Profile, Symbols] = seqprofile(Seqs)
    seqprofile(..., 'Alphabet', AlphabetValue)
    seqprofile(..., 'Counts', CountsValue)
    seqprofile(..., 'Gaps', GapsValue)
    seqprofile(..., 'Ambiguous', AmbiguousValue)
    seqprofile(..., 'Limits', LimitsValue)

Arguments:
Seqs: Set of multiply aligned sequences. Enter an array of strings, cell array of strings, or an array of structures with the field Sequence.
Alphabet: Sequence alphabet. Enter 'NT' (nucleotides), 'AA' (amino acids), or 'none'. The default alphabet is 'AA'. When Alphabet is 'none', the symbol list is based on the observed symbols. Every character can be a symbol except for a hyphen (-) and a period (.), which are reserved for gaps.
Counts: Property to control returning frequency (ratio of counts/total counts) or counts. Enter either true (counts) or false (frequency). The default value is false.
Gaps: Property to control counting gaps in a sequence. Enter 'all' (counts all gaps), 'noflanks' (counts all gaps except those at the flanks of every sequence), or 'none'. The default value is 'none'.
Ambiguous: Property to control counting ambiguous symbols. Enter Count to a
335. ies = oligoprop(SeqNT, 'Salt', SaltValue) specifies a salt concentration in moles/liter for melting temperature calculations. Default is 0.05 moles/liter.

SeqProperties = oligoprop(SeqNT, 'Temp', TempValue) specifies the temperature in degrees Celsius for nearest-neighbor calculations of free energy. Default is 25 degrees Celsius.

SeqProperties = oligoprop(SeqNT, 'Primerconc', PrimerconcValue) specifies the concentration in moles/liter for melting temperatures. Default is 50e-6 moles/liter.

SeqProperties = oligoprop(SeqNT, 'HPBase', HPBaseValue) specifies the minimum number of paired bases that form the neck of the hairpin. Default is 4 base pairs.

SeqProperties = oligoprop(SeqNT, 'HPLoop', HPLoopValue) specifies the minimum number of bases that form the loop of a hairpin. Default is 2 bases.

SeqProperties = oligoprop(SeqNT, 'Dimerlength', DimerlengthValue) specifies the minimum number of aligned bases between the sequence and its reverse. Default is 4 bases.

Examples

Calculating Properties for a DNA Sequence

1. Create a random sequence.

    seq = randseq(25)

    seq =
    TAGCTTCATCGTTGACTTCTACTAA

2. Calculate sequence properties of the sequence.

    S1 = oligoprop(seq)

    S1 =
                    GC: 36
               GCAlpha: 0
              Hairpins: [0x25 char]
                Dimers: 'tAGCTtcatcgttgacttctactaa'
             MolWeight: 7.5820e+003
        MolWeightAlpha: 0
                    Tm: [52.7640 60.8629 62.2493 55.2870 54.0293 61.0614]
               TmAlpha: [0 0 0 0 0 0]
                Thermo: [4x3 double]
336. ieve the sequence for the human insulin receptor and store it in a structure, Seq, in the MATLAB Command Window, type:

    Seq = getgenpept('AAA59174')

    Seq =
                    LocusName: 'AAA59174'
          LocusSequenceLength: '1382'
         LocusNumberofStrands: ''
                LocusTopology: 'linear'
            LocusMoleculeType: ''
         LocusGenBankDivision: 'PRI'
        LocusModificationDate: '06-JAN-1995'
                   Definition: 'insulin receptor precursor.'
                    Accession: 'AAA59174'
                      Version: 'AAA59174.1'
                           GI: '307070'
                      Project: []
                     DBSource: 'locus HUMINSR accession M10051.1'
                     Keywords: '.'
                       Source: 'Homo sapiens (human)'
               SourceOrganism: [4x65 char]
                    Reference: {[1x1 struct]}
                      Comment: [14x67 char]
                     Features: [40x64 char]
                     Sequence: [1x1382 char]
                    SearchURL: [1x104 char]
                  RetrieveURL: [1x92 char]

See Also: Bioinformatics Toolbox functions: genpeptread, getembl, getgenbank, getpdb

getgeodata

Purpose: Retrieve Gene Expression Omnibus (GEO) Sample (GSM) data

Syntax:
    Data = getgeodata('AccessionNumber')
    getgeodata(..., 'PropertyName', PropertyValue, ...)
    getgeodata(..., 'ToFile', ToFileValue)

Arguments:
AccessionNumber: Unique identifier for a sequence record. Enter a combination of letters and numbers.
ToFileValue: Property to specify the location and file name for saving data. Enter either a file name or a path and file name supported by your system (ASCII text file).

Description:
Data = getgeodata(AccessionNu
337. if Complement is true, that is, where the elements match their complementary pairs A-T (or U) and C-G instead of an exact nucleotide match.

Examples

    [p,l,s] = palindromes('GCTAGTAACGTATATATAAT')

    p =
        11
        12
    l =
         7
         7
    s =
        'TATATAT'
        'ATATATA'

    [pc,lc,sc] = palindromes('GCTAGTAACGTATATATAAT', 'Complement', true);

Find the palindromes in a random nucleotide sequence.

    a = randseq(100)

    a =
    TAGCTTCATCGTTGACTTCTACTAA
    AAGCAAGCTCCTGAGTAGCTGGCCA
    AGCGAGCTTGCTTGTGCCCGGCTGC
    GGCGGTTGTATCCTGAATACGCCAT

    [pos,len,pal] = palindromes(a)

    pos =
        74
    len =
         6
    pal =
        'GCGGCG'

See Also: Bioinformatics Toolbox functions: seqrcomplement, seqshowwords. MATLAB functions: regexp, strfind.

pam

Purpose: PAM scoring matrix

Syntax:
    ScoringMatrix = pam(N, 'PropertyName', PropertyValue, ...)
    [ScoringMatrix, MatrixInfo] = pam(N)
    ScoringMatrix = pam(..., 'Extended', ExtendedValue)
    ScoringMatrix = pam(..., 'Order', OrderValue)

Arguments:
N: Enter values 10:10:500. The default ordering of the output is A R N D C Q E G H I L K M F P S T W Y V B Z X. Enter a larger value for N to allow sequence alignments with larger evolutionary distances.
Extended: Property to add ambiguous characters to the scoring matrix. Enter either true or false. Default is false.
Order: Property to control the order of amino acids in the scoring matrix. Enter a string with at least the 20 standard amino acids.

Description:
ScoringMatrix = pam(N, 'PropertyName',
338. ignores the characters and displays a warning message:

    Warning: Symbols other than the standard 20 amino acids appear in the sequence.

If the sequence contains undefined amino acid characters (i, j, o), isoelectric ignores the characters and displays a warning message:

    Warning: Sequence contains unknown characters. These will be ignored.

isoelectric(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs.

isoelectric(..., 'PKVals', PKValsValue) uses the alternative pK table stored in the text file PKValsValue. For an example of a pK text file, see the file Emboss.pK.

    N_term  8.6
    K       10.8
    R       12.5
    H       6.5
    D       3.9
    E       4.1
    C       8.5
    Y       10.1
    C_term  3.6

isoelectric(..., 'Charge', ChargeValue) returns the estimated charge of a sequence for a given pH (ChargeValue).

isoelectric(..., 'Chart', ChartValue), when ChartValue is true, returns a graph plotting the charge of the protein versus the pH of the solvent.

Example

Get a sequence from PDB.

    pdbSeq = getpdb('1CIV', 'SequenceOnly', true)

Estimate its isoelectric point.

    isoelectric(pdbSeq)

Plot the charge against the pH for a short polypeptide sequence.

    isoelectric('PQGGGGWGQPHGGGWGQPHGGGGWGQGGSHSQG', 'CHART', true)

Get the Rh blood group D antigen from NCBI and calculate its charge at pH 7.3 (typical blood pH).

    gpSeq = getgenpept('AAB39602')
    [pI Charge] = isoelectric(gpSeq, 'Charge', 7.38)

Se
339. ile seqconsensus(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs.

seqconsensus(..., 'ScoringMatrix', ScoringMatrixValue) specifies the scoring matrix.

The following input parameters are analogous to the function seqprofile when the alphabet is restricted to 'AA' or 'NT':

    seqconsensus(..., 'Alphabet', AlphabetValue)
    seqconsensus(..., 'Gaps', GapsValue)
    seqconsensus(..., 'Ambiguous', AmbiguousValue)
    seqconsensus(..., 'Limits', LimitsValue)

Examples

    seqs = fastaread('pf00002.fa');
    [C,S] = seqconsensus(seqs, 'limits', [50 60], 'gaps', 'all')

See Also: Bioinformatics Toolbox functions: fastaread, multialignread, profalign, seqdisp, seqprofile

seqdisp

Purpose: Format long sequence output for easy viewing

Syntax:
    seqdisp(Seq)
    seqdisp(..., 'PropertyName', PropertyValue, ...)
    seqdisp(..., 'Row', RowValue)
    seqdisp(..., 'Column', ColumnValue)
    seqdisp(..., 'ShowNumbers', ShowNumbersValue)

Arguments:
Seq: Nucleotide or amino acid sequence. Enter a character array, a FASTA file name, or a MATLAB structure with the field Sequence. Multiply aligned sequences are allowed. FASTA files can have the file extension fa, fasta, fas, fsa, or fst.
Row: Property to select the length of each row. Enter an integer. The default length is 60.
Column: Property to select the column width or number of symbols before displayi
340. in the Bioinformatics Toolbox documentation.

path = graphpred2path(pred, D) traces back a path by following the predecessor list in pred, starting at destination node D. The value of the root (or source) node in pred must be 0. If a NaN is found when following the predecessor nodes, graphpred2path returns an empty path.

If pred is a row vector of predecessor node indices and D is a scalar, then path is a row vector listing the nodes from the root (or source) to D.

If pred is a row vector of predecessor node indices and D is a row vector, then path is a row cell array with every column containing the path to the destination for every element in D.

If pred is a matrix and D is a scalar, then path is a column cell array with every row containing the path for every row in pred.

If pred is a matrix and D is a row vector, then path is a matrix cell array with every row containing the paths for the respective row in pred, and every column containing the paths to the respective destination in D.

Note: If D is omitted, the paths to all the destinations are calculated for every predecessor listed in pred.

Examples

1. Create a phytree object from the phylogenetic tree file for the GLR_HUMAN protein.

    tr = phytreeread('pf00002.tree')

    Phylogenetic tree object with 33 leaves (32 branches)

2. View the phytree object.

    view(tr)

[Figure: Phylogenetic Tree Tool window showing the tree]
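The trace-back rule for the simplest case (pred is a row vector, D is a scalar) can be sketched in base MATLAB. The predecessor list and destination below are invented for illustration, and the loop only mirrors the rules stated in the description (the root is marked by 0, and a NaN yields an empty path); it is not the graphpred2path source.

```matlab
% Sketch: follow a predecessor list from a destination back to the root.
% pred and D are hypothetical values.
pred = [0 1 1 2 4 NaN];   % pred(k) is the predecessor of node k; node 1 is the root
D = 5;                    % destination node

path = D;
node = D;
while pred(node) ~= 0
    if isnan(pred(node))
        path = [];        % no route back to the root: return an empty path
        break
    end
    node = pred(node);
    path = [node, path];  % prepend, so the path runs from the root to D
end
disp(path)
```

Setting D = 6 in this sketch exercises the NaN branch and leaves path empty.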
341. indicate the presence of an edge.
G2: N-by-N sparse matrix that represents a directed or undirected graph. G2 must be the same (directed or undirected) as G1.
DirectedValue: Property that indicates whether the graphs are directed or undirected. Enter false when both G1 and G2 are undirected graphs. In this case, the upper triangles of the sparse matrices G1 and G2 are ignored. Default is true, meaning that both graphs are directed.

Tip: For introductory information on graph theory functions, see Graph Theory Functions in the Bioinformatics Toolbox documentation.

[Isomorphic, Map] = graphisomorphism(G1, G2) returns logical 1 (true) in Isomorphic if G1 and G2 are isomorphic graphs, and logical 0 (false) otherwise. A graph isomorphism is a 1-to-1 mapping of the nodes in the graph G1 and the nodes in the graph G2 such that adjacencies are preserved. G1 and G2 are both N-by-N sparse matrices that represent directed or undirected graphs. Return value Isomorphic is Boolean. When Isomorphic is true, Map is a row vector containing the node indices that map from G2 to G1. When Isomorphic is false, the worst-case time complexity is O(N!), where N is the number of nodes.

[Isomorphic, Map] = graphisomorphism(G1, G2, 'Directed', DirectedValue) indicates whether the graphs are directed or undirected. Set DirectedValue to false when both G1 and G2 are undirected graphs. In this case, the upper triangles of the spa
342. ine color of the node. Default is [0.3 0.3 1], which defines blue.
FontSize: Positive number that sets the size of the node font in points. Default is 8.
TextColor: Three-element numeric vector of RGB values that specifies the color of the node labels. Default is [0 0 0], which defines black.
UserData: Miscellaneous, user-defined data that you want to associate with the node. The node does not use this property, but you can access and specify it using the get and set functions. Default is [].

Properties of an Edge Object

Property: Description
ID: Read-only string defined when the biograph object is created, internally by the biograph constructor function. Each edge object's ID is unique and used internally to identify the edge.
Label: String for labeling an edge when you display a biograph object using the view method. Default is the ID property of the edge object.
Description: String that describes the edge. Default is ''. This information is for bookkeeping purposes only.
Weight: Value that represents the weight (cost, distance, length, or capacity) associated with the edge. Default is 1.
LineWidth: Positive number. Default is 1.
LineColor: Three-element numeric vector of RGB values that specifies the color of the edge. Default is [0.5 0.5 0.5], which defines gray.
UserData: Miscellaneous, user-defined d
343. ing of a histogram showing the distribution of PM probe intensity values (blue) and the convoluted probability distribution function (red), with estimated parameters. Enter either 'all' (plot a histogram for each column or chip) or specify a subset of columns (chips) by entering the column number, list of numbers, or range of numbers. For example:
- 'Showplot', 3 plots the intensity values in column 3.
- 'Showplot', [3,5,7] plots the intensity values in columns 3, 5, and 7.
- 'Showplot', [3:9] plots the intensity values in columns 3 to 9.

BackgroundAdjustedMatrix = rmabackadj(PMData) returns the background adjusted values of probe intensities in the matrix PMData. Note that each row in PMData corresponds to a perfect match (PM) probe and each column in PMData corresponds to an Affymetrix CEL file. (Each CEL file is generated from a separate chip. All chips should be of the same type.) Details on the background adjustment are described by Bolstad, 2005.

BackgroundAdjustedMatrix = rmabackadj(..., 'PropertyName', PropertyValue, ...) defines optional properties that use property name/value pairs in any order. These property name/value pairs are as follows:

BackgroundAdjustedMatrix = rmabackadj(..., 'Method', MethodValue, ...) controls the estimation method for the background adjustment model parameters. When MethodValue is 'RMA', rmabackadj implements the estimation method descri
344. inpropplot

EdgeWeightValue: Value that specifies the edge weight used for linear and exponential smoothing methods. Decreasing this value emphasizes peaks in the plot. Choices are any value ≥ 0 and ≤ 1. Default is 1.
WindowLengthValue: Integer that specifies the window length for the smoothing method. Increasing this value gives a smoother plot that shows less detail. Default is 11.

Description

proteinpropplot(SeqAA) displays a plot of the hydrophobicity (Kyte and Doolittle, 1982) of the residues in sequence SeqAA.

proteinpropplot(SeqAA, ...'PropertyName', PropertyValue, ...) calls proteinpropplot with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

proteinpropplot(SeqAA, ...'PropertyTitle', PropertyTitleValue, ...) specifies a property to plot for the amino acid sequence SeqAA. Default is Hydrophobicity (Kyte & Doolittle). To display a list of possible properties to plot, enter an empty string for PropertyTitleValue. For example, type:

    proteinpropplot(sequence, 'propertytitle', '')

Tip: To access references for the properties, view the proteinpropplot.m file.

proteinpropplot(SeqAA, ...'Startat', StartatValue, ...) specifies the starting point for the plot from the N-terminal end of the amino a
345. iograph object
isdag (biograph): Test for cycles in biograph object
isomorphism (biograph): Find isomorphism between two biograph objects
isspantree (biograph): Determine if tree created from biograph object is spanning tree
maxflow (biograph): Calculate maximum flow and minimum cut in biograph object
minspantree (biograph): Find minimal spanning tree in biograph object
shortestpath (biograph): Solve shortest path problem in biograph object
topoorder (biograph): Perform topological sort of directed acyclic graph extracted from biograph object
traverse (biograph): Traverse biograph object by following adjacent nodes
view (biograph): Draw figure from biograph object

Gene Ontology

Following are methods for use with a geneont object.

getancestors (geneont): Numeric IDs for ancestors of Gene Ontology term
getdescendants (geneont): Numeric IDs for descendants of Gene Ontology term
getmatrix (geneont): Convert geneont object into relationship matrix
getrelatives (geneont): Numeric IDs for relatives of Gene Ontology term

Methods: Alphabetical List

allshortestpaths (biograph)

Purpose: Find all shortest paths in biograph object

Syntax:
    [dist] = allshortestpaths(BGObj)
    [dist] = allshortestpaths(BGObj, ...'Directed', DirectedValue, ...)
    [dist] = allshortestpaths(BGObj, ...'Weights', WeightsValue, ...)
346. ion

get (phytree)

Purpose: Information about phylogenetic tree object

Syntax:
    [Value1, Value2,...] = get(Tree, 'Property1','Property2',...)
    get(Tree)
    V = get(Tree)

Arguments:
Tree: Phytree object created with the function phytree.
Name: Property name for a phytree object.

Description:
[Value1, Value2,...] = get(Tree, 'Property1','Property2',...) returns the specified properties from a phytree object (Tree). Properties for a phytree object are listed in the following table.

Property: Description
NumLeaves: Number of leaves
NumBranches: Number of branches
NumNodes: Number of nodes (NumLeaves + NumBranches)
Pointers: Branch to leaf/branch connectivity list
Distances: Edge length for every leaf/branch
LeafNames: Names of the leaves
BranchNames: Names of the branches
NodeNames: Names of all the nodes

get(Tree) displays all property names and their current values for a phytree object (Tree).

V = get(Tree) returns a structure where each field name is the name of a property of a phytree object (Tree) and each field contains the value of that property.

Examples:
1. Read in a phylogenetic tree from a file.
    tr = phytreeread('pf00002.tree')
2. Get the names of the leaves.
    protein_names = get(tr,'LeafNames')

    protein_names =
        'BAI2_HUMAN/917-1197'
        'BAI1_HUMAN/944-1191'
        'O00406/622-883'

See Also: Bioinformatics Toolbox functions: phytree (object constructor), phytreeread. phytree object methods: getbyname, select. getancesto
347. ion: biograph (object constructor). Bioinformatics Toolbox object: biograph object. Bioinformatics Toolbox methods of a biograph object: dolayout, getancestors, getdescendants, getedgesbynodeid, getnodesbyid, getrelatives, view. MATLAB functions: get, set.

getmatrix (biograph)

Purpose: Get connection matrix from biograph object

Syntax:
    [Matrix, ID, Distances] = getmatrix(BGObj)

Arguments:
BGObj: biograph object created by biograph (object constructor).

Description:
[Matrix, ID, Distances] = getmatrix(BGObj) converts the biograph object, BGObj, into a logical sparse matrix, Matrix, in which 1 indicates that a node (row index) is connected to another node (column index). ID is a cell array of strings listing the ID properties for each node, and corresponds to the rows and columns of Matrix. Distances is a column vector with one entry for every nonzero entry in Matrix, traversed column-wise and representing the respective Weight property for each edge.

Examples:
    cm = [0 1 1 0 0;2 0 0 4 4;4 0 0 0 0;0 0 0 0 2;4 0 5 0 0];
    bg = biograph(cm);
    [cm, IDs, dist] = getmatrix(bg)

See Also: Bioinformatics Toolbox function: biograph (object constructor). Bioinformatics Toolbox object: biograph object. Bioinformatics Toolbox methods of a biograph object: dolayout, getancestors, getdescendants, getedgesbynodeid, getnodesbyid, getrelatives, view.

getmatrix (geneont)
348. ion kernel with a default scaling factor, sigma, of 1
- polynomial: Polynomial kernel with a default order of 3
- mlp: Multilayer Perceptron kernel with default scale and bias parameters of [1, -1]
- @functionname: Handle to a kernel function specified using @ and the functionname. For example, @kfun, or an anonymous function.

Positive number that specifies the scaling factor, sigma, in the radial basis function kernel. Default is 1.

Positive number that specifies the order of a polynomial kernel. Default is 3.

Two-element vector, [p1, p2], that specifies the scale and bias parameters of the multilayer perceptron (mlp) kernel. K = tanh(p1*U*V' + p2). p1 must be > 0, and p2 must be < 0. Default is [1, -1].

MethodValue: String specifying the method to find the separating hyperplane. Choices are:
- 'QP': Quadratic Programming (requires Optimization Toolbox). The classifier is a two-norm, soft-margin support vector machine.
- 'SMO': Sequential Minimal Optimization. The classifier is a one-norm, soft-margin support vector machine.
- 'LS': Least-Squares.
If you installed Optimization Toolbox, the 'QP' method is the default. Otherwise, the 'SMO' method is the default.

QuadProg_OptsValue: An options structure created by the optimset function (Optimization Toolbox). This structure specifies options used by the 'QP' method. For more information on creating

SMO_OptsValue
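As a small illustration of the multilayer perceptron kernel, the following base-MATLAB sketch evaluates it for two tiny matrices. It reads the flattened formula in the text as K = tanh(p1*U*V' + p2), which is an assumption about the lost operators; the matrices U and V are invented, and this is not the svmtrain code.

```matlab
% Sketch: evaluating an mlp-style kernel for two small sets of row vectors.
% U and V are hypothetical; p1 and p2 satisfy p1 > 0 and p2 < 0 as the text requires.
U = [1 2; 3 4];          % two observations, one per row
V = [0 1; 1 0];          % two observations, one per row
p1 = 1;                  % scale parameter
p2 = -1;                 % bias parameter

% Entry (i,j) is tanh(p1 * dot(U(i,:), V(j,:)) + p2)
K = tanh(p1 * (U * V') + p2);
disp(K)
```

Any function with the same shape (two matrices in, one Gram matrix out) could be passed as a handle in place of the built-in kernel names, for example an anonymous function wrapping the expression above.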
349. irs. multialignread(..., 'IgnoreGaps', IgnoreGapsValue), when IgnoreGapsValue is true, removes any gap symbol ('-' or '.') from the sequences. Default is false. Examples: Read a multiple sequence alignment of the gag polyprotein for several HIV strains. gagaa = multialignread('aagag.aln') returns gagaa = 1x16 struct array with fields: Header, Sequence. See Also: Bioinformatics Toolbox functions: fastaread, gethmmalignment, multialign, seqconsensus, seqdisp, seqprofile. multialignviewer. Purpose: Open viewer for multiple sequence alignments. Syntax: multialignviewer(Alignment); multialignviewer(..., 'PropertyName', PropertyValue, ...); multialignviewer(..., 'Alphabet', AlphabetValue). Description: The multialignviewer is an interactive graphical user interface (GUI) for viewing multiple sequence alignments. multialignviewer(Alignment) loads a group of previously multiple-aligned sequences into the viewer. Alignment is a structure with a field Sequence, a character array, or a file name. multialignviewer(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. multialignviewer(..., 'Alphabet', AlphabetValue) specifies the alphabet type for the sequences. AlphabetValue can be 'AA' for amino acids or 'NT' for nucleotides. The default value is 'AA'. If AlphabetValue is not specified, multialignviewer guesses the alphabet
350. isjoint subsets. Repeated calls return different randomly generated partitions. K defaults to 5 when omitted. In K-fold cross-validation, K-1 folds are used for training and the last fold is used for evaluation. This process is repeated K times, leaving one different fold for evaluation each time. [Train, Test] = crossvalind('HoldOut', N, P) returns logical index vectors for cross-validation of N observations by randomly selecting P*N (approximately) observations to hold out for the evaluation set. P must be a scalar between 0 and 1. P defaults to 0.5 when omitted, corresponding to holding 50% out. Using holdout cross-validation within a loop is similar to K-fold cross-validation one time outside the loop, except that non-disjointed subsets are assigned to each evaluation. [Train, Test] = crossvalind('LeaveMOut', N, M), where M is an integer, returns logical index vectors for cross-validation of N observations by randomly selecting M of the observations to hold out for the evaluation set. M defaults to 1 when omitted. Using LeaveMOut cross-validation within a loop does not guarantee disjointed evaluation sets. Use K-fold instead. [Train, Test] = crossvalind('Resubstitution', N, [P, Q]) returns logical index vectors of indices for cross-validation of N observations by randomly selecting P*N observations for the evaluation set and Q*N observations for training. Sets are selected in order to minimize the number of obser
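The K-fold scheme described above (K disjoint folds, each used once for evaluation while the other K-1 are used for training) can be sketched as follows. This is a Python illustration under stated assumptions, not the crossvalind implementation: the name `kfold_indices`, the `seed` parameter, and the balancing strategy are all hypothetical.

```python
import random


def kfold_indices(n, k=5, seed=None):
    """Assign each of n observations a fold label in 1..k.

    Labels are dealt out round-robin and then shuffled, so fold sizes
    differ by at most one and the folds are disjoint by construction.
    """
    rng = random.Random(seed)
    folds = [(i % k) + 1 for i in range(n)]
    rng.shuffle(folds)
    return folds


indices = kfold_indices(10, k=5, seed=0)
for fold in range(1, 6):
    test = [label == fold for label in indices]   # held-out fold
    train = [not held_out for held_out in test]   # remaining K-1 folds
    print(fold, sum(train), sum(test))
```

Every observation lands in exactly one evaluation set across the K iterations, which is the property that the holdout and leave-M-out variants do not guarantee when called in a loop.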
351. ith itself, then d = 0. seqpdist. Methods with No Scoring of Gaps (Nucleotides Only). Method: Description. Tajima-Nei: Maximum likelihood estimate considering the background nucleotide frequencies. The frequencies can be computed from the input sequences or given by setting Optargs to [gA gC gG gT]. gA, gC, gG, gT are scalar values for the nucleotide frequencies. Kimura: Considers separately the transitional nucleotide substitution and the transversional nucleotide substitution. Tamura: Considers separately the transitional nucleotide substitution, the transversional nucleotide substitution, and the GC content. GC content can be computed from the input sequences or given by setting Optargs to the proportion of GC content (scalar value from 0 to 1). Hasegawa: Considers separately the transitional nucleotide substitution, the transversional nucleotide substitution, and the background nucleotide frequencies. Background frequencies can be computed from the input sequences or given by setting the Optargs property to [gA gC gG gT]. Nei-Tamura: Considers separately the transitional nucleotide substitution between purines, the transitional nucleotide substitution between pyrimidines, the transversional nucleotide substitution, and the background nucleotide frequencies. Background frequencies can be computed from the input sequences or given by setting the Optargs property to [gA gC gG gT]. seqpdist. Meth
352. k, distance). Class = knnclassify(Sample, Training, Group, k, distance, rule). Arguments: Sample: Matrix whose rows will be classified into groups. Sample must have the same number of columns as Training. Training: Matrix used to group the rows in the matrix Sample. Training must have the same number of columns as Sample. Each row of Training belongs to the group whose value is the corresponding entry of Group. Group: Vector whose distinct values define the grouping of the rows in Training. k: The number of nearest neighbors used in the classification. Default is 1. distance: String to specify the distance metric. Choices are: • 'euclidean': Euclidean distance (default). • 'cityblock': Sum of absolute differences. • 'cosine': One minus the cosine of the included angle between points (treated as vectors). • 'correlation': One minus the sample correlation between points (treated as sequences of values). • 'hamming': Percentage of bits that differ (only suitable for binary data). rule: String to specify the rule used to decide how to classify the sample. Choices are: • 'nearest': Majority rule with nearest point tie-break (default). • 'random': Majority rule with random point tie-break. • 'consensus': Consensus rule. Description: Class = knnclassify(Sample, Training, Group) classifies the rows of the data matrix Sample into groups, based on the grouping of the rows of Training. S
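To make the 'euclidean' distance and the 'nearest' rule concrete, here is a small Python sketch of k-nearest-neighbor classification with a majority vote and a nearest-point tie-break. It is an illustration of the idea only, not the knnclassify code; the function name `knn_classify` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_classify(sample, training, group, k=1):
    """Classify each row of sample by majority vote among its k nearest
    training rows (Euclidean distance). Ties between groups are broken
    in favour of the tied group that owns the closest neighbor."""
    classes = []
    for row in sample:
        # Sort training rows by distance to the query row.
        ranked = sorted(
            zip(training, group),
            key=lambda pair: math.dist(row, pair[0]),
        )
        neighbours = [g for _, g in ranked[:k]]
        votes = Counter(neighbours)
        top = max(votes.values())
        tied = {g for g, count in votes.items() if count == top}
        # neighbours is ordered nearest first, so the first tied group wins.
        classes.append(next(g for g in neighbours if g in tied))
    return classes


training = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
group = ["a", "a", "b", "b"]
print(knn_classify([[0.0, 0.5], [5.0, 5.5]], training, group, k=3))
```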
353. land(..., 'GCmin', GCminValue); cpgisland(..., 'Plot', PlotValue). Arguments: SeqDNA: DNA nucleotide sequence. Enter a character string with the letters A, T, C, and G. You can also enter a structure with the field Sequence. cpgisland does not count ambiguous bases or gaps. Description: cpgisland(SeqDNA) finds CpG islands by marking bases within a moving window of 100 DNA bases with a GC content greater than 50% and a CpGobserved/CpGexpected ratio greater than 60%. cpgisland(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. cpgisland(..., 'Window', WindowValue) specifies the window size for calculating GC percent and CpGobserved/CpGexpected ratios for a sequence. The default value is 100 bases. A smaller window size increases the noise in a plot. cpgisland(..., 'MinIsland', MinIslandValue) specifies the minimum number of consecutive marked bases to report. The default value is 200 bases. cpgisland(..., 'CpGoe', CpGoeValue) specifies the minimum CpGobserved/CpGexpected ratio in each window needed to mark a base. Enter a value between 0 and 1. The default value is 0.6. This ratio is defined as CpGobs/CpGexp = (NumCpGs * Length) / (NumGs * NumCs). cpgisland(..., 'GCmin', GCminValue) specifies the minimum GC percent in a window needed to mark a base. Enter a value between 0 and 1. The default value is 0.5. cpgisland(..., 'Plot', PlotValue), when Plot is true, plots GC con
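The two window statistics above, the GC fraction and the ratio CpGobs/CpGexp = (NumCpGs * Length) / (NumGs * NumCs), are easy to evaluate by hand. The Python sketch below is illustrative only and is not the cpgisland implementation; the function names are hypothetical, and it assumes Length is simply the number of characters in the window (the handling of ambiguous bases and gaps is not modelled).

```python
def gc_fraction(window):
    """Fraction of bases in the window that are G or C."""
    seq = window.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(window):
    """CpGobs/CpGexp = (NumCpGs * Length) / (NumGs * NumCs)."""
    seq = window.upper()
    num_c = seq.count("C")
    num_g = seq.count("G")
    if num_c == 0 or num_g == 0:
        return 0.0  # no CpG dinucleotide is possible in this window
    num_cpg = seq.count("CG")
    return num_cpg * len(seq) / (num_g * num_c)


def window_is_marked(window, gc_min=0.5, cpg_oe=0.6):
    """Apply both documented default thresholds to one window."""
    return gc_fraction(window) > gc_min and cpg_obs_exp(window) > cpg_oe


print(cpg_obs_exp("CGCG"), gc_fraction("CGAT"), window_is_marked("ACGCGT"))
```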
354. le ASCII text file. Valid file types include: • PDB • MOL (MDL) • SDF • XYZ • SMOL • JVXL • CIF/mmCIF. pdbID: String specifying a unique identifier for a protein structure record in the PDB database. Note: Each structure in the PDB database is represented by a four-character alphanumeric identifier. For example, 4hhb is the identifier for hemoglobin. pdbStruct: A structure containing a field for each PDB record, such as returned by the getpdb or pdbread function. molviewer Return Values: FigureHandle: Figure handle to a Molecule Viewer window. Description: molviewer opens a blank Molecule Viewer window. You can display 3-D molecular structures by selecting File > Open, File > Load PDB ID, or File > Open URL. molviewer(File) reads the data in a molecule model file, File, and opens a Molecule Viewer window displaying the 3-D molecular structure for viewing and manipulation. molviewer(pdbID) retrieves the data for a protein structure record, pdbID, from the PDB database and opens a Molecule Viewer window displaying the 3-D molecular structure for viewing and manipulation. molviewer(pdbStruct) reads the data from pdbStruct, a structure containing a field for each PDB record, and opens a Molecule Viewer window displaying a 3-D molecular structure for viewing and manipulation. FigureHandle = molviewer(...) returns the figure handle to the Molecule Viewer window. Tip: You can pass the FigureHandle to the evalrasmol
355. le) reads File, an Affymetrix file, and creates AffyStruct, a MATLAB structure. AffyStruct contains the following fields. AffyStruct = affyread(File, LibraryPath) specifies the path and directory where the library file (CDF or GIN) associated with File is stored. Use this syntax only if File is a CHP file. You can learn more about the Affymetrix GeneChip files and download sample files from http://www.affymetrix.com/support/technical/sample_data/demo_data.affx. Note: Some Affymetrix sample data files (DAT, EXP, CEL, and CHP) are combined together in a DTT file. You must download and use the Affymetrix Data Transfer Tool to extract these files from the DTT file. Caution: When using affyread to read a CHP file, the Affymetrix GDAC Runtime Libraries look for the associated CEL file in the directory that it was in when the CHP file was created. If the CEL file is not found, then affyread does not read probe set values in the CHP file. If you encounter errors reading files, then check that the Affymetrix GDAC Runtime Libraries are correctly installed. You can reinstall the libraries by running the installer from Windows Explorer: MATLAB\toolbox\bioinfo\microarray\lib\GdacFilesRuntimeInstall-v4.exe. Examples: The following example assumes that Drosophila.CEL and Drosophila.dat are stored on the MATLAB search path or in the MATLAB Current Directory. It also assumes that Drosophila.chp is stored on the MATL
356. leotide sequences. Caution: If SeqNT1 and SeqNT2 are too short or too divergent, saturation can be reached, and dndsml returns NaNs and a warning message. [Dn, Ds, Like] = dndsml(SeqNT1, SeqNT2, ...'PropertyName', PropertyValue, ...) calls dndsml with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows: [Dn, Ds, Like] = dndsml(SeqNT1, SeqNT2, ...'GeneticCode', GeneticCodeValue, ...) calculates synonymous and nonsynonymous substitution rates using the specified genetic code. Enter a Code Number or a string with a Code Name from the table. If you use a Code Name, you can truncate it to the first two characters. Default is 1 or Standard. [Dn, Ds, Like] = dndsml(SeqNT1, SeqNT2, ...'Verbose', VerboseValue, ...) controls the display of the codons considered in the computations and their amino acid translations. Choices are true or false (default). Tip: Specify true to use this display to manually verify the codon alignment of the two input sequences, SeqNT1 and SeqNT2. The presence of stop codons (*) in the amino acid translation can indicate that SeqNT1 and SeqNT2 are not codon-aligned. Examples: Estimating Synonymous and Nonsynonymous Substitution Rates Between the gag Genes of Two HIV Viruses. 1 Re
357. line for the classifier when using two-dimensional data. Choices are true or false (default). Memory Usage and Out of Memory Error: When you set 'Method' to 'QP' and the svmtrain function operates on a data set containing N elements, it creates an (N+1)-by-(N+1) matrix to find the separating hyperplane. This matrix needs at least 8*(N+1)^2 bytes of contiguous memory. If this size of contiguous memory is not available, MATLAB displays an out-of-memory message. When you set 'Method' to 'SMO', memory consumption is controlled by the SMO option KernelCacheLimit. For more information on the KernelCacheLimit option, see the svmsmoset function. The SMO algorithm stores only a submatrix of the kernel matrix, limited by the size specified by the KernelCacheLimit option. However, if the number of data points exceeds the size specified by the KernelCacheLimit option, the SMO algorithm slows down because it has to recalculate the kernel matrix elements. When using svmtrain on large data sets, and you run out of memory or the optimization step is very time consuming, try either of the following: • Use a smaller number of samples and use cross-validation to test the performance of the classifier. • Set 'Method' to 'SMO' and set the KernelCacheLimit option as large as your system permits. For information on setting the KernelCacheLimit option, see the svmsmoset function. Tip: If you set 'Method' to 'SMO', setting the Box
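The memory bound above is simple arithmetic: an (N+1)-by-(N+1) matrix of 8-byte double-precision values needs 8*(N+1)^2 bytes. A short Python sketch (the function name `qp_matrix_bytes` is hypothetical) shows how quickly that grows with the number of training elements.

```python
def qp_matrix_bytes(n):
    """Minimum contiguous memory, in bytes, for the (n+1)-by-(n+1)
    matrix of 8-byte doubles described for the QP method."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return 8 * (n + 1) ** 2


for n in (999, 9_999, 99_999):
    print(n, qp_matrix_bytes(n) / 1e9, "GB")
```

Growing the data set tenfold multiplies the requirement by roughly one hundred, which is why the text suggests subsampling or switching to the SMO method for large problems.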
358. lly display the logo. seqlogo(Seqs, ...'PropertyName', PropertyValue, ...) calls seqlogo with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows: seqlogo(..., 'Displaylogo', DisplaylogoValue, ...) controls the display of a sequence logo. Choices are true (default) or false. seqlogo(..., 'Alphabet', AlphabetValue, ...) specifies the type of sequence (nucleotide or amino acid). Choices are 'NT' (default) or 'AA'. Note: If you provide amino acid sequences to seqlogo, you must set Alphabet to 'AA'. seqlogo(..., 'Startat', StartatValue, ...) specifies the starting position for the sequences in Seqs. Default starting position is 1. seqlogo(..., 'Endat', EndatValue, ...) specifies the ending position for the sequences in Seqs. Default ending position is the maximum length of the sequences in Seqs. seqlogo(..., 'SSCorrection', SSCorrectionValue, ...) controls the use of small sample correction in the estimation of the number of bits. Choices are true (default) or false. Note: A simple calculation of bits tends to overestimate the conservation at a particular location. To compensate for this overestimation, when SSCorrection is set to true, a rough estimate is applied as an approximat
359. lotting symbol. Symbol. maboxplot(MAData) displays a box plot of the values in the columns of data (MAData). MAData can be a numeric array or a structure containing a field called Data. maboxplot(MAData, ColumnName) labels the box plot column names. maboxplot(MAStruct, FieldName) displays a box plot of the values in the field FieldName in the microarray data structure MAStruct. If MAStruct is block based, maboxplot creates a box plot of the values in the field FieldName for each block. H = maboxplot(...) returns the handle of the box plot axes. [H, HLines] = maboxplot(...) returns the handles of the lines used to separate the different blocks in the image. maboxplot(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs in any order. These property name/value pairs are as follows: maboxplot(..., 'Title', TitleValue) allows you to specify the title of the plot. The default TitleValue is FieldName. maboxplot(..., 'Notch', NotchValue), if NotchValue is true, draws notched boxes. The default is false, to show square boxes. maboxplot(..., 'Symbol', SymbolValue) allows you to specify the symbol used for outlier values. The default Symbol is '+'. maboxplot(..., 'Orientation', OrientationValue) allows you to specify the orientation of the box plot. The choices are 'Vertical' and 'Horizontal'. The default is 'Verti
360. ls msheatmap with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows: msheatmap(..., 'Midpoint', MidpointValue, ...) specifies a quantile of the ion intensity values to fall below the midpoint of the color map, meaning they do not represent peaks. msheatmap uses a custom color map where cool colors represent nonpeak regions, white represents the midpoint, and warm colors represent peaks. Choices are any value between 0 and 1. Default is: • 0.99: For LC/MS or GC/MS data or when input T is provided. This means that 1% of the pixels are warm colors and represent peaks. • 0.95: For non-LC/MS or non-GC/MS data or when input T is not provided. This means that 5% of the pixels are warm colors and represent peaks. Tip: You can also change the midpoint interactively after creating the heat map by right-clicking the color bar, selecting Interactive Colormap Shift, then click-dragging the cursor vertically on the color bar. This technique is useful when comparing multiple heat maps. msheatmap(..., 'Range', RangeValue, ...) specifies the m/z range for the x-axis of the heat map. RangeValue is a 1-by-2 vector that must be within [min(MZ) max(MZ)]. Default is the full range [min(MZ) max(MZ)]. msheatmap(..., 'M
361. lter(Data) returns a filtered data matrix (FData). You can create FData using FData = Data(find(Mask),:). [Mask, FData, FNames] = genelowvalfilter(Data, Names) returns a filtered names array (FNames), where Names is a cell array of the names of the genes corresponding to each row of Data. You can also create FNames using FNames = Names(Mask). genelowvalfilter(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. genelowvalfilter(..., 'Prctile', PrctileValue) removes from the experimental data (Data) gene expression profiles with all absolute values less than a specified percentile (PrctileValue). genelowvalfilter(..., 'AbsValue', AbsValueValue) calculates the maximum absolute value for each gene expression profile and removes the profiles with maximum absolute values less than AbsValueValue. genelowvalfilter(..., 'AnyVal', AnyValValue), when AnyValValue is true, calculates the minimum absolute value for each gene expression profile and removes the profiles with minimum absolute values less than AbsValueValue. Examples: [data, labels, I, FI] = genelowvalfilter(data, labels, 'AbsValue', 5). References: [1] Kohane, I.S., Kho, A.T., Butte, A.J. (2003). Microarrays for an Integrative Genomics. Cambridge, MA: MIT Press. See Also: Bioinformatics Toolbox functions: exprprofrange, exprprofvar, geneentropyfilter, generangefilter, genevarfilter. geneont. Purpose: Create geneont object. Syntax: GeneontObj
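The 'AbsValue' rule (drop every expression profile whose largest absolute value falls below a threshold) can be sketched in a few lines. The Python code below is an illustration only, not the genelowvalfilter implementation; `low_value_mask` and the toy data are hypothetical, and profiles are represented as rows of a nested list.

```python
def low_value_mask(data, abs_value):
    """Return one boolean per row: True when the row's maximum
    absolute value is at least abs_value, i.e. the row is kept."""
    return [max(abs(x) for x in row) >= abs_value for row in data]


data = [
    [0.5, -1.0, 2.0],   # max |x| = 2.0, removed at threshold 5
    [6.0, -0.2, 1.0],   # max |x| = 6.0, kept
    [-7.5, 3.0, 0.0],   # max |x| = 7.5, kept
]
names = ["geneA", "geneB", "geneC"]

mask = low_value_mask(data, 5)
filtered_data = [row for row, keep in zip(data, mask) if keep]
filtered_names = [name for name, keep in zip(names, mask) if keep]
print(mask, filtered_names)
```

Applying the same mask to both the data rows and the names keeps the two aligned, which is the role Mask plays in the description above.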
362. lue. Three-element numeric vector of RGB values. Default is [0, 0, 0], which defines black. Positive number that sets the size of the edge font in points. Default is 8. Controls the display of arrows for the edges. Choices are 'on' (default) or 'off'. Positive number that sets the size of the arrows in points. Default is 8. Controls the display of text indicating the weight of the edges. Choices are 'on' (default) or 'off'. ShowTextInNodesValue: String that specifies the node property used to label nodes when you display a biograph object using the view method. Choices are: • 'Label': Uses the Label property of the node object (default). • 'ID': Uses the ID property of the node object. • 'None'. NodeAutoSizeValue: Controls precalculating the node size before calling the layout engine. Choices are 'on' (default) or 'off'. NodeCallbackValue: User callback for all nodes. Enter the name of a function, a function handle, or a cell array with multiple function handles. After using the view function to display the biograph in the Biograph Viewer, you can double-click a node to activate the first callback, or right-click and select a callback to activate. Default is @(node) inspect(node), which displays the Property Inspector dialog box. EdgeCallbackValue: User callback for all edges. Enter the name of a function, a function handle, or a cell array with multiple function handles. After
363. luster tree. For information on choices, see the linkage function. Default is 'average'. Cell array of property name/property value pairs to pass to the dendrogram function (Statistics Toolbox) to create the dendrogram plot. For information on choices, see the dendrogram function. OptimalLeafOrderValue: Property to enable or disable the optimal leaf ordering calculation, which determines the leaf order that maximizes the similarity between neighboring leaves. Choices are true (enable) or false (disable). Default depends on the size of Data. If the number of rows or columns in Data is greater than 1000, default is false; otherwise, default is true. Note: Disabling the optimal leaf ordering calculation can be useful when working with large data sets, because this calculation uses a large amount of memory and can be very time consuming. ColorMapValue: Either of the following: • M-by-3 matrix of RGB values • Name or function handle of a function that returns a color map. Default is redgreencmap. SymmetricRangeValue: Property to force the color range of the heat map to be symmetric around zero. Choices are true (default) or false. DimensionValue: Property to specify either a one-dimensional or two-dimensional clustergram. Choices are 1 (default) or 2. RatioValue: Either of the following: • Scalar • Two-element vector. Default is 1/5. Description: clustergram(Data) cre
364. m a single experimental condition or phenotype, where each row corresponds to a gene. These data points are used as the baseline. DataY: Vector of gene expression values from a single experimental condition or phenotype, where each row corresponds to a gene. These data points will be normalized using the baseline. ThresholdsValue: Property to set the thresholds for the lowest average rank and the highest average rank, which are used to determine the invariant set. The rank invariant set is a set of data points whose proportional rank difference is smaller than a given threshold. The threshold for each data point is determined by interpolating between the threshold for the lowest average rank and the threshold for the highest average rank. Select these two thresholds empirically to limit the spread of the invariant set, but allow enough data points to determine the normalization relationship. ThresholdsValue is a 1-by-2 vector [LT, HT], where LT is the threshold for the lowest average rank and HT is the threshold for the highest average rank. Values must be between 0 and 1. Default is [0.03, 0.07]. ExcludeValue: Property to filter the invariant set of data points by excluding the data points whose average rank (between DataX and DataY) is in the highest N ranked averages or lowest N ranked averages. PrctileValue: Property to stop the iteration process when the number of data points in the inv
365. m data are not available, you can still use the gcrmabackadj function by entering empty column vectors for both of these inputs in the syntax. [PMMatrix_Adj, nsbStruct] = gcrmabackadj(PMMatrix, MMMatrix, AffinPM, AffinMM) returns nsbStruct, a structure containing nonspecific binding background parameters, estimated from the intensities and affinities of probes on an Affymetrix GeneChip array. nsbStruct includes the following fields: • sigma • mu_pm • mu_mm. gcrmabackadj(..., 'PropertyName', PropertyValue, ...) calls gcrmabackadj with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows: gcrmabackadj(..., 'OpticalCorr', OpticalCorrValue, ...) controls the use of optical background correction on the PM and MM probe intensity values in PMMatrix and MMMatrix. Choices are true (default) or false. gcrmabackadj(..., 'CorrConst', CorrConstValue, ...) specifies the correlation constant, rho, for log background intensity for each PM/MM probe pair. Choices are any value > 0 and < 1. Default is 0.7. gcrmabackadj(..., 'Method', MethodValue, ...) specifies the method to estimate the signal. Choices are 'MLE', a faster, ad hoc Maximum Likelihood Estimate method, or 'EB', a slower, more formal, empirical Bayes method. Default is 'MLE'. gcr
366. m flows are the same. References: [1] Edmonds, J. and Karp, R.M. (1972). Theoretical improvements in the algorithmic efficiency for network flow problems. Journal of the ACM 19, 248-264. [2] Goldberg, A.V. (1985). A New Max-Flow Algorithm. MIT Technical Report MIT/LCS/TM-291, Laboratory for Computer Science, MIT. [3] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002). The Boost Graph Library User Guide and Reference Manual. Upper Saddle River, NJ: Pearson Education. See Also: Bioinformatics Toolbox functions: graphallshortestpaths, graphconncomp, graphisdag, graphisomorphism, graphisspantree, graphminspantree, graphpred2path, graphshortestpath, graphtopoorder, graphtraverse. Bioinformatics Toolbox method of biograph object: maxflow. graphminspantree. Purpose: Find minimal spanning tree in graph. Syntax: [Tree, pred] = graphminspantree(G); [Tree, pred] = graphminspantree(G, R); [Tree, pred] = graphminspantree(..., 'Method', MethodValue, ...); [Tree, pred] = graphminspantree(..., 'Weights', WeightsValue, ...). Arguments: G: N-by-N sparse matrix that represents an undirected graph. Nonzero entries in matrix G represent the weights of the edges. R: Scalar between 1 and the number of nodes. Description: Tip: For introductory information on graph theory functions, see Graph Theory Functions in the Bioinformatics Toolbox documentation. [Tree, pred] = graphminspantree(G) finds an acyclic subset of e
367. mDataY = mainvarsetnorm(..., 'Iterate', IterateValue, ...) controls the iteration process for determining the invariant set of data points. When IterateValue is true, mainvarsetnorm repeats the process until either no more data points are eliminated, or a predetermined percentage of data points (PrctileValue) is reached. When IterateValue is false, mainvarsetnorm performs only one iteration of the process. Default is true. Tip: Select false for smaller data sets, typically less than 200 data points. NormDataY = mainvarsetnorm(..., 'Method', MethodValue, ...) selects the smoothing method for normalizing the data. When MethodValue is 'lowess', mainvarsetnorm uses the lowess method. When MethodValue is 'runmedian', mainvarsetnorm uses the running median method. Default is 'lowess'. NormDataY = mainvarsetnorm(..., 'Span', SpanValue, ...) sets the window size for the smoothing method. If SpanValue is less than 1, the window size is that percentage of the number of data points. If SpanValue is equal to or greater than 1, the window size is of size SpanValue. Default is 0.05, which corresponds to a window size equal to 5% of the total number of data points in the invariant set. NormDataY = mainvarsetnorm(..., 'Showplot', ShowplotValue, ...) determines whether to plot a pair of M-A scatter plots (before and after normalization). M is the ratio between DataX and DataY. A is the average of DataX and DataY. When Sh
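The two interpretations of SpanValue (a fraction of the data when below 1, an absolute window size otherwise) can be written as a small helper. This Python sketch is illustrative only; the function name `smoothing_window` is hypothetical, and rounding to the nearest whole point with a minimum of 1 is an assumption, since the text does not say how a fractional window is converted to a point count.

```python
def smoothing_window(span, n_points):
    """Translate a span setting into a window size in data points.

    span < 1  : treated as a fraction of n_points (rounding is an
                assumption, not documented behaviour).
    span >= 1 : used directly as the window size.
    """
    if span <= 0:
        raise ValueError("span must be positive")
    if span < 1:
        return max(1, round(span * n_points))
    return int(span)


# The default span of 0.05 on 200 invariant-set points gives 10 points.
print(smoothing_window(0.05, 200), smoothing_window(25, 200))
```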
368. mabackadj. gcrmabackadj(..., 'TuningParam', TuningParamValue, ...) specifies the tuning parameter used by the estimate method. This tuning parameter sets the lower bound of signal values with positive probability. Choices are a positive value. Default is 5 (MLE) or 0.5 (EB). Tip: For information on determining a setting for this parameter, see Wu et al., 2004. gcrmabackadj(..., 'AddVariance', AddVarianceValue, ...) controls whether the signal variance is added to the weight function for smoothing low signal edge. Choices are true or false (default). gcrmabackadj(..., 'Showplot', ShowplotValue, ...) controls the display of a plot showing the log of probe intensity values from a specified column (chip) in MMMatrix, versus probe affinities in AffinMM. Choices are true, false, or I, an integer specifying a column in MMMatrix. If set to true, the first column in MMMatrix is plotted. Default is: • false: When return values are specified. • true: When return values are not specified. gcrmabackadj(..., 'Verbose', VerboseValue, ...) controls the display of a progress report showing the number of each chip as it is completed. Choices are true (default) or false. Examples: 1 Load the MAT file, included with Bioinformatics Toolbox, that contains Affymetrix data from a prostate cancer study. The variables in the MAT file include seqMatrix, a matrix containing sequence information for PM probes, and pmMatrix and mmMatrix, matrice
369. matrix G. Tip: For introductory information on graph theory functions, see Graph Theory Functions in the Bioinformatics Toolbox documentation. [dist] = graphallshortestpaths(G) finds the shortest paths between every pair of nodes in the graph represented by matrix G, using Johnson's algorithm. Input G is an N-by-N sparse matrix that represents a graph. Nonzero entries in matrix G represent the weights of the edges. Output dist is an N-by-N matrix where dist(S,T) is the distance of the shortest path from node S to node T. A 0 in this matrix indicates the source node; an Inf is an unreachable node. The pred output is the predecessor map of the winning paths. Johnson's algorithm has a time complexity of O(N*log(N)+N*E), where N and E are the number of nodes and edges, respectively. [...] = graphallshortestpaths(G, 'PropertyName', PropertyValue, ...) calls graphallshortestpaths with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotes and is case insensitive. These property name/property value pairs are as follows: [dist] = graphallshortestpaths(G, ...'Directed', DirectedValue, ...) indicates whether the graph is directed or undirected. Set DirectedValue to false for an undirected graph. This results in the upper triangle of the sparse matrix being ignore
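The shape of the dist output (0 on the diagonal for the source node, Inf for unreachable nodes) can be reproduced with a short all-pairs shortest-path sketch. Note that the Python code below uses the Floyd-Warshall algorithm, chosen for brevity, rather than the Johnson's algorithm named above; it runs in O(N^3) time and is not the toolbox implementation. The function name and the example graph are hypothetical, and a dense nested list stands in for the sparse matrix G.

```python
import math


def all_shortest_paths(weights):
    """All-pairs shortest path distances for a directed graph.

    weights[s][t] is the weight of edge s -> t, with 0 meaning no edge.
    Floyd-Warshall is used here as a simple stand-in for Johnson's
    algorithm; both yield the same distances.
    """
    n = len(weights)
    dist = [
        [
            0 if s == t else (weights[s][t] if weights[s][t] != 0 else math.inf)
            for t in range(n)
        ]
        for s in range(n)
    ]
    for k in range(n):          # allow node k as an intermediate stop
        for s in range(n):
            for t in range(n):
                via_k = dist[s][k] + dist[k][t]
                if via_k < dist[s][t]:
                    dist[s][t] = via_k
    return dist


# Edges: 0->1 (1), 1->2 (2), 0->2 (5); node 3 is isolated.
graph = [
    [0, 1, 5, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
dist = all_shortest_paths(graph)
print(dist[0])
```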
370. mber) searches for the accession number in the Gene Expression Omnibus database, and returns a MATLAB structure containing the following fields: Scope, Accession, Header, ColumnDescriptions, ColumnNames, Data. getgeodata(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. getgeodata(..., 'ToFile', ToFileValue) saves the data returned from the database to a file. Read the saved file back into MATLAB using the geosoftread function. Note: Currently, Bioinformatics Toolbox supports only Sample (GSM) records. For more information, see http://www.ncbi.nlm.nih.gov/About/disclaimer.html. Examples: geoStruct = getgeodata('GSM1768'). See Also: Bioinformatics Toolbox functions: geosoftread, getgenbank, getgenpept. gethmmalignment. Purpose: Retrieve multiple sequence alignment associated with hidden Markov model (HMM) profile from PFAM database. Syntax: AlignStruct = gethmmalignment(PFAMNumber); AlignStruct = gethmmalignment(PFAMAccessNumber); AlignStruct = gethmmalignment(..., 'ToFile', ToFileValue, ...); AlignStruct = gethmmalignment(..., 'Type', TypeValue, ...); AlignStruct = gethmmalignment(..., 'Mirror', MirrorValue, ...); AlignStruct = gethmmalignment(..., 'IgnoreGaps', IgnoreGapsValue, ...). Arguments: PFAMNumber: Integer specifying a protein family number of an HMM profile record in the PFAM
371. mber of elements in Times equals the number of elements in Peaks. [Peaks, Times] = mzxml2peaks(mzXMLStruct) extracts peak information from mzXMLStruct, an mzXML structure, and creates Peaks, a cell array of matrices containing mass/charge (m/z) values and ion intensity values, and Times, a vector of retention times associated with a liquid chromatography/mass spectrometry (LC/MS) or gas chromatography/mass spectrometry (GC/MS) data set. [Peaks, Times] = mzxml2peaks(mzXMLStruct, 'Levels', LevelsValue) specifies the level(s) of the spectra in mzXMLStruct to convert, assuming the spectra are from tandem MS data sets. Default is 1, which converts only the first-level spectra, that is, spectra containing precursor ions. Setting LevelsValue to 2 converts only the second-level spectra, which are the fragment spectra (created from a precursor ion). Examples: 1 Use the mzxmlread function to read an mzXML file into MATLAB as a structure. Then extract the peak information of only the first-level ions from the structure. mzxml_struct = mzxmlread('results.mzxml'); [peaks, time] = mzxml2peaks(mzxml_struct); Note: The file results.mzxml is not provided. Sample mzXML files can be found at http://sashimi.sourceforge.net/repository.html. 2 Create a dot plot of the LC/MS data. msdotplot(peaks, time). See Also: Bioinformatics Toolbox functions: msdotplot, mspalign, msppresample, mzxmlread. mzxmlread. Purpose Syntax Arguments
372. me returns the probe set ID for a gene name (Name) from a CHP or CDF structure (AFFYStruct). [Name, NDX, Description, Source, SourceURL] = probesetlookup(...) returns the name, index into the CHP or CDF struct, description, source, and source URL for the probe set. Note: This function requires that you have the GIN file associated with the chip type that you are using in your Affymetrix library directory. Examples: 1 Get the file Drosophila-121502.chp from http://www.affymetrix.com/support/technical/sample_data/demo_data.affx. 2 Read the data into MATLAB. chpStruct = affyread('Drosophila-121502.chp', 'D:\Affymetrix\LibFiles\DrosGenome1'). 3 Get the gene name. probesetlookup(chpStruct, 'AFFX-YEL018w/_at'). See Also: Bioinformatics Toolbox functions: affyread, celintensityread, probelibraryinfo, probesetlink, probesetplot, probesetvalues, rmabackadj. probesetplot. Purpose: Plot values for Affymetrix CHP file probe set. Syntax: probesetplot(CHPStruct, ID, ...'PropertyName', PropertyValue, ...); probesetplot(..., 'GeneName', GeneNameValue); probesetplot(..., 'Field', FieldValue); probesetplot(..., 'ShowStats', ShowStatsValue). Description: probesetplot(CHPStruct, ID, ...'PropertyName', PropertyValue, ...) plots the PM and MM intensity values for probe set ID. CHPStruct is a structure created from an Affymetrix CHP file. ID can be the index of the probe set or the probe set name. Note: the pro
373. merge sequences scores See Also: Bioinformatics Toolbox functions: hmmprofalign, hmmprofstruct. hmmprofstruct Purpose Syntax Arguments Description: Create profile Hidden Markov Model (HMM) structure. Model = hmmprofstruct(Length) Model = hmmprofstruct(Length, Field1, Field1Values1, ...) hmmprofstruct(Model, Field1, Field1Values1, ...) Length: Number of match states in the model. Model: Hidden Markov model created with the function hmmprofstruct. Field1: Field name in the structure Model. Enter a name from the table below. Model = hmmprofstruct(Length) returns a structure with the fields containing the required parameters of a profile HMM. Length specifies the number of match states in the model. All other mandatory model parameters are initialized to the default values. Model = hmmprofstruct(Length, Field1, Field1Values1, ...) creates a profile HMM using the specified fields and parameters. All other mandatory model parameters are initialized to default values. hmmprofstruct(Model, Field1, Field1Values1, ...) returns the updated profile HMM with the specified fields and parameters. All other mandatory model parameters are taken from the reference MODEL. HMM Profile Structure Format. Model parameters fields (mandatory). All probability values are in the [0 1] range. Field Name / Description. ModelLength: Length of the profile (number of MATCH states). Alphabet: 'AA' or 'NT'. Default is 'AA'.
374. mes, Intensities) msheatmap(..., 'Midpoint', MidpointValue, ...) msheatmap(..., 'Range', RangeValue, ...) msheatmap(..., 'Markers', MarkersValue, ...) msheatmap(..., 'SpecIdx', SpecIdxValue, ...) msheatmap(..., 'Group', GroupValue, ...) msheatmap(..., 'Resolution', ResolutionValue, ...) MZ: Column vector of common mass/charge (m/z) values for a set of spectra. The number of elements in the vector equals the number of rows in the matrix Intensities. Note: You can use the msppresample function to create the MZ vector. Times: Column vector of retention times associated with a liquid chromatography/mass spectrometry (LC/MS) or gas chromatography/mass spectrometry (GC/MS) data set. The number of elements in the vector equals the number of columns in the matrix Intensities. The retention times are used to label the y-axis of the heat map. Tip: You can use the mzxml2peaks function to create the Times vector. Intensities: Matrix of intensity values for a set of mass spectra that share the same m/z range. Each row corresponds to an m/z value, and each column corresponds to a spectrum or retention time. The number of rows equals the number of elements in vector MZ. The number of columns equals the number of elements in vector Times. Note: You can use the msppresample function to create the Intensities matrix. MidpointValue RangeValue Ma
375. method for smoothing the curve of estimated points and eliminating the effects of possible outliers. Enter 'none', 'lowess' (linear fit), 'loess' (quadratic fit), 'rlowess' (robust linear), or 'rloess' (robust quadratic fit). Default value is 'none'. msbackadj(..., 'QuantileValue', QuantileValueValue, ...) specifies the quantile value. The default value is 0.10. msbackadj(..., 'PreserveHeights', PreserveHeightsValue, ...), when PreserveHeightsValue is true, sets the baseline subtraction mode to preserve the height of the tallest peak in the signal. The default value is false, and peak heights are not preserved. msbackadj(..., 'ShowPlot', ShowPlotValue, ...) plots the baseline estimated points, the regressed baseline, and the original spectrum. When msbackadj is called without output arguments, the spectra are plotted unless ShowPlotValue is false. When ShowPlotValue is true, only the first spectrum in Y is plotted. ShowPlotValue can also contain an index to one of the spectra in Y. 1. Load sample data. load sample_lo_res 2. Adjust the baseline for a group of spectra and show only the third spectrum and its estimated background. YB = msbackadj(MZ_lo_res, Y_lo_res, 'SHOWPLOT', 3); [Figure: Spectrogram ID 3, showing the original spectrogram, the regressed baseline, and the estimated baseline points.]
376. mmetric and has a zero diagonal. pdist(..., 'Criteria', CriteriaValue) changes the criteria used to relate pairs. C can be 'distance' (default) or 'levels'. Examples: 1. Get a phylogenetic tree from a file. tr = phytreeread('pf00002.tree') 2. Calculate the tree distances between pairs of leaves. dist = pdist(tr, 'nodes', 'leaves', 'squareform', true) See Also: Bioinformatics Toolbox functions: phytree (object constructor), phytreeread, phytreetool, seqlinkage, seqpdist. plot (phytree) Purpose: Draw phylogenetic tree. Syntax: plot(Tree) plot(Tree, ActiveBranches) plot(..., 'Type', TypeValue) plot(..., 'Orientation', OrientationValue) plot(..., 'BranchLabels', BranchLabelsValue) plot(..., 'LeafLabels', LeafLabelsValue) plot(..., 'TerminalLabels', TerminalLabelsValue) Arguments: Tree: Phylogenetic tree object created with the phytree constructor function. ActiveBranches: Branches viewable in the figure window. TypeValue: Property to select a method for drawing a phylogenetic tree. Enter 'square', 'angular', or 'radial'. The default value is 'square'. OrientationValue: Property to orient a phylogram or cladogram tree. Enter 'top', 'bottom', 'left', or 'right'. The default value is 'left'. BranchLabelsValue: Property to control displaying branch labels. Enter either true or false. The default value is false. LeafLabelsValue: Property to control displaying leaf labels. Enter either true or false. The default value i
377. msmoset function. This structure specifies options used by the SMO method. For more information on creating this structure, see the svmsmoset function. SVMStruct = svmtrain(..., 'BoxConstraint', BoxConstraintValue, ...) specifies box constraints for the soft margin. BoxConstraintValue can be either of the following: • Strictly positive numeric scalar • Array of strictly positive values with the number of elements equal to the number of rows in the Training matrix. If BoxConstraintValue is a scalar, it is automatically rescaled by N/(2*N1) for the data points of group one and by N/(2*N2) for the data points of group two. N1 is the number of elements in group one, N2 is the number of elements in group two, and N = N1 + N2. This rescaling is done to take into account unbalanced groups, that is, cases where N1 and N2 have very different values. If BoxConstraintValue is an array, then each array element is taken as a box constraint for the data point with the same index. Default is a scalar value of 1. SVMStruct = svmtrain(..., 'Autoscale', AutoscaleValue, ...) controls the shifting and scaling of data points before training. When AutoscaleValue is true, the columns of the input data matrix Training are shifted to zero mean and scaled to unit variance. Default is false. SVMStruct = svmtrain(..., 'Showplot', ShowplotValue, ...) controls the display of a plot of the grouped data, including the separating
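The scalar rescaling and the autoscaling described in this entry can be sketched in base MATLAB. This is an illustration only, not the toolbox implementation: it reads the flattened factors "N 2 N1" and "N 2 N2" as N/(2*N1) and N/(2*N2), and the variables groups, C, and X are made-up values.

```matlab
% Illustrative sketch only; groups, C, and X are hypothetical values.
groups = [1 1 1 1 1 1 2 2];          % 6 points in group one, 2 in group two
C = 1;                               % scalar BoxConstraintValue (documented default)

N1 = sum(groups == 1);               % number of elements in group one
N2 = sum(groups == 2);               % number of elements in group two
N  = N1 + N2;

% Assumed reading of the rescaling: N/(2*N1) and N/(2*N2).
boxC = zeros(size(groups));
boxC(groups == 1) = C * N / (2 * N1);   % smaller constraint for the larger group
boxC(groups == 2) = C * N / (2 * N2);   % larger constraint for the smaller group

% Autoscale idea: shift each column to zero mean, scale to unit variance.
X  = [1 10; 2 20; 3 60];                  % hypothetical Training matrix
Xs = bsxfun(@minus, X, mean(X, 1));
Xs = bsxfun(@rdivide, Xs, std(X, 0, 1));
```

With six points against two, the points of the smaller group receive the larger box constraint, which is the stated purpose of the rescaling for unbalanced groups.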
378. n ShowHistValue is true, displays a histogram of the range data. Calculate the range of expression profiles for yeast data as gene expression changes during the metabolic shift from fermentation to respiration. load yeastdata range = exprprofrange(yeastvalues, 'ShowHist', true); Bioinformatics Toolbox functions: exprprofvar, generangefilter. exprprofvar Purpose Syntax Arguments Description Examples See Also: Calculate variance of gene expression profiles. Variance = exprprofvar(Data) exprprofvar(..., 'PropertyName', PropertyValue, ...) exprprofvar(..., 'ShowHist', ShowHistValue) Data: Matrix where each row corresponds to a gene. ShowHistValue: Property to control the display of a histogram with variance data. Enter either true or false (default). Variance = exprprofvar(Data) calculates the variance of each expression profile in a data set (Data). If you do not specify output arguments, this function displays a histogram bar plot of the variance. exprprofvar(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. exprprofvar(..., 'ShowHist', ShowHistValue), when ShowHist is true, displays a histogram of the variance data. Calculate the variance of expression profiles for yeast data as gene expression changes during the metabolic shift from fermentation to respiration. load yeastdata datavar = exprprofvar(yeastvalues, 'ShowHist', true); Bioin
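What "range" and "variance" of each expression profile mean for a row-per-gene matrix can be sketched in base MATLAB. This is a hedged illustration, not the code of exprprofrange or exprprofvar: the matrix Data is made up, and the use of the default N-1 normalization in var is an assumption.

```matlab
% Illustrative sketch only; Data is a hypothetical expression matrix.
Data = [1 2 3 4;      % gene 1
        5 5 5 5;      % gene 2 (flat profile)
        0 2 0 2];     % gene 3

% Range of each profile: maximum minus minimum along each row.
profileRange = max(Data, [], 2) - min(Data, [], 2);

% Variance of each profile along each row.
% Assumption: sample variance with N-1 normalization (second argument 0).
profileVar = var(Data, 0, 2);

disp([profileRange profileVar])
```

A flat profile such as gene 2 has zero range and zero variance, which is why both measures are commonly used to filter out genes that do not change across conditions.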
379. n of the peak height that selects the points used to compute the centroid mass of the respective peak. PeakLocationValue must be a value >= 0 and <= 1. Default is 1.0. Note: When PeakLocationValue = 1.0, the peak location is exactly at the maximum of the peak, while when PeakLocationValue = 0, the peak location is computed with all the points from the closest minimum to the left of the peak to the closest minimum to the right of the peak. Peaks = mspeaks(MZ, Intensities, ...'FWHH_Filter', FWHH_FilterValue, ...) specifies the minimum full width at half height (FWHH), in m/z units, for reported peaks. Peaks with FWHH below this value are not included in the output list Peaks. FWHH_FilterValue must be a positive real value. Default is 0. Peaks = mspeaks(MZ, Intensities, ...'OverSegmentation_Filter', OverSegmentation_FilterValue, ...) specifies the minimum distance, in m/z units, between neighboring peaks. When the signal is not smoothed appropriately, multiple maxima can appear to represent the same peak. By increasing this filter value, oversegmented peaks are joined into a single peak. OverSegmentation_FilterValue must be a positive real value. Default is 0. Peaks = mspeaks(MZ, Intensities, ...'Height_Filter', Height_FilterValue, ...) specifies the minimum height for reported peaks. Peaks with heights below this value are not included in the output list Peaks. Height_FilterValue must be
380. n the output. Enter either true (include branch names) or false (exclude branch names). Default is false. String = getnewickstr(Tree) returns the Newick-formatted string of a phylogenetic tree object (Tree). getnewickstr(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. getnewickstr(..., 'Distances', DistancesValue), when DistancesValue is false, excludes the distances from the output. getnewickstr(..., 'BranchNames', BranchNamesValue), when BranchNamesValue is true, includes the branch names in the output. Information about the Newick tree format: http://evolution.genetics.washington.edu/phylip/newicktree.html Examples: 1. Create some random sequences. seqs = int2nt(ceil(rand(10)*4)); 2. Calculate pairwise distances. dist = seqpdist(seqs, 'alpha', 'nt'); 3. Construct a phylogenetic tree. tree = seqlinkage(dist); 4. Get the Newick string. str = getnewickstr(tree) See Also: Bioinformatics Toolbox • functions: phytree (object constructor), phytreeread, phytreetool, phytreewrite, seqlinkage • phytree object methods: get, getbyname, getcanonical. getnodesbyid (biograph) Purpose Syntax Arguments Description Example See Also: Get handles to nodes. NodesHandles = getnodesbyid(BGobj, NodeIDs) BGobj: Biograph object. NodeIDs: Enter a cell string of node identifications. NodesHandles = getnodesbyid(BGobj
381. name or a URL pointing to a file. The referenced file is a GenBank-formatted file (ASCII text file). If you specify only a file name, that file must be on the MATLAB search path or in the MATLAB Current Directory. • MATLAB character array that contains the text of a GenBank-formatted file. GenBankData: MATLAB structure with fields corresponding to GenBank keywords. GenBankData = genbankread(File) reads in a GenBank-formatted file (File) and creates a structure (GenBankData) containing fields corresponding to the GenBank keywords. Each separate sequence listed in the output structure GenBankData is stored as a separate element of the structure. 1. Get sequence information for a gene (HEXA), store data in a file, and then read back into MATLAB. getgenbank('nm_000520', 'ToFile', 'TaySachs_Gene.txt') s = genbankread('TaySachs_Gene.txt') s = LocusName: 'NM_000520'; LocusSequenceLength: '2255'; LocusNumberofStrands: (empty); LocusTopology: 'linear'; LocusMoleculeType: 'mRNA'; LocusGenBankDivision: 'PRI'; LocusModificationDate: '13-AUG-2006'; Definition: [1x63 char]; Accession: 'NM_000520'; Version: 'NM_000520.2'; GI: '13128865'; Project: (empty); Keywords: (empty); Segment: (empty); Source: 'Homo sapiens (human)'; SourceOrganism: [4x65 char]; Reference: {1x58 cell}; Comment: [15x67 char]; Features: [74x74 char]; CDS: [1x1 struct]; Sequence: [1x2255 char] 2. Display the source organism for
382. ncorrect reference peaks, as msalign both scales and shifts the MZ vector. If using a single reference peak, you might need to only shift the MZ vector. To do this, use IntensitiesOut = interp1(MZ, Intensities, MZ-(ReferenceMass-ExperimentalMass)). For more information, see Aligning Mass Spectrum with One Reference Peak on page 2-421. msalign(..., 'PropertyName', PropertyValue, ...) calls msalign with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows: msalign(..., 'Weights', WeightsValue, ...) specifies the relative weight for each mass in RefMZ, the vector of reference m/z values. WeightsValue is a vector of positive values, with the same number of elements as RefMZ. The default vector is ones(size(RefMZ)), which means each reference peak is weighted equally, so that more intense reference peaks have a greater effect in the alignment algorithm. If you have a less intense reference peak, you can increase its weight to emphasize it more in the alignment algorithm. msalign(..., 'Range', RangeValue, ...) specifies the lower and upper limits of the range, in m/z units, relative to each peak. No peak will shift beyond these limits. RangeValue is a two-element vector, in which the first element is negative and the second element i
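The shift-only alignment with a single reference peak can be tried on synthetic data in base MATLAB. This sketch assumes the interp1 expression in this entry reads as MZ-(ReferenceMass-ExperimentalMass), with the minus signs restored from the flattened text; the spectrum and the two masses are made up.

```matlab
% Illustrative sketch only; the spectrum below is synthetic.
MZ = (100:0.5:110)';                          % hypothetical m/z vector
Intensities = exp(-((MZ - 104).^2) / 0.5);    % single peak observed at m/z 104

ExperimentalMass = 104;    % where the reference peak was observed
ReferenceMass    = 105;    % where the reference peak should be

% Shift only (no scaling): resample the signal at MZ minus the offset.
% Assumed reading of the expression given in the text above.
IntensitiesOut = interp1(MZ, Intensities, MZ - (ReferenceMass - ExperimentalMass));

% Points that fall outside the original m/z range come back as NaN.
[~, idx] = max(IntensitiesOut);   % max ignores NaN values
disp(MZ(idx))                     % location of the peak after the shift
```

Because interp1 returns NaN for query points outside the range of MZ, the first samples of the shifted signal are undefined; this is a property of the sketch, not a statement about how msalign treats the edges.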
383. ndle, or a cell array with multiple function handles. After using the view function to display the biograph object in the Biograph Viewer, you can double-click a node to activate the first callback, or right-click and select a callback to activate. Default is the anonymous function @(node) inspect(node), which displays the Property Inspector dialog box. User-defined callback for all edges: Enter the name of a function, a function handle, or a cell array with multiple function handles. After using the view function to display the biograph object in the Biograph Viewer, you can double-click an edge to activate the first callback, or right-click and select a callback to activate. Default is the anonymous function @(edge) inspect(edge), which displays the Property Inspector dialog box. Function handle to customized function to draw nodes: Default is []. Property / Description. Nodes: Read-only column vector with handles to node objects of a biograph object. The size of the vector is the number of nodes. For properties of node objects, see Properties of a Node Object on page 5-7. Edges: Read-only column vector with handles to edge objects of a biograph object. The size of the vector is the number of edges. For properties of edge objects, see Properties of an Edge Object on page 5-9. Properties of a Node Object. Property / Description. ID: Read-only string defined when the biograph object is
384. ndran returns a handle to the plot. Examples: Generate the Ramachandran plot for the human serum albumin complexed with octadecanoic acid. ramachandran('1E7I') [Figure: Ramachandran plot titled "HUMAN SERUM ALBUMIN COMPLEXED WITH OCTADECANOIC ACID"; x-axis Phi (Degrees), y-axis Psi (Degrees), both from -180 to 180.] See Also: Bioinformatics Toolbox functions: getpdb, molviewer, pdbdistplot, pdbread. randfeatures Purpose Syntax Description: Generate randomized subset of features. [IDX, Z] = randfeatures(X, Group, 'PropertyName', PropertyValue, ...) randfeatures(..., 'Classifier', C) randfeatures(..., 'ClassOptions', CO) randfeatures(..., 'PerformanceThreshold', PT) randfeatures(..., 'ConfidenceThreshold', CT) randfeatures(..., 'SubsetSize', SS) randfeatures(..., 'PoolSize', PS) randfeatures(..., 'NumberOfIndices', N) randfeatures(..., 'CrossNorm', CN) randfeatures(..., 'Verbose', VerboseValue) [IDX, Z] = randfeatures(X, Group, 'PropertyName', PropertyValue, ...) performs a randomized subset feature search reinforced by classification. randfeatures randomly generates subsets of features used to classify the samples. Every subset is evaluated with the apparent error. Only the best subsets are kept, and they are joined into a single final pool. The cardinalit
385. ndran plot for Protein Data Bank (PDB) data; Find restriction enzymes that cut protein sequence; Retrieve multiple sequence alignment associated with hidden Markov model (HMM) profile from PFAM database; Retrieve hidden Markov model (HMM) profile from PFAM database; Phylogenetic tree data from PFAM database; Align query sequence to profile using hidden Markov model alignment; Estimate profile Hidden Markov Model (HMM) parameters using pseudocounts; Generate random sequence drawn from profile Hidden Markov Model (HMM); Concatenate prealigned strings of several sequences to profile Hidden Markov Model (HMM); Create profile Hidden Markov Model (HMM) structure; Read data from PFAM-HMM file; Plot Hidden Markov Model (HMM) profile. Microarray File Formats: affyprobeseqread, affyread, agferead, celintensityread, galread, geosoftread, getgeodata, gprread, imageneread, sptread. Microarray Utility: magetfield, probelibraryinfo, probesetlink. Read data file containing probe sequence information for Affymetrix GeneChip array; Read microarray data from Affymetrix GeneChip file (Windows 32); Read Agilent Feature Extraction Software file; Read probe intensities from Affymetrix CEL files (Windows 32); Read microarray data from GenePix array list file; Read Gene Expression Omnibus (GEO) SOFT format data; Retrieve Gene Expression Omnibus (GEO) Sample (GSM) data; Read microarray data from G
386. neont (object constructor), goannotread, num2goid • geneont object methods: getancestors, getmatrix, getrelatives. getedgesbynodeid (biograph) Purpose: Get handles to edges in biograph object. Syntax: Edges = getedgesbynodeid(BGobj, SourceIDs, SinkIDs) Arguments: BGobj: Biograph object. SourceIDs, SinkIDs: Enter a cell string, or an empty cell array (gets all edges). Description: Edges = getedgesbynodeid(BGobj, SourceIDs, SinkIDs) gets the handles to the edges that connect the specified source nodes (SourceIDs) to the specified sink nodes (SinkIDs) in a biograph object. Example: 1. Create a biograph object for the Hominidae family. species = {'Homo','Pan','Gorilla','Pongo','Baboon','Macaca','Gibbon'}; cm = magic(7)>25 & 1-eye(7); bg = biograph(cm, species); 2. Find all the edges that connect to the Homo node. EdgesIn = getedgesbynodeid(bg, [], 'Homo'); EdgesOut = getedgesbynodeid(bg, 'Homo', []); set(EdgesIn, 'LineColor', [0 1 0]); set(EdgesOut, 'LineColor', [1 0 0]); bg.view; 3. Find all edges that connect members of the Cercopithecidae family to members of the Hominidae family. Cercopithecidae = {'Macaca','Baboon'}; Hominidae = {'Homo','Pan','Gorilla','Pongo'}; edgesSel = getedgesbynodeid(bg, Cercopithecidae, Hominidae); set(bg.edges, 'LineColor', [.5 .5 .5]); set(edgesSel, 'LineColor', [0 0 1]); bg.view; See Also: Bioinformatics Toolbox funct
387. ng: Unknown symbols 'symbol list' appear in the sequence. These will be ignored. [Dimers, Percent] = dimercount(SeqNT) returns a 4-by-4 matrix with the relative proportions of the dimers in SeqNT. The rows correspond to A, C, G, and T in the first element of the dimer, and the columns correspond to A, C, G, and T in the second element. dimercount(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. dimercount(..., 'Chart', ChartStyle) creates a chart showing the relative proportions of the dimers. Count the number of dimers in a nucleotide sequence. dimercount('TAGCTGGCCAAGCGAGCTTG') ans = AA: 1 AC: 0 AG: 3 AT: 0 CA: 1 CC: 1 CG: 1 CT: 2 GA: 1 GC: 4 GG: 1 GT: 0 TA: 1 TC: 0 TG: 2 TT: 1 See Also: Bioinformatics Toolbox functions: aacount, basecount, baselookup, codoncount, nmercount, ntdensity. dna2rna Purpose Syntax Arguments Description Examples See Also: Convert DNA sequence to RNA sequence. SeqRNA = dna2rna(SeqDNA) SeqDNA: DNA sequence. Enter either a character string with the characters A, T, G, C, and ambiguous characters R, Y, K, M, S, W, B, D, H, V, N, or a vector of integers from the table Mapping Nucleotide Letters to Integers on page 2-518. You can also enter a structure with the field Sequence. SeqRNA: RNA sequence. SeqRNA = dna2rna(SeqDNA) converts a DNA sequence to an RNA sequence by converting any thymine nucleotides (T) in the DNA sequence to uracil (U). The RNA sequenc
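The dimer counting and the T-to-U conversion described in this entry can be reproduced with base MATLAB alone. This is a minimal sketch of the idea, not the code behind dimercount or dna2rna; it counts overlapping dimers and skips any pair containing a symbol other than A, C, G, or T, which is an assumption about how unknown symbols are handled.

```matlab
% Illustrative sketch only; not the toolbox implementation.
seq   = 'TAGCTGGCCAAGCGAGCTTG';             % sequence from the example above
bases = 'ACGT';                             % row/column order of the 4-by-4 matrix

[~, idx] = ismember(upper(seq), bases);     % A,C,G,T -> 1..4, anything else -> 0

counts = zeros(4, 4);                       % rows: first base, columns: second base
for k = 1:numel(idx) - 1
    if idx(k) > 0 && idx(k + 1) > 0         % skip pairs with unknown symbols
        counts(idx(k), idx(k + 1)) = counts(idx(k), idx(k + 1)) + 1;
    end
end
percent = counts / sum(counts(:));          % relative proportions of the dimers

% DNA to RNA: replace every thymine (T) with uracil (U).
rna = strrep(upper(seq), 'T', 'U');
```

For this 20-letter sequence there are 19 overlapping dimers, and GC (row 3, column 2) is the most frequent with 4 occurrences.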
388. ng a space. Enter an integer. The default column width is 10. Property to control displaying numbers at the start of each row. Enter either true (default) to show numbers or false to hide numbers. seqdisp(Seq) displays a sequence (Seq) in rows, with a default row length of 60 and a default column width of 10. seqdisp(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. seqdisp(..., 'Row', RowValue) specifies the length of each row for the displayed sequence. seqdisp(..., 'Column', ColumnValue) specifies the number of letters to display before adding a space. Row must be larger than and evenly divisible by Column. seqdisp(..., 'ShowNumbers', ShowNumbersValue), when ShowNumbers is false, turns off the position numbers at the start of each row. Examples: Read sequence information from the GenBank database. Display the sequence in rows with 50 letters, and within a row, separate every 10 letters with a space. mouseHEXA = getgenbank('AK080777'); seqdisp(mouseHEXA, 'Row', 50, 'Column', 10) Create and save a FASTA file with two sequences, and then display it. hdr = ['Sequence A'; 'Sequence B']; seq = ['TAGCTGRCCAAGGCCAAGCGAGCTTN'; 'ATCGACYGGTTCCGGTTCGCTCGAAN']; fastawrite('local.fa', hdr, seq); seqdisp('local.fa', 'ShowNumbers', false) ans = >Sequence A 1 TAGCTGRCCA AGGCCAAGCG AGCTTN >Sequence B 1 ATCGACYGGT TCCGGTTCGC TCGA
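How the Row and Column settings interact can be illustrated with a small base MATLAB formatter. This is only a sketch of the layout rule described in this entry (rows of a fixed length, a space after every block of letters, a position number at the start of each row); the variable names and the exact spacing are assumptions, not the output format of seqdisp.

```matlab
% Illustrative sketch only; names and spacing are hypothetical.
seq    = repmat('ACGT', 1, 30);    % made-up 120-letter sequence
rowLen = 50;                       % plays the role of RowValue
colLen = 10;                       % plays the role of ColumnValue

lines = {};
for start = 1:rowLen:numel(seq)
    stop = min(start + rowLen - 1, numel(seq));
    row  = seq(start:stop);
    % Split the row into blocks of at most colLen letters.
    blocks = regexp(row, sprintf('.{1,%d}', colLen), 'match');
    % Prefix the row with the position of its first letter.
    lines{end + 1} = sprintf('%4d  %s', start, strjoin(blocks, ' ')); %#ok<SAGROW>
end
fprintf('%s\n', lines{:});
```

Requiring Row to be evenly divisible by Column, as the entry states, keeps the blocks aligned from one row to the next; the sketch does not enforce that rule.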
389. ng matrix; nonmatches with a zero or positive scoring matrix value. [Score, Alignment, Start] = nwalign(Seq1, Seq2) returns a 2-by-1 vector of indices indicating the starting point in each sequence for the alignment. Because this is a global alignment, Start is always [1;1]. ... = nwalign(Seq1, Seq2, ...'PropertyName', PropertyValue, ...) calls nwalign with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows: ... = nwalign(Seq1, Seq2, ...'Alphabet', AlphabetValue) specifies the type of sequences. Choices are 'AA' (default) or 'NT'. ... = nwalign(Seq1, Seq2, ...'ScoringMatrix', ScoringMatrixValue, ...) specifies the scoring matrix to use for the global alignment. Default is: • 'BLOSUM50' when AlphabetValue equals 'AA' • 'NUC44' when AlphabetValue equals 'NT'. ... = nwalign(Seq1, Seq2, ...'Scale', ScaleValue, ...) specifies the scale factor used to return Score in arbitrary units other than bits. Choices are any positive value. ... = nwalign(Seq1, Seq2, ...'GapOpen', GapOpenValue, ...) specifies the penalty for opening a gap in the alignment. Choices are any positive integer. Default is 8. ... = nwalign(Seq1, Seq2, ...'ExtendGap', ExtendGapValue, ...) specifies the penalty for extending a gap in the alignment. Choices are any positive inte
390. ng of a binary tree maximizes the similarity between adjacent elements (clusters or leaves) by flipping tree branches, but without dividing the clusters. The input Dist is a distance matrix, such as that created by the pdist function. Order = optimalleaforder(Tree, Dist, ...'PropertyName', PropertyValue, ...) calls optimalleaforder with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows: Order = optimalleaforder(Tree, Dist, ...'Criteria', CriteriaValue, ...) specifies the optimization criteria. Order = optimalleaforder(Tree, Dist, ...'Transformation', TransformationValue, ...) specifies the algorithm to transform the distances in Dist into similarity values. The transformation is necessary because optimalleaforder maximizes the similarity between adjacent elements, which is comparable to minimizing the sum of distances between adjacent elements. Examples: 1. Use the rand function to create a 10-by-2 matrix of random values. X = rand(10,2); 2. Use the pdist function to create a distance matrix containing the city block distances between the pairs of objects in matrix X. Dist = pdist(X, 'cityblock'); 3. Use the linkage function to create a matrix, Tree, that represents a hierarchical binary
391. nked invariant set of data points to fit a straight line through, while the remaining data points are fitted to a running median curve. The final running median curve is a piecewise linear curve. Default is 1.5. affyinvarsetnorm(..., 'Method', MethodValue, ...) selects the smoothing method for normalizing the data. When MethodValue is 'lowess', affyinvarsetnorm uses the lowess method. When MethodValue is 'runmedian', affyinvarsetnorm uses the running median method. Default is 'lowess'. affyinvarsetnorm(..., 'Showplot', ShowplotValue, ...) plots two pairs of scatter plots (before and after normalization). The first pair plots baseline data versus data from a specified column (chip) from the matrix Data. The second is a pair of M-A scatter plots, which plots M (ratio between baseline and sample) versus A (the average of the baseline and sample). When ShowplotValue is 'all', affyinvarsetnorm plots a pair of scatter plots for each column or chip. When ShowplotValue is a number(s) or range of numbers, affyinvarsetnorm plots a pair of scatter plots for the indicated column numbers (chips). For example: • 'Showplot', 3 plots the data from column 3 of Data. • 'Showplot', [3,5,7] plots the data from columns 3, 5, and 7 of Data. • 'Showplot', [3:9] plots the data from columns 3 to 9 of Data.
392. no Acid / Amino Acid. Histidine (H): CAT CAC. Glutamic acid or Glutamine (Z): random codon from E and Q. Isoleucine (I): ATT ATC ATA. Unknown or any amino acid (X): random codon. Leucine (L): TTA TTG CTT CTC CTA CTG. Translation stop: TAA TAG TGA. Lysine (K): AAA AAG. Gap of indeterminate length. Methionine (M): ATG. Any character or any symbol not in table. Examples: 1. Convert an amino acid sequence to a nucleotide sequence using the standard genetic code. aa2nt('MATLAB') Warning: The sequence contains ambiguous characters. ans = ATGGCAACCCTGGCGAAT 2. Use the Vertebrate Mitochondrial genetic code. aa2nt('MATLAP', 'GeneticCode', 2) ans = ATGGCAACTCTAGCGCCT 3. Use the genetic code for the Echinoderm Mitochondrial RNA alphabet. aa2nt('MATLAB', 'GeneticCode', 'ec', 'Alphabet', 'RNA') Warning: The sequence contains ambiguous characters. ans = AUGGCUACAUUGGCUGAU 4. Convert a sequence with the ambiguous amino acid character B. aa2nt('abcd') Warning: The sequence contains ambiguous characters. ans = GCCACATGCGAC See Also: Bioinformatics Toolbox functions: geneticcode, nt2aa, revgeneticcode, seqtool. MATLAB function: rand. aacount Purpose Syntax Arguments Description: Count amino acids in sequence. Amino = aacount(SeqAA) aacount(..., 'PropertyName', PropertyValue, ...) aacount(..., 'Chart', ChartValue) aacount
393. nonymous substitution rates between the two homologous sequences, SeqNT1 and SeqNT2, using the Yang-Nielsen method (2000). This maximum likelihood method estimates an explicit model for codon substitution that accounts for transition/transversion rate bias and base/codon frequency bias. Then it uses the model to correct synonymous and nonsynonymous counts to account for multiple substitutions at the same site. The maximum likelihood method is best suited when the sample size is significant (larger than 100 bases) and when the sequences being compared can have transition/transversion rate biases and base/codon frequency biases. dndsml returns: • Dn: Nonsynonymous substitution rate(s) • Ds: Synonymous substitution rate(s) • Like: Likelihood of this estimate. This analysis: • Assumes that the nucleotide sequences, SeqNT1 and SeqNT2, are codon-aligned, that is, do not have frame shifts. Tip: If your sequences are not codon-aligned, use the nt2aa function to convert them to amino acid sequences, use the nwalign function to globally align them, then use the seqinsertgaps function to recover the corresponding codon-aligned nucleotide sequences. See Estimating Synonymous and Nonsynonymous Substitution Rates Between Two Nucleotide Sequences That Are Not Codon-Aligned on page 2-128. • Excludes any ambiguous nucleotide characters or codons that include gaps. • Considers the number of codons in the shorter of the two nuc
394. norm, quantilenorm, rmabackadj, rmasummary. affyprobeaffinities Purpose Syntax: Compute Affymetrix probe affinities from their sequences and MM probe intensities. [AffinPM, AffinMM] = affyprobeaffinities(SequenceMatrix, MMIntensity) [AffinPM, AffinMM, BaseProf] = affyprobeaffinities(SequenceMatrix, MMIntensity) [AffinPM, AffinMM, BaseProf, Stats] = affyprobeaffinities(SequenceMatrix, MMIntensity) ... = affyprobeaffinities(SequenceMatrix, MMIntensity, ...'ProbeIndices', ProbeIndicesValue, ...) ... = affyprobeaffinities(SequenceMatrix, MMIntensity, ...'Showplot', ShowplotValue, ...) Arguments. SequenceMatrix: An N-by-25 matrix of sequence information for the perfect match (PM) probes on an Affymetrix GeneChip array, where N is the number of probes on the array. Each row corresponds to a probe, and each column corresponds to one of the 25 sequence positions. Nucleotides in the sequences are represented by one of the following integers: • 0: None • 1: A • 2: C • 3: G • 4: T. Tip: You can use the affyprobeseqread function to generate this matrix. If you have this sequence information in letter representation, you can convert it to integer representation using the nt2int function. MMIntensity: Column vector containing mismatch (MM) probe intensities from a CEL file, generated from a single Affymetrix GeneChip array. Each row corresponds to a probe. Tip: You can
395. notations and parameters of the model. For more information about the model structure format, see hmmprofstruct. File can also be a URL or a MATLAB cell array that contains the text of a PFAM-formatted file. pfamhmmread is based on the HMMER 2.0 file formats. pfamhmmread('pf00002.ls') site = 'http://www.sanger.ac.uk/'; pfamhmmread([site 'cgi-bin/Pfam/download_hmm.pl?mode=ls&id=7tm_2']) Bioinformatics Toolbox functions: gethmmalignment, gethmmprof, hmmprofalign, hmmprofstruct, showhmmprof. phytree Purpose Syntax Arguments Description: Create phytree object. Tree = phytree(B) Tree = phytree(B, D) Tree = phytree(B, C) Tree = phytree(BC) Tree = phytree(..., N) Tree = phytree B: Numeric array of size [NUMBRANCHES X 2] in which every row represents a branch of the tree. It contains two pointers to the branch or leaf nodes, which are its children. D: Column vector with distances for every branch. C: Column vector with distances from every node to their parent branch. BC: Combined matrix with pointers to branches or leaves, and distances of branches. N: Cell array with the names of leaves and branches. Tree = phytree(B) creates an ultrametric phylogenetic tree object. In an ultrametric phylogenetic tree object, all leaves are the same distance from the root. B is a numeric array of size [NUMBRANCHES X 2] in which every row represents a branch of the tree and it contains two pointers to the branch or leaf nodes, which
396. ns: grp2idx, classify. cleave Purpose Syntax Arguments Description: Cleave amino acid sequence with enzyme. Fragments = cleave(SeqAA, PeptidePattern, Position) [Fragments, CuttingSites] = cleave(...) [Fragments, CuttingSites, Lengths] = cleave(...) cleave(..., 'PropertyName', PropertyValue, ...) cleave(..., 'PartialDigest', PartialDigestValue) SeqAA: Amino acid sequence. Enter a character string or a vector of integers from the table. Examples: 'ARN' or [1 2 3]. You can also enter a structure with the field Sequence. PeptidePattern: Short amino acid sequence to search in a larger sequence. Enter a character string, vector of integers, or a regular expression. Position: Position on the PeptidePattern where the sequence is cleaved. Enter a position within the PeptidePattern. Position 0 corresponds to the N-terminal end of the PeptidePattern. PartialDigestValue: Property to specify the probability that a cleavage site will be cleaved. Enter a value from 0 to 1 (default). Fragments = cleave(SeqAA, PeptidePattern, Position) cuts an amino acid sequence (SeqAA) into parts at the cleavage site specified by a peptide pattern and position. [Fragments, CuttingSites] = cleave(...) returns a numeric vector with the indices representing the cleave sites. A 0 (zero) is added to the list, so numel(Fragments)==numel(CuttingSites). You can use CuttingSites + 1 to point to the first amino acid of every fr
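The relationship between the fragments and the cutting sites (a leading 0, equal counts, and the site index plus one pointing at the first residue of each fragment) can be checked with a base MATLAB sketch. This is not the cleave implementation: the sequence is made up, the trypsin-like pattern is used only as an example, and ignoring cuts that fall at either end of the sequence is an assumption of the sketch.

```matlab
% Illustrative sketch only; sequence and pattern are hypothetical.
seq      = 'MAKGTRPLLKAA';     % made-up amino acid sequence
pattern  = '[KR](?!P)';        % K or R not followed by P (trypsin-like rule)
position = 1;                  % cut after the first residue of the pattern

starts = regexp(seq, pattern, 'start');          % where the pattern matches
sites  = starts + position - 1;                  % last residue before each cut
sites  = sites(sites > 0 & sites < numel(seq));  % assumption: drop cuts at the ends

cuttingSites = [0 sites];                        % leading 0, as described above
bounds       = [cuttingSites numel(seq)];

fragments = cell(1, numel(cuttingSites));
for k = 1:numel(cuttingSites)
    fragments{k} = seq(bounds(k) + 1 : bounds(k + 1));
end

firstResidues = seq(cuttingSites + 1);           % first amino acid of each fragment
```

Here the R at position 6 is followed by P and is skipped, so the cuts fall after positions 3 and 10 and the sequence splits into three fragments.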
397. nsertgaps(vt04_cds, al(3,:)) 5. Once you have codon-aligned the two sequences, you can use them as input to other functions such as dnds, which calculates the synonymous and nonsynonymous substitution rates of the codon-aligned nucleotide sequences. By setting Verbose to true, you can also display the codons considered in the computations and their amino acid translations. [dn, ds] = dnds(hk01_aligned, vt04_aligned, 'verbose', true) See Also: Bioinformatics Toolbox functions: dnds, dndsml, int2aa, int2nt. seqlinkage Purpose Syntax Arguments Description: Construct phylogenetic tree from pairwise distances. Tree = seqlinkage(Dist) Tree = seqlinkage(Dist, Method) Tree = seqlinkage(Dist, Method, Names) Dist: Matrix or vector of pairwise distances, such as returned by the seqpdist function. Method: String that specifies a distance method. Choices are: • 'single' • 'complete' • 'average' (default) • 'weighted' • 'centroid' • 'median'. Names: Property to use alternative labels for leaf nodes. Enter a vector of structures, with the fields Header or Name, or a cell array of strings. In both cases, the number of elements you provide must comply with the number of samples used to generate the pairwise distances in Dist. Tree = seqlinkage(Dist) returns a phylogenetic tree object from the pairwise distances (Dist) between the species or products. Dist is a matrix or vector
398. nstead of a least-squares regression. The default value is 1. Note: Curve Fitting Toolbox also refers to Lowess smoothing of order 2 as Loess smoothing. mslowess(..., 'Span', SpanValue) specifies the window size for the smoothing kernel. If SpanValue is greater than 1, the window is equal to SpanValue number of samples, independent of the mass/charge vector MZ. The default value is 10 samples. Higher values smooth the signal more at the expense of computation time. If SpanValue is less than 1, the window size is taken to be a fraction of the number of points in the data. For example, when SpanValue is 0.005, the window size is equal to 0.50% of the number of points in MZ. mslowess(..., 'Kernel', KernelValue) selects the function (KernelValue) for weighting the observed ion intensities. Samples close to the MZ location being smoothed have the most weight in determining the estimate. Enter 'tricubic' (default): (1 - (dist/dmax).^3).^3; 'gaussian': exp(-(2*dist/dmax).^2); 'linear': 1 - dist/dmax. mslowess(..., 'RobustIterations', RobustIterationsValue) specifies the number of iterations (RobustIterationsValue) for a robust fit. If RobustIterationsValue is 0 (default), no robust fit is performed. For robust smoothing, small residual values at every span are outweighed to improve the new estimate. One or two robust iterations are usually adequate, while larger values might be computationally expensive. Example
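The Span rule and the three kernel formulas in this excerpt can be written out directly. The Python sketch below only illustrates that arithmetic and is not the mslowess implementation: window_size and kernel_weight are hypothetical names, and the rounding of a fractional span, the floor of one sample, and the treatment of a span of exactly 1 as a sample count are assumptions.

```python
import math

def window_size(span, n_points):
    """Number of samples in the smoothing window for a given span."""
    if span < 1:
        # Fraction of the data; rounding and the minimum of 1 are assumptions.
        return max(1, round(span * n_points))
    # Span of 1 or more is read as a sample count (1 itself is an assumption).
    return int(span)

def kernel_weight(name, dist, dmax):
    """Weight of a sample at distance dist, where dmax is the window radius."""
    r = dist / dmax
    if name == "tricubic":
        return (1 - r ** 3) ** 3
    if name == "gaussian":
        return math.exp(-((2 * r) ** 2))
    if name == "linear":
        return 1 - r
    raise ValueError(f"unknown kernel: {name}")

print(window_size(0.005, 2000))          # 0.50% of 2000 points
print(kernel_weight("tricubic", 1.0, 4.0))
```

All three kernels give weight 1 at the point being smoothed; the tricubic and linear kernels fall to 0 at the edge of the window, while the gaussian kernel only approaches 0.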
399. nsynonymous substitution rates using the specified genetic code. Enter a Code Number or a string with a Code Name from the table. If you use a Code Name, you can truncate it to the first two characters. Default is 1 or Standard. [Dn, Ds, Vardn, Vards] = dnds(SeqNT1, SeqNT2, ...'Method', MethodValue, ...) allows you to calculate synonymous and nonsynonymous substitution rates using the following algorithms: 'NG' (default): Nei-Gojobori method (1986) uses the number of synonymous and nonsynonymous substitutions and the number of potentially synonymous and nonsynonymous sites. Based on the Jukes-Cantor model. 'LWL': Li-Wu-Luo method (1985) uses the number of transitional and transversional substitutions at three different levels of degeneracy of the genetic code. Based on Kimura's two-parameter model. 'PBL': Pamilo-Bianchi-Li method (1993) is similar to the Li-Wu-Luo method, but with bias correction. Use this method when the number of transitions is much larger than the number of transversions. [Dn, Ds, Vardn, Vards] = dnds(SeqNT1, SeqNT2, ...'Window', WindowValue, ...) performs the calculations over a sliding window, specified in codons. Each output is an array containing a rate or variance for each window. [Dn, Ds, Vardn, Vards] = dnds(SeqNT1, SeqNT2, ...'Verbose', VerboseValue, ...) controls the display of the codons considered in the computations and their amino acid translations. Choi
400. ntax, Arguments. int2nt. Purpose: Convert nucleotide sequence from integer to letter representation. Syntax: int2nt(SeqNT); int2nt(..., 'PropertyName', PropertyValue, ...); int2nt(..., 'Alphabet', AlphabetValue); int2nt(..., 'Unknown', UnknownValue); int2nt(..., 'Case', CaseValue). Arguments: SeqNT: Nucleotide sequence represented by integers. Enter a vector of integers from the table Mapping Nucleotide Integers to Letters below. The array does not have to be of type integer, but it does have to contain only integer numbers. Integers are arbitrarily assigned to IUB/IUPAC letters. AlphabetValue: Property to select the nucleotide alphabet. Enter either 'DNA' or 'RNA'. UnknownValue: Property to select the character for unknown integer values. Enter a character to map integers 16 or greater to an unknown character. The character must not be one of the nucleotide characters A, T, C, G or the ambiguous nucleotide characters N, R, Y, K, M, S, W, B, D, H, or V. The default character is *. CaseValue: Property to select the letter case for the nucleotide sequence. Enter either 'upper' (default) or 'lower'. Mapping Nucleotide Integers to Letters (Base: Code): Adenosine: 1, A; Cytidine: 2, C; Guanine: 3, G; Thymidine: 4, T; T, C (pyrimidine): 6, Y; G, T (keto): 7, K; A, C (amino): 8, M; G, C (strong): 9, S; A, T, G (not C): 12, D; A, T, C (not G): 13, H; A, G, C (not T): 14, V; A, T, G, C (any): 15, N; Uridine (if
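The integer-to-letter mapping in this table amounts to a lookup by position. The Python sketch below illustrates it under stated assumptions and is not the toolbox function: int2nt_sketch is a hypothetical name, the codes 5 (R), 10 (W), and 11 (B) do not appear in the visible part of the table and are filled in from the usual IUB/IUPAC ordering, and mapping 0 and negative values to the unknown character is also an assumption (the text only mentions integers 16 or greater).

```python
# Letters for the integers 1..15, in table order (R, W, B are assumed).
DNA_LETTERS = "ACGTRYKMSWBDHVN"

def int2nt_sketch(seq_int, alphabet="DNA", unknown="*", case="upper"):
    """Convert a sequence of nucleotide integers to a letter string."""
    if unknown.upper() in DNA_LETTERS + "U":
        raise ValueError("unknown character must not be a nucleotide letter")
    letters = DNA_LETTERS
    if alphabet.upper() == "RNA":
        letters = letters.replace("T", "U")  # 4 maps to U for RNA
    out = []
    for value in seq_int:
        i = int(value)  # values need not be of integer type
        out.append(letters[i - 1] if 1 <= i <= 15 else unknown)
    text = "".join(out)
    return text.lower() if case == "lower" else text

print(int2nt_sketch([1, 2, 3, 4]))
print(int2nt_sketch([1.0, 4.0, 16], alphabet="RNA", unknown="?"))
```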
401. ntensity values, and each element corresponds to a spectrum or retention time. ret_time is a column vector of retention times associated with the LC/MS data set. load lcmsdata 2. Resample the unaligned data and display it in a heat map and dot plot. [MZ, Y] = msppresample(peaks, 5000); msheatmap(MZ, ret_time, log(Y)) [Figure: heat map of the unaligned data; axes Mass/Charge (M/Z) and Retention Time; color scale Relative Intensity] msdotplot(peaks, ret_time) [Figure: dot plot of the unaligned peaks; axes Mass/Charge (M/Z) and Retention Time; color scale Relative Intensity] 3. Align the peak lists from the mass spectra using the default estimation and correction methods. [CMZ, aligned_peaks] = mspalign(peaks); 4. Resample the aligned data and display it in a heat map and dot plot. [MZ2, Y2] = msppresample(aligned_peaks, 5000); msheatmap(MZ2, ret_time, log(Y2)) [Figure: heat map of the aligned data; axes Mass/Charge (M/Z) and Retention Time; color scale Relative Intensity] msdotplot(aligned_peaks, ret_time) [Figure: dot plot of the aligned peaks]
402. nts, getedgesbynodeid, getnodesbyid, getrelatives, view. Bioinformatics Toolbox methods of a biograph object: dolayout. MATLAB functions: get, set. getdescendants (geneont). Purpose: Numeric IDs for descendants of Gene Ontology term. Syntax: DescendantIDs = getdescendants(GeneontObj, ID); DescendantIDs = getdescendants(..., 'Depth', DepthValue, ...). Description: DescendantIDs = getdescendants(GeneontObj, ID) returns the numeric IDs (DescendantIDs) for the descendants of a term (ID), including the ID for the term. ID is a nonnegative integer or a numeric vector with a set of IDs. DescendantIDs = getdescendants(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. DescendantIDs = getdescendants(..., 'Depth', DepthValue, ...) searches down through a specified number of levels (DepthValue) in the Gene Ontology. DepthValue is a positive integer. Default is Inf. Examples: 1. Download the Gene Ontology database from the Web into MATLAB. GO = geneont('LIVE', true); MATLAB creates a geneont object and displays the number of terms in the database. Gene Ontology object with 20005 Terms. 2. Get the descendants for a Gene Ontology term. descendants = getdescendants(GO, 5622, 'Depth', 5) 3. Create a sub Gene Ontology. subontology = GO(descendants) Gene Ontology object with 1071 Terms. See Also: Bioinformatics Toolbox functions: ge
403. number of short descriptions. The default value is normally 100, and for Program = 'psiblast' the default value is 500. Property to specify the number of sequences to report high-scoring segment pairs (HSP). The default value is normally 100, and for Program = 'psiblast' the default value is 500. Property to select a filter. Enter 'L' (low complexity), 'R' (human repeats), 'm' (mask for lookup table), or 'lcase' (to turn on the lowercase mask). The default value is 'L'. Property to select the statistical significance threshold. Enter a real number. The default value is 10. Property to select a word length. For amino acid sequences, Word can be 2 or 3 (3 is the default value), and for nucleotide sequences, Word can be 7, 11, or 15 (11 is the default value). If Program = 'MegaBlast', Word can be 11, 12, 16, 20, 24, 28, 32, 48, or 64, with a default value of 28. MatrixValue: Property to select a substitution matrix for amino acid sequences. Enter 'PAM30', 'PAM70', 'BLOSUM80', 'BLOSUM62', or 'BLOSUM45'. The default value is 'BLOSUM62'. InclusionValue: Property for PSI-BLAST searches to define the statistical significance threshold. The default value is 0.005. PctValue: Property to select the percent identity. Enter None, 99, 98, 95, 90, 85, 80, 75, or 60. Match and mismatch scores are automatically selected. The default value is 99 (99, 1, -3). Description: The Basic Local Alignment Search Tool (BLAST) offers a fa
404. ny order. These property name/value pairs are as follows: ExpressionMatrix = rmasummary(..., 'Output', OutputValue) controls the scale of the returned gene expression values. OutputValue can be 'log', 'log2', 'log10', 'natural', or @functionname. In the last instance, the data is transformed as defined by the function functionname. Default is 'log2'. Examples: 1. Load a MAT-file, included with Bioinformatics Toolbox, which contains Affymetrix data variables, including pmMatrix, a matrix of PM probe intensity values from multiple CEL files. load prostatecancerrawdata 2. Perform background adjustment on the PM probe intensity values in the matrix pmMatrix, using the rmabackadj function, thereby creating a new matrix, BackgroundAdjustedMatrix. BackgroundAdjustedMatrix = rmabackadj(pmMatrix); 3. Normalize the data in BackgroundAdjustedMatrix, using the quantilenorm function. NormMatrix = quantilenorm(BackgroundAdjustedMatrix); 4. Calculate gene expression values from the probe intensities in NormMatrix, creating a new matrix, ExpressionMatrix. You will use the probeIndices column vector provided to supply information on the probe indices. ExpressionMatrix = rmasummary(probeIndices, NormMatrix); The prostatecancerrawdata.mat file used in the previous example contains data from Best et al., 2005. References: [1] Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K
405. nyVal', AnyValValue). Arguments: Data: Matrix where each row corresponds to the experimental results for one gene. Each column is the results for all genes from one experiment. Names: Cell array with the same number of rows as Data. Each row contains the name or ID of the gene in the data set. PrctileValue: Property to specify a percentile below which gene expression profiles are removed. Enter a value from 0 to 100. AbsValueValue: Property to specify an absolute value below which gene expression profiles are removed. AnyValValue: Property to select the minimum or maximum absolute value for comparison with AbsValueValue. If AnyValValue is true, selects the minimum absolute value. If AnyValValue is false, selects the maximum absolute value. The default value is false. Description: Gene expression profile experiments have data where the absolute values are very low. The quality of this type of data is often bad due to large quantization errors or simply poor spot hybridization. Mask = genelowvalfilter(Data) identifies gene expression profiles in Data with all absolute values less than the 10th percentile. Mask is a logical vector with one element for each row in Data. The elements of Mask corresponding to rows with absolute expression levels greater than the threshold have a value of 1, and those with absolute expression levels less than the threshold are 0. [Mask, FData] = genelowvalfi
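The interaction between the absolute-value threshold and AnyVal is easy to misread, so a small illustration may help. The Python sketch below is not the toolbox function: low_value_mask is a hypothetical name, only a fixed absolute threshold is handled (the percentile option is left out), and treating a value exactly equal to the threshold as low is an assumption.

```python
def low_value_mask(data, abs_value, any_val=False):
    """Return True for rows to keep and False for low-value rows.

    any_val=False compares the row maximum of |value| with the threshold,
    so a row is dropped only when all of its values are low.
    any_val=True compares the row minimum, so one low value drops the row.
    """
    mask = []
    for row in data:
        magnitudes = [abs(v) for v in row]
        statistic = min(magnitudes) if any_val else max(magnitudes)
        mask.append(statistic > abs_value)
    return mask

data = [[0.1, -0.2], [5.0, 0.1], [-6.0, 7.0]]
keep = low_value_mask(data, 1.0)
filtered = [row for row, k in zip(data, keep) if k]
print(keep, filtered)
```

With the default setting the second row survives because one of its values is large; with any_val=True it is removed as well.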
406. o acid. See the Amino Acid Lookup Table for valid single-letter codes. IntegerValue: Single integer representing an amino acid. See the Amino Acid Lookup Table for valid integers. AbbreviationValue: String specifying a three-letter abbreviation representing an amino acid. See the Amino Acid Lookup Table for valid three-letter abbreviations. NameValue: String specifying an amino acid name. See the Amino Acid Lookup Table for valid amino acid names. Amino Acid Lookup Table (Code | Integer | Abbreviation | Name | Codons): A | 1 | Ala | Alanine | GCU GCC GCA GCG; R | 2 | Arg | Arginine | CGU CGC CGA CGG AGA AGG; N | 3 | Asn | Asparagine | AAU AAC; D | 4 | Asp | Aspartic acid (Aspartate) | GAU GAC; C | 5 | Cys | Cysteine | UGU UGC; Q | 6 | Gln | Glutamine | CAA CAG; E | 7 | Glu | Glutamic acid (Glutamate) | GAA GAG; G | 8 | Gly | Glycine | GGU GGC GGA GGG; H | 9 | His | Histidine | CAU CAC; I | 10 | Ile | Isoleucine | AUU AUC AUA; L | 11 | Leu | Leucine | UUA UUG CUU CUC CUA CUG; K | 12 | Lys | Lysine | AAA AAG; M | 13 | Met | Methionine | AUG; F | 14 | Phe | Phenylalanine | UUU UUC; P | 15 | Pro | Proline | CCU CCC CCA CCG; S | 16 | Ser | Serine | UCU UCC UCA UCG AGU AGC; T | 17 | Thr | Threonine | ACU ACC ACA ACG; W | 18 | Trp | Tryptophan | UGG; Y | 19 | Tyr | Tyrosine | UAU UAC; V | 20 | Val | Valine | GUU GUC GUA GUG; B | 21 | Asx | codons AAU AAC GAU GAC | Asparagine or Aspartic a
407. o:maloglog:ZeroValues'); warning('off', 'Bioinfo:maloglog:NegativeValues'); maloglog(normRedBs, normGreenBs, 'title', 'Normalized Background-Subtracted Median Values', 'factorlines', true) warning(w) See Also: Bioinformatics Toolbox functions: affyinvarsetnorm, maboxplot, magetfield, mainvarsetnorm, mairplot, maloglog, malowess, quantilenorm, rmasummary. mapcaplot. Purpose: Create Principal Component Analysis plot of microarray data. Syntax: mapcaplot(Data); mapcaplot(Data, Label). Arguments: Data: Microarray expression profile data. Label: Cell array of strings representing labels for the data points. Description: mapcaplot(Data) creates 2-D scatter plots of principal components of the array Data. mapcaplot(Data, Label) uses the elements of the cell array of strings (Label), instead of the row numbers, to label the data points. [Figure: Principal Component Visualization Tool window with two scatter plots, Component 1 versus Component 2 and Component 1 versus Component 3, and a Selected Data list] Once you plot the principal components, you can: Select principal components for the x and y axes from the drop-down list boxes below each scatter plot. Click a data point to display its label. Select a subset of da
408. o the corresponding three-letter abbreviations. aminolookup('MWKQAEDIRDIYDF') ans = MetTrpLysGlnAlaGluAspIleArgAspIleTyrAspPhe 2. Convert an amino acid sequence in three-letter abbreviations to the corresponding single-letter codes. aminolookup('MetTrpLysGlnAlaGluAspIleArgAspIleTyrAspPhe') ans = MWKQAEDIRDIYDF 3. Display the three-letter abbreviation and name for the amino acid corresponding to the single-letter code R. aminolookup('code', 'R') ans = Arg Arginine 4. Display the single-letter code, three-letter abbreviation, and name for the amino acid corresponding to the integer 1. aminolookup('integer', 1) ans = A Ala Alanine 5. Display the single-letter code and name for the amino acid corresponding to the three-letter abbreviation asn. aminolookup('abbreviation', 'asn') ans = N Asparagine 6. Display the single-letter code and three-letter abbreviation for the amino acid proline. aminolookup('Name', 'proline') ans = P Pro See Also: Bioinformatics Toolbox functions: aa2int, aacount, geneticcode, int2aa, nt2aa, revgeneticcode. atomiccomp. Purpose: Calculate atomic composition of protein. Syntax: NumberAtoms = atomiccomp(SeqAA). Arguments: SeqAA: Amino acid sequence. Enter a character string or vector of integers from the table. You can also enter a structure with the field Sequence. Description: NumberAtoms = atomiccomp(SeqAA) counts the
409. o the number of nodes. Each character must be unique. Default values are the row or column numbers. Note: You must specify NodeIDs if you want to specify property name/value pairs. Set NodeIDs to [] to use the default values of the row/column numbers. String to identify the biograph object. Default is ''. This information is for bookkeeping purposes only. LabelValue: String to label the biograph object. Default is ''. This information is for bookkeeping purposes only. DescriptionValue: String that describes the biograph object. Default is ''. This information is for bookkeeping purposes only. LayoutTypeValue: String that specifies the algorithm for the layout engine. Choices are 'hierarchical' (default), 'equilibrium', and 'radial'. EdgeTypeValue: String that specifies how edges display. Choices are 'straight', 'curved' (default), and 'segmented'. Note: Curved or segmented edges occur only when necessary to avoid obstruction by nodes. Biograph objects with LayoutType equal to 'equilibrium' or 'radial' cannot produce curved or segmented edges. ScaleValue: Positive number that post-scales the node coordinates. Default is 1. LayoutScaleValue: Positive number that scales the size of the nodes before calling the layout engine. Default is 1. EdgeTextColorValue EdgeFontSizeValue ShowArrowsValue ArrowSizeValue ShowWeightsVa
410. obe set. The columns correspond to the fields in a CHP probe set data structure: ProbeSetNumber, ProbePairNumber, UseProbePair, Background, PMPosX, PMPosY, PMIntensity, PMStdDev, PMPixels, PMOutlier, PMMasked, MMPosX, MMPosY, MMIntensity, MMStdDev, MMPixels, MMOutlier, MMMasked. There are some minor differences between the output of this function and the data in a CHP file. The PM and MM Intensity values in the CHP file are normalized by the Affymetrix software. This function returns the raw intensity values. The UseProbePair and Background fields are only returned by this function for compatibility with the CHP probe set data structure and are always set to zero. Examples: 1. Get the file Drosophila-121502.cel from http://www.affymetrix.com/support/technical/sample_data/demo_data.affx 2. Read the data into MATLAB. celStruct = affyread('Drosophila-121502.cel'); cdfStruct = affyread('D:\Affymetrix\LibFiles\DrosGenome1\DrosGenome1.CDF'); 3. Get the values for probe set 147439_at. psvals = probesetvalues(celStruct, cdfStruct, '147439_at') See Also: Bioinformatics Toolbox functions: affyread, celintensityread, probelibraryinfo, probesetlink, probesetlookup, rmabackadj. profalign. Purpose: Align two profiles using Needleman-Wunsch global alignment. Syntax: Prof = profalign(Prof1, Prof2); [Prof, H1, H2] = profalign(Prof1, Pr
411. ociated with this CDF library file. In this example, the celintensityread function reads all the CEL files in the Current Directory and a CDF file in a specified directory. The next command line uses the rmabackadj function to perform background adjustment on the PM probe intensities in the PMIntensities field of PMProbeStructure. PMProbeStructure = celintensityread('*', 'HG_U95Av2.CDF', 'CDFPath', 'D:\Affymetrix\LibFiles\HGGenome'); BackAdjustedMatrix = rmabackadj(PMProbeStructure.PMIntensities); The following example lets you select CEL files and a CDF file to read using Open File dialog boxes. PMProbeStructure = celintensityread(' ', ' '); See Also: Bioinformatics Toolbox functions: affyread, agferead, gprread, probelibraryinfo, probesetlink, probesetlookup, probesetplot, probesetvalues, sptread. classperf. Purpose: Evaluate performance of classifier. Syntax: classperf; cp = classperf(groundtruth); classperf(cp, classout); classperf(cp, classout, testidx); cp = classperf(groundtruth, classout, ...); cp = classperf(..., 'Positive', PositiveValue, 'Negative', NegativeValue). Description: classperf provides an interface to keep track of the performance during the validation of classifiers. classperf creates and updates a classifier performance object (CP) that accumulates the results of the classifier. Later, classification standard performance parameters can be accessed using th
412. ode t SG1 The following example converts the gene ND1 on the human mitochondria genome to an amino acid sequence. mitochondria = getgenbank('NC_001807', 'SequenceOnly', true); ND1gene = mitochondria(3308:4264); protein1 = nt2aa(ND1gene, 'GeneticCode', 2); protein2 = getgenpept('NP_536843', 'SequenceOnly', true); The following example converts the gene ND2 on the human mitochondria genome to an amino acid sequence. In this case, the first codon is ATT, which is translated to M, while the following ATT codons are converted to I. If you set AlternativeStartCodons to false, then the first codon ATT is translated to I, the corresponding amino acid in the Vertebrate Mitochondrial genetic code. mitochondria = getgenbank('NC_001807', 'SequenceOnly', true); ND2gene = mitochondria(4471:5514); protein1 = nt2aa(ND2gene, 'GeneticCode', 2); protein2 = getgenpept('NP_536844', 'SequenceOnly', true); See Also: Bioinformatics Toolbox functions: aa2int, aminolookup, baselookup, codonbias, dnds, dndsml, geneticcode, revgeneticcode, seqtool. nt2int. Purpose: Convert nucleotide sequence from letter to integer representation. Syntax: SeqInt = nt2int(SeqChar, 'PropertyName', PropertyValue); nt2int(..., 'Unknown', UnknownValue); nt2int(..., 'ACGTOnly', ACGTOnlyValue). Arguments: SeqChar: Nucleotide sequence represented with letters. Enter a character string from the table Mapping Nucleotide Letters to Integers below. Integers are arbitrarily assigned to IUB
413. odified residue code in the position corresponding to the modified residue. The modified residue code is provided in the ModifiedResidues field. The Model Field: The Model field is also a structure or an array of structures containing coordinate information. If the MATLAB structure contains one model, the Model field is a structure containing coordinate information for that model. If the MATLAB structure contains multiple models, the Model field is an array of structures containing coordinate information for each model. The Model field contains the following subfields: Atom, AtomSD, AnisotropicTemp, AnisotropicTempSD, Terminal, HeterogenAtom. The Atom Field: The Atom field is also an array of structures containing the following subfields: AtomSerNo, AtomName, altLoc, resName, chainID, resSeq, iCode, X, Y, Z, occupancy, tempFactor, segID, element, charge, and AtomNameStruct (contains three subfields: chemSymbol, remoteInd, and branch). Examples: 1. Use the getpdb function to retrieve structure information from the Protein Data Bank (PDB) for the nicotinic receptor protein with identifier 1abt, and then save the data to the PDB-formatted file nicotinic_receptor.pdb in the MATLAB Current Directory. getpdb('1abt', 'ToFile', 'nicotinic_receptor.pdb'); 2. Read the data from the nicotinic_receptor.pdb file into a MATLAB structure pdbs
414. ods with No Scoring of Gaps (Amino Acids Only). Method: Description. Poisson: Assumes that the number of amino acid substitutions at each site has a Poisson distribution. Gamma: Assumes that the number of amino acid substitutions at each site has a Gamma distribution with parameter a. You can set a by using the OptArgs property. Default is 2. You can also specify a user-defined distance function using @, for example, @distfun. The distance function must be of the form: function D = distfun(S1, S2, OptArgsValue). The distfun function takes the following arguments: S1, S2: Two sequences of the same length (nucleotide or amino acid). OptArgsValue: Optional problem-dependent arguments. The distfun function returns a scalar that represents the distance between S1 and S2. D = seqpdist(Seqs, ...'Indels', IndelsValue, ...) specifies how to treat sites with gaps. Choices are: 'score' (default): Scores these sites either as a point mutation or with the alignment parameters, depending on the method selected. 'pairwise-del': For every pair-wise comparison, it ignores the sites with gaps. 'complete-del': Ignores all the columns in the multiple alignment that contain a gap. This option is available only if a multiple alignment was provided as the input Seqs. D = seqpdist(Seqs, ...'OptArgs', OptArgsValue, ...) passes one or more arguments required or accepted by the distance method specified
415. of nodes and edges, respectively. 'Prim' (default algorithm): Grows the minimal spanning tree (MST) one edge at a time by adding a minimal edge that connects a node in the growing MST with any other node. Time complexity is O(E*log(N)), where N and E are the number of nodes and edges, respectively. Note: When the graph is unconnected, Prim's algorithm returns only the tree that contains R, while Kruskal's algorithm returns an MST for every component. [Tree, pred] = graphminspantree(..., 'Weights', WeightsValue, ...) lets you specify custom weights for the edges. WeightsValue is a column vector having one entry for every nonzero value (edge) in matrix G. The order of the custom weights in the vector must match the order of the nonzero values in matrix G when it is traversed column-wise. By default, graphminspantree gets weight information from the nonzero entries in matrix G. Examples: 1. Create and view an undirected graph with 6 nodes and 11 edges. W = [.41 .29 .51 .32 .50 .45 .38 .32 .36 .29 .21]; DG = sparse([1 1 2 2 3 4 4 5 5 6 6], [2 6 3 5 4 1 6 3 4 2 5], W); UG = tril(DG + DG') UG = (2,1) 0.4100; (4,1) 0.4500; (6,1) 0.2900; (3,2) 0.5100; (5,2) 0.3200; (6,2) 0.2900; (4,3) 0.5000; (5,3) 0.3200; (5,4) 0.3600; (6,4) 0.3800; (6,5) 0.2100. view(biograph(UG, [], 'ShowArrows', 'off', 'ShowWeights', 'on')) [Figure: Biograph Viewer window showing the undirected graph] 2. Find and
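The description of Prim's algorithm above (grow the tree one minimal edge at a time from a root) can be tried on the 6-node, 11-edge example graph from this excerpt. The Python sketch below is a generic heap-based Prim implementation, not the code behind graphminspantree: prim_mst is a hypothetical name, and choosing node 1 as the root and recording 0 as the predecessor of the root are assumptions made for the illustration.

```python
import heapq

def prim_mst(n_nodes, edges, root=1):
    """Prim's algorithm on an undirected graph given as (u, v, weight)."""
    adj = {v: [] for v in range(1, n_nodes + 1)}
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    pred = {root: 0}          # predecessor of each node in the tree
    tree = []                 # accepted edges as (from, to, weight)
    heap = [(w, root, v) for w, v in adj[root]]
    heapq.heapify(heap)
    while heap:
        w, u, v = heapq.heappop(heap)   # lightest edge leaving the tree
        if v in pred:
            continue                    # both ends already in the tree
        pred[v] = u
        tree.append((u, v, w))
        for w2, x in adj[v]:
            if x not in pred:
                heapq.heappush(heap, (w2, v, x))
    return tree, pred

# Edge list of the undirected example graph UG shown above.
EDGES = [(2, 1, 0.41), (4, 1, 0.45), (6, 1, 0.29), (3, 2, 0.51),
         (5, 2, 0.32), (6, 2, 0.29), (4, 3, 0.50), (5, 3, 0.32),
         (5, 4, 0.36), (6, 4, 0.38), (6, 5, 0.21)]
tree, pred = prim_mst(6, EDGES)
print(tree, sum(w for _, _, w in tree))
```

Only nodes reachable from the root receive a predecessor, which matches the note that Prim's algorithm returns just the tree containing the root when the graph is unconnected.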
416. of2); profalign(..., 'PropertyName', PropertyValue, ...); profalign(..., 'ScoringMatrix', ScoringMatrixValue); profalign(..., 'GapOpen', {G1Value, G2Value}); profalign(..., 'ExtendGap', {E1Value, E2Value}); profalign(..., 'ExistingGapAdjust', ExistingGapAdjustValue); profalign(..., 'TerminalGapAdjust', TerminalGapAdjustValue); profalign(..., 'ShowScore', ShowScoreValue). Description: Prof = profalign(Prof1, Prof2) returns a new profile (Prof) for the optimal global alignment of two profiles (Prof1, Prof2). The profiles (Prof1, Prof2) are numeric arrays of size [(4 or 5 or 20 or 21) x Profile Length] with counts or weighted profiles. Weighted profiles are used to down-weight similar sequences and up-weight divergent sequences. The output profile is a numeric matrix of size [(5 or 21) x New Profile Length], where the last row represents gaps. Original gaps in the input profiles are preserved. The output profile is the result of adding the aligned columns of the input profiles. [Prof, H1, H2] = profalign(Prof1, Prof2) returns pointers that indicate how to rearrange the columns of the original profiles into the new profile. profalign(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. profalign(..., 'ScoringMatrix', ScoringMatrixValue) defines the scoring matrix (ScoringMatrixValue) to be used for the alignment. The default is 'BLOSUM50' for amino acids or 'NUC44' for nucleotide sequences.
417. oinformatics Toolbox object: biograph object. Bioinformatics Toolbox methods of a biograph object: allshortestpaths, conncomp, isdag, isomorphism, isspantree, maxflow, minspantree, shortestpath, traverse. traverse (biograph). Purpose: Traverse biograph object by following adjacent nodes. Syntax: [disc, pred, closed] = traverse(BGObj, S); [...] = traverse(BGObj, S, ...'Depth', DepthValue, ...); [...] = traverse(BGObj, S, ...'Directed', DirectedValue, ...); [...] = traverse(BGObj, S, ...'Method', MethodValue, ...). Arguments: BGObj: biograph object created by the biograph object constructor. S: Integer that indicates the source node in BGObj. DepthValue: Integer that indicates a node in BGObj that specifies the depth of the search. Default is Inf (infinity). DirectedValue: Property that indicates whether the graph represented by an N-by-N adjacency matrix extracted from a biograph object (BGObj) is directed or undirected. Enter false for an undirected graph. This results in the upper triangle of the sparse matrix being ignored. Default is true. MethodValue: String that specifies the algorithm used to traverse the graph. Choices are: 'BFS': Breadth-first search. Time complexity is O(N+E), where N and E are the number of nodes and edges, respectively. 'DFS' (default algorithm): Depth-first search. Time complexity is O(N+E), where N and E are the number of nodes and edges, respectively. Description: Tip: For introductory information on graph theory functi
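The difference between the two Method choices is the order in which nodes are discovered. The Python sketch below shows generic breadth-first and depth-first traversals on an adjacency list; it is not the biograph traverse method, and traverse_sketch, the adjacency-dictionary input, and the use of 0 as the predecessor of the source node are assumptions made for the illustration.

```python
from collections import deque

def traverse_sketch(adj, source, method="DFS"):
    """Return (discovery order, predecessor map) starting at source."""
    disc, pred = [], {}
    if method == "BFS":
        pred[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            disc.append(u)
            for v in adj.get(u, []):
                if v not in pred:       # first time v is seen
                    pred[v] = u
                    queue.append(v)
    elif method == "DFS":
        stack = [(source, 0)]
        while stack:
            u, parent = stack.pop()
            if u in pred:
                continue                # already visited via another path
            pred[u] = parent
            disc.append(u)
            # Reverse so neighbours are visited in their listed order.
            for v in reversed(adj.get(u, [])):
                if v not in pred:
                    stack.append((v, u))
    else:
        raise ValueError("method must be 'BFS' or 'DFS'")
    return disc, pred

graph = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(traverse_sketch(graph, 1, "BFS"))
print(traverse_sketch(graph, 1, "DFS"))
```

Both loops touch every node and edge at most a constant number of times, which is where the O(N+E) time complexity quoted for both methods comes from.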
418. olecular alterations in primary prostate cancer after androgen ablation therapy. Clinical Cancer Research 11, 6823-6834. Bioinformatics Toolbox functions: affyprobeseqread, affyread, celintensityread, gcrmabackadj, quantilenorm, rmabackadj, rmasummary. gcrmabackadj. Purpose: Perform GC Robust Multi-array Average (GCRMA) background adjustment on Affymetrix microarray probe-level data using sequence information. Syntax: PMMatrix_Adj = gcrmabackadj(PMMatrix, MMMatrix, AffinPM, AffinMM); [PMMatrix_Adj, nsbStruct] = gcrmabackadj(PMMatrix, MMMatrix, AffinPM, AffinMM); gcrmabackadj(..., 'OpticalCorr', OpticalCorrValue, ...); gcrmabackadj(..., 'CorrConst', CorrConstValue, ...); gcrmabackadj(..., 'Method', MethodValue, ...); gcrmabackadj(..., 'TuningParam', TuningParamValue, ...); gcrmabackadj(..., 'AddVariance', AddVarianceValue, ...); gcrmabackadj(..., 'Showplot', ShowplotValue, ...); gcrmabackadj(..., 'Verbose', VerboseValue, ...). Arguments: PMMatrix: Matrix of intensity values where each row corresponds to a perfect match (PM) probe and each column corresponds to an Affymetrix CEL file. (Each CEL file is generated from a separate chip. All chips should be of the same type.) Tip: You can use the PMIntensities matrix returned by the celintensityread function. MMMatrix: Matrix of intensity values where each row corresponds to a mism
419. on 10(2), 271-281. Bioinformatics Toolbox functions: dndsml, featuresparse, geneticcode, nt2aa, nwalign, seqinsertgaps, seqpdist. dndsml. Purpose: Estimate synonymous and nonsynonymous substitution rates using maximum likelihood method. Syntax: [Dn, Ds, Like] = dndsml(SeqNT1, SeqNT2); [Dn, Ds, Like] = dndsml(SeqNT1, SeqNT2, ...'GeneticCode', GeneticCodeValue, ...); [Dn, Ds, Like] = dndsml(SeqNT1, SeqNT2, ...'Verbose', VerboseValue, ...). Arguments: SeqNT1, SeqNT2: Nucleotide sequences. Enter either a string or a structure with the field Sequence. GeneticCodeValue: Property to specify a genetic code. Enter a Code Number or a string with a Code Name from the table. If you use a Code Name, you can truncate it to the first two characters. Default is 1 or Standard. VerboseValue: Property to control the display of the codons considered in the computations and their amino acid translations. Choices are true or false (default). Tip: Specify true to use this display to manually verify the codon alignment of the two input sequences. The presence of stop codons (*) in the amino acid translation can indicate that SeqNT1 and SeqNT2 are not codon-aligned. Return Values: Dn: Nonsynonymous substitution rate(s). Ds: Synonymous substitution rate(s). Like: Likelihood of estimate of substitution rates. Description: [Dn, Ds, Like] = dndsml(SeqNT1, SeqNT2) estimates the synonymous and nonsy
420. on 2.2.1 (Release 2006a); Revised for Version 2.3 (Release 2006a); Revised for Version 2.4 (Release 2006b); Revised for Version 2.5 (Release 2007a). Contents. 1. Functions By Category: Constructor; Data Formats and Databases; Trace Tools; Sequence Conversion; Sequence Utilities; Sequence Statistics; Sequence Visualization; Pair-wise Sequence Alignment; Multiple Sequence Alignment; Scoring Matrices; Phylogenetic Tree Tools; Graph Theory; Gene Ontology; Protein Analysis; Profile Hidden Markov Models; Microarray File Formats; Microarray Utility; Microarray Data Analysis and Visualization; Microarray Normalization and Filtering; Statistical Learning; Mass Spectrometry File Formats, Preprocessing, and Visualization. 2. Functions Alphabetical List. 3. Methods By Category. P
421. on frequency for each amino acid and plot the results. cb = codonbias(S.Sequence, 'PIE', true); cb.Ala ans = Codon: {'GCA' 'GCC' 'GCG' 'GCT'} Freq: [0.1600 0.3867 0.2533 0.2000] MATLAB draws a figure with 20 pie charts for the 20 amino acids. [Figure: 20 pie charts, one per amino acid, each showing the relative frequency of its codons] See Also: Bioinformatics Toolbox functions: aminolookup, codoncount, geneticcode, nt2aa. codoncount. Purpose: Count codons in nucleotide sequence. Syntax: Codons = codoncount(SeqNT); codoncount(..., 'PropertyName', PropertyValue, ...); codoncount(..., 'Frame', FrameValue); codoncount(..., 'Reverse', ReverseValue); codoncount(..., 'Figure', FigureValue). Arguments: SeqNT: Nucleotide sequence. Enter a character string or vector of integers. You can also enter a structure with the field Sequence. FrameValue: Property to select a reading frame. Enter 1 (default), 2, or 3. ReverseValue: Property to control returning the complement sequence. Enter true or false (default). FigureValue: Property to control plotting a heat map. Enter either true or false (default). Description: Codons = codoncount(SeqNT) counts the number of codons in a sequence (SeqNT) and returns the codon counts
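The effect of the Frame property on codon counting can be seen in a few lines. The Python sketch below is an illustration only and is not codoncount: codon_count is a hypothetical name, it returns only the codons that occur rather than a structure with a field for every codon, the Reverse and Figure options are omitted, and ignoring a trailing incomplete codon is an assumption.

```python
from collections import Counter

def codon_count(seq, frame=1):
    """Count non-overlapping codons starting at reading frame 1, 2, or 3."""
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2, or 3")
    seq = seq.upper()
    start = frame - 1
    # Stop at len(seq) - 2 so a trailing partial codon is skipped.
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return dict(Counter(codons))

print(codon_count("ATGGCTGCT"))
print(codon_count("ATGGCTGCT", frame=2))
```

Shifting the frame by one base changes every codon, which is why the same sequence can give completely different counts in frames 1, 2, and 3.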
422. on m/z values for the spectra. Output Intensities is a matrix of reconstructed intensity values for a set of mass spectra that share the same m/z range. Each row corresponds to an m/z value, and each column corresponds to a spectrum or retention time. The number of rows equals N. msppresample uses a Gaussian kernel to reconstruct the signal. The ion intensity at any given m/z value is taken from the maximum intensity of any contributing (overlapping) peaks. Tip: msppresample is useful to prepare a set of spectra for imaging functions such as msheatmap and preprocessing functions such as msbackadj and msnorm. [MZ, Intensities] = msppresample(Peaks, N, ...'PropertyName', PropertyValue, ...) calls msppresample with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows: [MZ, Intensities] = msppresample(Peaks, N, ...'Range', RangeValue, ...) specifies an m/z range for the output matrix Intensities using the minimum and maximum m/z values specified in the 1-by-2 vector RangeValue. RangeValue must be within [min(inputMZ) max(inputMZ)], where inputMZ is the concatenated m/z values from the input Peaks. Default is the full range [min(inputMZ) max(inputMZ)]. [MZ, Intensities] = msppresample(Peaks, N, ...'FWHH', FWHHValue, ...) sets the full wid
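The reconstruction rule stated here (a Gaussian kernel per peak, with the output taken as the maximum over overlapping peaks rather than their sum) can be sketched for a single spectrum. The Python code below is an illustration under stated assumptions, not the msppresample implementation: resample_peaks is a hypothetical name, the width parameter fwhh is assumed to be a full width at half height in m/z units, and the conversion from that width to the Gaussian sigma is the standard relation for a Gaussian curve.

```python
import math

def resample_peaks(peaks, mz_grid, fwhh):
    """Reconstruct intensities on mz_grid from (m/z, intensity) peaks."""
    # Standard relation between full width at half height and sigma.
    sigma = fwhh / (2 * math.sqrt(2 * math.log(2)))
    intensities = []
    for mz in mz_grid:
        contributions = (
            inten * math.exp(-((mz - peak_mz) ** 2) / (2 * sigma ** 2))
            for peak_mz, inten in peaks
        )
        # Maximum, not sum, of the overlapping peak contributions.
        intensities.append(max(contributions, default=0.0))
    return intensities

peaks = [(100.0, 10.0), (101.0, 4.0)]
print(resample_peaks(peaks, [100.0, 100.5, 101.0], fwhh=1.0))
```

Taking the maximum keeps the height of each reconstructed peak equal to its input intensity even when neighbouring peaks overlap, which a sum would not.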
423. onMethod CorrectionMethodValue

Arguments

Peaks: Cell array of peak lists from a liquid chromatography/mass spectrometry (LC/MS) or gas chromatography/mass spectrometry (GC/MS) data set. Each element in the cell array is a two-column matrix with m/z values in the first column and ion intensity values in the second column. Each element corresponds to a spectrum or retention time.
Note: You can use the mzxml2peaks function or the mspeaks function to create the Peaks cell array.

QuantileValue: Value that determines which peaks are selected by the estimation method to create CMZ, the vector of common m/z values. Choices are any value > 0 and < 1. Default is 0.95.

EstimationMethodValue: String specifying the method to estimate CMZ, the vector of common mass/charge (m/z) values. Choices are:
• histogram: Default method. Peak locations are clustered using a kernel density estimation approach. The peak ion intensity is used as a weighting factor. The centers of all the clusters form the CMZ vector.
• regression: Takes a sample of the distances between observed significant peaks and regresses the inter-peak distance to create the CMZ vector with similar inter-element distances.

CorrectionMethodValue: String specifying the method to align each peak list to the CMZ vector. Choices are:
• nearest-neighbor: Default method. For each common peak in the CMZ vector, its counterpart in each
424. onds to a replicate. DataX contains data from one experimental condition and DataY contains data from another experimental condition. DataX and DataY must have the same number of rows and are assumed to be normally distributed in each class with equal variances. PValues is a column vector of p-values for each gene.

[PValues, TScores] = mattest(DataX, DataY) also returns a t-score for each gene in DataX and DataY. TScores is a column vector of t-scores for each gene.

[PValues, TScores, DFs] = mattest(DataX, DataY) also returns DFs, a column vector containing the degree of freedom for each gene across both data sets, DataX and DataY.

mattest(..., 'PropertyName', PropertyValue, ...) calls mattest with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

mattest(..., 'Permute', PermuteValue, ...) controls whether permutation tests are run, and if so, how many. PermuteValue can be true, false (default), or any integer greater than 2. If set to true, the number of permutations is 1000.

mattest(..., 'Showhist', ShowhistValue, ...) controls the display of histograms of t-score distributions and p-value distributions. When ShowhistValue is true, mattest displays histograms. Default is false.
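As an illustration of a per-gene t-score and degrees of freedom under the equal-variance assumption stated above, here is a short Python sketch of the classical pooled two-sample t statistic. That mattest uses exactly this formula is an assumption; the names pooled_t and t_scores are hypothetical.

```python
import math

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance; returns (t, degrees_of_freedom)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    df = nx + ny - 2
    pooled_var = (ssx + ssy) / df
    t = (mx - my) / math.sqrt(pooled_var * (1.0 / nx + 1.0 / ny))
    return t, df

def t_scores(data_x, data_y):
    """Rows are genes, columns are replicates; one (t, df) pair per gene."""
    return [pooled_t(rx, ry) for rx, ry in zip(data_x, data_y)]

results = t_scores([[1, 2, 3], [5, 6, 7]], [[4, 5, 6], [5, 7, 6]])
```

Turning each t-score into a p-value additionally requires the Student t distribution, which is left out of this sketch.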
425. ons maboxplot, maimage, mainvarsetnorm, mairplot, maloglog, malowess, manorm, mapcaplot, mattest

molweight

Purpose: Calculate molecular weight of amino acid sequence

Syntax
    molweight(SeqAA)

Arguments
SeqAA: Amino acid sequence. Enter a character string or a vector of integers from the Amino Acid Lookup Table on page 2-42. Examples: 'ARN', [1 2 3]. You can also enter a structure with the field Sequence.

Description
molweight(SeqAA) calculates the molecular weight for the amino acid sequence SeqAA.

Examples
1 Get an amino acid sequence from the NCBI GenPept database.
    rhodopsin = getgenpept('NP_000530');
2 Calculate the molecular weight of the sequence.
    rhodopsinMW = molweight(rhodopsin)
    rhodopsinMW =
        3.8892e+004

See Also: Bioinformatics Toolbox functions aacount, atomiccomp, isoelectric, proteinplot

molviewer

Purpose: Display and manipulate 3-D molecule structure

Syntax
    molviewer
    molviewer(File)
    molviewer(pdbID)
    molviewer(pdbStruct)
    FigureHandle = molviewer(...)

Arguments (File, pdbID, pdbStruct)
File: String specifying one of the following:
• File name of a file on the MATLAB search path or in the MATLAB Current Directory
• Path and file name
• URL pointing to a file (URL must begin with a protocol such as http://, ftp://, or file://)
The referenced file is a molecule model file, such as a Protein Data Bank (PDB)-formatted fi
426. ons, see Graph Theory Functions in the Bioinformatics Toolbox documentation.

[disc, pred, closed] = traverse(BGObj, S) traverses the directed graph represented by an N-by-N adjacency matrix extracted from a biograph object, BGObj, starting from the node indicated by integer S. In the N-by-N sparse matrix, all nonzero entries indicate the presence of an edge. disc is a vector of node indices in the order in which they are discovered. pred is a vector of predecessor node indices (listed in the order of the node indices) of the resulting spanning tree. closed is a vector of node indices in the order in which they are closed.

traverse(BGObj, S, ...'PropertyName', PropertyValue, ...) calls traverse with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotes and is case insensitive. These property name/property value pairs are as follows:

traverse(BGObj, S, ...'Depth', DepthValue, ...) specifies the depth of the search. DepthValue is an integer indicating a node in the graph represented by the N-by-N adjacency matrix extracted from a biograph object, BGObj. Default is Inf (infinity).

traverse(BGObj, S, ...'Directed', DirectedValue, ...) indicates whether the graph represented by the N-by-N adjacency matrix extracted from a biograph object, BGObj, is directed or undirected
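The three outputs described above (discovery order, predecessors, and closing order) are easiest to see in a small depth-first search. The Python sketch below is an illustration only: the adjacency-list input, the name dfs_traverse, and the use of 0 as the predecessor of the start node are assumptions, not the toolbox interface.

```python
def dfs_traverse(adj, start):
    """Depth-first traversal. adj maps each node to a list of successor nodes."""
    disc, closed = [], []
    pred = {start: 0}  # 0 marks the start node (a convention assumed here)

    def visit(u):
        disc.append(u)                 # u is discovered
        for v in adj.get(u, []):
            if v not in pred:          # first time v is reached
                pred[v] = u
                visit(v)
        closed.append(u)               # all successors of u are finished

    visit(start)
    return disc, pred, closed

disc, pred, closed = dfs_traverse({1: [2, 3], 2: [4], 3: [4], 4: []}, 1)
```

A node is closed only after everything reachable through it has been closed, so the start node is always discovered first and closed last.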
427. onsider the width of your peaks in the spectrum and the presence of possible drifts. If you have wider peaks towards the end of the spectrum, you may want to use variable parameters.

msbackadj(..., 'StepSize', StepSizeValue, ...) specifies the steps for the shifting window. The default value is 200 m/z (a baseline point is estimated for windows placed every 200 m/z). StepSizeValue may also be a function handle. The function is evaluated at the respective m/z values and returns the distance between adjacent windows.

msbackadj(..., 'RegressionMethod', RegressionMethodValue, ...) specifies the method to regress the window estimated points to a soft curve. Enter 'pchip' (shape-preserving piecewise cubic interpolation), 'linear' (linear interpolation), or 'spline' (spline interpolation). The default value is 'pchip'.

msbackadj(..., 'EstimationMethod', EstimationMethodValue, ...) specifies the method for finding the likely baseline value in every window. Enter 'quantile' (quantile value is set to 10%) or 'em' (assumes a doubly stochastic model). With em, every sample is the independent and identically distributed (i.i.d.) draw of any of two normal distributed classes (background or peaks). Because the class label is hidden, the distributions are estimated with an Expectation-Maximization algorithm. The ultimate baseline value is the mean of the background class.

msbackadj(..., 'SmoothMethod', SmoothMethodValue, ...) specifies the
428. onzero values in matrix G when it is traversed column-wise. This property lets you use zero-valued weights. By default, graphshortestpath gets weight information from the nonzero entries in matrix G.

Description

Tip: For introductory information on graph theory functions, see Graph Theory Functions in the Bioinformatics Toolbox documentation.

[dist, path, pred] = graphshortestpath(G, S) determines the single-source shortest paths from node S to all other nodes in the graph represented by matrix G. Input G is an N-by-N sparse matrix that represents a graph. Nonzero entries in matrix G represent the weights of the edges. dist are the N distances from the source to every node (using Infs for nonreachable nodes and 0 for the source node). path contains the winning paths to every node. pred contains the predecessor nodes of the winning paths.

[dist, path, pred] = graphshortestpath(G, S, T) determines the single source-single destination shortest path from node S to node T.

graphshortestpath(..., 'PropertyName', PropertyValue, ...) calls graphshortestpath with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotes and is case insensitive. These property name/property value pairs are as follows:

graphshortestpath(..., 'Directed', DirectedValue, ...) indicates wheth
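One standard way to obtain single-source distances and predecessors for non-negative edge weights is Dijkstra's algorithm. The Python sketch below shows how dist, pred, and a reconstructed path relate to each other; this excerpt does not say which algorithm graphshortestpath uses, and the nested-dict graph format and the function names are assumptions made for illustration.

```python
import heapq

def shortest_paths(weights, source):
    """weights: dict node -> dict neighbour -> non-negative edge weight.

    Every node must appear as a key, even if it has no outgoing edges.
    """
    dist = {n: float("inf") for n in weights}  # inf marks unreachable nodes
    pred = {n: None for n in weights}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in weights[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                pred[v] = u
                heapq.heappush(heap, (d + w, v))
    return dist, pred

def path_to(pred, source, target):
    """Walk the predecessor links back from target; empty list if unreachable."""
    path = [target]
    while path[-1] != source:
        prev = pred[path[-1]]
        if prev is None:
            return []
        path.append(prev)
    return path[::-1]

graph = {1: {2: 1, 3: 4}, 2: {3: 2}, 3: {}, 4: {}}
dist, pred = shortest_paths(graph, 1)
```

Storing only one predecessor per node is enough to recover every winning path, which is why the function described above can return both path and pred from a single run.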
429. oordinates
• Probe y-coordinates
• Probe sequences in each probe set
• Affymetrix GeneChip array type (FASTA file only)
The sequence file (tab-separated or FASTA) must be on the MATLAB search path or in the Current Directory (unless you use the SeqPath property). In a tab-separated file, each row represents a probe; in a FASTA file, each header represents a probe.

CDFFile: Either of the following:
• String specifying a file name of an Affymetrix CDF library file, which contains information that specifies which probe set each probe belongs to on a specific type of Affymetrix GeneChip array. The CDF library file must be on the MATLAB search path or in the MATLAB Current Directory (unless you use the CDFPath property).
• CDF structure, such as returned by the affyread function, which contains information that specifies which probe set each probe belongs to on a specific type of Affymetrix GeneChip array.
Caution: Make sure that SeqFile and CDFFile contain information for the same type of Affymetrix GeneChip array.

SeqPathValue: String specifying a directory or path and directory where SeqFile is stored.

CDFPathValue: String specifying a directory or path and directory where CDFFile is stored.

SeqOnlyValue: Controls the return of a structure, Struct, with only one field, SequenceMatrix. Choices are true or false (default).

Return Values
Struct: MATLAB structure containing the following fields:
• ProbeSetIDs
•
430. or the local alignment. Choices for amino acid sequences are:
• PAM40
• PAM250
• DAYHOFF
• GONNET
• BLOSUM30 increasing by 5 up to BLOSUM90
• BLOSUM62
• BLOSUM100
Default is:
• BLOSUM50 (when AlphabetValue equals 'AA')
• NUC44 (when AlphabetValue equals 'NT')
Note: All of the above scoring matrices have a built-in scale factor that returns Score in bits.

ScaleValue: Scale factor used to return Score in arbitrary units other than bits. Choices are any positive value. For example, if you enter log(2) for ScaleValue, then swalign returns Score in nats.

GapOpenValue: Penalty for opening a gap in the alignment. Choices are any positive integer. Default is 8.

ExtendGapValue: Penalty for extending a gap. Choices are any positive integer. Default is equal to GapOpenValue.

ShowscoreValue: Controls the display of the scoring space and the winning path of the alignment. Choices are true or false (default).

Return Values
Score: Optimal local alignment score in bits.
Alignment: 3-by-N character array showing the two sequences, Seq1 and Seq2, in the first and third rows, and symbols representing the optimal local alignment between them in the second row.
Start: 2-by-1 vector of indices indicating the starting point in each sequence for the alignment.

Description
Score = swalign(Seq1, Seq2) returns the optimal local alignment score in bits. The sc
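For readers unfamiliar with local alignment scoring, the following Python sketch computes a Smith-Waterman score with a deliberately simplified scheme. The match and mismatch values are hypothetical stand-ins for a scoring matrix, a single linear gap penalty replaces the separate gap-open and gap-extend penalties described above, and scale is simply multiplied onto the result. It is not the swalign implementation.

```python
def local_alignment_score(seq1, seq2, match=5, mismatch=-4, gap=8, scale=1.0):
    """Best local alignment score of seq1 vs seq2 (linear gap penalty)."""
    prev = [0] * (len(seq2) + 1)  # previous row of the scoring space
    best = 0
    for a in seq1:
        curr = [0]
        for j, b in enumerate(seq2, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            up = prev[j] - gap         # gap in seq2
            left = curr[j - 1] - gap   # gap in seq1
            # Local alignment: a cell never drops below zero.
            curr.append(max(0, diag, up, left))
        best = max(best, max(curr))
        prev = curr
    return best * scale

score = local_alignment_score("TTACGTT", "GGACGGG")
```

Clamping every cell at zero is what makes the alignment local: a poorly matching prefix cannot drag down the score of a good region that follows it.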
431. or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

Struct = affyprobeseqread(SeqFile, CDFFile, ...'SeqPath', SeqPathValue, ...) lets you specify a path and directory where SeqFile is stored.

Struct = affyprobeseqread(SeqFile, CDFFile, ...'CDFPath', CDFPathValue, ...) lets you specify a path and directory where CDFFile is stored.

Struct = affyprobeseqread(SeqFile, CDFFile, ...'SeqOnly', SeqOnlyValue, ...) controls the return of a structure, Struct, with only one field, SequenceMatrix. Choices are true or false (default).

Examples
1 Read the data from a FASTA file and associated CDF library file, assuming both are located on the MATLAB search path or in the Current Directory.
    S1 = affyprobeseqread('HG-U95A_probe_fasta', 'HG_U95A.CDF');
2 Read the data from a tab-separated file and associated CDF structure, assuming the tab-separated file is located in the specified directory and the CDF structure is in your MATLAB Workspace.
    S2 = affyprobeseqread('HG-U95A_probe_tab', hgu95aCDFStruct, 'seqpath', 'C:\Affymetrix\SequenceFiles\HGGenome');
3 Access the nucleotide sequences of the first probe set (rows 1 through 20) in the SequenceMatrix field of the S2 structure.
    seq = int2nt(S2.SequenceMatrix(1:20,:))

See Also: Bioinformatics Toolbox functions affyinvarsetnorm, affyread, celintensityread, int2nt, probelibraryin
432. or the smoothing function. If SpanValue is less than 1, the window size is taken to be a fraction of the number of points in the data. If SpanValue is greater than 1, the window is of size SpanValue.

Examples
    maStruct = gprread('mouse_a1wt.gpr');
    cy3data = magetfield(maStruct, 'F635 Median');
    cy5data = magetfield(maStruct, 'F532 Median');
    [x, y] = mairplot(cy3data, cy5data);
    drawnow
    ysmooth = malowess(x, y);
    hold on;
    plot(x, ysmooth, 'rx')
    ynorm = y - ysmooth;

See Also: Bioinformatics Toolbox functions affyinvarsetnorm, maboxplot, magetfield, maimage, mainvarsetnorm, mairplot, maloglog, manorm, quantilenorm; Statistics Toolbox function robustfit

manorm

Purpose: Normalize microarray data

Syntax
    XNorm = manorm(X)
    XNorm = manorm(MAStruct, FieldName)
    [XNorm, ColVal] = manorm(...)
    manorm(..., 'Method', MethodValue, ...)
    manorm(..., 'Extra_Args', Extra_ArgsValue, ...)
    manorm(..., 'LogData', LogDataValue, ...)
    manorm(..., 'Percentile', PercentileValue, ...)
    manorm(..., 'Global', GlobalValue, ...)
    manorm(..., 'StructureOutput', StructureOutputValue, ...)
    manorm(..., 'NewColumnName', NewColumnNameValue, ...)

Description
XNorm = manorm(X) scales the values in each column of microarray data (X) by dividing by the mean column intensity.
• X: Microarray data. Enter a vector or matrix.
• XNorm: Normalized microarray data.

XNorm = manorm(MAStruct, FieldName) scales the data for a field (FieldN
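The default scaling described for manorm(X), dividing each column by its mean intensity, amounts to the following. This Python sketch uses plain lists of rows; the function name normalize_columns is hypothetical.

```python
def normalize_columns(x):
    """Divide every value by the mean of its column. x is a list of equal-length rows."""
    n_rows = len(x)
    means = [sum(col) / n_rows for col in zip(*x)]
    return [[v / m for v, m in zip(row, means)] for row in x]

x_norm = normalize_columns([[1.0, 10.0], [3.0, 30.0]])
```

After this scaling every column has a mean of 1, so arrays with different overall brightness become directly comparable.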
433. ore, Tmdelta defines the possible range of melting temperatures for SeqNT.

Thermo: 4-by-3 matrix of thermodynamic calculations.
The rows correspond to nearest-neighbor parameters from:
• Breslauer et al. (1986)
• SantaLucia Jr. et al. (1996)
• SantaLucia Jr. (1998)
• Sugimoto et al. (1996)
The columns correspond to:
• delta H: Enthalpy in kilocalories per mole, kcal/mol
• delta S: Entropy in calories per mole-degrees Kelvin, cal/(K)(mol)
• delta G: Free energy in kilocalories per mole, kcal/mol
Ambiguous N characters in SeqNT are considered to potentially be any nucleotide. If SeqNT contains ambiguous N characters, Thermo is the midpoint value, and its uncertainty is expressed by Thermodelta.

Thermodelta: 4-by-3 matrix containing the differences between Thermo (midpoint value) and either the maximum or minimum value Thermo could assume for each calculation and method. Therefore, Thermodelta defines the possible range of thermodynamic values for SeqNT.

SeqProperties = oligoprop(SeqNT, ...'PropertyName', PropertyValue, ...) calls oligoprop with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

SeqPropert
434. oring matrix. The order of amino acids in the matrix is A R N D C Q E G H I L K M F P S T W Y V B Z X.

See Also: Bioinformatics Toolbox functions blosum, gonnet, pam

dimercount

Purpose: Count dimers in sequence

Syntax
    Dimers = dimercount(SeqNT)
    [Dimers, Percent] = dimercount(SeqNT)
    dimercount(..., 'PropertyName', PropertyValue, ...)
    dimercount(..., 'Chart', ChartStyle)

Arguments
SeqNT: Nucleotide sequence. Enter a character string or vector of integers. Examples: 'ACGT' and [1 2 3 4]. You can also enter a structure with the field Sequence.
ChartStyleValue: Property to select the type of plot. Enter 'pie' or 'bar'.

Description
Dimers = dimercount(SeqNT) counts the number of nucleotide dimers in a 1-by-N sequence and returns the dimer counts in a structure with the fields AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT.
• For sequences that have dimers with the character U, the U characters are added to dimers with T characters.
• If the sequence contains ambiguous nucleotide characters (R Y K M S W B D H V N) or gaps indicated with a hyphen (-), this function creates a field Others and displays a warning message:
    Warning: Ambiguous symbols 'symbol list' appear in the sequence. These will be in Others.
• If the sequence contains undefined nucleotide characters (E F H I J L O P Q X Z), dimercount ignores the characters and displays a warning message:
    Warni
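The counting rules above (overlapping dimers, U folded into T, and a separate Others field) can be sketched in Python as follows. This is a simplification: every dimer containing a symbol other than A, C, G, or T is placed in Others, whereas the description above distinguishes ambiguous symbols from undefined ones. The function name count_dimers is hypothetical.

```python
from itertools import product

def count_dimers(seq):
    """Count overlapping nucleotide dimers; U is treated as T."""
    s = seq.upper().replace("U", "T")
    counts = {a + b: 0 for a, b in product("ACGT", repeat=2)}
    counts["Others"] = 0
    for i in range(len(s) - 1):
        dimer = s[i:i + 2]
        if dimer in counts:
            counts[dimer] += 1
        else:
            counts["Others"] += 1  # dimer contains a non-ACGT symbol
    return counts

dimers = count_dimers("ACGTU")
```

Because the window slides one base at a time, a sequence of length N contributes N - 1 dimers in total.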
435. ormation about the PFAM database, see http://www.sanger.ac.uk/Software/Pfam and http://pfam.janelia.org.

AlignStruct = gethmmalignment(..., 'IgnoreGaps', IgnoreGaps) controls the removal of the symbols - and . from the sequence. Choices are true or false (default).

Examples
To retrieve a multiple alignment of the sequences used to train the HMM profile for global alignment to the 7-transmembrane receptor protein in the secretin family, enter either of the following:
    pfamalign = gethmmalignment(2, 'Type', 'seed')
    pfamalign = gethmmalignment('PF00002', 'Type', 'seed')
    pfamalign =
    32x1 struct array with fields:
        Header
        Sequence

See Also: Bioinformatics Toolbox functions fastaread, gethmmprof, gethmmtree, multialignread, pfamhmmread

gethmmprof

Purpose: Retrieve hidden Markov model (HMM) profile from PFAM database

Syntax
    HMMStruct = gethmmprof(PFAMName)
    HMMStruct = gethmmprof(PFAMNumber)
    HMMStruct = gethmmprof(PFAMAccessNumber)
    HMMStruct = gethmmprof(..., 'ToFile', ToFileValue, ...)
    HMMStruct = gethmmprof(..., 'Mode', ModeValue, ...)
    HMMStruct = gethmmprof(..., 'Mirror', MirrorValue, ...)

Arguments
PFAMName: String specifying a protein family name (unique identifier) of an HMM profile record in the PFAM database. For example, 7tm_2.
PFAMNumber: Integer specifying a protein family number of an HMM profile record in the PFAM database. For exampl
436. ossvalind, knnclassify, svmtrain; Statistics Toolbox function classify; Optimization Toolbox function quadprog

svmsmoset

Purpose: Create or edit Sequential Minimal Optimization (SMO) options structure

Syntax
    SMO_OptsStruct = svmsmoset('Property1Name', Property1Value, 'Property2Name', Property2Value, ...)
    SMO_OptsStruct = svmsmoset(OldOpts, 'Property1Name', Property1Value, 'Property2Name', Property2Value, ...)
    SMO_OptsStruct = svmsmoset(OldOpts, NewOpts)

Arguments
OldOpts: Structure that specifies options used by the SMO method used by the svmtrain function.
NewOpts: Structure that specifies options used by the SMO method used by the svmtrain function.

PropertyName and description of PropertyValue:
TolKKT: Value that specifies the tolerance with which the KKT conditions are checked. KKT conditions are Karush-Kuhn-Tucker conditions. Default is 1.0000e-003.
MaxIter: Integer that specifies the maximum number of iterations of the main loop. If this limit is exceeded before the algorithm converges, then the algorithm stops and returns an error. Default is 1500.
Display: String that specifies the level of information about the optimization iterations that is displayed as the algorithm runs. Choices are:
• off: Default. Reports nothing.
• iter: Reports every 10 iterations.
• final: Reports only when the algori
437. ot

LabelsValue: A cell array of labels for the data in X and Y. If you specify LabelsValue, then clicking a data point in the plot shows the label corresponding to that point.

Description
maloglog(X, Y, ...'PropertyName', PropertyValue, ...) creates a loglog scatter plot of X versus Y. X and Y are numeric arrays of microarray expression values from two different experimental conditions.

maloglog(..., 'FactorLines', N) adds two lines to the plot showing a factor of N change.

maloglog(..., 'Title', TitleValue) allows you to specify a title for the plot.

maloglog(..., 'Labels', LabelsValues) allows you to specify a cell array of labels for the data. If LabelsValues is defined, then clicking a data point in the plot shows the label corresponding to that point.

maloglog(..., 'HandleGraphicsName', HGValue) allows you to pass optional Handle Graphics property name/property value pairs to the function.

H = maloglog(...) returns the handle to the plot.

Examples
    maStruct = gprread('mouse_a1wt.gpr');
    Red = magetfield(maStruct, 'F635 Median');
    Green = magetfield(maStruct, 'F532 Median');
    maloglog(Red, Green, 'title', 'Red vs Green');
    % Add factorlines and labels
    figure
    maloglog(Red, Green, 'title', 'Red vs Green', 'FactorLines', 2, 'LABELS', maStruct.Names);
    % Now create a normalized plot
    figure
    maloglog(manorm(Red), manorm(Green), 'title', 'Normalized Red vs Green', 'FactorLines'
438. oup 1', 'Data in group 2'); hold off;

[Figure: scatter plot with legend entries Training group 1, Training group 2, Data in group 1, Data in group 2]

Classifying Rows Using the Three Nearest Neighbors
The following example uses the same data as in Example 2, but classifies the rows of sample using three nearest neighbors instead of one.

    gscatter(training(:,1), training(:,2), group, 'rb', '+x');
    hold on;
    c3 = knnclassify(sample, training, group, 3);
    gscatter(sample(:,1), sample(:,2), c3, 'mc', 'o');
    legend('Training group 1', 'Training group 2', 'Data in group 1', 'Data in group 2');

[Figure: scatter plot with legend entries Training group 1, Training group 2, Data in group 1, Data in group 2]
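The effect of moving from one to three nearest neighbors can be reproduced with a small k-nearest-neighbor classifier. The Python sketch below assumes Euclidean distance and a majority vote in which ties go to the label of the closest neighbor; the distance and tie-breaking rules of knnclassify are not stated in this excerpt, and the function name knn_classify is hypothetical.

```python
import math
from collections import Counter

def knn_classify(sample, training, group, k=1):
    """Label each row of sample by majority vote among its k nearest training rows."""
    labels = []
    for s in sample:
        # Indices of training rows, nearest first.
        order = sorted(range(len(training)), key=lambda i: math.dist(s, training[i]))
        votes = Counter(group[i] for i in order[:k])
        # most_common keeps first-seen order among ties, so the closest label wins.
        labels.append(votes.most_common(1)[0][0])
    return labels

training = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
group = [1, 1, 1, 2, 2, 2]
c3 = knn_classify([(0.2, 0.2), (5.5, 5.5)], training, group, k=3)
```

With k = 1 a single outlying training point decides the label of everything near it; with k = 3 it can be outvoted by its two neighbors.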
439. owess

Purpose: Smooth mass spectrum using nonparametric method

Syntax
    Yout = mslowess(MZ, Y, 'PropertyName', PropertyValue, ...)
    mslowess(..., 'Order', OrderValue)
    mslowess(..., 'Span', SpanValue)
    mslowess(..., 'Kernel', KernelValue)
    mslowess(..., 'RobustIterations', RobustIterationsValue)
    mslowess(..., 'ShowPlot', ShowPlotValue)

Arguments
MZ: Mass/charge vector with the range of ions in the spectra.
Y: Ion intensity vector with the same length as the mass/charge vector (MZ). Y can also be a matrix with several spectra that share the same mass/charge (MZ) range.

Description
Yout = mslowess(MZ, Y, 'PropertyName', PropertyValue, ...) smoothes a mass spectrum (Y) using a locally weighted linear regression (Lowess) method with a default span of 10 samples.

Note:
1 mslowess assumes that a mass/charge vector (MZ) might not be uniformly spaced. Therefore, the sliding window for smoothing is centered using the closest samples in terms of the MZ value and not in terms of the MZ indices.
2 When the vector MZ does not have repeated values or NaNs, the algorithm is approximately twice as fast.

mslowess(..., 'Order', OrderValue) specifies the order (OrderValue) of the Lowess smoother. Enter 1 (linear polynomial fit or Lowess), 2 (quadratic polynomial fit or Loess), or 0 (equivalent to a weighted local mean estimator and presumably faster because only a mean computation is performed i
440. owplotValue is true, mainvarsetnorm plots the M-A scatter plots. Default is false.

The following example illustrates how mainvarsetnorm can correct for dye bias or scanning differences between two channels of data from a two-color microarray experiment. Under perfect experimental conditions, data points with equal expression values would fall along the M = 0 line, which represents a gene expression ratio of 1. However, dye bias caused the measured values in one channel to be higher than the other channel, as seen in the Before Normalization plot. Normalization corrected the variance, as seen in the After Normalization plot.

[Figure: M-A plots, Before normalization and After normalization, showing the invariant set and the smooth curve]

Examples
The following example extracts data from a GPR file and creates two column vectors of gene expression values from different experimental conditions. It then normalizes one of the data sets.

    maStruct = gprread('mouse_a1wt.gpr');
    cy3data = magetfield(maStruct, 'F635 Median');
    cy5data = magetfield(maStruct, 'F532 Median');
    Normcy5data = mainvarsetnorm(cy3data, cy5data);

References
[1] Tseng, G.C., Oh, Min-Kyu, Rohlin, L., Liao, J.C., and Wong, W.H. (2001) Issues in cDNA microarray analysis: quality filtering, channel normalization, models of
441. paths, conncomp, isdag, isomorphism, isspantree, minspantree, shortestpath, topoorder, traverse

minspantree (biograph)

Purpose: Find minimal spanning tree in biograph object

Syntax
    [Tree, pred] = minspantree(BGObj)
    [Tree, pred] = minspantree(BGObj, R)
    [Tree, pred] = minspantree(..., 'Method', MethodValue, ...)
    [Tree, pred] = minspantree(..., 'Weights', WeightsValue, ...)

Arguments
BGObj: biograph object created by biograph (object constructor).
R: Scalar between 1 and the number of nodes.

Description
Tip: For introductory information on graph theory functions, see Graph Theory Functions in the Bioinformatics Toolbox documentation.

[Tree, pred] = minspantree(BGObj) finds an acyclic subset of edges that connects all the nodes in the undirected graph represented by an N-by-N adjacency matrix extracted from a biograph object, BGObj, and for which the total weight is minimized. Weights of the edges are all nonzero entries in the lower triangle of the N-by-N sparse matrix. Output Tree is a spanning tree represented by a sparse matrix. Output pred is a vector containing the predecessor nodes of the minimal spanning tree (MST), with the root node indicated by 0. The root node defaults to the first node in the largest connected component. This computation requires an extra call to the graphconncomp function.

[Tree, pred] = minspantree(BGObj, R) sets the root of the minimal spanning tree to node R.

[Tree, pred] = min
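A minimal spanning tree grown from a chosen root, together with a predecessor map in which the root is marked by 0, can be sketched with Prim's algorithm. The Python code below is an illustration only: the symmetric nested-dict graph format and the function name min_spanning_tree are assumptions, and which algorithm minspantree uses depends on its Method property, which is not shown in this excerpt.

```python
import heapq

def min_spanning_tree(weights, root):
    """Prim's algorithm. weights: dict node -> dict neighbour -> weight (symmetric)."""
    pred = {root: 0}  # the root is marked by predecessor 0
    tree = []         # list of (parent, child, weight) edges
    heap = [(w, root, v) for v, w in weights[root].items()]
    heapq.heapify(heap)
    while heap:
        w, u, v = heapq.heappop(heap)
        if v in pred:
            continue  # v is already connected to the tree
        pred[v] = u
        tree.append((u, v, w))
        for nxt, w2 in weights[v].items():
            if nxt not in pred:
                heapq.heappush(heap, (w2, v, nxt))
    return tree, pred

g = {1: {2: 1, 3: 3}, 2: {1: 1, 3: 1, 4: 4}, 3: {1: 3, 2: 1, 4: 1}, 4: {2: 4, 3: 1}}
tree, pred = min_spanning_tree(g, 1)
```

Only nodes in the same connected component as the root are reached, which is consistent with the default root being chosen from the largest connected component.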
442. pe code. Each line type code is stored as a separate element in the structure. EMBLData contains the following fields:

    Identification.EntryName
    Identification.Version
    Identification.Topology
    Identification.Molecule
    Identification.DataClass
    Identification.Division
    Identification.SequenceLength
    Accession
    SequenceVersion
    DateCreated
    DateUpdated
    Description
    Keyword
    OrganismSpecies
    OrganismClassification
    Organelle
    Reference.Number
    Reference.Comment
    Reference.Position
    Reference.MedLine
    Reference.PubMed
    Reference.Authors
    Reference.Title
    Reference.Location
    DatabaseCrossReference
    Comments
    Feature
    Basecount.BP
    Basecount.A
    Basecount.C
    Basecount.G
    Basecount.T
    Basecount.Other
    Sequence

Note: Topology information was not included in EMBL flat files before release 87 of the database. When reading a file created before release 87, emblread returns an empty Identification.Topology field.

Note: The entry name is no longer displayed in the ID line of EMBL flat files in release 87. When reading a file created in release 87, emblread returns the accession number in the Identification.EntryName field.

EMBLSeq = emblread(File, 'SequenceOnly', SequenceOnlyValue), when SequenceOnlyValue is true, reads only the sequence information.

Examples
Get sequence information from the Web, save to a file, and
443. perimental condition. For example, in a two-color microarray experiment, DataX could be cy3 intensity values and DataY could be cy5 intensity values.

PermuteValue: Controls whether permutation tests are run, and if so, how many. Choices are true, false (default), or any integer greater than 2. If set to true, the number of permutations is 1000.

ShowhistValue: Controls the display of histograms of t-score distributions and p-value distributions. Choices are true or false (default).

ShowplotValue: Controls the display of a normal t-score quantile plot. Choices are true or false (default). In the t-score quantile plot, data points with t-scores > (1 - 1/(2N)) or < 1/(2N) display with red circles. N is the total number of genes.

LabelsValue: Cell array of labels (typically gene names or probe set IDs) for each row in DataX and DataY. The labels display if you click a data point in the t-score quantile plot.

Return Values
PValues: Column vector of p-values for each gene in DataX and DataY.
TScores: Column vector of t-scores for each gene in DataX and DataY.
DFs: Column vector containing the degree of freedom for each gene in DataX and DataY.

Description
PValues = mattest(DataX, DataY) compares the gene expression profiles in DataX and DataY and returns a p-value for each gene. DataX and DataY are matrices of gene expression values, in which each row corresponds to a gene and each column corresp
444. ples, See Also

Purpose: Read phylogenetic tree file

Syntax
    Tree = phytreeread(File)

Arguments
File: Newick-formatted tree file (ASCII text file). Enter a file name, a path and file name, or a URL pointing to a file. File can also be a MATLAB character array that contains the text for a file.
Tree: phytree object created with the function phytree.

Description
Tree = phytreeread(File) reads a Newick-formatted tree file and returns a phytree object in the MATLAB workspace with data from the file.

The NEWICK tree format can be found at http://evolution.genetics.washington.edu/phylip/newicktree.html

Note: This implementation only allows binary trees. Non-binary trees are translated into a binary tree with extra branches of length 0.

Examples
    tr = phytreeread('pf00002.tree')

See Also: Bioinformatics Toolbox functions phytree (object constructor), gethmmtree, phytreetool, phytreewrite

phytreetool

Purpose: View, edit, and explore phylogenetic tree data

Syntax
    phytreetool(Tree)
    phytreetool(File)

Arguments
Tree: Phytree object created with the functions phytree or phytreeread.
File: Newick or ClustalW tree formatted file (ASCII text file) with phylogenetic tree data. Enter a file name, a path and file name, or a URL pointing to a file. File can also be a MATLAB character array that contains the text for a Newick file.

Description
phytreetool is an interactive GUI that allows you to view, edit, and explore phylogenetic tree data. This GUI allows branch pruning
445. pointing to a file. File can also be a MATLAB character array that contains the text of a GenPept file.

genpeptread reads data from a GenPept-formatted file into a MATLAB structure.

Note: NCBI has changed the name of their protein search engine from GenPept to Entrez Protein. However, the function names in Bioinformatics Toolbox (getgenpept and genpeptread) are unchanged, representing the still-used GenPept report format.

GenPeptData = genpeptread(File) reads in the GenPept-formatted sequence from File and creates a structure GenPeptData, containing fields corresponding to the GenPept keywords. Each separate sequence listed in File is stored as a separate element of the structure. GenPeptData contains these fields:

    LocusName
    LocusSequenceLength
    LocusMoleculeType
    LocusGenBankDivision
    LocusModificationDate
    Definition
    Accession
    PID
    Version
    GI
    DBSource
    Keywords
    Source
    SourceDatabase
    SourceOrganism
    Reference.Number
    Reference.Authors
    Reference.Title
    Reference.Journal
    Reference.MedLine
    Reference.PubMed
    Reference.Remark
    Comment
    Features
    Weight
    Length
    Sequence

Examples
Get sequence information for the protein coded by the gene HEXA, save to a file, and then read back into MATLAB.

    getgenpept('p06865', 'ToFile', 'TaySachs_Protein.txt')
    genpeptread('TaySachs_Protein.txt')

See Also: Bioinformatics Toolbox functions fastarea
446. port vector machine classifier

Syntax
    SVMStruct = svmtrain(Training, Group)
    SVMStruct = svmtrain(..., 'Kernel_Function', Kernel_FunctionValue, ...)
    SVMStruct = svmtrain(..., 'RBF_Sigma', RBFSigmaValue, ...)
    SVMStruct = svmtrain(..., 'Polyorder', PolyorderValue, ...)
    SVMStruct = svmtrain(..., 'Mlp_Params', Mlp_ParamsValue, ...)
    SVMStruct = svmtrain(..., 'Method', MethodValue, ...)
    SVMStruct = svmtrain(..., 'QuadProg_Opts', QuadProg_OptsValue, ...)
    SVMStruct = svmtrain(..., 'SMO_Opts', SMO_OptsValue, ...)
    SVMStruct = svmtrain(..., 'BoxConstraint', BoxConstraintValue, ...)
    SVMStruct = svmtrain(..., 'Autoscale', AutoscaleValue, ...)
    SVMStruct = svmtrain(..., 'Showplot', ShowplotValue, ...)

Arguments
Training: Matrix of training data, where each row corresponds to an observation or replicate, and each column corresponds to a feature or variable.
Group: Column vector, character array, or cell array of strings for classifying data in Training into two groups. It has the same number of elements as there are rows in Training. Each element specifies the group to which the corresponding row in Training belongs.
Kernel_FunctionValue: String or function handle specifying the kernel function that maps the training data into kernel space. Choices are:
• linear: Default. Linear kernel or dot product.
• quadratic: Quadratic kernel.
• rbf: Gaussian Radial Basis Funct
447. prof 2 685 sptread 2 687 svmclassify 2 689 svmsmoset 2 696 svmtrain 2 700 swalign 2 716 traceplot 2 723 G galread function reference 2 158 gcrma function reference 2 159 gcrmabackadj function reference 2 168 genbankread function reference 2 177 geneentropyfilter function reference 2 179 genelowvalfilter function reference 2 181 geneont function reference 2 183 geneont object reference 5 11 generangefilter function reference 2 186 geneticcode function reference 2 188 genevarfilter function reference 2 190 genpeptread function reference 2 192 geosoftread function reference 2 195 get method reference 4 11 getancestors method biograph object 4 13 geneont object 4 16 getblast function reference 2 197 getbyname method reference 4 20 getcanonical method reference 4 22 getdescendants method biograph object 4 24 Index geneont object 4 27 getedgesbynodeid method reference 4 29 getembl function reference 2 200 getgenbank function reference 2 203 getgenpept function reference 2 206 getgeodata function reference 2 209 gethmmalignment function reference 2 211 gethmmprof function reference 2 215 gethmmtree function reference 2 220 getmatrix biograph method reference 4 31 getmatrix geneont method reference 4 32 getmatrix phytree method reference 4 33 getnewickstr method reference 4 34 getnodesbyid method reference 4 36 getpdb function reference 2 222 getrelatives method biograph object 4 38 geneon
448. putMZ is the concatenated m/z values from the input Peaks. The default is a rough approximation of resolution observed in the input data, Peaks. Tip: To ensure that the resolution of the peaks is preserved, set FWHHValue to half the distance between the two peaks of interest that are closest to each other.
Controls the display of a plot of an original and resampled spectrum. Choices are true, false, or I, an integer specifying the index of a spectrum in Intensities. If set to true, the first spectrum in Intensities is plotted. Default is:
• false: When return values are specified.
• true: When return values are not specified.
Vector of equally spaced, common mass/charge (m/z) values for a set of spectra. The number of elements in the vector equals N, or the number of rows in matrix Intensities.
Matrix of reconstructed intensity values for a set of mass spectra that share the same mass/charge (m/z) range. Each row corresponds to an m/z value, and each column corresponds to a spectrum or retention time. The number of rows equals N, or the number of elements in vector MZ.
msppresample
Description
[MZ, Intensities] = msppresample(Peaks, N) resamples Peaks, a mass spectrometry peak list, by converting centroided peaks to a semicontinuous, raw signal that preserves peak information. The resampled signal has N equally spaced points. Output MZ is a vector of N elements specifying the equally spaced comm
449. quence Length, with the frequency or count of amino acids or nucleotides for every position. Profile can also have 21 or 5 rows if gaps are included in the consensus.
ScoringMatrixValue: Scoring matrix. The default value is BLOSUM50 for amino acid sequences or NUC44 for nucleotide sequences. ScoringMatrix can also be a 21x21, 5x5, 20x20, or 4x4 numeric array. For the gap-included cases, gap scores (last row/column) are set to mean(diag(ScoringMatrix)) for a gap matching with another gap, and set to mean(nodiag(ScoringMatrix)) for a gap matching with another symbol.
Description
CSeq = seqconsensus(Seqs), for a multiply aligned set of sequences (Seqs), returns a string with the consensus sequence (CSeq). The frequency of symbols (20 amino acids, 4 nucleotides) in the set of sequences is determined with the function seqprofile. For ambiguous nucleotide or amino acid symbols, the frequency or count is added to the standard set of symbols.
[CSeq, Score] = seqconsensus(Seqs) returns the conservation score of the consensus sequence. Scores are computed with the scoring matrix BLOSUM50 for amino acids or NUC44 for nucleotides. Scores are the average Euclidean distance between the scored symbol and the M-dimensional consensus value. M is the size of the alphabet. The consensus value is the profile weighted by the scoring matrix.
CSeq = seqconsensus(Profile) returns a string with the consensus sequence (CSeq) from a sequence profile (Prof
450. quences have the same length. Tip: If your input sequences have the same length, seqpdist assumes they are aligned. If they are not aligned, do one of the following:
• Align the sequences before passing them to seqpdist, for example, using the multialign function.
• Set PairwiseAlignment to true when using seqpdist.
JobManagerValue: A jobmanager object, such as returned by the Distributed Computing Toolbox function findResource, that represents an available distributed MATLAB resource. Specifying this property distributes pairwise alignments into a cluster of computers using Distributed Computing Toolbox. You must have Distributed Computing Toolbox to use this property.
WaitInQueueValue: Controls whether seqpdist waits for a distributed MATLAB resource to be available when you have set the JobManager property. Choices are true or false (default). You must have Distributed Computing Toolbox to use this property.
SquareFormValue: Controls the conversion of the output into a square matrix. Choices are true or false (default).
AlphabetValue: String specifying the type of sequence (nucleotide or amino acid). Choices are 'NT' or 'AA' (default).
ScoringMatrixValue: String specifying the scoring matrix to use for the global pairwise alignment. Choices for amino acid sequences are:
• 'PAM40'
• 'PAM250'
• 'DAYH
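To make the vector versus square-matrix output concrete, here is a minimal sketch that computes pairwise distances for equal-length sequences that are assumed to be aligned already. It uses the plain p-distance (fraction of mismatched positions), which is a simplification chosen for the example and not necessarily the distance method seqpdist applies by default; the three sequences are hypothetical.

```matlab
% Hypothetical, equal-length sequences treated as already aligned.
seqs = {'ACGT', 'ACGA', 'TCGA'};
n = numel(seqs);

% Square form: symmetric n-by-n matrix with zeros on the diagonal.
D = zeros(n);
for i = 1:n
    for j = i+1:n
        d = mean(seqs{i} ~= seqs{j});   % p-distance: fraction of mismatches
        D(i, j) = d;
        D(j, i) = d;
    end
end

% Vector form: lower triangle read column by column, that is
% (2,1), (3,1), ..., (n,1), (3,2), ..., (n,n-1).
v = D(find(tril(true(n), -1)))';
```

The vector form stores each pair once, which matters when the number of sequences is large; the square form is easier to index by sequence number.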
451. quences in Seqs. Default ending position is the maximum length of the sequences in Seqs.
Controls the use of small sample correction in the estimation of the number of bits. Choices are true (default) or false.
Cell array containing the symbol list in Seqs and the weight matrix used to graphically display the sequence logo.
seqlogo(Seqs) displays a sequence logo for Seqs, a set of aligned sequences. The logo graphically displays the sequence conservation at a particular position in the alignment of sequences, measured in bits. The maximum sequence conservation per site is log2(4) bits for nucleotide sequences and log2(20) bits for amino acid sequences. If the sequence conservation value is zero or negative, no logo is displayed in that position.
seqlogo(Profile) displays a sequence logo for Profile, a sequence profile distribution matrix with the frequency of nucleotides or amino acids for every column in the multiple alignment, such as returned by the seqprofile function.
Color Code for Nucleotides
Nucleotide    Color
A             Green
C             Blue
G             Yellow
T, U          Red
Other         Purple
Color Code for Amino Acids
Amino Acid    Chemical Property    Color
G S T Y C Q N Polar                Green
A V L I P W F M Hydrophobic        Orange
D E           Acidic               Red
K R H         Basic                Blue
Other         (none)               Tan
DisplayInfo = seqlogo(Seqs) returns a cell array of unique symbols in a sequence (Seqs) and the information weight matrix used to graphica
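The conservation values behind a logo can be sketched as the maximum possible information, log2(4) bits for nucleotides, minus the Shannon entropy of each alignment column. The count matrix below is hypothetical, and the sketch leaves out the small sample correction mentioned above, so it should be read as an illustration of the bit scale and not as the function's exact computation.

```matlab
% Hypothetical count profile: rows are A, C, G, T; columns are positions.
P = [4 1 0;
     0 1 0;
     0 1 2;
     0 1 2];

F = bsxfun(@rdivide, P, sum(P, 1));   % column-wise symbol frequencies
T = F .* log2(F);
T(F == 0) = 0;                        % treat 0*log2(0) as 0
H = -sum(T, 1);                       % Shannon entropy per column, in bits

bits = log2(4) - H;                   % conservation per position
```

A fully conserved column reaches the 2-bit maximum, a column with all four nucleotides equally frequent scores 0 bits, and a column split evenly between two nucleotides scores 1 bit.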
452. r = phytreeread('pf00002.tree')
2 Get the subtree that contains the VIPS and CGRR human proteins.
sel = getbyname(tr,{'vips_human','cgrr_human'});
sel = any(sel,2);
tr = subtree(tr,sel)
view(tr);
See Also
Bioinformatics Toolbox
• functions: phytree (object constructor)
• phytree object methods: get, getbyname, prune, select
topoorder (biograph)
Purpose: Perform topological sort of directed acyclic graph extracted from biograph object
Syntax
order = topoorder(BGObj)
Arguments
BGObj: biograph object created by biograph (object constructor).
Description
Tip: For introductory information on graph theory functions, see "Graph Theory Functions" in the Bioinformatics Toolbox documentation.
order = topoorder(BGObj) returns an index vector with the order of the nodes sorted topologically. In topological order, an edge can exist between a source node u and a destination node v, if and only if u appears before v in the vector order. BGObj is a biograph object from which an N-by-N adjacency matrix is extracted and represents a directed acyclic graph (DAG). In the N-by-N sparse matrix, all nonzero entries indicate the presence of an edge.
References
[1] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002). The Boost Graph Library User Guide and Reference Manual (Upper Saddle River, NJ: Pearson Education).
See Also
Bioinformatics Toolbox functions: biograph (object constructor), graphtopoorder Bi
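The ordering property described for topoorder can be illustrated with Kahn's algorithm applied directly to a sparse adjacency matrix. This is a minimal base-MATLAB sketch on a hypothetical four-node graph; it is one standard way to obtain a topological order and is not claimed to be the method the toolbox uses internally.

```matlab
% Hypothetical DAG with edges 1->2, 1->3, 2->4, 3->4.
A = sparse([1 1 2 3], [2 3 4 4], 1, 4, 4);
n = size(A, 1);

indeg = full(sum(A ~= 0, 1));    % number of incoming edges per node
ready = find(indeg == 0);        % nodes with no unprocessed predecessors
order = zeros(1, 0);

while ~isempty(ready)
    u = ready(1);
    ready(1) = [];
    order(end+1) = u;
    for v = find(A(u, :))        % every successor of u
        indeg(v) = indeg(v) - 1;
        if indeg(v) == 0
            ready(end+1) = v;
        end
    end
end

% If fewer than n nodes were ordered, the graph contains a cycle.
isDag = (numel(order) == n);
```

Every edge in A goes from a node that appears earlier in order to one that appears later, which is the defining property of a topological order.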
453. r w title(jcampStruct.Title); xlabel(data.XUnits); ylabel(data.YUnits);
A figure window opens with the mass spectrum.
[Figure: mass spectrum plot titled 2-Chlorphenol]
See Also
Bioinformatics Toolbox functions: mslowess, mssgolay, msviewer, mzxmlread
joinseq
Purpose: Join two sequences to produce shortest supersequence
Syntax
SeqNT3 = joinseq(SeqNT1, SeqNT2)
Arguments
SeqNT1, SeqNT2: Nucleotide sequences.
Description
SeqNT3 = joinseq(SeqNT1, SeqNT2) creates a new sequence that is the shortest supersequence of SeqNT1 and SeqNT2. If there is no overlap between the sequences, then SeqNT2 is concatenated to the end of SeqNT1. If the length of the overlap is the same at both ends of the sequence, then the overlap at the end of SeqNT1 and the start of SeqNT2 is used to join the sequences. If SeqNT1 is a subsequence of SeqNT2, then SeqNT2 is returned as the shortest supersequence, and vice versa.
Examples
seq1 = 'ACGTAAA';
seq2 = 'AAATGCA';
joined = joinseq(seq1,seq2)
joined =
ACGTAAATGCA
See Also
MATLAB functions: cat, strcat, strfind
knnclassify
Purpose: Classify data using nearest neighbor method
Syntax
Class = knnclassify(Sample, Training, Group)
Class = knnclassify(Sample, Training, Group, k)
Class = knnclassify(Sample, Training, Group
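The main joining rule for joinseq can be sketched in a few lines: find the longest suffix of the first sequence that equals a prefix of the second, then append only the part that does not overlap. This sketch covers just that one case plus plain concatenation when nothing overlaps; the overlap at the other end and the subsequence cases described above are deliberately left out.

```matlab
s1 = 'ACGTAAA';
s2 = 'AAATGCA';

joined = [s1 s2];                          % fallback: no overlap found
for k = min(numel(s1), numel(s2)):-1:1     % try the longest overlap first
    if strcmp(s1(end-k+1:end), s2(1:k))
        joined = [s1 s2(k+1:end)];         % keep the shared part only once
        break
    end
end
```

With the sequences from the manual's example, the shared run is AAA and the result is ACGTAAATGCA.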
454. r string with the characters A, T (U), G, C, and ambiguous characters R, Y, K, M, S, W, B, D, H, V, N, or a vector of integers. You can also enter a structure with the field Sequence.
SeqR: Returns a sequence in the same format as the nucleotide sequence. For example, if SeqNT is an integer sequence, then so is SeqR.
Description
seqreverse calculates the reverse strand of a DNA or RNA sequence.
SeqR = seqreverse(SeqNT) calculates the reverse strand (3' --> 5') of the nucleotide sequence.
Examples
Reverse a nucleotide sequence.
s = 'ATCG'
seqreverse(s)
ans =
GCTA
See Also
Bioinformatics Toolbox functions: seqcomplement, seqrcomplement, seqtool
MATLAB function: fliplr
seqshoworfs
Purpose: Display open reading frames in sequence
Syntax
seqshoworfs(SeqNT)
seqshoworfs(SeqNT, ...'Frames', FramesValue, ...)
seqshoworfs(SeqNT, ...'GeneticCode', GeneticCodeValue, ...)
seqshoworfs(SeqNT, ...'MinimumLength', MinimumLengthValue, ...)
seqshoworfs(SeqNT, ...'AlternativeStartCodons', AlternativeStartCodonsValue, ...)
seqshoworfs(SeqNT, ...'Color', ColorValue, ...)
seqshoworfs(SeqNT, ...'Columns', ColumnsValue, ...)
Arguments
SeqNT: Nucleotide sequence. Enter either a character string with the characters A, T (U), G, C, and ambiguous characters R, Y, K, M, S, W, B, D, H, V, N, or a vector of integers. You can also enter a structure with the field Sequence.
FramesValue: Property to select the frame. Enter 1, 2, 3, -1, -2, -3, enter a vector with integer
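Because the See Also list points to the MATLAB function fliplr, the reversal itself can be sketched with it for both input formats. This only reverses the order of the symbols and does not take the complement; the integer vector is a hypothetical example.

```matlab
% Character sequence, as in the manual's example.
s = 'ATCG';
r = fliplr(s);

% Integer-coded sequence (hypothetical codes); the format is preserved.
si = [1 4 2 3];
ri = fliplr(si);
```

The output keeps the class of the input: a character string stays a string and an integer vector stays a vector.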
455. r Identification Number
Description: Short description of the model
A profile Markov model is a common statistical tool for modeling structured sequences composed of symbols. These symbols include randomness in both the output (emission of symbols) and the state transitions of the process. Markov models are generally represented by state diagrams. The figure shown below is a state diagram for an HMM profile of length 4. Insert, match, and delete states are in the regular part (middle section).
• Match state means that the target sequence is aligned to the profile at the specific location.
• Delete state represents a gap or symbol absence in the target sequence (also known as a silent state because it does not emit any symbol).
• Insert state represents the excess of one or more symbols in the target sequence that are not included in the profile.
Flanking states (S, N, B, E, C, T) are used for proper modeling of the ends of the sequence, either for global, local, or fragment alignment of the profile. S, B, E, and T are silent, while N and C are used to insert symbols at the flanks.
Examples
hmmprofstruct(100, 'Alphabet', 'AA')
See Also
Bioinformatics Toolbox functions: aacount, basecount, gethmmprof, hmmprofalign, hmmprofestimate, hmmprofgenerate, hmmprofmerge, pfamhmmread, showhmmprof
imageneread
Purpose: Read microarray data from ImaGene Results file
Syntax
imagenedata = im
456. r Saddle River, NJ: Pearson Education).
See Also
Bioinformatics Toolbox functions: biograph (object constructor), graphisdag
Bioinformatics Toolbox object: biograph object
Bioinformatics Toolbox methods of a biograph object: allshortestpaths, conncomp, isomorphism, isspantree, maxflow, minspantree, shortestpath, topoorder, traverse
isomorphism (biograph)
Purpose: Find isomorphism between two biograph objects
Syntax
[Isomorphic, Map] = isomorphism(BGObj1, BGObj2)
[Isomorphic, Map] = isomorphism(BGObj1, BGObj2, 'Directed', DirectedValue)
Arguments
BGObj1: biograph object created by biograph (object constructor).
BGObj2: biograph object created by biograph (object constructor).
DirectedValue: Property that indicates whether the graphs are directed or undirected. Enter false when both BGObj1 and BGObj2 produce undirected graphs. In this case, the upper triangles of the sparse matrices extracted from BGObj1 and BGObj2 are ignored. Default is true, meaning that both graphs are directed.
Description
Tip: For introductory information on graph theory functions, see "Graph Theory Functions" in the Bioinformatics Toolbox documentation.
[Isomorphic, Map] = isomorphism(BGObj1, BGObj2) returns logical 1 (true) in Isomorphic if two N-by-N adjacency matrices extracted from biograph objects BGObj1 and BGObj2 are isomorphic graphs, and logical 0 (false) otherwise. A graph isomorphism is a 1-to-1 mapping of the node
457. r that feature. QualifiersValue is a cell array of strings. By default, QualifiersValue = {'gene', 'product', 'locus_tag', 'note', 'db_xref', 'protein_id'}. Provide your own QualifiersValue to limit or expand the list of qualifiers or change the search order.
Tip: Set QualifiersValue = {} to create a map with no annotations.
Tip: To determine all qualifiers available for a given feature, do either of the following:
• Create the map, and then click a feature or its annotation to list all qualifiers for that feature.
• Use the featuresparse command to parse all the features into a new structure, and then use the fieldnames command to list the qualifiers for a specific feature. See "Determining Qualifiers for a Specific Feature".
featuresmap(..., 'ShowPositions', ShowPositionsValue, ...) lets you add the sequence position to the annotation label. If ShowPositionsValue is true, sequence positions are added to the annotation labels. Default is false.
[Figure: features map titled "Human Mitochondrion, Complete Genome", with annotated features such as the D-loop, tRNA-Phe, 12S ribosomal RNA, tRNA-Val, and 16S ribosomal RNA]
458. r the electron transport heme protein that has a PDB identifier of 5CYT, read the information into a MATLAB structure, pdbstruct, and save the information to a PDB-formatted file, electron_transport.pdb, in the MATLAB Current Directory.
pdbstruct = getpdb('5CYT', 'ToFile', 'electron_transport.pdb')
See Also
Bioinformatics Toolbox functions: getembl, getgenbank, getgenpept, molviewer, pdbdistplot, pdbread, pdbwrite
goannotread
Purpose: Annotations from Gene Ontology annotated file
Syntax
Annotation = goannotread('File')
Arguments
File
Description
Annotation = goannotread('File') converts the contents of a Gene Ontology annotated file (File) into an array of structs (Annotation). Files should have the structure specified in http://www.geneontology.org/GO.annotation.shtml#file
A list with some annotated files can be found at http://www.geneontology.org/GO.current.annotations.shtml
Examples
1 Open a Web browser to http://www.geneontology.org/GO.current.annotations.shtml
2 Download the file containing GO annotations for the gene products of Saccharomyces cerevisiae (gene_association.sgd.gz) to your MATLAB Current Directory.
3 Uncompress the file using the gunzip function.
gunzip('gene_association.sgd.gz')
4 Read the file into MATLAB.
SGDGenes = goannotread('gene_association.sgd');
5 Create a structure with GO annotations and get a list of genes.
S = struct2cell(SGDGenes);
genes = S(3
459. rate, hmmprofstruct, pfamhmmread
sptread
Purpose: Read data from SPOT file
Syntax
SPOTData = sptread(File)
SPOTData = sptread(File, 'CleanColNames', CleanColNamesValue)
Arguments
File: Either of the following:
• String specifying a file name, a path and file name, or a URL pointing to a file. The referenced file is a SPOT-formatted file (ASCII text file). If you specify only a file name, that file must be on the MATLAB search path or in the MATLAB Current Directory.
• MATLAB character array that contains the text of a SPOT-formatted file.
CleanColNamesValue: Property to control using valid MATLAB variable names.
Description
SPOTData = sptread(File) reads a SPOT-formatted file (File) and creates a MATLAB structure (SPOTData) containing the following fields: Header, Data, Blocks, Columns, Rows, IDs, ColumnNames, Indices, Shape.
SPOTData = sptread(File, 'CleanColNames', CleanColNamesValue). The column names in the SPOT file contain periods and some characters that cannot be used in MATLAB variable names. If you plan to use the column names as variable names in a function, use this option with CleanColNames set to true, and the function returns the field ColumnNames with valid variable names.
The Indices field of the structure includes the MATLAB indices that you can use for plotting heat maps of the data.
Examples
1 Read in a sample SPOT file and plot the median foreground
460. rd column indicates if the probe is a perfect match (1) or mismatch (-1) probe.
Note: Affymetrix probe pair indexing is 0-based, while MATLAB indexing is 1-based. The output from probelibraryinfo is 1-based.
Examples
1 Get the file Drosophila-121502.cel from http://www.affymetrix.com/support/technical/sample_data/demo_data.affx
2 Read the data into MATLAB.
CELStruct = affyread('Drosophila-121502.cel');
CDFStruct = affyread('D:\Affymetrix\LibFiles\DrosGenome1\DrosGenome1.CDF');
3 Extract probe set library information.
ProbeInfo = probelibraryinfo(CELStruct, CDFStruct);
4 Find out the probe set to which the 1104th probe belongs.
CDFStruct.ProbeSets(ProbeInfo(1104,1)).Name
See Also
Bioinformatics Toolbox functions: affyread, celintensityread, probesetlink, probesetlookup, probesetvalues
probesetlink
Purpose: Link to NetAffx Web site
Syntax
probesetlink(AFFYStruct, ID)
URL = probesetlink(AFFYStruct, ID)
probesetlink(..., 'PropertyName', PropertyValue, ...)
probesetlink(..., 'Source', SourceValue)
probesetlink(..., 'Browser', BrowserValue)
probesetlink(..., 'NoDisplay', NoDisplayValue)
Description
probesetlink(AFFYStruct, ID) displays information from the NetAffx Web site about a probe set (ID) from the CHP or CDF structure (AFFYStruct). ID can be the index of the probe set or the probe set name.
URL = probesetlink(AFFYStruct, ID) returns the URL for
461. read('pf00002.tree')
plot(tr,'Type','radial')
Graph element properties can be modified as follows:
h = get(gcf,'UserData')
set(h.branchNodeLabels,'FontSize',6,'Color',[.5 .5 .5])
See Also
Bioinformatics Toolbox
• functions: phytree (object constructor), phytreeread, phytreetool, seqlinkage
• phytree object method: view
prune (phytree)
Purpose: Remove branch nodes from phylogenetic tree
Syntax
T2 = prune(T1, Nodes)
T2 = prune(T1, Nodes, 'Mode', 'Exclusive')
Arguments
T1: Phylogenetic object created with the phytree constructor function.
Nodes: Nodes to remove from tree.
Mode: Property to control the method of pruning. Enter either 'Inclusive' or 'Exclusive'. The default value is 'Inclusive'.
Description
T2 = prune(T1, Nodes) removes the nodes listed in the vector Nodes from the tree T1. prune removes any branch or leaf node listed in Nodes and all their descendants from the tree T1, and returns the modified tree T2. The parent nodes are connected to the brothers as required. Nodes in the tree are labeled as [1:numLeaves] for the leaves and as [numLeaves+1:numLeaves+numBranches] for the branches. Nodes can also be a logical array of size [numLeaves+numBranches x 1] indicating the nodes to be removed.
T2 = prune(T1, Nodes, 'Mode', 'Exclusive') changes the property (Mode) for pruning to 'Exclusive' and removes only the descendants of the nodes listed
462. reate a sequence.
Seq = 'MATLAB'
2 Count the amino acids in the sequence.
AA = aacount(Seq)
Warning: Symbols other than the standard 20 amino acids appear in the sequence.
AA =
A: 2  R: 0  N: 0  D: 0  C: 0  Q: 0  E: 0  G: 0  H: 0  I: 0
L: 1  K: 0  M: 1  F: 0  P: 0  S: 0  T: 1  W: 0  Y: 0  V: 0
Others: 1
3 Get the count for alanine (A) residues.
AA.A
ans =
2
See Also
Bioinformatics Toolbox functions: aminolookup, atomiccomp, basecount, codoncount, dimercount, isoelectric, molweight, proteinplot, seqtool
affyinvarsetnorm
Purpose: Perform rank invariant set normalization on probe intensities from multiple Affymetrix CEL or DAT files
Syntax
NormData = affyinvarsetnorm(Data)
[NormData, MedStructure] = affyinvarsetnorm(Data)
... = affyinvarsetnorm(..., 'Baseline', BaselineValue, ...)
... = affyinvarsetnorm(..., 'Thresholds', ThresholdsValue, ...)
... = affyinvarsetnorm(..., 'StopPrctile', StopPrctileValue, ...)
... = affyinvarsetnorm(..., 'RayPrctile', RayPrctileValue, ...)
... = affyinvarsetnorm(..., 'Method', MethodValue, ...)
... = affyinvarsetnorm(..., 'Showplot', ShowplotValue, ...)
Arguments
Data: Matrix of intensity values where each row corresponds to a perfect match (PM) probe and each column corresponds to an Affymetrix CEL or DAT file. (Each CEL or DAT file is generated from a separate chip. All chips should be of the same type
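The counts in the aacount example can be reproduced with a short base-MATLAB sketch: count each of the 20 standard one-letter codes and put everything else into an Others total. The field order follows the output shown above; storing the result in a plain vector instead of a structure is a simplification for this example.

```matlab
seq = 'MATLAB';
aa = 'ARNDCQEGHILKMFPSTWYV';      % the 20 standard amino acid codes

counts = zeros(1, numel(aa));
for k = 1:numel(aa)
    counts(k) = sum(upper(seq) == aa(k));
end

% Symbols outside the standard alphabet (here the letter B).
others = numel(seq) - sum(counts);

countA = counts(aa == 'A');       % count for alanine
```

The letter B is an ambiguous amino acid code, which is why the example reports one symbol under Others and triggers the warning.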
463. ree
Analyze and visualize microarray data with t tests, spatial plots, box plots, loglog plots, and intensity ratio plots.
Normalize microarray data with lowess and mean normalization functions; filter raw data for cleanup before analysis.
Classify and identify features in data sets, set up cross-validation experiments, and compare different classification methods.
Read data from common mass spectrometry file formats, preprocess raw mass spectrometry data from instruments, and analyze spectra to identify patterns and compounds.
Create biograph object.
Create geneont object.
Create phytree object.
Functions, By Category
Data Formats and Databases
affyprobeseqread: Read data file containing probe sequence information for Affymetrix GeneChip array
affyread: Read microarray data from Affymetrix GeneChip file (Windows 32)
agferead: Read Agilent Feature Extraction Software file
blastread: Read data from NCBI BLAST report file
celintensityread: Read probe intensities from Affymetrix CEL files (Windows 32)
emblread: Read data from EMBL file
fastaread: Read data from FASTA file
fastawrite: Write to file using FASTA format
galread: Read microarray data from GenePix array list file
genbankread: Read data from GenBank file
genpeptread: Read data from GenPept file
geosoftread: Read Gene Expression Omnibus (GEO) SOFT format da
464. ree (object constructor), phytreeread
• phytree object methods: getbyname, select, subtree
getdescendants (biograph)
Purpose: Find descendants in biograph object
Syntax
Nodes = getdescendants(BiographNode)
Nodes = getdescendants(BiographNode, NumGenerations)
Arguments
BiographNode: Node in a biograph object.
NumGenerations: Number of generations. Enter a positive integer.
Description
Nodes = getdescendants(BiographNode) finds a given node (BiographNode) and all of its direct descendants.
Nodes = getdescendants(BiographNode, NumGenerations) finds the node (BiographNode) and all of its direct descendants up to a specified number of generations (NumGenerations).
Examples
1 Create a biograph object.
cm = [0 1 1 0 0;1 0 0 1 1;1 0 0 0 0;0 0 0 0 1;1 0 1 0 0];
bg = biograph(cm)
2 Find one generation of descendants for node 4.
desNodes = getdescendants(bg.nodes(4));
set(desNodes,'Color',[1 .7 .7]);
bg.view;
[Figure: Biograph Viewer window with node 4 and its first-generation descendants highlighted]
3 Find two generations of descendants for node 4.
desNodes = getdescendants(bg.nodes(4),2);
set(desNodes,'Color',[.7 1 .7]);
bg.view;
[Figure: Biograph Viewer window with node 4 and two generations of descendants highlighted]
See Also
Bioinformatics Toolbox function: biograph (object constructor)
Bioinformatics Toolbox object: biograph object
Bioinformatics Toolbox methods of a biograph object: getancestors, getdescenda
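The generation-limited search can be sketched directly on a connection matrix with a breadth-first expansion, one generation per loop pass. The matrix below is the 5-by-5 example matrix as reconstructed from the garbled listing, so treat its entries as an assumption; the sketch returns node indices, whereas the toolbox method returns node objects.

```matlab
% Connection matrix: cm(i,j) nonzero means an edge from node i to node j.
cm = [0 1 1 0 0;
      1 0 0 1 1;
      1 0 0 0 0;
      0 0 0 0 1;
      1 0 1 0 0];

start = 4;
numGen = 2;

found = false(1, size(cm, 1));
found(start) = true;              % the starting node is part of the result
frontier = start;
for g = 1:numGen
    % Children of the current frontier that were not reached before.
    next = find(any(cm(frontier, :) ~= 0, 1) & ~found);
    found(next) = true;
    frontier = next;
end
nodes = find(found);
```

With numGen set to 1 the result is nodes 4 and 5; with numGen set to 2 the search also reaches nodes 1 and 3 through node 5.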
465. references
2 Scroll down to locate the property you are interested in studying.
Working with Properties
When you click a property, a smoothed plot of the property values along the sequence is displayed. Multiple properties can be selected from the list by holding down Shift or Ctrl while selecting properties. When two properties are selected, the plots are displayed using a PLOTYY-style layout, with one y-axis on the left and one on the right. For all other selections, a single y-axis is displayed. When displaying one or two properties, the y values displayed are the actual property values. When three or more properties are displayed, the values are normalized to the range 0-1.
You can add your own property values by clicking the Add button next to the property list. This opens a dialog that allows you to specify the values for each of the amino acids. The Display Text box allows you to specify the text that is displayed in the selection box on the main proteinplot window. You can also save the property values to an M-file for future use by typing a file name into the Filename box.
The Terminal Selection boxes allow you to choose to plot only part of the sequence. By default, all of the sequence is plotted. The default smoothing method is an unweighted linear moving average with a window length of five residues. You can change this using the Configuration Values dialog from the Edit menu. The dialog allows you to selec
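The default smoothing described above, an unweighted moving average over five residues, can be sketched with a single convolution. The property values are hypothetical, and the 'valid' option simply drops positions where the window would run past the ends of the sequence; how proteinplot itself treats the ends is not stated here, so that choice is an assumption.

```matlab
% Hypothetical per-residue property values along a short sequence.
vals = [1 2 3 4 5 6 7];

win = 5;                          % window length in residues
kernel = ones(1, win) / win;      % equal weight for every residue in the window

% Keep only positions where the full window fits inside the sequence.
smoothed = conv(vals, kernel, 'valid');
```

Each output value is the mean of five consecutive inputs, so a sequence of length 7 yields 3 smoothed values.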
466. reordering, renaming, and distance exploring. It can also open or save Newick-formatted files.
phytreetool(Tree) loads data from a phytree object in the MATLAB workspace into the GUI.
phytreetool(File) loads data from a Newick-formatted file into the GUI.
Examples
tr = phytreeread('pf00002.tree')
phytreetool(tr)
[Figure: Phylogenetic Tree Tool window showing the pf00002 tree]
See Also
Bioinformatics Toolbox functions: phytree (object constructor), phytreeread, phytreewrite
Bioinformatics Toolbox methods of phytree object: plot, view
phytreewrite
Purpose: Write phylogenetic tree object to Newick-formatted file
Syntax
phytreewrite('File', Tree)
phytreewrite(Tree)
Arguments
File: Newick-formatted file. Enter either a file name or a path and file name supported by your operating system (ASCII text file).
Tree: Phylogenetic tree object, either created with phytree (object constructor function) or imported using the phytreeread function.
Description
phytreewrite('File', Tree) copies the contents of a phytree object from the MATLAB workspace to a file. Data in the file uses th
467. ries in matrix G represent the weights of the edges.
S: Node in G.
T: Node in G.
DirectedValue: Property that indicates whether the graph is directed or undirected. Enter false for an undirected graph. This results in the upper triangle of the sparse matrix being ignored. Default is true.
MethodValue: String that specifies the algorithm used to find the shortest path. Choices are:
• 'Bellman-Ford': Assumes weights of the edges to be nonzero entries in sparse matrix G. Time complexity is O(N*E), where N and E are the number of nodes and edges respectively.
• 'BFS': Breadth-first search. Assumes all weights to be equal, and nonzero entries in sparse matrix G to represent edges. Time complexity is O(N+E), where N and E are the number of nodes and edges respectively.
• 'Acyclic': Assumes G to be a directed acyclic graph and that weights of the edges are nonzero entries in sparse matrix G. Time complexity is O(N+E), where N and E are the number of nodes and edges respectively.
• 'Dijkstra': Default algorithm. Assumes weights of the edges to be positive values in sparse matrix G. Time complexity is O(log(N)*E), where N and E are the number of nodes and edges respectively.
WeightsValue: Column vector that specifies custom weights for the edges in matrix G. It must have one entry for every nonzero value (edge) in matrix G. The order of the custom weights in the vector must match the order of the n
468. ription
NUC44 scoring matrix for nucleotide sequences
ScoringMatrix = nuc44
[ScoringMatrix, MatrixInfo] = nuc44
ScoringMatrix = nuc44 returns the scoring matrix. The nuc44 scoring matrix uses ambiguous nucleotide codes and probabilities rounded to the nearest integer.
Scale = 0.277316
Expected score = -1.7495024, Entropy = 0.5164710 bits
Lowest score = -4, Highest score = 5
Order: A C G T R Y K M S W B D H V N
[ScoringMatrix, MatrixInfo] = nuc44 returns a structure with information about the matrix with fields Name and Order.
num2goid
Purpose: Convert numbers to Gene Ontology IDs
Syntax
GOIDs = num2goid(X)
Description
GOIDs = num2goid(X) converts the numbers in X to strings with Gene Ontology IDs. IDs are a 7-digit number preceded by the prefix 'GO:'.
Examples
Get the Gene Ontology IDs of the following numbers.
t = [5575 5622 5623 5737 5840 30529 43226 43228 43229 43232 43234];
ids = num2goid(t)
See Also
Bioinformatics Toolbox functions: geneont (object constructor), goannotread
Bioinformatics Toolbox methods of geneont object: getancestors, getdescendants, getmatrix, getrelatives
nwalign
Purpose: Globally align two sequences using Needleman-Wunsch algorithm
Syntax
Score = nwalign(Seq1, Seq2)
[Score, Alignment] = nwalign(Seq1, Seq2)
[Score, Alignment, Start] = nwalign(Seq1, Seq2)
... = nwalign(Seq1, Seq2, ...'Alphabet', AlphabetValue)
... = nwalign(Seq1, Seq2
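The ID format described for num2goid, a 7-digit number after the prefix GO:, amounts to zero-padded formatting. A minimal sketch with sprintf, using three of the numbers from the example, looks like this; returning a cell array of strings is an assumption about the output container.

```matlab
t = [5575 5622 30529];

% Zero-pad each number to 7 digits and add the 'GO:' prefix.
ids = arrayfun(@(x) sprintf('GO:%07d', x), t, 'UniformOutput', false);
```

For example, 5575 becomes GO:0005575 and 30529 becomes GO:0030529.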
469. ription
Find matches for every string in library
Index = seqmatch(Strings, Library)
Index = seqmatch(Strings, Library) looks through the elements of Library to find strings that begin with every string in Strings. Index contains the index to the first occurrence for every string in the query. Strings and Library must be cell arrays of strings.
Examples
lib = {'VIPS_HUMAN','SCCR_RABIT','CALR_PIG','VIPR_RAT','PACR_MOUSE'};
query = {'CALR','VIP'};
h = seqmatch(query,lib);
lib(h)
See Also
MATLAB functions: regexp, strmatch
seqneighjoin
Purpose: Neighbor-joining method for phylogenetic tree reconstruction
Syntax
Tree = seqneighjoin(Dist)
Tree = seqneighjoin(Dist, Method)
Tree = seqneighjoin(Dist, Method, Names)
seqneighjoin(..., 'PropertyName', PropertyValue, ...)
seqneighjoin(..., 'Reroot', RerootValue)
Arguments
Dist: Matrix or vector returned by the seqpdist function.
Method: Method to compute the distances between nodes. Enter 'equivar' (default), 'firstorder', or 'average'.
Names: Vector of structures with the fields Header, Name, or a cell array of strings. In all cases the number of elements must equal the number of samples used to generate the pairwise distances in Dist.
Description
Tree = seqneighjoin(Dist) computes a phylogenetic tree object from pairwise distances (Dist) between the species or products using the neighbor-joining method.
Tree = seqneighjoin(Dist, Method)
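The prefix search described for seqmatch can be sketched with strncmp, which compares only the first n characters of each library entry. Returning 0 for a query with no match is a choice made for this sketch; the listing does not say how the function reports a miss.

```matlab
lib = {'VIPS_HUMAN', 'SCCR_RABIT', 'CALR_PIG', 'VIPR_RAT', 'PACR_MOUSE'};
query = {'CALR', 'VIP'};

idx = zeros(size(query));         % 0 marks "no match" (assumption)
for k = 1:numel(query)
    % First library entry that begins with the query string.
    hit = find(strncmp(query{k}, lib, numel(query{k})), 1);
    if ~isempty(hit)
        idx(k) = hit;
    end
end

matches = lib(idx(idx > 0));
```

VIP matches both VIPS_HUMAN and VIPR_RAT, and only the first occurrence (entry 1) is reported.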
470. ription is the sequence as a series of lines with fewer than 80 characters. Sequences are expected to use the standard IUB/IUPAC amino acid and nucleotide letter codes. For a list of codes, see aminolookup and baselookup.
FASTAData = fastaread(File) reads a file with a FASTA format and returns the data in a structure. FASTAData.Header is the header information, while FASTAData.Sequence is the sequence stored as a string of letters.
[Header, Sequence] = fastaread(File) reads data from a file into separate variables. If the file contains more than one sequence, then header and sequence are cell arrays of header and sequence information.
fastaread(..., 'PropertyName', PropertyValue, ...) defines optional properties. The property name/value pairs can be in any format supported by the function set (for example, name-value string pairs, structures, and name-value cell array pairs).
fastaread(..., 'IgnoreGaps', IgnoreGapsValue, ...), when IgnoreGapsValue is true, removes any gap symbol ('-' or '.') from the sequences. Default is false.
fastaread(..., 'Blockread', BlockreadValue, ...) lets you read in a single entry or block of entries from a file containing multiple sequences. If BlockreadValue is a scalar N, then fastaread reads the Nth entry in the file. If BlockreadValue is a 1-by-2 vector [M1, M2], then fastaread reads the block of entries starting at entry M1 and ending at entry M2. To read all remaining entries in the file starting at entry
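The header and sequence layout described above, together with the IgnoreGaps and Blockread behaviors, can be sketched with a small parser that works on FASTA text held in a string. The text, the variable names, and the block [2 2] are hypothetical, and the sketch assumes the text starts with a header line.

```matlab
% Hypothetical FASTA text with two entries; the first contains a gap symbol.
txt = sprintf('>seq1 first\nACGT\nAC-T\n>seq2 second\nGGCC\n');

lines = regexp(txt, '\n', 'split');
headers = {};
seqs = {};
for k = 1:numel(lines)
    ln = strtrim(lines{k});
    if isempty(ln)
        continue
    end
    if ln(1) == '>'
        headers{end+1} = ln(2:end);      % header text without the '>'
        seqs{end+1} = '';
    else
        seqs{end} = [seqs{end} ln];      % sequence lines are concatenated
    end
end

% IgnoreGaps-style cleanup: remove '-' and '.' symbols.
seqsNoGaps = regexprep(seqs, '[-.]', '');

% Blockread-style selection of entries M1 through M2.
block = [2 2];
blockSeqs = seqsNoGaps(block(1):block(2));
```

Because there is more than one entry, headers and sequences come back as cell arrays, which mirrors the multi-sequence case described above.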
471. rix G when it is traversed column-wise. This property lets you use zero-valued weights. By default, graphshortestpath gets weight information from the nonzero entries in matrix G.
Examples
Finding the Shortest Path in a Directed Graph
1 Create and view a directed graph with 6 nodes and 11 edges.
W = [.41 .99 .51 .32 .15 .45 .38 .32 .36 .29 .21];
DG = sparse([6 1 2 2 3 4 4 5 5 6 1],[2 6 3 5 4 1 6 3 4 3 5],W)
DG =
(4,1) 0.4500
(6,2) 0.4100
(2,3) 0.5100
(5,3) 0.3200
(6,3) 0.2900
(3,4) 0.1500
(5,4) 0.3600
(1,5) 0.2100
(2,5) 0.3200
(1,6) 0.9900
(4,6) 0.3800
h = view(biograph(DG,[],'ShowWeights','on'))
Biograph object with 6 nodes and 11 edges.
[Figure: Biograph Viewer window showing the directed graph with edge weights]
2 Find the shortest path in the graph from node 1 to node 6.
[dist,path,pred] = graphshortestpath(DG,1,6)
dist =
0.9500
path =
1 5 4 6
pred =
0 6 5 5 1 4
3 Mark the nodes and edges of the shortest path by coloring them red and increasing the line width.
set(h.Nodes(path),'Color',[1 0.4 0.4])
edges = getedgesbynodeid(h,get(h.Nodes(path),'ID'));
set(edges,'LineColor',[1 0 0])
set(edges,'LineWidth',1.5)
[Figure: Biograph Viewer window with the shortest path highlighted]
Finding the Shortest Path in an Undirected Graph
1 Create and view an undirected graph with 6 nodes and 11 edges.
UG = tril(DG + DG'
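The directed-graph result above can be checked with a short base-MATLAB version of Dijkstra's algorithm that uses the same weights and sparse matrix. This is a plain array-scan sketch written for readability, not the toolbox implementation and not the heap-based variant, and it assumes the target node is reachable from the source.

```matlab
W = [.41 .99 .51 .32 .15 .45 .38 .32 .36 .29 .21];
DG = sparse([6 1 2 2 3 4 4 5 5 6 1], [2 6 3 5 4 1 6 3 4 3 5], W);
n = size(DG, 1);
src = 1;
dst = 6;

dist = inf(1, n);
dist(src) = 0;
pred = zeros(1, n);                % predecessor of each node, 0 for none
done = false(1, n);

for step = 1:n
    cand = dist;
    cand(done) = inf;              % ignore nodes that are already final
    [d, u] = min(cand);
    if isinf(d)
        break                      % remaining nodes are unreachable
    end
    done(u) = true;
    [~, nbrs, w] = find(DG(u, :)); % outgoing edges of u and their weights
    for k = 1:numel(nbrs)
        v = nbrs(k);
        if d + w(k) < dist(v)
            dist(v) = d + w(k);
            pred(v) = u;
        end
    end
end

% Walk the predecessors back from the target (assumes dst is reachable).
path = dst;
while path(1) ~= src
    path = [pred(path(1)) path];
end
```

The sketch reproduces the values listed in the example: a distance of 0.95 along the path 1, 5, 4, 6, with predecessor vector 0 6 5 5 1 4.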
472. rkersValue
Value specifying a quantile of the ion intensity values to fall below the midpoint of the color map, meaning they do not represent peaks. msheatmap uses a custom color map where cool colors represent nonpeak regions, white represents the midpoint, and warm colors represent peaks. Choices are any value > 0 and < 1. Default is:
• 0.99: For LC/MS or GC/MS data or when input T is provided. This means that 1% of the pixels are warm colors and represent peaks.
• 0.95: For non-LC/MS or non-GC/MS data or when input T is not provided. This means that 5% of the pixels are warm colors and represent peaks.
Tip: You can also change the midpoint interactively after creating the heat map by right-clicking the color bar, selecting Interactive Colormap Shift, and then click-dragging the cursor vertically on the color bar. This technique is useful when comparing multiple heat maps.
1-by-2 vector specifying the m/z range for the x-axis of the heat map. RangeValue must be within [min(MZ) max(MZ)]. Default is the full range [min(MZ) max(MZ)].
Vector of m/z values to mark on the top horizontal axis of the heat map. Default is [].
SpecIdxValue: Either of the following:
• Vector of values with the same number of elements as columns (spectra) in the matrix Intensities.
• Cell array of strings with the same number of elements as columns (spectra) in the matrix Intensities.
Each value or string specifies a l
473. rmabackadj, rmasummary

ramachandran

Purpose: Draw Ramachandran plot for Protein Data Bank (PDB) data

Syntax:
    ramachandran('PDBid')
    ramachandran('File')
    ramachandran(PDBData)
    Angles = ramachandran(...)
    [Angles, Handle] = ramachandran(...)

Arguments:
PDBid: Unique identifier for a protein structure record. Each structure in the PDB is represented by a 4-character alphanumeric identifier. For example, 4hhb is the identification code for hemoglobin.
File: Protein Data Bank (PDB) formatted file (ASCII text file). Enter a file name, a path and file name, or a URL pointing to a file. File can also be a MATLAB character array that contains the text for a PDB file.
PDBData: MATLAB structure with PDB formatted data.

Description: ramachandran generates a plot of the torsion angle PHI (torsion angle between the C-N-CA-C atoms) and the torsion angle PSI (torsion angle between the N-CA-C-N atoms) of the protein sequence.

ramachandran('PDBid') generates the Ramachandran plot for the protein with PDB code PDBid.

ramachandran('File') generates the Ramachandran plot for the protein stored in the PDB file File.

ramachandran(PDBData) generates the Ramachandran plot for the protein stored in the structure PDBData, where PDBData is a MATLAB structure obtained by using pdbread or getpdb.

Angles = ramachandran(...) returns an array of the torsion angles PHI, PSI, and OMEGA for the residue sequence.

[Angles, Handle] = ramacha
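PHI and PSI above are each defined by four consecutive backbone atoms. As an illustration of what such a torsion angle is, the sketch below computes the signed angle between the plane through the first three points and the plane through the last three, in base MATLAB. The coordinates are hypothetical and are not taken from any PDB record, and this is not necessarily how ramachandran computes its angles internally.

```matlab
% Hypothetical coordinates (in angstroms) of four consecutive atoms.
p1 = [1 0 0];
p2 = [0 0 0];
p3 = [0 1 0];
p4 = [0 1 1];

% Bond vectors between successive atoms.
b1 = p2 - p1;
b2 = p3 - p2;
b3 = p4 - p3;

% Normals of the planes (p1,p2,p3) and (p2,p3,p4).
n1 = cross(b1, b2);
n2 = cross(b2, b3);

% Signed angle between the two planes, measured about the central bond b2.
x = dot(n1, n2);
y = dot(cross(n1, n2), b2) / norm(b2);
torsionDeg = atan2(y, x) * 180 / pi;
```

For these four points the two planes are perpendicular, so the magnitude of torsionDeg is 90 degrees.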
474. rmalization ensures comparability among different features, although it is not always necessary because the selected criterion might already account for this. Choices are:
• 'none' (default): Intensities are not cross-normalized.
• 'meanvar': x_new = (x - mean(x))/std(x)
• 'softmax': x_new = (1 + exp((mean(x) - x)/std(x)))^-1
• 'minmax': x_new = (x - min(x))/(max(x) - min(x))

Examples

1. Find a reduced set of genes that is sufficient for differentiating breast cancer cells from all other types of cancer in the t-matrix NCI60 data set. Load sample data.

    load NCI60tmatrix

2. Get a logical index vector to the breast cancer cells.

    BC = GROUP == 8;

3. Select features.

    I = rankfeatures(X,BC,'NumberOfIndices',12);

4. Test features with a linear discriminant classifier.

    C = classify(X(I,:)',X(I,:)',double(BC));
    cp = classperf(BC,C);
    cp.CorrectRate

    ans =

5. Use cross-correlation weighting to further reduce the required number of genes.

    I = rankfeatures(X,BC,'CCWeighting',0.7,'NumberOfIndices',8);
    C = classify(X(I,:)',X(I,:)',double(BC));
    cp = classperf(BC,C);
    cp.CorrectRate

    ans =

6. Find the discriminant peaks of two groups of signals with Gaussian pulses modulated by two different sources.

    load GaussianPulses
    f = rankfeatures(y',grp,'NWeighting',@(x) x/10+5,'NumberOfIndices',5);
    plot(t,y(grp==1,:),'b',t,y(grp==2,:),'g',t(f),1.35,'vr')

[Figure: plot of the two groups of signals with the selected features marked]
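The three cross-normalization formulas listed above can be tried directly in base MATLAB. This is a small sketch on a hypothetical intensity vector x; the variable names are illustrative and the code only restates the formulas, it does not call rankfeatures.

```matlab
x = [2 4 4 4 5 5 7 9];          % hypothetical intensities for one feature

% 'meanvar': zero mean, unit standard deviation.
xMeanvar = (x - mean(x)) ./ std(x);

% 'softmax': logistic squashing of the standardized values into (0,1).
xSoftmax = 1 ./ (1 + exp((mean(x) - x) ./ std(x)));

% 'minmax': linear rescaling onto [0,1].
xMinmax  = (x - min(x)) ./ (max(x) - min(x));
```

Note that 'softmax' preserves the ordering of the values (larger x gives a larger normalized value) while limiting the influence of outliers.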
475. rogram, 'PropertyName', PropertyValue, ...)

Property name/property value pairs:
    'Database', DatabaseValue
    'Descriptions', DescriptionsValue
    'Alignments', AlignmentsValue
    'Filter', FilterValue
    'Expect', ExpectValue
    'Word', WordValue
    'Matrix', MatrixValue
    'GapOpen', GapOpenValue
    'ExtendGap', ExtendGapValue
    'Inclusion', InclusionValue
    'Pct', PctValue

Arguments:

Nucleotide or amino acid sequence. Enter a GenBank or RefSeq accession number, GI, FASTA file, URL, string, character array, or a MATLAB structure that contains the field Sequence.

BLAST program. Enter 'blastn', 'blastp', 'psiblast', 'blastx', 'tblastn', 'tblastx', or 'megablast'.

DatabaseValue: Property to select a database. Compatible databases depend upon the type of sequence submitted and program selected. The nonredundant database, 'nr', is the default value for both nucleotide and amino acid sequences.
For nucleotide sequences, enter 'nr', 'est', 'est_human', 'est_mouse', 'est_others', 'gss', 'htgs', 'pat', 'pdb', 'month', 'alu_repeats', 'dbsts', 'chromosome', 'wgs', 'refseq_rna', 'refseq_genomic', or 'env_nt'. The default value is 'nr'.
For amino acid sequences, enter 'nr', 'swissprot', 'pat', 'pdb', 'month', 'refseq_protein', or 'env_nr'. The default value is 'nr'.

DescriptionsValue: Property to specify the
476. rs (biograph)

Purpose: Find ancestors in biograph object

Syntax:
    Nodes = getancestors(BiographNode)
    Nodes = getancestors(BiographNode, NumGenerations)

Arguments:
BiographNode: Node in a biograph object.
NumGenerations: Number of generations. Enter a positive integer.

Description: Nodes = getancestors(BiographNode) returns a node (BiographNode) and all of its direct ancestors.

Nodes = getancestors(BiographNode, NumGenerations) finds the node (BiographNode) and its direct ancestors up to a specified number of generations (NumGenerations).

Examples

1. Create a biograph object.

    cm = [0 1 1 0 0;1 0 0 1 1;1 0 0 0 0;0 0 0 0 1;1 0 1 0 0];
    bg = biograph(cm)

2. Find one generation of ancestors for node 2.

    ancNodes = getancestors(bg.nodes(2));
    set(ancNodes,'Color',[1 .7 .7]);
    bg.view;

[Figure: Biograph Viewer 1, node 2 and its one-generation ancestors highlighted]

3. Find two generations of ancestors for node 2.

    ancNodes = getancestors(bg.nodes(2),2);
    set(ancNodes,'Color',[.7 1 .7]);
    bg.view;

[Figure: Biograph Viewer 2, node 2 and its two-generation ancestors highlighted]

See Also
Bioinformatics Toolbox function: biograph (object constructor)
Bioinformatics Toolbox object: biograph object
Bioinformatics Toolbox methods of a biograph object: dolayout, getancestors, getdescendants, getedgesbynodeid, getnodesbyid, getrelatives, view
MATLAB functions
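The ancestor search described above can be mimicked on the connection matrix alone. The sketch below is a base-MATLAB illustration, not the getancestors implementation; it assumes that a nonzero cm(i,j) means an edge from node i to node j, and the names node, numGenerations, found, and frontier are illustrative.

```matlab
% Connection matrix from the example above; cm(i,j) ~= 0 is read as edge i -> j.
cm = [0 1 1 0 0; 1 0 0 1 1; 1 0 0 0 0; 0 0 0 0 1; 1 0 1 0 0];

node           = 2;      % start node
numGenerations = 2;      % how many generations to walk back

found       = false(1, size(cm, 1));
found(node) = true;      % the start node is part of the result
frontier    = found;

for g = 1:numGenerations
    % Parents are nodes with at least one edge into the current frontier.
    parents  = any(cm(:, frontier), 2)';
    frontier = parents & ~found;     % only newly discovered nodes move on
    found    = found | parents;
end

ancestorIdx = find(found);
```

With two generations, node 2 is reached from node 1, and node 1 is reached from nodes 2, 3, and 5, so ancestorIdx contains nodes 1, 2, 3, and 5.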
477. rse matrices G1 and G2 are ignored. Default is true, meaning that both graphs are directed.

Examples

1. Create and view a directed graph with 8 nodes and 11 edges.

    m('ABCDEFGH') = [1 2 3 4 5 6 7 8];
    g1 = sparse(m('ABDCDCGEFFG'),m('BCBDGEEFHGH'),true,8,8)
    view(biograph(g1,'ABCDEFGH'))

[Figure: Biograph Viewer 1, the directed graph g1]

2. Set a random permutation vector and then create and view a new permuted graph.

    p = randperm(8)

    p = 7 8 2 3 6 4 1 5

    g2 = g1(p,p);
    view(biograph(g2,'12345678'))

[Figure: Biograph Viewer 2, the permuted graph g2]

3. Check if the two graphs are isomorphic.

    [F,Map] = graphisomorphism(g2,g1)

    Map = 7 8 2 3 6 4 1 5

Note that the Map row vector containing the node indices that map from g2 to g1 is the same as the permutation vector you created in step 2.

4. Reverse the direction of the D-G edge in the first graph, and then check for isomorphism again.

    g1(m('DG'),m('GD')) = g1(m('GD'),m('DG'));
    view(biograph(g1,'ABCDEFGH'))

[Figure: Biograph Viewer 3, graph g1 with the D-G edge reversed]

5. Convert the graphs to undirected graphs, and then check for isomorphism.

    [F,M] = graphisomorphism(g2 g
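The note in step 3 says that Map reproduces the permutation vector. A quick way to see why is to check, in base MATLAB, that every edge of the permuted graph corresponds to an edge of the original graph under that mapping. This sketch fixes p to the value printed in the example instead of calling randperm, and it does not call graphisomorphism.

```matlab
% Rebuild the 8-node graph from the example (letters are mapped to 1..8).
m('ABCDEFGH') = 1:8;
g1 = sparse(m('ABDCDCGEFFG'), m('BCBDGEEFHGH'), true, 8, 8);

% Permutation printed in step 2 of the example.
p  = [7 8 2 3 6 4 1 5];
g2 = g1(p, p);

% Edge (i,j) of g2 must be edge (p(i),p(j)) of g1 if p preserves adjacency.
[i, j]    = find(g2);
preserved = full(all(g1(sub2ind(size(g1), p(i), p(j)))));
sameCount = nnz(g1) == nnz(g2);
```

Both checks hold by construction, which is exactly the adjacency-preserving property that the returned Map is required to satisfy.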
478. rsed column-wise. This property lets you use zero-valued weights. By default, shortestpath gets weight information from the nonzero entries in the N-by-N adjacency matrix.

References

[1] Dijkstra, E.W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik 1, 269-271.
[2] Bellman, R. (1958). On a Routing Problem. Quarterly of Applied Mathematics 16(1), 87-90.
[3] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002). The Boost Graph Library User Guide and Reference Manual (Upper Saddle River, NJ: Pearson Education).

See Also
Bioinformatics Toolbox functions: biograph (object constructor), graphshortestpath
Bioinformatics Toolbox object: biograph object
Bioinformatics Toolbox methods of a biograph object: allshortestpaths, conncomp, isdag, isomorphism, isspantree, maxflow, minspantree, topoorder, traverse

subtree (phytree)

Purpose: Extract phylogenetic subtree

Syntax:
    Tree2 = subtree(Tree1, Nodes)

Description: Tree2 = subtree(Tree1, Nodes) extracts a new subtree (Tree2) where the new root is the first common ancestor of the Nodes vector from Tree1. Nodes in the tree are indexed as [1:NUMLEAVES] for the leaves and as [NUMLEAVES+1:NUMLEAVES+NUMBRANCHES] for the branches. Nodes can also be a logical array of the following sizes: [NUMLEAVES+NUMBRANCHES x 1], [NUMLEAVES x 1], or [NUMBRANCHES x 1].

Examples

1. Load a phylogenetic tree created from a protein family. t
479. ruction by nodes. Biograph objects with LayoutType equal to 'equilibrium' or 'radial' cannot produce curved or segmented edges.

Positive number that post-scales the node coordinates. Default is 1.

biograph object properties (Property: Description)

LayoutScale: Positive number that scales the size of the nodes before calling the layout engine. Default is 1.
EdgeTextColor: Three-element numeric vector of RGB values. Default is [0, 0, 0], which defines black.
EdgeFontSize: Positive number that sets the size of the edge font in points. Default is 8.
ShowArrows: Controls the display of arrows with the edges. Choices are 'on' (default) or 'off'.
ArrowSize: Positive number that sets the size of the arrows in points. Default is 8.
ShowWeights: Controls the display of text indicating the weight of the edges. Choices are 'on' (default) or 'off'.
ShowTextInNodes: String that specifies the node property used to label nodes when you display a biograph object using the view method. Choices are:
• 'Label': Uses the Label property of the node object (default).
• 'ID': Uses the ID property of the node object.
• 'None'
NodeAutoSize: Controls precalculating the node size before calling the layout engine. Choices are 'on' (default) or 'off'.
NodeCallback: User-defined callback for all nodes. Enter the name of a function, a function ha
480. rue displays a figure showing a heat map of the codon counts.

Examples

Count the number of standard codons in a nucleotide sequence.

    codons = codoncount('AAACGTTA')

    codons =
        AAA: 1    ATC: 0    CGG: 0    GCT: 0    TCA: 0
        AAC: 0    ATG: 0    CGT: 1    GGA: 0    TCC: 0
        AAG: 0    ATT: 0    CTA: 0    GGC: 0    TCG: 0
        AAT: 0    CAA: 0    CTC: 0    GGG: 0    TCT: 0
        ACA: 0    CAC: 0    CTG: 0    GGT: 0    TGA: 0
        ACC: 0    CAG: 0    CTT: 0    GTA: 0    TGC: 0
        ACG: 0    CAT: 0    GAA: 0    GTC: 0    TGG: 0
        ACT: 0    CCA: 0    GAC: 0    GTG: 0    TGT: 0
        AGA: 0    CCC: 0    GAG: 0    GTT: 0    TTA: 0
        AGC: 0    CCG: 0    GAT: 0    TAA: 0    TTC: 0
        AGG: 0    CCT: 0    GCA: 0    TAC: 0    TTG: 0
        AGT: 0    CGA: 0    GCC: 0    TAG: 0    TTT: 0
        ATA: 0    CGC: 0    GCG: 0    TAT: 0

Count the codons in the second frame for the reverse complement of a sequence.

    r2codons = codoncount('AAACGTTA','Frame',2,'Reverse',true);

Create a heat map for the codons in a nucleotide sequence.

    a = randseq(1000);
    codoncount(a,'Figure',true);

[Figure 1: heat map of the codon counts]

See Also
Bioinformatics Toolbox functions: aacount, basecount, baselookup, codonbias, dimercount, nmercount, ntdensity, seqrcomplement, seqwordcount

cpgisland

Purpose: Locate CpG islands in DNA sequence

Syntax:
    cpgisland(SeqDNA)
    cpgisland(..., 'PropertyName', PropertyValue, ...)
    cpgisland(..., 'Window', WindowValue)
    cpgisland(..., 'MinIsland', MinIslandValue)
    cpgisland(..., 'CpGoe', CpGoeValue)
    cpgis
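To make the counting in the first codoncount example concrete, the sketch below splits the same sequence into non-overlapping triplets in base MATLAB and tallies them. It is a simplified illustration that ignores ambiguous symbols and the reverse-complement option; the names frame, nCodons, uniqueCodons, and counts are illustrative and are not part of the toolbox.

```matlab
seq   = 'AAACGTTA';     % sequence from the example above
frame = 1;              % reading frame (1, 2, or 3)

% Number of complete codons available from the chosen frame onward.
nCodons = floor((numel(seq) - frame + 1) / 3);

% Reshape the usable part of the sequence so that each row is one codon.
codons = reshape(seq(frame : frame + 3*nCodons - 1), 3, nCodons)';

% Tally how often each distinct codon occurs.
[uniqueCodons, ~, idx] = unique(cellstr(codons));
counts = accumarray(idx, 1);
```

For 'AAACGTTA' this finds one AAA and one CGT; the trailing TA is an incomplete codon and is dropped, which matches the nonzero entries in the documented output.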
481. ry information on graph theory functions, see Graph Theory Functions in the Bioinformatics Toolbox documentation.

[MaxFlow, FlowMatrix, Cut] = maxflow(BGObj, SNode, TNode) calculates the maximum flow of a directed graph represented by an N-by-N adjacency matrix extracted from a biograph object, BGObj, from node SNode to node TNode. Nonzero entries in the matrix determine the capacity of the edges. Output MaxFlow is the maximum flow, and FlowMatrix is a sparse matrix with all the flow values for every edge. FlowMatrix(X,Y) is the flow from node X to node Y. Output Cut is a logical row vector indicating the nodes connected to SNode after calculating the minimum cut between SNode and TNode. If several solutions to the minimum cut problem exist, then Cut is a matrix.

... = maxflow(BGObj, SNode, TNode, ...'PropertyName', PropertyValue, ...) calls maxflow with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotes and is case insensitive. These property name/property value pairs are as follows:

... = maxflow(BGObj, SNode, TNode, ...'Capacity', CapacityValue, ...) lets you specify custom capacities for the edges. CapacityValue is a column vector having one entry for every nonzero value (edge) in the N-by-N adjacency matrix. The order of the custom capacities in the vector must match the ord
482. s, 'exclude', 'leaves', 'propagate', 'toleaves'); view(tr, sel_leaves)

See Also
Bioinformatics Toolbox
• functions: phytree (object constructor), phytreetool
• phytree object methods: get, pdist, prune

shortestpath (biograph)

Purpose: Solve shortest path problem in biograph object

Syntax:
    [dist, path, pred] = shortestpath(BGObj, S)
    [dist, path, pred] = shortestpath(BGObj, S, T)
    ... = shortestpath(..., 'Directed', DirectedValue, ...)
    ... = shortestpath(..., 'Method', MethodValue, ...)
    ... = shortestpath(..., 'Weights', WeightsValue, ...)

Arguments:
BGObj: biograph object created by biograph (object constructor).
S: Node in graph represented by an N-by-N adjacency matrix extracted from a biograph object, BGObj.
T: Node in graph represented by an N-by-N adjacency matrix extracted from a biograph object, BGObj.
DirectedValue: Property that indicates whether the graph represented by the N-by-N adjacency matrix extracted from a biograph object, BGObj, is directed or undirected. Enter false for an undirected graph. This results in the upper triangle of the sparse matrix being ignored. Default is true.
MethodValue: String that specifies the algorithm used to find the shortest path. Choices are:
• 'Bellman-Ford': Assumes weights of the edges to be nonzero entries in the N-by-N adjacency matrix. Time complexity is O(N*E), where N and E
483. s

Purpose: Graphically display words in sequence

Syntax:
    seqshowwords(Seq, Word)
    seqshowwords(Seq, Word, ...'Color', ColorValue, ...)
    seqshowwords(Seq, Word, ...'Columns', ColumnsValue, ...)
    seqshowwords(Seq, Word, ...'Alphabet', AlphabetValue, ...)

Arguments:
Seq: Enter either a nucleotide or amino acid sequence. You can also enter a structure with the field Sequence.
Word: Enter a short character sequence.
ColorValue: Property to select the color for highlighted characters. Enter a 1-by-3 RGB vector specifying the intensity (0 to 255) of the red, green, and blue components, or enter a character from the following list: 'b' (blue), 'g' (green), 'r' (red), 'c' (cyan), 'm' (magenta), or 'y' (yellow). The default color is red, 'r'.
ColumnsValue: Property to specify the number of characters in a line. Default value is 64.
AlphabetValue: Property to select the alphabet. Enter 'AA' for amino acid sequences or 'NT' for nucleotide sequences. The default is 'NT'.

Description: seqshowwords(Seq, Word) displays the sequence with all occurrences of a word highlighted, and returns a structure with the start and stop positions for all occurrences of the word in the sequence.

seqshowwords(Seq, Word, ...'PropertyName', PropertyValue, ...) calls seqshowwords with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must
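The start and stop positions mentioned above can be illustrated for a plain (unambiguous) word with the base MATLAB function strfind. This sketch uses a hypothetical sequence and word, and it is not the seqshowwords implementation: strfind reports overlapping matches and does not expand ambiguous symbols.

```matlab
seq  = 'GCTAGTAACGTATATATAAT';   % hypothetical nucleotide sequence
word = 'TATA';                   % hypothetical word to locate

% strfind returns the start index of every match, including overlapping ones.
starts = strfind(seq, word);
stops  = starts + numel(word) - 1;
```

Here the word occurs three times, and the matches overlap each other by two characters.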
484. s containing PM and MM probe intensity values, and probeIndices, a column vector containing probe indexing information.

    load prostatecancerrawdata

2. Compute the Affymetrix PM and MM probe affinities from their sequences and MM probe intensities.

    [apm, amm] = affyprobeaffinities(seqMatrix, mmMatrix(:,1),...
                 'ProbeIndices', probeIndices);

3. Perform GCRMA background adjustment on the Affymetrix microarray probe-level data, creating a matrix of background-adjusted PM intensity values. Also, display a plot showing the log of probe intensity values from column 3 (chip 3) in mmMatrix, versus probe affinities in amm.

    pms_adj = gcrmabackadj(pmMatrix, mmMatrix, apm, amm, 'showplot', 3);

[Figure: Log2 optically adjusted MM probe intensities plotted against MM probe affinities]

4. Perform GCRMA background adjustment again, using the slower, more formal, empirical Bayes method.

    pms_adj2 = gcrmabackadj(pmMatrix, mmMatrix, apm, amm, 'method', 'EB');

The prostatecancerrawdata.mat file used in this example contains data from Best et al., 2005.

References

[1] Wu, Z., Irizarry, R.A., Gentleman, R., Murillo, F.M., and Spencer, F. (2004). A Model Based Background Adjustment for Oligonucleotide Expression Arrays. Journal of the American Statistical Association 99(468), 909
485. s, or 'all'. The default value is the vector [1 2 3]. Frames -1, -2, and -3 correspond to the first, second, and third reading frames for the reverse complement.

GeneticCodeValue: Genetic code name. Enter a code number or a code name from the table see
MinimumLengthValue: Property to set the minimum number of codons in an ORF.
AlternativeStartCodonsValue: Property to control using alternative start codons. Enter either true or false. The default value is false.
ColorValue: Property to select the color for highlighting the reading frame. Enter either a 1-by-3 RGB vector specifying the intensity (0 to 255) of the red, green, and blue components of the color, or a character from the following list: 'b' (blue), 'g' (green), 'r' (red), 'c' (cyan), 'm' (magenta), or 'y' (yellow). To specify different colors for the three reading frames, use a 1-by-3 cell array of color values. If you are displaying reverse complement reading frames, then COLOR should be a 1-by-6 cell array of color values.
ColumnsValue: Property to specify the number of columns in the output.

Description: seqshoworfs identifies and highlights all open reading frames using the standard or an alternative genetic code.

seqshoworfs(SeqNT) displays the sequence with all open reading frames highlighted, and it returns a structure of start and stop positions for each ORF in each reading frame. The standard genetic code is used with start codon 'A
486. s positive. Default is [-100 100].

Note: Use these values to tune the robustness of the algorithm. Ideally, you should keep the range within the maximum expected shift. If you try to correct larger shifts by increasing the limits, you increase the possibility of picking incorrect peaks to align to the reference masses.

msalign(..., 'WidthOfPulses', WidthOfPulsesValue, ...) specifies the width (in m/z units) for all the Gaussian pulses used to build the correlating synthetic spectrum. The point of the peak where the Gaussian pulse reaches 60.65% of its maximum is set to the width specified by WidthOfPulsesValue. Choices are any positive value. Default is 10. WidthOfPulsesValue may also be a function handle. The function is evaluated at the respective m/z values and returns a variable width for the pulses. Its evaluation should give reasonable values between 0 and max(abs(Range)); otherwise, the function returns an error.

Note: Tuning the spread of the Gaussian pulses controls a tradeoff between robustness (wider pulses) and precision (narrower pulses). However, the spread of the pulses is unrelated to the shape of the observed peaks in the spectrum. The purpose of the pulse spread is to drive the optimization algorithm.

msalign(..., 'WindowSizeRatio', WindowSizeRatioValue, ...) specifies a scaling factor that determines the size of the window around every alignment peak. The synthetic spectrum is compared to
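The 60.65% figure quoted for WidthOfPulses is the value of a unit-height Gaussian one standard deviation away from its center, exp(-1/2). The base-MATLAB sketch below checks this numerically; it assumes that the width plays the role of the Gaussian standard deviation, which is consistent with the text but is not confirmed by it, and the center mass mz0 is a hypothetical value.

```matlab
w   = 10;                     % pulse width in m/z units (the documented default)
mz0 = 1000;                   % hypothetical reference mass at the pulse center

mz    = mz0 + (-50:50);                       % m/z axis around the peak
pulse = exp(-0.5 * ((mz - mz0) / w).^2);      % unit-height Gaussian pulse

% Height of the pulse exactly one width away from the center.
ratio = pulse(mz == mz0 + w);
```

The ratio is about 0.6065, so a wider setting spreads the same fraction of the peak height over a larger m/z interval.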
487. s A, T or U, C, and G, or a vector of integers. You can also enter a structure with the field Sequence. codonbias does not count ambiguous bases or gaps.

Description: Many amino acids are coded by two or more nucleic acid codons. However, the probability that a given codon, out of the possible codons for an amino acid, is used to code that amino acid differs between sequences. Knowing the frequency of each codon in a protein coding sequence for each amino acid is a useful statistic.

codonbias(SeqDNA) calculates the codon frequency in percent for each amino acid in a DNA sequence (SeqDNA).

codonbias(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs.

codonbias(..., 'GeneticCode', GeneticCodeValue) selects an alternative genetic code (GeneticCodeValue). The default value is 'Standard' or 1. For a list of genetic codes, see

codonbias(..., 'Frame', FrameValue) selects a reading frame (FrameValue). FrameValue can be 1 (default), 2, or 3.

codonbias(..., 'Reverse', ReverseValue), when ReverseValue is true, returns the codon frequency for the reverse complement of the DNA sequence (SeqDNA).

codonbias(..., 'Pie', PieValue), when PieValue is true, creates a figure of 20 pie charts, one for each amino acid.

Example

1. Import a nucleotide sequence from GenBank to MATLAB. For example, get the DNA sequence that codes for a human insulin receptor.

    S = getgenbank('M10051');

2. Calculate the cod
488. s a maximal group of nodes that are mutually reachable by violating the edge directions. Set WeakValue to true to find weakly connected components. Default is false, which finds strongly connected components. The state of this parameter has no effect on undirected graphs because weakly and strongly connected components are the same in undirected graphs. Time complexity is O(N+E), where N and E are the number of nodes and edges, respectively.

Description

Tip: For introductory information on graph theory functions, see Graph Theory Functions in the Bioinformatics Toolbox documentation.

[S, C] = graphconncomp(G) finds the strongly connected components of the graph represented by matrix G using Tarjan's algorithm. A strongly connected component is a maximal group of nodes that are mutually reachable without violating the edge directions. Input G is an N-by-N sparse matrix that represents a graph. Nonzero entries in matrix G indicate the presence of an edge.

The number of components found is returned in S, and C is a vector indicating to which component each node belongs.

Tarjan's algorithm has a time complexity of O(N+E), where N and E are the number of nodes and edges, respectively.

[S, C] = graphconncomp(G, ...'PropertyName', PropertyValue, ...) calls graphconncomp with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must
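The weakly connected case described above (edge directions ignored) can be sketched in a few lines of base MATLAB with a breadth-first search over the symmetrized matrix. This is an illustration only: it is not Tarjan's algorithm and not the graphconncomp implementation, and the small graph G is hypothetical.

```matlab
% Hypothetical directed graph with two weak components: 1->2->3 and 4->5.
G = sparse([1 2 4], [2 3 5], true, 5, 5);

U = G | G';                 % ignore edge directions
n = size(U, 1);
C = zeros(1, n);            % component label of every node (0 = unlabeled)
S = 0;                      % number of components found so far

for s = 1:n
    if C(s) == 0
        S = S + 1;
        frontier    = false(1, n);
        frontier(s) = true;
        while any(frontier)
            C(frontier) = S;
            % Unlabeled neighbors of the current frontier form the next one.
            frontier = full(any(U(frontier, :), 1)) & (C == 0);
        end
    end
end
```

The outputs mirror the documented ones: S is the number of components and C(k) is the component that node k belongs to.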
489. s/charge vector (MZ), while mssgolay also allows one that is not uniformly spaced. Therefore, the sliding frame for smoothing is centered using the closest samples in terms of the MZ value and not in terms of the MZ index.

2. When the vector MZ does not have repeated values or NaNs, the algorithm is approximately twice as fast.

3. When the vector MZ is evenly spaced, the least-squares fitting is performed once so that the spectrum is filtered with the same coefficients, and the speed of the algorithm increases considerably.

4. If the vector MZ is evenly spaced and SpanValue is even, Span is incremented by 1 to include both edge samples in the frame.

mssgolay(..., 'Degree', DegreeValue) specifies the degree of the polynomial (DegreeValue) fitted to the points in the moving frame. The default value is 2. DegreeValue must be smaller than SpanValue.

mssgolay(..., 'ShowPlot', ShowPlotValue) plots smoothed spectra over the original. When mssgolay is called without output arguments, the spectra are plotted unless ShowPlotValue is false. When ShowPlotValue is true, only the first spectrum in Y is plotted. ShowPlotValue can also contain an index to one of the spectra in Y.

Example

    load sample_lo_res
    YS = mssgolay(MZ_low_res, Y_low_res(:,1));
    plot(MZ_low_res, [Y_low_res(:,1) YS])

See Also
Bioinformatics Toolbox functions: msalign, msbackadj, msheatmap, mslowess, msnorm, mspeaks, msresample, msviewer
490. s false.
TerminalLabels: Property to control displaying terminal labels. Enter either true or false. The default value is false.

Description: plot(Tree) draws a phylogenetic tree object into a MATLAB figure as a phylogram. The significant distances between branches and nodes are in the horizontal direction. Vertical distances have no significance and are selected only for display purposes. Handles to graph elements are stored in the figure field UserData so that you can easily modify graphic properties.

plot(Tree, ActiveBranches) hides the nonactive branches and all of their descendants. ActiveBranches is a logical array of size numBranches x 1 indicating the active branches.

plot(..., 'Type', TypeValue) selects a method for drawing a phylogenetic tree.

plot(..., 'Orientation', OrientationValue) orients a phylogenetic tree within a figure window. The Orientation property is valid only for phylogram and cladogram trees.

plot(..., 'BranchLabels', BranchLabelsValue) hides or displays branch labels placed next to the branch node.

plot(..., 'LeafLabels', LeafLabelsValue) hides or displays leaf labels placed next to the leaf nodes.

plot(..., 'TerminalLabels', TerminalLabelsValue) hides or displays terminal labels. Terminal labels are placed over the axis tick labels and ignored when Type = 'radial'.

H = plot(...) returns a structure with handles to the graph elements.

Examples

    tr = phytree
491. s in the graph from BGObj1 and the nodes in the graph from BGObj2, such that adjacencies are preserved. Return value Isomorphic is Boolean. When Isomorphic is true, Map is a row vector containing the node indices that map from BGObj2 to BGObj1. When Isomorphic is false, the worst-case time complexity is O(N!), where N is the number of nodes.

[Isomorphic, Map] = isomorphism(BGObj1, BGObj2, 'Directed', DirectedValue) indicates whether the graphs are directed or undirected. Set DirectedValue to false when both BGObj1 and BGObj2 produce undirected graphs. In this case, the upper triangles of the sparse matrices extracted from BGObj1 and BGObj2 are ignored. The default is true, meaning that both graphs are directed.

References

[1] Fortin, S. (1996). The Graph Isomorphism Problem. Technical Report, 96-20, Dept. of Computer Science, University of Alberta, Edmonton, Alberta, Canada.
[2] McKay, B.D. (1981). Practical Graph Isomorphism. Congressus Numerantium 30, 45-87.
[3] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002). The Boost Graph Library User Guide and Reference Manual (Upper Saddle River, NJ: Pearson Education).

See Also
Bioinformatics Toolbox functions: biograph (object constructor), graphisomorphism
Bioinformatics Toolbox object: biograph object
Bioinformatics Toolbox methods of a biograph object: allshortestpaths, conncomp, isdag, isspantree, maxflow, minspantree, shortestpath, topoorder, traverse
492. script function, which sends RasMol script commands to the Molecule Viewer window.

Tip: If you receive any errors related to memory or Java heap space, try increasing your Java heap space as described at http://www.mathworks.com/support/solutions/data/1-1812C.html

[Figure: Molecule Viewer window showing structure 2DHB (2289 atoms, 2360 bonds, 289 groups, 2 chains, 2 polymers, 1 model) with the Control Panel open]

After displaying the 3-D molecule structure, you can:
• Click-drag the molecule to spin, rotate, and view it from different angles.
• Hover the mouse over a subcomponent of the molecule to display an identification label for it.
• Zoom the plot by turning the mouse scroll wheel or by clicking the zoom buttons.
• Spin the molecule by clicking the corresponding toolbar button.
• Change the background color between black and white by clicking the corresponding toolbar button.
• Reset the molecule position by clicking the corresponding toolbar button.
• Show or hide the Control Panel by clicking the corresponding toolbar button.
• Manipulate and annotate the 3-D structure by selecting options in the Control Panel or by right-clickin
493. seValue, ExistingGapAdjustValue, TerminalGapAdjustValue

Scalar or a function specified using @. If you enter a function, multialign passes four values to the function: the average score for two matched residues (sm), the average score for two mismatched residues (sx), and the length of both profiles or sequences (len1, len2). Default is @(sm,sx,len1,len2) sm/4.

Property to specify the threshold delay of divergent sequences. The default is unity, where sequences with the closest sequence farther than the median distance are delayed.

JobManager object representing an available distributed MATLAB resource. Enter a jobmanager object returned by the Distributed Computing Toolbox function findResource.

Property to control waiting for a distributed MATLAB resource to be available. Enter either true or false. The default value is false.

Property to control displaying the sequences with sequence information. Default value is false.

ExistingGapAdjustValue: Property to control automatic adjustment based on existing gaps. Default value is true.

TerminalGapAdjustValue: Property to adjust the penalty for opening a gap at the ends of the sequence. Default value is false.

Description: SeqsMultiAligned = multialign(Seqs) performs a progressive multiple alignment for a set of sequences (Seqs). Pair-wise distances between sequences are computed after pair-wise alignment with the Gonnet scoring matrix and then by counting the proportion of sites at
494. sed to return Score in arbitrary units other than bits. Choices are any positive value.

swalign(Seq1, Seq2, ...'GapOpen', GapOpenValue, ...) specifies the penalty for opening a gap in the alignment. Choices are any positive integer. Default is 8.

swalign(Seq1, Seq2, ...'ExtendGap', ExtendGapValue, ...) specifies the penalty for extending a gap in the alignment. Choices are any positive integer. Default is equal to GapOpenValue.

swalign(Seq1, Seq2, ...'Showscore', ShowscoreValue, ...) controls the display of the scoring space and winning path of the alignment. Choices are true or false (default).

[Figure 1: Scoring Space and Winning Path, a heat map with Sequence 1 and Sequence 2 on the axes]

The scoring space is a heat map displaying the best scores for all the partial alignments of two sequences. The color of each (n1,n2) coordinate in the scoring space represents the best score for the pairing of subsequences Seq1(s1:n1) and Seq2(s2:n2), where n1 is a position in Seq1, n2 is a position in Seq2, s1 is any position in Seq1 between 1:n1, and s2 is any position in Seq2 between 1:n2. The best score for a pairing of specific subsequences is determined by scoring all possible alignments of the subsequences by summing matches and gap penalties.

The winning pat
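The scoring space described above is the dynamic-programming matrix of the Smith-Waterman algorithm. The following base-MATLAB sketch fills such a matrix for two short hypothetical sequences. It is deliberately simplified and is not the swalign implementation: it uses a fixed match/mismatch score instead of a scoring matrix and a single linear gap penalty instead of separate GapOpen and ExtendGap values.

```matlab
seq1 = 'ACGTT';            % hypothetical sequences
seq2 = 'CGT';

match    =  2;             % hypothetical score for identical symbols
mismatch = -1;             % hypothetical score for different symbols
gap      =  2;             % hypothetical linear gap penalty

% H(i+1,j+1) holds the best score of any local alignment ending at
% seq1(i) and seq2(j); the first row and column stay at zero.
H = zeros(numel(seq1) + 1, numel(seq2) + 1);

for i = 1:numel(seq1)
    for j = 1:numel(seq2)
        if seq1(i) == seq2(j)
            s = match;
        else
            s = mismatch;
        end
        H(i+1, j+1) = max([0, ...
                           H(i, j)   + s, ...    % align the two symbols
                           H(i, j+1) - gap, ...  % gap in seq2
                           H(i+1, j) - gap]);    % gap in seq1
    end
end

bestScore = max(H(:));
```

Plotting H with imagesc(H) gives a picture comparable to the scoring-space heat map; the winning path would be traced back from the cell that holds bestScore.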
495. sensus, seqneighjoin, showalignment

multialignread

Purpose: Read multiple-sequence alignment file

Syntax:
    S = multialignread(File)
    [Headers, Sequences] = multialignread(File)
    multialignread(..., 'PropertyName', PropertyValue, ...)
    multialignread(..., 'IgnoreGaps', IgnoreGapsValue)

Arguments:
File: Multiple sequence alignment file (ASCII text file). Enter a file name, a path and file name, or a URL pointing to a file. File can also be a MATLAB character array that contains the text of a multiple sequence alignment file. You can read common multiple alignment file types, such as ClustalW (.aln) and GCG (.msf).
IgnoreGapsValue: Property to control removing gap symbols.

Description: S = multialignread(File) reads a multiple sequence alignment file. The file contains multiple sequence lines that start with a sequence header followed by an optional number (not used by multialignread) and a section of the sequence. The multiple sequences are broken into blocks with the same number of blocks for every sequence. For an example, type open aagag.aln. The output S is a structure array where S.Header contains the header information and S.Sequence contains the amino acid or nucleotide sequences.

[Headers, Sequences] = multialignread(File) reads the file into separate variables Headers and Sequences.

multialignread(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pa
496. seqwordcount
MATLAB functions: regexp, regexpi

seqcomplement

Purpose: Calculate complementary strand of nucleotide sequence

Syntax:
    SeqC = seqcomplement(SeqNT)

Arguments:
SeqNT: Enter either a character string with the characters A, T (U), G, C, and ambiguous characters R, Y, K, M, S, W, B, D, H, V, N, or a vector of integers. You can also enter a structure with the field Sequence.

Description: SeqC = seqcomplement(SeqNT) calculates the complementary strand (A→T, C→G, G→C, T→A) of a DNA sequence and returns a sequence in the same format as SeqNT. For example, if SeqNT is an integer sequence, then so is SeqC.

Example

Return the complement of a DNA nucleotide sequence.

    s = 'ATCG';
    seqcomplement(s)

    ans =
    TAGC

See Also
Bioinformatics Toolbox functions: seqrcomplement, seqreverse, seqtool

seqconsensus

Purpose: Calculate consensus sequence

Syntax:
    CSeq = seqconsensus(Seqs)
    [CSeq, Score] = seqconsensus(Seqs)
    CSeq = seqconsensus(Profile)
    seqconsensus(..., 'PropertyName', PropertyValue, ...)
    seqconsensus(..., 'ScoringMatrix', ScoringMatrixValue)

Arguments:
Seqs: Set of multiply aligned amino acid or nucleotide sequences. Enter an array of strings, a cell array of strings, or an array of structures with the field Sequence.
Profile: Sequence profile. Enter a profile from the function seqprofile. Profile is a matrix of size [20 (or 4) x Se
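The base mapping used by seqcomplement (A→T, C→G, G→C, T→A) can be written as a small lookup in base MATLAB. This sketch handles only the four unambiguous DNA letters in upper case; it is an illustration of the mapping, not the toolbox function, which also accepts ambiguous symbols, integer vectors, and structures.

```matlab
seq = 'ATCG';              % sequence from the example above

from = 'ACGT';             % original bases
to   = 'TGCA';             % their complements, in the same order

% Look up the position of every base in 'from' and read the complement
% at the same position in 'to'.
[~, loc] = ismember(seq, from);
comp = to(loc);
```

For 'ATCG' the result is 'TAGC', which agrees with the documented output of seqcomplement.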
497. setnorm(DataX, DataY) normalizes the values in DataY, a vector of gene expression values, to a reference vector, DataX, using the invariant set method. NormDataY is a vector of normalized gene expression values from DataY.

Specifically, mainvarsetnorm:
• Determines the proportional rank difference (prd) for each pair of ranks, RankX and RankY, from the two vectors of gene expression values, DataX and DataY.

    prd = abs(RankX - RankY)

• Determines the invariant set of data points by selecting data points whose proportional rank differences (prd) are below threshold, which is a predetermined threshold for a given data point (defined by the ThresholdsValue property). It optionally repeats the process until either no more data points are eliminated, or a predetermined percentage of data points is reached.

The invariant set is data points with a prd < threshold.

• Uses the invariant set of data points to calculate the lowess or running median smoothing curve, which is used to normalize the data in DataY.

Note: If DataX or DataY contains NaN values, then NormDataY will also contain NaN values at the corresponding positions.

Tip: mainvarsetnorm is useful for correcting for dye bias in two-color microarray data.

NormDataY = mainvarsetnorm(..., 'PropertyName', PropertyValue, ...) defines optional properties that use property name/value pairs in any order. These property name/value pairs are
498. significance threshold (InclusionValue) for including a sequence in the Position-Specific Score Matrix (PSSM) created by PSI-BLAST for the subsequent iteration. The default value is 0.005. blastncbi(..., 'Pct', PctValue), when ProgramValue is 'Megablast', selects the percent identity and the corresponding match and mismatch score for matching existing sequences in a public database. [Table: property choices and default values by BLAST program]
499. sing Needleman-Wunsch global alignment; Calculate pairwise distance between sequences; Sequence alignment with color.
Scoring Matrices
blosum: BLOSUM scoring matrix
dayhoff: Dayhoff scoring matrix
gonnet: Gonnet scoring matrix
nuc44: NUC44 scoring matrix for nucleotide sequences
pam: PAM scoring matrix
Phylogenetic Tree Tools
dnds: Estimate synonymous and nonsynonymous substitution rates
dndsml: Estimate synonymous and nonsynonymous substitution rates using maximum likelihood method
gethmmtree: Phylogenetic tree data from PFAM database
phytreeread: Read phylogenetic tree file
phytreetool: View, edit, and explore phylogenetic tree data
phytreewrite: Write phylogenetic tree object to Newick-formatted file
seqinsertgaps: Insert gaps into nucleotide or amino acid sequence
seqlinkage: Construct phylogenetic tree from pairwise distances
seqneighjoin: Neighbor-joining method for phylogenetic tree reconstruction
seqpdist: Calculate pairwise distance between sequences
Graph Theory
graphallshortestpaths: Find all shortest paths in graph
graphconncomp: Find strongly or weakly connected components in graph
graphisdag: Test for cycles in directed graph
graphisomorphism: Find isomorphism b
Remaining functions listed without descriptions: graphisspantree, graphmaxflow, graphminspantree, graphpred2path, graphshortestpath, graphtopoorder, graphtraverse
500. spantree(..., 'PropertyName', PropertyValue, ...) calls minspantree with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotes and is case insensitive. These property name/property value pairs are as follows: [Tree, pred] = minspantree(..., 'Method', MethodValue, ...) lets you specify the algorithm used to find the minimal spanning tree (MST). Choices are:
• 'Kruskal': Grows the minimal spanning tree (MST) one edge at a time by finding an edge that connects two trees in a spreading forest of growing MSTs. Time complexity is O(E+X*log(N)), where X is the number of edges no longer than the longest edge in the MST, and N and E are the number of nodes and edges respectively.
• 'Prim': Default algorithm. Grows the minimal spanning tree (MST) one edge at a time by adding a minimal edge that connects a node in the growing MST with any other node. Time complexity is O(E*log(N)), where N and E are the number of nodes and edges respectively.
Note: When the graph is unconnected, Prim's algorithm returns only the tree that contains R, while Kruskal's algorithm returns an MST for every component. [Tree, pred] = minspantree(..., 'Weights', WeightsValue, ...) lets you specify custom weights for the edges. WeightsValue is a column vector having one entry for every non
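Prim's algorithm as described above (grow the tree from a root by repeatedly adding the cheapest edge that reaches a new node) can be sketched with a binary heap. This is a generic Python illustration, not the toolbox code; the function name prim_mst and the edge-list input format are assumptions of this sketch.

```python
import heapq


def prim_mst(num_nodes, edges, root=0):
    """Return the MST edges (u, v, weight) of the component containing root.

    edges is a list of undirected (u, v, weight) triples.
    """
    adjacency = {node: [] for node in range(num_nodes)}
    for u, v, w in edges:
        adjacency[u].append((w, v))
        adjacency[v].append((w, u))

    visited = {root}
    heap = [(w, root, v) for w, v in adjacency[root]]
    heapq.heapify(heap)
    tree = []
    while heap:
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # edge leads back into the tree, skip it
        visited.add(v)
        tree.append((u, v, w))
        for next_w, nxt in adjacency[v]:
            if nxt not in visited:
                heapq.heappush(heap, (next_w, v, nxt))
    return tree


# Nodes 3 and 4 are disconnected from the root, so they are not in the result.
edges = [(0, 1, 1), (1, 2, 2), (0, 2, 3), (3, 4, 1)]
print(prim_mst(5, edges, root=0))  # [(0, 1, 1), (1, 2, 2)]
```

The example graph is deliberately unconnected to show the behavior mentioned in the note: only the tree containing the root is returned.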
501. [Figure: heat map; axes Mass/Charge (M/Z) and Retention Time, color scale Relative Intensity] 5. Link the axes of the two heat plots and zoom in to observe the detail. linkaxes(findobj(0,'Tag','MSHeatMap')) axis([570 590 3750 3900]) [Figures: two heat maps over the range 570 to 590 Mass/Charge (M/Z) and 3750 to 3900 Retention Time, color scale Relative Intensity] References: [1] Jeffries, N. (2005) Algorithms for alignment of mass spectrometry proteomic data. Bioinformatics 21:14, 3066-3073. [2] Purvine, S., Kolker, N., and Kolker, E. (2004) Spectral Quality Assessment for High-Throughput Tandem Mass Spectrometry Proteomics. OMICS: A Journal of Integrative Biology 8:3, 255-265. See Also: Bioinformatics Toolbox functions: msalign, msdotplot, msheatmap, mspeaks, msppresample, mzxml2peaks. mspeaks. Purpose: Convert raw mass spectrometry data to peak list (centroided data). Syntax: Peaks = mspeaks(MZ, Intensities); Peaks = mspeaks(MZ, Intensities, ...); Peaks = mspeaks(MZ, Intensities, ...); Peaks = mspeaks(MZ, Intensities, ..., NoiseEstimatorValue, ...); Peaks = mspeaks(MZ, Intensities, ..., MultiplierValue, ...); Peaks = mspeaks(MZ, Intensities, ..., DenoisingValue, ...)
502. ssify, rankfeatures, svmclassify. Statistics Toolbox function: classify. randseq. Purpose: Generate random sequence from finite alphabet. Syntax: Seq = randseq(SeqLength); Seq = randseq(SeqLength, ..., 'Alphabet', AlphabetValue, ...); Seq = randseq(SeqLength, ..., 'Weights', WeightsValue, ...); Seq = randseq(SeqLength, ..., 'FromStructure', FromStructureValue, ...); Seq = randseq(SeqLength, ..., 'Case', CaseValue, ...); Seq = randseq(SeqLength, ..., 'DataType', DataTypeValue, ...). Arguments: SeqLength: Number of amino acids or nucleotides in random sequence. AlphabetValue: Property to select the alphabet for the sequence. Enter 'dna' (default), 'rna', or 'amino'. WeightsValue: Property to specify a weighted random sequence. FromStructureValue: Property to specify a weighted random sequence using output structures from the functions basecount, dimercount, codoncount, or aacount. CaseValue: Property to select the case of letters in a sequence when Alphabet is 'char'. Values are 'upper' (default) or 'lower'. DataTypeValue: Property to select the data type for a sequence. Values are 'char' (default) for letter sequences, and 'uint8' or 'double' for numeric sequences. Creates a sequence as an array of DataType. Description: Seq = randseq(SeqLength) creates a random sequence with a length specified by SeqLength. Seq = randseq(SeqLength, ..., 'PropertyName', PropertyValue, ...)
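The idea of a weighted random sequence over a finite alphabet can be shown with the Python standard library. This sketch is not randseq itself; the function name random_sequence, its parameters, and the seed argument are hypothetical and exist only for this illustration.

```python
import random


def random_sequence(length, alphabet="ACGT", weights=None, seed=None):
    """Draw `length` symbols from `alphabet`, optionally with per-symbol weights."""
    rng = random.Random(seed)  # a private generator keeps the example reproducible
    return "".join(rng.choices(alphabet, weights=weights, k=length))


# Uniform DNA sequence of 20 symbols.
print(random_sequence(20, seed=1))

# Weighted sequence: A and T are three times as likely as C and G.
print(random_sequence(20, weights=[3, 1, 1, 3], seed=1))
```

Weights here are relative and do not need to sum to 1; whether the toolbox property imposes that requirement is not stated in the entry above.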
503. st(..., 'Descriptions', DescriptionsValue); getblast(..., 'Alignments', AlignmentsValue); getblast(..., 'ToFile', ToFileValue); getblast(..., 'FileFormat', FileFormatValue); getblast(..., 'WaitTime', WaitTimeValue). Arguments: RID: BLAST Request ID (RID) from the function blastncbi. DescriptionsValue: Property to specify the number of descriptions in a report. AlignmentsValue: Property to select the number of alignments in a report. Enter values from 1 to 100. The default value is 50. ToFileValue: Property to specify a file name for saving report data. FileFormatValue: Property to select the format of the file named in ToFileValue. Enter either 'TEXT' or 'HTML'. Default is 'TEXT'. WaitTimeValue: Property to pause MATLAB and wait a specified time (minutes) for a report from the NCBI Web site. If the report is still not available after the wait time, getblast returns an error message. The default behavior is to not wait for a report. Description: BLAST (Basic Local Alignment Search Tool) reports offer a fast and powerful comparative analysis of interesting protein and nucleotide sequences against known structures in existing online databases. getblast parses NCBI BLAST reports, including BLASTN, BLASTP, BLASTX, TBLASTN, TBLASTX, and PSI-BLAST. Data = getblast(RID) reads a BLAST Request ID (RID) and returns the report data in a structure (Data). The NCBI Request ID (RID) must be a recently generated report bec
504. st and powerful comparative analysis of interesting protein and nucleotide sequences against known structures in existing online databases. blastncbi(Seq, Program) sends a BLAST request against a sequence (Seq) to NCBI using a specified program (Program). With no output arguments, blastncbi returns a command window link to the actual NCBI report. RID = blastncbi(Seq, Program) calls with one output argument and returns the Report ID (RID). [RID, RTOE] = blastncbi(Seq, Program) calls with two output arguments and returns both the Report ID (RID) and the Request Time Of Execution (RTOE), which is an estimate of the time until completion. blastncbi uses the NCBI default values for the optional arguments: 'nr' for the database, 'L' for the filter, and 10 for the expectation threshold. The default values for the remaining optional arguments depend on which program is used. For help in selecting an appropriate BLAST program, visit http://www.ncbi.nlm.nih.gov/BLAST/producttable.shtml. Information for all of the optional parameters can be found at http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastcgihelp_new.html. blastncbi(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. blastncbi(..., 'Database', DatabaseValue) selects a database for the alignment search. blastncbi(..., 'Descriptions', DescriptionsValue), when the function is called without output ar
505. stanceValue) computes nearest-neighbor columns using the distance metric distfun. The choices for DistanceValue are:
'euclidean': Euclidean distance (default)
'seuclidean': Standardized Euclidean distance; each coordinate in the sum of squares is inversely weighted by the sample variance of that coordinate
'cityblock': City block distance
'mahalanobis': Mahalanobis distance
'minkowski': Minkowski distance with exponent 2
'cosine': One minus the cosine of the included angle
'correlation': One minus the sample correlation between observations, treated as sequences of values
'hamming': Hamming distance, the percentage of coordinates that differ
'jaccard': One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ
'chebychev': Chebychev distance (maximum coordinate difference)
function handle: A handle to a distance function, specified using @, for example, @distfun
See pdist for more details. knnimpute(..., 'DistArgs', DistArgsValue) passes arguments (DistArgsValue) to the function distfun. DistArgsValue can be a single value or a cell array of values. knnimpute(..., 'Weights', WeightsValues) enables you to specify the weights used in the weighted mean calculation. w should be a vector of length k. knnimpute(..., 'Median', MedianValue), when MedianValue is true, uses the median of the k nearest neighbors instead of the weighted mean. Example 1: A = [1 2 5; 4 5 7; NaN 1 8; 7 6
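Three of the listed metrics are simple enough to write out directly. The Python functions below are a sketch of the definitions as worded in the list (city block, Chebychev, and Hamming as the fraction of differing coordinates); the function names are chosen for this example, and the code is not taken from knnimpute or pdist.

```python
def cityblock(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebychev(a, b):
    """Maximum absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    """Fraction of coordinates that differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


u, v = [1, 2, 5], [4, 5, 7]
print(cityblock(u, v))   # 8
print(chebychev(u, v))   # 3
print(hamming(u, v))     # 1.0
```

All three assume the two vectors have equal length and contain no NaN entries; handling of missing values, which is the point of knnimpute, is outside this sketch.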
506. stimate method. This tuning parameter sets the lower bound of signal values with positive probability. Choices are a positive value. Default is 5 (MLE) or 0.5 (EB). Tip: For information on determining a setting for this parameter, see Wu et al., 2004. ExpressionMatrix = gcrma(..., 'GSBCorr', GSBCorrValue, ...) controls whether gene-specific binding (GSB) correction is performed on the non-specific binding (NSB) data. Choices are true (default) or false. ExpressionMatrix = gcrma(..., 'Normalize', NormalizeValue, ...) controls whether quantile normalization is performed on background-adjusted data. Choices are true (default) or false. ExpressionMatrix = gcrma(..., 'Verbose', VerboseValue, ...) controls the display of a progress report showing the number of each chip as it is completed. Choices are true (default) or false. Examples: 1. Load the MAT file, included with Bioinformatics Toolbox, that contains Affymetrix data from a prostate cancer study. The variables in the MAT file include seqMatrix, a matrix containing sequence information for PM probes; pmMatrix and mmMatrix, matrices containing PM and MM probe intensity values; and probeIndices, a column vector containing probe indexing information. load prostatecancerrawdata 2. Compute the Affymetrix PM and MM probe affinities from their sequences and MM probe intensities. [apm, amm] = affyprobeaffinities(seqMatrix, mmMatrix(:,1), ... 'Prob
507. stread. getembl. Purpose: Sequence information from EMBL database. Syntax: Data = getembl(AccessionNumber); getembl(..., 'PropertyName', PropertyValue, ...); getembl(..., 'ToFile', ToFileValue); getembl(..., 'SequenceOnly', SequenceOnlyValue). Arguments: AccessionNumber: Unique identifier for a sequence record. Enter a unique combination of letters and numbers. ToFileValue: Property to specify the location and file name for saving data. Enter either a file name or a path and file name supported by your system (ASCII text file). SequenceOnlyValue: Property to control getting a sequence without the metadata. Enter either true or false (default). Description: getembl retrieves information from the European Molecular Biology Laboratory (EMBL) database for nucleotide sequences. This database is maintained by the European Bioinformatics Institute (EBI). For more details about the EMBL-Bank database, see http://www.ebi.ac.uk/embl/Documentation/index.html. Data = getembl(AccessionNumber) searches for the accession number in the EMBL database (http://www.ebi.ac.uk/embl) and returns a MATLAB structure containing the following fields: Comments, Identification, Accession, SequenceVersion, DateCreated, DateUpdated, Description, Keyword, OrganismSpecies, OrganismClassification, Organelle, Reference, DatabaseCrossReference, Feature, BaseCount, Sequence
508. struct.Model.Atom(1).X ans = 14.0930 3. Edit the x-coordinate of the first atom. gflstruct.Model.Atom(1).X = 18; Note: Do not add or remove any Atom fields, because the pdbwrite function does not allow the number of elements in the structure to change. 4. Write the modified MATLAB structure gflstruct to a new PDB-formatted file, modified_gfl.pdb, in the Work directory on your C drive. pdbwrite('c:\work\modified_gfl.pdb', gflstruct); 5. Use the pdbread function to read the modified PDB file into a MATLAB structure, then confirm that the x-coordinate of the first atom has changed. modified_gflstruct = pdbread('c:\work\modified_gfl.pdb') modified_gflstruct.Model.Atom(1).X ans = 18 See Also: Bioinformatics Toolbox functions: getpdb, molviewer, pdbread. pfamhmmread. Purpose: Read data from PFAM HMM file. Syntax: Data = pfamhmmread(File). Arguments: File: PFAM HMM-formatted file. Enter a file name, a path and file name, or a URL pointing to a file. File can also be a MATLAB character array that contains the text of a PFAM HMM file. Description: pfamhmmread reads data from a PFAM HMM-formatted file (file saved with the function gethmmprof) and creates a MATLAB structure. Data = pfamhmmread(File) reads from File a Hidden Markov Model described by the PFAM format, and converts it to the MATLAB structure Data, containing fields corresponding to an
509. t', ExistingGapAdjustValue), if ExistingGapAdjustValue is false, turns off the automatic adjustment, based on existing gaps, of the position-specific penalties for opening a gap. When ExistingGapAdjustValue is true, for every profile position, profalign proportionally lowers the penalty for opening a gap toward the penalty of extending a gap, based on the proportion of gaps found in the contiguous symbols and on the weight of the input profile. multialign(..., 'TerminalGapAdjust', TerminalGapAdjustValue), when TerminalGapAdjustValue is true, adjusts the penalty for opening a gap at the ends of the sequence to be equal to the penalty for extending a gap. Example: 1. Align seven cellular tumor antigen p53 sequences. p53 = fastaread('p53samples.txt') ma = multialign(p53, 'verbose', true) showalignment(ma)
Aligned Sequences
P53_XENLA 69 264 CAVPSTDD YAGK YGLQLDFQQ NGTAKSVTCTYS PELNKLFCQLAKTCI
P53_ONCMY 83 278 STVPTTSD YPGALGF QLRFLQ STAKSVTCTYS PDLNKLFCQLAKTCI
P53_BRARE 63 257 STVPETSD YPGDHGFRLRF PQ SGTAKSVTCTYS PDLNKLFCQLAKTCI
P53_HUMAN 95 289 SSVPSQKTYQGS YGFRLGFLH SGTAKSVTCTYS PALNKMFCQLAKTCI
P53_ORYLA 80 270 TTVPVTTD YPGS YELELRFQK SGTAKSVTSTYSETLNKLYCQLAKTSI
P73_HUMAN 113 309 PVIPSNTD YPGPHHFEVTFQQ STAKSATWTYS PLLKKL YCQIAKTCI
Q27937_LOLFO 120 314 PSVPS NIK YPGE YVFEMSFAQPSKETKSTTWT YSEKLDKL YVRMATTCI
2. Use an UPGMA phylogenetic tree instead as
510. t, you can interactively do the following:
• Adjust the horizontal fold change lines by click-dragging one line or entering a value in the Fold Change text box, then clicking Update.
• Display labels for data points by clicking a data point.
• Select a gene from the Up Regulated or Down Regulated list to highlight the corresponding data point in the plot. Press and hold Ctrl or Shift to select multiple genes.
• Zoom the plot by selecting Tools > Zoom In or Tools > Zoom Out.
• View lists of significantly up-regulated and down-regulated genes, and optionally export the gene labels and indices to a structure in the MATLAB workspace by clicking Export.
• Normalize the data by clicking the Normalize button, then selecting whether to show the normalized plot in a separate window. If you show the normalized plot in a separate window, the Show smooth curve check box becomes available in the original (unnormalized) plot.
Note: To select different lowess normalization options before normalizing, select Tools > Set LOWESS Normalization Options, then select options from the Options dialog box. Examples: Use the gprread function to create a structure containing microarray data. maStruct = gprread('mouse_a1wt.gpr'); Use the magetfield function to extract the green (cy3) and red (cy5) signals from the structure. cy3data = magetfield(maStruct, 'F635 Median'); cy5data = magetfield(maStruct, 'F
511. t does not count overlapping patterns multiple times. In the following example, seqwordcount reports three matches. TATATATA is counted as two distinct matches, not three overlapping occurrences. seqwordcount('GCTATAACGTATATATAT', 'TATA') ans = 3 The following example reports two matches (TAGT and TAAT). B is the ambiguous code for G, T, or C, while R is the ambiguous code for G or A. seqwordcount('GCTAGTAACGTATATATAAT', 'BART') ans = 2 See Also: Bioinformatics Toolbox functions: codoncount, seqshoworfs, seqshowwords, seqtool, seq2regexp. MATLAB functions: strfind. showalignment. Purpose: Sequence alignment with color. Syntax: showalignment(Alignment); showalignment(Alignment, ..., 'MatchColor', MatchColorValue, ...); showalignment(Alignment, ..., 'SimilarColor', SimilarColorValue, ...); showalignment(Alignment, ..., 'StartPointers', StartPointersValue, ...); showalignment(Alignment, ..., 'Columns', ColumnsValue, ...). Arguments: Alignment: For pairwise alignments, matches and similar residues are highlighted, and Alignment is the output from one of the functions nwalign or swalign. For multiple sequence alignment, highly conserved columns are highlighted, and Alignment is the output from the function multialign. MatchColorValue: Property to select the color to highlight matching characters. Enter a 1-by-3 RGB vector specifying the intensity (0 to 255) of the red, green, and
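The non-overlapping counting with ambiguity codes shown in the two examples can be reproduced by translating the word into a regular expression. This Python sketch is an illustration of the counting rule, not the seqwordcount source; the function name word_count is hypothetical, and the translation table uses the standard IUPAC nucleotide codes.

```python
import re

# Standard IUPAC nucleotide codes expressed as regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def word_count(seq, word):
    """Count non-overlapping, left-to-right matches of word in seq."""
    pattern = "".join(IUPAC[symbol] for symbol in word.upper())
    # re.findall scans left to right and never reuses matched characters.
    return len(re.findall(pattern, seq.upper()))


print(word_count("GCTATAACGTATATATAT", "TATA"))    # 3
print(word_count("GCTAGTAACGTATATATAAT", "BART"))  # 2
```

Both printed values agree with the results quoted in the entry above.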
512. t object; goannotread function reference; gonnet function reference; gprread function reference; graphallshortestpaths function reference; graphconncomp function reference; graphisdag function reference; graphisomorphism function reference; graphisspantree function reference; graphmaxflow function reference; graphminspantree function reference; graphpred2path function reference; graphshortestpath function reference; graphtopoorder function reference; graphtraverse function reference. H: hmmprofalign function reference; hmmprofestimate function reference; hmmprofgenerate function reference; hmmprofmerge function reference; hmmprofstruct function reference. I: imageneread function reference; int2aa function reference; int2nt function reference; isdag method reference; isoelectric function reference; isomorphism method reference; isspantree method reference. J: jcampread function reference; joinseq function reference. K: knnclassify function reference; knnimpute function reference. M: maboxplot function reference; mafdr function reference; magetfield function reference; maimage function reference; mainvarsetnorm function reference; mairplot function reference; maloglog function referen
513. t the window length from 5 to 29 residues. You can modify the shape of the smoothing window by changing the edge weighting factor. You can also choose the smoothing function to be a linear moving average, an exponential moving average, or a linear Lowess smoothing. The File menu allows you to import a sequence, save the plot that you have created to a FIG file, export the data values in the figure to a workspace variable or to a MAT file, export the figure to a normal figure window for customizing, and print the figure. The Edit menu allows you to create a new property, to reset the property values to the default values, and to modify the smoothing parameters with the Configuration Values menu item. The View menu allows you to turn the toolbar on and off, and to add a legend to the plot. The Tools menu allows you to zoom in and zoom out of the plot, to view Data Statistics such as mean, minimum, and maximum values of the plot, and to normalize the values of the plot from 0 to 1. The Help menu allows you to view this document and to see the references for the sequence properties built into proteinplot. See Also: Bioinformatics Toolbox functions: aacount, atomiccomp, molviewer, molweight, pdbdistplot, seqtool. MATLAB function: plotyy. proteinpropplot. Purpose: Plot properties of amino acid sequence. Syntax: proteinpropplot(SeqAA); proteinpropplot(SeqAA, ..., 'PropertyTitle', PropertyTitleValue, ...)
514. ta; BLAST report from NCBI Web site; Sequence information from EMBL database; Sequence information from GenBank database; Retrieve sequence information from GenPept database; Retrieve Gene Expression Omnibus (GEO) Sample (GSM) data.
Data Formats and Databases
gethmmalignment: Retrieve multiple sequence alignment associated with hidden Markov model (HMM) profile from PFAM database
gethmmprof: Retrieve hidden Markov model (HMM) profile from PFAM database
gethmmtree: Phylogenetic tree data from PFAM database
getpdb: Retrieve protein structure data from Protein Data Bank (PDB) database
gprread: Read microarray data from GenePix Results (GPR) file
imageneread: Read microarray data from ImaGene Results file
jcampread: Read JCAMP-DX-formatted files
multialignread: Read multiple-sequence alignment file
mzxmlread: Read mzXML file into MATLAB as structure
pdbread: Read data from Protein Data Bank (PDB) file
pdbwrite: Write to file using Protein Data Bank (PDB) format
pfamhmmread: Read data from PFAM-HMM file
phytreeread: Read phylogenetic tree file
phytreewrite: Write phylogenetic tree object to Newick-formatted file
scfread: Read trace data from SCF file
sptread: Read data from SPOT file
Trace Tools: scfread, traceplot
Sequence Conversion: aa2int, aa2nt, aminolookup, baselookup, dna2rna, int2aa, int2nt, nt2aa, nt2int, rna2dna, s
515. ta points by click-dragging a box around them. This will highlight the points in the selected region and the corresponding points in the other axes. The labels of the selected data points appear in the list box. Select a label in the list box to highlight the corresponding data point in the plot. Press and hold Ctrl or Shift to select multiple data points. Export the gene labels and indices to a structure in the MATLAB workspace by clicking Export. Examples: load filteredyeastdata mapcaplot(yeastvalues, genes) See Also: Bioinformatics Toolbox functions: clustergram, mattest, mavolcanoplot. Statistics Toolbox function: princomp. mattest. Purpose: Perform two-tailed t-test to evaluate differential expression of genes from two experimental conditions or phenotypes. Syntax: PValues = mattest(DataX, DataY); [PValues, TScores] = mattest(DataX, DataY); [PValues, TScores, DFs] = mattest(DataX, DataY); mattest(..., 'Permute', PermuteValue, ...); mattest(..., 'Showhist', ShowhistValue, ...); mattest(..., 'Showplot', ShowplotValue, ...); mattest(..., 'Labels', LabelsValue, ...). Arguments: DataX, DataY: Matrices of gene expression values where each row corresponds to a gene and each column corresponds to a replicate. DataX and DataY must have the same number of rows and are assumed to be normally distributed in each class with equal variances. DataX contains data from one experimental condition and DataY contains data from a different ex
516. te cancer after androgen ablation therapy. Clinical Cancer Research 11, 6823-6834. See Also: Bioinformatics Toolbox functions: maboxplot, mafdr, mainvarsetnorm, mairplot, maloglog, malowess, manorm, mavolcanoplot, rmasummary. mavolcanoplot. Purpose: Create significance versus gene expression ratio (fold change) scatter plot of microarray data. Syntax: mavolcanoplot(DataX, DataY, PValues); SigStructure = mavolcanoplot(DataX, DataY, PValues); mavolcanoplot(..., 'Labels', LabelsValue, ...); mavolcanoplot(..., 'LogTrans', LogTransValue, ...); mavolcanoplot(..., 'PCutoff', PCutoffValue, ...); mavolcanoplot(..., 'Foldchange', FoldchangeValue, ...). Arguments: DataX: Matrix or vector of gene expression values from a single experimental condition. If DataX is a matrix, each row is a gene, each column is a sample, and an average expression value is calculated for each gene. Note: If the values in DataX are natural scale, use the LogTrans property to convert them to log2 scale. DataY: Matrix or vector of gene expression values from a single experimental condition. If DataY is a matrix, each row is a gene, each column is a sample, and an average expression value is calculated for each gene. Note: If the values in DataY are natural scale, use the LogTrans property to convert them to log2 scale. Arguments (continued): PValues, LabelsValue, LogTransValue. PValues: Vector of p-values for each gene in data sets from two different
517. ted', DirectedValue, ...) indicates whether the graph is directed or undirected. Set DirectedValue to false for an undirected graph. This results in the upper triangle of the sparse matrix being ignored. Default is true. A DFS-based algorithm computes the connected components. Time complexity is O(N+E), where N and E are number of nodes and edges respectively. [S, C] = conncomp(BGObj, ..., 'Weak', WeakValue, ...) indicates whether to find weakly connected components or strongly connected components. A weakly connected component is a maximal group of nodes that are mutually reachable by violating the edge directions. Set WeakValue to true to find weakly connected components. Default is false, which finds strongly connected components. The state of this parameter has no effect on undirected graphs because weakly and strongly connected components are the same in undirected graphs. Time complexity is O(N+E), where N and E are number of nodes and edges respectively. Note: By definition, a single node can be a strongly connected component. Note: A directed acyclic graph (DAG) cannot have any strongly connected components larger than one. References: [1] Tarjan, R.E. (1972). Depth first search and linear graph algorithms. SIAM Journal on Computing 1(2), 146-160. [2] Sedgewick, R. (2002). Algorithms in C++, Part 5 Graph Algorithms (Addison-Wesley). [3] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002).
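Weakly connected components, as defined above, are what a depth-first search finds once edge directions are ignored. The Python sketch below illustrates that idea in O(N+E) time; it is not the conncomp implementation, the function name weak_components and the edge-list input are assumptions of this example, and strongly connected components (which need an algorithm such as Tarjan's) are not covered.

```python
def weak_components(num_nodes, edges):
    """Return (count, labels) where labels[i] is the 1-based component of node i.

    edges is a list of directed (source, target) pairs; direction is ignored.
    """
    neighbors = {node: set() for node in range(num_nodes)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)  # treat every edge as undirected

    labels = [0] * num_nodes  # 0 means "not visited yet"
    count = 0
    for start in range(num_nodes):
        if labels[start]:
            continue
        count += 1
        labels[start] = count
        stack = [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            for nxt in neighbors[node]:
                if not labels[nxt]:
                    labels[nxt] = count
                    stack.append(nxt)
    return count, labels


print(weak_components(5, [(0, 1), (2, 1), (3, 4)]))  # (2, [1, 1, 1, 2, 2])
```

The return pair mirrors the [S, C] outputs named in the entry: a component count and a per-node component label.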
518. ted initially by using the getpdb or pdbread functions. Note: You can edit this structure to modify its 3-D protein structure data. The coordinate information is stored in the Model field of PDBStruct. PDBArray: Character array in which each row corresponds to a line in a PDB record. Description: pdbwrite(File, PDBStruct) writes the contents of the MATLAB structure PDBStruct to a PDB-formatted file (ASCII text file) whose path and file name are specified by File. In the output file, File, the atom serial numbers are preserved. The atomic coordinate records are ordered according to their atom serial numbers. Tip: After you save the MATLAB structure to a local PDB-formatted file, you can use the molviewer function to display and manipulate a 3-D image of the structure. PDBArray = pdbwrite(File, PDBStruct) saves the formatted PDB record, converted from the contents of the MATLAB structure PDBStruct, to PDBArray, a character array in which each row corresponds to a line in a PDB record. Note: You can edit PDBStruct to modify its 3-D protein structure data. The coordinate information is stored in the Model field of PDBStruct. Examples: 1. Use the getpdb function to retrieve structure information from the Protein Data Bank (PDB) for the green fluorescent protein with identifier 1GFL, and store the data in the MATLAB structure gflstruct. gflstruct = getpdb('1GFL'); 2. Find the x-coordinate of the first atom. gfl
519. tent, CpGoe content, CpG islands greater than the minimum island size, and all potential CpG islands for the specified criteria. Example: 1. Import a nucleotide sequence from GenBank. For example, get a sequence from Homo sapiens chromosome 12. S = getgenbank('AC156455'); 2. Calculate the CpG islands in the sequence and plot the results. cpgisland(S.Sequence, 'PLOT', true) MATLAB lists the CpG islands greater than 200 bases and draws a figure. ans = Starts: [4470 28753 29347 36229] Stops: [5555 29064 29676 36450] [Figure: GC content and CpGoe content plotted along the sequence] See Also: Bioinformatics Toolbox functions: basecount, ntdensity, seqshoworfs. crossvalind. Purpose: Generate cross-validation indices. Syntax: Indices = crossvalind('Kfold', N, K); [Train, Test] = crossvalind('HoldOut', N, P); [Train, Test] = crossvalind('LeaveMOut', N, M); [Train, Test] = crossvalind('Resubstitution', N, [P,Q]); [...] = crossvalind(Method, Group, ...); [...] = crossvalind(Method, Group, ..., 'Classes', C, ...); [...] = crossvalind(Method, Group, ..., 'Min', MinValue, ...). Description: Indices = crossvalind('Kfold', N, K) returns randomly generated indices for a K-fold cross-validation of N observations. Indices contains equal (or approximately equal) proportions of the integers 1 through K that define a partition of the N observations into K d
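The K-fold index vector described above (each observation labeled 1 through K in roughly equal proportions, in random order) can be generated in a few lines. This Python sketch illustrates the idea and is not the crossvalind code; the function name kfold_indices and the seed parameter are hypothetical.

```python
import random


def kfold_indices(n, k, seed=None):
    """Return a list of n fold labels (1..k) with near-equal fold sizes."""
    labels = [(i % k) + 1 for i in range(n)]  # fold sizes differ by at most one
    random.Random(seed).shuffle(labels)       # random assignment to observations
    return labels


indices = kfold_indices(10, 3, seed=0)
for fold in (1, 2, 3):
    test = [i for i, label in enumerate(indices) if label == fold]
    train = [i for i, label in enumerate(indices) if label != fold]
    print(fold, len(train), len(test))
```

Looping over the fold labels, as in the usage lines, gives one train/test split per fold.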
520. ter true to use the calculation. Default is false. Tree1Reordered: Phytree object with reordered leaves. OptimalOrder: Vector of position indices for each leaf in Tree1Reordered, determined by the optimal leaf-ordering calculation. Description: Tree1Reordered = reorder(Tree1, Order) reorders the leaves of the phylogenetic tree Tree1, without modifying its structure and distances, creating a new phylogenetic tree, Tree1Reordered. Order is a vector of position indices for each leaf. If Order is invalid, that is, if it divides the clades (or produces crossing branches), then reorder returns an error message. [Tree1Reordered, OptimalOrder] = reorder(Tree1, Order, 'Approximate', ApproximateValue) controls the use of the optimal leaf-ordering calculation, which finds the best approximate order closest to the suggested one, without dividing the clades or producing crossing branches. Enter true to use the calculation and return Tree1Reordered, the reordered tree, and OptimalOrder, a vector of position indices for each leaf in Tree1Reordered, determined by the optimal leaf-ordering calculation. Default is false. [Tree1Reordered, OptimalOrder] = reorder(Tree1, Tree2) uses the optimal leaf-ordering calculation to reorder the leaves in Tree1 such that it matches the order of leaves in Tree2 as closely as possible, without dividing the clades or producing crossing branches. Tree1Reordered is the reordered tree and
521. tgenbank('Z92777'); CDS = featuresparse(worm, 'feature', 'cds') CDS = 1x12 struct array with fields: Location, Indices, locus_tag, standard_name, note, codon_start, product, protein_id, db_xref, translation. Extracting Sequences for Each Feature: 1. Retrieve two nucleotide sequences from the GenBank database for the neuraminidase (NA) protein of two strains of the Influenza A virus (H5N1). hk01 = getgenbank('AF509094'); vt04 = getgenbank('DQ094287'); 2. Extract the sequence of the coding region for the neuraminidase (NA) protein from the two nucleotide sequences. The sequences of the coding regions are stored in the Sequence fields of the returned structures, hk01_cds and vt04_cds. hk01_cds = featuresparse(hk01, 'feature', 'CDS', 'Sequence', true); vt04_cds = featuresparse(vt04, 'feature', 'CDS', 'Sequence', true); 3. Once you have extracted the nucleotide sequences, you can use the nt2aa and nwalign functions to align the amino acid sequences converted from the nucleotide sequences. [sc, al] = nwalign(nt2aa(hk01_cds), nt2aa(vt04_cds), 'extendgap', 1); 4. Then you can use the seqinsertgaps function to copy the gaps from the aligned amino acid sequences to their corresponding nucleotide sequences, thus codon-aligning them. hk01_aligned = seqinsertgaps(hk01_cds, al(1,:)) vt04_aligned = seqinsertgaps(vt04_cds, al(3,:)) 5. Once you have codon-aligned the two sequences, you can
522. th at half height (FWHH) in m/z units. The FWHH is used to convert each peak to a Gaussian-shaped curve. Default is median(diff(inputMZ))/2, where inputMZ is the concatenated m/z values from the input Peaks. The default is a rough approximation of the resolution observed in the input data, Peaks. Tip: To ensure that the resolution of the peaks is preserved, set FWHHValue to half the distance between the two peaks of interest that are closest to each other. [MZ, Intensities] = msppresample(Peaks, N, 'ShowPlot', ShowPlotValue) controls the display of a plot of an original and resampled spectrum. Choices are true, false, or I, an integer specifying the index of a spectrum in Intensities. If set to true, the first spectrum in Intensities is plotted. Default is false when return values are specified, and true when return values are not specified. Examples: 1. Load a MAT-file, included with Bioinformatics Toolbox, which contains liquid chromatography/mass spectrometry (LC/MS) data variables, including peaks, a cell array of peak lists, where each element is a two-column matrix of m/z values and ion intensity values, and each element corresponds to a spectrum or retention time. load lcmsdata 2. Resample the data, specifying 5000 m/z values in the resampled signal. Then create a heat map of the LC/MS data. [MZ, Y] = msppresample(peaks, 5000); msheatmap(MZ, ret_time, log(Y))
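The default FWHH described in the msppresample entry can be sketched outside MATLAB. The snippet below is a minimal Python illustration, assuming the garbled default reads as median(diff(inputMZ))/2; the function name default_fwhh is hypothetical and not part of the toolbox.

```python
from statistics import median


def default_fwhh(input_mz):
    """Half the median spacing between consecutive m/z values.

    Assumption: mirrors median(diff(inputMZ))/2, where diff takes the
    difference between each value and the one before it, in the order given.
    """
    gaps = [b - a for a, b in zip(input_mz, input_mz[1:])]
    return median(gaps) / 2


# Spacings are 0.5, 0.5 and 1.0; their median is 0.5, so the result is 0.25.
print(default_fwhh([100.0, 100.5, 101.0, 102.0]))
```

A narrower value than this keeps closely spaced peaks separate, which is the point of the tip about using half the distance between the two closest peaks of interest.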
523. the log p value from the vector PValues. DataX and DataY can be vectors or matrices. SigStructure = mavolcanoplot(DataX, DataY, PValues) returns a structure containing information for genes that are considered to be both statistically significant (above the p-value cutoff) and significantly differentially expressed (outside of the fold change values). The fields within SigStructure are sorted by p-value and include: Name, PCutoff, FCThreshold, GeneLabels, PValues, FoldChanges. mavolcanoplot(..., 'PropertyName', PropertyValue, ...) defines optional properties that use property name/value pairs in any order. These property name/value pairs are as follows: mavolcanoplot(..., 'Labels', LabelsValue, ...) lets you provide a cell array of labels (typically gene names or probe set IDs) for the data. After creating the plot, you can click a data point to display the label associated with it. If you do not provide a LabelsValue, data points are labeled with row numbers from DataX and DataY. mavolcanoplot(..., 'LogTrans', LogTransValue, ...) controls the conversion of data from DataX and DataY to log scale. When LogTransValue is true, mavolcanoplot converts data from natural to log scale. Default is false, which assumes the data is already log scale. mavolcanoplot(..., 'PCutoff', PCutoffValue, ...) lets you specify a p-value cutoff to define data points that are statisti
524. the phytree constructor function. NodeValue: Property to select the nodes. Enter either 'leaves' (default) or 'all'. SquareformValue: Property to control creating a square matrix. D = pdist(Tree) returns a vector D containing the patristic distances between every possible pair of leaf nodes of a phylogenetic tree object Tree. The patristic distances are computed by following paths through the branches of the tree and adding the patristic branch distances originally created with seqlinkage. The output vector D is arranged in the order ((2,1), (3,1), ..., (M,1), (3,2), ..., (M,3), ..., (M,M-1)), the lower-left triangle of the full M-by-M distance matrix. To get the distance between the Ith and Jth nodes (I > J), use the formula D((J-1)*(M-J/2)+I-J). M is the number of leaves. [D, C] = pdist(Tree) returns in C the index of the closest common parent nodes for every possible pair of query nodes. pdist(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. pdist(..., 'Nodes', NodeValue) indicates the nodes included in the computation. When NodeValue is 'all', the output is ordered as before, but M is the total number of nodes in the tree (NumLeaves + NumBranches). pdist(..., 'Squareform', SquareformValue), when SquareformValue is true, converts the output into a square-formatted matrix, so that D(I,J) denotes the distance between the Ith and the Jth nodes. The output matrix is sy
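The index formula in the pdist entry can be checked with a few lines of code. The following Python sketch is only an illustration of that arithmetic, not toolbox code; condensed_index and lower_triangle_pairs are hypothetical helper names, and indices are kept 1-based to match the formula.

```python
def condensed_index(i, j, m):
    """1-based position in D of the pair (i, j), with i > j, among m nodes.

    Implements (J-1)*(M-J/2)+I-J in integer arithmetic:
    (j-1)*(2m-j) is always even, so the halving is exact.
    """
    if not 1 <= j < i <= m:
        raise ValueError("need 1 <= j < i <= m")
    return (j - 1) * (2 * m - j) // 2 + i - j


def lower_triangle_pairs(m):
    """Pairs in the documented order: (2,1), (3,1), ..., (m,1), (3,2), ..."""
    return [(i, j) for j in range(1, m) for i in range(j + 1, m + 1)]


# For 4 nodes the six pairs map to positions 1 through 6 in order.
for i, j in lower_triangle_pairs(4):
    print((i, j), condensed_index(i, j, 4))
```

Walking the pairs column by column and getting consecutive positions confirms that the formula and the stated ordering agree.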
525. the information. probesetlink(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. probesetlink(..., 'Source', SourceValue), when SourceValue is true, links to the data source (e.g., GenBank, Flybase) for the probe set. probesetlink(..., 'Browser', BrowserValue), when BrowserValue is true, displays the information in the system Web browser. probesetlink(..., 'NoDisplay', NoDisplayValue), when NoDisplayValue is true, returns the URL but does not open a browser. Note: The NetAffx Web site requires you to register and provide a user name and password. Examples: 1. Get the file Drosophila-121502.chp from http://www.affymetrix.com/support/technical/sample_data/demo_data.affx 2. Read the data into MATLAB. chpStruct = affyread('Drosophila-121502.chp', 'D:\Affymetrix\LibFiles\DrosGenome1') 3. Display information from the NetAffx Web site. probesetlink(chpStruct, 'AFFX-YEL018w_at') See Also: Bioinformatics Toolbox functions: affyread, celintensityread, probelibraryinfo, probesetlookup, probesetplot, probesetvalues. probesetlookup Purpose: Gene name for probe set. Syntax: probesetlookup(AFFYStruct, ID) probesetlookup(AFFYStruct, Name) [Name, NDX, Description, Source, SourceURL] = probesetlookup(...) Description: probesetlookup(AFFYStruct, ID) returns the gene name for a probe set ID from a CHP or CDF structure (AFFYStruct). probesetlookup(AFFYStruct, Na
526. the node IDs. cm = [0 1 1 0 0; 1 0 0 1 1; 1 0 0 0 0; 0 0 0 0 1; 1 0 1 0 0]; bg1 = biograph(cm) Biograph object with 5 nodes and 9 edges. get(bg1.nodes, 'ID') ans = 'Node 1' 'Node 2' 'Node 3' 'Node 4' 'Node 5' 2. Create a biograph object, assign the node IDs, and then use the get function to display the node IDs. cm = [0 1 1 0 0; 1 0 0 1 1; 1 0 0 0 0; 0 0 0 0 1; 1 0 1 0 0]; ids = {'M30931', 'L07625', 'K03454', 'M27323', 'M15390'}; bg2 = biograph(cm, ids); get(bg2.nodes, 'ID') ans = 'M30931' 'L07625' 'K03454' 'M27323' 'M15390' 3. Use the view method to display the biograph object. view(bg2) See Also: Bioinformatics Toolbox object: biograph object. Bioinformatics Toolbox methods of a biograph object: allshortestpaths, conncomp, dolayout, getancestors, getdescendants, getedgesbynodeid, getmatrix, getnodesbyid, getrelatives, isdag, isomorphism, isspantree, maxflow, minspantree, shortestpath, topoorder, traverse, view. MATLAB functions: get, set. blastncbi Purpose: Generate remote BLAST request. Syntax: blastncbi(Seq, Program) RID = blastncbi(Seq, Program) [RID, RTOE] = blastncbi(Seq, Program) blastncbi(Seq, P
527. the results into a MATLAB structure. results blastread AAA59174 BLAST rpt For more information about reading and interpreting BLAST reports, see http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Blast_output.html See Also: Bioinformatics Toolbox functions: blastncbi, getblast. blosum Purpose: BLOSUM scoring matrix. Syntax: Matrix = blosum(Identity) [Matrix, MatrixInfo] = blosum(Identity) blosum(..., 'PropertyName', PropertyValue, ...) blosum(..., 'Extended', ExtendedValue) blosum(..., 'Order', OrderValue) Arguments: Identity: Percent identity level. Enter values from 30 to 90 in increments of 5, enter 62, or enter 100. ExtendedValue: Property to control the listing of extended amino acid codes. Enter either true (default) or false. OrderValue: Property to specify the order amino acids are listed in the matrix. Enter a character string of legal amino acid characters. The length is 20 or 24 characters. Description: Matrix = blosum(Identity) returns a BLOSUM (Blocks Substitution Matrix) matrix with a specified percent identity. The default ordering of the output includes the extended characters B, Z, X, and *: A R N D C Q E G H I L K M F P S T W Y V B Z X *. [Matrix, MatrixInfo] = blosum(Identity) returns a structure of information (MatrixInfo) about a BLOSUM matrix (Matrix) with the fields Name, Scale, Entropy, ExpectedScore, HighestScore, LowestScore, and Order. blosum(..., 'PropertyName', PropertyValue, ...) defines optional properties using propert
528. the syntax. ExpressionMatrix = gcrma(..., 'PropertyName', PropertyValue, ...) calls gcrma with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotes and is case insensitive. These property name/property value pairs are as follows: ExpressionMatrix = gcrma(..., 'ChipIndex', ChipIndexValue, ...) computes probe affinities from MM probe intensity data from the chip with the specified column index in MMMatrix, assuming no affinity data is provided. Default ChipIndexValue is 1. If AffinPM and AffinMM affinity data are provided, this property is ignored. ExpressionMatrix = gcrma(..., 'OpticalCorr', OpticalCorrValue, ...) controls the use of optical background correction on the PM and MM intensity values in PMMatrix and MMMatrix. Choices are true (default) or false. ExpressionMatrix = gcrma(..., 'CorrConst', CorrConstValue, ...) specifies the correlation constant, rho, for background intensity for each PM/MM probe pair. Choices are any value between 0 and 1. Default is 0.7. ExpressionMatrix = gcrma(..., 'Method', MethodValue, ...) specifies the method to estimate the signal. Choices are 'MLE', a faster, ad hoc Maximum Likelihood Estimate method, or 'EB', a slower, more formal, empirical Bayes method. Default is 'MLE'. ExpressionMatrix = gcrma(..., 'TuningParam', TuningParamValue, ...) specifies the tuning parameter used by the e
529. they appear ordered in the graph display. DG = DG(order, order) view(biograph(DG)) References: [1] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002). The Boost Graph Library User Guide and Reference Manual (Upper Saddle River, NJ: Pearson Education). See Also: Bioinformatics Toolbox functions: graphallshortestpaths, graphconncomp, graphisdag, graphisomorphism, graphisspantree, graphmaxflow, graphminspantree, graphpred2path, graphshortestpath, graphtraverse. Bioinformatics Toolbox method of biograph object: topoorder. graphtraverse Purpose: Traverse graph by following adjacent nodes. Syntax: [disc, pred, closed] = graphtraverse(G, S) [...] = graphtraverse(G, S, 'Depth', DepthValue, ...) [...] = graphtraverse(G, S, 'Directed', DirectedValue, ...) [...] = graphtraverse(G, S, 'Method', MethodValue, ...) Arguments: G: N-by-N sparse matrix that represents a directed graph. Nonzero entries in matrix G indicate the presence of an edge. S: Integer that indicates the source node in graph G. DepthValue: Integer that indicates a node in graph G that specifies the depth of the search. Default is Inf (infinity). DirectedValue: Property that indicates whether graph G is directed or undirected
530. this sequence. s.SourceOrganism ans = Homo sapiens Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo. See Also: Bioinformatics Toolbox functions: emblread, fastaread, genpeptread, getgenbank, scfread, seqtool. geneentropyfilter Purpose: Remove genes with low entropy expression values. Syntax: Mask = geneentropyfilter(Data) [Mask, FData] = geneentropyfilter(Data) [Mask, FData, FNames] = geneentropyfilter(Data, Names) geneentropyfilter(..., 'PropertyName', PropertyValue, ...) geneentropyfilter(..., 'Percentile', PercentileValue) Arguments: Data: Matrix where each row corresponds to the experimental results for one gene. Each column is the results for all genes from one experiment. Names: Cell array with the name of a gene for each row of experimental data. Names has the same number of rows as Data, with each row containing the name or ID of the gene in the data set. PercentileValue: Property to specify a percentile below which gene data is removed. Enter a value from 0 to 100. Description: Mask = geneentropyfilter(Data) identifies gene expression profiles in Data with entropy values less than the 10th percentile. Mask is a logical vector with one element for each row in Data. The elements of Mask corresponding to rows with an entropy greater than the threshold have a value of 1, and those
531. this structure, see the optimset and quadprog reference pages. An options structure created by the svmsmoset function. This structure specifies options used by the SMO method. For more information on creating this structure, see the svmsmoset function. BoxConstraintValue: Box constraints for the soft margin. Choices are a strictly positive numeric scalar, or an array of strictly positive values with the number of elements equal to the number of rows in the Training matrix. If BoxConstraintValue is a scalar, it is automatically rescaled by N/(2*N1) for the data points of group one and by N/(2*N2) for the data points of group two. N1 is the number of elements in group one, N2 is the number of elements in group two, and N = N1 + N2. This rescaling is done to take into account unbalanced groups, that is, cases where N1 and N2 have very different values. If BoxConstraintValue is an array, then each array element is taken as a box constraint for the data point with the same index. Default is a scalar value of 1. AutoscaleValue: Controls the shifting and scaling of data points before training. When AutoscaleValue is true, the columns of the input data matrix Training are shifted to zero mean and scaled to unit variance. Default is false. ShowplotValue: Controls the display of a plot of the grouped data (including the separating line for the classifier) when using two-dimensional data. Choices are true or false
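The scalar box-constraint rescaling described in the svmtrain entry is simple arithmetic, and a small worked example makes the effect on unbalanced groups visible. This Python sketch only reproduces the stated factors N/(2*N1) and N/(2*N2); rescale_box_constraint is a hypothetical name, not a toolbox function.

```python
def rescale_box_constraint(c, n1, n2):
    """Return the per-group box constraints for a scalar constraint c.

    Group one is scaled by N/(2*N1) and group two by N/(2*N2),
    where N = N1 + N2, as stated in the svmtrain description.
    """
    if c <= 0 or n1 <= 0 or n2 <= 0:
        raise ValueError("c, n1 and n2 must be strictly positive")
    n = n1 + n2
    return c * n / (2 * n1), c * n / (2 * n2)


# Balanced groups leave the constraint unchanged.
print(rescale_box_constraint(1.0, 50, 50))
# With 10 points against 90, the smaller group gets the larger constraint.
print(rescale_box_constraint(1.0, 10, 90))
```

The smaller group ends up with the larger constraint, so errors on its points are penalized more heavily, which is how the rescaling compensates for unbalanced groups.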
532. thm finishes. KKTViolationLevel: Value that specifies the fraction of variables allowed to violate the KKT conditions. Choices are any value between 0 and 1. Default is 0. For example, if you set KKTViolationLevel to 0.05, then 5% of the variables are allowed to violate the KKT conditions. Tip: Set this option to a positive value to help the algorithm converge if it is fluctuating near a good solution. For more information on KKT conditions, see Cristianini et al. 2000. KernelCacheLimit: Value that specifies the size of the kernel matrix cache. The algorithm keeps a matrix with up to KernelCacheLimit × KernelCacheLimit double-precision floating-point numbers in memory. Default is 7500. Return Values: SMO_OptsStruct: Structure that specifies options used by the SMO method used by the svmtrain function. Description: SMO_OptsStruct = svmsmoset('Property1Name', Property1Value, 'Property2Name', Property2Value, ...) creates SMO_OptsStruct, an SMO options structure from the specified inputs. This structure can be used as input for the svmtrain function. SMO_OptsStruct = svmsmoset(OldOpts, 'Property1Name', Property1Value, 'Property2Name', Property2Value, ...) alters the options in OldOpts, an existing SMO options structure, with the specified inputs, creating a new output options structure. SMO_OptsStruct = svmsmoset(OldOpts, NewOpts) alters the options in OldOp
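The KernelCacheLimit description implies an upper bound on memory that is easy to work out. The Python sketch below assumes 8 bytes per double-precision value and ignores any other overhead; kernel_cache_bytes is a hypothetical helper used only for this arithmetic.

```python
def kernel_cache_bytes(kernel_cache_limit, bytes_per_value=8):
    """Upper bound on cache size: limit x limit double-precision values."""
    return kernel_cache_limit * kernel_cache_limit * bytes_per_value


# The default limit of 7500 allows up to 450,000,000 bytes (about 450 MB).
print(kernel_cache_bytes(7500))
# A limit of 1000 allows up to 8,000,000 bytes.
print(kernel_cache_bytes(1000))
```

Because the bound grows with the square of the limit, halving KernelCacheLimit cuts the maximum cache to a quarter.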
533. tnewickstr (phytree), pdist (phytree). Descriptions for this group: Information about phylogenetic tree object; Branches and leaves from phytree object; Calculate canonical form of phylogenetic tree; Convert phytree object into relationship matrix; Create Newick-formatted string; Calculate pair-wise patristic distances in phytree object. plot (phytree): Draw phylogenetic tree. prune (phytree): Remove branch nodes from phylogenetic tree. reorder (phytree): Reorder leaves of phylogenetic tree. reroot (phytree): Change root of phylogenetic tree. select (phytree): Select tree branches and leaves in phytree object. subtree (phytree): Extract phylogenetic subtree. view (phytree): View phylogenetic tree. weights (phytree): Calculate weights for phylogenetic tree. Graph Visualization: Following are methods for use with a biograph object. allshortestpaths (biograph): Find all shortest paths in biograph object. conncomp (biograph): Find strongly or weakly connected components in biograph object. dolayout (biograph): Calculate node positions and edge trajectories. getancestors (biograph): Find ancestors in biograph object. getdescendants (biograph): Find descendants in biograph object. getedgesbynodeid (biograph): Get handles to edges in biograph object. getmatrix (biograph): Get connection matrix from biograph object. getnodesbyid (biograph): Get handles to nodes. getrelatives (biograph): Find relatives in b
534. to an Affymetrix CEL file. (Each CEL file is generated from a separate chip. All chips should be of the same type.) Tip: You can use the MMIntensities matrix returned by the celintensityread function. Column vector containing probe indices. Probes within a probe set are numbered 0 through N - 1, where N is the number of probes in the probe set. Tip: You can use the affyprobeseqread function to generate this column vector. AffinPM: Column vector of PM probe affinities. Tip: You can use the affyprobeaffinities function to generate this column vector. AffinMM: Column vector of MM probe affinities. Tip: You can use the affyprobeaffinities function to generate this column vector. SequenceMatrix: An N-by-25 matrix of sequence information for the perfect match (PM) probes on the Affymetrix GeneChip array, where N is the number of probes on the array. Each row corresponds to a probe, and each column corresponds to one of the 25 sequence positions. Nucleotides in the sequences are represented by one of the following integers: 0 = None, 1 = A, 2 = C, 3 = G, 4 = T. Tip: You can use the affyprobeseqread function to generate this matrix. If you have this sequence information in letter representation, you can convert it to integer representation using the nt2int function. ChipIndexValue: Positive integer specifying a column index
535. tree(..., 'Type', TypeValue) Arguments: AccessionNumber: Accession number in the PFAM database. ToFileValue: Property to specify the location and file name for saving data. Enter either a file name or a path and file name supported by your system (ASCII text file). TypeValue: Property to control which alignments are included in the tree. Enter either 'seed' or 'full' (default). Description: Tree = gethmmtree(AccessionNumber) searches for the PFAM family accession number in the PFAM database and returns an object (Tree) containing a phylogenetic tree representative of the protein family. gethmmtree(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. gethmmtree(..., 'ToFile', ToFileValue) saves the data returned from the PFAM database in the file ToFileValue. gethmmtree(..., 'Type', TypeValue), when TypeValue is 'seed', returns a tree with only the alignments used to generate the HMM model. When TypeValue is 'full', returns a tree with all of the alignments that match the model. Examples: Retrieve a phylogenetic tree built from the multiple aligned sequences used to train the HMM profile model for global alignment. The PFAM accession number PF00002 is for the 7-transmembrane receptor protein in the secretin family. tree = gethmmtree(2, 'type', 'seed') tree = gethmmtree('PF00002', 'type', 'seed') See Also: Bioinformatics Toolbox functions: gethmmalignment, phytreeread
536. tree represented by an (M - 1)-by-3 matrix, created by the linkage function, where M is the number of leaves. Dist: Distance matrix, such as that created by the pdist function. CriteriaValue: String that specifies the optimization criteria. Choices are 'adjacent' (default), which minimizes the sum of distances between adjacent leaves, and 'group', which minimizes the sum of distances between every leaf and all other leaves in the adjacent cluster. TransformationValue: Either of the following: a string that specifies the algorithm to transform the distances in Dist into similarity values, with choices 'linear' (default): Similarity = max(all distances) - distance; 'quadratic': Similarity = (max(all distances) - distance)^2; 'inverse': Similarity = 1/distance; or a function handle, created using @, to a function that transforms the distances in Dist into similarity values. The function is typically a monotonic decreasing function within the range of the distance values. The function must accept a vector input and return a vector of the same size. Return Values: Order: Optimal leaf ordering for the hierarchical binary cluster tree represented by Tree. Description: Order = optimalleaforder(Tree, Dist) returns the optimal leaf ordering for the hierarchical binary cluster tree represented by Tree, an (M - 1)-by-3 matrix, created by the linkage function, where M is the number of leaves. Optimal leaf orderi
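The three named distance-to-similarity transformations in the optimalleaforder entry ('linear', 'quadratic', 'inverse') can be illustrated with plain lists. This Python sketch is an illustration only: to_similarity is a hypothetical name, and the squared form of 'quadratic' is an assumption, because the exponent is not legible in the extracted text.

```python
def to_similarity(distances, method="linear"):
    """Turn distances into similarities; larger means more similar.

    'linear':    max(all distances) - distance
    'quadratic': (max(all distances) - distance) squared  (assumed form)
    'inverse':   1 / distance  (distances must be nonzero)
    """
    d_max = max(distances)
    if method == "linear":
        return [d_max - d for d in distances]
    if method == "quadratic":
        return [(d_max - d) ** 2 for d in distances]
    if method == "inverse":
        return [1.0 / d for d in distances]
    raise ValueError("unknown method: " + method)


for name in ("linear", "quadratic", "inverse"):
    print(name, to_similarity([1.0, 2.0, 4.0], name))
```

All three are monotonically decreasing over positive distances, which matches the requirement stated for a user-supplied function handle.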
537. trieve two sequences from the GenBank database for the gag genes of two HIV viruses. gag1 = getgenbank('L11768'); gag2 = getgenbank('L11770'); 2. Estimate the synonymous and nonsynonymous substitution rates between the two sequences. [dn, ds, like] = dndsml(gag1, gag2) dn = 0.0259 ds = 0.0624 like = -2.1864e+003 Estimating Synonymous and Nonsynonymous Substitution Rates Between Two Nucleotide Sequences That Are Not Codon-Aligned: 1. Retrieve two nucleotide sequences from the GenBank database for the neuraminidase (NA) protein of two strains of the Influenza A virus (H5N1). hk01 = getgenbank('AF509094'); vt04 = getgenbank('DQ094287'); 2. Extract the coding region from the two nucleotide sequences. hk01_cds = featuresparse(hk01, 'feature', 'CDS', 'Sequence', true); vt04_cds = featuresparse(vt04, 'feature', 'CDS', 'Sequence', true); 3. Align the amino acid sequences converted from the nucleotide sequences. [sc, al] = nwalign(nt2aa(hk01_cds), nt2aa(vt04_cds), 'extendgap', 1); 4. Use the seqinsertgaps function to copy the gaps from the aligned amino acid sequences to their corresponding nucleotide sequences, thus codon-aligning them. hk01_aligned = seqinsertgaps(hk01_cds, al(1,:)); vt04_aligned = seqinsertgaps(vt04_cds, al(3,:)); 5. Estimate the synonymous and nonsynonymous substitution rates of the codon-aligned nucleotide sequences and also display the codons considered in the
538. trix shows the shortest path from node 1 (first row) to node 6 (sixth column) is 0.95. You can see this in the graph by tracing the path from node 1 to node 5 to node 4 to node 6 (0.21 + 0.36 + 0.38 = 0.95). Finding All Shortest Paths in an Undirected Graph: 1. Create and view an undirected graph with 6 nodes and 11 edges. UG = tril(DG + DG') UG = (4,1) 0.4500 (5,1) 0.2100 (6,1) 0.9900 (3,2) 0.5100 (5,2) 0.3200 (6,2) 0.4100 (4,3) 0.1500 (5,3) 0.3200 (6,3) 0.2900 (5,4) 0.3600 (6,4) 0.3800 view(biograph(UG, [], 'ShowArrows', 'off', 'ShowWeights', 'on')) 2. Find all the shortest paths between every pair of nodes in the undirected graph. graphallshortestpaths(UG, 'directed', false) ans = [0 0.5300 0.5300 0.4500 0.2100 0.8300; 0.5300 0 0.5100 0.6600 0.3200 0.7000; 0.5300 0.5100 0 0.1500 0.3200 0.5300; 0.4500 0.6600 0.1500 0 0.3600 0.3800; 0.2100 0.3200 0.3200 0.3600 0 0.7400; 0.8300 0.7000 0.5300 0.3800 0.7400 0] The resulting matrix is symmetrical because it represents an undirected graph. It shows the shortest path from node 1 (first row) to node 6 (sixth column) is 0.83. You can see this in the graph by tracing the path from node 1 to node 4 to node 6 (0.45 + 0.38 = 0.83). Because UG is an undirected graph, we can use the edge between node 1 and node 4, whi
539. trongly connected components. The state of this parameter has no effect on undirected graphs because weakly and strongly connected components are the same in undirected graphs. Time complexity is O(N+E), where N and E are number of nodes and edges respectively. Tip: For introductory information on graph theory functions, see Graph Theory Functions in the Bioinformatics Toolbox documentation. [S, C] = conncomp(BGObj) finds the strongly connected components of an N-by-N adjacency matrix extracted from a biograph object, BGObj, using Tarjan's algorithm. A strongly connected component is a maximal group of nodes that are mutually reachable without violating the edge directions. The N-by-N sparse matrix represents a directed graph; all nonzero entries in the matrix indicate the presence of an edge. The number of components found is returned in S, and C is a vector indicating to which component each node belongs. Tarjan's algorithm has a time complexity of O(N+E), where N and E are the number of nodes and edges respectively. [S, C] = conncomp(BGObj, ...'PropertyName', PropertyValue, ...) calls conncomp with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotes and is case insensitive. These property name/property value pairs are as follows: [S, C] = conncomp(BGObj, ...'Direc
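To make the conncomp description concrete, here is a minimal Python sketch of Tarjan's algorithm on a dense adjacency matrix. It is not the toolbox implementation: the function name is hypothetical, the component numbering need not match what conncomp returns, and scanning dense rows costs O(N^2) rather than the O(N+E) that an adjacency-list version achieves.

```python
def strongly_connected_components(adj):
    """Tarjan's algorithm on a square matrix; nonzero adj[u][v] is edge u -> v.

    Returns (count, labels), where labels[u] is the 1-based component
    number assigned to node u. Recursive, so suited to small graphs.
    """
    n = len(adj)
    index = [None] * n          # discovery order of each node
    low = [0] * n               # lowest discovery index reachable
    on_stack = [False] * n
    stack = []
    labels = [0] * n
    counter = 0
    count = 0

    def visit(u):
        nonlocal counter, count
        index[u] = low[u] = counter
        counter += 1
        stack.append(u)
        on_stack[u] = True
        for v in range(n):
            if not adj[u][v]:
                continue
            if index[v] is None:
                visit(v)
                low[u] = min(low[u], low[v])
            elif on_stack[v]:
                low[u] = min(low[u], index[v])
        if low[u] == index[u]:  # u is the root of a component
            count += 1
            while True:
                w = stack.pop()
                on_stack[w] = False
                labels[w] = count
                if w == u:
                    break

    for u in range(n):
        if index[u] is None:
            visit(u)
    return count, labels


# Nodes 0, 1, 2 form a cycle; node 3 is reachable from it but cannot return.
graph = [[0, 1, 0, 0],
         [0, 0, 1, 0],
         [1, 0, 0, 1],
         [0, 0, 0, 0]]
print(strongly_connected_components(graph))
```

The two return values play the roles of S (number of components) and C (component membership per node) in the description above.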
540. truct. pdbstruct = pdbread('nicotinic_receptor.pdb') 3. Read only the second model from the nicotinic_receptor.pdb file into a MATLAB structure, pdbstruct_Model2. pdbstruct_Model2 = pdbread('nicotinic_receptor.pdb', 'ModelNum', 2) 4. View the atomic coordinate information in the model fields of both MATLAB structures, pdbstruct and pdbstruct_Model2. pdbstruct.Model ans = 1x4 struct array with fields: MDLSerNo, Atom, Terminal. pdbstruct_Model2.Model ans = MDLSerNo: 2, Atom: [1x1205 struct], Terminal: [1x2 struct] 5. Read the data from a URL into a MATLAB structure, gfl_pdbstruct. gfl_pdbstruct = pdbread('http://www.rcsb.org/pdb/files/1gfl.pdb') See Also: Bioinformatics Toolbox functions: genpeptread, getpdb, molviewer, pdbdistplot, pdbwrite. pdbwrite Purpose: Write to file using Protein Data Bank (PDB) format. Syntax: pdbwrite(File, PDBStruct) PDBArray = pdbwrite(File, PDBStruct) Arguments: File: String specifying either a file name or a path and file name for saving the PDB-formatted data. If you specify only a file name, the file is saved to the MATLAB Current Directory. Tip: After you save the MATLAB structure to a local PDB-formatted file, you can use the molviewer function to display and manipulate a 3-D image of the structure. PDBStruct: MATLAB structure containing 3-D protein structure coordinate data, crea
541. ts, an existing SMO options structure, with the options specified in NewOpts, another SMO options structure, creating a new output options structure. Examples: 1. Create an SMO options structure and specify the Display, MaxIter, and KernelCacheLimit properties. opts = svmsmoset('Display', 'final', 'MaxIter', 200, 'KernelCacheLimit', 1000) opts = Display: 'final', TolKKT: 1.0000e-003, MaxIter: 200, KKTViolationLevel: 0, KernelCacheLimit: 1000 2. Create an alternate SMO options structure from the previous structure. Specify different Display and KKTViolationLevel properties. alt_opts = svmsmoset(opts, 'Display', 'iter', 'KKTViolationLevel', .05) alt_opts = Display: 'iter', TolKKT: 1.0000e-003, MaxIter: 200, KKTViolationLevel: 0.0500, KernelCacheLimit: 1000 References: [1] Cristianini, N., and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, First Edition (Cambridge: Cambridge University Press). http://www.support-vector.net/ [2] Platt, J.C. (1999). Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. In Advances in Kernel Methods - Support Vector Learning, B. Scholkopf, J.C. Burges, and A.J. Smola, eds. (Cambridge MA: MIT Press), pp. 185-208. See Also: Bioinformatics Toolbox functions: svmclassify, svmtrain. Optimization Toolbox functions: optimset. svmtrain Purpose: Train sup
542. ts into the axes contained in an open Figure window with the handle FigHandle. Tip: This syntax is useful to overlay a dot plot on top of a heat map of mass spectrometry data created with the msheatmap function. msdotplot(..., 'Quantile', QuantileValue) plots only the most intense peaks, specifically those in the percentage above the specified QuantileValue. Choices are any value between 0 and 1. Default is 0. For example, setting QuantileValue = 0 plots all peaks, and setting QuantileValue = 0.8 plots only the 20% most intense peaks. PlotHandle = msdotplot(...) returns a handle to the line series object (figure plot). You can use this handle as input to the get function to display a list of the plot's properties. You can use this handle as input to the set function to change the plot's properties, including showing and hiding points. Examples: 1. Load a MAT-file, included with Bioinformatics Toolbox, which contains LC/MS data variables, including peaks and ret_time. peaks is a cell array of peak lists, where each element is a two-column matrix of m/z values and ion intensity values, and each element corresponds to a spectrum or retention time. ret_time is a column vector of retention times associated with the LC/MS data set. load lcmsdata 2. Create a dot plot with only the 5% most intense peaks. msdotplot(peaks, ret_time, 'Quantile', 0.95)
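The effect of the Quantile property in the msdotplot entry can be pictured as keeping only the top fraction of intensities. The Python sketch below uses a simple rank-based cut as an assumption; how the toolbox computes its quantile threshold is not stated in the text, and most_intense is a hypothetical name.

```python
import math


def most_intense(intensities, quantile=0.0):
    """Keep the intensities above the given quantile, strongest first.

    Rank-based assumption: a quantile of q drops the weakest
    floor(q * n) values, so q = 0 keeps everything.
    """
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile must lie between 0 and 1")
    ranked = sorted(intensities, reverse=True)
    # The small epsilon guards against products such as 0.95 * 100
    # landing just below the intended whole number.
    n_dropped = math.floor(quantile * len(ranked) + 1e-9)
    return ranked[: len(ranked) - n_dropped]


values = list(range(1, 11))
print(most_intense(values, 0.8))       # the top 20% of ten values
print(len(most_intense(values, 0.0)))  # all ten values kept
```

With a quantile of 0.95 on 100 values the same rule keeps 5 values, which corresponds to the 5% example in the entry.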
543. tution rate(s). Vardn: Variance for the nonsynonymous substitution rate(s). Vards: Variance for the synonymous substitution rate(s). This analysis assumes that the nucleotide sequences, SeqNT1 and SeqNT2, are codon-aligned, that is, do not have frame shifts. Tip: If your sequences are not codon-aligned, use the nt2aa function to convert them to amino acid sequences, use the nwalign function to globally align them, then use the seqinsertgaps function to recover the corresponding codon-aligned nucleotide sequences. See Estimating Synonymous and Nonsynonymous Substitution Rates Between Two Nucleotide Sequences That Are Not Codon-Aligned. This analysis also excludes codons that include ambiguous nucleotide characters or gaps, and considers the number of codons in the shorter of the two nucleotide sequences. Caution: If SeqNT1 and SeqNT2 are too short or too divergent, saturation can be reached, and dnds returns NaNs and a warning message. [Dn, Ds, Vardn, Vards] = dnds(SeqNT1, SeqNT2, ...'PropertyName', PropertyValue, ...) calls dnds with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows: [Dn, Ds, Vardn, Vards] = dnds(SeqNT1, SeqNT2, ...'GeneticCode', GeneticCodeValue, ...) calculates synonymous and no
544. ty for opening a gap toward the penalty of extending a gap, based on the proportion of gaps found in the contiguous symbols and on the weight of the input profile. profalign(..., 'TerminalGapAdjust', TerminalGapAdjustValue), when TerminalGapAdjustValue is true, adjusts the penalty for opening a gap at the ends of the sequence to be equal to the penalty for extending a gap. Default is false. profalign(..., 'ShowScore', ShowScoreValue), when ShowScoreValue is true, displays the scoring space and the winning path. Examples: 1. Read in sequences and create profiles. ma1 = ['RGTANCDMQDA'; 'RGTAHCDMQDA'; 'RRRAPCDL-DA']; ma2 = ['RGTHCDLADAT'; 'RGTACDMADAA']; p1 = seqprofile(ma1, 'gaps', 'all', 'counts', true); p2 = seqprofile(ma2, 'counts', true); 2. Merge two profiles into a single one by aligning them. p = profalign(p1, p2); seqlogo(p) 3. Use the output pointers to generate the multiple alignment. [p, h1, h2] = profalign(p1, p2); ma = repmat('-', 5, 12); ma(1:3, h1) = ma1; ma(4:5, h2) = ma2; disp(ma) 4. Increase the gap penalty before cysteine in the second profile. gapVec = 10 + [p2(aa2int('C'),:) 0] * 10 p3 = profalign(p1, p2, 'gapopen', {10, gapVec}); seqlogo(p3) 5. Add a new sequence to a profile without inserting new gaps into the profile. gapVec = [0 inf(1,11) 0]; p4 = profalign(p3, seqprofile('PLHFMSVLWDVQQWP'), 'gapopen', {gapVec, 10}); seqlogo(p4) See Also: Bioinformatics Toolbox functions
545. type multialignviewer('aagag.aln') See Also: Bioinformatics Toolbox functions: fastaread, gethmmalignment, multialign, multialignread, seqtool. mzxml2peaks Purpose: Convert mzXML structure to peak list. Syntax: [Peaks, Times] = mzxml2peaks(mzXMLStruct) [Peaks, Times] = mzxml2peaks(mzXMLStruct, 'Levels', LevelsValue) Arguments: mzXMLStruct: mzXML structure, such as one created by the mzxmlread function. mzXMLStruct includes the following fields: scan, offset, mzXML. LevelsValue: Positive integer or vector of integers that specifies the level(s) of spectra in mzXMLStruct to convert, assuming the spectra are from tandem MS data sets. Default is 1, which converts only the first-level spectra, that is, spectra containing precursor ions. Setting LevelsValue to 2 converts only the second-level spectra, which are the fragment spectra (created from a precursor ion). Return Values: Peaks: Either of the following: a two-column matrix, where the first column contains mass/charge (m/z) values and the second column contains ion intensity values; or a cell array of peak lists, where each element is a two-column matrix of m/z values and ion intensity values, and each element corresponds to a spectrum or retention time. Times: Vector of retention times associated with a liquid chromatography/mass spectrometry (LC/MS) or gas chromatography/mass spectrometry (GC/MS) data set. The nu
546. u-Luo method (1985) uses the number of transitional and transversional substitutions at three different levels of degeneracy of the genetic code. Based on Kimura's two-parameter model. PBL: Pamilo-Bianchi-Li method (1993) is similar to the Li-Wu-Luo method, but with bias correction. Use this method when the number of transitions is much larger than the number of transversions. Integer specifying the sliding window size, in codons, for calculating substitution rates and variances. Property to control the display of the codons considered in the computations and their amino acid translations. Choices are true or false (default). Tip: Specify true to use this display to manually verify the codon alignment of the two input sequences. The presence of stop codons in the amino acid translation can indicate that SeqNT1 and SeqNT2 are not codon-aligned. Return Values: Dn: Nonsynonymous substitution rate(s). Ds: Synonymous substitution rate(s). Vardn: Variance for the nonsynonymous substitution rate(s). Vards: Variance for the synonymous substitution rate(s). Description: [Dn, Ds, Vardn, Vards] = dnds(SeqNT1, SeqNT2) estimates the synonymous and nonsynonymous substitution rates per site between the two homologous nucleotide sequences, SeqNT1 and SeqNT2, by comparing codons using the Nei-Gojobori method. dnds returns: Dn: Nonsynonymous substitution rate(s). Ds: Synonymous substi
547. u specify only a file name, that file must be on the MATLAB search path or in the MATLAB Current Directory. MATLAB character array that contains the text of a PDB-formatted file. ModelNumValue: Positive integer specifying a model in a PDB-formatted file. PDBStruct: MATLAB structure containing a field for each PDB record. Description: The Protein Data Bank (PDB) database is an archive of experimentally determined 3-D biological macromolecular structure data. For more information about the PDB format, see http://www.rcsb.org/pdb/file_formats/pdb/pdbguide2.2/guide2.2_frame.html. PDBStruct = pdbread(File) reads the data from PDB-formatted text file File and stores the data in the MATLAB structure, PDBStruct, which contains a field for each PDB record. The following table summarizes the possible PDB records and the corresponding fields in the MATLAB structure PDBStruct. PDB Database Record: Field in the MATLAB Structure. HEADER: Header. OBSLTE: Obsolete. TITLE: Title. CAVEAT: Caveat. COMPND: Compound. SOURCE: Source. KEYWDS: Keywords. EXPDTA: ExperimentData. AUTHOR: Authors. REVDAT: RevisionDate. SPRSDE: Superseded. JRNL: Journal. REMARK 1: Remark. REMARK N: Remarkn (Note: N and n equal 2 through 999). DBREF: DBReferences. SEQADV: SequenceConflicts. SEQRES: Sequence. FTNOTE: Footnote. MODRES: ModifiedResidues.
548. uc Array of sequences. Sequences can also be a structured array with the aligned sequences in a field Aligned or Sequences, and the optional names in a field Header or Name. Property to set the pseudocount weight A. Default value is 20. Property to set the pseudocount weight Ax. Default value is 20. Property to set the background symbol emission probabilities. Default values are taken from Model.NullEmission. Property to set the background transition probabilities from any MATCH state ([M->M, M->I, M->D]). Default values are taken from hmmprofstruct. Property to set the background transition probabilities from any DELETE state ([D->M, D->D]). Default values are taken from hmmprofstruct. Description: hmmprofestimate(Model, MultipleAlignment, 'PropertyName', PropertyValue, ...) returns a structure with the fields containing the updated estimated parameters of a profile HMM. Symbol emission and state transition probabilities are estimated using the real counts and weighted pseudocounts obtained with the background probabilities. Default weight is A = 20, the default background symbol emission for match and insert states is taken from Model.NullEmission, and the default background transition probabilities are the same as the default transition probabilities returned by hmmprofstruct. Model Construction: Multiple aligned sequences should contain uppercase letters and dashes indicating the model MATCH an
549. ue specifies the cutoff frequency. Enter a scalar value between 0 and 1 (Nyquist frequency or half the sampling frequency). By default, msresample estimates the cutoff value by inspecting the mass/charge vectors (MZ, MZout). However, the cutoff frequency might be underestimated if MZ has anomalies. msresample(..., 'ShowPlot', ShowPlotValue) plots the original and the resampled spectrum. When msresample is called without output arguments, the spectra are plotted unless ShowPlotValue is false. When ShowPlotValue is true, only the first spectrum in Y is plotted. ShowPlotValue can also contain an index to one of the spectra in Y. Examples: 1. Load mass spectrometry data and extract m/z and intensity value vectors. load sample_hi_res; mz = MZ_hi_res; y = Y_hi_res; 2. Plot the original data. plot(mz, y) MATLAB draws a figure. [Figure: original spectrum.] 3. Resample data. [mz1, y1] = msresample(mz, y, 10000, 'range', [2000 max(mz)]); 4. Plot resampled data. plot(mz1, y1) MATLAB draws a figure with the down-sampled data. [Figure: resampled spectrum.] See Also: Bioinformatics Toolbox functions: msalign, msbackadj, msheatmap, mslowess, msnorm, msppresampl
550. urrent figure. redgreencmap(..., 'PropertyName', PropertyValue, ...) defines optional properties that use property name/value pairs in any order. These property name/value pairs are as follows: redgreencmap(..., 'Interpolation', InterpolationValue, ...) lets you set the algorithm for color interpolation. Choices are: 'linear', 'quadratic', 'cubic', 'sigmoid' (default). Note: The sigmoid interpolation is tanh. Examples: Reset the color map of the current figure. pd = gprread('mouse_a1pd.gpr'); maimage(pd, 'F635 Median'); colormap(redgreencmap). See Also: Bioinformatics Toolbox function: clustergram. MATLAB functions: colormap, colormapeditor. restrict. Purpose: Split nucleotide sequence at restriction site. Syntax: Fragments = restrict(SeqNT, Enzyme); Fragments = restrict(SeqNT, Pattern, Position); [Fragments, CuttingSites] = restrict(...); [Fragments, CuttingSites, Lengths] = restrict(...); ... = restrict(..., 'PartialDigest', PartialDigestValue). Arguments: SeqNT: Nucleotide sequence. Enter either a character string with the characters A, T, G, C, and ambiguous characters R, Y, K, M, S, W, B, D, H, V, N, or a vector of integers. You can also enter a structure with the field Sequence. Enzyme: Enter the name of a restriction enzyme from REBASE Version 412. Pattern: Enter a short nucleotide pattern. Pattern can be a regular expression. Position: D
551. use human view(tr, any(sel, 2)). getbyname (phytree). See Also: Bioinformatics Toolbox function: phytree (object constructor). phytree object methods: get, prune, select. getcanonical (phytree). Purpose: Calculate canonical form of phylogenetic tree. Syntax: Pointers = getcanonical(Tree); [Pointers, Distances, Names] = getcanonical(Tree). Arguments: Tree: phytree object created by the phytree function (object constructor). Description: Pointers = getcanonical(Tree) returns the pointers for the canonical form of a phylogenetic tree (Tree). In a canonical tree, the leaves are ordered alphabetically and the branches are ordered first by their width and then alphabetically by their first element. A canonical tree is isomorphic to all the trees with the same skeleton, independently of the order of their leaves and branches. [Pointers, Distances, Names] = getcanonical(Tree) returns, in addition to the pointers described above, the reordered distances (Distances) and node names (Names). Examples: 1. Create two phylogenetic trees with the same skeleton but slightly different distances. b 1 2 3 4 5 tr_1 phytree b b tr_2 phytree 6 7 1 ps2 2. Plot the trees. plot(tr_1); plot(tr_2). 3. Check whether the trees have an isomorphic construction. isequal(getcanonical(tr_1), getcanonical(tr_2)). See Also: Bioinformatics Toolbox functions: phyt
552. use them as input to other functions such as dnds, which calculates the synonymous and nonsynonymous substitution rates of the codon-aligned nucleotide sequences. By setting Verbose to true, you can also display the codons considered in the computations and their amino acid translations. [dn, ds] = dnds(hk01_aligned, vt04_aligned, 'verbose', true). See Also: Bioinformatics Toolbox functions: emblread, genbankread, genpeptread, getgenbank, getgenpept. galread. Purpose: Read microarray data from GenePix array list file. Syntax: GALData = galread('File'). Arguments: File: GenePix Array List formatted file (GAL). Enter a file name, or enter a path and file name. Description: galread reads data from a GenePix formatted file into a MATLAB structure. GALData = galread('File') reads in a GenePix Array List formatted file (File) and creates a structure (GALData) containing the following fields: Header, BlockData, IDs, Names. The field BlockData is an N-by-3 array. The columns of this array are the block data, the column data, and the row data, respectively. For more information on the GAL format, see http://www.moleculardevices.com/pages/software/gn_genepix_file_formats.html#gal. For a list of supported file format versions, see http://www.moleculardevices.com/pages/software/gn_genepix_file_formats.html. GenePix is a registered trademark of Molecular Devices Corporation. See Also: Bioinformati
553. ustergram Examples: The following example uses data from an experiment (DeRisi et al., 1997) that used DNA microarrays to study temporal gene expression of almost all genes in Saccharomyces cerevisiae during the metabolic shift from fermentation to respiration. Expression levels were measured at seven time points during the diauxic shift. 1. Load the filtered yeast data provided with Bioinformatics Toolbox, and then create a clustergram from the gene expression data in the yeastvalues matrix. load filteredyeastdata; clustergram(yeastvalues). [Figure: clustergram of yeastvalues.] 2. Add labels to the clustergram, then click and hold the mouse button on the heat map to display the intensity value, column label, and row label for that area of the heat map. View the row labels by using the Zoom icon to zoom the right side of the clustergram. clustergram(yeastvalues, 'RowLabels', genes, 'ColumnLabels', times). [Figure: clustergram with row and column labels.] 3. Change the clustering parameters. clustergram(yeastvalues, 'Linkage', 'complete'). [Figure: clustergram with complete linkage.] 4. Change the color of the groups of nod
554. vations that are used in both sets. P and Q are scalars between 0 and 1. Q = 1-P corresponds to holding out 100P%, while P = Q = 1 corresponds to full resubstitution. [P,Q] defaults to [1,1] when omitted. [...] = crossvalind(Method, Group, ...) takes the group structure of the data into account. Group is a grouping vector that defines the class for each observation. Group can be a numeric vector, a string array, or a cell array of strings. The partition of the groups depends on the type of cross-validation: for K-fold, each group is divided into K subsets, approximately equal in size. For all others, approximately equal numbers of observations from each group are selected for the evaluation set. In both cases the training set contains at least one observation from each group. [...] = crossvalind(Method, Group, ..., 'Classes', C) restricts the observations to only those values specified in C. C can be a numeric vector, a string array, or a cell array of strings, but it is of the same form as Group. If one output argument is specified, it contains the value 0 for observations belonging to excluded classes. If two output arguments are specified, both will contain the logical value false for observations belonging to excluded classes. [...] = crossvalind(Method, Group, ..., 'Min', MinValue) sets the minimum number of observations that each group has in the training set. Min defaults to 1. Setting a large value for Min can help to balance the training
555. ve a value of 1, and those with a variance less than the threshold are 0. [Mask, FData] = genevarfilter(Data) returns the filtered data matrix (FData). You can also create FData using FData = Data(find(I),:). [Mask, FData, FNames] = genevarfilter(Data, Names) returns a filtered names array (FNames). Names is a cell array of the names of the genes corresponding to each row of Data. FNames can also be created using FNames = Names(I). genevarfilter(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. genevarfilter(..., 'Percentile', PercentileValue) removes from the experimental data (Data) gene expression profiles with a variance less than the percentile (PercentileValue). genevarfilter(..., 'AbsValue', AbsValValue) removes from Data gene expression profiles with a variance less than AbsValValue. Examples: load yeastdata; [mask, fyeastvalues, fgenes] = genevarfilter(yeastvalues, genes); References: [1] Kohane I.S., Kho A.T., Butte A.J. (2003), Microarrays for an Integrative Genomics, Cambridge, MA: MIT Press. See Also: Bioinformatics Toolbox functions: exprprofrange, exprprofvar, generangefilter, geneentropyfilter, genelowvalfilter. genpeptread. Purpose: Read data from GenPept file. Syntax: GenPeptData = genpeptread('File'). Arguments: File: GenPept formatted file (ASCII text file). Enter a file name, a path and file name, or a URL
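The masking behaviour described for genevarfilter in the snippet above can be sketched in base MATLAB without the toolbox. This is only an illustration under stated assumptions: the toy matrix and the names data, absThreshold, rowVar, mask, and fdata are hypothetical, and only an absolute variance threshold is shown, not the percentile-based filtering.

```matlab
% Toy expression matrix (made-up values): one gene per row, one sample per column.
data = [1 1 1; 1 2 3; 0 10 20];

% Hypothetical absolute variance threshold, playing the role of 'AbsValue'.
absThreshold = 0.5;

% Sample variance of each row (gene profile) across the columns.
rowVar = var(data, 0, 2);

% Mask is 1 for profiles that are kept and 0 for profiles whose
% variance is less than the threshold.
mask = rowVar >= absThreshold;

% Filtered data: only the rows selected by the mask.
fdata = data(mask, :);
```

Logical indexing with mask gives the same rows as Data(find(mask),:), which is the form the manual text uses.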
556. ve to the indices of the multiple alignment. Examples: seqs = fastaread('pf00002.fa'); [P, S] = seqprofile(seqs, 'limits', [50 60], 'gaps', 'all'). See Also: Bioinformatics Toolbox functions: fastaread, multialignread, seqconsensus, seqdisp, seqlogo. seqrcomplement. Purpose: Calculate reverse complement of nucleotide sequence. Syntax: SeqRC = seqrcomplement(SeqNT). Arguments: SeqNT: Nucleotide sequence. Enter either a character string with the characters A, T (U), G, C, and ambiguous characters R, Y, K, M, S, W, B, D, H, V, N, or a vector of integers. You can also enter a structure with the field Sequence. Description: seqrcomplement calculates the reverse complementary strand of a DNA sequence. SeqRC = seqrcomplement(SeqNT) calculates the reverse complementary strand 3' -> 5' (A -> T, C -> G, G -> C, T -> A) for a DNA sequence and returns a sequence in the same format as SeqNT. For example, if SeqNT is an integer sequence, then so is SeqRC. Examples: Reverse a DNA nucleotide sequence and then return its complement. s = 'ATCG'; seqrcomplement(s) ans = CGAT. See Also: Bioinformatics Toolbox functions: codoncount, palindromes, seqcomplement, seqreverse, seqtool. seqreverse. Purpose: Reverse letters or numbers in nucleotide sequence. Syntax: SeqR = seqreverse(SeqNT). Arguments: SeqNT: Enter a nucleotide sequence. Enter either a characte
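The complement-then-reverse mapping given for seqrcomplement (A -> T, C -> G, G -> C, T -> A) can be sketched in base MATLAB as follows. This is a minimal illustration, not the toolbox implementation: the variable names are hypothetical, only the four unambiguous DNA letters are handled, and any other character is left unchanged.

```matlab
s = 'ATCG';            % input sequence from the manual example

bases    = 'ACGT';     % letters to look up
partners = 'TGCA';     % complement of each letter, in the same order

% Locate each character of s in the lookup string.
[found, idx] = ismember(s, bases);

% Replace every recognized base with its complement.
comp = s;
comp(found) = partners(idx(found));

% Reverse the complemented string to get the reverse complement.
rc = fliplr(comp);
disp(rc)
```

For 'ATCG' the complement is 'TAGC', and reversing it gives 'CGAT', the same answer shown in the manual example.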
557. vector net. Bioinformatics Toolbox functions: knnclassify, svmclassify, svmsmoset. Statistics Toolbox function: classify. Optimization Toolbox function: quadprog. MATLAB function: optimset. swalign. Purpose: Locally align two sequences using Smith-Waterman algorithm. Syntax: Score = swalign(Seq1, Seq2); [Score, Alignment] = swalign(Seq1, Seq2); [Score, Alignment, Start] = swalign(Seq1, Seq2); ... = swalign(Seq1, Seq2, ...'Alphabet', AlphabetValue); ... = swalign(Seq1, Seq2, ...'ScoringMatrix', ScoringMatrixValue, ...); ... = swalign(Seq1, Seq2, ...'Scale', ScaleValue, ...); ... = swalign(Seq1, Seq2, ...'GapOpen', GapOpenValue, ...); ... = swalign(Seq1, Seq2, ...'ExtendGap', ExtendGapValue, ...); ... = swalign(Seq1, Seq2, ...'Showscore', ShowscoreValue, ...). Arguments: Seq1, Seq2: Amino acid or nucleotide sequences. Enter any of the following: a character string of letters representing amino acids or nucleotides, such as returned by int2aa or int2nt; a vector of integers representing amino acids or nucleotides, such as returned by aa2int or nt2int; a structure containing a Sequence field. Tip: For help with letter and integer representations of amino acids and nucleotides, see Amino Acid Lookup Table on page 2-42 or Nucleotide Lookup Table on page 2-52. AlphabetValue: String specifying the type of sequence. Choices are 'AA' (default) or 'NT'. ScoringMatrixValue: String specifying the scoring matrix to use f
558. w phylogenetic tree; Remove branch nodes from phylogenetic tree; Reorder leaves of phylogenetic tree; Change root of phylogenetic tree; Select tree branches and leaves in phytree object; Extract phylogenetic subtree. view (phytree): View phylogenetic tree. weights (phytree): Calculate weights for phylogenetic tree. Property Summary: Note: You cannot modify these properties directly. You can access these properties using the get method. NumLeaves: Number of leaves. NumBranches: Number of branches. NumNodes: Number of nodes (NumLeaves + NumBranches). Pointers: Branch to leaf/branch connectivity list. Distances: Edge length for every leaf/branch. LeafNames: Names of the leaves. BranchNames: Names of the branches. NodeNames: Names of all the nodes. See Also: Bioinformatics Toolbox functions: phytree (object constructor), phytreeread, phytreetool, phytreewrite, seqlinkage, seqneighjoin, seqpdist. Bioinformatics Toolbox methods of phytree object: get, getbyname, getcanonical, getmatrix, getnewickstr, pdist, plot, prune, reroot, select, subtree, view, weights. A: aa2int function reference; aa2nt function reference; aacount function reference; affyinvarsetnorm function reference; affyprobeaffinities function reference; affyprobeseqread function reference; affyread function reference; agferead function reference; allshortestpaths method reference
559. when you have set the JobManager property. When WaitInQueueValue is true, seqpdist waits in the job manager queue for an available worker. When WaitInQueueValue is false (default) and there are no workers immediately available, seqpdist stops and displays an error message. You must have Distributed Computing Toolbox and have also set the JobManager property to use this property. D = seqpdist(Seqs, ...'SquareForm', SquareFormValue, ...) controls the conversion of the output into a square matrix such that D(I,J) denotes the distance between the Ith and Jth sequences. The square matrix is symmetric and has a zero diagonal. Choices are true or false (default). Setting SquareForm to true is the same as using the squareform function in Statistics Toolbox. D = seqpdist(Seqs, ...'Alphabet', AlphabetValue, ...) specifies the type of sequence (nucleotide or amino acid). Choices are 'NT' or 'AA' (default). The remaining input properties are available when the Method property equals 'alignment-score' or the PairwiseAlignment property equals true. D = seqpdist(Seqs, ...'ScoringMatrix', ScoringMatrixValue, ...) specifies the scoring matrix to use for the global pairwise alignment. Default is: 'NUC44' (when AlphabetValue equals 'NT'); 'BLOSUM50' (when AlphabetValue equals 'AA'). D = seqpdist(Seqs, ...'Scale', ScaleValue, ...) specifies the scale factor used to return the score in arbitrary units. Choices are any positive value. If the scoring matri
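The square-matrix layout described for the SquareForm property (symmetric, zero diagonal, D(I,J) holding the distance between sequences I and J) can be illustrated in base MATLAB. This is a sketch under assumptions: the three distances are made up, and the pair ordering (2,1), (3,1), (3,2) is assumed to follow the column-wise lower-triangle convention of a pairwise distance vector; the snippet itself does not state the ordering.

```matlab
% Hypothetical pairwise distances for three sequences,
% assumed to be ordered as pairs (2,1), (3,1), (3,2).
d = [1 2 3];
n = 3;                          % number of sequences

% Fill the strict lower triangle column by column.
D = zeros(n);
D(tril(true(n), -1)) = d;

% Mirror into the upper triangle; the diagonal stays zero.
D = D + D.';
```

With these inputs D is [0 1 2; 1 0 3; 2 3 0], so D(3,2) and D(2,3) both hold the distance between the second and third sequences.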
560. with a variance less than the threshold are 0. [Mask, FData] = geneentropyfilter(Data) returns a filtered data matrix (FData). FData can also be created using FData = Data(find(I),:). [Mask, FData, FNames] = geneentropyfilter(Data, Names) returns a filtered names array (FNames). You can also create FNames using FNames = Names(I). geneentropyfilter(..., 'PropertyName', PropertyValue, ...) defines optional properties using property name/value pairs. geneentropyfilter(..., 'Percentile', PercentileValue) removes from the experimental data (Data) gene expression profiles with entropy values less than a given percentile (PercentileValue). Examples: load yeastdata; [mask, fyeastvalues, fgenes] = geneentropyfilter(yeastvalues, genes); References: [1] Kohane I.S., Kho A.T., Butte A.J. (2003), Microarrays for an Integrative Genomics, Cambridge, MA: MIT Press. See Also: Bioinformatics Toolbox functions: exprprofrange, exprprofvar, genelowvalfilter, generangefilter, genevarfilter. genelowvalfilter. Purpose: Remove gene profiles with low absolute values. Syntax: Mask = genelowvalfilter(Data); [Mask, FData] = genelowvalfilter(Data); [Mask, FData, FNames] = genelowvalfilter(Data, Names); genelowvalfilter(..., 'PropertyName', PropertyValue, ...); genelowvalfilter(..., 'Prctile', PrctileValue); genelowvalfilter(..., 'AbsValue', AbsValueValue); genelowvalfilter(..., 'A
561. x Arguments Resample mass spectrometry signal while preserving peaks. [MZ, Intensities] = msppresample(Peaks, N); [MZ, Intensities] = msppresample(Peaks, N, ...'Range', RangeValue, ...); [MZ, Intensities] = msppresample(Peaks, N, ...'FWHH', FWHHValue, ...); [MZ, Intensities] = msppresample(Peaks, N, ...'ShowPlot', ShowPlotValue, ...). Peaks: Either of the following: a two-column matrix, where the first column contains mass/charge (m/z) values and the second column contains ion intensity values; or a cell array of peak lists, where each element is a two-column matrix of m/z values and ion intensity values, and each element corresponds to a spectrum or retention time. Note: You can use the mzxml2peaks function or the mspeaks function to create the Peaks matrix or cell array. N: Integer specifying the number of equally spaced points (m/z values) in the resampled signal. RangeValue: 1-by-2 vector specifying the minimum and maximum m/z values for the output matrix Intensities. RangeValue must be within [min(inputMZ) max(inputMZ)], where inputMZ is the concatenated m/z values from the input Peaks. Default is the full range [min(inputMZ) max(inputMZ)]. FWHHValue: Value that specifies the full width at half height (FWHH) in m/z units. The FWHH is used to convert each peak to a Gaussian-shaped curve. Default is median(diff(inputMZ))/2, where in
562. x, MMIntensity) also estimates affinity coefficients using multiple linear regression. It returns BaseProf, a 4-by-4 matrix containing the four parameters for a polynomial of degree 3, for each base, A, C, G, and T. Each row corresponds to a base, and each column corresponds to a parameter. These values are estimated from the probe sequences and intensities, and represent all probes on an Affymetrix GeneChip array. [AffinPM, AffinMM, BaseProf, Stats] = affyprobeaffinities(SequenceMatrix, MMIntensity) also returns Stats, a row vector containing four statistics in the following order: R-square statistic, F statistic, p value, error variance. ... = affyprobeaffinities(SequenceMatrix, MMIntensity, ...'PropertyName', PropertyValue, ...) calls affyprobeaffinities with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows: ... = affyprobeaffinities(SequenceMatrix, MMIntensity, ...'ProbeIndices', ProbeIndicesValue, ...) uses probe indices to normalize the probe intensities with the median of their probe set intensities. Tip: Use of the ProbeIndices property is recommended only if your MMIntensity data are not from a nonspecific binding experiment. ... = affyprobeaffinities(SequenceMatrix, MMIntensity, ...'Showplot
563. x information also provides a scale factor, then both are used. D = seqpdist(Seqs, ...'GapOpen', GapOpenValue, ...) specifies the penalty for opening a gap in the alignment. Choices are any positive integer. Default is 8. D = seqpdist(Seqs, ...'ExtendGap', ExtendGapValue, ...) specifies the penalty for extending a gap in the alignment. Choices are any positive integer. Default is equal to GapOpenValue. Examples: 1. Read amino acids alignment data into a MATLAB structure. seqs = fastaread('pf00002.fa'); 2. For every possible pair of sequences in the multiple alignment, ignore sites with gaps and score with the scoring matrix PAM250. dist = seqpdist(seqs, 'Method', 'alignment-score', 'Indels', 'pairwise-delete', 'ScoringMatrix', 'pam250'); 3. Force the realignment of every pair of sequences, ignoring the provided multiple alignment. dist = seqpdist(seqs, 'Method', 'alignment-score', 'Indels', 'pairwise-delete', 'ScoringMatrix', 'pam250', 'PairwiseAlignment', true); 4. Measure the Jukes-Cantor pairwise distances after realigning every pair of sequences, counting the gaps as point mutations. dist = seqpdist(seqs, 'Method', 'jukes-cantor', 'Indels', 'score', 'ScoringMatrix', 'pam250', 'PairwiseAlignment', true); See Also: Bioinformatics Toolbox functions: fastaread, dnds, dndsml, multialign, nwalign, phytree (object constructor), seqlinkage. Bioinformat
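To make the Jukes-Cantor distance named in example step 4 concrete, the textbook nucleotide form of the correction can be worked through in base MATLAB. This is a sketch under assumptions, not a description of what seqpdist computes internally: the two aligned sequences are made up, gaps are not handled, and the manual example uses amino acid data, for which a different constant would apply.

```matlab
% Two hypothetical aligned nucleotide sequences of equal length, no gaps.
s1 = 'ACGTACGT';
s2 = 'ACGTACGA';

% p-distance: fraction of aligned sites that differ.
p = mean(s1 ~= s2);

% Standard Jukes-Cantor correction for nucleotides:
% d = -(3/4) * ln(1 - (4/3) * p)
d = -(3/4) * log(1 - (4/3) * p);
```

Here one site out of eight differs, so p is 0.125 and the corrected distance d is slightly larger than p, reflecting the allowance for multiple substitutions at the same site.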
564. xamples be enclosed in single quotes and is case insensitive. These property name/property value pairs are as follows: seqshowwords(Seq, Word, ...'Color', ColorValue, ...) selects the color used to highlight the words in the output display. seqshowwords(Seq, Word, ...'Columns', ColumnsValue, ...) specifies how many columns per line to use in the output. seqshowwords(Seq, Word, ...'Alphabet', AlphabetValue, ...) selects the alphabet for the sequence (Seq) and the word (Word). If the search word (Word) contains nucleotide or amino acid symbols that represent multiple possible symbols, then seqshowwords shows all matches. For example, the symbol R represents either G or A (purines). If Word is 'ART', then seqshowwords shows occurrences of both 'AAT' and 'AGT'. Examples: This example shows two matches, 'TAGT' and 'TAAT', for the word 'BART'. seqshowwords('GCTAGTAACGTATATATAAT', 'BART') ans = Start: [3 17] Stop: [6 20] 000001 GCTAGTAACGTATATATAAT. seqshowwords does not highlight overlapping patterns multiple times. This example highlights two places, the first occurrence of 'TATA' and the 'TATATATA' immediately after 'CG'. The final 'TA' is not highlighted because the preceding 'TA' is part of an already matched pattern. seqshowwords('GCTATAACGTATATATATA', 'TATA') ans = Start: [3 10 14] Stop: [6 13 17] 000001 GCTATAACGTATATATATA. To highlight all multiple repeats of TA, use the regular expression TA
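The ambiguous-symbol matching and the non-overlapping behaviour described for seqshowwords can be approximated with the base MATLAB regexp function. This is a hedged sketch rather than the toolbox's own method: only the two ambiguity codes used in the example are expanded (B as C, G, or T; R as A or G), and the variable names are hypothetical.

```matlab
% Example 1: word with ambiguous symbols.
seq1 = 'GCTAGTAACGTATATATAAT';
word = 'BART';

% Expand the ambiguity codes into regular-expression character classes.
pattern = strrep(word, 'B', '[CGT]');
pattern = strrep(pattern, 'R', '[AG]');      % pattern is now '[CGT]A[AG]T'

% regexp returns the start and end index of each non-overlapping match.
[start1, stop1] = regexp(seq1, pattern);

% Example 2: repeated word; matches are reported without overlap.
seq2 = 'GCTATAACGTATATATATA';
[start2, stop2] = regexp(seq2, 'TATA');
```

The first search gives starts [3 17] and stops [6 20]; the second gives starts [3 10 14] and stops [6 13 17], which agrees with the Start and Stop values in the manual examples. The trailing 'TA' is left unmatched because regexp resumes scanning after the end of the previous match.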
565. y display in the Tree. view(Tree) opens the Phylogenetic Tree Tool window and draws a tree from data in a phytree object (Tree). The significant distances between branches and nodes are in the horizontal direction. Vertical distances have no significance and are selected only for display purposes. You can access tools to edit and analyze the tree from the Phylogenetic Tree Tool menu bar or by using the left and right mouse buttons. view(Tree, IntNodes) opens the Phylogenetic Tree Tool window with an initial selection of nodes specified by IntNodes. IntNodes can be a logical array of any of the following sizes: [NumLeaves + NumBranches] x 1, NumLeaves x 1, or NumBranches x 1. IntNodes can also be a list of indices. Examples: tree = phytreeread('pf00002.tree'); view(tree). See Also: Bioinformatics Toolbox functions: phytree (object constructor), phytreeread, phytreetool, seqlinkage, seqneighjoin. Bioinformatics Toolbox object: phytree object. Bioinformatics Toolbox method of phytree object: plot. weights (phytree). Purpose: Calculate weights for phylogenetic tree. Syntax: W = weights(Tree). Arguments: Tree: Phylogenetic tree (phytree object) created with the function phytree. Description: W = weights(Tree) calculates branch proportional weights for every leaf in a tree (Tree) using the Thompson-Higgins-Gibson method. The distance of every segment of the tree is adjusted by dividing it by the number of leaves it contains. The sequence w
566. y for every feature in the pool gives the measurement of the significance. X contains the training samples. Every column of X is an observed vector. Group contains the class labels. Group can be a numeric vector or a cell array of strings; numel(Group) must be the same as the number of columns in X, and numel(unique(Group)) must be greater than or equal to 2. Z is the classification significance for every feature. IDX contains the indices after sorting Z, i.e., the first one points to the most significant feature. randfeatures(..., 'Classifier', C) sets the classifier. Options are: 'da' (default), discriminant analysis; 'knn', K nearest neighbors. randfeatures(..., 'ClassOptions', CO) is a cell with extra options for the selected classifier. Defaults are {5, 'correlation', 'consensus'} for KNN and {'linear'} for DA. See knnclassify and classify for more information. randfeatures(..., 'PerformanceThreshold', PT) sets the correct classification threshold used to pick the subsets included in the final pool. Default is 0.8 (80%). randfeatures(..., 'ConfidenceThreshold', CT) uses the posterior probability of the discriminant analysis to invalidate classified subvectors with low confidence. This option is only valid when Classifier is 'da'. Using it has the same effect as using 'consensus' in KNN, i.e., it makes the selection of approved subsets very stringent. Default is 0.95.^(number of classes)
567. y if there are ambiguous and unknown characters, add an Others field with the counts. basecount(SeqNT, 'Others', 'full'): Display four nucleotides, 11 ambiguous nucleotides, gaps, and only if there are unknown characters, add an Others field with the unknown counts. basecount(SeqNT, 'Structure', 'full'): Display four nucleotides and always display an Others field. If there are ambiguous and unknown characters, add counts to the Others field; otherwise display 0. basecount(SeqNT, 'Others', 'full', 'Structure', 'full'): Display 4 nucleotides, 11 ambiguous nucleotides, gaps, and the Others field. If there are unknown characters, add counts to the Others field; otherwise display 0. Examples: 1. Count the number of bases in a DNA sequence. Bases = basecount('TAGCTGGCCAAGCGAGCTTG') Bases = A: 4 C: 5 G: 7 T: 4. 2. Get the count for adenosine (A) bases. Bases.A ans = 4. 3. Count the bases in a DNA sequence with ambiguous characters. basecount('ABCDGGCCAAGCGAGCTTG', 'Others', 'full') ans = A: 4 C: 5 G: 6 T: 2 R: 0 Y: 0 K: 0 M: 0 S: 0 W: 0 B: 1 D: 1 H: 0 V: 0 N: 0 Gaps: 0. See Also: Bioinformatics Toolbox functions: aacount, baselookup, codoncount, cpgisland, dimercount, nmercount, ntdensity, seqtool. baselookup. Purpose: Nucleotide codes, abbreviations, and names. Syntax: baselookup baselookup baselookup baselookup Compl
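The per-base counts in the first basecount example can be reproduced in base MATLAB by comparing the sequence against each letter. This is a minimal sketch with hypothetical variable names; it covers only the four unambiguous bases and has no Others or Gaps handling.

```matlab
seq = 'TAGCTGGCCAAGCGAGCTTG';   % sequence from the manual example

% seq == 'A' is a logical vector; summing it counts the matches.
counts = struct( ...
    'A', sum(seq == 'A'), ...
    'C', sum(seq == 'C'), ...
    'G', sum(seq == 'G'), ...
    'T', sum(seq == 'T'));

disp(counts.A)   % count for adenosine
```

The four counts are A: 4, C: 5, G: 7, T: 4, which sum to the sequence length of 20, and counts.A matches the value of 4 shown for Bases.A in the manual.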
568. y name/value pairs. blosum(..., 'Extended', ExtendedValue), if Extended is false, returns the scoring matrix for the standard 20 amino acids. Ordering of the output when Extended is false is A R N D C Q E G H I L K M F P S T W Y V. blosum(..., 'Order', OrderValue) returns a BLOSUM matrix ordered by an amino acid sequence (OrderValue). Examples: Return a BLOSUM matrix with a value of 50. B50 = blosum(50). Return a BLOSUM matrix with the amino acids in a specific order. B75 = blosum(75, 'Order', 'CSTPAGNDEQHRKMILVFYW'). See Also: Bioinformatics Toolbox functions: dayhoff, gonnet, nwalign, pam, swalign. celintensityread. Purpose: Read probe intensities from Affymetrix CEL files (Windows 32). Syntax: ProbeStructure = celintensityread(CELFiles, CDFFile); ProbeStructure = celintensityread(..., 'CELPath', CELPathValue, ...); ProbeStructure = celintensityread(..., 'CDFPath', CDFPathValue, ...); ProbeStructure = celintensityread(..., 'PMOnly', PMOnlyValue, ...); ProbeStructure = celintensityread(..., 'Verbose', VerboseValue, ...). Arguments: CELFiles: Cell array of CEL file names. If you set CELFiles to '*', then it reads all CEL files in the current directory. If you set CELFiles to ' ', then it opens the Select CEL Files dialog box from which you select the CEL files. From this dialog box, you can press and hold Ctrl or Shift while clicking to select multiple CEL files. CDFFile: String specifying a CDF file n
569. zero value edge in the N-by-N sparse matrix. The order of the custom weights in the vector must match the order of the nonzero values in the N-by-N sparse matrix when it is traversed column-wise. By default, minspantree gets weight information from the nonzero entries in the N-by-N sparse matrix. References: [1] Kruskal, J.B. (1956). On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem. Proceedings of the American Mathematical Society 7, 48-50. [2] Prim, R. (1957). Shortest Connection Networks and Some Generalizations. Bell System Technical Journal 36, 1389-1401. [3] Siek, J.G., Lee, L-Q, and Lumsdaine, A. (2002). The Boost Graph Library User Guide and Reference Manual (Upper Saddle River, NJ: Pearson Education). See Also: Bioinformatics Toolbox functions: biograph (object constructor), graphminspantree. Bioinformatics Toolbox object: biograph object. Bioinformatics Toolbox methods of a biograph object: allshortestpaths, conncomp, isdag, isomorphism, isspantree, maxflow, shortestpath, topoorder, traverse. pdist (phytree). Purpose: Calculate pairwise patristic distances in phytree object. Syntax: D = pdist(Tree); [D, C] = pdist(Tree); pdist(..., 'PropertyName', PropertyValue, ...); pdist(..., 'Nodes', NodeValue); pdist(..., 'Squareform', SquareformValue); pdist(..., 'Criteria', CriteriaValue). Arguments: Tree: Phylogenetic tree object created with
