ArrayGene v0.2 - Department of Animal Science
Contents
1. Set up and maintenance must be done by experienced bioinformaticians or IT staff, given the need for a running MySQL server, a Unix operating system, and the installation of all the required Perl modules and their dependencies.

Citing ArrayGene

This beta version of ArrayGene is provided as supplementary material for an article in BMC Genomics. Please cite this article when ArrayGene has been used.

Bugs reports

Please report any bugs in this software to Ricardo Verdugo at raverdugo@ucdavis.edu. I will do my best to provide support.

Prerequisites

MySQL: only tested with v 4.0, but should work with previous versions too. ArrayGene uses the built-in foreign key check capabilities for InnoDB tables, so version 3.23.50 or higher is required.

Perl modules (NOTE: all these modules can be obtained from CPAN. Type "man cpan" for details.)

1. ExtUtils::MakeMaker
2. DBI
3. CGI
4. Getopt::Long
5. Term::ProgressBar
6. DBD::Chart

Install perl libraries

In order to run ArrayGene you need to install the libraries that come with the program. Go to the library folder from the package root folder and execute the commands:

    perl Makefile.PL
    make
    sudo make install

Configuration options

You need to provide the directories where html and cgi scripts can be saved. Make sure that you have write permission there and that the web client user ("nobody" in Linux and "www" in Mac OS X by default) has read permission. At the top level directory of the ArrayGene distr
2. ArrayGene V 0.2 User Guide

Integrated platform for storing, managing and querying genome-centered microarray annotation information

Ricardo Verdugo
University of California, Davis
Davis, California
January 5, 2006

ArrayGene 0.2 beta by Ricardo Verdugo
Copyright (c) 2005 University of California, Davis

Permission is granted to anyone to make or distribute verbatim copies of this document as received, in any medium, provided that the copyright notice and permission notice are preserved, and that the distributor grants the recipient permission for further redistribution as permitted by this notice.

TABLE OF CONTENTS

User Manual
    Introduction
    Portability
    What do I need to use ArrayGene?
    Target User
    Citing ArrayGene
    Bugs reports
    Prerequisites
    Install perl libraries
    Configuration options
    Database architecture
    Populating the database
        ArrayGene database
        Assembly database
        Gene Lists files
Tools in the ArrayGene Package
    Reference of programs
        import_array
        import_genexref
        import_genemap
        accession2gene
        import_geneinfo
        probe2genemap

User Manual

Introduction

ArrayGene is a software package that allows creating and maintaining a database of anno
3. and it will do its best to find the target gene for that probe by accessing a MySQL database. Finally, it will create an annotated output file with Entrez Gene ids and the type of sequence used to find them in the last two columns of the file.

options:
    -i, --input      name of input file
    --names          first line contains column names (no argument)
    -a, --accession  column number (starting from 1) containing the accession numbers or other ids of the source sequences used to generate the probe. Required if no genes argument is given.
    -e, --extract    flag indicating that the accession number should be extracted from a string of ids separated by the bar (|), each preceded by a two or three letter code. Takes no argument. Example of the string: ref|NM_178871|gb|AK042509|riken|A630098E12. In this example the RefSeq id will be extracted and the other ids will be discarded.
    --other          other id column
    -g, --genes      column number (starting from 1) containing Entrez Gene IDs. Required if no accession is given.
    --refseq         column with RefSeq ids (optional)
    --delimiter      character separating columns (default: \t)
    --missing        character for missing observations (default not shown)
    --recsep         character separating multiple records in a single column field. Only the first record is used for annotation, but multiple records can be stored in the database (default not shown)
    -u, --ugenes     unique genes output. Returns a list of annotated unique genes. Only used if a text file is the output.
    -s, --server     database server (optional)
    -d, --d
4. atabase       database name (optional)
    -u, --user       user name to access database
    -o, --output     name for output file (default: <input file>.out)
    -h, --help       print this help

import_geneinfo

usage: import_geneinfo [options] -i <input filename>

This program will parse a column text file, find the gene id and any information related to it, and import it into the ArrayGene database.

options:
    --input      name of input file with mapping data
    --new        is this a new alignment?
    --ignore     number of lines to ignore (default: 0)
    --gene_id    column number with gene ids (must be Entrez Ids)
    --xref       column number with cross reference ids
    --type       type of xref ids in the input (string)
    --filter     column number with filter
    --fvalue     value allowed in the filter column
    --delete     column number with gene ids to delete from the db
    --server     database server (optional)
    --user       user name to access database
    --password   user password
    --database   database name
    --delimiter  character separating columns (default: \t)
    --recsep     character delimiting multiple values within a field
    --missing    character for missing values (default not shown)
    --help       print this help

probe2genemap

usage: probe2genemap [options] -i <input filename>

This program will parse a column text file, extract probe names for a given microarray, look for the target gene in the ArrayGene database, and retrieve its genomic position in a given genomic assembly. The output is an annotated file with
5. e default
    --outname    output filename. String with a filename. If given, no changes to the database will be made and an annotated file will be created.
    --ugenes     unique genes output. Returns a list of annotated unique genes. Only used if a text file is the output.
    --server     database server (optional)
    --database   gene annotation database name (optional)
    --user       user name to access database
    --password   password to access database
    --help       print this help

import_genexref

usage: import_genexref [options] -i <input filename>

This program will parse a column text file, extract the gene id and some other identifier for cross reference, and insert this pair in a MySQL database.

options:
    --input      name of input file with mapping data
    --ignore     number of lines to ignore (default: 0)
    --gene_id    column number with gene ids (must be Entrez Ids)
    --xref       column number with cross reference ids
    --delvers    should the version number be deleted from the sequence id? E.g. AV089821.1 -> AV089821. Numeric argument: 0 = No, 1 = Yes (default: 1)
    --type       type of xref ids in the input (string)
    --filter     column number with filter
    --regexp     use regular expressions to match the filter (option with no argument)
    --fvalue     value allowed in the filter column
    --delete     column number with gene ids to delete from the db
    --server     database server (optional)
    --user       user name to access database
    --password   user password
    --database   database name
    --delim
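The version-stripping behaviour described for the delvers option can be illustrated with a short Python sketch. This is not ArrayGene's own Perl code; the function name `strip_version` is hypothetical, and the sketch assumes the version is a trailing dot followed by digits, as in GenBank accession.version identifiers.

```python
import re


def strip_version(seq_id: str) -> str:
    """Drop a trailing ".<digits>" version suffix from a sequence id.

    Hypothetical helper: assumes the version follows the last dot.
    """
    return re.sub(r"\.\d+$", "", seq_id)


# Example from the option description.
print(strip_version("AV089821.1"))
```

Ids without a version suffix pass through unchanged, so the function can be applied to a whole column without checking each value first.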
6. etween gene ids and sequence ids. A good source of data is the Entrez Gene database. Files can be downloaded from the NCBI FTP server. Useful files are:

    gene2accession
    gene_history
    gene2refseq
    gene_info

See the file ftp://ftp.ncbi.nlm.nih.gov/gene/README for more information. The following file from the UCSC Genome Browser database is also useful:

    knownToEnsembl.txt

The program import_genexref, located in the bin folder of the ArrayGene distribution, can take flat files and populate the database. Call the program with the help option to see the available options. A full set of usage examples to populate the database with the files above follows (lines are broken for display, but should be entered as single lines in the command line):

    import_genexref --input gene2refseq --gene_id 2 --xref 4 --type refseq --filter 1 --fvalue 10090 --missing …
    import_genexref --input data/NCBI/gene2refseq --gene_id 2 --xref 6 --type refseq --filter 1 --fvalue 10090 --missing …
    import_genexref --input gene2refseq --gene_id 2 --xref 8 --type refseq --filter 1 --fvalue 10090 --missing …
    import_genexref --input gene2accession --gene_id 2 --xref 4 --type mRNA --filter 1 --fvalue 10090 --missing …
    import_genexref --input gene2accession --gene_id 2 --xref 6 --type protein --filter 1 --fvalue 10090 --missing …
    import_genexref --input gene_info --gene_id 2 --xref 5 --type synonym --filter 1 --fvalue 10090 --mis
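To make the gene_id, xref, filter and fvalue arguments concrete, here is a minimal Python sketch of the row selection they describe: keep rows whose filter column equals fvalue (10090 is the NCBI taxonomy id for mouse), then pair the gene id with the cross-reference id. It is an illustration only, not the import_genexref implementation. The function name `read_xref_pairs`, the "-" missing marker, and the sample rows are assumptions made for the example.

```python
def read_xref_pairs(lines, gene_col, xref_col, filter_col, fvalue,
                    missing="-", delimiter="\t"):
    """Yield (gene_id, xref) pairs from delimited text lines.

    Column numbers start from 1, as in the option descriptions.
    """
    needed = max(gene_col, xref_col, filter_col)
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) < needed:
            continue  # skip short or empty rows
        if fields[filter_col - 1] != fvalue:
            continue  # row belongs to another organism
        xref = fields[xref_col - 1]
        if xref == missing or xref == "":
            continue  # no cross reference in this row
        yield fields[gene_col - 1], xref


# Made-up rows in a gene2refseq-like layout (not real NCBI records).
sample = [
    "#tax_id\tGeneID\tstatus\taccession",
    "10090\t1001\tVALIDATED\tNM_000001.1",
    "9606\t2002\tVALIDATED\tNM_000002.1",
    "10090\t1003\tVALIDATED\t-",
]
pairs = list(read_xref_pairs(sample, gene_col=2, xref_col=4,
                             filter_col=1, fvalue="10090"))
print(pairs)
```

Only the first data row survives: the second has a different taxonomy id and the third has no accession.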
7. genomic position for every probe when available.

options:
    -i, --input      name of input file
    --array          name of microarray
    --probes         column number containing probe identifiers
    --ignore         number of header lines to ignore (default: 0)
    --headers        first line after <ignore> lines contains headers
    --symbol         include gene symbol. Yes = 1, No = 0 (default: 1)
    --gene           include Entrez Gene id. Yes = 1, No = 0 (default: 1)
    --delimiter      character separating columns (default: \t)
    --missing        character for missing observations (default not shown)
    -s, --server     database server (optional)
    -g, --gdatabase  ArrayGene database name (optional)
    --adatabase      genome assembly database (optional)
    -u, --user       user name to access database
    --output         name for output file (default: <input file>.out)
    -h, --help       print this help
8. ibution, run configure.pl. After answering some questions, a configuration file called general.conf is created in the pub/conf directory. Check this file to confirm the values are correct. Then run:

    sudo install.pl

This program will install the html and cgi files in the directories given to the configure.pl program. It will also create the necessary MySQL databases. You will need a MySQL root password for this step.

Database architecture

The ArrayGene system uses a MySQL database to store all the information about genes and microarrays. A second database stores the mapping information for the genes. This allows for efficient retrieval of information through SQL queries. The database has the following architecture:

[Figure: database schema. Recoverable table and column names:]

ArrayGene database
    genexref:      gene id, seq id, type
    gene_info:     LocusTag, Synonyms, dbXrefs, chromosome, map location, description, type, nomenclature auth, nomenclature auth full, nomenclature status
    vendors:       Vendor, URL
    platforms:     Vendor (FK), Product, PN, SName
    arrays_table:  probe_id, platform (FK), array_name, probe_name, source_id, source type, refseq, other id, vendor gene

Alignment database
    genemap:       strand, txStart, txEnd, source

Populating the database

ArrayGene database

The table genexref must be populated with cross references b
9. iter       character separating columns (default: \t)
    --recsep     character delimiting multiple values within a field
    --missing    character for missing values (default not shown)
    --help       print this help

import_genemap

usage: import_genemap [options] -i <input filename>

This program takes as input a column text file of genes and their genomic positions and stores the information in a MySQL database.

options:
    --input      name of input file with mapping data
    --align      title for mouse alignment (short alphanumeric string)
    --new        is this a new alignment?
    --delimiter  character separating columns (default: \t)
    --ignore     number of lines to ignore (default: 0)
    --gene_id    column number with gene ids (must be Entrez Ids)
    --name       column number with gene or sequence name
    --chr        column number with chromosome (e.g. Chr2)
    --strand     column number with strand (+ or -)
    --txstart    column number with transcription start
    --txend      column number with transcription end
    --server     database server (optional)
    --maptable   table in the database with mapping information
    --user       user name to access database
    --password   user password
    --missing    character for missing values (default not shown)
    --help       print this help

accession2gene

usage: accession2gene [options] -i <input filename>

This program will parse a column text file, extract probe annotations such as accession numbers, RefSeq ids, Ensembl transcripts, gene symbols, among other
10. nly creates an annotated output file. This can also be done by providing an output name to import_array.

Tools in the ArrayGene Package

Reference manual of the annotation and administration tools in ArrayGene, which use a command line interface. The syntax to call these programs follows standard Unix rules. The argument for the delimiter option in these programs is a Perl regular expression, e.g. \t is evaluated as a tabulation. The defaults for the options depend on the values you provided in the configuration step.

An attempt has been made to make these programs portable. However, they have only been tested on Unix platforms, and it is safer to make sure that all the input files are in Unix format, i.e. with a single newline character at the end of each line. The utility dos2unix can prove useful to format the files properly. An online version of this Unix program can be found at http://www.iconv.com/dos2unix.htm

Once ArrayGene is installed, the full list of options for any tool can be obtained by calling the program with no arguments or with the help option.

Reference of programs

import_array

usage: import_array [options] -i <input filename>

This program will parse a column text file, extract probe annotations such as accession numbers, RefSeq ids, Ensembl transcripts, gene symbols, among others, and it will do its best to find the target gene for that probe by accessing a MySQL database. Finally, it will create an an
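Where dos2unix is not available, the line-ending conversion mentioned above can be done with a few lines of Python. This is a minimal sketch of the same idea, not part of ArrayGene; the function name `to_unix_newlines` is hypothetical.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Convert DOS (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    # Replace CRLF first so a lone CR left afterwards is a true old-Mac ending.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Example: one line of each style.
print(to_unix_newlines(b"probe\tacc\r\nP1\tNM_1\rP2\tNM_2\n"))
```

To convert a file, read it in binary mode, pass the contents through the function, and write the result to a new file so the original is kept.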
11. notated table in the database. This program is used to annotate and import microarray gene lists into the ArrayGene system.

options:
    -i, --input      name of input file
    --array          numeric id of the array, previously assigned by ArrayGene (optional)
    --ignore         number of header lines to ignore (default: 0)
    --probes         column number (starting from 1) containing the probe ids
    -a, --accession  column number (starting from 1) containing the accession numbers or other ids of the source sequences used to generate the probe
    -e, --extract    flag indicating that the accession number should be extracted from a string of ids separated by the bar (|), each preceded by a two or three letter code. Takes no argument. Example of the string: ref|NM_178871|gb|AK042509|riken|A630098E12. In this example the RefSeq id will be extracted and the other ids will be discarded.
    --source         column number with the source name of the sequence accession, e.g. genbank, ensembl, etc. (optional)
    --other          other id column
    --vgene          column number (starting from 1) containing the gene id provided by the vendor
    --refseq         column with RefSeq ids (optional)
    --annotate       should the probes be annotated? (default: YES)
    --delimiter      character separating columns (default: \t)
    --missing        character for missing observations (default not shown)
    --recsep         character separating multiple records in a single column field. Only the first record is used for annotation, but multiple records can be stored in the databas
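The extract flag can be pictured with the following Python sketch, which pulls one id out of a bar-separated string of code/id pairs. It only illustrates the behaviour described in the option text and is not the import_array code; the function name `extract_accession` and the choice of "ref" as the preferred code are assumptions based on the example string.

```python
def extract_accession(field, code="ref", sep="|"):
    """Return the id that follows `code` in a string like
    "ref|NM_178871|gb|AK042509", or None if the code is absent.
    """
    parts = field.split(sep)
    # Even positions hold the codes, odd positions hold the ids.
    for db_code, seq_id in zip(parts[0::2], parts[1::2]):
        if db_code == code:
            return seq_id
    return None


# Example string from the option description.
print(extract_accession("ref|NM_178871|gb|AK042509|riken|A630098E12"))
```

Passing a different code, for example `code="gb"`, selects the GenBank accession from the same string instead.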
12. quenced should be identified. This can be done with the accession2gene program that is included in the package bin folder:

    accession2gene --input knownGene.txt --accession 1 --output knownGene.txt.out
    import_genemap.pl --input knownGene.txt.out --align mm6_2 --gene_id 13 --name 1 --chr 2 --strand … --txstart 4 --txend 5 --password <password>

Gene Lists files

The import_array program is used to read and annotate probes in a gene list. The results are stored in a MySQL database. It is important to have an in-depth knowledge of the nature of the file, since the format changes from vendor to vendor. It is especially critical to know which characters are used to separate columns and to separate multiple records within a cell. For example, Affymetrix uses "///" to separate multiple records within a cell. Others use commas, semicolons, etc. The default "\t" for the delim option takes care of columns separated by tabulations. It is also important to know what is used to indicate missing records or empty cells (e.g. "---" for Affymetrix). If missing records are indicated just by two contiguous delimiters (i.e. no character), the missing option is disregarded (i.e. don't worry about it).

An example call for import_array:

    import_array.pl --input MEEBO_Annotations_051705.txt --probe 2 --accession 3 --other 5 --vgene … --ignore 1 --recsep …

The accompanying program accession2gene can also be used to annotate a gene list file, but it does not import the results into the database. Instead it o
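The roles of the record separator and the missing-value marker can be sketched in Python as follows. This is an illustration under assumptions, not ArrayGene's parsing code: the function name `parse_cell` is hypothetical, the defaults "///" and "---" are taken as typical Affymetrix conventions, and the separator is treated as a plain string rather than a Perl regular expression.

```python
def parse_cell(cell, recsep="///", missing="---"):
    """Split one cell into its records, dropping empty and missing ones."""
    records = [rec.strip() for rec in cell.split(recsep)]
    return [rec for rec in records if rec and rec != missing]


# Made-up cells in an Affymetrix-like layout.
cells = ["NM_000001 /// NM_000002", "---", ""]
for cell in cells:
    records = parse_cell(cell)
    # Per the option text, only the first record is used for annotation.
    first = records[0] if records else None
    print(repr(cell), "->", records, "first:", first)
```

An empty cell and a cell holding only the missing marker both yield an empty list, which matches the note that two contiguous delimiters need no special handling.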
13. sing recsep

You can also use files that do not directly associate sequence ids with Entrez Gene ids, but instead with some other id that can in turn be associated with an Entrez Gene id. This is fairly simple to do by annotating the file first with the program accession2gene (see below) and then importing the annotated file with import_array. In this way one can create associations between sequence ids and Entrez Gene ids. For example, the file called knownToEnsembl.txt from UCSC (ftp://hgdownload.cse.ucsc.edu/goldenPath/mm7/database/) provides links between Ensembl transcript ids and accession numbers of sequences in the KnownGene track of the UCSC Genome Browser. In order to include links between Ensembl transcripts and Entrez Genes in the database, one needs first to identify the gene by using the accession number provided in the file:

    accession2gene --input knownToEnsembl.txt --accession … --outname EnsemblToEntrez.txt

Now you can import the new annotated file into the database:

    import_genexref --input EnsemblToEntrez --gene_id 3 --xref 2 --type ensembl --missing …

Assembly database

The files available at the UCSC Genome Browser server are useful to populate the genemap table, but others can be used as well. To populate the genome assembly database, call import_genemap.pl with genome annotation files as input, such as those generated by UCSC. Before these files can be used, the genes associated with the mapped se
14. tations for microarray probes and to make comparisons of the level of gene coverage for any region in the genome between microarray platforms. ArrayGene provides tools through a command line interface to create and maintain the database, and a web interface for gene coverage queries. Some simple administrative tasks can also be done through dynamic web pages, and more will be available in future releases. ArrayGene creates reports of gene coverage, number of probes per gene, and efficiency of gene identification. Gene coverage is also provided for every chromosome if no specific genomic region is indicated.

Portability

This software has only been tested on Unix platforms (Suse 9.0 and Mac OS X Server 10.3.3). It should be possible to port it to any operating system where Perl can be run, but bugs may have to be dealt with in the process.

What do I need to use ArrayGene?

In order to use ArrayGene you will need:

1. Perl and all its modules described below
2. A web server
3. A MySQL server
4. Microarray GeneLists
5. Access to files associating sequence identifiers used in GeneLists and Genes
6. Access to files with genomic coordinates of genes or sequences

Target User

Potential users of ArrayGene are biologists interested in microarray platforms for a given organism. The amount of data that is stored in the database requires large storage space (mainly for annotation source files) and RAM (2GB recommended), more commonly available in servers.