Development of bioinformatics tools and methods
Contents
1. Figure 2: The expression matrix corresponding to signature 3DE64836D. Of note, plotGeneExpProfiles is a high-level function for visualizing gene expression levels in a signature (Figure 3): > plotGeneExpProfiles(data = em, X11 = FALSE). [Figure 3: Gene expression profiles of signatures containing XBP1, ESR1 and GATA3; the centroid is highlighted in green. Axes: Samples, Intensity.] 3 Creating transcriptional signatures from a user-defined data set using the DBF-MCL algorithm. When analyzing a noisy dataset, one is interested in isolating dense regions, as they are populated with genes/elements that display weak distances to their nearest neighbors (i.e., strong profile similarities). To isolate these regions, DBF-MCL computes, for each gene/element, the distance to its kth nearest neighbor (DKNN). To define a critical DKNN value that depends on the dataset, and below which a gene/element is considered to fall in a dense area, DBF-MCL computes simulated DKNN values using an empirical randomization procedure. Given a dataset containing n genes and p samples, a simulated DKNN value is obtained by sampling n distance values from the gene-gene distance matrix D and extracting the kth smallest value. This procedure is repeated n times to obtain a set of simulated DKNN values S. The computed distributions of simulated DKNN values are used to compute an FDR value for each observed DKNN value […]
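The randomization step described in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, not the DBF-MCL implementation: the function names are hypothetical, the Euclidean metric is an assumption (the excerpt only says "distance"), and sampling with replacement from the off-diagonal entries of D is also an assumption. The sketch stops before the FDR computation, because the excerpt is cut off before giving its formula.

```python
import math
import random


def pairwise_distances(points):
    """Gene-gene distance matrix D (Euclidean distance is an assumption here)."""
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) for q in points]
        for p in points
    ]


def observed_dknn(dist, k):
    """For each gene, the distance to its kth nearest neighbor (self excluded)."""
    result = []
    for i, row in enumerate(dist):
        others = sorted(d for j, d in enumerate(row) if j != i)
        result.append(others[k - 1])
    return result


def simulated_dknn(dist, k, rng):
    """n simulated DKNN values: each one is the kth smallest of n distances
    sampled from D (assumption: off-diagonal entries, with replacement)."""
    n = len(dist)
    pool = [dist[i][j] for i in range(n) for j in range(n) if i != j]
    simulated = []
    for _ in range(n):
        sample = sorted(rng.choice(pool) for _ in range(n))
        simulated.append(sample[k - 1])
    return simulated
```

Calling `observed_dknn(D, k)` and `simulated_dknn(D, k, random.Random(seed))` on the same matrix gives the two distributions that the text then compares; genes whose observed DKNN is small relative to the simulated values sit in dense regions.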
2. [Figure 2.4: Summary diagram of DNA microarray data analysis including the AgiND library. Recoverable labels: data transformation and normalization (log2 transformation; normalization methods: quantiles or LOWESS); expression matrix file; statistical and functional analyses (Bioconductor, TIGR viewer).] […] Bioconductor for microarray analysis (marray and limma) have been improved. Agilent microarrays are used in nearly 25% of the experiments on human samples submitted to GEO. Analyzing the data they generate is a real challenge, which has pushed commercial companies, starting with Agilent, to create dedicated software (GeneSpring GX). The library I developed can be used with all microarrays compatible with the AFE format and is therefore not restricted to 4x44k arrays, as Agi4x44PreProcess is. AgiND uses the parameters found at the beginning of the file to generate the objects and check their size. It can thus be used for 4x44k arrays as well as for the newer 8x60k arrays, for example. In addition, AgiND works with both one-color and two-color array data. Finally, producing output of class ExpressionSet allows the use of […]
3. Author Contributions. Conceived and designed the experiments: NB FF FBA ECI. Performed the experiments: NB BL AB OS CFT ECI. Analyzed the data: NB ECI. Contributed reagents/materials/analysis tools: JG MK CN FF FBA. Wrote the paper: NB ECI.
4. Figure 1. hOE-MSCs display characteristics of immature neuroglial cells. (A) A lamina propria (dark) from an FD olfactory mucosa biopsy was placed under a glass coverslip to initiate stem cell proliferation. The area delimited by a black square is enlarged in (B). (C) After transfer to a 6-well plate, cells attached to the coverslip (arrow) proliferated and colonized the complete area of the well. (D-M) Immunofluorescence stainings of both control (CTRL) and FD hOE-MSCs are positive for nestin (D, E) and βIII-tubulin (F, G), with similar expression levels (H, I), while slightly positive for GFAP (J, K) and negative for MAP2 (L, M). Green represents Alexa Fluor 488, red Alexa Fluor 594. Nuclei were stained with Hoechst (blue). Scale bars represent 50 µm. doi:10.1371/journal.pone.0015590.g001. […] much less expressed in FD (5-8 fold) when compared to control hOE-MSCs (Figure 2B). In addition, WT and MU transcripts were present in nearly equal amounts in FD hOE-MSCs (Figure 2B, right graph). Furthermore, the total amount of IKBKAP transcripts in FD (WT + MU) remains 3 to 5 times less abundant than WT in controls, which suggests a defect in IKBKAP transcription and/or mRNA stability. In FD cells, the differential expression of IKBKAP transcripts was also correlated with a reduced expression of IKAP/hELP1 protein in FD when compared to controls, as revealed by western blot analysis (Figure 2C).
5. It should be emphasized that ChIP is an enrichment, not a purification strategy. It should also be kept in mind that some regions may appear enriched even though they do not interact with the protein of interest, for example regions of the genome that are prone to fragmentation. This is probably influenced by factors such as repetitive elements and the degree of chromatin openness. In addition, the reference genome used and copy-number variation of certain chromosomal regions (called amplicons) in cancers also generate false positives. The theoretical distribution of reads in these regions, in particular strand imbalance, must therefore be taken into account to filter out artifacts (Figure 5.12). Before running a peak-calling program, the data are generally filtered to remove artifacts such as PCR amplifications. This filter is now built into most algorithms. Very interesting reviews comparing peak-calling algorithms for transcription factors have been written by Pepke, Wilbanks and their collaborators [Pepke et al. 2009, Wilbanks & Facciotti 2010]. These algorithms do not take the same input parameters and do not necessarily give similar results in terms of binding-site size […]
6. […] lincRNAs and their implications in diseases (see section 1.5). 1.2 The transcriptome. The transcriptome is the set of RNAs resulting from the expression of part of the genes in the genome of a cell type or tissue, at a given time and under given conditions. Initially focused on messenger RNAs, this definition has been extended to non-coding RNAs such as microRNAs, ribosomal RNAs, transfer RNAs and lincRNAs, following the recent demonstration, through high-throughput sequencing, that more than 80% of the nucleotides of a genome can be transcribed. However, this pervasive transcription does not necessarily imply that a function is associated with each product. We will therefore use the term transcriptome to refer to the set of transcripts, coding or non-coding, that are associated with a function in an organism. Characterizing and quantifying the transcriptome in a biological model (tissue, organism, cell or cell line) makes it possible to identify the genes transcribed in a given context, and thereby to determine the mechanisms that regulate gene expression (co-expression) and to define their regulatory networks (signaling pathways involved). Better knowledge of a gene's expression level in different situations is a step toward understanding its function, but […]
8. [Figure 5.14: Main steps of the analysis pipeline for Chromatin ImmunoPrecipitation sequencing (ChIP-seq) data on the TGML platform. Recoverable labels: alignments (.sam, .bam, .bed; samtools); peak calling (Picor); peaks (.bed); remove peaks in repeated regions (RepeatMasker); extract sequences under peaks (peak sequences, .fa); get reads under peaks (peak alignments, .sam); peaks mapped on RefSeq (.map); tertiary analysis for ChIP-seq.] [Figure 5.15: Principle of the algorithm and analysis pipeline of the peak-detection program developed at the TAGC laboratory (Lopez F., Lepoivre C., Puthier D., Herrmann C.). Panel text: a real ChIP-seq peak is expected to give a correlated coverage signal between the two strands, with a shift that depends on the size of the sonicated fragments. The Picor algorithm uses this feature to detect peaks by evaluating the correlation in 2 sliding windows of variable width, one on each strand. Other labels: background, signal, forward strand, reverse strand, d, 10 to 20 bins.] This tool takes as input two alignment files, corresponding to the ChIP-seq sample and to the input, from which redundant reads have been removed because they are generally due to PCR amplification.
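The idea stated in the Figure 5.15 panel, that a real peak shows correlated coverage on the two strands at some shift, can be illustrated with the following sketch. It is not the Picor implementation, whose details are not given in this excerpt: the function names, the use of Pearson correlation, and the fixed window start and width are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_strand_shift(forward, reverse, start, width, shifts):
    """Compare a forward-strand window with reverse-strand windows moved by
    each candidate shift d (in bins); return (d, correlation) for the best d,
    or None if no shifted window fits inside the reverse coverage."""
    best = None
    window = forward[start:start + width]
    for d in shifts:
        lo, hi = start + d, start + d + width
        if lo < 0 or hi > len(reverse):
            continue  # shifted window would fall outside the coverage vector
        r = pearson(window, reverse[lo:hi])
        if best is None or r > best[1]:
            best = (d, r)
    return best
```

A window whose best correlation is high at a shift compatible with the sonicated fragment size would be a peak candidate; a window with high coverage on one strand only would score low.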
9. Competing Interests: The authors have declared that no competing interests exist. E-mail: puthier@tagc.univ-mrs.fr. These authors contributed equally to this work. (PLoS ONE, www.plosone.org.) Introduction. Microarray technology provides biologists with a powerful approach for comprehensive analyses of cells or tissues at the transcriptional level. DNA chips are now widely used to assess the expression levels of all genes of a given organism. These data, most generally deposited in MIAME-compliant public databases, constitute an unprecedented source of knowledge for biologists [1]. As an example, the Gene Expression Omnibus repository (GEO) currently hosts approximately 8,000 experiments encompassing about 200,000 biological samples analyzed using various high-throughput technologies [2]. Consequently, this represents billions of measurements that reflect the biological states of cells or tissues recorded in physiological or pathological conditions, or in response to various chemical compounds and/or natural molecules. As public repositories are continually expanding, we are facing the new challenge of designing strategies that provide efficient and productive access to the available data. To date, at least two major solutions have emerged. The first one applies a gene-centered perspective, as developed in the GEO profile or SOURCE web interfaces [3]. This approach allows users to retrieve the expr[…]
11. [Figure 2.2: Principle of the quantile method. Recoverable steps: reassign the matrix values according to the indices; resulting quantile-normalized expression matrix:
        E1   E2   E3   E4   E5
   V1    3    8   19   28   23
   V2   19   14    8    8   14
   V3   28    3   14   14   19
   V4   14   19   23   23    3
   V5   23   28    3   19   28
   V6    8   23   28    3    8 ]
2.3 Project context. Noting the absence of a free tool for easily and quickly analyzing data obtained from Agilent DNA microarrays, we decided to develop our own software. This project, which began during my Master 1 BBSG internship, concerns the development of an R library for analyzing these microarray data. The library, called AgiND (for Agilent Normalize and Diagnosis), was meant to make it possible to extract the data, but also to visualize them very simply and to normalize them. The goal of AgiND is not to propose a new normalization method, but to provide tools for extracting, visualizing and normalizing the data simply and very quickly, by producing text files generated from the raw data extracted with the AFE software. The tool can be used with both one-color and two-color array data. In practice, when […]
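The quantile method summarized by Figure 2.2 can be sketched as follows. This is a generic illustration in Python and not the AgiND code (AgiND is an R library); the function name is hypothetical, and ties are broken by row order, which is a simplifying assumption.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a matrix given as a list of rows (probes) by
    columns (arrays): sort each column, average the sorted values rank by
    rank, then write each rank mean back at the position its rank came from."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])

    # For each column, the row indices ordered by increasing value.
    orders = [
        sorted(range(n_rows), key=lambda i: matrix[i][j]) for j in range(n_cols)
    ]

    # Mean of the r-th smallest value across all columns.
    rank_means = [
        sum(matrix[orders[j][r]][j] for j in range(n_cols)) / n_cols
        for r in range(n_rows)
    ]

    # Reassign the rank means according to the original indices.
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for rank, i in enumerate(orders[j]):
            normalized[i][j] = rank_means[rank]
    return normalized
```

After this step every column holds exactly the same set of values, which is the property visible in the matrix of Figure 2.2, where each array E1 to E5 contains 3, 8, 14, 19, 23 and 28 in a different order.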
12. [Figure 1.12: (A) The different sequencing modes: fragment, paired-end and mate-pair. (B) Contribution of paired-end mode to the detection of insertion, deletion or inversion events. Recoverable labels: normal, insertion, local inversion, theoretical distance between the 2 fragments.] 1.4.1.4 The new generations of high-throughput sequencers. With ongoing technological developments, a fourth generation of sequencing techniques has appeared [Glenn 2011]. It includes new technologies such as PacBio RS (Pacific Biosciences Inc.) and Ion Torrent (Life Technologies). Based on sequencing by synthesis, they use two new chemistries. PacBio RS relies on single-molecule analysis through real-time sequencing reactions (Single Molecule Real Time, or SMRT; Figure 1.13). It exploits the highly efficient and accurate process of DNA replication by DNA polymerase. This enzyme, attached to the bottom of the wells, binds a single DNA fragment to be sequenced. However, it can incorporate only a few labeled nucleotides before stalling, because of the steric hindrance of these nucleotides. To overcome this, the SMRT method uses a […]
13. [Figure residue: database schema diagram. Only the column types (VARCHAR, TINYINT, SMALLINT, MEDIUMINT, INT, TIMESTAMP, TEXT) are recognizable; table and column names are not recoverable from the extraction.] […] of the new TBrowserDBv2 database. This database […]
15. (Issue 7, e11671.) [Figure 4 diagram residue, running title "Molecular Mechanisms of DSS". Recoverable stages: damaged epithelial cells; reactivation of the innate immune response; onset of fever; defervescence; onset of shock; regulation of cholesterol metabolism and innate immune response; microvascular endothelial cells; vascular homeostasis breakdown; systemic vascular dysfunction; cardiovascular decompensation.] Figure 4. Hypothesis of a second inflammatory amplification loop in dengue shock syndrome. After induction of a first inflammatory and anti-viral response to dengue virus, disease resolution generally occurs around the time of defervescence for most dengue-infected patients. Some patients, however, progress towards a life-threatening dengue shock syndrome. Results obtained in this study suggest that, in those patients, a second inflammatory amplification loop, which involves a diversity of pro-inflammatory responses related to innate immunity, occurs and leads to a major inflammatory systemic syndrome and to vascular homeostasis breakdown. The putative role of different markers identified in va[…]
16. [Figure residue, running title "GEO Datamining with TBrowser": table of transcriptional signatures and their member genes. Recoverable column titles: POU2AF1 signature (mature B cells); T-cell receptor signaling; MHC class II; macrophages; MHC class I; immature B cells; NK cells; type I IFN response. The gene lists are too garbled by the extraction to be reconstructed reliably.]
17. […] Venkatasubrahmanyam, S., & Butte, A. J. (2008). GeneChaser: identifying all biological and clinical conditions in which genes of interest are differentially expressed. BMC Bioinformatics, 9, 548.
   [Chiaretti et al. 2004] Chiaretti, S., Li, X., Gentleman, R., Vitale, A., Vignetti, M., Mandelli, F., Ritz, J., & Foa, R. (2004). Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival. Blood, 103(7), 2771-8.
   [Clark et al. 2011] Clark, M. J., Chen, R., Lam, H. Y. K., Karczewski, K. J., Chen, R., Euskirchen, G., Butte, A. J., & Snyder, M. (2011). Performance comparison of exome DNA sequencing technologies. Nature Biotechnology.
   [Cleveland 1979] Cleveland, W. S. (1979). Robust Locally Weighted Regression and Smoothing Scatterplots. Journal of the American Statistical Association, 74(368), 829-836.
   [Core et al. 2008] Core, L. J., Waterfall, J. J., & Lis, J. T. (2008). Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science (New York, N.Y.), 322(5909), 1845-8.
   [David et al. 2011] David, M., Dzamba, M., Lister, D., Ilie, L., & Brudno, M. (2011). SHRiMP2: sensitive yet practical SHort Read Mapping. Bioinformatics (Oxford, England), 27(7), 1011-2.
   [De Santa et al. 2010] De Santa, F., Barozzi, I., Mietton, F., Ghisletti, S., Polletti, S., Tusi, B. K., Muller, H., Ragoussis, J., Wei, C. L.,
18. & Natoli, G. (2010). A large fraction of extragenic RNA pol II transcription sites overlap enhancers. PLoS Biology, 8(5), e1000384.
   [Dekker et al. 2002] Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science (New York, N.Y.), 295(5558), 1306-11.
   [Dobbin et al. 2003] Dobbin, K., Shih, J. H., & Simon, R. (2003). Statistical design of reverse dye microarrays. Bioinformatics (Oxford, England), 19(7), 803-10.
   [Dohm et al. 2008] Dohm, J. C., Lottaz, C., Borodina, T., & Himmelbauer, H. (2008). Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Research, 36(16), e105.
   [Draghici et al. 2003] Draghici, S., Kulaeva, O., Hoff, B., Petrov, A., Shams, S., & Tainsky, M. A. (2003). Noise sampling method: an ANOVA approach allowing robust selection of differentially regulated genes measured by DNA microarrays. Bioinformatics (Oxford, England), 19(11), 1348-59.
   [Droege & Hill 2008] Droege, M., & Hill, B. (2008). The Genome Sequencer FLX System: longer reads, more applications, straight forward bioinformatics and more complete data sets. Journal of Biotechnology, 136(1-2), 3-10.
   [Dunn et al. 2007] Dunn, J. J., McCorkle, S. R., Everett, L., & Anderson, C. W. (2007). Paired-end genomic signature tags: a method for the functional analysis of genomes and epigenomes. Genetic Engineering, 28, 159-73.
   [Edgar et […]
19. […] automation of the library preparation steps; reduction of amplification bias through the removal of bacterial cloning and gel purification steps; quantification is made possible because each molecule is sequenced; sequencing coverage is high (depending on multiplexing); the applications are very varied; these techniques are more sensitive than earlier approaches; re-sequencing or de novo sequencing with high sequencing coverage. 1.5 Contributions of DNA microarray and very-high-throughput sequencing techniques. With the development of high-throughput techniques such as DNA microarrays, and then very-high-throughput sequencing, alterations at the transcriptional level and in the regulation of gene expression could be characterized more precisely. This led to the classification of cancers into groups according to the expression of a few genes that have become good indicators of tumor progression or tumor type. Golub and colleagues were thus able to propose, in 1999, a molecular signature based on the analysis of the expression profiles of different leukemias: acute lymphoblastic leukemias (ALL) and acute myeloid leukemias (AML) [Golub et al. 1999]. New […]
20. […] independently. This yields one csfasta file and one qual file for each of the fragments, i.e. 4 files in total. Finally, if barcodes are used, they are also sequenced at the beginning of the run in order to assign each bead to a sample, and they too produce a pair of files (csfasta and _QV.qual). During sequencing, the SOLiD generates a quality report for each ligation cycle, which can be viewed with the SETS software (Figure 5.11). This report comprises several parts or analyses, such as: 1) the saturation of the fluorescent signal for each fluorochrome and each quadrant of the slide (Figure 5.11 A); 2) the satay plot for each ligation, showing color imbalance and the presence of polyclonal beads (Figure 5.11 C); 3) the auto-correlation between the different samples. It is also important to note that […]
   [Figure residue: annotated read identifier. Recoverable labels: marker signalling the presence of an identifier (">"); number of the slide quadrant; x position on the slide; y position on the slide; fragment type (F3).]
   csfasta:
   >1_644_120_F3
   T30100200133300101133003322233303233201031212112 0
   >1_666_405_F3
   T32031113222333110121210133023032223001211212021 3
   >1_703_178_F3
   T03033320332300130303332213333103333332012112122 2
   >1_708_449_F3
   T01033320332003203021002213323202121111033022223 0
   _QV.qual:
   >1_644_120_F3
   44412594[…]
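The csfasta records shown in this excerpt store colors (digits 0 to 3) after a leading known base rather than nucleotides. A minimal sketch of the conversion to base space follows, using the standard SOLiD two-base encoding in which each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). The function name is hypothetical, and the sketch does no error handling for missing color calls.

```python
BASES = "ACGT"  # index in this string = 2-bit code of the base


def decode_colorspace(read):
    """Convert a color-space read such as 'T3010' to nucleotides.

    The first character is the known last base of the adapter; it anchors
    the decoding and is not part of the returned sequence. Each following
    color gives the next base as (previous base code) XOR (color).
    """
    current = BASES.index(read[0])
    decoded = []
    for color in read[1:]:
        current ^= int(color)  # raises ValueError on a non-digit call
        decoded.append(BASES[current])
    return "".join(decoded)
```

Because every base depends on all preceding colors, a single wrong color call changes the whole remainder of the decoded sequence, which is why color-space reads are usually aligned in color space and converted only afterwards.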
21. […] not stored in the database but in indexed flat files, and their annotation with all the information on the corresponding samples and probes; TBCommonGenes, which combines the gene lists from a group of signatures, thereby making it possible to identify the genes most frequently found co-expressed in this group of signatures. These signatures generally come from Boolean queries such as "ESR1 & FOXA1", which retrieves all signatures containing at least these two genes; TBMap, which summarizes the content of all the signatures of a given microarray platform. To obtain an overview of the genes frequently associated within signatures, transcriptional maps were generated from the probes of platforms GPL96, GPL570 and GPL81, respectively the 2 human platforms and the murine platform most used. These transcriptional maps correspond […] [Figure residue: DBF-MCL flowchart. Recoverable steps: gene expression profiles (genes x samples); gene-gene distance matrix (n x n); calculate, for genes 1..n, the distances to their kth nearest neighbors (DKNN); calculate simulated DKNN values (random); calculate the observed critical value; calculate an FDR for each observed DKNN value; graph construction: only genes th[…]]
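The combination of a Boolean "AND" query with a TBCommonGenes-style summary, as described in this excerpt, can be sketched as follows. This is an illustration only, not the TBrowser code: the function names, the dictionary layout and the signature identifiers in the usage note are hypothetical.

```python
from collections import Counter


def signatures_with_all(signatures, required_genes):
    """Boolean AND query: keep the signatures containing every required gene.

    `signatures` maps a signature identifier to its list of gene symbols.
    """
    required = set(required_genes)
    return {
        sig_id: genes
        for sig_id, genes in signatures.items()
        if required <= set(genes)
    }


def common_genes(selected):
    """Count in how many of the selected signatures each gene occurs,
    most frequent first (ties broken alphabetically)."""
    counts = Counter()
    for genes in selected.values():
        counts.update(set(genes))  # count each gene once per signature
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

For example, with three made-up signatures `{"S1": ["ESR1", "FOXA1", "GATA3"], "S2": ["ESR1", "FOXA1", "XBP1", "GATA3"], "S3": ["ESR1", "TFF1"]}`, the query on `["ESR1", "FOXA1"]` keeps S1 and S2, and the summary ranks the genes shared by both ahead of the gene found in only one.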
22. […] were amplified and radioactively labeled before being hybridized to nylon-membrane DNA microarrays, in order to identify genes differentially expressed in FD patients. These microarrays carry 8780 probes and identified 46 overexpressed genes and 4 underexpressed genes in FD patients, 10 of which had previously been reported by independent studies. This first analysis was published in PLoS ONE in 2010. The data were deposited in ArrayExpress under the identifier E-MTAB-281. To confirm these results and to identify new potential therapeutic targets, this time at the pan-genomic scale, a new microarray campaign was carried out on Agilent pan-genomic arrays using 4 FD patients and 4 controls, in the sphere state but also in the differentiated state (Figure 3.10). During this study, I […] [Figure 3.10: Summary of the experimental and analytical design of the second microarray campaign. Recoverable labels: culture; kinetin; Agilent 4x44K; raw data (GEO); normalization; statistical analysis; functional annotation; literature; candidate genes; normalization genes for RT-qPCR validation.]
23. [Table of contents excerpt]
   […] very high throughput
   1.4.1 Principles of very-high-throughput sequencing
   1.4.2 Analysis techniques based on HTS sequencing
   1.5 Contributions of DNA microarray and very-high-throughput sequencing techniques
   1.6 Programming languages for data analysis
   2 Quality control and normalization of DNA microarray data
   2.1 Obtaining raw expression data
   2.1.1 Experimental design and technical biases
   2.1.2 Acquisition of raw data
   2.2 Data correction
   2.2.1 Data preprocessing
   2.2.2 Base-2 logarithm transformation
   2.2.3 Data normalization
   2.3 Project context
   2.4 Choice of developing an R library
   2.5 Principle of the AgiND R library
   2.6 Discussion and perspectives
   3 Analyses of DNA microarray data
   3.1 Selection […] (title garbled in the source)
   3.1.1 […] (title garbled in the source)
   3.1.2 Significance Analysis of Microarrays (SAM)
   3.1.3 ANalys[…]
24. […] (A) the Complex9RN200 artificial dataset and (B) the GSE1456 microarray dataset. Found at: doi:10.1371/journal.pone.0004001.s003 (9.01 MB TIF). Figure S4. Colors correspond to the clusters found using the corresponding algorithm. (A) The whole dataset (9,112 points). (B) A zoom-in of the Complex9RN200 dataset that displays the various shapes to be found. (C) DBF filtering step without partitioning. With k set to 60, noisy elements remain around the shapes. (D-G) The filtering and partitioning results obtained using DBF-MCL run with a range of k values and I values. Other arguments are unchanged (FDR = 10%, S1..3). The set of points (n = 3,108) obtained using DBF-MCL (k = 20) was used to test the other algorithms. (H) Results obtained with hierarchical clustering (single linkage). The resulting dendrogram was cut to produce 9 clusters. (I) Results obtained with the QT_CLUST algorithm (radius = 0.8). (J) Results obtained for k-means (9 centers, 100 initializations). (K) Results obtained with cst (threshold = 0.81). Found at: doi:10.1371/journal.pone.0004001.s004 (9.41 MB TIF). Figure S5. Impact of various k values on DBF-MCL results. The x-axis corresponds to k values. The y-axis corresponds to the number of elements considered as informative. (A) DBF-MCL was run with Complex9RN200 as input using a range of k values (FDR = 10%, S1..3, Inflation = 1.2). (B) DBF-MCL was run with several microarray datasets as input, including GSE1456, using a range of k values […]
25. […] A co-ordinated interaction between CTCF and ER in breast cancer cells. BMC Genomics, 12(1), 593.
   [Rothberg et al. 2011] Rothberg, J. M., Hinz, W., Rearick, T. M., Schultz, J., Mileski, W., Davey, M., Leamon, J. H., Johnson, K., Milgrew, M. J., Edwards, M., Hoon, J., Simons, J. F., Marran, D., Myers, J. W., Davidson, J. F., Branting, A., Nobile, J. R., Puc, B. P., Light, D., Clark, T. A., Huber, M., Branciforte, J. T., Stoner, I. B., Cawley, S. E., Lyons, M., Fu, Y., Homer, N., Sedova, M., Miao, X., Reed, B., Sabina, J., Feierstein, E., Schorn, M., Alanjary, M., Dimalanta, E., Dressman, D., Kasinskas, R., Sokolsky, T., Fidanza, J. A., Namsaraev, E., McKernan, K. J., Williams, A., Roth, G. T., & Bustillo, J. (2011). An integrated semiconductor device enabling non-optical genome sequencing. Nature, 475(7356), 348-52.
   [Rothberg & Leamon 2008] Rothberg, J. M., & Leamon, J. H. (2008). The development and impact of 454 sequencing. Nature Biotechnology, 26(10), 1117-24.
   [Rougemont et al. 2008] Rougemont, J., Amzallag, A., Iseli, C., Farinelli, L., Xenarios, I., & Naef, F. (2008). Probabilistic base calling of Solexa sequencing data. BMC Bioinformatics, 9, 431.
   [Rumble et al. 2009] Rumble, S. M., Lacroute, P., Dalca, A. V., Fiume, M., Sidow, A., & Brudno, M. (2009). SHRiMP: accurate mapping of short color-space reads. PLoS Computational Biology, 5(5), e1000386.
   [Rye et al. 2011] Rye, M. B., Sætrom, P., & Drabløs, F. (2011).
26. A manually curated ChIP-seq benchmark demonstrates room for improvement in current peak-finder programs. Nucleic Acids Research 39(4), e25. [Sanger et al., 1977] Sanger F., Nicklen S. & Coulson A.R. (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences of the United States of America 74(12), 5463-7. [Schones & Zhao, 2008] Schones D.E. & Zhao K. (2008). Genome-wide approaches to studying chromatin modifications. Nature Reviews Genetics 9(3), 179-91. [Sean & Meltzer, 2007] Sean D. & Meltzer P.S. (2007). GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics (Oxford, England) 23(14), 1846-7. [Shendure et al., 2005] Shendure J., Porreca G.J., Reppas N.B., Lin X., McCutcheon J.P., Rosenbaum A.M., Wang M.D., Zhang K., Mitra R.D. & Church G.M. (2005). Accurate multiplex polony sequencing of an evolved bacterial genome. Science (New York, N.Y.) 309(5741), 1728-32. [Shi et al., 2010] Shi L., Campbell G., Jones W.D., Campagne F., Wen Z., Walker S.J., Su Z., Chu T.M., Goodsaid F.M., Pusztai L., Shaughnessy J.D., Oberthuer A., Thomas R.S., Paules R.S., Fielden M., Barlogie B., Chen W., Du P., Fischer M., Furlanello C., Gallas B.D., Ge X., Megherbi D.B., Symmans W.F., Wang M.D., Zhang J., Bitter H., Brors B., Bushel P.R., Bylesjo M., Chen M., Cheng J., Cheng J., Ch
27. Monoclonal amplification of DNA fragments for library construction. Adapted from Metzker (2010). Principle of the three major very high-throughput sequencing technologies. Adapted from Metzker (2010). The probes of the SOLiD technology. Each 8-nucleotide probe consists of 2 bases complementary to the target sequence (positions 1 and 2), followed by 3 degenerate bases (n) and finally three universal bases (z). Conversion of SOLiD reads into nucleotide sequences. Each color codes for a number between 0 and 3, which, together with the last base of the adapter (T in this example), makes it possible to reconstruct the genomic sequence. Principle of SNP and small indel detection with the SOLiD technology. (A) The different sequencing modes: fragment, paired-end and mate-pair. (B) Contribution of the paired-end mode to the detection of insertion, deletion or inversion events. The new generation of sequencers. (A) The Pacific Biosciences technology, based on the SMRT principle. (B) The Ion Torrent and its semiconductor chip for reading a pH differential. Adapted from Metzker (2010) and Rothberg et al. List of figures. 1.14 The different studies made possible p
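The color-to-base conversion summarized in the SOLiD figure legend above can be illustrated with a minimal Python sketch. It assumes the standard two-base (dibase) encoding, in which each color 0 to 3 acts as an XOR on a 2-bit code of the previous base; the function name `decode_color_space` and the example read are hypothetical and are not taken from the thesis.

```python
# 2-bit codes for the bases; a SOLiD color is the XOR of two adjacent base codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def decode_color_space(read: str) -> str:
    """Decode a color-space read such as 'T0123' into nucleotides.

    The first character is the known last base of the adapter. It anchors
    the decoding and is not part of the returned sequence.
    """
    if len(read) < 2 or read[0] not in BASE_TO_BITS:
        raise ValueError("read must start with an adapter base followed by colors")
    current = BASE_TO_BITS[read[0]]
    decoded = []
    for color in read[1:]:
        if color not in "0123":
            raise ValueError(f"unexpected color call: {color!r}")
        # Color 0 keeps the base; colors 1 to 3 switch to one of the other bases.
        current ^= int(color)
        decoded.append(BITS_TO_BASE[current])
    return "".join(decoded)


if __name__ == "__main__":
    print(decode_color_space("T0123"))
```

Because every base depends on the one before it, a single miscalled color changes all downstream bases, which is one reason color-space reads are usually aligned in color space rather than decoded first.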
28. ÉCOLE DOCTORALE DES SCIENCES DE LA VIE ET DE LA SANTÉ. UNIVERSITÉ DE LA MÉDITERRANÉE AIX-MARSEILLE II, FACULTÉ DES SCIENCES DE LUMINY. THESIS submitted for the degree of Doctor of Science, specialty: BIOINFORMATICS AND GENOMICS. Presented and defended by Aurélie BERGON. Development of bioinformatics tools and methods for the study of gene expression and regulation. Application to diseases. Defended on 6 February 2012. Jury. Reviewers: Pr. Gianluca BONTEMPI (Université Libre de Bruxelles), Dr. Frédéric GUYON (Inserm UMR_S 973, Paris). Examiners: Dr. Max CHAFFANET (CRCM, Inserm UMR891, Marseille), Dr. Salvatore SPICUGLIA (Inserm UMR_S 928, Marseille). Supervisors: Dr. Jean IMBERT (Inserm UMR_S 928, Marseille), Dr. Denis PUTHIER (Inserm UMR_S 928, Marseille). President: Pr. Franck GALLAND (CIML, Marseille). Acknowledgements. First of all, I would like to thank the members of my jury for agreeing to read and assess my doctoral work despite the short time I ultimately left them. I would then like to express my gratitude to the director of the Inserm UMR_S 928 (TAGC) laboratory, Dr. Catherine Nguyen, for having hosted me for 5 years. I also thank my thesis supervisors, Drs. Jean Imbert and Denis Puthier, for allowing me to take part in very interesting research projects from which I learned a great deal. A huge
29. FIGURE 5.3 Visualization of peak profiles: (A) those obtained for a transcription factor or for methylation marks and (B) the difference in profiles between the various histone modifications (Barski et al., 2007; Tomaru et al., 2009; Pekowska et al., 2010). Labels recovered from the figure: control signal, UCSC Genes, H3K4me2 enrichment, distance from TSS (kb), inactive genes. Chapter 5. Study of transcriptional regulation by HTS. data and to reuse them easily if necessary. The SOLiD is delivered with a compute cluster (online cluster) that handles 1) the acquisition of the images of each of the 10 ligations for the 5 primer cycles, 2) the storage of the images and 3) for each bead, the determination of the sequence in color code and then its conversion into nucleotides. Powerful hardware is therefore needed to analyze these images, along with a large computing capacity to handle the millions of beads deposited on the slides: the need for performance stems not so much from the complexity of the calculations as from the number of times they must be carried out. To reconcile the simultaneous acquisition and processing of sequencing data, a second cluster was installed at the TAGC (offline cluster). This
30. In addition, some of these data are made available as flat files, that is, tab-delimited text files. 4.2.2 Databases dedicated to DNA microarray data. Databases have also been developed for DNA microarrays. Standards have been defined for storing these data (Stoeckert et al., 2002). Some even make their data available as flat files. Thus the MGED consortium (which became FGED, for Functional Genomics Data Society, in July 2010) developed the MIAME standard, for Minimum Information About a Microarray Experiment (Brazma et al., 2001). MIAME describes all the elements needed to interpret the results and the experimental parameters useful for reproducing an experiment, such as 1) the raw data files obtained from the scanner, 2) the normalized expression matrix, 3) the complete annotation of the samples (type, treatment dose, time points), 4) the experimental design, including the relationships between samples (biological or technical replicates, samples that have undergone differentiation, etc.), 5) the type of DNA microarray platform used and 6) the laboratory that performed the experiments and the protocols used. However, the MIAME standard does not impose any file format but recommends the use of the MAGE-ML format (MicroArray Gene Expre
31. Dysautonomia. Nathalie Boone, Béatrice Loriod, Aurélie Bergon, Oualid Sbai, Christine Formisano-Tréziny, Jean Gabert, Michel Khrestchatisky, Catherine Nguyen, François Féron, Felicia B. Axelrod, El Chérif Ibrahim. 1 NICN-CNRS UMR 6184, Université de la Méditerranée, Faculté de Médecine Nord, IFR Jean Roche, Marseille, France; 2 TAGC, INSERM U928, Marseille, France; 3 Plateforme Transcriptome, CRO2, Faculté de Médecine, Marseille, France; 4 Biochemistry and Molecular Biology, Hôpital Nord, AP-HM, Marseille, France; 5 Department of Pediatrics, New York University School of Medicine, New York, New York, United States of America. Abstract. Background: Familial dysautonomia (FD) is a hereditary neuropathy caused by mutations in the IKBKAP gene, the most common of which results in variable tissue-specific mRNA splicing with skipping of exon 20. Defective splicing is especially severe in nervous tissue, leading to incomplete development and progressive degeneration of sensory and autonomic neurons. The specificity of neuron loss in FD is poorly understood due to the lack of an appropriate model system. To better understand and model the molecular mechanisms of IKBKAP mRNA splicing, we collected human olfactory ecto-mesenchymal stem cells (hOE-MSCs) from FD patients. hOE-MSCs have a pluripotent ability to differentiate into various cell lineages, including neurons and glial cells. Methodology/Principal Findings: We confirm
32. Hume D.A., Ideker T. & Hayashizaki Y. (2010). An atlas of combinatorial transcriptional regulation in mouse and man. Cell 140(5), 744-52. [Ren et al., 2000] Ren B., Robert F., Wyrick J.J., Aparicio O., Jennings E.G., Simon I., Zeitlinger J., Schreiber J., Hannett N., Kanin E., Volkert T.L., Wilson C.J., Bell S.P. & Young R.A. (2000). Genome-wide location and function of DNA binding proteins. Science (New York, N.Y.) 290(5500), 2306-9. [Robyr et al., 2002] Robyr D., Suka Y., Xenarios I., Kurdistani S.K., Wang A., Suka N. & Grunstein M. (2002). Microarray deacetylation maps determine genome-wide functions for yeast histone deacetylases. Cell 109(4), 437-46. [Roh et al., 2004] Roh T.-y., Ngau W.C., Cui K., Landsman D. & Zhao K. (2004). High-resolution genome-wide mapping of histone modifications. Nature Biotechnology 22(8), 1013-6. [Ronaghi, 2001] Ronaghi M. (2001). Pyrosequencing sheds light on DNA sequencing. Genome Research 11(1), 3-11. Bibliography. [Ronaghi et al., 1998] Ronaghi M., Uhlén M. & Nyrén P. (1998). A sequencing method based on real-time pyrophosphate. Science (New York, N.Y.) 281(5375), 363-365. [Ross & Cronin, 2011] Ross J.S. & Cronin M. (2011). Whole cancer genome sequencing by next-generation methods. American Journal of Clinical Pathology 136(4), 527-39. [Ross-Innes et al., 2011] Ross-Innes C.S., Brown G.D. & Carroll J.S. (2011).
33. INC FD fibroblast cells (Close et al., 2006). Others identified genes known to be involved in oligodendrocyte development, myelin formation and disorganization of microtubules from cerebrum of FD patients (Cheishvili et al., 2007, 2011). Lee and colleagues determined that the neuron-specific splicing factor NOVA1 was underexpressed in FD versus control induced pluripotent stem cell (iPSC)-derived neural crest precursors (Lee et al., 2009). Finally, a recent study using neuroblastoma cell lines showed that FD affects genes important for early developmental stages of the nervous system (Cohen-Kupiec et al., 2011). Nevertheless, the specific means by which aberrant IKBKAP mRNA splicing causes the disease-producing developmental and degenerative neuronal changes in FD neurons are still unclear. However, the plant cytokinin kinetin has been found to be a powerful agent that corrects IKBKAP mRNA splicing defects (Boone et al., 2010; Hims et al., 2007; Keren et al., 2010; Lee et al., 2009; Slaugenhaupt et al., 2004) and was effective when administered in a transgenic mouse model (Shetty et al., 2011) and in FD patients (Axelrod et al., 2011), which would make it a potential therapeutic agent for the treatment of FD and other disorders involving missplicing of mRNAs. To better understand the cascade of events mediated by the c.2204+6T>C mutation, we used human olfactory ecto-mesenchymal stem cells (hOE-MSCs) from FD patients or from control indivi
34. KEGG, BioCarta, Swiss-Prot, BBID, SMART, NIH Genetic Association DB, COG/KOG using the DAVID knowledgebase [2]. RTools4TB is a library for data mining of public microarray data. RTools4TB can be helpful (i) to define the biological contexts (i.e. experiments) in which a set of genes is co-expressed and (ii) to define their most frequent neighbors [1]. The RTools4TB package also implements the DBF-MCL algorithm (Density Based Filtering And Markov Clustering), which can be used for fast and automated partitioning of microarray data. DBF-MCL is a three-step adaptive algorithm that (i) finds elements located in dense areas (i.e. clusters), (ii) uses the selected items to construct a graph and (iii) performs graph partitioning using MCL [3]. Note that a UNIX-like system is required to use DBF-MCL. 2 Fetching transcriptional signatures from TBrowserDB. 2.1 The getSignatures function. Connection to the TranscriptomeBrowser database (TBrowserDB) relies on the getSignatures, getExpressionMatrix and getTBInfo functions. Basically, the getSignatures function can be used to retrieve transcriptional signature IDs using gene symbol(s), probe ID(s), experiment ID, microarray platform ID or annotation term(s) as input. This is controlled by the field argument. > library(RTools4TB) > args(getSignatures) function (field = c("gene", "probe", "platform", "experiment", "annotation"), value = NULL, qValue = NULL, nbMin = NULL, verbose = TRUE, s
35. Since MU transcripts contain a premature stop codon that may activate the nonsense-mediated mRNA decay (NMD) pathway, we wanted to confirm whether this pathway is responsible for the lower IKBKAP transcript expression in FD cells. Thus, we tested cycloheximide, a protein synthesis inhibitor which also inhibits NMD. Indeed, FD cells preincubated for 6 h with cycloheximide exhibited a stabilization of the MU transcript, as evidenced by semi-quantitative RT-PCR (Figure 2D, left panel). To accurately determine the level of WT and MU IKBKAP transcripts in these samples, absolute RT-qPCR analysis was performed (Figure 2D, right panel). The results clearly demonstrated that the WT/MU ratio decreases when mRNA surveillance is inhibited. Thus a [PLoS ONE, www.plosone.org, December 2010, Volume 5, Issue 12, e15590. OE-MSCs as a Model for FD.] Figure 2. Expression of IKBKAP transcripts and IKAP/hELP1 protein in hOE-MSCs. Labels recovered from the panels: controls C1-C5 and patients FD1-FD4, WT and MU transcript levels, IKAP/hELP1 band at 150 kDa, DMSO versus cycloheximide (50 µg/ml), WT/MU IKBKAP ratio (AU). (A) Agarose gel electrophoresis of end-point RT-PCR products showing both WT and MU transcri
36. FIGURE 5.13 The different peak-finding programs. (A) Summary table of the main methods. (B) Representation of the peaks obtained by these methods at a given position in the genome. Adapted from Wilbanks & Facciotti (2010). Labels recovered from the figure: CisGenome, Minimal ChipSeq Peak Finder, E-RANGE, MACS, HPeak, SISSRs, spp package, FindPeaks, GeneTrack and QuEST; column headings "Generating density profiles", "Peak assignment", "Adjustments with control data" and "Significance relative to control data"; control and ChIP tracks over positions 117576800 to 117577200. 5.3 Analysis of ChIP-seq data. 5.3.6 Annotation and visualization of results. Starting from a list of peaks
37. Supp. Table S6 displays the list of genes affected by kinetin action in FD hOE-MSCs. Interestingly, a majority of candidate genes were downregulated in response to kinetin. In addition to confirming an increased expression of IKBKAP in FD hOE-MSCs, we observed cellular responses that are consistent with predicted mechanisms of kinetin action. Indeed, our analysis detected differences in expression of genes involved in mRNA splicing (LUC7L, SNRPA, WDR70; Supp. Table S6). Of particular interest, SNRPA and LUC7L are both related to the U1 snRNP splicing complex required for 5' ss selection. SNRPA (downregulated by 1.7-fold in response to kinetin in treated FD hOE-MSCs) encodes the U1 snRNP core protein U1A (Nelissen et al., 1991). LUC7L (upregulated by 2-fold) encodes a putative RNA-binding protein similar to the yeast Luc7p subunit of the U1 snRNP (Fortes et al., 1999; Tufarelli et al., 2001). RT-qPCR Analysis of Candidate Genes Validates Microarray Data. To further confirm gene expression data from microarray analysis, we used relative qPCR to verify the differential expression of a subset of the identified genes, chosen for statistical significance as well as biological relevance in each comparison. WDR59 was selected as the reference gene since it exhibited relatively stable expression in our microarray data. Using IKBKAP expression as a positive control for each experiment, we confirmed the differential expression of LYN
38. The two faces of the 15-lipoxygenase in atherosclerosis. Prostaglandins Leukot Essent Fatty Acids 77: 67-77. Deng Y, Theken KN, Lee CR (2009) Cytochrome P450 epoxygenases, soluble epoxide hydrolase, and the regulation of cardiovascular inflammation. J Mol Cell Cardiol 48: 331-341. Attie AD, Kastelein JP, Hayden MR (2001) Pivotal role of ABCA1 in reverse cholesterol transport influencing HDL levels and susceptibility to atherosclerosis. J Lipid Res 42: 1717-1726. Hennuyer N, Tailleux A, Torpier G, Mezdour H, Fruchart JC, et al. (2005) PPARalpha, but not PPARgamma, activators decrease macrophage-laden atherosclerotic lesions in a nondiabetic mouse model of mixed dyslipidemia. Arterioscler Thromb Vasc Biol 25: 1897-1902. Samuelsson B, Morgenstern R, Jakobsson PJ (2007) Membrane prostaglandin E synthase-1: a novel therapeutic target. Pharmacol Rev 59: 207-224. Cipollone F, Fazia M, Iezzi A, Ciabattoni G, Pini B, et al. (2004) Balance between PGD synthase and PGE synthase is a major determinant of atherosclerotic plaque instability in humans. Arterioscler Thromb Vasc Biol 24: 1259-1265. World Health Organization (2009) Dengue: guidelines for diagnosis, treatment, prevention and control, new edition. Geneva: World Health Organization. 147 p. Dyrskjot L, Zieger K, Real FX, Malats N, Carrato A, et al. (2007) Gene expression signatures predict outcome in non-muscle-invasive bladder carcinoma: a multicenter validation study. Clin Cancer Res 13: 3545-3551.
39. V.G. (1992). DNA position-specific repression of transcription by a Drosophila zinc finger protein. Genes & Development 6(10), 1865-73. [Gheldof et al., 2012] Gheldof N., Leleu M., Noordermeer D., Rougemont J. & Reymond A. (2012). Detecting long-range chromatin interactions using the chromosome conformation capture sequencing (4C-seq) method. Methods in Molecular Biology (Clifton, N.J.) 786, 211-25. [Giardine et al., 2005] Giardine B., Riemer C., Hardison R.C., Burhans R., Elnitski L., Shah P., Zhang Y., Blankenberg D., Albert I., Taylor J., Miller W., Kent W.J. & Nekrutenko A. (2005). Galaxy: a platform for interactive large-scale genome analysis. Genome Research 15(10), 1451-5. [Gilmour & Lis, 1985] Gilmour D.S. & Lis J.T. (1985). In vivo interactions of RNA polymerase II with genes of Drosophila melanogaster. Molecular and Cellular Biology 5(8), 2009-18. [Giresi et al., 2007] Giresi P.G., Kim J., McDaniell R.M., Iyer V.R. & Lieb J.D. (2007). FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Research 17(6), 877-85. [Glenn, 2011] Glenn T.C. (2011). Field guide to next-generation DNA sequencers. Molecular Ecology Resources. [Goldfeder et al., 2011] Goldfeder R.L., Parker S.C.J., Ajay S.S., Ozel Abaan H. & Margulies E.H. (2011). A bioinformatics approach for determining sample identity from
40. Whitney AR, Diehn M, Popper SJ, Alizadeh AA, Boldrick JC, et al. (2003) Individuality and variation in gene expression patterns in human blood. Proc Natl Acad Sci U S A 100: 1896-1901. Eady JJ, Wortley GM, Wormstone YM, Hughes JC, Astley SB, et al. (2005) Variation in gene expression profiles of peripheral blood mononuclear cells from healthy volunteers. Physiol Genomics 22: 402-411. Remick DG (2007) Pathophysiology of sepsis. Am J Pathol 170: 1435-1444. Lenz A, Franklin GA, Cheadle WG (2007) Systemic inflammation after trauma. Injury 38: 1336-1345. Chaturvedi UC, Shrivastava R, Tripathi RK, Nagar R (2007) Dengue virus-specific suppressor T cells: current perspectives. FEMS Immunol Med Microbiol 50: 285-299. Green S, Vaughn DW, Kalayanarooj S, Nimmannitya S, Suntayakorn S, et al. (1999) Early immune activation in acute dengue illness is related to development of plasma leakage and disease severity. J Infect Dis 179: 755-762. Panpanich R, Sornchai P, Kanjanaratanakorn K (2006) Corticosteroids for treating dengue shock syndrome. Cochrane Database Syst Rev 3: CD003488. Rajapakse S (2009) Corticosteroids in the treatment of dengue illness. Trans R Soc Trop Med Hyg 103: 122-126. Lorente L, Martin MM, Sole-Violan J, Blanquer J, Paramo JA (2010) Matrix metalloproteinases and their inhibitors as biomarkers of severity in sepsis. Crit Care 14: 402. July 2010, Volume 5, Issue 7, e11671.
41. and SNCA between control and FD cells (Fig. 4A); MAP1LC3C, NOVA1, SNCA and SPON1 between control and FD sphere-derived cells (Fig. 4B); and LUC7L between FD cells with or without kinetin treatment (Fig. 4C). ZNF280D is a Potential Sequence-Specific Target of Kinetin in FD hOE-MSCs. Among the list of genes whose expression is downregulated after kinetin treatment in FD OE-MSCs, we noted the presence of ZNF280D (Supp. Table S6). ZNF280D belongs to a unique group of 12 genes in the entire genome that contains an alternative 5' ss in one of its exons (exon 16) that is identical to the FD 5' ss (CAAguaagc) (Ibrahim et al., 2007). Therefore, we hypothesized that kinetin may favor the splicing of introns flanked by the CAAguaagc 5' ss motif, resulting in a modification in the ratio of alternative 5' ss choice for ZNF280D exon 16 (Supp. Fig. S2). Since the use of the 5' ss identical to the FD IKBKAP intron 20 5' ss is also expected to induce a premature stop codon in ZNF280D exon 17 and make it a target for nonsense-mediated mRNA decay (NMD), this may explain why the total amount of ZNF280D transcripts is reduced in FD hOE-MSCs after kinetin treatment. [HUMAN MUTATION, Vol. 00, No. 0, 1-11, 2012.] Figure panels recovered from the extraction: (A) Genes with reduced or increased expression in CTRL versus FD cells (IKBKAP, LYN, SNCA). (B) Genes with reduced or increased expression in CTRL versus FD sphere cells
42. other R libraries such as vsn to normalize the raw data extracted by AgiND, or graphics libraries such as arrayQualityMetrics. The development of AgiND is not finished, and the library can still be improved. To allow its use with all raw file formats, it should support the import of data obtained with the GenePix software, as the BABAR, GOULPHAR and limma libraries do. In addition, functions should be included that check data quality more or less automatically (for example by examining the positive and negative controls) and that generate plots. Finally, an analysis report, as in arrayQualityMetrics, could be generated and given to the clients of the TGML platform. This would be an additional guarantee of quality ahead of data analysis. Chapter 2. Quality control and normalization of DNA microarray data. Table columns: software or R library; input file type (AFE, GenePix); quality control; data filtering; normalization(s); language(s). Rows as extracted: AgiND: X X X; quantiles, lowess; R, C. limma: X X X X; quantiles, lowess, print-tip loess, scale; R. Agi4x44PreProcess: X X X; quantiles, vsn; R, C, C, Fortran. GOULPHAR: X X X; lowess; R. agilp: X X X; lowess; R. BABAR: X X X; lowess; R. arrayQuality: X
43. database Gene Expression Omnibus [32] (GEO Series accession number GSE17924; http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE17924). Individual microarray quality was evaluated based on QC report, pair-wise MA-plots and box plots. Intra-array normalization of raw signals from the 48 microarrays was done using Feature Extraction software 9.1.3.1 (Agilent). Microarray normalized data were further exported into the Limma package [33] for inter-array normalization using the quantile method [34]. Statistical analysis was performed using the TIGR MeV (MultiExperiment Viewer) v4.4 software (http://www.tm4.org/mev.html) and the GeneANOVA program [35]. A multi-way ANOVA model was implemented, first to identify differentially regulated genes when accounting for the multiple sources of variation in the microarray experiment, and second to evaluate the effect of the main variable (disease phenotype) relative to that of other putative confounding variables such as age, gender, duration of illness or microarray technical variability (independent extractions or hybridizations). Local ANOVA further determined the contribution of each covariate to the expression level of each gene. Multiple test correction was further carried out using the false discovery rate (FDR) method [36]. Cluster [37] and TreeView [38] software were used for unsupervised hierarchical clustering. Iterative SVM (Support Vector Machine) method PLoS ONE www.plosone.or
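The inter-array quantile normalization mentioned in the methods above can be illustrated with a minimal, dependency-free Python sketch. It is not the Limma implementation: the function name `quantile_normalize` is hypothetical, the input is assumed to be a genes-by-arrays matrix with no missing values, and ties are broken by row order rather than averaged.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a genes x arrays matrix given as a list of rows.

    After normalization every array (column) has exactly the same
    distribution: the mean, rank by rank, of the sorted input columns.
    """
    n_genes = len(matrix)
    n_arrays = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_arrays)]
    # For each array, gene indices ordered by increasing intensity.
    orders = [
        sorted(range(n_genes), key=lambda i, col=col: col[i]) for col in columns
    ]
    # Reference distribution: mean across arrays of the values sharing a rank.
    reference = [
        sum(columns[j][orders[j][rank]] for j in range(n_arrays)) / n_arrays
        for rank in range(n_genes)
    ]
    # Give each gene the reference value that matches its rank in its array.
    normalized = [[0.0] * n_arrays for _ in range(n_genes)]
    for j, order in enumerate(orders):
        for rank, gene in enumerate(order):
            normalized[gene][j] = reference[rank]
    return normalized


if __name__ == "__main__":
    toy = [[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]]  # 4 genes, 3 arrays
    for row in quantile_normalize(toy):
        print(row)
```

The design choice to force identical distributions is what makes arrays directly comparable, at the cost of assuming that most genes are not differentially expressed between samples.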
44. of the project's progress with the d 4.8 Conclusions and perspectives
Feature: TranscriptomeBrowser 2008; added in TranscriptomeBrowser 2011.
Supported species: 3 (human, mouse, rat); 51 new species, such as Drosophila melanogaster and Saccharomyces cerevisiae.
Number of annotation sources: 19; 54.
Biological evidence considered: DAVID knowledgebase version 2005; DAVID knowledgebase version 2007 and NEW ANNOTATIONS: microRNA target site prediction (TargetScan, PicTar); TFBS prediction (TFBSConserved, cisRED); protein-protein interaction; functional relationship; KEA; disease; expression signatures; MSigDB; TBrowser's TS; TBMC.
Number of microarray platforms (i.e. GPL): 70; 101.
Number of microarray experiments (i.e. GSE): 1484; 5568.
Input for enrichment analysis: TS with more than 10 samples; TS with more than 8 samples.
Generation of TS: bash, perl, C programs, C, gawk, R; optimisation of DBF-MCL parameters using RTools4TB and an automatic pipeline.
Number of TS: 18250; 40138, with 33941 ES corresponding to Homo sapiens, Mus musculus and Rattus norvegicus.
% of annotated TS: 84% annotated TS; 87% annotated TS.
Plugins: Heatmap, TBCommonGenes, TBMap; AnnotationOverview, TBConvertor, InteractomeBrowser.
Request mode: boolean request by; new boolean request type (homologe, geneSymbol, probeID, entrezID and r
45. and _QV.qual for the SOLiD, sff (Standard Flowgram Format) for the Roche 454, and fastq for the Ion Torrent, Illumina and the other very high-throughput sequencing technologies. FIGURE 5.9 ChIP-seq data analysis pipeline. Steps recovered from the figure: csfasta and _QV.qual; ma, then bam (binary) and sam (uncompressed); peak finding; motif search and discovery. Thus, for the SOLiD, if the sequencing went well, the images acquired during each ligation are converted into spch files, in a format designed for large and complex data called HDF5 (http://www.hdfgroup.org/HDF5/whatishdf5.html). These numerous files are then used to create two files for each sample: a sequence file (csfasta) and a file containing the quality scores for each dibase (_QV.qual) (Figure 5.10). Note that the SOLiD generates 2 files, instead of the single file produced by the other technologies. They contain a header, commented out with #, that records the command lines and the parameters used to generate them. These files include all the sequences, including those that will not align to the genome and those of poor quality. The first file, in forma
46. Gene expression is an important process in living organisms. In multicellular organisms, all cells have, a priori, the same genetic heritage. The number of genes in the human genome is fixed; it is regulation that allows cells to express their genes differently and to differentiate during embryogenesis to give rise to different tissues. Likewise, these tissues have their own characteristics and a specific regulation at the level of the genes and of the very structure of the DNA conformation. The regulation of gene expression comprises all the regulatory mechanisms involved in going from the genetic information contained in a DNA sequence to a functional product (RNA or protein). It has several levels: transcriptional, post-transcriptional, translational and post-translational. Only transcriptional regulation is described below, since my thesis work focuses on the study of gene expression and its regulation through the analysis of transcription factor binding. Gene expression results from the interaction of several processes: 1) basal transcription by RNA polymerases and general transcription factors, 2) its modulation by sequence-specific transcription factors, 3) the dynamics of chromatin
47. viral infections causing dengue. It was carried out with Agilent pan-genomic microarrays. 2) The collaboration with Dr. El Chérif Ibrahim of the NICN (CNRS UMR 6184, Faculté de Médecine Nord, Marseille) concerns the transcriptional characterization of the signaling pathways altered by the defective alternative splicing of the IKBKAP gene in patients suffering from familial dysautonomia. This project was conducted in two stages. A first campaign on nylon DNA microarrays, designed entirely on the platform, highlighted genes differentially expressed between patients and controls. The second, on commercial pan-genomic DNA microarrays using Agilent technology, confirmed the previous results and made it possible to study the effect of a molecule with therapeutic potential. In these collaborations, my work consisted of generating transcriptional signatures from the raw data, taken from Agilent Feature Extraction (AFE) files for the Agilent arrays and from the scanner for the nylon arrays. I also trained and helped the biologists to use the various bioinformatics tools (AgiND, TMeV, Cluster, TreeView, DAVID knowledge database, Ingenuity Pathways Analysis (IPA)) and contributed to the writing of the articles. 3.4 Data analyses carried out within collaborations. 3.4.1 Dengue. Dengue is a viral infection end
48. the opening of chromatin allows regulatory proteins to bind to DNA. It can be studied with the technique commonly called Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE) (Giresi et al., 2007; Song et al., 2011; Nammo et al., 2011). FAIRE-seq maps certain open chromatin regions and thereby defines regulatory regions. Indeed, only 1-2% of the genome consists of open chromatin regions in a given cell type and under particular conditions (Song et al., 2011). This provides information on regulatory regions where transcription factors can bind to DNA, or on sites where nucleosomal histones undergo post-translational modifications. Chapter 1. General introduction. Table columns: field of study; technique; description. Epigenetics. ChIP-seq: mapping of transcription factor binding sites and of histone modifications (O'Geen et al., 2011). Methyl-seq: mapping of DNA methylation sites (CpG islands) (Li et al., 2011; Wu et al., 2011a). 3C-seq: search for long-range chromatin interactions (Chromosome Conformation Capture) (Splinter et al., 2004; Gheldof et al., 2012). MNase-seq: position of nucleosomes. FAIRE-seq: mapping of regulatory regions (Giresi
49. meta-analysis. Development of bioinformatics tools and methods for gene expression and regulation study. Application to diseases. Abstract: Understanding the mechanisms that control gene expression is a major challenge for medical research. This requires a large set of pangenomic approaches, such as those using DNA microarrays and high-throughput sequencing, which generate an ever-growing mass of digital data. During my thesis, I developed several computer-based tools to facilitate their processing and analysis. I created an R library, AgiND, that controls the quality of Agilent DNA microarray data and allows their statistical normalization. The growing number of experiments stored in Gene Expression Omnibus motivated the development of the TBrowser project. An original method, DBF-MCL, was created to extract annotated transcriptional signatures by integrating various sources of information. Stored in a database, these signatures are accessible through a Java interface, a SOAP web service and an R/Bioconductor library (RTools4TB). Finally, a pipeline dedicated to ChIP-seq analyses was implemented. All these tools were used to study various diseases in collaborations. Keywords: bioinformatics, transcriptome, DNA microarrays, epigenetics, ChIP-seq, meta-analysis. TAGC, INSERM UMR_S 928, PARC SCIENTIFIQUE DE LUMINY, 13288 MARSEILLE CEDEX 09
50. endemic in tropical countries. This disease is transmitted to humans by the bites of Aedes aegypti mosquitoes infected with a virus of the flavivirus family. Rare cases of asymptomatic forms exist. In general, however, this viral infection causes fever, headaches, muscle and joint pain, fatigue, nausea, vomiting and skin rashes. The fever can be hemorrhagic, with or without shock syndrome. The latter case is rare but severe and can lead to the death of the patient. In 1997 the WHO defined a clinical classification to distinguish the 3 main groups of dengue patients, but it remains incomplete (Figure 3.8). This classification comprises 1) classic dengue (DF, Dengue Fever), 2) hemorrhagic dengue without shock syndrome (DHF, Dengue Haemorrhagic Fever) and 3) hemorrhagic dengue with shock syndrome (DSS, Dengue Shock Syndrome). We used a transcriptomic approach to gain insight into the molecular mechanisms associated with the development of dengue infection with shock syndrome (DSS). The long-term objective was to identify diagnostic biomarkers of this clinical form that could be tested rapidly in endemic countries, in order to reduce the number of deaths due to this disease. We therefore carried out a comparative analysis of the expression profiles of
51. more or less reliable depending on their associated quality score, the aim is to determine which genes are potentially regulated by the factor under study, in order to build gene networks by assessing, as for DNA microarrays, the functional enrichment of this list of target genes in a process or a signalling pathway. The standard practice for associating peaks with a gene is to use criteria such as the distance to the transcription start site (TSS) or to a structural element of the gene (intron, exon). For example, Johnson and colleagues (2007) mapped peaks within 20 kb of the TSS of a gene, whereas Wederell and colleagues (2008) used a distance between 10 kb from the TSS and 1 kb from the transcription termination site. Chen and colleagues (2008) used a more sophisticated method and determined the distribution of peak-to-TSS distances for each factor evaluated. Peaks are then associated with the closest gene within this distribution. However, it should be noted that, in the case of enhancers, (1) the closest gene is not necessarily the one regulated by the transcription factor under study and (2) a transcription factor can regulate several of the genes surrounding it. Thus 3C-seq, ChIA-PET and other extensions of these approaches make it possible to detect the interactio
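The distance-based peak-to-gene association described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the thesis: the function name `nearest_tss`, the 20 kb default window and the toy coordinates are assumptions, and a single chromosome with unstranded TSS positions is assumed.

```python
from bisect import bisect_left

def nearest_tss(peak_pos, tss_sorted, max_dist=20_000):
    """Return (gene, distance) for the TSS closest to peak_pos.

    tss_sorted: list of (position, gene) tuples sorted by position,
                all on the same chromosome (assumption of this sketch).
    Returns None when the closest TSS is farther than max_dist.
    """
    if not tss_sorted:
        return None
    positions = [pos for pos, _ in tss_sorted]
    i = bisect_left(positions, peak_pos)
    # Only the TSS just before and just after the peak can be the closest.
    candidates = tss_sorted[max(i - 1, 0): i + 1]
    pos, gene = min(candidates, key=lambda t: abs(t[0] - peak_pos))
    dist = abs(pos - peak_pos)
    return (gene, dist) if dist <= max_dist else None

if __name__ == "__main__":
    # Toy TSS coordinates, hypothetical values for illustration only.
    tss = [(1_000, "geneA"), (50_000, "geneB"), (120_000, "geneC")]
    for peak in (500, 52_000, 90_000):
        print(peak, nearest_tss(peak, tss))
```

As the text points out for enhancers, the closest gene is only a heuristic target; a sketch like this one cannot capture long-range interactions.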
52. 5.2 HTS computing

[Figure 5.4: Diagram of the hardware organization used on the IBiSA TGML platform of the TAGC for the acquisition and analysis of very-high-throughput sequencing data produced with the SOLiD technology. Steps shown: 1. Acquisition of sequencing data; 2. Transfer of sequencing data; 3. Processing of sequencing data; 4. Visualization and analysis of results; 5. Storage of data and results (offline cluster).]

Chapter 5. Study of transcriptional regulation by HTS

[Figure residue, functions of the ICS and SETS software: sequencing set-up (primary and secondary analysis); loading of the reference; WFA and WFA analysis report; creation of the run (samples, type, multiplexing); bead detection (focus range); start of the run; real-time reports on sequencing quality (cycle scans, heat maps, run log, run metrics, exposure times); stopping or pausing the run if a problem occurs during a ligation; control of imaging and analyses; cancellation of analyses; primary or secondary re-analysis; export of reports and results; deletion of data by run or by analysis; re-analysis or launch of the secondary analysis (Bioscope).]
53. determine which target genes are potentially regulated by the transcription factor under study and analyse their functional enrichment with tools such as GREATER or the DAVID knowledgebase, in order to build the network of genes regulated by the transcription factor. The distribution of peak locations is also studied across the genome and at the level of gene structure. Finally, a quality control of the data before and after alignment was added to the pipeline using the FastQC software, which complements the sequencing report provided by SETS.

5.4.2 Picor, a new tool for peak detection

Given the peak detection problem (see section 5.3.4) and the results produced by some algorithms on our data, researchers in the laboratory designed a new algorithm, named Picor, for detecting transcription factor binding sites from ChIP-seq data (Figures 5.14 and 5.15). This algorithm, unpublished to date, was integrated into the data analysis pipeline alongside another tool, MACS. I tested this new tool, in whose design I took no part, and integrated it into my pipeline.

[Figure residue, SOLiD analysis pipeline. Primary analysis: reads (csfasta), quality file (qual), color calling. Secondary analysis (Bioscope 1.3): reads alignment]
54. developed plugins for analysing the information available in our database, the update still remains to be published. Furthermore, the speed, design or even the features of some plugins can always be improved, for example by adding a graphical representation of co-expression to TBNeighborhood. At present, the generated matrix has to be analysed with other tools. The next step could be the integration of new data sources, such as microRNA array data or simply DNA microarray data from new platforms that had not been integrated in 2009. This would require a new update of the database, but since scripts that generate the data automatically are available, it should not take long. In addition, databases such as lncRNAdb will provide new information on the regulation of gene expression. The R/Bioconductor library RTools4TB needs some improvements and updates. As stated previously, the integration of the new SOAP/WSDL web service must be finalized. This will allow the new database to be used. Moreover, other features, such as generating graphs for Cytoscape or for integration into InteractomeBrowser, could be valuable. Finally, the possibility of
55. Article 1: Genome-wide expression profiling deciphers host responses altered during dengue shock syndrome and reveals the role of innate immunity in severe dengue
3.4.2 Familial dysautonomia
Article 2: Olfactory stem cells, a new cellular model for studying molecular mechanisms underlying familial dysautonomia
Article 3: Genome-wide analysis of familial dysautonomia and kinetin target genes with patient olfactory ecto-mesenchymal stem cells
3.5 Conclusions and perspectives

Once the data have been normalized, the objective is to identify genes that are differentially expressed between samples. This can be done simply by computing the amplitude of the expression changes (ratio), but this is generally insufficient. Statistical approaches are therefore needed to estimate and distinguish intra-group and inter-group variability. Many statistical tests have been proposed, ranging from Welch's t-test to Bayesian approaches and analyses of variance. These methods were used in collaborations to obtain molecular signatures in studies of diseases.

Chapter 3. Analyses of DNA microarray data

3.1 Gene selection

The application of the tests depends on several parameters, but these
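To make the contrast between a plain ratio and a statistical test concrete, here is a small sketch of the log ratio and of Welch's t statistic for one gene measured in two groups. It assumes log2-transformed intensities as input; the function names and the toy values are illustrative, and the p-value step (which needs a Student t distribution) is deliberately left out.

```python
from math import sqrt
from statistics import mean, variance

def log2_ratio(group_a, group_b):
    """Difference of mean log2 intensities, i.e. the log2 fold change."""
    return mean(group_a) - mean(group_b)

def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike the classical Student test, the two groups are allowed
    to have different variances.
    """
    va = variance(group_a) / len(group_a)  # squared standard error, group A
    vb = variance(group_b) / len(group_b)  # squared standard error, group B
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1)
    )
    return t, df

if __name__ == "__main__":
    # Hypothetical log2 intensities of one gene in two sample groups.
    controls = [1.0, 2.0, 3.0, 4.0, 5.0]
    patients = [2.0, 4.0, 6.0, 8.0, 10.0]
    print("log2 ratio:", log2_ratio(controls, patients))
    print("Welch t, df:", welch_t(controls, patients))
```

Two genes with the same ratio can yield very different t values when their within-group variability differs, which is why the ratio alone is generally insufficient.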
56. Development of bioinformatics tools and methods for the study of gene expression and its regulation. Application to diseases.

Abstract: Understanding the mechanisms that control gene expression is a major challenge for medical research. It requires a set of pangenomic approaches, such as DNA microarrays and, more recently, very-high-throughput sequencing, which generate an ever-growing mass of digital data to process. During my thesis, I developed several innovative computational tools to facilitate their use. I created an R library, AgiND, which checks the quality of Agilent DNA microarray data and normalizes them. The growing number of experiments stored in Gene Expression Omnibus motivated the TBrowser project. An original method, DBF-MCL, was created to extract annotated transcriptional signatures by integrating various sources of information. Stored in a database, these signatures are accessible through a Java interface, a SOAP web service and an R/Bioconductor library, RTools4TB. Finally, an analysis pipeline dedicated to ChIP-seq was implemented. All these tools were used to study various diseases in the context of collaborations.

Keywords: Bioinformatics, transcriptome, DNA microarrays, epigenetics, ChIP-seq
57. 2 (Fig. S4D-G), although in some cases some of the shapes were merged in a manner that appears to be meaningful (Fig. S4E and S4G). We then compared DBF-MCL to several algorithms commonly used in microarray analysis. All of them were run multiple times with various parameters and the best solution was kept. In all cases, the Euclidean distance was used as the distance measure between elements. As these algorithms are not well suited for noisy data, they were run on the 3,108 points extracted using DBF-MCL (k = 20). Also, it is difficult to compare those algorithms to one another, but some of them obviously failed to identify the shapes. Indeed, although k-means was run 10 times with random initial starts and the right number of centers, it led to a very poor partitioning result (Fig. S4J). Cluster Affinity Search Technique (CAST, Fig. S4K) and the Quality Cluster algorithm (QT_CLUST, Fig. S4T) also gave poor results, as did the Self-Organizing Map (SOM, data not shown). Hierarchical clustering was run with single linkage as argument and the obtained dendrogram was then split into 9 clusters (Fig. S4H). Patterns were well recognized using this method, but prior knowledge of the number of clusters is a prerequisite. Thus, both DBF-MCL and hierarchical clustering are able to properly identify complex shapes in a 2D space. The main benefit of using DBF-MCL resides in its ability to extract relevant informat

PLoS ONE | www.plosone.org. GEO Datamining with TBrowser.
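The density filter that precedes clustering in DBF-MCL relies on the distance of each element to its k-th nearest neighbour (DKNN). The sketch below only illustrates that quantity on 2D points with the Euclidean distance; the fixed `threshold` argument is a simplification introduced here, since the method derives its critical DKNN value from randomized data and then partitions the retained elements with MCL, neither of which is reproduced.

```python
from math import dist

def dknn(points, k):
    """Distance from each point to its k-th nearest neighbour (Euclidean).

    Brute force, O(n^2 log n): acceptable for a sketch, not for a
    full expression matrix. Requires k <= len(points) - 1.
    """
    result = []
    for i, p in enumerate(points):
        distances = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(distances[k - 1])
    return result

def dense_points(points, k, threshold):
    """Keep the points whose DKNN does not exceed a user-chosen threshold."""
    return [p for p, d in zip(points, dknn(points, k)) if d <= threshold]

if __name__ == "__main__":
    # Four clustered points and one isolated point (toy data).
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(dknn(pts, k=2))
    print(dense_points(pts, k=2, threshold=2.0))
```

Points in dense regions have small DKNN values and survive the filter, while isolated points such as `(10, 10)` are discarded as noise.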
58. 5.3.2 Standard formats and data manipulation tools
5.3.3 Alignment to the reference genome
5.3.4 Peak detection
5.3.5 Motif discovery and search
5.3.6 Annotation and visualization of results
5.3.7 Databases dedicated to HTS data
5.4 Development of tools and analysis methods for ChIP-seq data
5.4.1 Choice of software and strategies
5.4.2 Picor, a new tool for peak detection
5.5 Data analysis in collaborations
5.6 Discussion and perspectives
A. User manual of the AgiND R library
B. User manual of the RTools4TB R/Bioconductor library
Bibliography

List of figures
1.1 cRNA amplification procedure for a two-channel experiment; for a one-channel experiment, only the Cy3-labelled samples (B) are used. Taken from the Agilent manual One-Color Microarray-Based Gene Expression Analysis, Low Input Quick Amp Labeling Protocol.
1.2 Workflow of a DNA microarray experiment, from
59. 5.3.3 Alignment to the reference genome
5.3.4 Peak detection
5.3.5 Motif discovery and search
5.3.6 Annotation and visualization of results
5.3.7 Databases dedicated to HTS data
5.4 Development of tools and analysis methods for ChIP-seq data
5.4.1 Choice of software and strategies
5.4.2 Picor, a new tool for peak detection
5.5 Data analysis in collaborations
5.6 Discussion and perspectives

The complexity of transcriptional regulation mechanisms is only beginning to be elucidated. In many tissues or cell types, transcription factors essential for normal or pathological function have been identified, but only a few of their direct targets are known. Today, many techniques make it possible to study the regulation of gene expression on a large scale and at very high throughput (see Chapter 1). Among them is ChIP-seq (Johnson et al., 2007; Mardis, 2007; Elnitski et al., 2006; Massie & Mills, 2008), which combines chromatin immunoprecipitation (Chromatin ImmunoPrecipitati
60. Legend:
a: Feature is saturated (gIsSaturated)
c: Feature is not uniform (gIsFeatNonUnifOL)
e: Feature is not positive and significant (gIsPosAndSignif)
g: Feature is a population outlier (gIsFeatPopnOL)
i: Feature is manually marked (IsManualFlag)
j: Background is not uniform (gIsBGNonUnifOL)
l: Background reading is a population outlier (gIsBGPopnOL)

It is also possible to center and scale the inter-array data with the command:
> agBoxplot(myob, whichSlot = "gM", array = 1:4, centered = TRUE, reduced = TRUE)

[Figure 2: (A) Horizontal visualization of the agBoxplot for all the arrays (log2 gM, not centered and not reduced). (B) Centered and reduced data from the first array of the AgilentBatch object (log2 gM).]

3.2 The agMAplot function

This function produces an MA plot (Bland-Altman plot) of the object for a given array. The A and M values are computed differently depending on the class of the object. For an AgilentBatch object, the reference corresponds to the median of each spot across the different arrays, whereas for an AgilentBatchRG object, A and M are d
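For the single-channel case described above, M and A values against a per-spot median reference can be sketched as below. This is only one plausible reading of the text, not the AgiND source: the formulas M = x - ref and A = (x + ref) / 2 on log2 intensities are the usual convention for MA plots against a pseudo-reference and are assumed here, as are the function name and the toy values.

```python
from statistics import median

def ma_values(arrays, index):
    """M and A values of one array against the per-spot median reference.

    arrays: list of equal-length lists of log2 intensities, one per array.
    index:  position of the array to compare with the reference.
    """
    # Reference profile: median of each spot across all arrays.
    reference = [median(spot) for spot in zip(*arrays)]
    target = arrays[index]
    m = [x - r for x, r in zip(target, reference)]        # log ratio
    a = [(x + r) / 2 for x, r in zip(target, reference)]  # mean log intensity
    return m, a

if __name__ == "__main__":
    # Three hypothetical arrays with three spots each (log2 scale).
    arrays = [[8, 10, 12], [9, 10, 11], [10, 10, 16]]
    m, a = ma_values(arrays, index=0)
    print("M:", m)
    print("A:", a)
```

Plotting M against A for each array then shows intensity-dependent deviations from the reference, which is what the MA plot is used to diagnose.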
61. [Figure residue: bar charts of normalized expression for IKBKAP, MAP1LC3C, NOVA1, SNCA and SPON1 in control (CTRL) and FD cells, and a panel titled "Genes with reduced or increased expression in FD versus FD cells treated with kinetin" showing IKBKAP WT, IKBKAP MU and LUC7L in FD and FD + kinetin cells.]

Figure 4. Validation of microarray candidates by RT-qPCR. RT-qPCR using total RNAs extracted from four control and four FD hOE-MSCs. Histograms represent the mean value of (A) IKBKAP, LYN, SNCA, (B) MAP1LC3C, NOVA1, SPON1 and (C) LUC7L transcript expression level relative to WDR59 as a reference gene in control (gray) and FD samples (black). For dysregulated genes between control and FD hOE-MSCs, we pooled values of spheres and differentiated cells for each group. Error bars denote standard errors. *P < 0.05, **P < 0.01, ***P < 0.001 using two-tailed Student's t-test.

IKBKAP, LUC7L and ZNF280D are Sensitive to Kinetin Treatment in hOE-MSCs

To corroborate the expression levels of LUC7L and ZNF280D detected in our microarray hybridization after kinetin treatment, we exposed adherent hOE-MSCs to increasing concentrations of kinetin (25 to 200 µM) over a 48 hr time course and determined the expression changes of these genes by relative qPCR (Fig. 5A). Although a dose-dependent action of kinetin on increasing IKBKAP WT transcripts was only observed in FD samples (Fig. 5A), higher

HUMAN MUTATION, Vol. 00, No. 0, 1-11, 2012
62. 8q21.11, 8q21.2, 8q22.1, 8q22.3, 8q24, 8q24.13, 8q24.3

The next sections introduce more complex queries using sets of genes, with or without Boolean operators.

2.1.1 Request without logical operators (gene list)

When field is set to "gene" or "probe", the user can perform a request using a list of items separated by blanks. These blanks are interpreted as the OR logical operator. In this case, all signatures containing at least one gene of the list are returned. To select more informative signatures, we suggest using the nbMin argument, which selects signatures containing at least nbMin genes out of the list. The following example searches for signatures containing at least 2 genes of the input list (CD3D, CD3E and CD4).

> gl <- getSignatures(field = "gene", value = "CD3D CD3E CD4", nbMin = 2)
150 signatures were found for the request: gene = CD3D CD3E CD4 and nbMin = 2
> head(gl)
[Output residue: a table with the columns Signature and nb.Genes listing the signatures 03AD63FB5, 050367D10, 053ECFACF, 05F2203B7, 0C0A8F888 and 0D2EA9D52; the gene counts were lost in extraction.]

2.1.2 Request using logical operators

The value argument of getSignatures may contain the following Boolean operators (see the help section of the TranscriptomeBrowser web site for more information, http://tagc.univ-mrs.fr/tbrowser):
• & : AND
• | : OR
• ! : NOT (used in conjunction with &)

This is a convenient way to create relevant queries. Suppose your field of interest is related to T cell activation. You could be in
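The effect of the nbMin argument on a blank-separated gene list can be mimicked on a local dictionary of signatures. The sketch below is not the RTools4TB implementation and does not contact the TBrowser database: the function name `match_signatures`, the signature identifiers and their gene content are all hypothetical.

```python
def match_signatures(signatures, value, nb_min=1):
    """Return {signature_id: number of matching genes}.

    signatures: dict mapping a signature id to its list of gene symbols.
    value:      blank-separated gene symbols, read as an OR query.
    nb_min:     minimum number of query genes a signature must contain.
    """
    wanted = set(value.split())
    hits = {}
    for sig_id, genes in signatures.items():
        n_found = len(wanted & set(genes))
        if n_found >= nb_min:
            hits[sig_id] = n_found
    return hits

if __name__ == "__main__":
    # Toy signatures, made up for the example.
    toy = {
        "SIG_A": ["CD3D", "CD3E", "CD4", "LCK"],
        "SIG_B": ["CD4", "ESR1"],
        "SIG_C": ["GATA3", "XBP1"],
    }
    print(match_signatures(toy, "CD3D CD3E CD4"))            # plain OR query
    print(match_signatures(toy, "CD3D CD3E CD4", nb_min=2))  # stricter filter
```

With the default `nb_min=1` the query behaves as a plain OR; raising it discards signatures that share only one gene with the list, which is the filtering the manual recommends.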
63. A., Makiguchi, H., Niijima, S., Tsujimoto, G. & Okuno, Y. (2009). GEM-TREND: a web tool for gene expression data mining toward relevant network discovery. BMC genomics, 10, 411.
[Ferdin et al., 2010] Ferdin, J., Kunej, T. & Calin, G. A. (2010). Non-coding RNAs: identification of cancer-associated microRNAs by gene profiling. Technology in cancer research & treatment, 9(2), 123-38.
[Fernandez-Capetillo et al., 2003] Fernandez-Capetillo, O., Mahadevaiah, S. K., Celeste, A., Romanienko, P. J., Camerini-Otero, R. D., Bonner, W. M., Manova, K., Burgoyne, P. & Nussenzweig, A. (2003). H2AX is required for chromatin remodeling and inactivation of sex chromosomes in male mouse meiosis. Developmental cell, 4(4), 497-508.
[Fiume et al., 2010] Fiume, M., Williams, V., Brook, A. & Brudno, M. (2010). Savant: genome browser for high-throughput sequencing data. Bioinformatics (Oxford, England), 26(16), 1938-44.
[Foltz et al., 2009] Foltz, D. R., Jansen, L. E. T., Bailey, A. O., Yates, J. R., Bassett, E. A., Wood, S., Black, B. E. & Cleveland, D. W. (2009). Centromere-specific assembly of CENP-A nucleosomes is mediated by HJURP. Cell, 137(3), 472-84.
[Fullwood et al., 2009] Fullwood, M. J., Wei, C. L., Liu, E. T. & Ruan, Y. (2009). Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analyses. Genome research, 19(4), 521-32.
[Geyer & Corces, 1992] Geyer, P. K. & Corces
64. ESR1, GATA3 and XBP1 tumors compared to normal breast tissues.

> em <- getExpressionMatrix(signatureID = "3DE64836D")
Downloading expression matrix for transcriptional signature 3DE64836D: 62 samples x 143 probes
> class(em)
[1] "data.frame"

The getExpressionMatrix function returns a data.frame. The first two columns store probe IDs and gene symbols. Additional columns contain the corresponding expression values (figure 2).

> library(RColorBrewer)
> # Color palette for the heatmap
> col <- colorRampPalette(brewer.pal(10, "RdBu"))(256)
> # The separator strings were lost in extraction; "||" is assumed here
> geneNames <- paste(em[, 1], em[, 2], sep = "||")
> em <- as.matrix(em[, -c(1, 2)])
> # sampleInfo is defined earlier in the manual (not shown in this excerpt)
> ind <- match(colnames(em), sampleInfo[, 1])
> colnames(em) <- sampleInfo[ind, 2]
> # Row colors: highlight the probes of XBP1, ESR1 and GATA3
> row <- rep(1, nrow(em))
> ind <- grep("XBP1|ESR1|GATA3", geneNames, perl = TRUE)
> row[ind] <- 2
> rc <- rainbow(2, start = 0, end = 0.3)
> rc <- rc[row]
> # Column colors: one color per phenotype
> split <- strsplit(colnames(em), "||", fixed = TRUE)
> pheno <- unlist(lapply(split, "[", 1))
> pheno <- as.factor(pheno)
> levels(pheno) <- 1:5
> cc <- rainbow(5, start = 0, end = 0.3)
> cc <- cc[pheno]
> heatmap(em, col = col, RowSideColors = rc, ColSideColors = cc, labRow = geneNames, cexRow = 0.3)
65. FDR 10, S1 3, Inflation 2. Found at: doi:10.1371/journal.pone.0004001.s005 (8.72 MB TIF)

Figure S6. The TBMap plugin. These pictures are derived from the GPL96 map (22,215 probes as rows and 3,114 GPL96-specific TS as columns). Red indicates the presence of a gene in the corresponding TS (default to black). Only small parts of the map are displayed. (A) A cluster enriched in genes from the "Aminoacyl-tRNA biosynthesis" KEGG pathway (hsa00970). Genes (rows) from this KEGG pathway are displayed as blue lines (GARS, SARS, AARS, GARS, MARS, IARS, YARS). Genes from a manually entered gene list are shown in yellow (TRIB3, MOCOS, MPZL1, CBS, PPCDC). (B) A cluster enriched in genes related to oxidative phosphorylation (KEGG pathway hsa00190, Oxidative phosphorylation). (C) A cluster containing genes related to ribosome biogenesis (KEGG pathway hsa03010, Ribosome). (D) A cluster enriched in genes involved in cell proliferation (KEGG pathway hsa04110, Cell cycle). Found at: doi:10.1371/journal.pone.0004001.s006 (9.66 MB TIF)

Table S1. Information related to the Affymetrix platforms (n = 33) used in the present work. Found at: doi:10.1371/journal.pone.0004001.s007 (0.12 MB XLS)

Table S2. Information related to the experiments (n = 1,484) that were analyzed using the DBF-MCL algorithm. All information was obtained from the GEO website. Found at: doi:10.1371/journal.pone.0004001.s008

PLoS ONE, December 2008, Volume 3, Issue 12, e4001
66. Figure 1B). More subtle biclusters related to the immune system were also found. As an example, RBPJK-specific PWMs (M01112, M01111) were statistically significantly associated with the term "Notch signaling pathway". Although RBPJK is already known to be crucial in the NOTCH signaling pathway, PWMs related to TCF3 (also known as E2A and E47) and AP-4 were also found in the same bicluster (Figure 1C). This observation is very consistent with the known role of these TFs in early B-cell differentiation, a development step for which the Notch pathway is decisive [34,35]. As expected, a bicluster containing almost all E2F-related PWMs was also found. Finally, several biclusters related to "Muscle contraction", "Phosphorus metabolic processes", "Synaptic transmission", "Protein catabolic processes" and "Pre-mRNA processing" were also observed and are presented in Figure 2E-I. Altogether, these results highlight the biological relevance of the TFBS predictions and provide a systematic overview of putative regulatory interactions in human and mouse. These predictions have been termed TBMC (TranscriptomeBrowser Motif Conservation) and are available through the InteractomeBrowser plugin or as a bed file (see supplementary material).

InteractomeBrowser: graph-based knowledge browser

The InteractomeBrowser application can be used to connect to our database in order to identify and analyze molecular interactions (see supplementar
67. (Livak and Schmittgen, 2001).

Preparation of Samples and Microarray Assay

Sample amplification, labeling and hybridization essentially followed the one-color microarray-based gene expression analysis (low input quick amp labeling) protocol (version 6.5, May 2010) recommended by Agilent Technologies. In brief, 500 ng of each total RNA sample was reverse transcribed into cDNA using an oligo dT-T7 promoter primer. Labeled cRNA was synthesized from the cDNA. The reaction was performed in a solution containing dNTP mix, cyanine 3-dCTP and T7 RNA polymerase, and incubated at 40°C for 2 hr. Hybridization was performed on whole human genome microarray slides (4 x 44K, G4112F, Agilent Technologies, Santa Clara, CA) containing 45,220 oligonucleotide probes, at 65°C for 17 hr. Hybridized microarray slides were then washed according to the manufacturer's instructions and scanned using an Agilent DNA Microarray Scanner, using the Agilent Feature Extraction Software (Agilent Technologies). The microarray data are available from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo) under the series accession number GSE27915.

Microarray Data Analysis

Quantification files derived from the Agilent Feature Extraction Software were analyzed using the AgiND package (http://tagc.univ-mrs.fr/AgiND). We also used the AgiND R package for quality control and normalization. Quantile methods and a background correction were used for data
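Quantile normalization, mentioned above, forces every array to share the same intensity distribution. The following sketch shows the textbook procedure on plain lists; it is not the AgiND code, the function name is an assumption, and ties are broken by position instead of being averaged.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of arrays.

    columns: list of equal-length lists, one list of intensities per array.
    Each value is replaced by the mean, across arrays, of the values
    holding the same rank. Ties are not averaged (simplification).
    """
    n_spots = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the values found at each rank across all arrays.
    rank_means = [
        sum(col[rank] for col in sorted_cols) / len(columns)
        for rank in range(n_spots)
    ]
    normalized = []
    for col in columns:
        order = sorted(range(n_spots), key=lambda i: col[i])
        out = [0.0] * n_spots
        for rank, spot in enumerate(order):
            out[spot] = rank_means[rank]
        normalized.append(out)
    return normalized

if __name__ == "__main__":
    # Two hypothetical arrays with three spots each.
    print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

After normalization every array contains exactly the same set of values, only assigned to different spots, so boxplots of the arrays become identical.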
68. P. L. & Reik, W. (2009). Evolution and functions of long noncoding RNAs. Cell, 136(4), 629-41.
[Prober et al., 1987] Prober, J. M., Trainor, G. L., Dam, R. J., Hobbs, F. W., Robertson, C. W., Zagursky, R. J., Cocuzza, A. J., Jensen, M. A. & Baumeister, K. (1987). A system for rapid DNA sequencing with fluorescent chain-terminating dideoxynucleotides. Science (New York, N.Y.), 238(4825), 336-41.
[Quinlan & Hall, 2010] Quinlan, A. R. & Hall, I. M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics (Oxford, England), 26(6), 841-2.
[Ransohoff & Gourlay, 2010] Ransohoff, D. F. & Gourlay, M. L. (2010). Sources of bias in specimens for research about molecular markers for cancer. Journal of clinical oncology: official journal of the American Society of Clinical Oncology, 28(4), 698-704.
[Ravasi et al., 2010] Ravasi, T., Suzuki, H., Cannistraci, C. V., Katayama, S., Bajic, V. B., Tan, K., Akalin, A., Schmeier, S., Kanamori-Katayama, M., Bertin, N., Carninci, P., Daub, C. O., Forrest, A. R. R., Gough, J., Grimmond, S., Han, J. H., Hashimoto, T., Hide, W., Hofmann, O., Kamburov, A., Kaur, M., Kawaji, H., Kubosaki, A., Lassmann, T., van Nimwegen, E., MacPherson, C. R., Ogawa, C., Radovanovic, A., Schwartz, A., Teasdale, R. D., Tegnér, J., Lenhard, B., Teichmann, S. A., Arakawa, T., Ninomiya, N., Murakami, K., Tagami, M., Fukuda, S., Imamura, K., Kai, C., Ishihara, R., Kitazume, Y., Kawai, J.
69. Peng, X., Peterson, R. L., Phan, J. H., Quanz, B., Ren, Y., Riccadonna, S., Roter, A. H., Samuelson, F. W., Schumacher, M. M., Shambaugh, J. D., Shi, Q., Shippy, R., Si, S., Smalter, A., Sotiriou, C., Soukup, M., Staedtler, F., Steiner, G., Stokes, T. H., Sun, Q., Tan, P. Y., Tang, R., Tezak, Z., Thorn, B., Tsyganova, M., Turpaz, Y., Vega, S. C., Visintainer, R., von Frese, J., Wang, C., Wang, E., Wang, J., Wang, W., Westermann, F., Willey, J. C., Woods, M., Wu, S., Xiao, N., Xu, J., Xu, L., Yang, L., Zeng, X., Zhang, J., Zhang, L., Zhang, M., Zhao, C., Puri, R. K., Scherf, U., Tong, W. & Wolfinger, R. D. (2010). The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nature biotechnology, 28(8), 827-38.
[Shi et al., 2006] Shi, L., Reid, L. H., Jones, W. D., Shippy, R., Warrington, J. A., Baker, S. C., Collins, P. J., de Longueville, F., Kawasaki, E. S., Lee, K. Y., Luo, Y., Sun, Y. A., Willey, J. C., Setterquist, R. A., Fischer, G. M., Tong, W., Dragan, Y. P., Dix, D. J., Frueh, F. W., Goodsaid, F. M., Herman, D., Jensen, R. V., Johnson, C. D., Lobenhofer, E. K., Puri, R. K., Scherf, U., Thierry-Mieg, J., Wang, C., Wilson, M., Wolber, P. K., Zhang, L., Amur, S., Bao, W., Barbacioru, C. C., Lucas, A. B., Bertholet, V., Boysen, C., Bromley, B., Brown, D., Brunner, A., Canales, R., Cao, X. M., Cebula, T. A., Chen, J. J., Cheng, J., Chu, T. M., Chudin, E., Corson, J. C
70. Principle of chromatin immunoprecipitation combined with very-high-throughput sequencing (ChIP-seq)

[Figure 5.1: ChIP-seq vs ChIP-on-chip, general process. Steps shown: crosslinking of the cells with formaldehyde; cell lysis and recovery of the chromatin; chromatin fragmentation; chromatin immunoprecipitation (transcription factor or N-terminal histone modification); elution and DNA purification. ChIP-on-chip branch: amplification and labelling, hybridization on a DNA microarray, intensity for each probe (probes evenly spaced along the genome). ChIP-seq / ChIP-PET branch: library preparation, sequencing, alignment of the reads to the reference sequence, data processing, peak detection.]

Technology: ChIP-on-chip (DNA microarrays) | ChIP-seq (very-high-throughput sequencing)
Detection: hybridization of cDNA | sequencing of gDNA
Resolution: 6.5 million probes per array | > 700 million sequences (reads) obtained per run
Genome coverage: limited by the number of probes on the array | unlimited
Risk of cross-hybridization: yes, between highly similar sequences | none
Multiplexing: no | yes
Fragment size: 600 bp | 150-300 bp
Number of cell
71. [Table residue: rows for arrayQualityMetrics, genefilter, vsn, Array Tools and GeneSpring GX, with columns covering the implementation language (R, Excel, Java applet, Java API, SOAP API, Jython, Python) and the normalization methods (quantiles, lowess, print-tip loess, scale, vsn); the cell layout was lost in extraction.]

Table 2.1: Summary of the main tools for analysing Agilent DNA microarray data. The commercial software developed by Agilent is shown in gray and the features of our R library AgiND in bold.

Chapter 3. Analyses of DNA microarray data

Contents
3.1 Gene selection
3.1.1 t-test
3.1.2 Significance Analysis of Microarrays (SAM)
3.1.3 ANalysis Of VAriance (ANOVA)
3.2 Unsupervised classification methods
3.2.1 Hierarchical clustering
3.2.2 The k-means method
3.2.3 Self-organizing maps (SOM)
3.3 Functional annotation
3.3.1 The different sources of information
3.3.2 Some annotation tools
3.3.3 Functional enrichment tests
3.4 Data analyses in the context of collaborations
3.4.1 Dengue
Article
72. T., Wilson, R. K. & Mardis, E. R. (2008). Whole-genome sequencing and variant discovery in C. elegans. Nature methods, 5(2), 183-8.
[Hillmer et al., 2011] Hillmer, A. M., Yao, F., Inaki, K., Lee, W. H., Ariyaratne, P. N., Teo, A. S. M., Woo, X. Y., Zhang, Z., Zhao, H., Ukil, L., Chen, J. P., Zhu, F., So, J. B. Y., Salto-Tellez, M., Poh, W. T., Zawack, K. F. B., Nagarajan, N., Gao, S., Li, G., Kumar, V., Lim, H. P. J., Sia, Y. Y., Chan, C. S., Leong, S. T., Neo, S. C., Choi, P. S. D., Thoreau, H., Tan, P. B. O., Shahab, A., Ruan, X., Bergh, J., Hall, P., Cacheux-Rataboul, V., Wei, C. L., Yeoh, K. G., Sung, W. K., Bourque, G., Liu, E. T. & Ruan, Y. (2011). Comprehensive long-span paired-end-tag mapping reveals characteristic patterns of structural variations in epithelial cancer genomes. Genome research, 21(5), 665-75.
[Hims et al., 2007] Hims, M. M., Shetty, R. S., Pickel, J., Mull, J., Leyne, M., Liu, L., Gusella, J. F. & Slaugenhaupt, S. A. (2007). A humanized IKBKAP transgenic mouse models a tissue-specific human splicing defect. Genomics, 90(3), 389-96.
[Ho et al., 2011] Ho, J. W. K., Bishop, E., Karchenko, P. V., Nègre, N., White, K. P. & Park, P. J. (2011). ChIP-chip versus ChIP-seq: lessons for experimental design and data analysis. BMC genomics, 12, 134.
[Holt & Jones, 2008] Holt, R. A. & Jones, S. J. M. (2008). The new paradigm of flow cell sequencing. Genome research, 18(6), 839-46.
[Holtgrewe et al., 2011] Ho
73. TS and will be presented in the next section. Finally, the TBMap plugin, which can be used to visualize a summary of transcriptional regulation events observed in a given microarray platform, will be presented in the last paragraph of the results section.

Meta-analysis of public microarray data using TBrowser: a case study

TBrowser can be used in many biological contexts to point out relevant experiments and construct robust gene networks. Several peer-reviewed publications have highlighted the joint regulation of the estrogen receptor α (ESR1, ER), GATA3 and FOXA1 in breast cancer cells [14]. Although some of these reports have associated entries in the GEO database, retrieving neighbors of GATA3, FOXA1 and ESR1 remains a time-consuming and difficult task using existing tools. As a consequence, this information is reserved to those with strong bioinformatics skills, although it is of primary interest to the biologist. Using the TBrowser search engine, this task can be translated into a very simple Boolean query, "ESR1 & GATA3 & FOXA1", which will

[Figure residue: TBrowser screenshot listing transcriptional signatures and 11 experiments (among them GSE3849, GSE2192, GSE2331, GSE97 and GSE2148), with one entry titled "Functional annotation of a full length mouse cDNA collection" (Kawai J, Shinagawa A, Shibata K, Yoshino M, I]
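The idea of retrieving the neighbors of several genes with an AND query can be illustrated on a local dictionary of signatures. This sketch does not query TBrowser: the function names, the signature identifiers and their gene content (including TFF1 and CD4) are made up for the example.

```python
from collections import Counter

def and_query(signatures, genes):
    """Signatures that contain every gene of the query (logical AND)."""
    wanted = set(genes)
    return {sid: g for sid, g in signatures.items() if wanted <= set(g)}

def neighbours(signatures, genes):
    """Genes co-occurring with the query genes, most frequent first."""
    wanted = set(genes)
    counts = Counter()
    for sig_genes in and_query(signatures, genes).values():
        counts.update(set(sig_genes) - wanted)
    return counts.most_common()

if __name__ == "__main__":
    # Toy signatures, hypothetical content.
    toy = {
        "S1": ["ESR1", "GATA3", "FOXA1", "XBP1"],
        "S2": ["ESR1", "GATA3", "FOXA1", "XBP1", "TFF1"],
        "S3": ["ESR1", "CD4"],
    }
    query = ["ESR1", "GATA3", "FOXA1"]
    print(sorted(and_query(toy, query)))
    print(neighbours(toy, query))
```

Genes that recur across many matching signatures, derived from independent experiments, are the most robust candidate neighbors, which is the rationale of the meta-analysis.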
74. Tan J, Town T, Mori T, Obregon D, Wu Y, DelleDonne A, Rojiani A, Crawford F, Flavell RA, Mullan M. 2002. CD40 is expressed and functional on neuronal cells. EMBO J 21:643-652.
Tufarelli C, Frischauf AM, Hardison R, Flint J, Higgs DR. 2001. Characterization of a widely expressed gene (LUC7-LIKE; LUC7L) defining the centromeric boundary of the human alpha-globin domain. Genomics 71:307-314.
Ule J, Ule A, Spencer J, Williams A, Hu JS, Cline M, Wang H, Clark T, Fraser C, Ruggiu M, Zeeberg BR, Kane D, Weinstein JN, Blume J, Darnell RB. 2005. Nova regulates brain-specific splicing to shape the synapse. Nat Genet 37:844-852.
Will CL, Urlaub H, Achsel T, Gentzel M, Wilm M, Luhrmann R. 2002. Characterization of novel SF3b and 17S U2 snRNP proteins, including a human Prp5p homologue and an SF3b DEAD-box protein. EMBO J 21:4978-4988.
Zhang X, Klueber KM, Guo Z, Cai J, Lu C, Winstead WI, Qiu M, Roisen FJ. 2006. Induction of neuronal differentiation of adult human olfactory neuroepithelial-derived progenitors. Brain Res 1073-1074:109-119.

3.5 Conclusions and perspectives

DNA microarrays, a very important technique for the study of diseases

The analysis of DNA microarray data remains a technique of choice for studying the expression of messenger RNAs. Indeed, this technique has revolutionized the study of diseases and has made it possible
75. and other roles for IKAP/hELP1 have been proposed outside of the Elongator complex. Thus, it is not clear whether the new IKBKAP isoforms we described may have functional roles. Future investigations with specific reagents (antibodies) will be required to address this issue. Nevertheless, we consistently detected a lower expression of the IKBKAP gene, including the full-length exon 2 transcript and the transcripts skipping exon 36, in FD hOE-MSCs (Figure 6C), as determined when investigating the exon 20 region. Thus, the relatively stable expression of IKBKAP observed in microarray analysis may be due to a weak expression that is masked within the noise signals. Furthermore, during the analysis of our microarray data and those of previous studies [10,22,40], we noticed that a high fraction of genes were expressed at background levels. This points to the limitation of using microarray technology to establish the whole-genome expression pattern. We expect that new technologies such as RNA deep sequencing will rival PCR sensitivity and specificity in the near future. The model of hOE-MSCs from FD patients has also been very useful to test compounds, such as kinetin, that can correct the defective splicing process. As reported in the other cell types tested, we confirmed that kinetin corrects splicing in a dose-dependent manner in FD hOE-MSCs (Figure 7A-C). This suggests that kinetin activity is not cell-type specific. Although the mechanism by which kinetin
76. annotated or neuron differentiation GO biological process q value 1 48 10 18 genes out 78 annotated. Very concordant results were also observed for human; a summary of functional enrichment analysis using the ClueGO Cytoscape plugin is provided in Supplementary Figures S2 and S3 [28]. Actually, these results are in agreement with the work of Bejerano and collaborators, who showed that ultraconserved elements of the human genome are most often found in genes involved in the regulation of transcription and development [29]. As a consequence, our phylogenetic footprinting analysis predicts a higher number of motifs in the promoter regions of these genes. Although TFBS conservation in mammals has been previously analyzed in several papers, none of them, to our knowledge, reported this observation, which may introduce a bias in the analysis. However, these ultraconserved regions may also be reminiscent of HOT (high-occupancy target) regions identified using ChIP-seq analysis in Caenorhabditis elegans and Drosophila [30, 31]. Indeed, HOT regions have been shown to be significantly associated with essential genes (i.e., having an RNAi phenotype of 100% larval arrest, embryonic lethality or sterility) and genes related to growth, reproduction, and larval and embryonic development. However, we cannot rule out that these ultraconserved regions may also be related to mechanisms other than regulation by site-specific TFs. Biological relevance
77. before. Finally, a huge thank-you to my darling Christophe, who endured the birth of this thesis over the last 9 months, for all the support you have always given me. We have been through the hardest part, my love; the best is yet to come. To my parents, for their immense support, and in memory of my grandmother Monique, who always pushed me to surpass myself. Table of contents: Acknowledgements; List of figures; List of tables; List of abbreviations; Foreword: context of the thesis; 1 General introduction; 1.1 Study of diseases; 1.2 The transcriptome; 1.2.1 Principle of DNA microarrays; 1.2.2 The particular case of Agilent DNA microarrays; 1.3 Regulation of gene expression; 1.3.1 Basal transcription; 1.3.2 Regulatory sequences and sequence-specific transcription factors; 1.3.3 Chromatin: histones and epigenetic marks; 1.3.4 Non-coding RNAs; 1.3.5 Epigenetics and epigenomes; 1.4 Sequencing techniques
78. beyond the consensus coding DNA sequence exome reveals exons with higher variant densities. Genome biology 12(7), R68. [Barnett et al. 2011] Barnett, D. W., Garrison, E. K., Quinlan, A. R., Strömberg, M. P. & Marth, G. T. (2011). BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics (Oxford, England) 27(12), 1691-2. [Baron et al. 2011] Baron, D., Magot, A., Ramstein, G., Steenman, M., Fayet, G., Chevalier, C., Jourdon, P., Houlgatte, R., Savagner, F. & Pereon, Y. (2011). Immune response and mitochondrial metabolism are commonly deregulated in DMD and aging skeletal muscle. PloS one 6(11), e26952. [Barozzi et al. 2011] Barozzi, I., Termanini, A., Minucci, S. & Natoli, G. (2011). Fish the ChIPs: a pipeline for automated genomic annotation of ChIP-Seq data. Biology direct 6(1), 51. [Barrett et al. 2005] Barrett, T., Suzek, T. O., Troup, D. B., Wilhite, S. E., Ngau, W. C., Ledoux, P., Rudnev, D., Lash, A. E., Fujibuchi, W. & Edgar, R. (2005). NCBI GEO: mining millions of expression profiles, database and tools. Nucleic acids research 33 (Database issue), D562-6. [Barski et al. 2007] Barski, A., Cuddapah, S., Cui, K., Roh, T. Y., Schones, D. E., Wang, Z., Wei, G., Chepelev, I. & Zhao, K. (2007). High-resolution profiling of histone methylations in the human genome. Cell 129(4), 823-37. [Bernstein et al. 2010] Bernstein, B. E., Stamatoyannopoulos, J. A., Co
79. biochemistry 174(2), 423-36. [Impey et al. 2004] Impey, S., McCorkle, S. R., Cha-Molstad, H., Dwyer, J. M., Yochum, G. S., Boss, J. M., McWeeney, S., Dunn, J. J., Mandel, G. & Goodman, R. H. (2004). Defining the CREB regulon: a genome-wide analysis of transcription factor regulatory regions. Cell 119(7), 1041-54. [Inza et al. 2004] Inza, I., Larrañaga, P., Blanco, R. & Cerrolaza, A. J. (2004). Filter versus wrapper gene selection approaches in DNA microarray domains. Artificial intelligence in medicine 31(2), 91-103. [Irizarry et al. 2005] Irizarry, R. A., Warren, D., Spencer, F., Kim, I. F., Biswal, S., Frank, B. C., Gabrielson, E., Garcia, J. G. N., Geoghegan, J., Germino, G., Griffin, C., Hilmer, S. C., Hoffman, E., Jedlicka, A. E., Kawasaki, E., Martinez-Murillo, F., Morsberger, L., Lee, H., Petersen, D., Quackenbush, J., Scott, A., Wilson, M., Yang, Y., Ye, S. Q. & Yu, W. (2005). Multiple-laboratory comparison of microarray platforms. Nature methods 2(5), 345-50. [Jenuwein & Allis 2001] Jenuwein, T. & Allis, C. D. (2001). Translating the histone code. Science (New York, N.Y.) 293(5532), 1074-80. [Johnson et al. 2007] Johnson, D. S., Mortazavi, A., Myers, R. M. & Wold, B. (2007). Genome-wide mapping of in vivo protein-DNA interactions. Science (New York, N.Y.) 316(5830), 1497-502. [Kaikkonen et al. 2011] Kaikkonen, M. U., Lam, M. T. Y. & Glass, C. K. (2011). Non-coding RNAs as regulators of gene
80. burnetii in mice, in a publication by Textoris and collaborators in August 2010 [Textoris et al. 2010]. Chapter 4. Mining DNA microarray data. [Figure: screenshots of the TranscriptomeBrowser interface and its plugins.] Heatmap: visualization of the expression matrix of a transcriptional signature, with information on the probes and samples and functional enrichment. TBMap: exploration of transcriptional maps, a synthetic view of the transcriptional signatures for a DNA microarray platform. TBNeighborhood: comparison of the gene composition of a group of signatures, with information on the selected gene. KeggSearch: functional annotation of a gene list using KEGG pathways, with visualization of the selected pathway. InteractomeBrowser: visualization of the interaction network pr
81. cells/well in a final volume of 200 µl serum or serum-free culture medium, with or without 100 µM kinetin. Cells were allowed to migrate through the membrane filter for 24 h at 37 °C, 5% CO2. Cells migrating through the membrane pore and invading the underside surface of the membrane were fixed with 4% paraformaldehyde. Non-migratory cells on the upper membrane surface were removed with a cotton swab, and nuclei were stained with 0.5 µg/mL of the DNA intercalant Hoechst 33258. For quantitative assessment, the number of stained migrating cells was counted with image software on 10 random fields per membrane filter at ×20 magnification. Supporting Information. Figure S1: 50 genes are differentially expressed between control and FD hOE-MSCs. Heatmap representation of overexpressed (red) and underexpressed (green) genes in 5 controls and 4 FD OE-MSCs at passage 1, 2, 5 and 9. Normalized signal. References. 1. Axelrod FB, Iyer K, Fish I, Pearson J, Sein ME, et al. (1981) Progressive sensory loss in familial dysautonomia. Pediatrics 67: 517-22. 2. Pearson J, Pytel BA, Grover-Johnson N, Axelrod F, Dancis J (1978) Quantitative studies of dorsal root ganglia and neuropathologic observations on spinal cords in familial dysautonomia. J Neurol Sci 35: 77-92. 3. Pearson J, Pytel BA (1978) Quantitative studies of sympathetic ganglia and spinal cord intermedio-lateral gray columns in familial dysautonomia. J Neurol Sci 39: 47-59. 4. Axelrod FB (2004) Familial
82. cellular models to reproduce neuronal cells in early development (spheres) and neuroglial progenitors in later developmental stages using the RA/Shh treatment. Retinoic acid (RA) and Sonic hedgehog (Shh) are known to regulate neuronal specification and differentiation during development (Probst et al. 2011). Both RA and Shh induced expression of a set of genes and proteins that define peripheral nervous system sensory neurons in murine mesenchymal stem cells (Kondo et al. 2005). These factors were also shown to stimulate the expression of motoneuronal transcription factors, in parallel to neurite formation, on hOE-MSCs (Zhang et al. 2006). Previous microarray studies of FD were unable to discriminate IKBKAP expression between FD and control cells (Boone et al. 2010; Cheishvili et al. 2007; Keren et al. 2010; Lee et al. 2009). However, in our analysis we detected an IKBKAP signal above background level in both control and FD patient samples. In addition, we found that IKBKAP was the best marker for FD, since this gene was initially underexpressed in FD cells but then showed even higher expression after kinetin treatment. These results increased confidence in interpreting our microarray data. In accordance with previous microarray studies (Boone et al. 2010; Cheishvili et al. 2007; Lee et al. 2009), the FD transcriptional signature is characterized by a general decrease in transcriptional expression that might reflect a defect in tr
83. centric region. Cell 98(2), 249-59. [Boone et al. 2010] Boone, N., Loriod, B., Bergon, A., Sbai, O., Formisano-Tréziny, C., Gabert, J., Khrestchatisky, M., Nguyen, C., Féron, F., Axelrod, F. B. & Ibrahim, E. C. (2010). Olfactory stem cells, a new cellular model for studying molecular mechanisms underlying familial dysautonomia. PloS one 5(12), e15590. [Borgström et al. 2011] Borgström, E., Lundin, S. & Lundeberg, J. (2011). Large scale library generation for high throughput sequencing. PloS one 6(4), e19119. [Brazma et al. 2001] Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., Aach, J., Ansorge, W., Ball, C. A., Causton, H. C., Gaasterland, T., Glenisson, P., Holstege, F. C., Kim, I. F., Markowitz, V., Matese, J. C., Parkinson, H., Robinson, A., Sarkans, U., Schulze-Kremer, S., Stewart, J., Taylor, R., Vilo, J. & Vingron, M. (2001). Minimum information about a microarray experiment (MIAME): toward standards for microarray data. Nature genetics 29(4), 365-71. [Brazma et al. 2003] Brazma, A., Parkinson, H., Sarkans, U., Shojatalab, M., Vilo, J., Abeygunawardena, N., Holloway, E., Kapushesky, M., Kemmeren, P., Lara, G. G., Oezcimen, A., Rocca-Serra, P. & Sansone, S. A. (2003). ArrayExpress: a public repository for microarray gene expression data at the EBI. Nucleic acids research 31(1), 68-71. [Cai et al. 2010] Cai, X., Hou, L., Su, N., Hu, H., Deng, M. &
84. chromosomal DNA. These abnormalities did not lead to a global increase in X chromosome transcription but were associated with overexpression of a small subset of X chromosomal genes. Other, equally aneuploid, but non-BLC rarely displayed these X chromosome abnormalities. These results suggest that X chromosome abnormalities contribute to the pathogenesis of BLC, both inherited and sporadic. Total: 62 samples, including 43 tumor, 7 normal breast and 12 normal organelle. As expected, the transcriptional signature 3DE64836D corresponds to a breast cancer tumor analysis. This is also true for the other TS (not shown). 2.3 Finding transcriptional neighbors. One interesting feature of RTools4TB is its ability to find genes frequently co-expressed with the input list. Indeed, results from a request to TBrowserDB can be displayed as a graph using the createGraph4BioC function. This function retrieves the list of TS that verify the constraint (here XBP1 & ESR1 & GATA3). A list of genes falling in at least one of the TS is next computed. A gene-gene matrix M is then created that records, for each pair of genes, the number of times they were observed in the same signature. In the following example, only a subset of this adjacency matrix, containing genes falling in a significant proportion of signatures (prop = 80), is used to create a graph.
> library(biocGraph)
> adjMat <- createGraph4BioC(request = "XBP1 & ESR1 & GATA3", prop = 80)
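The co-occurrence counting described in this excerpt can be illustrated with a small stand-alone sketch. It is written in Python for illustration only and does not reproduce the createGraph4BioC implementation. The function name, the toy signatures and the GENE_A/GENE_B/GENE_C symbols are hypothetical, and reading `prop` as a percentage of signatures is an assumption.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_matrix(signatures, prop=80):
    """Count how often pairs of genes fall in the same signature.

    signatures: list of sets of gene symbols (toy stand-in for the TS
    returned by a request).
    prop: minimal percentage of signatures in which a gene must appear
    to be kept (assumed meaning of `prop`).
    """
    n_sig = len(signatures)
    # Number of signatures containing each gene.
    gene_counts = Counter(gene for sig in signatures for gene in sig)
    # Integer comparison avoids floating-point rounding at the threshold.
    kept = {g for g, c in gene_counts.items() if c * 100 >= prop * n_sig}
    # M[(a, b)]: number of signatures containing both a and b (a < b).
    M = Counter()
    for sig in signatures:
        for a, b in combinations(sorted(sig & kept), 2):
            M[(a, b)] += 1
    return kept, M


# Hypothetical signatures that all satisfy "XBP1 & ESR1 & GATA3".
signatures = [
    {"XBP1", "ESR1", "GATA3", "GENE_A"},
    {"XBP1", "ESR1", "GATA3", "GENE_A", "GENE_B"},
    {"XBP1", "ESR1", "GATA3", "GENE_C"},
    {"XBP1", "ESR1", "GATA3", "GENE_A"},
    {"XBP1", "ESR1", "GATA3", "GENE_A", "GENE_C"},
]
kept, M = cooccurrence_matrix(signatures, prop=80)
for pair, count in sorted(M.items()):
    print(pair, count)
```

The sparse dictionary M plays the role of the adjacency matrix: each key is an edge of the graph and each value an edge weight.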
85. these are all applied to each gene present on the DNA microarray in order to determine which genes are differentially expressed across the different groups of samples. Indeed, one must distinguish cases where the analyzed data are independent [Golub et al. 1999] or paired [Perou et al. 2000]. It is also important to assess the distribution of the data to determine whether parametric or non-parametric tests can be used. Tests are called parametric (for example the t-test or ANOVA) when the data are assumed to come from a parameterized distribution (a normal distribution, for example). The underlying assumption of normality is often used: indeed, transforming the data to base-2 logarithms yields a distribution that can be approximated by a Gaussian. The mean and variance of these data are then sufficient to fully characterize their distribution. Unlike parametric tests, non-parametric tests make no assumption about the distribution of the data and thus broaden the scope of statistical methods. In return, they are less powerful when these assumptions are compatible with the data. The results of multiple statistical tests must be corrected to minimize the number of false positives. The null hypothesis of these tests, denoted H0, is that there is no d
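As an illustration of correcting multiple tests, here is a minimal Python sketch of the Benjamini-Hochberg adjustment. The excerpt does not say which correction the author applied, so the choice of this procedure is an assumption made for the example, and the p-values are invented toy numbers.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted in increasing order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that adjusted
    # values stay monotone: adj(rank) = min over ranks >= rank of p * m / rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values, one per gene (hypothetical numbers).
pvalues = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(pvalues)
print(adjusted)
```

Genes whose adjusted value falls below the chosen false discovery rate would then be reported as differentially expressed.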
86. community would greatly benefit from having a unified database storing known and predicted molecular interactions. Furthermore, given the intrinsic complexity of the data, the development of new tools offering integrated and meaningful visualizations of molecular interactions is necessary to help users draw new hypotheses without being overwhelmed by the density of the subsequent graph. Results: We extend the previously developed TranscriptomeBrowser database with a set of tables containing 1,594,978 human and mouse molecular interactions. The database includes: (i) predicted regulatory interactions (computed by scanning vertebrate alignments with a set of 1,213 position weight matrices), (ii) potential regulatory interactions inferred from systematic analysis of ChIP-seq experiments, (iii) regulatory interactions curated from the literature, (iv) predicted post-transcriptional regulation by micro-RNA, (v) protein kinase-substrate interactions and (vi) physical protein-protein interactions. In order to easily retrieve and efficiently analyze these interactions, we developed InteractomeBrowser, a graph-based knowledge browser that comes as a plug-in for TranscriptomeBrowser. The first objective of InteractomeBrowser is to provide a user-friendly tool to get new insight into any gene list by providing a context-specific display of putative regulatory and physical interactions. To achieve this, InteractomeBrowser relies on a cell compartments b
87. annotation in order to be able to analyze the transcriptional signatures (Figure 4.5). In addition, by cross-referencing with the new identifiers contained in the database, we were also able to integrate annotations whose identifiers were gene names (sometimes unofficial), Ensembl IDs or UniProt IDs. I thus created stored procedures and a bash script to 1) extract the data, 2) format them for integration into the database and 3) fill all the tables concerning the annotations (KEYWORD, KEYWORDCOUNT, ONTOLOGY, ONTOLOGYCOUNT). I then implemented another bash script, calling the database and R, to compute functional enrichments for an annotation and a list of signatures. Finally, in order to generate transcriptional signatures easily and automatically, an automatic pipeline was built from the various R, Perl, C and bash programs developed in the laboratory. Starting from a list of GSE, it extracts the normalized expression matrices, now checks whether the GPL is available in our database, filters the experiments with more than 8 samples and, of course, automatically generates the signatures using our R/Bioconductor library RTools4TB. Once all this information has been collected, the data are automatic
88. to identify the genes involved in a particular biological process, to understand how they interact and to highlight perturbations that can lead to a pathological state. This requires at least a characterization of the genome, the proteome and the metabolome of the organism. Network inference relies on data that can be very heterogeneous (expression data, genetic or physical interactions). This modeling can be carried out in particular with the GINsim software developed in our laboratory [Chaouiya et al. 2012]. In order to generate gene co-expression data from public data, the meta-analysis software TranscriptomeBrowser was created [Lopez et al. 2008]. I have been strongly involved in this project since my master's degree. 4.1 Data storage. Once generated, DNA microarray data must be stored and then made available to the scientific community to allow their re-analysis. To this end, laboratories generally set up internal systems to track the data, thereby contributing to their quality (LIMS, database). The most efficient way to store and retrieve information is to use databases, the system that was chosen for the long-term storage of the information and data of DNA microarray experiments. The p
89. damage in dengue. Cytokine 30: 359-365. Tseng CS, Lo HW, Teng HC, Lo WC, Ker CG (2005) Elevated levels of plasma VEGF in patients with dengue hemorrhagic fever. FEMS Immunol Med Microbiol 43: 99-102. Bozza FA, Cruz OG, Zagne SM, Azeredo EL, Nogueira RM, et al. (2008) Multiplex cytokine profile from dengue patients: MIP-1beta and IFN-gamma as predictive factors for severity. BMC Infect Dis 8: 86. Deen JL, Harris E, Wills B, Balmaseda A, Hammond SN, et al. (2006) The WHO dengue classification and case definitions: time for a reassessment. Lancet 368: 170-173. Bandyopadhyay S, Lum LC, Kroeger A (2006) Classifying dengue: a review of the difficulties in using the WHO case classification for dengue haemorrhagic fever. Trop Med Int Health 11: 1238-1255. PLoS ONE (www.plosone.org), Molecular Mechanisms of DSS. We thank Dr J. Desplans for help in using the Agilent microarray platform. We are also indebted to all doctors, nurses, patients and their families who participated in this study at the hospital of Kampong Cham, Cambodia, and to all technicians from the virology department of the Institut Pasteur in Cambodia who carried out dengue diagnosis assays. Author Contributions. Conceived and designed the experiments: SD CS VD PB PCP. Performed the experiments: SD CS PCP. Analyzed the data: SD CS AB PR PCP. Contributed reagents/materials/analysis tools: VD SO PTL N
90. de Winther MP, van Dijk KW, Havekes LM, Hofker MH (2000) Macrophage scavenger receptor class A: a multifunctional receptor in atherosclerosis. Arterioscler Thromb Vasc Biol 20: 290-297. Febbraio M, Hajjar DP, Silverstein RL (2001) CD36: a class B scavenger receptor involved in angiogenesis, atherosclerosis, inflammation, and lipid metabolism. J Clin Invest 108: 785-791. Pennings M, Meurs I, Ye D, Out R, Hoekstra M, et al. (2006) Regulation of cholesterol homeostasis in macrophages and consequences for atherosclerotic lesion development. FEBS Lett 580: 5588-5596. Chen XP, Zhang TT, Du GH (2007) Lectin-like oxidized low-density lipoprotein receptor-1, a new promising target for the therapy of atherosclerosis. Cardiovasc Drug Rev 25: 146-161. Renie G, Maingrette F, Li L (2007) Diabetic vasculopathy and the lectin-like oxidized low-density lipoprotein receptor-1 (LOX-1). Curr Diabetes Rev 3: 103-110. Szanto, Roszer T (2008) Nuclear receptors in macrophages: a link between metabolism and inflammation. FEBS Lett 582: 106-116. Ory DS (2004) The Niemann-Pick disease genes: regulators of cellular cholesterol homeostasis. Trends Cardiovasc Med 14: 66-72. Zhang JR, Coleman T, Langmade SJ, Scherrer DE, Lane L, et al. (2008) Niemann-Pick C1 protects against atherosclerosis in mice via regulation of macrophage intracellular cholesterol trafficking. J Clin Invest 118: 2281-2290. Sesti G, Federici M, Hribal ML, Lauro D, Sbraccia P, et al. (2001) Def
91. to choose a normalization method or to assess the quality of a dataset, researchers rely on the graphical representations presented above. Simple and easy to interpret, these plots are very informative and often make it possible to guide the analyses, improve experimental protocols or even define new experimental designs. 2.4 Choosing to develop an R library. The choice of the R language was motivated by several aspects. First of all, GNU R (www.r-project.org) is a programming language based on the S language and a mathematical environment used for data processing and statistical analysis. As a result, this working environment is increasingly used by bioinformaticians. It is implemented in C, C++, Fortran and Java. R has many graphical functions and is updated very regularly (currently 2 new versions per year). This environment has several advantages: 1) it is free software and its sources are available for most operating systems (Windows, Linux and Mac OS); 2) its syntax is intuitive and allows biologists to use it through commands that call libraries or functions previously created or installed
92. Figure 4.4: Schema of the database. The MySQL database consists of 47 tables with MyISAM architecture, allowing fast access to the data. 4.5 Updating the database and integrating data. Annotation sources listed in Figure 4.5: miRNA PICTAR_4WAYS PICTAR_5WAYS PICTAR_CHICKEN PICTAR_DOG TARGETSCAN_HS TARGETSCAN_MM TARGETSCAN WORM TARGETSCAN_DROSO MOTIFS TFBS_CONSERVED ECRbase CISRED PATHWAYS KEGG REACTION NCICB_CAPATHWAY KEGG COMPOUND PANTHER _FAMILY REACTOME BBID PANTHER_PATHWAY BIOCARTA PANTHER_SUBFAMILY PANTHER_TERM_BP KEGG PATHWAY PANTHER_TERM_MF WIKIPATHWAY GENOMIC LOCATION LITERATURE CHROMOSOME CYTOBAND PUBMED_ID HIV_INTERACTION_PUBMED_ID MICROARRAY DB GENESIGDB MSIGDB TB_GENESETS GENE ONTOLOGY GOTERM_MF_ALL GOTERM_CC_ALL GOTERM_BP_ALL HIV HIV_INTERACTION HIV_INTERACTION_CATEGORY DISEASE OMIM_PHENOTYPE OMIM_ID DISEASE PHENOTYPE GENETIC_ASSOCIATION DB PROTEIN DOMAINS SMART_NAME COG KOG NAME TIGRFAMS_NAME PRODOM_NAME SCOP_ID PROSITE_NAME COG KOG ONTOLOGY PIR_SUPERFAMILY_NAME PFAM_NAME KEA SP_PIR_KEYWORDS HMDB. Figure 4.5: The various annotations available in the database used to generate the annotation of the transcriptional signatures. Many annotation databases also use gene IDs as gene identifiers; this allowed us to include other sources
93. of the SOLiD v4 type. Except for the mate-pair mode, it is also possible to multiplex samples using barcodes added to the sequence of the P2 adapter. Fragment: as shown in Figure 1.12A, this relatively simple sequencing mode consists of sequencing, from 5' to 3', 50 nucleotides of the DNA fragments starting from the P1 adapter. This makes it possible to count DNA fragments and can therefore be used for applications such as the study of the transcriptome. Paired-end: increasingly used today, this technique allows better alignment of the sequences by sequencing 2 DNA fragments separated by 100 to 300 nucleotides. Thus, for the SOLiD sequencer, DNA fragments are sequenced over 50 nucleotides from 5' to 3' starting from the P1 adapter, and over 35 nucleotides from 5' to 3' starting from the P2 adapter (Figure 1.12A). The approximate distance between the F3 and F5 fragments corresponds to the size of the DNA fragments generated during fragmentation by sonication, and can be determined by gel migration or by using the high-sensitivity chips of the Bioanalyzer (Agilent Technologies). A fragment size interval can thus be defined. Indeed, it is important to evaluate these distances in order to anticipate, during alignment, the integr
94. describing the TranscriptomeBrowser strategy [1]. Moreover, in our example the distance method is set to pearson, although spearman, which is the default method for computing TS in the TranscriptomeBrowser project, also gives very relevant results. Note that additional distances, including euclidean and two mixtures of pearson and spearman (spm and spgm), are also available.
> res <- DBFMCL(subNorm, distance.method = "pearson", memory = 512)
The results are stored in an instance of class DBFMCLresult.
> class(res)
[1] "DBFMCLresult"
attr(,"package")
[1] "RTools4TB"
> res
Bibliography. [Hum 2010] (2010). Human genome: Genomes by the thousand. Nature 467(7319), 1026-7. [Aburatani 2011] Aburatani, H. (2011). Cancer genome analysis through next-generation sequencing. Gan to kagaku ryoho. Cancer & chemotherapy 38(1), 1-6. [Aerts et al. 2006] Aerts, S., Lambrechts, D., Maity, S., Van Loo, P., Coessens, B., De Smet, F., Tranchevent, L. C., De Moor, B., Marynen, P., Hassan, B., Carmeliet, P. & Moreau, Y. (2006). Gene prioritization through genomic data fusion. Nature biotechnology 24(5), 537-44. [Ahmadian et al. 2006] Ahmadian, A., Ehn, M. & Hober, S. (2006). Pyrosequencing: history, biochemistry and future. Clinica chimica acta; international journal of clinical chemistry 363(1-2), 83-94. [Al-Shahrour et al. 2007] Al-Shahrour, F., Minguez, P., Tárraga, J., Med
95. disease. Arterioscler Thromb Vasc Biol 27: 2302-2309. Cho HJ, Cho HJ, Kim HS (2009) Osteopontin: a multifunctional protein at the crossroads of inflammation, atherosclerosis, and vascular calcification. Curr Atheroscler Rep 11: 206-213. Forte TM, Subbanagounder G, Berliner JA, Blanche PJ, Clermont AO, et al. (2002) Altered activities of anti-atherogenic enzymes LCAT, paraoxonase, and platelet-activating factor acetylhydrolase in atherosclerosis-susceptible mice. J Lipid Res 43: 477-485. Gilroy DW, Newson J, Sawmynaden P, Willoughby DA, Croxtall JD (2004) A novel role for phospholipase A2 isoforms in the checkpoint control of acute inflammation. Faseb J 18: 489-498. Homaidan FR, Chakroun I, Haidar HA, El-Sabban ME (2002) Protein regulators of eicosanoid synthesis: role in inflammation. Curr Protein Pept Sci 3: 467-484. Khanapure SP, Garvey DS, Janero DR, Letts LG (2007) Eicosanoids in inflammation: biosynthesis, pharmacology, and therapeutic frontiers. Curr Top Med Chem 7: 311-340. Harizi H, Corcuff JB, Gualde N (2008) Arachidonic-acid-derived eicosanoids: roles in biology and immunopathology. Trends Mol Med 14: 461-469. Rossi A, Cuzzocrea S, Sautebin L (2009) Involvement of leukotriene pathway in the pathogenesis of ischemia-reperfusion injury and septic and non-septic shock. Curr Vasc Pharmacol 7: 185-197. Lima JJ, Blake KV, Tantisira KG, Weiss ST (2009) Pharmacogenetics of asthma. Curr Opin Pulm Med 15: 57-62. Wittwer J, Hersberger M (2007)
96. distances to their nearest neighbors (i.e., strong profile similarities). To isolate these regions, we can compute for each gene the distance with its kth nearest neighbor (DKNN). If k is relatively small, DKNN should be smaller for all genes falling in a dense area. Thus, the filtering procedure used in DBF-MCL starts by computing a gene-gene distance matrix D. Then, for each gene, DBF-MCL computes its associated DKNN value (with k being set typically to 100 for microarrays containing 10 to 50k elements). Distributions of DKNN values observed with both an artificial and a real dataset (Complex9RN200 and GSE1456, respectively; see thereafter for a description) are shown in Figure S3A and S3B (solid curve). The asymmetrical shape of the distribution observed in Figure S3B suggests the presence of a particular structure within the GSE1456 microarray dataset. Indeed, the long tail that corresponds to low DKNN values could indicate the existence of dense regions. The fact that regions of heterogeneous densities exist in the Complex9RN200 artificial dataset is even clearer, as a bimodal distribution is observed. Next, we would like to define a critical DKNN value below which a gene can be considered as belonging to a dense area, and that would depend on the intrinsic structure of the dataset. To this end, DBF-MCL computes simulated DKNN values by using an empirical randomization procedure. Given a dataset containing n genes and p samples, a simulated DKNN valu
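The DKNN computation and its randomized counterpart can be sketched as follows. This is a minimal Python illustration, not the DBF-MCL implementation: the function names and toy profiles are hypothetical, the Euclidean distance is used only for simplicity, and sampling off-diagonal distances with replacement is an assumption about how the n distance values are drawn from D.

```python
import random


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def distance_matrix(profiles):
    """Gene-gene distance matrix D (list of lists)."""
    return [[euclidean(p, q) for q in profiles] for p in profiles]


def observed_dknn(D, k):
    """Distance of each gene to its k-th nearest neighbor."""
    dknn = []
    for i, row in enumerate(D):
        others = sorted(d for j, d in enumerate(row) if j != i)
        dknn.append(others[k - 1])
    return dknn


def simulated_dknn(D, k, rng):
    """n simulated DKNN values: sample n distances, keep the k-th smallest."""
    n = len(D)
    pool = [D[i][j] for i in range(n) for j in range(n) if i != j]
    sims = []
    for _ in range(n):
        sample = [rng.choice(pool) for _ in range(n)]
        sims.append(sorted(sample)[k - 1])
    return sims


# Toy dataset: five genes in a dense region, three isolated genes.
profiles = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
            (5.0, 5.0), (-4.0, 6.0), (7.0, -3.0)]
D = distance_matrix(profiles)
obs = observed_dknn(D, k=3)
sims = simulated_dknn(D, k=3, rng=random.Random(0))
print(obs)
print(sims)
```

In this toy case the five dense genes have observed DKNN values far below those of the isolated genes. Comparing the observed distribution with the simulated one is what allows a dataset-dependent critical DKNN value to be chosen.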
97. e, Marseille, France, 3 Institut de Mathématiques de Luminy, Campus de Luminy, Marseille, France, 4 ESIL, Université de Provence et de la Méditerranée, Marseille, France, 5 Service d'Anesthésie et de Réanimation, hôpital Nord, Assistance Publique - Hôpitaux de Marseille, Marseille, France. Abstract. Background: As public microarray repositories are constantly growing, we are facing the challenge of designing strategies to provide productive access to the available data. Methodology: We used a modified version of the Markov clustering algorithm to systematically extract clusters of co-regulated genes from hundreds of microarray datasets stored in the Gene Expression Omnibus database (n = 1,484). This approach led to the definition of 18,250 transcriptional signatures (TS) that were tested for functional enrichment using the DAVID knowledgebase. Over-representation of functional terms was found in a large proportion of these TS (84%). We developed a JAVA application, TBrowser, that comes with an open plug-in architecture and whose interface implements a highly sophisticated search engine supporting several Boolean operators (http://tagc.univ-mrs.fr/tbrowser/). Users can search and analyze TS containing a list of identifiers (gene symbols or AffyIDs) or associated with a set of functional terms. Conclusions/Significance: As proof of principle, TBrowser was used to define breast cancer cell specific genes and to detect chromosomal abnormalities in tumor
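A Boolean search over signatures can be illustrated with a toy Python evaluator. The actual query grammar of TBrowser is not given in this excerpt; the sketch below assumes only `&` (AND) and `|` (OR) without parentheses, with `&` binding tighter, and the signature identifiers and GENE_A/GENE_B symbols are hypothetical.

```python
def matches(query, genes):
    """Evaluate a query such as "A & B | C" as (A and B) or C."""
    return any(
        all(term.strip() in genes for term in group.split("&"))
        for group in query.split("|")
    )


def search(query, signatures):
    """Return the sorted identifiers of the signatures matching the query."""
    return sorted(ts for ts, genes in signatures.items() if matches(query, genes))


# Hypothetical transcriptional signatures (identifier -> gene symbols).
signatures = {
    "TS_1": {"XBP1", "ESR1", "GATA3", "GENE_A"},
    "TS_2": {"XBP1", "ESR1"},
    "TS_3": {"GATA3", "GENE_B"},
    "TS_4": {"ESR1", "GATA3", "XBP1"},
}
and_hits = search("XBP1 & ESR1 & GATA3", signatures)
or_hits = search("XBP1 & ESR1 | GENE_B", signatures)
print(and_hits)
print(or_hits)
```

Representing each signature as a set makes the AND operator a simple membership test per term; a real engine backed by a database would more likely translate the query into joins or indexed lookups.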
98. [Figure 4, panel C: lists of representative genes for each cluster.] Figure 4. The transcriptional MAP associated with GPL96-related experiments. (A) Low-resolution image made of 22,215 probes from the GPL96 platform as rows and 3,114 GPL96-specific TS as columns. Red indicates the presence of a gene in the corresponding TS (default: black). (B) Zooms of the corresponding areas showing some immune system-related meta-signatures. (C) Representative genes that fall into these clusters. doi:10.1371/journal.pone.0004001.g004. associated to multiple underlying pathways whose components and limits are unclear. Our difficulty in depicting comprehensive maps for pathways is illustrated by existing discrepancies, for instance between those proposed by BioCarta, KEGG and Ge
99. in emulsion of DNA bound to beads, followed by pyrosequencing [Margulies et al. 2005, Rothberg & Leamon 2008] in a picotiter plate, allowing the parallel reading of millions of DNA fragments to be sequenced (Figure 1.8). The most powerful model currently marketed is the GS FLX Titanium. The libraries, made of single-stranded DNA fragments to which the adapters are attached, are brought into contact with magnetic beads carrying thousands of copies of the sequence complementary to adapter 1. An emulsion with a limiting concentration of DNA makes it possible to bind a single DNA fragment per bead, which is subsequently amplified by PCR. Once the fragments are present as multiple monoclonal copies on the beads, the beads are placed in mini-reactors, the picotiter plates. These plates allow each bead, and therefore each sequence, to be read independently by pyrosequencing [Rougemont et al. 2008, Droege & Hill 2008]. During pyrosequencing, nucleotides are added successively, unlike usual sequencing reactions in which nucleotides are added simultaneously, each being labeled with a different fluorochrome. If the nucleotide present in the reaction medium is the one expected by the DNA polymerase, it is incorporated into the DNA strand being synthesized, releasing
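The successive addition of nucleotides can be pictured with a toy flowgram simulation in Python. This is a simplified sketch under stated assumptions: the cyclic flow order "TACG", the idealized integer signal equal to the number of bases incorporated in a flow, and the function names are all illustrative choices, not details taken from the excerpt.

```python
def flowgram(read, flow_order="TACG", max_flows=100):
    """Simulate the flows needed to synthesize `read`.

    Each flow offers one nucleotide; the idealized signal is the number
    of consecutive bases incorporated (0 if the base is not the expected one).
    """
    signals = []
    pos = 0
    flow = 0
    # max_flows guards against symbols that never appear in flow_order.
    while pos < len(read) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


def decode(signals):
    """Rebuild the sequence from (base, signal) pairs."""
    return "".join(base * run for base, run in signals)


signals = flowgram("TTAGC")
print(signals)
print(decode(signals))
```

Flows with a zero signal correspond to nucleotides that were offered but not incorporated, which is why the read is recovered from the order of the flows rather than from four simultaneous color channels.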
100. 4.1.2 MySQL databases; 4.1.3 Database optimizations; 4.2 Meta-analysis and data integration; 4.2.1 Biological databases; 4.2.2 Databases dedicated to DNA microarray data; 4.2.3 Data structure in Gene Expression Omnibus (GEO); 4.2.4 Re-analyses and meta-analyses of datasets from GEO; 4.3 [title unreadable]; 4.4 Development of the application; ARTICLE 4: TRANSCRIPTOMEBROWSER: A POWERFUL AND FLEXIBLE TOOLBOX TO EXPLORE PRODUCTIVELY THE TRANSCRIPTIONAL LANDSCAPE OF THE GENE EXPRESSION OMNIBUS DATABASE; 4.5 Database update and data integration; 4.5.1 Restructuring of the database; 4.5.2 Integration of new data; 4.6 Development of new features; 4.6.1 New query modes; 4.6.2 Improved and new plugins; ARTICLE 5: TRANSCRIPTOMEBROWSER 3.0: INTRODUCING A NEW INTERACTION DATABASE AND A NEW VISUALIZATION TOOL FOR THE STUDY OF GENE REG
101. then obtained using primers specific to the adapters. This amplification builds, by bridging, a group (cluster) of amplified sequences. Sequencing by synthesis (Sequencing By Synthesis, or SBS; Figure 1.8) of these clusters reads the incorporation of a base-specific fluorochrome at each cycle by taking a very-high-resolution image of the slide. Once the clusters have been located, the DNA sequence of each of them can be reconstructed. Illumina currently sells several sequencer models that share the same sequencing chemistry but offer ever more competitive sequencing characteristics (capacity and configuration). Because most very-high-throughput sequencing data come from Illumina technology (980 sequencers out of 1,670), a large number of data analysis programs were initially developed for this technology (Kircher et al., 2011; Goldfeder et al., 2011; Kircher et al., 2009). Principle of the SOLiD chemistry from Life Technologies. The SOLiD technology (Sequencing by Oligonucleotide Ligation and Detection), developed by Life Technologies, is based on emulsion PCR amplification, in the same way as the model sold by Roche. The b
102. eosinophil; EpC, epithelial cell; Mac, macrophage; Mo, monocyte; PMN neutro, polymorphonuclear neutrophil; RAGE, receptor for advanced glycation end products. (a) Percentage of variance associated to disease phenotype. (b) Danger-associated molecular pattern (DAMP) activity. doi:10.1371/journal.pone.0011671.t004 PLA2G4A, which is the initial rate-limiting enzyme that cleaves membrane phospholipids [78], is over-expressed. Similarly, most downstream key enzymes from the COX-2 and 5-LOX sub-pathways involved in the final synthesis, conversion and transport of inflammatory eicosanoid lipid mediators are over-expressed. In particular, the transcript encoding the inducible microsomal prostaglandin E synthase (PTGES), which catalyzes the conversion of prostaglandin PGH2 to PGE2 in the COX-2 sub-pathway and is thought to play a pathogenic role in a number of inflammatory processes [88], is significantly increased and has the highest statistical association with the disease phenotype (62% of gene variance explained by the disease phenotype according to multi-way ANOVA). Conversely, the PTGDS transcript, which encodes the anti-inflammatory prostaglandin D2 synthase, has decreased abundance, a finding already reported in metabolic inflammatory processes [89]. Increased abundance of the transcript encoding the LTA4H enzyme, which converts the leukotriene LTA4 to LTB4, reflects the activation of the 5-LOX sub-pathway. Finally, transcripts encoding the oxidative enzy
103. es. Including them in the database would have made it heavier and therefore degraded its performance. However, all signatures were re-extracted using the R library RTools4TB, which I developed and which is presented in detail further on. TBrowser was modified accordingly, both to take the new database into account and to include in this database about a hundred stored procedures that were absent from the first version and that simplify the retrieval of query results in the Java application. A single call to a stored procedure is now enough where several SQL queries used to be necessary. In addition, the use of a locally installed database containing experiment [Figure 4.3: Growth in the number of samples available in Gene Expression Omnibus from 2000 to 2010. Adapted from Barrett et al. (2005).]
104. It should be noted, however, that other formats are also used, such as VCF (Variant Call Format) for genomic variants (SNPs, insertions and deletions). This format, developed by the 1000 Genomes project, is used notably by SAMtools and GATK. 5.3.3 Alignment to the reference genome. ChIP-seq is a re-sequencing technique. A reference (genome, transcripts) against which the sequenced fragments are compared by alignment is therefore indispensable, and the technique is limited to organisms whose genome has already been sequenced. Fragments are aligned to the reference, usually a genome, by base complementarity. The genome version used matters a great deal for the rest of the analysis and must be chosen according to the needs (available annotations, assembly quality, tools developed). Indeed, between the last two versions of the human genome, hg18 (March 2006) and hg19 (February 2009), the main differences are size and annotations. Moreover, many tools, even recent ones, still use the hg18 version of the genome because it has more annotations than hg19, which is nevertheless more complete. Tools for converting positions between the two genome versions have been created, such as liftover from
105. et al., 2007)
Table 1.4: The main applications of very-high-throughput sequencing. The application developed in more detail in chapter 5 of this manuscript appears in bold.
Field | Application | Description
Transcriptome | RNA-seq | Also called Whole Transcriptome Shotgun Sequencing (WTSS); sequences all RNAs to study the transcriptome, discover new genes, and study splice sites and alternative splicing (Hong et al., 2011; Clark et al., 2011; Bainbridge et al., 2011)
Transcriptome | SAGE-seq | Serial Analysis of Gene Expression (SAGE) or Digital Gene Expression (DGE), for relative counting of transcripts
Transcriptome | sRNA-seq | Identification of small RNAs: miRNA, lincRNA, snRNA, ncRNA (Gommans & Berezikov, 2012)
Transcriptome | GRO-seq | Global run-on sequencing (Core et al., 2008)
Genomics | De novo seq | Sequencing of a genome, metagenomics
Genomics | Re-seq | Re-sequencing of genomes to identify inter-individual variations: SNPs, indels (insertions, deletions), large duplications and deletions, inversions, translocations, ploidy anomalies, etc. (Pareek et al., 2011)
Genomics | Target-seq | Targeted sequencing (e.g. a gene, a chromosomal region of more than 20 Mb, or the complete exome) of a collection of samples, starting from DNA microarrays, to detect polymorphisms (SNPs, indels)
106. example the MySQL DATE function; triggers, which monitor the database and fire automatic queries, such as filling tables when data are inserted. 4.2 Meta-analysis and data integration. 4.2.1 Biological databases. To store and organize biological data at different levels, many databases have been set up, such as databases of sequences (GenBank, EMBL Nucleotide Sequence Database, DNA Data Bank of Japan (DDBJ), Eukaryotic Promoter Database (EPD)), proteins (UniProt, Protein Data Bank (PDB), InterPro, European Bioinformatics Institute (EBI)), specialized genomic resources (Saccharomyces Genome Database (SGD), FlyBase, WormBase, The Arabidopsis Information Resource, Zebrafish Information Network), transcription factors (TRANSFAC, JASPAR), genetic polymorphisms (dbSNP, HapMap) and signaling pathways (KEGG, REACTOME, Panther, NCBICP). All these databases are interconnected through the use of a unique identifier to characterize a sequence, a gene, a transcript or a protein, as is the case on the National Center for Biotechnology Information (NCBI) site. These databases can be accessed via websites (http or ftp protocol), APIs, R libraries
107. expression and epigenetics. Cardiovascular Research, 90(3), 430-40. [Kauffmann & Huber 2008] Kauffmann A. & Huber W. (2008). arrayQualityMetrics: Quality metrics on microarray data sets. R package version 2.4.3. [Kauffmann et al. 2009] Kauffmann A., Rayner T.F., Parkinson H., Kapushesky M., Lukk M., Brazma A. & Huber W. (2009). Importing ArrayExpress datasets into R/Bioconductor. Bioinformatics (Oxford, England), 25(16), 2092-4. [Kellum & Schedl 1992] Kellum R. & Schedl P. (1992). A group of scs elements function as domain boundaries in an enhancer-blocking assay. Molecular and Cellular Biology, 12(5), 2424-31. [Keren et al. 2010] Keren H., Donyo M., Zeevi D., Maayan C., Pupko T. & Ast G. (2010). Phosphatidylserine increases IKBKAP levels in familial dysautonomia cells. PLoS ONE, 5(12), e15884. [Khalil et al. 2009] Khalil A.M., Guttman M., Huarte M., Garber M., Raj A., Rivea Morales D., Thomas K., Presser A., Bernstein B.E., van Oudenaarden A., Regev A., Lander E.S. & Rinn J.L. (2009). Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proceedings of the National Academy of Sciences of the United States of America, 106(28), 11667-72. [Kharchenko et al. 2008] Kharchenko P.V., Tolstorukov M.Y. & Park P.J. (2008). Design and analysis of ChIP-seq experiments for DNA-binding protein
108. expression pattern characterized in particular by a higher abundance of transcripts encoding the key scavenger receptors of modified low-density lipoproteins (OLR1, CD36 and MSR1) but a decreased abundance of transcripts encoding critical cholesterol transporters such as NPC1 [66] or ABCA1. July 2010 | Volume 5 | Issue 7 | e11671
Table 2. T lymphocyte- and NK cell-related genes present in the DSS gene signature. (Molecular Mechanisms of DSS)
Function | Genes | P-value | Var (%)
Th1 differentiation | RUNX3, STAT4, TBX21 | <0.00001 to 0.00242 | 25 to 42
Th2 differentiation | GATA3, STAT5A | 0.00003 to 0.00225 | 17 to 32
Cytotoxic T lymphocyte functions | CTSW, PRF1 | 0.00005 to 0.00231 | 21 to 33
T lymphocyte activation | IL2RB, IL2RG | 0.00014 to 0.00039 | 29 to 35
Cooperation with antigen-presenting cells | CD40LG | 0.00105 | 21
Recruitment and interaction of T lymphocytes with endothelium | ITGAL, XCL1, XCL2 | <0.00001 to 0.00214 | 20 to 33
Inhibitory NK cell receptors | KLRD1 | 0.00001 | 31
Activating NK cell receptors | NCR1, NCR3, CD160 | <0.00001 to 0.00069 | 28 to 39
Cytotoxic molecules | GZMM | <0.00001 | 32
Receptors for NK cells homing to peripheral tissues | S1PR5 | <0.00001 | 48
Differentiation factors of NK cells | FLT3LG, IL15, IL17C, KITLG | 0.00088 to 0.00774 | 13 to 21
Suppression of T lymphocytes and NK cells response | PTGES, VSIG4 | <0.00001 | 60 to 63
NFKB-related genes | IRAK3, TNIK, RELA, NFKBIB, TRAF1, TRAF2, TRA | 0.00001 to 0.00506 | 9 to 31
109. free medium. It is known that in sphere conditions cells can form a niche, prevent differentiation and ensure self-renewal. The cell populations contained in hOE-MSC-derived spheres are not well known. Some reports indicate that they include a heterogeneous mixture of stem cells and neuroglial progenitors [29,43,44,47,58]. However, immunostainings of nestin and βIII-tubulin show no significant differences. Interestingly, PCR analysis demonstrated that spheres express higher levels of WT IKBKAP transcript compared to hOE-MSCs in serum, and express very low levels of MU transcript. However, when the cells were transferred back to culture conditions with serum, the enhanced IKBKAP exon 20 inclusion was not maintained (Figure 8G and H). FD hOE-MSCs that were cultured in serum-free conditions without forming spheres did not exhibit significant changes in IKBKAP isoforms, suggesting that there is a subpopulation of cells within the spheres that can promote IKBKAP exon 20 inclusion. These results indicate that when FD cells are turned back into a more primitive developmental stage, IKBKAP aberrant splicing is corrected, as was described during the fibroblast-to-iPS cell reprogramming process [22]. Accordingly, commitment into a more differentiated neuronal state would alter IKBKAP exon 20 inclusion. Therefore, we differentiated FD hOE-MSCs using a previously established protocol, which included retinoic acid (RA), forskolin (FN) and Sonic hedgehog (S
110. human breast tissue. Summary: bulk breast tumor RNA from patient. Abstract: Sporadic basal-like cancers (BLC) are a distinct class of human breast cancers that are phenotypically similar to BRCA1-associated cancers. Like BRCA1-deficient tumors, most BLC lack markers of a normal inactive X chromosome (Xi). Duplication of the active X chromosome and loss of Xi characterized almost half of BLC cases tested. Others contained biparental but nonheterochromatinized X chromosomes or gains of X chromosomal DNA. These abnormalities did not lead to a global increase in X chromosome transcription but were associated with overexpression of a small subset of X chromosomal genes. Other, equally aneuploid, but non-BLC rarely displayed these X chromosome abnormalities. These results suggest that X chromosome abnormalities contribute to the pathogenesis of BLC, both inherited and sporadic. Total: 62 samples, including 43 tumor, 7 normal breast and 12 normal organelle. The samples that were used are the following:
    > sampleInfo <- getTBInfo(field = "samples", value = "3DE64836D")
    > head(sampleInfo[, 1:2])
       sampleID      Title
    1 GSM194397 Basal T118
    2 GSM194398 Basal T134
    3 GSM194399 Basal T140
    4 GSM194400 Basal T141
    5 GSM194401 Basal T146
    6 GSM194402 Basal T147
Using the getExpressionMatrix function, the expression matrix for signature 3DE64836D can be fetched in order to visualize the expression profile of
111. identified as the scaffold protein required to assemble a well-conserved six-protein complex (ELP1-6), also called the holo-Elongator complex [Hawkes et al., 2002], which is recruited to the transcribed regions of some human genes essentially involved in actin cytoskeleton regulation and cell motility/migration. Subsequently, IKAP/hElongator was also shown to have functions in cell migration [Close et al., 2006; Creppe et al., 2009], acetylation of microtubules and neuronal development [Solinger et al., 2010]. It was also proposed to play a role in exocytosis [Rahl et al., 2005] and zygotic paternal genome demethylation [Okada et al., 2010], but most likely as a result of tRNA modifications [Chen et al., 2009a; Esberg et al., 2006; Huang et al., 2005; Li et al., 2009]. Several studies aimed at investigating transcriptional alterations revealed distinct patterns of gene expression in FD. Indeed, a subgroup of genes associated with cell migration and actin cytoskeleton was shown to be downregulated in IKAP/hElp1-deficient HeLa and [Page footer: key words (partial): splicing; transcriptome analysis. Additional Supporting Information may be found in the online version of this article. These authors contributed equally to this work. Correspondence to: El Chérif Ibrahim, 51 Bd Pierre Dramard, 13344 Marseille Cedex 15, France. E-mail: el cherif ibrahim univmed fr. Contract grant sponsor: Association Française de Recherche contre les Myopathies (AFM). 2011 WILEY PERIODICALS]
112. in microbiology and immunology, 349, 61-72. [Ng et al. 2005] Ng P., Wei C.L., Sung W.K., Chiu K.P., Lipovich L., Ang C.C., Gupta S., Shahab A., Ridwan A., Wong C.H., Liu E.T. & Ruan Y. (2005). Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nature Methods, 2(2), 105-11. [Ørom et al. 2010] Ørom U.A., Derrien T., Beringer M., Gumireddy K., Gardini A., Bussotti G., Lai F., Zytnicki M., Notredame C., Huang Q., Guigo R. & Shiekhattar R. (2010). Long noncoding RNAs with enhancer-like function in human cells. Cell, 143(1), 46-58. [Obayashi & Kinoshita 2011] Obayashi T. & Kinoshita K. (2011). COXPRESdb: a database to compare gene coexpression in seven model animals. Nucleic Acids Research, 39(Database issue), D1016-22. [Oberthuer et al. 2010] Oberthuer A., Juraeva D., Li L., Kahlert Y., Westermann F., Eils R., Berthold F., Shi L., Wolfinger R.D., Fischer M. & Brors B. (2010). Comparison of performance of one-color and two-color gene-expression analyses in predicting clinical endpoints of neuroblastoma patients. The Pharmacogenomics Journal, 10(4), 258-66. [O'Geen et al. 2011] O'Geen H., Echipare L. & Farnham P.J. (2011). Using ChIP-Seq Technology to Generate High-Resolution Profiles of Histone Modifications. Methods in Molecular Biology (Clifton, N.J.), 791, 265-86. [Okada et al. 2005] Ok
113. ine. In addition, these tags have better specificity, especially biotin interacting with streptavidin. The drawback remains that this modification of the proteins can alter the biological system under study. With these techniques, the presence of artifacts is a concern, so it is essential to run appropriate controls. The internal controls of the experiment can be varied: input, non-specific antibody (immunoglobulin of the same isotype), ChIP on another tissue, technical or biological replicates using other antibodies specific to the protein. However, the high cost of sequencing a sample limits their use. There is one last crucial control before moving on to sequencing: checking that the ChIP produced sufficient enrichment (minimum enrichment of 20). This check can be done provided a target gene or binding site of the transcription factor is known. By comparing the qPCR results of the immunoprecipitated sample and of the control, the enrichment in sequences of interest can be established. 5.1.4 Advantages and drawbacks. ChIP-on-chip and ChIP-seq are currently the two technologies used to study the binding of proteins to DNA. However, ChIP-seq is supplanting ChIP-on-chip in
114. infections were classified at admission as classical dengue fever (DF), dengue hemorrhagic fever (DHF) or dengue shock syndrome (DSS) based on the 1997 WHO criteria [29]. Clinical and biological follow-up was done daily for each hospitalised patient. DSS patients were admitted to the hospital intensive care unit, where they received appropriate fluid resuscitation and were monitored for vital parameters. Children who required blood transfusion were not included in the study. To increase the probability of identifying gene signatures specific to DSS, we chose to include only symptomatic dengue-infected children (classified DF, DHF and DSS) and no healthy or non-dengue children in the present study. The rationale is that comparing DF, DHF and DSS patients together should improve the probability of identifying a DSS-specific gene signature, whereas including an external non-dengue control group should increase the probability of identifying a general dengue-related signature but should be less powerful at identifying a signature of severe dengue disease. The whole blood samples of DF, DHF and DSS patients selected for the present study corresponded to comparable durations of illness: all were collected within a window of 3 to 7 days after onset of fever, the onset of fever being considered day 0. For most DSS patients, this generally corresponded to the day of cardiovascular decompensation (shock) or the day after, except for 3 (PLO17, PLO3
115. elements are partitioned into a fixed number k of clusters, where k is specified by the user. First, all genes are assigned at random to one of the k clusters, then a mean expression vector is computed for [Section 3.2, Unsupervised classification methods. Figure residue: columns of probe identifiers (SONDE_1 to SONDE_50) illustrating the partition of probes into groups; caption not recoverable.]
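The excerpt above stops after the first two steps (random assignment, then one mean expression vector per cluster). A minimal Python sketch of the procedure follows, assuming the remaining steps are those of standard k-means (reassign each gene to its nearest mean vector and repeat until the assignment is stable). The function names, the squared Euclidean distance and the handling of empty clusters are illustrative choices, not taken from the source.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, max_iter=100, seed=0):
    """Partition gene profiles into k clusters.

    Follows the description in the text for the first two steps; the
    reassignment loop is the standard k-means continuation (assumption).
    """
    rng = random.Random(seed)
    # Step 1: every gene is assigned at random to one of the k clusters.
    labels = [rng.randrange(k) for _ in profiles]
    centroids = []
    for _ in range(max_iter):
        # Step 2: compute the mean expression vector of each cluster.
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if not members:
                # Empty cluster: reseed from a random profile (assumption).
                members = [rng.choice(profiles)]
            centroids.append([sum(col) / len(members) for col in zip(*members)])
        # Step 3 (assumed): move each gene to its nearest mean vector.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignment is stable
        labels = new_labels
    return labels, centroids


if __name__ == "__main__":
    toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    print(kmeans(toy, k=2)[0])
```

Because the starting assignment is random, different seeds can yield different label numbering, and on real data different partitions; running the sketch with several seeds is a cheap way to check stability.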
116. language. This script supports different metrics for distance calculation: Euclidean distance, Pearson's correlation coefficient-based distance, and Spearman's rank correlation-based distance. Data normalization and processing. Given the huge amount of data processed by GEO curators, it is impractical to determine the quality and efficiency of the normalization methods used [27]. Although seriesMatrix files should ideally contain log-transformed data, expression matrices in linear scale were also observed in several cases. To circumvent this problem, each column of the expression matrix was rank-transformed using R software. This normalization procedure is insensitive to data distribution and provided us with a standard input for the DBF-MCL algorithm. In the case of microarray data, DBF-MCL was run using Spearman's rank correlation-based distance (1 - r). However, although rank-based methods are well suited for normalization and distance calculation purposes, they are not appropriate to display gene expression profiles. To this end, a normal score transformation was applied to each column of the datasets after DBF-MCL classification. The transformation ensures that, whatever the data, a standard format is available for heatmap visualization. Finally, for each experiment this dataset was used (1) to classify samples using hierarchical clustering and (2) to build the expression matrix for the corresponding TS. Data storage. Expression matrix for each T
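The three transformations named above (per-column rank transform, Spearman distance 1 - r, per-column normal scores) can be sketched in a few lines of Python. This is an illustration only: the source states that R was used, and the helper names, the average-rank treatment of ties and the r / (n + 1) normal-score formula are assumptions made for the sketch.

```python
from statistics import NormalDist, mean


def rank_transform(values):
    """Return 1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        average = (start + end) / 2 + 1
        for j in range(start, end + 1):
            ranks[order[j]] = average
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman_distance(x, y):
    """Spearman-based distance 1 - r between two gene profiles."""
    return 1.0 - pearson(rank_transform(x), rank_transform(y))


def normal_scores(values):
    """Map values to normal quantiles of rank / (n + 1) (assumed formula)."""
    n = len(values)
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf(r / (n + 1)) for r in rank_transform(values)]


def transform_columns(matrix, func):
    """Apply func to each column of a row-major matrix (rows are genes)."""
    columns = [func(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    expr = [[10.0, 1.0], [30.0, 2.0], [20.0, 3.0]]
    print(transform_columns(expr, rank_transform))
```

Ranking each column first makes the input independent of whether a series was stored in log or linear scale, which is the motivation given in the text; the normal-score step is then only used for display.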
117. the sequencing mode (fragment or paired-end), whether the input or another control is taken into account, or the type of experiment (transcription factor or histone modification). Once these analyses are complete, the sequence alignments, the peak positions and the distribution of reads over the genome can be visualized and interpreted. Starting from potential transcription factor targets, it is possible (1) to check for the presence of binding motifs of the transcription factor under the peaks, (2) to define a consensus motif in sequences extracted from the sequences under the peaks, (3) to study the functions of the target genes and (4) to locate peaks within the gene (intron, exon, UTR, interg [Section 5.3, ChIP-seq data analysis. Figure residue: rotated screenshot text of the analysis pipeline; not recoverable.]
118. the sequencing of targeted regions (target-seq). They require no treatment of the cells: genomic DNA is simply extracted and sequenced. The de novo sequence of a genome can be obtained, with sufficient coverage, much faster and more cheaply than with earlier technologies. Bacterial cloning and other experiments to amplify the DNA fragments are no longer needed; all DNA fragments are now sequenced. The Roche technology is preferable here because it generates long fragments that make it easier to reconstruct the genomic sequence through scaffolds. Since a genome generally contains repeated sequences, these would be difficult to place correctly in the genome without this type of sequencing. There are also hybrid approaches that use the Roche technology to generate scaffolds and that of Illumina or Life Technologies to improve coverage. In addition, recent approaches combine genomic and transcriptomic data to reconstruct the genome with the help of cRNA sequences obtained by HTS. For genomes whose sequence is known, the technique applied is re-seq, that is, partial or whole re-sequencing of the genome. This approach
119. the quality of Agilent(TM) DNA microarrays and to normalize the data generated. But they had two drawbacks: they loaded the data very slowly and were not specific to the analysis of Agilent-type DNA microarrays, which made it necessary to develop our own R library, AgiND. To remedy this, bioinformaticians had to develop dedicated R libraries either to estimate the quality of the DNA microarrays of an experiment or to normalize these data (Table 2.1). These R libraries have been made available on the CRAN or Bioconductor sites: arrayQuality [Paquet & Yang, 2008], Agi4x44PreProcess [Lopez-Romero, 2008], arrayQualityMetrics [Kauffmann & Huber, 2008], agilp [Chain et al., 2010], BABAR [Alston et al., 2010], GOULPHAR (pipeline from the ENS, Paris) [Lemoine et al., 2006], as well as plugins for Excel such as arrayTools. In parallel, the base libraries of [Chapter 2, Quality control and normalization of DNA microarray data. Figure residue (schematic): Feature Extraction files (one per sample); data extraction and S4 object creation (signals, probe annotations, background values; AgilentBatch (one-color) or AgilentBatchRG (two-color) class objects); quality control and visualization (boxplot, microarray image).]
120. lie Bergon(3), Pascal Rihet(3), Sivuth Ong(2), Patrich T Lorn(4), Norith Chroeung(4), Sina Ngeav(4), Hugues J Tolou(1), Philippe Buchy(2), Patricia Couissinier-Paris(1). 1 French Army Biomedical Research Institute (Institut de recherche biomédicale des armées, IRBA), Antenne de Marseille (IMTSSA), Unité de Virologie, Marseille, France; 2 Institut Pasteur in Cambodia, Department of Virology, Phnom Penh, Cambodia; 3 TAGC INSERM U928, Marseille, France; 4 Kampong Cham Provincial Hospital, Kampong Cham, Cambodia. Abstract. Background: Deciphering host responses contributing to dengue shock syndrome (DSS), the life-threatening form of acute viral dengue infections, is required to improve both the differential prognosis and the treatments provided to DSS patients, a challenge for clinicians. Methodology/Principal Findings: Based on a prospective study, we analyzed the genome-wide expression profiles of whole blood cells from 48 matched Cambodian children: 19 progressed to DSS while 16 and 13 presented respectively classical dengue fever (DF) or dengue hemorrhagic fever grades I/II (DHF). Using multi-way analysis of variance (ANOVA) and adjustment of p-values to control the False Discovery Rate (FDR < 10%), we identified a signature of 2959 genes differentiating DSS patients from both DF and DHF, and showed a strong association of this DSS gene signature with the dengue disease phenotype. Using a combined approach to analyse the molecular patterns associated with th
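The abstract above reports p-values adjusted to control the FDR below 10% without naming the adjustment method. As an illustration only, the Python sketch below applies the Benjamini-Hochberg step-up adjustment, which is one common choice; that this is the procedure the authors used is an assumption, and the function names and toy p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_at_fdr(pvalues, threshold=0.10):
    """Indices of tests whose adjusted p-value is below the threshold."""
    return [i for i, q in enumerate(bh_adjust(pvalues)) if q < threshold]


if __name__ == "__main__":
    toy_pvalues = [0.01, 0.04, 0.03, 0.20]  # made-up values
    print(bh_adjust(toy_pvalues))
    print(select_at_fdr(toy_pvalues))
```

With a 10% threshold, a gene is retained when its adjusted p-value falls below 0.10, which mirrors the "FDR < 10%" criterion quoted in the abstract.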
121. maps to each canonical pathway. doi:10.1371/journal.pone.0011671.g002 under-expressed in DSS patients compared to DF and DHF counterparts (Table 2, non-exhaustive list; individual p-values available in Table S2 and Figure 3). Those genes are critical to a number of T and NK cell functions, including T and NK cell differentiation, receptor signaling, activation and proliferation, cytotoxic functions or recruitment of lymphocytes to peripheral tissues. Since lymphocyte counts did not differ between the DF, DHF and DSS children (p = 0.428, Kruskal-Wallis test), we searched whether genes encoding factors that negatively regulate T and NK functions were over-expressed in the DSS gene signature. We identified that the two genes having the strongest association with the disease phenotype variable encode two major immunomodulatory [...] encode a diversity of molecules: MMP-9, a matrix metalloprotease with a key role in tissue remodeling and a candidate for dengue plasma leakage [47]; the extracellular matrix molecules fibronectin, versican and collagens; the angiogenin and VEGF [17] endothelial agonists; as well as the arginase 1 repair enzyme, which competes with the endothelial NOS (NOS3) for L-arginine bioavailability [48]. Thus, DSS children's whole blood cells have a global decreased abundance of T and NK cell-related transcripts but an increased abundance of anti-inflammatory and repair/remodeling transcripts at the time of cardiovascular decompensation.
122. a particular file format. The main steps of ChIP-seq data analysis are presented here, with emphasis on data from the SOLiD technology, whose analysis was part of my thesis work. Whatever the HTS technique used, data analysis consists of three successive steps. Primary analysis corresponds to the acquisition of the sequencing images (4 per ligation, since there are 4 colors), from which two files are produced: one containing the sequence of each bead in color code (color call) and the other the quality (quality metric) of each ligation per bead. This step is the only one performed exclusively by the compute cluster physically attached to the sequencer (online cluster). Secondary analysis is the alignment of these color-coded sequences to the genome or reference sequence. It depends on the technology used (SOLiD(TM) data are not in the same format as those of the other technologies) and also on the sequencing mode (fragment, paired-end). It is sometimes necessary to convert the data format in order to use the chosen tool. Tertiary analysis depends on the HTS technique and corresponds to the specific analysis of the data, such as peak finding for ChIP-seq. It depends on various factors such as
123. modulate splicing remains poorly understood [36,57], it is unlikely that kinetin acts directly on the general transcription machinery, as the level of IKBKAP transcripts was not significantly modulated by kinetin in control hOE-MSCs (Figure 7G). This effect of kinetin has also been previously observed in control iPS cells [22]. Time-course experiments of kinetin treatment revealed that the drug acts quite rapidly on correcting IKBKAP mRNA splicing and enhancing IKAP/hELP1 synthesis, but its effects last only a short time after removal (Figure 7E and F). This information provides new perspectives for the strategy of kinetin delivery to FD patients. First, kinetin as an FD treatment would potentially decrease the deleterious consequences of the mutation at the protein level. In addition, drug efficacy may be achieved if adequate levels of kinetin are maintained over a long period of time. However, as observed for FD iPS cells [22], kinetin did not improve cell migration in FD hOE-MSCs (Figure 5), suggesting incomplete phenotype complementation. [PLoS ONE | www.plosone.org; running title: OE-MSCs as a Model for FD] Using the hOE-MSCs model, we were also able to modulate the expression of IKBKAP WT and MU transcripts by exposing the cells to different culture conditions to simulate variations in alternative splicing occurring during development and differentiation. hOE-MSCs form free-floating spheres in approximately 7 days when cultured with EGF and bFGF in serum
124. normalization. Statistical Analysis. For each comparison (spheres vs rafnshh, controls vs FD, control spheres vs FD spheres, and FD rafnshh vs FD rafnshh treated with kinetin), measurement of differential gene expression was obtained using the MultiExperiment Viewer (MeV) program. Significance Analysis of Microarrays (SAM, version 1.13, Stanford University) and Student's t test were applied to determine fold changes (FC) and P values (P), respectively. The data were analyzed using a two-class unpaired response type, which compared control versus FD samples as well as untreated versus kinetin-treated FD samples. To construct dendrograms, average linkage approximate hierarchical clustering of genes was performed using Pearson correlation with Cluster (Eisen et al., 1998) and visualized with Treeview software (http://jtreeview.sourceforge.net). For each comparison of samples, the statistically relevant signaling pathways corresponding to the differentially expressed genes were identified using DAVID (Database for Annotation, Visualization and Integrated Discovery; http://david.abcc.ncifcrf.gov; Huang et al., 2009) with high classification stringency (P < 0.05 and FDR < 20). Results. IKBKAP Splice Variants Ratio is Affected by Culture Conditions and Kinetin in FD hOE-MSCs. To observe the variation in IKBKAP mRNA alternative splicing, four control and four FD hOE-MSC cultures
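The two quantities named above, fold change and Student's t statistic for a two-class unpaired comparison, can be sketched in a few lines. This is only an illustration under stated assumptions: it does not reproduce MeV or SAM, the function names are hypothetical, and the expression values are invented for the example.

```python
import math
from statistics import mean, variance


def fold_change(control, treated):
    """Ratio of mean expression, treated over control (linear scale)."""
    return mean(treated) / mean(control)


def student_t(a, b):
    """Unpaired two-sample Student's t statistic with pooled variance.

    Returns the statistic and its degrees of freedom.
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical expression values for one gene: 4 control and 4 FD cultures.
control = [10.0, 12.0, 11.0, 13.0]
fd = [20.0, 22.0, 21.0, 25.0]

fc = fold_change(control, fd)
t, df = student_t(fd, control)
print(f"FC = {fc:.2f}, t = {t:.2f}, df = {df}")
```

A P value would then be read from the t distribution with `df` degrees of freedom; that step is left out here to keep the sketch free of external dependencies.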
125. nucleotide labelled not on the base but on the phosphate chain. The detection system records a chronological movie of these events, unlike earlier systems based on the analysis of very-high-resolution images. As for the Ion Torrent technology, it is based on semiconductor chips made up of wells (Figure 1.13; Rothberg et al., 2011). It follows the principle published in 1968 according to which a proton is released whenever a nucleotide is incorporated into DNA by the polymerase (Narurkar et al., 1968). This results in a local pH change measured by an ion-sensitive detector. This latter technology requires no camera, scanner, enzymatic cascade, fluorochrome or chemiluminescence. In this it differs from all the others, which are based on the detection of a light signal, hence the English name post-light sequencing technology. Moreover, with the race to improve sequencing technologies, the main suppliers have also developed or acquired new benchtop instruments that can sequence libraries very quickly, albeit with lower throughput (MiSeq, Illumina; Ion Torrent, acquired in October 2010 by Life Technologies; GS Junior System, Roche). 1.4.2 Analysis techniques based on HTS sequencing. The sequencing technology
126. nucleotides, i.e. the size of a nucleosome; it is obtained by digestion with Micrococcal Nuclease (MNase), which cuts between nucleosomes. The chromatin fragments are then immunoprecipitated using an antibody specific to the transcription factor under study or to a histone modification. These antibodies are coupled to magnetic beads via protein G or A (Dynabeads protein G, Life Technologies), chosen according to the serotype of the antibody used. A small amount of non-immunoprecipitated chromatin is kept; it is called the input. It is used to measure non-specific binding and therefore to correct the background noise of the sequencing data. Previously, the input was obtained by immunoprecipitating chromatin with an immunoglobulin of the same serotype as the antibody used for the ChIP. The immunoprecipitated chromatin fragments and the input are treated with proteinase K and RNase in order to reverse the effect of the crosslink. Finally, the DNA is extracted and purified to allow preparation of the sequencing libraries. 5.1.3 Biases and background noise. To limit biases and reduce background noise, it is important to take a few technical constraints into account. Indeed, the crosslinking and sonication steps require technical optimization on which the quality of the results depends. It is necessary to
127. number of samples in the experiments, which demonstrates the robustness of the filtering process. In contrast, a trend to produce more clusters in experiments containing few samples was observed. This was notably marked in experiments containing 10 to 15 samples. Such a bias is classical in data analysis. Indeed, if numerous values (i.e. samples) are used to estimate the expression profile of a given gene, outliers will have a weak impact on distance calculation and the gene will be assigned to the expected cluster. In contrast, when only a few values are available, each of them has a greater impact on distance calculation. This results in more clusters, some of them having centers close to one another. This bias is also presumably amplified by the fact that small sample sets generally contain greater biological diversity than large sample sets, as they contain fewer replicates. Overall, our analysis of GPL96-related experiments gave rise to 3,377 TS. The full analysis on the 33 Affymetrix platforms produced 18,250 TS, which correspond to 220 million expression values. Partitioning results were manually checked for a large panel of experiments. Although results seemed perfectible in a few cases, they always appeared to be rational. The TBrowser interface. Comprehensive information on samples, experiments, probes and genes was stored in a MySQL relational database. A flat file indexed on TS
128. of the TFBS predictions. One criterion to assess the reliability of our predictions is based on the hypothesis that the overall functional properties of the predicted targets can be used to infer the biological processes in which TFs are involved. To test this hypothesis, we used annotation terms obtained from the GO biological process, KEGG, PANTHER, PFAM, SMART, PROSITE and WIKIPATHWAYS databases and performed systematic annotation of all predicted target sets in the mouse [32]. For each (term, PWM) pair, we computed the Fisher's exact test p-value f. Each cell of a matrix with terms (n = 3,905) as rows and PWMs (n = 1,103) as columns was filled with a score defined as -log(f). We then searched for biclusters inside this matrix using the binary inclusion-maximal algorithm (BiMax) [33]. Given the amount of information produced by this analysis, only some meaningful results will be presented; they are summarized in Figure 1. Sites for PWMs related to the ETS (M00746, M00971, M00771, M00339, MA0136, M00658, M00678), STAT, IRF and RUNX (M00722) transcription factor families, known to contribute to pathogen responses, were significantly over-represented in genes annotated as immune system process and lymphocyte activation (Figure 1A). Sites for PWMs related to the Rel/NF-κB pathway were significantly associated with targets related to induction of apoptosis, Toll-like receptor signaling pathway and, as expected, to NF-kappaB cascade
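The scoring of one (term, PWM) cell can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a one-sided Fisher's exact test for over-representation (the upper tail of the hypergeometric distribution), and the sign and base of the logarithm are assumptions, since the snippet does not preserve them. All counts in the example are hypothetical.

```python
import math


def fisher_enrichment_p(k, n_targets, n_term, n_total):
    """One-sided Fisher's exact test p-value for over-representation.

    k         -- predicted targets of the PWM annotated with the term
    n_targets -- predicted targets of the PWM
    n_term    -- genes annotated with the term
    n_total   -- genes in the background set
    """
    upper = min(n_targets, n_term)
    # Sum the hypergeometric probabilities of observing k or more hits.
    tail = sum(
        math.comb(n_term, i) * math.comb(n_total - n_term, n_targets - i)
        for i in range(k, upper + 1)
    )
    return tail / math.comb(n_total, n_targets)


def score(p):
    """Matrix cell score, assumed here to be -log10 of the p-value."""
    return -math.log10(p)


# Hypothetical counts: 8 of 10 predicted targets carry a term that
# annotates 20 of 100 background genes.
p = fisher_enrichment_p(8, 10, 20, 100)
print(f"p = {p:.3g}, score = {score(p):.2f}")
```

Filling a terms-by-PWMs matrix then amounts to calling `fisher_enrichment_p` once per pair and storing the score.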
129. of the less abundant and shorter PCR product revealed the use of an alternative 3' ss within IKBKAP exon 2, which is shortened by 145 nt (Figure 6A, left schematic). Accordingly, the loss of the ATG start codon located within the 5' end of exon 2 can potentially induce the use of an alternative ATG start codon in exon 4, resulting in the synthesis of a putative 114 amino acid truncated IKAP/hELP1 protein (Figure 6D). December 2010, Volume 5, Issue 12, e15590. Table 1. Most dysregulated genes in FD hOE-MSCs are involved in seven over-represented cellular processes. Columns: Gene; Clone ID; FC; p-value; Biological process; Studies. Actin cytoskeleton reorganization (p = 0.000275): GSN, 246170, 1.91, 0.00124, actin cytoskeleton reorganization; MYO9B, 279085, 170, 0.00144, actin cytoskeleton remodeling; MYPN, 325601, 1.52, 0.00131, sarcomere organization through nebulin and actinin interactions; DSTN, 149199, 1.48, 0.00017, actin filament depolymerization; CORO2B, 547561, 1.44, 0.00079, neuronal actin structure reorganization; SLC9A3R2, 155467, 2.42, 0.00034, adaptor of ion channels and receptors to the actin cytoskeleton. Regulation of apoptosis (p = 0.00203): CHEK2, 1893020, 2.20, 0.00007, cell cycle arrest and apoptosis in response to DNA damage; ZMAT3, 525407, 1.78, 0.00251, positive regulation of p53-mediated apoptosis; TNFSF10, 713945, 2175, 0.00041, induction of apoptosis by activation of caspase activity; PARP3, 436086, 1.60, 0.00004,
130. olfactory epithelium Bull Exp Biol Med 144 596 601 Carmel I Tal S Vig I Ast G 2004 Comparative analysis detects dependencies among the 5 splice site positions RNA 10 828 40 Ibrahim EC Hims MM Shomron N Burge CB Slaugenhaupt SA et al 2007 Weak definition of IKBKAP exon 20 leads to aberrant splicing in familial dysautonomia Hum Mutat 28 41 53 Ule J Ule A Spencer J Williams A Hu JS et al 2005 Nova regulates brain specific splicing to shape the synapse Nat Genet 37 844 52 Beillard E Pallisgaard N van der Velden VH Bi W Dee R et al 2003 Evaluation of candidate control genes for diagnosis and residual disease detection in leukemic patients using real time quantitative reverse transcriptase polymerase chain reaction RQ PCR a Europe against cancer program Leukemia 17 2474 86 Livak KJ Schmittgen TD 2001 Analysis of relative gene expression data using real time quantitative PCR and the 2 Delta Delta C T Method Methods 25 402 8 Ballester B Ramuz O Gisselbrecht C Doucet G Loi L et al 2006 Gene expression profiling identifies molecular subgroups among nodal peripheral T cell lymphomas Oncogene 25 1560 70 Talby L Chambost H Roubaud MC N Guyen C Milili M et al 2006 The chemosensitivity to therapy of childhood early B acute lymphoblastic leukemia could be determined by the combined expression of CD34 SPI B and BCR genes Leuk Res 30 665 76 Lopez F Rougemont J Loriod
131. package version 1 6 0 Lu et al 2010 Lu F Wikramasinghe P Norseen J Tsai K Wang P Showe L Davu luri R V amp Lieberman P M 2010 Genome wide analysis of host chromosome binding sites for Epstein Barr Virus Nuclear Antigen 1 EBNA1 Virology journal 7 262 Machanick amp Bailey 2011 Machanick P amp Bailey T L 2011 MEME ChIP motif analy sis of large DNA datasets Bioinformatics Oxford England 27 12 1696 7 Maher et al 2009 Maher C A Kumar Sinha C Cao X Kalyana Sundaram S Han B Jing X Sam L Barrette T Palanisamy N amp Chinnaiyan A M 2009 Transcriptome sequencing to detect gene fusions in cancer Nature 458 7234 97 101 Mardis 2007 Mardis E R 2007 ChIP seq welcome to the new frontier Nature methods 4 8 613 4 Margulies et al 2005 Margulies M Egholm M Altman W E Attiya S Bader J S Bemben L A Berka J Braverman M S Chen Y J Chen Z Dewell S B Du L Fierro J M Gomes X V Godwin B C He W Helgesen S Ho C H Ho C H Irzyk G P Jando S C Alenquer M L I Jarvie T P Jirage K B Kim J B Knight J R Lanza J R Leamon J H Lefkowitz S M Lei M Li J Lohman K L Lu H Makhijani V B McDade K E McKenna M P Myers E W Nickerson E Nobile J R Plant R Puc B P Ronan M T Roth G T Sarkis G J Simons J F Simp
132. parameters of the analyses one wishes to run (Figure 5.8). A Postgres database is used for task management, together with the Java messaging system ActiveMQ, which makes it possible to launch several analyses at the same time. Bioscope also has a graphical interface written in Java that runs in the Tomcat container (Figure 5.7). This very convenient interface in fact only serves to produce the ini files for Bioscope and to launch it. This Bioscope interface also allows features to be added through modules or plugins, such as those of Corona Lite. [Chapter 5. Study of transcriptional regulation by HTS] Corona Lite is an open-source command-line software suite maintained by Applied Biosystems. It performs the alignment (mapping) of reads against a reference and the detection of small indels and SNPs. It is developed in Perl, Python and Java. 5.3 ChIP-seq data analysis. As with DNA microarrays some fifteen years ago, the emergence of ChIP-seq required the development of many specific analysis tools and methods. Indeed, the volume of data to analyse for each experiment requires increasingly powerful tools. These tools implement various principles and methodologies and are sometimes tied to a sequencing technology or
133. allows further condensation of the chromatin. Chromatin exists in two states: heterochromatin, in a compacted form inaccessible to enzymatic activities, and euchromatin, in a decondensed form that leaves DNA accessible to the RNA polymerase machinery. Chromatin accessibility is an excellent indicator of the binding capacity of transcription factors and of the dynamics of the nucleosomes involved in the regulation of gene expression. [Chapter 1. General introduction] [Figure 1.3 labels: covalent modifications of the N-terminal ends of histones; chromatin remodelling; enhancer; insulator; co-activator complex; TS; cis module; transcription initiation regulatory complex; promoter] Figure 1.3. Schematic representation of the regulatory regions enabling transcriptional modulation of gene expression. Adapted from Wasserman & Sandelin (2004). The modulation of this accessibility depends directly on the dynamic structure which, by integrating specific signals, plays an important role in the regulation of gene expression (Li et al., 2007). Chromatin remodelling, which enables the main cellular mechanisms, requires great plasticity controlled by biochemical modifications of its structure through epigenetic mechanisms such as
134. allow the exploitation of the data but also their generation. 4.7.1 Development of web services. Two versions of the web service were developed during my thesis in order to make accessible the data contained in our first database and then, once it was built, those of our new database of transcriptional signatures. 1. The first takes the form of a Java servlet procedure that interacts with the first version of the database. The documentation of the implemented functions is available at http tagc univ mrs fr torowser ws and can be used as follows: http tagc univ mrs fr tB TBWS servlets TBWS type field & request value, where field can be gene, probe, GSE, GPL, signature or annotation, and value is a boolean query with logical operators (and, or) or a list query with the genes separated by a delimiter. Note that a different character is used here instead of & as the logical operator, because & is used to separate the parameters of the web service request. 2. The second, more recent version is a Java web service based on SOAP/WSDL running on a Tomcat server (Apache Axis2) and accessing the data contained in the new database. This type of web service is increasingly used in bio
135. positive regulation of apoptosis, maintenance of genomic stability. Transport (p = 0.00224): ABCG5, 121977, 1.59, 0.00089, cholesterol transport in and out of the enterocytes; SLC35E1, 487960, 55, 0.00086, monosaccharide transport; SLC22A6, 36482, 1.52, 0.00050, α-ketoglutarate transmembrane transporter activity; SFXN2, 757192, 1.45, 0.00021, iron transport. Cell proliferation (p = 0.00552): APOE, 1870594, 1.68, 0.00028, cell proliferation, regulation of neurite extension; CD22, 284220, 1.58, 0.00032, B cell proliferation; CD38, 123264, 1.54, 0.00074, B cell proliferation; GBA, 757264, 1.53, 0.00012, cell proliferation, ceramide metabolic process; SERINC2, 149995, 1.84, 0.00014, cell proliferation. Regulation of cell growth and cell cycle (p = 0.0091): PMEPA1, 366599, 4.92, 0.00585, EGF receptor signaling pathway, negative regulation of cell growth; STRBP, 669157, 1.75, 0.00007, regulation of cell growth; INO80B, 323554, 1.63, 0.00104, growth induction and cell cycle arrest at the G1 phase; S100A16, 739851, 1.55, 0.00012, regulation of cell cycle progression; CDIPT, 306047, 1.49, 0.00088, regulation of cell growth. Nervous system process (p = 0.0302): LRCH1, 683580, 3.62, 0.00080, long term memory and learning, signal transduction; KCNT2, 38677, 2.06, 0.00010, synaptic transmission mediated by K+ channels; NUMBL, 1855110, 1.61, 0.00118, Notch signaling pathway inhibition, cerebral cortex morphogenesis; DULLARD, 346368, 1.53, 0.00198, nuclear organization, negative regulation of BMP signaling. Proteolys
136. pro-inflammatory gene pattern associated with the DSS gene signature is characteristic of the metabolic pro-inflammatory arachidonic acid pathway, one of the lipid metabolic pathways identified through IPA. As shown in Table 5, the gene encoding the upstream cytosolic phospholipase Table 3. Anti-inflammatory, tissue remodeling and repair genes present in the DSS gene signature. Columns: Function; Gene symbol; P-value; Var. Anti-inflammatory genes: immunoregulatory molecules, IL10, 0.00430, 20; anti-proteases, SERPINB2 SERPINB8 SERPINB10 SLPI, <0.00001 to 0.00081, 19 to 49; metalloproteinase inhibitor, TIMP1, 0.00183, 19; decoy receptor, IL1R2, 0.00077, 30; free heme scavenger molecules, CD163 HP HMOX1, <0.00001 to 0.00064, 26 to 46; complement regulatory molecules, CD55 VSIG4, <0.00001 to 0.00096, 24 to 60. Tissue remodeling and repair genes: metallopeptidase, MMP9, 0.00001, 33; extracellular matrix components, COL1A2 COL8A2 COL14A1 COL17A1 FN1 SDC1 VCAN, <0.00001 to 0.00309, 18 to 34; pro-angiogenic factors, ANG VEGFA, 0.00004 to 0.00236, 25 to 30; others, ARG1 NOS3, <0.00001 to 0.00054, 18 to 44. HUGO gene names are indicated. When genes were represented by several clones on the microarray, p-value and variance medians were calculated. Genes in regular and bold are respectively under- and over-expressed in dengue shock syndrome patients. Percentage of variance associated to disease phenotype. Danger-associated molecular pattern (DAMP)
137. products showing IKBKAP WT and MU transcripts of 3 different FD patients before and after rafnshh treatment. (H) Histograms represent the mean level of IKBKAP transcript expression normalized with ABL1 gene expression for 3 FD patients after RT-qPCR analysis; P < 0.05 using two-tailed Student's t test. FD hOE-MSCs treated for 7 days in rafnshh were fixed and stained for GFAP (I); one cell with a ramified neuritic process is magnified in (J). MAP2 expression (K and L). (M-P) Double labelling of rafnshh-treated cells with anti-β-III tubulin and anti-nestin antibodies. Scale bars represent either 50 µm (A-D, K, M-P) or 25 µm (E, F, J, L). doi:10.1371/journal.pone.0015590.g009 Surprisingly, we did not find IKBKAP as a dysregulated gene in our microarray analysis. This result is all the more intriguing since this gene is expressed at much lower levels in FD hOE-MSCs, as shown by RT-qPCR in the exon 20 region (Figure 2B). However, previous analyses using microarrays also failed to detect IKBKAP as a down-regulated gene in FD cells [22,40]. To address the question of a possible PCR artifact or lowered microarray sensitivity, and because the FD mutation is located in the middle region of the IKBKAP gene, we performed quantitative PCR at both ends of the IKBKAP gene. Unexpectedly, we identified 2 new events of alternative splicing at both extremities of the IKBKAP coding sequence (Figure 6A). However, these results are in ag
138. proportion of genes whose expression is modified are related to host innate immunity, lymphocyte functions and lipid metabolism in particular. This genome-wide expression analysis also confirms the over-expression of individual biomarkers previously associated with severe dengue, such as the acute-phase pentraxin-related protein PTX3, the pro-inflammatory IL-18 cytokine or the anti-inflammatory IL-10 cytokine (Table 2) [11,12,15], providing a more comprehensive overview of their implication in the pathophysiology of DSS. Our results differ, however, from those reported by Long et al. in a genome-wide expression profiling study comparing DSS children with uncomplicated paediatric patients [28]. This study concluded on a global benign and muted immune transcriptional response but a decreased expression of genes involved in IL-10 and IFN type I related pathways in DSS children blood cells [28]. [July 2010, Volume 5, Issue 7, e11671] Differences in study design, size of cohorts and time of blood sampling from patients in the course of dengue disease may explain these differences. Indeed, in our study two DSS children had gene expression profiles close to those of uncomplicated DF and DHF and clustered within the DF/DHF cluster. Both proved to be the children from whom blood was sampled three days after the onset of shock, while the three DSS children sampled two days after shock onset still exhibited a typical DSS gene expression pr
139. quantification of full-length exon 2 inclusion and exon 36 skipping by RT-qPCR on samples from 4 control and 4 FD hOE-MSC cultures. Similar underexpression of IKBKAP transcripts (WT + MU) was observed in FD cells compared to control cells, regardless of the exon investigated (Figure 6C). [Figure 5 plot: CTRL versus FD; conditions: serum, ITS, ITS + kinetin] Figure 5. FD hOE-MSCs demonstrate reduced migration. Cell invasion in 3 different controls and 3 different FD hOE-MSCs was studied using a Boyden chamber assay. Cells 3x10 were added to the upper chamber in serum medium, serum-free medium (ITS) or ITS supplemented with 100 µM kinetin. Cell invasion was measured after 24 h. Results are shown as the average ± SEM of the number of cells per microscopic field; P < 0.05. doi:10.1371/journal.pone.0015590.g005 Thus, these results confirmed a decreased level of IKBKAP transcripts (WT + MU) in FD cells. In addition, we tested for the stability of exon 36-containing transcripts after cycloheximide treatment. Absolute RT-qPCR analysis revealed that the exon 36 skipping/exon 36 inclusion ratio decreases when the NMD pathway is inhibited (Figure S2), suggesting that transcripts including exon 36 are degraded through NMD in FD as well as in control OE-MSCs (data not shown). Kinetin treatment corrects aberrant IKBKAP pre-mRNA splici
140. thereby releasing a pyrophosphate (PPi). This PPi enables the conversion of adenosine 5'-phosphosulfate (APS) into adenosine triphosphate (ATP) by ATP sulfurylase. This ATP molecule couples to luciferin, allowing luciferase to convert luciferin into oxyluciferin. This last reaction emits a light signal that is captured by the sequencer's scanner. Excess nucleotides in the reaction medium are then degraded by an apyrase, which allows the next base to be read (Ronaghi et al., 1998). Note that the length of the sequences read by this technique remains short (below 400 nucleotides), as enzymatic activity decreases during sequencing (Ahmadian et al., 2006; Ronaghi, 2001). Chemistry principle of the Illumina HiSeq2000. The True Seq technology marketed by Illumina relies on a glass slide divided into 8 linear lanes in which two short DNA sequences are randomly attached at high density. These two sequences are complementary to the adapter sequences. Unlike the two other technologies, which rely on emulsion PCR amplification coupled with magnetic beads, DNA fragments carrying both adapters hybridize to the slide homogeneously (Figure 1.7). Amplification is
141. regions (exon 2 and exon 36) using 4 control and 4 FD samples at the same cell passage (P7). Histograms represent the mean value of 4 samples normalized with the ABL1 gene; P < 0.001 using two-tailed Student's t test. IKAP/hELP1 truncated regions for all splicing events are represented by a schematic (D). Grey portions represent the conserved amino acids, while black portions represent new amino acids resulting from a frame shift. Putative functional domains of the protein are indicated, as well as the immunogenic region for the monoclonal antibody used in western blot and immunocytochemistry experiments. doi:10.1371/journal.pone.0015590.g006 However, quantitative analysis by RT-qPCR revealed that kinetin significantly enhances the ratio 6 h after its addition to the culture (Figure 7E). Interestingly, the time response of kinetin was maximal at 24 h during treatment, but its effect on splicing lasted more than 6 h after the drug was washed out, and WT transcript levels remained high compared to non-treated cells at least 24 h after the wash-out. Consistent results were observed for IKAP/hELP1 protein expression by western blot analysis, although a strong decrease of protein amount appeared after 24 h of wash-out (Figure 7F). Finally, we wanted to investigate kinetin activity along the IKBKAP transcript. Therefore, we compared the level of expression of IKBKAP transcripts by RT-qPCR to focus on different transcript regions for both control and FD hOE-MSC
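Normalizing a target transcript to a reference gene such as ABL1 is commonly done with the 2^-ΔΔCt relative quantification method. The sketch below illustrates that general method only: the snippet does not state which calculation was actually used, the function name is hypothetical, the Ct values are invented, and an amplification efficiency of 100% is assumed.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the 2^-ddCt method.

    Each Ct of the target gene is first normalized to the reference gene
    (dCt), then the sample is compared with the calibrator (ddCt).
    Assumes both amplicons double at every cycle.
    """
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    return 2.0 ** -(d_ct_sample - d_ct_calibrator)


# Hypothetical Ct values: target gene in an FD culture versus a control
# culture, with the same reference gene Ct in both.
ratio = relative_expression(26.0, 20.0, 24.0, 20.0)
print(ratio)  # a ddCt of 2 cycles corresponds to a 4-fold lower level
```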
142. Finally, as with DNA microarrays, an experimental validation step by ChIP-qPCR of certain target positions is required. It is generally performed on the original ChIP, because only part of the immunoprecipitated DNA is used for sequencing. But these validations can also be performed on an independent ChIP. 5.3.1 Raw data and sequencing quality. Sequences, whether protein or nucleic, are most often stored in fasta files. These files, which can contain one or more sequences, are text files structured as follows: each sequence begins with a free-format line starting with a > character and containing various information such as a sequence identifier or a gene name. The following lines, of equal length, contain the sequence in nucleotides, amino acids or colour code in the case of sequencing fragments produced by the SOLiD sequencer. In the case of very-high-throughput sequencing, slightly more complex sequence files have been used and have now become standards. Besides the sequences, these files contain quality values for each sequenced base. The main file formats for raw, unaligned sequences currently available are csfasta
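The fasta layout described above (a header line starting with >, followed by one or more sequence lines) can be read with a short parser. This is a minimal sketch: the function name and the example records are hypothetical, and the colour-space read only imitates the general look of SOLiD data.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} from fasta-formatted lines.

    The header is the text after '>'; sequence lines that follow are
    concatenated until the next header.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


# Hypothetical content: one nucleotide record split over two lines and
# one colour-space read.
example = """>seq1 hypothetical gene A
ACGTACGT
ACGT
>read_1 colour-space read
T0120301
""".splitlines()

records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```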
143. serum, followed by incubation with the primary antibodies diluted in the blocking buffer. Coverslips were processed for immunofluorescence staining using the following primary antibodies: rabbit anti-nestin (1:500, Abcys), mouse anti-βIII tubulin (1:500, Sigma, clone SDL.3D10), rabbit anti-GFAP (1:500, Dako), rabbit anti-MAP2 (1:500, Abcam), mouse anti-IKAP/hELP1 (1:100, BD Biosciences, clone 33). Each primary antibody was applied for 2 h at room temperature. For IKAP/hELP1 staining, the primary antibody was incubated for 3 h at room temperature, followed by an overnight incubation at 4 °C. We used appropriate secondary antibodies (goat anti-rabbit IgG conjugated with AlexaFluor 594, 1:500, Invitrogen; goat anti-mouse IgG conjugated with AlexaFluor 488, 1:500, Invitrogen) for 1 h at room temperature. Hoechst nuclear dye was used to label nuclei (1:2,000, Molecular Probes, 33258). Coverslips were finally mounted with anti-fading medium (ProLong, Invitrogen). Cells were observed under a Nikon Eclipse E800 upright microscope equipped with epifluorescence and TRITC, FITC and DAPI filters, and images were analyzed using an Orca ER CCD camera (Hamamatsu Photonics) and the LUCIA image analysis software (Laboratory Imaging). Confocal image acquisition was performed on a Leica TCS SP2 confocal microscope (Leica Microsystems) using the 488 nm band of an argon laser for excitation of Alexa 488 and the 68
144. size of the point is correlated with the information content of the corresponding matrix. File name: Fig S2 pdf; File format: pdf; Title: Summary of functional enrichment analysis using the ClueGO Cytoscape plugin; Description of data: We estimated the number of predicted regulators for each gene of the human genome by computing the number of non-redundant position-specific motifs associated with each gene. Genes in the top 1% with regard to the number of regulators were used as input for the ClueGO plugin. File name: Fig S3 pdf; File format: pdf; Title: Summary of functional enrichment analysis using the ClueGO Cytoscape plugin; Description of data: We estimated the number of predicted regulators for each gene of the mouse genome by computing the number of non-redundant position-specific motifs associated with each gene. Genes in the top 1% with regard to the number of regulators were used as input for the ClueGO plugin. File name: Fig S4 pdf; File format: pdf; Title: Subset of Gene Ontology used for the cell compartment-based layout; Description of data: Hierarchical structure of the subset of Gene Ontology used in InteractomeBrowser for the cell compartment-based layout. Colors highlight the main compartments. File name: TBMC mm bed; File format: bed; Title: TFBS predictions in the mouse genome; Description of data: A bed file containing TFBS predictions in the mouse genome. chrom: The na
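A bed file such as the one described above can be read with a few lines of Python. This is a minimal sketch that relies only on the three mandatory BED columns (chrom, chromStart, chromEnd) and assumes tab-delimited input. The class and function names are hypothetical, and the example rows, including the use of PWM identifiers in the fourth column, are invented rather than taken from the supplementary file.

```python
from typing import Iterable, List, NamedTuple, Tuple


class BedInterval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    extra: Tuple[str, ...]  # any optional columns, kept as text


def parse_bed(lines: Iterable[str]) -> List[BedInterval]:
    """Parse tab-delimited BED lines, skipping header and comment lines."""
    intervals = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        intervals.append(
            BedInterval(fields[0], int(fields[1]), int(fields[2]),
                        tuple(fields[3:]))
        )
    return intervals


# Hypothetical rows for illustration only.
example = [
    "track name=hypothetical",
    "chr1\t100\t112\tM00746\t0.9\t+",
    "chr2\t5000\t5010\tMA0136",
]
sites = parse_bed(example)
for site in sites:
    print(site.chrom, site.end - site.start)
```

Because BED coordinates are 0-based and half-open, the length of a site is simply `end - start`.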
145. sporadic Parkinson s disease PLoS One 6 e21907 Creppe C Malinouskaya L Volvert ML Gillard M Close P Malaise O Laguesse S Cornez I Rahmouni S Ormenese S Belachew S Malgrange B Chapelle JP Siebenlist U Moonen G Chariot A Nguyen L 2009 Elongator controls the migration and differentiation of cortical neurons through acetylation of alpha tubulin Cell 136 551 564 Cuajungco MP Leyne M Mull J Gill SP Lu W Zagzag D Axelrod FB Maayan C Gusella JE Slaugenhaupt SA 2003 Tissue specific reduction in splicing efficiency of IKBKAP due to the major mutation associated with familial dysautonomia Am J Hum Genet 72 749 758 Delorme B Nivet E Gaillard J Haupl T Ringe J Deveze A Magnan J Sohier J Khrestchatisky M Roman FS Charbord P Sensebe L Layrolle P Feron F 2010 The human nose harbors a niche of olfactory ectomesenchymal stem cells displaying neurogenic and osteogenic properties Stem Cells Dev 19 853 866 Deng V Matagne V Banine F Frerking M Ohliger P Budden S Pevsner J Dissen GA Sherman LS Ojeda SR 2007 FXYD1 is an MeCP2 target gene overexpressed in the brains of Rett syndrome patients and Mecp2 null mice Hum Mol Genet 16 640 650 Dong J Edelmann L Bajwa AM Kornreich R Desnick RJ 2002 Familial dysautonomia detection of the IKBKAP IVS20 6T gt C and R696P mutations and frequencies among Ashkenazi Jews Am J Med Genet 110 253 257 Eisen MB Spellman PT Brown PO Botstein D 1998 Cluster an
146. surprised to find that spheres upregulated a significant number of genes related to nervous system development and synaptic transmission (Table 1). Detailed [4 HUMAN MUTATION, Vol. 00, No. 0, 1-11, 2012] Table 1. Top Biological Process Gene Ontology (GO) Terms Overrepresented by Dysregulated Genes. Columns: ID; Term; Count; PValue; FDR. Control versus FD cells. Functional group 1 (enrichment score 2.35): 0051960, Regulation of nervous system development, 5, 1.0E-3, 1.6; 0050767, Regulation of neurogenesis, 4, 6.9E-3, 10; 0060284, Regulation of cell development, 4, 1.2E-2, 17. Functional group 2 (enrichment score 1.46): 0048489, Synaptic vesicle transport, 3, 2.7E-3, 4.1. Control versus FD sphere cells. Functional group 1 (enrichment score 1.32): 0051960, Regulation of nervous system development, 5, 1.3E-2, 18. Functional group 2 (enrichment score 1.31): 0007268, Synaptic transmission, 6, 1.3E-2, 19. [Figure 3 diagram: gene labels including CXCR7, PFKFB3, ARHGAP28 and SEPT3] Figure 3. Common genes differentially expressed in FD. Intersection between the current study and the lists of four previous studies for the genes differentially expressed between control and FD/IKBKAP knockdown samples (FC > 1.5, P < 0.05). The genes dysregulated in three different studies are listed and preceded by either a down arrow for underexpression or an up arrow for overexpression in FD samples. Capital letters define each study consid
147. the National Academy of Sciences of the United States of America 74 2 560 4 McKenna et al 2010 McKenna A Hanna M Banks E Sivachenko A Cibulskis K Kernytsky A Garimella K Altshuler D Gabriel S Daly M amp DePristo M A 2010 The Genome Analysis Toolkit a MapReduce framework for analyzing next generation DNA sequencing data Genome research 20 9 1297 303 McLean et al 2010 McLean C Y Bristor D Hiller M Clarke S L Schaar B T Lowe C B Wenger A M amp Bejerano G 2010 GREAT improves functional interpretation of cis regulatory regions Nature biotechnology 28 5 495 501 Metzker 2010 Metzker M L 2010 Sequencing technologies the next generation Nature reviews Genetics 11 1 31 46 Meyerson et al 2010 Meyerson M Gabriel S amp Getz G 2010 Advances in understan ding cancer genomes through second generation sequencing Nature reviews Genetics 11 10 685 96 Moorthy amp Mohamad 2011 Moorthy K amp Mohamad M S 2011 Random forest for gene selection and microarray data classification Bioinformation 7 3 142 6 Morin et al 2008 Morin R Bainbridge M Fejes A Hirst M Krzywinski M Pugh T McDonald H Varhol R Jones S amp Marra M 2008 Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short read sequencing BioTechniques 45 1 81 94 Morris et al 2008 Mor
148. the normal stages of embryonic development that human ES cells undergo. Although both stem cell types share a common transcriptional signature, a subset of genetic profiles found in human iPS cells suggests retention of transcriptional and epigenetic memory related to their tissue of origin, which can substantially affect their potential to differentiate into different cell types [25–27]. Thus, cells collected from primary sources that have been subjected to environmental signals appropriate for the pathological specificity of the targeted disease are likely important to mirror the biology of diseased human neural cells. [PLoS ONE, www.plosone.org; OE-MSCs as a Model for FD] Our aim is to understand what mechanisms drive IKBKAP mRNA splicing to the almost exclusive production of aberrant transcripts (MU) in neuronal cells. Here, we demonstrate the potential of human olfactory ecto-mesenchymal stem cells (hOE-MSCs) to model this aspect in FD. Indeed, neurogenesis occurs throughout adult life in the olfactory mucosa, due to the presence of resident multipotent stem cells giving rise to olfactory neurons in vivo [28]. hOE-MSCs can be grown into neurospheres that are multipotent and differentiate in vitro into neurons, astrocytes and oligodendrocytes, as well as other cell types [29,30]. Isolated from patients, cultures of hOE-MSCs provide potential models for genetically determined neuropsychiatric diseases [31–33] and stand as an interes
149. tissues that most constitutive IKBKAP exon 20 skipping occurs in tissues representing a mixture of cell types and not just neurons (Cuajungco et al., 2003). Thus, the ability to derive pure cultures of neurons or glial cells from hOE-MSCs will be of great benefit to determine the cell type predominantly affected during FD development. We report for the first time a genome-wide gene expression analysis of IKBKAP mRNA splicing in response to kinetin, a plant cytokinin. Surprisingly, although kinetin helps to increase the WT IKBKAP transcript level, the compound does not seem to influence the expression of a large proportion of genes. This specificity in IKBKAP mRNA splicing is an encouraging result in light of its potential clinical use (Axelrod et al., 2011). Although the mechanism by which kinetin improves exon inclusion is still unknown, a previous study has suggested that kinetin may target specific sequences within the 5′ss (Hims et al., 2007). In this context, our finding that genes encoding a core component and a putative subunit of U1 snRNP (SNRPA and LUC7L) are regulated by kinetin supports the hypothesis that this compound can induce the recruitment of splicing factors to reinforce 5′ss recognition. In addition, we demonstrated a consistent decrease of ZNF280D expression, which shares with IKBKAP an identical 5′ss motif that potentiates the presence of a premature stop codon, most likely targeted by the NMD machinery. Theref
150. to MHC Class I, MHC Class II and macrophage-related signatures, two pathways related to immune function are presented in Figure 4. The AP1 pathway is made of the prototypical immediate early genes and contains numerous transcription factors (EGR1, EGR2, FOS, FOSB, IER2, JUN, JUNB, KLF6, KLF4, KLF10, ATF3, BTG2 and BTG3) whose complex interplay has been reported earlier. Finally, an NFKB signature was also observed, which again contains prototypical regulators (NFKBIA, NFKBIE, RELB, BCL3 and MAP3K8/TPL2) and known targets (CCL20, CXCL3, IL1B, IL8 and SOD2). Altogether, these results underline the high relevance of the signatures obtained using this compilation of TS derived from GPL96-related GEO experiments. Discussion. In the present paper, we present the construction of a unique collection of TS that summarize almost all human, mouse and rat Affymetrix microarray data stored in the GEO database. TBrowser constitutes a highly powerful search engine that makes it easy to perform platform-independent meta-analysis of microarray data. This can be considered a real improvement over classical approaches and software, as it provides easy and productive access to data without the need for any programming skills. Indeed, the simple use of an extended set of operators proved to be sufficient to construct robust gene networks and assign poorly characterized genes to relevant biological pathways. As a consequence, it is particularly well suited t
151. to the PIR keyword "multigene family". Furthermore, several signatures of Table 2 are related to melanoma and six of them were observed in the GSE7127 experiment [16]. Although data from Table 2 would deserve further analysis, they are most likely related to gain or loss of genetic material in tumors. Indeed, gain of 8q is frequently observed in a number of tumor types, including melanoma and ovarian tumors, and this region is known to contain the c-myc oncogene at 8q24.21. Interestingly, in several cases contiguous cytobands were significantly enriched, suggesting a large deletion or amplification of genetic material in these tumors (TS 60E29DA83 is enriched in genes from the 8q13, 8q21.11, 8q22.1, 8q22.3, 8q24.13 and 8q24.3 cytobands). In the same way, loss of genetic material of the long arm of chromosome 11 occurs in primary melanoma but is even more frequent in metastatic tumors (TS A93ED7519 is enriched in genes from the 11q21, 11q23.3 and 11q24.2 cytobands). Altogether, these results underline the versatility of TBrowser and its ability to extract hidden and meaningful information from published or unpublished microarray data. Indeed, the cytogenetic results presented in Table 2 were not discussed by the authors in the corresponding articles. A synthetic view of all GPL96-related experiments. The paradigm that genes from a TS share functional relationships is now widely accepted and constitutes the basis of transcriptome analysis [17]. Howev
152. be done simply from the command line: an indexing function for the S4 object makes it possible to extract all of the object's information, as for a matrix, from the indices of the columns and/or rows to keep. It is thus possible to create subgroups of samples and/or genes for a customized analysis. The normalization step is crucial for exploiting the data and spotting possible experimental biases. This library offers log2 transformation of the data and access to two normalizations: the quantile method and LOWESS. In addition, the user can export part of the data in the generic ExpressionSet format in order to use other R libraries offering other types of normalization. An export of the data in the format of the R/Bioconductor library marray has also been implemented to allow the use of the marray and limma libraries. Finally, the normalized data can be saved as an expression matrix with samples in columns and probes in rows. At this step the user can also choose to remove the control probes. By default, if no information about the samples is given through an object of class AnnotatedDataFrame, the sample identifiers are directly retr
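The log2 transformation and quantile normalization mentioned in excerpt 152 can be illustrated with a minimal sketch. It is written in plain Python rather than with the AgiND library itself; the function names and the toy intensity matrix are assumptions made for illustration, and ties are resolved by row order rather than by averaging.

```python
import math


def log2_transform(matrix):
    """Return a copy of a probes x samples matrix with log2 intensities."""
    return [[math.log2(value) for value in row] for row in matrix]


def quantile_normalize(matrix):
    """Quantile-normalize a probes x samples matrix (list of rows).

    Each value is replaced by the mean, across samples, of the values
    that share its rank. Ties are broken by row order (a simplification).
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    # For each sample, the row indices sorted by increasing intensity.
    order = [sorted(range(n_rows), key=col.__getitem__) for col in columns]
    # Mean intensity observed at each rank across all samples.
    rank_means = [
        sum(columns[j][order[j][rank]] for j in range(n_cols)) / n_cols
        for rank in range(n_rows)
    ]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for rank, i in enumerate(order[j]):
            normalized[i][j] = rank_means[rank]
    return normalized


# Hypothetical raw intensities: 3 probes (rows) x 2 samples (columns).
raw = [[4.0, 8.0], [16.0, 2.0], [8.0, 32.0]]
normalized = quantile_normalize(log2_transform(raw))
```

After normalization every sample (column) holds the same set of values, which is the property that makes distributions comparable between arrays.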
153. ucsc; these make it possible to convert annotations for genome browsers from one version to another, such as the positions of SNPs or of transcription factor binding motifs (Oregano, TFBSconserved). However, these genome versions are not suited to the study of cancer cells, which are characterized by genomic amplifications and deletions. It is important to take this information into account for the rest of the study. The presence of repeated sequences, detected with the repeatmasker software, must also be taken into account. Indeed, a fragment that aligns within a repeated region cannot be considered specific to that region and must therefore be discarded. At the alignment level, these repeated regions are characterized by an excessive pileup of fragments over a short region (a few dozen base pairs). These pileups are frequently observed in the telomeres and centromeres of chromosomes, which are repeat-rich regions. The paired-end sequencing mode, which partly overcomes this kind of problem, is for this reason increasingly common. At present, it is common for an HTS experiment to generate around a hundred million reads. Aligning such a quantity of short sequences to a reference cannot be done with clas
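The rule stated in excerpt 153, that a fragment aligned inside a repeated region should be discarded, can be sketched as a simple interval-overlap filter. This is only an illustration under assumed data: the coordinates, the tuple layout and the function names are hypothetical, and a real pipeline would read repeat annotations from a repeatmasker output file.

```python
def overlaps_any(start, end, intervals):
    """True if the half-open interval [start, end) overlaps any interval."""
    return any(start < r_end and r_start < end for r_start, r_end in intervals)


def drop_reads_in_repeats(reads, repeats_by_chrom):
    """Keep only reads that do not overlap an annotated repeat.

    reads: iterable of (chromosome, start, end) tuples.
    repeats_by_chrom: dict mapping chromosome -> list of (start, end).
    """
    return [
        (chrom, start, end)
        for chrom, start, end in reads
        if not overlaps_any(start, end, repeats_by_chrom.get(chrom, []))
    ]


# Hypothetical repeat annotation and aligned reads.
repeats = {"chr1": [(100, 200), (500, 650)]}
reads = [("chr1", 50, 100), ("chr1", 180, 230), ("chr2", 120, 170)]
kept = drop_reads_in_repeats(reads, repeats)
```

The linear scan is adequate for a toy example; with the hundred million reads mentioned above, an interval tree or sorted-interval search would be the more realistic choice.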
154. variance (ANOVA) [41]. Indeed, ANOVA evaluates, for each individual gene, the statistical probability (p-value) that a difference in expression between the three patient groups could have been observed by chance. This reveals genes that show even small but highly significant changes in expression with regard to the studied phenotype.

July 2010 | Volume 5 | Issue 7 | e11671. Molecular Mechanisms of DSS

Table 1. Clinical and biological characteristics of DF, DHF and DSS patient groups at the time of hospital admission.

Characteristic | DF (n = 16) | DHF (n = 13) | DSS (n = 19)
Patients characteristics
gender: male, n (%) | 7 (43) | 4 (31) | 7 (37)
age: median [IQR], years | 8 [4–9] | 7 [5–8] | 8 [7–9]
weight: median [IQR], kg | 18 [13–20] | 15 [14–18] | 19 [15–23]
hospital admission: median [IQR], day after onset of fever (D0) | 2 [1–3] | 2 [2–3] | 4 [3–4]
Dengue status
viral serotype, n (DENV-1 / DENV-2 / DENV-3 / DENV-4 / unknown) | 4 / 2 / 8 / 1 / 1 | 1 / 1 / 10 / 1 / 0 | 1 / 1 / 10 / 0 / 7
immunological status: secondary infections, n (%) | 14 (88) | 12 (92) | 18 (95)
Clinical manifestations
tourniquet test: pos / neg / not done (%) | 56 / 44 / 0 | 54 / 38 / 8 | 37 / 32 / 31
hepatomegaly, n (%) | 3 (19) | 6 (46) | 17 (89)
gastro-intestinal bleeding (gingivorragy, hematemesis, melena), n (%) | 0 | 1 (8) | 6 (32)
Blood pressure
heart frequency: median [IQR], pulse per minute
pulse pressure: median [IQR], mm Hg
Haematological parameters
thrombocytopenia: platelet count < 100000/mm
hematocrit: median [IQR]
hemoconcentrati
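The per-gene ANOVA described in excerpt 154 rests on the F statistic, the ratio of between-group to within-group variance. The sketch below computes it for one gene across three patient groups; the expression values are invented for illustration, and turning F into a p-value would additionally require the F distribution (available, for instance, through `scipy.stats.f_oneway`), which is left out to keep the example dependency-free.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of value groups."""
    all_values = [value for group in groups for value in group]
    n_total = len(all_values)
    n_groups = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(group) / len(group) for group in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variability of the observations around their own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    return ms_between / ms_within


# Hypothetical log-expression values of one gene in DF, DHF and DSS patients.
df_group = [1.0, 2.0, 3.0]
dhf_group = [2.0, 3.0, 4.0]
dss_group = [5.0, 6.0, 7.0]
f_statistic = one_way_anova_f([df_group, dhf_group, dss_group])
```

A large F relative to the F distribution with (groups - 1, samples - groups) degrees of freedom corresponds to a small p-value for that gene; the test is repeated independently for every gene on the array.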
155. towards more complex study designs with miRNA arrays in addition to DNA microarrays. Indeed, non-coding RNAs have been identified as potential therapeutic targets for treating complex diseases such as cancer. Their study is therefore a therapeutic but also a commercial challenge. However, DNA microarrays have some limitations compared with very high-throughput sequencing. Very high-throughput sequencing techniques are more sensitive and give access to more information about the transcribed genes: they make it possible to perform tag counting, to search for alternative transcripts, or to detect gene fusions (Maher et al., 2009). But given the cost of sequencing, DNA microarrays are still favoured today in studies aiming to identify a transcriptional signature specific to a pathology or to study the effect of a therapy. Indeed, DNA microarrays allow more samples to be analysed, which is very useful for epidemiological studies where many samples are available. Finally, these data are much simpler to analyse and are generated very quickly. [3.5 Conclusions and perspectives] Towards the construction of regulatory networks. Once the data analysis has been carried out, one often has lists of genes
156. we checked whether our approach could overestimate the number of targets for TFs with high GC-content-related PWMs. As shown in Figure S1, this effect was essentially restricted to Sp1 and, to a lesser extent, to the Maz-related PWM (consensus RGGGAGGG). As expected, PWMs with high information content were most generally associated with fewer motifs (Figure S1, point size). Genes with highly conserved promoter regions mostly encode transcription factors. We next estimated the number of predicted regulators for each gene by computing the number of non-redundant PWMs associated with each gene. The number of PWMs that have a significant match in gene promoter regions ranges from 1 to 318 (median: 8, mean: 13.37) in mouse and from 1 to 353 in human (median: 7, mean: 13.17). Genes in the top 1% considering the number of regulators (e.g. Lmo3, Foxp2, Bcl11a) were, as expected, invariably associated with highly conserved promoter regions. Moreover, functional annotation indicates that a very large proportion of these genes were transcription factors and genes related to development. Indeed, in mouse, enrichment analysis of the gene list (112 genes) using Fisher's exact test with Benjamini and Hochberg correction indicated a very strong enrichment for genes related to the terms "Transcription factor" (PANTHER term, q-value: 1.3 × 10^?, 52 genes out of 95 annotated), "pattern specification process" (GO biological process, q-value: 2.8 × 10^?, 19 genes out of 78
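Excerpt 156 relies on Fisher's exact test followed by the Benjamini and Hochberg correction. As a rough sketch of both steps, assuming the one-sided (over-representation) form of the test, the code below computes the hypergeometric tail probability and then converts a list of p-values into q-values. The function names and the numbers in the example are illustrative assumptions, not values from the study.

```python
from math import comb


def enrichment_p_value(hits, list_size, annotated, universe):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of seeing at least `hits` annotated genes in a list of
    `list_size` genes drawn from `universe` genes, `annotated` of which
    carry the term.
    """
    upper = min(list_size, annotated)
    favourable = sum(
        comb(annotated, i) * comb(universe - annotated, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(universe, list_size)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Toy example: 5 of 5 listed genes annotated, 5 annotated among 10 in total.
p_single = enrichment_p_value(hits=5, list_size=5, annotated=5, universe=10)
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each term tested yields one p-value, and the correction is applied to the whole collection of terms at once, which is why the reported figures are q-values rather than raw p-values.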
157. were either induced to form spheres or treated with the rafnshh cocktail. FD rafnshh-treated hOE-MSCs were also incubated with 100 µM kinetin for 48 hr. A semi-quantitative RT-PCR analysis confirmed that control hOE-MSCs expressed exclusively the WT IKBKAP mRNA transcript, while FD hOE-MSCs expressed both the WT and the MU transcripts (Fig. 1A). In contrast, RT-qPCR analysis on the FD samples revealed a reduced WT:MU transcript expression ratio in rafnshh compared to sphere conditions, which was reversed with kinetin treatment (Fig. 1B). These results are consistent with the increased WT IKBKAP transcripts observed in spheres compared to adherent hOE-MSCs from our previous study (Boone et al., 2010). Microarray Analysis Revealed Differential Transcriptional Expression of IKBKAP and Genes Implicated in Nervous System Function. The 16 RNA samples obtained after treating four control and four FD hOE-MSCs with either EGF and bFGF or the rafnshh cocktail were used to characterize the FD transcriptional signature. To confirm the strong impact of culture conditions on gene expression

[Figure 1 image: panels A and B; conditions SPHERES, RAFNSHH and RAFNSHH + kinetin]

Figure 1. Expression profile of IKBKAP exon 20 alternative splicing in control and FD hOE-MSCs under defined culture conditions. (A) Agarose gel ele
158. with ITS and growth factors (B and C). Immunostaining showed β-III tubulin- and nestin-positive spheres (D, E, F). RNA was isolated from 2 different FD hOE-MSCs cultured first in serum, then induced to form spheres, and finally dissociated and replated in serum conditions for 24 h (G). RT-qPCR was performed on the same samples and histograms represent the mean value of the two FD samples after normalization with the ABL1 gene (H). Scale bars represent 100 µm. P < 0.05, P < 0.01 using two-tailed Student's t-test. doi:10.1371/journal.pone.0015590.g008

So far, the proposed functions of IKAP/hELP1 are related to various cellular localizations. This has been a matter of controversy because several studies have failed to detect IKAP/hELP1 in the nucleus or found it almost exclusively in the cytoplasm [11,21,34,51], which is difficult to reconcile with its suggested role in transcription elongation. As in most published studies, we observed that the immunolocalization of IKAP/hELP1 was mainly cytoplasmic, within the perinuclear area. However, we also detected significant nuclear staining, in agreement with other reported studies [9,52]. Altogether, our findings on IKAP/hELP1 distribution in hOE-MSCs support multiple roles for the protein within different subcellular compartments. In order to establish a direct link between low levels of IKBKAP WT transcripts and decreased neuronal populations in FD patients, several groups have investigated tra
159. 0.001 using two-tailed Student's t-test. doi:10.1371/journal.pone.0015590.g007

In contrast to control cells, which constitutively include IKBKAP exon 20, we confirmed the alternative splicing of that exon in FD cells (Figure 2A). Similar to neural precursors obtained from iPS cells [22], FD hOE-MSCs predominantly express the MU IKBKAP transcript isoform (Figure 2A and B). Moreover, we demonstrated that FD cells exhibit notably lower IKBKAP transcript levels (WT + MU) when compared to controls (Figures 2B, 6C and 7G). Such a difference is most likely explained by an extensive degradation of MU transcripts through the NMD pathway (Figure 2D), as was previously suggested [35]. However, NMD efficiency varies between cell types and individuals [48–50], and it is unclear how prevalent this mechanism of mRNA degradation is in the nervous system of FD individuals. In order to get a better insight into the actual contribution of NMD to the decay of IKBKAP MU transcripts, it will be necessary to specifically block the NMD machinery.

December 2010 | Volume 5 | Issue 12 | e15590

[Figure 8 image: panels showing ITS + EGF/bFGF culture, β-III tubulin and nestin immunostaining, and IKBKAP expression (AU) for serum, spheres and serum 24h conditions]

Figure 8. WT:MU ratio is increased in hOE-MSC-derived spheres. FD hOE-MSCs cultured in serum (A) gave rise to spheres when plated in medium supplemented
160. 0 nm band of an argon laser for excitation of Alexa 680. High-magnification images were acquired using a 63x HCX PL APO oil-immersion objective (numerical aperture: 1.32) with a 4x digital zoom factor, by sequential scanning to minimize the crosstalk of fluorophores. Pinhole size was set to Airy one to achieve the best possible resolution (theoretical lateral and axial limits: 165 and 330 nm, respectively). Voxel size was set to 58 nm in x and y, and to 162 nm in z. Western blot analysis. Cells were harvested by trypsination and centrifugation (5 min, 300 g). The pellet, containing approximately 10^? cells, was resuspended in 0.5 ml 2x Laemmli buffer (0.5 M Tris pH 6.8, 4.4 ml glycerol, 20% SDS, 1% bromophenol blue, 0.5 ml β-mercaptoethanol). 30 µl of cell lysates were separated by 6.5% SDS-polyacrylamide gel electrophoresis and transferred to a nitrocellulose membrane (Amersham Biosciences). After blocking with 5% nonfat milk in PBS/0.1% Tween 20 (PBST buffer), blots were probed for 1 h at room temperature with a mouse monoclonal anti-IKAP antibody (1:5,000; BD Biosciences, clone 33) in PBST, followed by incubation with horseradish peroxidase-conjugated goat anti-mouse IgG (1:5,000; Jackson ImmunoResearch) for 45 min at room temperature. As a control, the membrane was also probed for β-actin (1:3,000; Sigma, clone SDL 3010). Proteins were visualized by chemiluminescent detection using the ECL detection kit (Enhanced Chemiluminescence, Am
161. 011 | 0 33 | atherosclerosis [84]
cytochrome P450 superfamily enzymes | CYP1B1, CYP2U1, CYP51A1 | <0.00001 to 0.00686 | 10 to 32 | Vascular inflammation [85]
HUGO gene names are indicated. When genes were represented by several clones on the microarray, p-value and variance medians were calculated. Genes in regular and bold type are respectively under- and over-expressed in dengue shock syndrome patients. Percentage of variance associated to disease phenotype. doi:10.1371/journal.pone.0011671.t005

demonstrating the classifying capability of this gene signature using unsupervised hierarchical clustering and SVM leave-one-out methods [35,39,40]. Based on unsupervised hierarchical clustering, the expression profiles of DHF grades I–II patients appear very close to or indistinguishable from those of DF patients at the same time of disease evolution, while they group into two heterogeneous sub-groups, 1 and 2b (Figure 1), whose significance should be investigated. Altogether, the present results highlight the inadequacy of the 1997 WHO classification of dengue clinical forms [19–21], which considers DF and DHF grades I–II as two separate disease phenotypes, and support the recently proposed classification [90]. Two important questions arise about the DSS-associated transcriptional profile: are the observed modifications of gene expression the cause or the consequence of the pathology, and could these modifications have a predictive value? We cannot definitive
162. 04001.s009 (0.66 MB XLS)

Acknowledgments. The authors would like to thank the staff from the TAGC laboratory for helpful discussions and gratefully acknowledge Francois Xavier Theodule for technical assistance.

Author Contributions. Conceived and designed the experiments: FL JT AB GD ER SG DP. Performed the experiments: FL JT AB DP. Analyzed the data: FL JT AB JI CN DP. Contributed reagents/materials/analysis tools: FL JT AB GD ER SG JI DP. Wrote the paper: JI CN DP.

16. Johansson P, Pavey S, Hayward N (2007) Confirmation of a BRAF mutation-associated gene expression signature in melanoma. Pigment Cell Res 20: 216–21.
17. Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 95: 14863–8.
18. Strubin M, Newell JW, Matthias P (1995) OBF-1, a novel B cell-specific coactivator that stimulates immunoglobulin promoter activity through association with octamer-binding proteins. Cell 80: 497–506.
19. Zhao C, Inoue J, Imoto I, Otsuki T, Iida S, et al. (2008) POU2AF1, an amplification target at 11q23, promotes growth of multiple myeloma cells by directly regulating expression of a B-cell maturation factor, TNFRSF17. Oncogene 27: 63–75.
20. Rabot M, El Costa H, Polgar B, Marie-Cardine A, Aguerre-Girr M, et al. (2007) CD160-activating NK cell effector functions depend on the phosphatidylinositol 3-kinase recruitment. Int Immunol 19: 401–9.
21. Boles KS, Naka
163. (1.38 MB XLS)

Table S3. This matrix summarizes the results obtained using the "ESR1 & GATA3 & FOXA1" query. Rows correspond to genes

References
1. Stoeckert CJ, Causton HC, Ball CA (2002) Microarray databases: standards and ontologies. Nat Genet 32 Suppl: 469–73.
2. Barrett T, Edgar R (2006) Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol 411: 352–69.
3. Diehn M, Sherlock G, Binkley G, Jin H, Matese JC, et al. (2003) SOURCE: a unified genomic resource of functional annotations, ontologies, and gene expression data. Nucleic Acids Res 31: 219–23.
4. Parkinson H, Sarkans U, Shojatalab M, Abeygunawardena N, Contrino S, et al. (2005) ArrayExpress: a public repository for microarray gene expression data at the EBI. Nucleic Acids Res 33: D553–5.
5. D'haeseleer P (2005) How does gene expression clustering work? Nat Biotechnol 23: 1499–501.
6. Heyer LJ, Kruglyak S, Yooseph S (1999) Exploring expression data: identification and analysis of coexpressed genes. Genome Res 9: 1106–15.
7. CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling (n.d.) Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.44.5847. Accessed 18 September 2008.
8. Van Dongen S (2000) A cluster algorithm for graphs. National Research Institute for Mathematics and Computer Science in the. pp 1386–3681.
9. Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, et al. (2006) Global landscape o
164. 10) The Caenorhabditis elegans Elongator complex regulates neuronal alpha-tubulin acetylation. PLoS Genet 6: e1000820.
3. Chen C, Tuck S, Bystrom AS (2009) Defects in tRNA modification associated with neurological and developmental dysfunctions in Caenorhabditis elegans elongator mutants. PLoS Genet 5: e1000561.
4. Huang B, Johansson MJ, Bystrom AS (2005) An early step in wobble uridine tRNA modification requires the Elongator complex. RNA 11: 424–36.

intensities were treated with the SAM software to highlight the most differentially expressed genes, with a FDR set at 3%. (TIF)

Figure S2. IKBKAP exon 36 inclusion increases after cycloheximide treatment. The NMD pathway was blocked by the translation inhibitor cycloheximide, which results in an elevated expression of exon 36-including transcripts in 2 FD OE-MSC cultures (FD3 and FD4), as determined by absolute RT-qPCR. P < 0.05. (TIF)

Table S1. Dysregulated genes involved in other processes. (DOC)

Acknowledgments. We wish to thank the patients and their families for their contribution to this study. Furthermore, we thank Dr. Joseph Bernstein, Dr. Arnaud Deveze and Dr. Jacques Magnan for their support in collecting biopsies. We also thank Denis Puthier for his expertise in bioinformatics analysis, as well as Adlane Ould Yahoui for technical help, and André Verdel and Jeanne Hsu for critical reading of the manuscript
165. 10 Devignot et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: This work was supported by a clinical research program (PRC 2007 13) from the French Army Medical Health Service (Service de Santé des Armées Françaises). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. E-mail: parisp@imtssa.fr. These authors contributed equally to this work. Introduction. Acute dengue virus infections are a major public health problem for many tropical and sub-tropical countries and an increasing risk for the worldwide population [1]. Symptomatic infections occur under a spectrum of diseases ranging from classical dengue fever (DF) to the most severe, life-threatening dengue shock syndrome (DSS), a leading cause of childhood hospitalisation and death in endemic countries with limited health resources [1,2]. DSS is regarded as a vascular disease involving a complex interplay between virus, whole blood cells and microvascular territories [3,4], and is thought to result largely from an aberrant host response to infection. As for other major systemic diseases, a detrimental
166. [Figure 5.10 image: example records with bead identifiers such as >1_703_178_F3 and >1_708_449_F3]

Figure 5.10. Standard SOLiD raw file formats, with (A) the notation of the bead identifier and (B) just a few lines of a csfasta file and of a _QV.qual file.

the quality of the bead preparation (low proportion of polyclonal beads, absence of an adaptor) can be estimated before moving on to sequencing the whole sample. To do so, a test run called WFA (WorkFlow Analysis) is performed on a small proportion of the beads. In addition to the SETS report, there are other tools that take as input the initial sequence files or the aligned reads, such as the FastQC software (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc). It is simple to use, fast, and provides graphical statistics on the quality of the sequencing or of the aligned reads (Figure 5.11). 5.3.2 Standard formats and data manipulation tools. Depending on the type of data generated and on the level of analysis (primary, secondary or tertiary), there are specific data file formats (Table 5.2). It should be noted that these fil
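As an illustration of the csfasta format mentioned in the caption of Figure 5.10, the sketch below reads csfasta-style records and decodes a colour-space read into bases using the standard SOLiD two-base encoding (colour 0 keeps the previous base; colours 1, 2 and 3 swap A/C and G/T, A/G and C/T, A/T and C/G respectively). The bead identifier is taken from the excerpt, but the colour sequence and the function names are assumptions made for the example.

```python
# For each previous base, the next base indexed by colour 0, 1, 2, 3.
TRANSITIONS = {
    "A": "ACGT",
    "C": "CATG",
    "G": "GTAC",
    "T": "TGCA",
}


def parse_csfasta(lines):
    """Return (identifier, colour-space read) pairs from csfasta lines."""
    records = []
    name = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            records.append((name, line))
            name = None
    return records


def decode_colour_space(read):
    """Decode a read such as 'T0123' into bases.

    The leading letter is the last primer base and is not part of the
    output. From the first missing colour call ('.') onward, bases are
    reported as 'N' because every later base depends on the previous one.
    """
    previous = read[0]
    bases = []
    for position, colour in enumerate(read[1:]):
        if colour not in "0123":
            bases.append("N" * (len(read) - 1 - position))
            break
        previous = TRANSITIONS[previous][int(colour)]
        bases.append(previous)
    return "".join(bases)


# Hypothetical record: real identifier style, invented colour sequence.
example = ["# illustrative header", ">1_703_178_F3", "T0123"]
records = parse_csfasta(example)
decoded = [(name, decode_colour_space(read)) for name, read in records]
```

Because each base is derived from the previous one, a single colour error corrupts every downstream base, which is one reason aligners for SOLiD data usually work directly in colour space rather than on decoded reads.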
167. 11a] Wu, G., Yi, N., Absher, D. & Zhi, D. (2011a). Statistical quantification of methylation levels by next-generation sequencing. PLoS ONE, 6(6), e21034.
[Wu et al., 2011b] Wu, G. P. K., Chan, K. C. C. & Wong, A. K. C. (2011b). Unsupervised fuzzy pattern discovery in gene expression data. BMC Bioinformatics, 12 Suppl 5, S5.
[Yu et al., 2009] Yu, Y., Tu, K., Zheng, S., Li, Y., Ding, G., Ping, J., Hao, P. & Li, Y. (2009). GEOGLE: context mining tool for the correlation between gene expression and the phenotypic distinction. BMC Bioinformatics, 10, 264.
[Zeller et al., 2006] Zeller, K. I., Zhao, X., Lee, C. W. H., Chiu, K. P., Yao, F., Yustein, J. T., Ooi, H. S., Orlov, Y. L., Shahab, A., Yong, H. C., Fu, Y., Weng, Z., Kuznetsov, V. A., Sung, W. K., Ruan, Y., Dang, C. V. & Wei, C. L. (2006). Global mapping of c-Myc binding sites and target gene networks in human B cells. Proceedings of the National Academy of Sciences of the United States of America, 103(47), 17834–9.
Bibliography
[Zhang et al., 2004] Zhang, W., Shiraishi, A., Suzuki, A., Zheng, X., Kodama, T. & Ohashi, Y. (2004). Expression and distribution of tissue transglutaminase in normal and injured rat cornea. Current Eye Research, 28(1), 37–45.
[Zhu et al., 2010] Zhu, L. J., Gazin, C., Lawson, N. D., Pages, H., Lin, S. M., Lapointe, D. S. & Green, M. R. (2010). ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics
168. 2007). The other databases have also set up a system allowing searches by gene, protein, biological process or signalling pathway. Other approaches are also used to obtain information about genes, such as text-mining tools like Chilibot (Chen & Sharp, 2004) and iHOP (Good et al., 2006). Finally, some software also provides access to the various data sources mentioned above. Among the free tools, those mainly used by biologists and bioinformaticians are the Database for Annotation, Visualization and Integrated Discovery (DAVID) knowledgebase (Huang et al., 2009) and Gene Set Enrichment Analysis (GSEA) (Subramanian et al., 2005) (Figure 3.7). The DAVID database thus offers a functional annotation clustering tool that identifies groups of annotations significantly over-represented in a selection of genes (Huang da et al., 2007; Sherman et al., 2007), whereas GSEA is a non-parametric method that determines whether a gene set defined a priori shows statistically significant differences between

[Chapter 3. Analyses of DNA microarray data]

[Figure 3.7 image; labels include GENETIC ASSOCIATION, OMIM DISEASE, Gene Ontology (GO): Biological Process (BP), Cellular Component (CC), Molecular Function (MF), CHROMOSOME]
169. 2010; Place et al., 2008; Kim et al., 2008).

[Chapter 1. General introduction]

[Figure 1.4 image: histone octamer (H2A, H2B, H3, H4) x2, histone H1, DNA, nucleosome; N-terminal tails of histones H2A, H2B, H3 and H4 annotated with acetylation, ubiquitinylation, methylation and phosphorylation marks]

Figure 1.4. Representation of covalent histone modifications, with (A) the structure of chromatin with its histone octamers (adapted from http://www.mun.ca/biology/scarr/Histone_Protein_Structure.html), (B) the three-dimensional structure of a nucleosome with the position of the main histone modifications (from Wolffe & Hayes, 1999) and (C) the various N-terminal modifications of histones H2A, H2B, H3 and H4 (adapted from Lacoste & Côté, 2003).

[1.3 Regulation of gene expression]

Most lncRNAs are longer than 200 nt. Recently, three new classes of lncRNA have been described: long intergenic non-coding RNAs (lincRNA), enhancer RNAs (eRNA) and promoter-associated RNAs (PAR) (Kim et al., 2010; De Santa et al., 2010; rom et al., 2010). lincRNAs are thought to have a chromatin signature identical to that of active genes. They therefore carry epigenetic marks such as H3K4me3 at the le
170. 3: 177–182.
O'Neill LA (2003) Therapeutic targeting of Toll-like receptors for inflammatory and infectious diseases. Curr Opin Pharmacol 3: 396–403.
Kim KD, Zhao J, Auh S, Yang X, Du P, et al. (2007) Adaptive immune cells temper initial innate responses. Nat Med 13: 1248–1252.
Barton GM (2008) A calculated response: control of inflammation by the innate immune system. J Clin Invest 118: 413–420.
Zhao J, Kim KD, Yang X, Auh S, Fu YX, et al. (2008) Hyper innate responses in neonates lead to increased morbidity and mortality after infection. Proc Natl Acad Sci U S A 105: 7528–7533.

[3.4 Data analyses in the context of collaborations]

A careful design of the study and of its experimental plan, combined with the availability of analysis tools, allowed us to develop this transcriptomic approach based on DNA microarray technology and to obtain solid results. Indeed, we not only identified a broad and robust molecular signature associated with the DSS phenotype, but also highlighted the complexity of the host response during this deadly syndrome. For this purpose, a multifactorial analysis of variance (multi-way ANOVA) was used via the GeneANOVA software. This type of analysis makes it possible to reveal possible correlations among the different clinical parameters available, but also to take into acc
171. 3 PL047) and 2 DSS (PL005, PL101) for whom blood was collected, respectively, 2 and 3 days after onset of shock. Patient samples selected for the present study were also carefully matched for age, gender, viral serotype (when identified) and immunological status (primary or secondary) according to the reference assays described in the diagnosis methods for dengue infection. Diagnosis assays, carried out as described hereafter, indicated that about 90% of all dengue-infected children had secondary infection. Dengue diagnosis and immunological status. All diagnosis assays were carried out at the Institut Pasteur in Cambodia, the National Reference Center for arboviral diseases in Cambodia. IgM-capture ELISA and hemagglutination inhibition were performed on paired sera collected at admission and at discharge, and systematically tested for both dengue and Japanese encephalitis virus, another flavivirus endemic in Cambodia, as described previously [30]. Virus isolation was carried out on the earliest serum samples by inoculating permissive C6/36 and VERO E6 cells, followed by serotype-specific immunofluorescence [30]. Viral RNA was detected in specimens collected at an early stage of the disease using a nested RT-PCR [31]. Primary or anamnestic (secondary) antibody response, indicating previous infections by dengue viruses, was determined from paired serum samples by hemagglutination inhibition assay. Interpretat
172. (3) it allows the inclusion of functionality developed in other programming languages such as C, C++ and Perl, and even system commands; (4) it bundles functions into packages (or libraries) made available to the scientific community through repositories such as Bioconductor, the Comprehensive R Archive Network (CRAN) and Omegahat. R libraries are now commonly used in many scientific fields. They have become a working tool much appreciated by the bioinformatics community because they are easy to access. Many libraries dedicated to the processing of DNA microarray data have thus emerged. Among other things, they allow: the annotation of data, through access to databases or through the creation of libraries containing the annotation specific to a microarray platform; the extraction of data from complex files; the visualization of these data through graphics libraries; preprocessing and normalization using different methods and for various technologies; statistical analysis, either with classically used tests (t-test, ANOVA, linear regression, SAM) or with new approaches, as well as possible re-analyses of microarray datasets; the rearrangement and export of data. These libraries contain definitions of complex objects d
173. [Figure 2: scatter plot; selected dots labelled with GSE accessions.] Figure 2. Large-scale TS extraction from GPL96 experiments. DBF-MCL was run with default parameters (k = 100, FDR = 10%, S = 3, inflation = 2). The X-axis corresponds to the number of samples in the experiment and the Y-axis to the number of informative genes. For each experiment, the number of associated TSs is represented by the size of the dot. For clarity, only experiments with fewer than 100 samples are represented, and only some of them are labelled. doi:10.1371/journal.pone.0004001.g002. to query the TBrowser database using six methods: by gene symbols, by probe IDs, by experiments, by microarray platform, by ontology terms (annotation) or by TS. Three of them (the gene symbol, probe ID and annotation methods) accept a list of operators that control the way a query is processed. One may take advantage of these operators to create complex queries using the AND operator (&), the OR operator, the NOT operator, or additional characters such as quotes or parentheses (the reader may refer to the user guide for additional explanations and information). The main window of TBrowser is made of five panels (Fig. 3). The search panel is the m
174. 4 [Figure 2 row labels (gene followed by two numeric columns, as extracted):] IF127, 4.80, 4.86E-3; PNMAL1, 2.12, 1.38E-3; GPR37, 4.33, 7.42E-4; RUNX3, 17.44, 8.23E-3; BST2, 5.36, 1.80E-3; RHOV, 4.09, 2.21E-3; EXTL1, 2.16, 2.86E-3; NRG1, 2.84, 3.89E-3; ANKRD2, 2.07, 1.43E-4; EPS8L1, 2.57, 1.38E-3; IGSF3, 4.30, 4.53E-3; FAM84B, 11.25, 1.87E-3. Figure 2. Heatmap of gene expression changes in control versus FD hOE-MSCs. Heatmap representation of overexpressed (red) and underexpressed (green) genes in four controls and four FD OE-MSCs in different culture conditions (named SPHERES and RAFHSHH). Normalized signal intensities were processed with the SAM software to highlight the most differentially expressed genes, with an FDR set at 10%. The color scale bar indicates the log ratio of intensities. Genes related to nervous system development are indicated in blue. et al., 2006; Marazziti et al., 2007; Newbern and Birchmeier, 2010; Perez-Otano et al., 2006; Tan et al., 2002). When analyzing the gene ontology (GO) of the dysregulated genes in FD (P < 0.01 and FC > 2; Supp. Table S2), the pathways with the most significant differential expression correspond to regulation of nervous system development and synaptic vesicle transport (Table 1). NOVA1 is Differentially Expressed in FD Versus Control Sphere-Derived hOE-MSCs. As previously shown, cells that have been induced to form spheres express a higher amount of IKBKAP WT transcript. Therefore, we were interested in identifying genes that may be associated with this alternative splicing profile. We were
175. [Figure 3.4 residue: heatmap row labels (SONDE_n probe identifiers).] FIGURE 3.4: Representation of a dataset: (A) heatmap of the raw data, (B) hierarchical clustering and (C) partitioning by the k-means method (k = 3). Chapter 3. DNA microarray data analyses. [Figure 3.5 residue: up- and down-regulated genes from a microarray; pathway/GO ranking; regulatory network; context dependency.] FIGURE 3.5: Interpretation of differentially expressed genes from a DNA microarray experiment. This interpretation depends on the study carried out and allows the generation of contextualized gene networks [Werner, 2008]. each group. It makes it possible to determine the distances between clusters. Iteratively, elements are moved from one cluster to another and the distances are recalculated at each iteration. Elements are only allowed to stay in the new cluster if it is closer to the element than the previous cluster, so as to minimize the within-cluster sum of squares. 3.2.3 Self-organizing maps (SOM). This method, called "carte auto-adaptative" in French, is an artificial neural network based on unsupervised learning methods. It is referred to
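As a rough illustration of the k-means partitioning described above, the sketch below implements the common batch (Lloyd) variant in plain Python: every element is assigned to its nearest centroid, the centroids are recomputed, and the loop stops when no element changes cluster. This is not the exact element-by-element reassignment scheme of the text, and the function names, the explicit initial centroids and the toy points are assumptions made for the example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_centroids, max_iter=100):
    """Batch k-means; returns (labels, centroids).

    init_centroids is supplied by the caller (hypothetical choice made here
    so that the example is deterministic).
    """
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each element joins the cluster with the closest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # no element moved, the partition is stable
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def within_ss(points, labels, centroids):
    """Within-cluster sum of squares, the quantity k-means tries to minimize."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


if __name__ == "__main__":
    profiles = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0), (21.0, 0.0)]
    labels, centroids = kmeans(profiles, [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)])
    print(labels, within_ss(profiles, labels, centroids))
```

The result depends on the initial centroids; in practice several random starts are usually compared and the partition with the smallest within-cluster sum of squares is kept.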
176. [Figure residue: screenshot text not recoverable.] FIGURE 5.7: Tomcat graphical interface of Bioscope installed on the offline cluster, showing the available pipelines and an example of a pipeline launch. Chapter 5. Study of transcriptional regulation by HTS. FIGURE 5.8: Bioscope .ini parameter file for command-line use. nique. These results can also be crossed with other types of data, such as transcriptome data (DNA microarrays, RNA-seq), methylation data (MeDIP) or chromatin accessibility data (FAIRE-seq), in order to build contextualized regulatory networks
177. A prospective nested case-control study of dengue in infants: rethinking and refining the antibody-dependent enhancement dengue hemorrhagic fever model. PLoS Med 6: e1000171. Green S, Vaughn DW, Kalayanarooj S, Nimmannitya S, Suntayakorn S, et al. (1999) Elevated plasma interleukin-10 levels in acute dengue correlate with disease severity. J Med Virol 59: 329-334. Mustafa AS, Elbishbishi EA, Agarwal R, Chaturvedi UC (2001) Elevated levels of interleukin-13 and IL-18 in patients with dengue hemorrhagic fever. FEMS Immunol Med Microbiol 30: 229-233. Juffrie M, Meer GM, Hack CE, Haasnoot K, Sutaryo, et al. (2001) Inflammatory mediators in dengue virus infection in children: interleukin-6 and its relation to C-reactive protein and secretory phospholipase A2. Am J Trop Med Hyg 65: 70-75. Koraka P, Murgue B, Deparis X, Van Gorp EC, Setiati TE, et al. (2004) Elevation of soluble VCAM-1 plasma levels in children with acute dengue virus infection of varying severity. J Med Virol 72: 445-450. 15. Mairuhu AT, Peri G, Setiati TE, Hack CE, Koraka P, et al. (2005) Elevated plasma levels of the long pentraxin, pentraxin 3, in severe dengue virus infections. J Med Virol 76: 547-552. Cardier JE, Marino E, Romano E, Taylor P, Liprandi F, et al. (2005) Proinflammatory factors present in sera from patients with acute dengue infection induce activation and apoptosis of human microvascular endothelial cells: possible role of TNF-alpha in endothelial cell
178. AGC, Inserm U928, Parc Scientifique de Luminy, case 928, 163 avenue de Luminy, 13288 MARSEILLE cedex 09, FRANCE. bergon@tagc.univ-mrs.fr. Contents: 1 Overview; 2 Fetching transcriptional signatures from TBrowserDB; 2.1 The getSignatures function; 2.1.1 Request without logical operators (gene list); 2.1.2 Request using logical operators; 2.2 Finding the biological contexts in which sets of genes are co-expressed; 2.3 Finding transcriptional neighbors; 2.4 Visualizing expression matrix; 3 Creating transcriptional signatures from a user-defined data set using the DBF-MCL algorithm; 3.1 Installation; 3.2 Examples. 1 Overview. TranscriptomeBrowser (TBrowser, http://tagc.univ-mrs.fr/tbrowser) hosts a large collection of transcriptional signatures (TS) automatically extracted from the Gene Expression Omnibus (GEO) database. Each GEO experiment (GSE) was processed so that a subset of the original expression matrix containing the most relevant (informative) genes was kept and organized into a set of homogeneous signatures [1]. Each signature was tested for functional enrichment using annotation terms obtained from numerous ontologies or curated databases (Gene Ontology
179. B, Bourgeois A, Loi L, et al. (2004) Feature extraction and signal processing for nylon DNA microarrays. BMC Genomics 5: 38. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5: R80. December 2010, Volume 5, Issue 12, e15590. Human Mutation (official journal of the Human Genome Variation Society). Genome-Wide Analysis of Familial Dysautonomia and Kinetin Target Genes with Patient Olfactory Ecto-Mesenchymal Stem Cells. Nathalie Boone, Aurélie Bergon, Béatrice Loriod, Arnaud Devèze, Catherine Nguyen, Felicia B. Axelrod and El Chérif Ibrahim. Aix-Marseille Université, NICN, UMR 6184, Marseille, France; CNRS, NICN, UMR 6184, Marseille, France; TAGC, INSERM UMR_S 928, Aix-Marseille Université, Marseille, France; Département ORL, Hôpital Universitaire Nord, AP-HM, Marseille, France; Department of Pediatrics, New York University School of Medicine, New York, NY. Communicated by Mireille Claustres. Received 10 October 2011; accepted revised manuscript 8 December 2011. Published online 20 December 2011 in Wiley Online Library (www.wiley.com/humanmutation). DOI: 10.1002/humu.22010. Introduction. ABSTRACT: Familial dysautonomia (FD) is a rare inherited neurodegenerative disorder. The most common mutation is a c.2204+6T>C transition in the 5′ splice site (5′ ss) of IKBKAP intron 20, which c
180. C, SN, PB, PCP. Wrote the paper: SD, PCP. Contributed to obtainment of funding: HT. 21. Ng CF, Lum LC, Ismail NA, Tan LH, Tan CP (2007) Clinicians' diagnostic practice of dengue infections. J Clin Virol 40: 202-206. Cobb JP, Mindrinos MN, Miller-Graziano C, Calvano SE, Baker HV, et al. (2005) Application of genome-wide expression analysis to human health and disease. Proc Natl Acad Sci U S A 102: 4801-4806. Feezor RJ, Cheng A, Paddock HN, Baker HV, Moldawer LL (2005) Functional genomics and gene expression profiling in sepsis: beyond class prediction. Clin Infect Dis 41 Suppl 7: S427-435. Tang BM, McLean AS, Dawes IW, Huang SJ, Lin RC (2007) The use of gene-expression profiling to identify candidate genes in human sepsis. Am J Respir Crit Care Med 176: 676-684. Tian L, Greenberg SA, Kong SW, Altschuler J, Kohane IS, et al. (2005) Discovering statistically significant pathways in expression profiling studies. Proc Natl Acad Sci U S A 102: 13544-13549. Laudanski K, Miller-Graziano C, Xiao W, Mindrinos MN, Richards DR, et al. (2006) Cell-specific expression and pathway analyses reveal alterations in trauma-related human T cell and monocyte pathways. Proc Natl Acad Sci U S A 103: 15564-15569. Simmons CP, Popper S, Dolocek C, Chau TN, Griffiths M, et al. (2007) Patterns of host genome-wide gene transcript abundance in the peripheral blood of patients with acute dengue hemorrhagic fever. J Infect Dis 195: 1097-1107. Long HT, Hibberd M
181. The latter is then used in the tertiary analysis, which was developed according to the specific features of ChIP-seq data analysis presented in the previous part of this chapter. The next step is peak detection. Several tools were tested with different parameters (MACS, Hpeaks, MICSA). MACS was finally chosen for integration because it yields narrower peaks than Hpeaks. However, it generates many artifacts (pile-ups of reads wrongly taken into account) because the peak model is rather difficult to build. This is why, alongside MACS, another peak detection tool developed in the laboratory and named Picor was integrated into the pipeline (see below). The peaks obtained are then filtered to keep only those that do not overlap a repeated region. The result is a file in BED format used for (1) functional analysis with various scripts implemented on the platform, (2) motif search and/or discovery, after retrieving the FASTA sequences under the peaks, using peak-motifs from the RSATools software suite installed on a server in our laboratory, and (3) visualization of the peaks in a genome browser such as IGV or UCSC, alongside the alignments in BAM format. Starting from the location of the peaks, one can thus d
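The repeat-filtering step mentioned above (keeping only peaks that do not overlap a repeated region) can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline's actual script: peaks and repeats are taken to be (chromosome, start, end) tuples with BED-style half-open coordinates, and the function names are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def filter_peaks(peaks, repeats):
    """Return the peaks that overlap no repeat on the same chromosome."""
    # Index repeats by chromosome so each peak is only compared to relevant intervals.
    repeats_by_chrom = {}
    for chrom, start, end in repeats:
        repeats_by_chrom.setdefault(chrom, []).append((start, end))

    kept = []
    for chrom, start, end in peaks:
        hits = repeats_by_chrom.get(chrom, [])
        if not any(overlaps(start, end, r_start, r_end) for r_start, r_end in hits):
            kept.append((chrom, start, end))
    return kept


if __name__ == "__main__":
    peaks = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
    repeats = [("chr1", 150, 160), ("chr1", 400, 500)]
    for peak in filter_peaks(peaks, repeats):
        print(*peak, sep="\t")  # BED-like, tab-separated output
```

The pairwise comparison is quadratic in the worst case; for genome-scale repeat annotations, sorted intervals or a dedicated interval tool would be the usual choice.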
182. Quality control and normalization of DNA microarray data. 2.2.2 Base-2 logarithm transformation. DNA microarray data generally undergo a base-2 logarithmic transformation (denoted log2), which makes ratios (fold changes) symmetric and reduces the dispersion of the data by limiting the influence of extreme values. Indeed, most measured intensities are low and therefore potentially lie at the level of the background noise. Above all, this transformation makes it possible to apply parametric statistical tests, because the distribution of the log-transformed values is closer to a normal distribution. 2.2.3 Data normalization. Normalization methods can be applied only to a certain extent: normalization assumes that the biological effect is not confounded with the technical bias to be corrected. If this is not the case, it becomes difficult to decide between a technical artifact and biological variability. To minimize experimental variability and to be able to compare samples with one another, normalization is applied to the data in order to bring out the differences that are truly due to variations in transcript expression between samples. Many normalization methods exist, but none can be applied systematically because this d
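A small numerical illustration of why the log2 transformation makes fold changes symmetric may help here. The intensities below are invented for the example, and the helper name is hypothetical.

```python
import math


def log2_ratio(sample, control):
    """log2 fold change between two strictly positive intensities."""
    return math.log2(sample / control)


if __name__ == "__main__":
    # A two-fold increase and a two-fold decrease are asymmetric as raw ratios
    # (2.0 versus 0.5) but symmetric around zero once log2-transformed.
    print(200.0 / 100.0, 100.0 / 200.0)
    print(log2_ratio(200.0, 100.0), log2_ratio(100.0, 200.0))

    # The transformation also compresses extreme values: these made-up raw
    # intensities span three orders of magnitude, their log2 values only ten units.
    raw = [64.0, 128.0, 1024.0, 65536.0]
    print([math.log2(x) for x in raw])
```

On the log2 scale a value of +1 means a doubling and -1 a halving, which is why thresholds on differential expression are usually expressed in log2 units.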
183. EM supplemented with insulin-transferrin-selenium (ITS; 1 g/l insulin, 0.55 g/l transferrin, 0.67 mg/l sodium selenite; Gibco), epidermal growth factor (EGF, 50 ng/ml; R&D Systems) and basic fibroblast growth factor 2 (bFGF, 50 ng/ml; R&D Systems). Half of the medium was changed every 2 days. Multipotent spheres were obtained after 1 week and harvested by aspiration of the culture medium and centrifugation (5 min, 300 g). They were then incubated in Accumax solution (Sigma) for 10 min at 37 °C. To release more cells, the sample was gently triturated by repeated pipetting. When disaggregation was complete, cells were centrifuged (5 min, 300 g) to remove cell debris. For cell differentiation, hOE-MSCs were plated on glass coverslips at a density of 10,000 cells/cm² in six-well plates for RNA extraction and in 24-well plates for immunostaining, in serum-free medium supplemented with 1% ITS, 1% B27 and 0.5% N2 until adhesion. Cells were then treated with 1% ITS, 1 µM all-trans retinoic acid (Sigma), 5 µM forskolin (R&D Systems), 15 nM Sonic hedgehog (R&D Systems), 1% B27 and 0.5% N2 for 7 days without changing the medium. Immunocytochemistry. Cells grown on glass coverslips were fixed with 4% paraformaldehyde for 20 min at room temperature and rinsed three times with phosphate-buffered saline (PBS). Cells were preincubated for 60 min at room temperature with blocking buffer (3% BSA in PBS with 0.1% Triton X-100 and 10% normal goat
184. F6. HUGO gene names are indicated. When genes were represented by several clones on the microarray, p-value and variance medians were calculated. Genes in regular and bold type are respectively under- and over-expressed in dengue shock syndrome patients. Percentage of variance associated with disease phenotype. doi:10.1371/journal.pone.0011671.t002. like ABCA10 [86], which regulate the efflux of modified cholesterol from Mo/Mac. Other genes related to lipid-laden cells also have altered expression in the DSS gene signature. In particular, the PPARA gene, which negatively regulates the formation of lipid-laden Mo/Mac [87], has decreased abundance in DSS patients. Conversely, transcripts encoding chitinase 1, a marker of pro-inflammatory lipid-laden Mo/Mac [73], and the FABP4, SOCS6, RETN and IRS2 proteins, involved in lipid-laden Mo/Mac-induced insulin resistance and compensatory response [68-72], all have increased abundance, also strongly supporting a biological signature of foam cells. Interestingly, the PCSK9 transcript, which encodes a secreted protein that decreases the recycling of LDL to the liver by inducing the degradation of liver LDL receptors [74], is also over-expressed in the DSS signature and highly associated with the disease phenotype. Thus, a gene expression pattern similar to that characterizing lipid-laden monocytes is activated in the whole blood cells of DSS children at the time of cardiovascular decompensation. The third
185. [Table 1.2, flattened in extraction; values listed in the order SOLiD; HiSeq2000; GS FLX Titanium.] Support: beads on slide; slide; beads on PicoTiterPlate (PTP). Number of samples per support: 1, 4, 8; 8; 2, 4, 8, 16. Amplification technique: emulsion PCR; bridge amplification on solid phase; emulsion PCR. Sequencing technique: by ligation; by synthesis (SBS); by synthesis (pyrosequencing). Read length in nucleotides: 50 (F3) in fragment mode, 50 (F3) and 35 (F5) in paired-end mode; 100, 2 x 100; 400, 2 x 400. Multiplexing: 4 to 96 samples; 1 2 4 8 16 132 on one slide. Number of reads: 0.7x10; 3x10; 0.5x10. Sequencing time: 7 days; 8.5 days; 10 hours. TABLE 1.2: Characteristics of the three most widespread sequencer models. The Roche GS FLX Titanium model can thus sequence DNA fragments 400 nucleotides long, whereas the other technologies sequence short fragments of 50 to 100 nucleotides. However, the volume of sequences produced (reads) is more limited. This technology is therefore widely used for de novo sequencing of large genomes. The length of these reads makes the assembly of the genome of interest easier, although the coverage obtained (i.e. the number of times a base is sequenced) remains relatively low. In contrast, the Illumina and SOLiD sequencers generate reads of short
186. GGTTCTTCTGTTGATCTTTGGTG; P WTELP1 ex37-38R: AAGCTCAGCATCAAGAACAGGAACC. Annealing temperature. doi:10.1371/journal.pone.0015590.t002. solution of Tris 10 mM, EDTA 1 mM, pH 8, containing 20 ng/µl of E. coli 16S and 23S rRNA (Roche). Real-time PCR assay. The PCR reactions were performed in triplicate in a final volume of 25 µl, including 300 nM primers, 200 nM TaqMan probe, 12.5 µl of TaqMan Universal PCR Master Mix (Applied Biosystems) and 5 µl of either cDNA or plasmid calibrator, in an ABI Prism 7900 HT thermocycler with 50 cycles and the protocol recommended by the manufacturer. For relative quantification and validation of the microarray results, we selected primer sets and probes matching sequences present in the IMAGE human cDNA clones of the nylon microarrays with those displayed on the web portal of Applied Biosystems. The assay IDs were the following: Hs00375306_m1 (PMEPA1) and Hs00293488_m1 (S100A16) for the dysregulated genes in FD, and Hs01003267_m1 (HPRT1) and Hs00293488_m1 (RPLP0) for the reference genes used to normalize the data. We also used previously validated primers and probe for ABL as a third reference gene [62]. Results were calculated using the 2^(-ΔΔCt) method [63]. For absolute quantification, IKBKAP primers and hydrolysis probes (FAM-TAMRA) were designed using the Primer 3 software and are listed in Table 2. Serial dilutions of plasmid calibrators (10 10 10 10 10 copies in 5 µl) were prepared and used to constr
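For readers unfamiliar with the comparative Ct calculation used for the relative quantification mentioned above, here is a minimal sketch of the standard formula, fold change = 2^(-ΔΔCt). The function name and the Ct values are invented for the example; it uses a single reference gene, whereas the study normalizes against several, and it assumes roughly 100% amplification efficiency.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change of a target gene in a sample relative to a calibrator.

    Assumes one reference gene and a doubling of product at every cycle.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample              # normalize the sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator  # normalize the calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses the threshold two cycles earlier
    # in the sample than in the calibrator, with identical reference gene Ct.
    print(relative_expression(24.0, 18.0, 26.0, 18.0))
```

With several reference genes, a common choice is to replace the single reference Ct by the mean of the reference Cts before computing ΔCt.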
187. GPL96, Huang F et al., 2007, 17332353; 3DE64836D, 102, 143, 62, Tissue, GSE7904, GPL570, unpublished, 2007; 59A18E225, 690, 893, 121, Both, GSE2603, GPL96, Minn AJ et al., 2005, 16049480; 6C975B20B, 88, 96, 26, Tissue, GSE6772, GPL96, Klein A et al., 2007, 17410534; 6C975B290, 88, 96, 26, Tissue, GSE6596, GPL96, Klein A et al., 2007, 17410534; 7150E17F6, 868, 1032, 34, Cell lines, GSE4668, GPL96, Coser KR et al., 2003, 14610279; 8059848B4, 200, 250, 251, Tissue, GSE3494, GPL96, Miller LD et al., 2005, 16141321; 84E5E1077, 694, 883, 198, Tissue, GSE7390, GPL96, Desmedt C et al., 2007, 17545524; 8F69864F9, 68, 82, 95, Tissue, GSE5847, GPL96, Boersma BJ et al., 2007, 17999412; A151D5695, 297, 361, 58, Tissue, GSE5327, GPL96, Minn AJ et al., 2007, 17420468; B79B1C0B9, 270, 380, 47, Tissue, GSE3744, GPL570, Richardson AL et al., 2006, 16473279; BDB6D8700, 550, 679, 104, Tissue, GSE3726, GPL96, Chowdary D et al., 2006, 16436632; D8F0B528C, 125, 152, 159, Tissue, GSE1456, GPL96, Pawitan Y et al., 2005, 16280042; E2E620F40, 448, 616, 129, Tissue, GSE5460, GPL570, unpublished, 2007; EA9669A21, 219, 251, 158, Tissue, GSE3143, GPL91, Bild AH et al., 2006, 16273092; F310ACC36, 519, 646, 49, Tissue, GSE1561, GPL96, Farmer P et al., 2005, 15897907. Transcriptional signature ID. Total number. doi:10.1371/journal.pone.0004001.t001. KRTAP9-8. This signature is notably annotated as being enriched in genes related to "PMID: 11279113, Characterization of a cluster of human high/ultrahigh sulfur keratin-associated protein genes embedded in the type I keratin gene domain on chromosome 17q12-21" [15] and in genes related
188. [Figure 1.5 residue: panels labelled CpG island, active gene, euchromatin, enhancer and heterochromatin; tracks for histone marks (H3K4me3, H3K4me2, H3K4me1, H3K9me1, H2BK5me1, H3K27me1, H2A.Z, H3ac, H4ac); legend: DNA-binding proteins, transcribed, active, repressive, DNA methylation, small RNAs, HP1. Source: Nature Reviews Genetics.] FIGURE 1.5: Interplay between DNA methylation, histone modifications, nucleosome positioning and other factors regulating gene expression, such as transcription factors and small RNAs. Heterochromatin regions are marked by H3K9me2 and H3K9me3, which serve as binding sites for HP1 (heterochromatin protein 1). Small RNAs are involved in maintaining heterochromatin. DNA methylation is present throughout the genome but is generally absent from the regulatory regions of active or activatable genes. The H3K27me3 modification marks inactive genes, whereas H3K4me3, H3K4me2, H3K4me1, histone acetylation and the histone H2A variant H2A.Z mark the transcription initiation region of active genes. Mono-methylations of H3K4, H3K9, H3K27, H4K20 and H2BK5 are located in transcribed regions, with a peak at the 5′ end of the gene, whereas H3K36me3 also marks transcribed regions but with a peak
190. IDs was used to store TS expression data. This solution was preferred because it turned out to be an excellent alternative to a database for rapidly retrieving expression values for the selected TS. We next developed TBrowser, a multitier architecture system composed of (i) a heavy client written in Java (presentation tier), (ii) a servlet container (logic tier) and (iii) a back-end database (data tier). The client application allows user December 2008, Volume 3, Issue 12, e4001. GEO Datamining with TBrowser. [Figure residue: scatter plot for platform GPL96; dots labelled with GSE accessions.]
191. L, Hien TT, Dung NM, Van Ngoc T, et al. (2009) Patterns of gene transcript abundance in the blood of children with severe or uncomplicated dengue highlight differences in disease evolution and host response to dengue virus infection. J Infect Dis 199: 537-546. World Health Organization (1997) Dengue haemorrhagic fever: diagnosis, treatment, prevention and control. Geneva: World Health Organization. 84 p. Buchy P, Vo VL, Bui KT, Trinh TX, Glaziou P, et al. (2005) Secondary dengue virus type 4 infections in Vietnam. Southeast Asian J Trop Med Public Health 36: 178-185. Reynes JM, Ong S, Mey C, Ngan C, Hoyer S, et al. (2003) Improved molecular detection of dengue virus serotype variants. J Clin Microbiol 41: 3864-3867. Edgar R, Domrachev M, Lash AE (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30: 207-210. Smyth GK, Michaud J, Scott HS (2005) Use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics 21: 2067-2075. Bolstad BM, Irizarry RA, Astrand M, Speed TP (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19: 185-193. Didier G, Brezellec P, Remy E, Henaut A (2002) GeneANOVA: gene expression analysis of variance. Bioinformatics 18: 490-491. Benjamini Y, Hochberg Y (1995) Controlling the False Discovery Rate: a practical and powe
192. Minoda A, Nordman J, Okamura K, Perry M, Powell SK, Riddle NC, Sakai A, Samsonova A, Sandler JE, Schwartz YB, Sher N, Spokony R, Sturgill D, van Baren M, Wan KH, Yang L, Yu C, Feingold E, Good P, Guyer M, Lowdon R, Ahmad K, Andrews J, Berger B, Brenner SE, Brent MR, Cherbas L, Elgin SCR, Gingeras TR, Grossman R, Hoskins RA, Kaufman TC, Kent W, Kuroda MI, Orr-Weaver T, Perrimon N, Pirrotta V, Posakony JW, Ren B, Russell S, Cherbas P, Graveley BR, Lewis S, Micklem G, Oliver B, Park PJ, Celniker SE, Henikoff S, Karpen GH, Lai EC, MacAlpine DM, Stein LD, White KP, Kellis M: Identification of Functional Elements and Regulatory Circuits by Drosophila modENCODE. Science 2010, 330:1787-1797. 32. Bader GD, Cary MP, Sander C: Pathguide: a pathway resource list. Nucleic Acids Res 2006, 34:D504-506. 33. Prelić A, Bleuler S, Zimmermann P, Wille A, Bühlmann P, Gruissem W, Hennig L, Thiele L, Zitzler E: A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 2006, 22:1122-1129. 34. Nie L, Xu M, Vladimirova A, Sun X-H: Notch-induced E2A ubiquitination and degradation are controlled by MAP kinase activities. EMBO J 2003, 22:5780-5792. 35. Aranburu A, Carlsson R, Persson C, Leanderson T: Transcription factor AP-4 is a ligand for immunoglobulin-kappa promoter E-box elements. Biochem J 2001, 354:431-438. 36. Painter MW, Davis S, Hardy RR, Mathis D, Benoist C: Transcriptomes of the B and T lineages c
193. MyISAM to store data that are not very sensitive and require fast access; InnoDB for advanced features and more sensitive data; MEMORY for frequently modified data that can be lost when the machine restarts; ARCHIVE for a history that requires little reading. The two main engines used are MyISAM and InnoDB. The choice between them is dictated by the type of application the user wants to develop. 4.1.3 Database optimizations. A database can be optimized at two levels: that of the server and that of the database itself. At the server level, global and session variables can be modified, which makes it possible to adjust the accessibility of the database to the applications. The database itself can be optimized by normalizing or denormalizing tables, using joins and indexes, splitting queries into simple queries, and using stored programs. Stored programs make it possible to (1) increase the security of the database, (2) extract data routinely and (3) reduce network traffic. There are 3 types: stored procedures, which perform an action but return no result; stored functions, which return a result and can be used directly in queries, such as
194. NAS (Network Attached Storage). Thus, two Dell MD1000 units of 13 TB each, directly connected via SAS to the compute clusters, are used for data acquisition and analysis. An 8 TB NetApp array is dedicated to storing results, and a 20 TB Netgear unit holds the raw sequencing results. Finally, very long-term storage of raw data is ensured by backup on LTO4 tape. We therefore have a total of 50 TB of storage space. The output of the sequencer is such that, within one year, all 50 TB of storage were used. Indeed, an experiment (or run) and its analysis generate 1 to 2 TB of data, depending on the sequencing mode and the type of analysis. Finally, different software suites are used to control sequencing and data analysis on the SOLiD (Figure 5.5). They are, respectively, ICS (Instrument Control Software), SETS (SOLiD Experimental Tracking Software) and Bioscope/Corona Lite. Programs and scripts developed in the laboratory, or open-source ones, are also needed to analyze the immense flow of data. Finally, genome browsers display the alignment of reads along a genome, together with other annotations such as genes, transcripts, the repeated
195. PLoS ONE, as of 11 January 2011. TBrowser can be used as a visualization tool thanks to the InteractomeBrowser plugin, as has already been the case (Textoris et al., 2010), or as a database of transcriptional signatures in the same way as MSigDB (Molecular Signatures Database). The gene coexpression data of TBrowser are thus used by the PredictSearch tool developed by the company Prédiguard (Marseille). PredictSearch is a commercial tool for building a gene network from a stringent selection of genes (Baron et al., 2011). Similarly, for the R library RTools4TB, Bioconductor generates download statistics only for the last 12 months. Although downloads of the library have decreased because it has still not been published to date, it nevertheless counts 952 downloads since February 2011. Gene prioritization. It is also possible to use approaches such as gene prioritization to highlight genes of interest among the many genes selected during the analysis of DNA microarray data. Indeed, identifying key genes involved in a disease remains a major challenge in medical research. Several gene prioritization approaches have been developed, such as Endeavour (Aerts et al., 2006). The data pro
196. Protein kinase-substrate relationships were retrieved from KEA (n = 14,084). Finally, miRNA-target relationships were obtained from TargetScan database predictions (n = 260,068). For all datasets, all identifiers were mapped onto Entrez Gene IDs. This compendium of molecular interactions is available as flat files at ftp://tagc.univ-mrs.fr/public/TranscriptomeBrowser/DB_Tables. InteractomeBrowser was developed using the Prefuse Java library, which was modified according to our needs. InteractomeBrowser requires Java 1.6. Results and discussion. TFBS predictions using comparative genomics. Although previous works have demonstrated the power of comparative genomics in defining novel regulatory motifs in human and mouse, few of them integrate the PWMs recently computed from protein binding microarray (PBM) experiments. Overall, restricting our analysis to promoter regions and using a set of 1,213 PWMs, we predicted TFBSs in 141,305 position-specific motifs of the mouse genome and 164,171 of the human genome. The median number of hits for any PWM was 117 in mouse (mean: 169; range: 3-2,317) and 122 in human (mean: 192; range: 6-2,678). The PWMs with the highest number of hits correspond to the Sp1 transcription factor (M00931, M00933, M00196) in both species (Supplementary Figure S1). Sp1 binds GC-rich elements (consensus: GGGGCGGGGC) that are found in the promoter regions of a large number of genes [27]. As promoter regions are known to contain CpG islands
197. [Figure 1.11 panel residue. Each panel aligns four rows: reference in base space, reference in color space, read in color space, read in base space. Deletion: one color is lost but the rest of the color code is unchanged. Insertion: one or more colors are added but the rest of the color code is unchanged.] FIGURE 1.11: Principle of SNP and small-indel detection with SOLiD technology. Note that this format has a drawback: as soon as a sequencing error occurs, the rest of the sequence is wrong (Figure 1.11). This is why, in order to improve sequence quality and to visualize any error, alignments are performed on the color code itself and not directly on the interpretation of the color code as nucleotides. 1.4.1.3 Sequencing mode. Depending on the project and the type of experiment, the most suitable library sequencing mode is chosen. There are three possible modes for sequencing a DNA fragment: fragment, paired-end and mate-pair. Each mode requires different experimental protocols to generate the corresponding libraries. Figure 3.12 A shows these three types of libraries in the case of sequencing
198. S were stored in an indexed flat file with a TS ID as a key. This flat file is used by the TBrowser client to retrieve expression data for the requested TS. Experiment metadata, corresponding to sample and experiment information, were stored in a MySQL relational database. Probe meta-information (gene symbol, gene name, GenBank accession ID, chromosomal location, Entrez ID) was obtained from Bioconductor [28] annotation packages and stored in the database. In some cases, as no annotation packages were available (especially for GeneChip CustomExpress Arrays), a script was used to obtain gene symbols and gene names from GenBank files based on the provided GenBank accession ID. Both flat file and database information will be periodically updated to give access to novel experiments stored in the GEO repository. Complex9 dataset. The Complex9 dataset was obtained from the UH Data Mining and Machine Learning Group (UH-DMML, http://www2.cs.uh.edu/~ml_kdd). The Cluster Affinity Search Technique (CAST) was run using the TMEV software. QT_CLUST and k-means were run using the flexclust and fpc R packages. For k-means, the algorithm was run 10 times with random initial centers. Hierarchical clustering was performed using the amap library from the R/Bioconductor project. The Euclidean distance was used in all cases. Functional enrichment analysis. We used the DAVID knowledgebase [12] for functional enrichment analysis, as it provided a practical means to gain a
199. SCO2 gene: evidence of intertissue and interindividual variation in NMD efficiency. J Cell Physiol 209: 67-73. Holmberg C, Katz S, Lerdrup M, Herdegen T, Jaattela M, et al. (2002) A novel specific role for I kappa B kinase complex-associated protein in cytosolic stress signaling. J Biol Chem 277: 31918-28. Kim JH, Lane WS, Reinberg D (2002) Human Elongator facilitates RNA polymerase II transcription through chromatin. Proc Natl Acad Sci U S A 99: 1241-6. Cornez I, Creppe C, Gillard M, Hennuy B, Chapelle JP, et al. (2008) Deregulated expression of pro-survival and pro-apoptotic p53-dependent genes upon Elongator deficiency in colon cancer cells. Biochem Pharmacol 75: 2122-34. Watanabe Y, Itoh S, Goto T, Ohnishi E, Inamitsu M, et al. (2010) TMEPAI, a transmembrane TGF-beta-inducible protein, sequesters Smad proteins from active participation in TGF-beta signaling. Mol Cell 37: 123-34. Stamm S, Riethoven JJ, Le Texier V, Gopalakrishnan C, Kumanduri V, et al. (2006) ASD: a bioinformatics resource on alternative splicing. Nucleic Acids Res 34: D46-55. Cohen L, Henzel WJ, Baeuerle PA (1998) IKAP is a scaffold protein of the IkappaB kinase complex. Nature 395: 292-6. Pros E, Fernandez-Rodriguez J, Benito L, Ravella A, Capella G, et al. (2009) Modulation of aberrant NF1 pre-mRNA splicing by kinetin treatment. Eur J Hum Genet 18: 614-7. Viktorov IV, Savchenko EA, Chekhonin VP (2007) Spontaneous neural differentiation of stem cells in culture of human
200. TS which contain genes located in the 6p21.3 and 14q32.33 chromosomal regions (major histocompatibility complex and human immunoglobulin heavy chain locus, respectively) and which contain T-cell-specific genes can be translated as: 6p21.3 [4] & 14q32.33 [4] & T CELL ACTIVATION [5,12] (4: cytoband term; 5: GO term; 12: Panther pathways term). As chromosomal aberrations occur frequently in cancer, our approach can also be used to perform systematic cytogenetic analysis. Indeed, throughout our analysis, 2,208 functional enrichments related to 360 human cytobands were observed and stored in the database. As an example, TS with very strong enrichment (q-value < 1 10 2) for any of the human cytobands stored in the database are presented in Table 2. The first one is related to an atopic dermatitis analysis (skin biopsies) and contained 24% of genes located in 17q12-q21. They correspond to genes encoding the keratin and keratin-associated protein families (KRT17, KRT27, KRTAP1-5, KRTAP17-1, KRTAP3-1, KRTAP3-3, KRTAP4-10, KRTAP4-12, KRTAP4-13, KRTAP4-15, KRTAP4-2, KRTAP4-3, KRTAP4-5, KRTAP4-8, KRTAP4-9, KRTAP9-2, KRTAP9-3, KRTAP9-4 and [December 2008, Volume 3, Issue 12, e4001. GEO Datamining with TBrowser.] Table 1. Transcriptional signatures containing Affymetrix probes for ESR1, GATA3 and FOXA1. Columns: TS ID, Genes, Probes, Samples, Sample type, GSE ID, GPL ID, Author, PubMed IDs. First row: 0F2635383, 1190, 1572, 23, Cell lines, GSE6569
201. [Figure 7 panel residue: kinetin concentration axis (0, 25, 50, 100, 200 µM); time points ctrl, 1 h, 2 h, 6 h, 24 h (treated and wash-out); IKAP/hELP1 western blot bands at 150 kDa; IKBKAP expression (AU) for CTRL and FD samples with 80 µM kinetin.] Figure 7. Action of kinetin on IKBKAP mRNA splicing in FD hOE-MSCs. (A-C) hOE-MSCs were treated with increasing concentrations of kinetin for 72 h: (i) total RNAs were reverse transcribed and subjected to both semi-quantitative PCR (A) and absolute qPCR (B) of WT and MU IKBKAP transcripts; (ii) total lysates were analyzed by western blot using a monoclonal mouse anti-IKAP/hELP1 antibody (C). (D-F) Kinetics of hOE-MSCs incubated for 24 h with 80 µM kinetin, which was then removed for the next 24 h. Total RNAs were reverse transcribed and subjected to IKBKAP-specific semi-quantitative PCR (D) and absolute RT-qPCR (E). Total lysates were analyzed by western blot (F). The level of WT and MU transcripts was normalized using ABL1 as a reference gene (B and E). (G) Two controls (CTRL) and two FD hOE-MSCs, treated or not with 80 µM kinetin for 24 h, were analyzed by absolute RT-qPCR to determine the amount of IKBKAP exon 2 inclusion, exon 20 inclusion, exon 20 skipping and exon 36 exclusion after normalization with ABL1. P < 0.05; P <
202. ULATORY NETWORKS
  4.6.3 Transcriptional maps for the TBMap plugin
  4.7 Programmatic access to the TBrowser database
  4.7.1 Development of web services
  4.7.2 Implementation of an R/Bioconductor library: RTools4TB
  4.8 Conclusions and perspectives
  5 Study of transcriptional regulation by HTS
  5.1 Principle of chromatin immunoprecipitation coupled with very-high-throughput sequencing (ChIP-seq)
  5.1.1 General points
  5.1.2 Biological principle
  5.1.3 Biases and background noise
  5.1.4 Advantages and drawbacks
  5.1.5 The theoretical model of sequence distribution
  5.2 Computing for HTS
  5.2.1 Hardware and software organization
  5.2.2 User interfaces for launching and managing sequencing
  5.2.3 Bioscope data-processing pipeline
  5.3 ChIP-seq data analysis
  5.3.1 Raw data and sequencing quality
  5
203. UMR_S 928, we developed our own tool for analyzing the raw data of Agilent microarrays produced at the TGML platform. 2.1 Obtaining raw expression data. 2.1.1 Experimental design and technical biases. Designing the experiment is the first important step toward obtaining high-quality data. It is very important to design the experiment well because the sources of experimental variability are numerous (Mutter et al. 2004; Ransohoff & Gourlay 2010). One of the first sources of variability is directly linked to the biological material itself, which is very often heterogeneous. [Chapter 2: Quality control and normalization of DNA microarray data.] This is particularly true of tumors, which in most cases consist of many very different cell populations. Moreover, sampling cancer cells by biopsy often collects healthy cells as well. For DNA microarray experiments, the ideal would be to work on homogeneous cell populations whose cell division cycle is synchronized, which is not feasible with biopsies. When designing an experiment to study expression with Agilent DNA microarrays, one must first of all choose the approach to use: one-color or two-co
204. We next applied the DBF-MCL algorithm to all experiments performed on human, mouse and rat Affymetrix microarrays and available in the GEO database (33 platforms; Supplementary Table S1 and S2). Only experiments containing more than 10 biological samples were kept for analysis. Overall, this dataset includes 46,564 biological samples hybridized in the context of 1,484 experiments. Each experiment was analyzed independently and subjected to the TS discovery process (k = 100, FDR = 10%, S1..3, Inflation = 2). As mentioned in the Material and Methods section, we rank-transformed data from each biological sample to get a common input for the DBF-MCL algorithm and to allow analysis of a broad range of experiments whose normalization status is frequently unknown. Furthermore, a distance based on Spear [Figure 1 panel labels: Cellular metabolism; Apoptosis; Cell cycle; Ribosome; Extracellular matrix; Cell adhesion; T lymphocyte activation; mRNA metabolism & processing; Oestrogen receptor response; Transcription; B lymphocyte activation; Humoral response.] Figure 1. Results obtained with the GSE1456 dataset. DBF-MCL was run with GSE1456 as input (k = 100, FDR = 10%, S1..3, Inflation = 2). (A) Hierarchical clustering of the GSE1456 dataset. (B) Same as (A), but only informative genes are displayed. (C) The graph construc
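The rank transformation of each biological sample mentioned in this excerpt can be illustrated with a short sketch. It is only an illustration under stated assumptions: the helper names rank_transform and rank_transform_samples are hypothetical, ties receive their average rank (the excerpt does not say how ties were handled), and the small matrix is made-up data, not values from GEO.

```python
def rank_transform(values):
    """Return 1-based ranks of `values`; tied values share their average rank.

    Assumption: average-rank tie handling, which the source does not specify.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


def rank_transform_samples(matrix):
    """Rank-transform every column (sample) of a genes x samples matrix."""
    columns = [rank_transform(list(column)) for column in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


# Hypothetical 3 genes x 2 samples matrix on two very different scales.
expression = [
    [120.0, 3.1],
    [15.0, 0.2],
    [15.0, 7.8],
]
ranked = rank_transform_samples(expression)
print(ranked)
```

Because each sample is replaced by its within-sample ranks, two samples normalized on different scales become directly comparable, which is the stated reason for the transformation.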
205. XBP1 are co-expressed in breast cancer tumors (see [4]). This assumption can be easily verified using RTools4TB. For instance, in the following example, we fetch the transcriptional signature IDs that contain XBP1 & ESR1 & GATA3. Next, the getTBInfo function is used to retrieve the description of the experiment from which they are derived (here only for TS ID 3DE64836D).
  > TS <- getSignatures(field="gene", value="XBP1 & ESR1 & GATA3")
  14 signatures were found for the request gene XBP1 & ESR1 & GATA3
  > head(TS)
  [1] "0F2635383" "3DE64836D" "59A18E225" "8059848B4" "84E5E1077" "SF69864F9"
  > a <- getTBInfo(field="signature", value="3DE64836D", verbose=FALSE)
  > exp <- a["Experiment", 1]
  > info <- getTBInfo(field="experiment", value=exp, verbose=TRUE)
  A result was found for experiment GSE7904
  Name: GSE7904
  Organism: Homo sapiens
  PMID: NULL
  Nb samples: 62
  Title: Expression data from human breast tissue
  Summary: bulk breast tumor RNA from patient
  Abstract: Sporadic basal-like cancers (BLC) are a distinct class of human breast cancers that are phenotypically similar to BRCA1-associated cancers. Like BRCA1-deficient tumors, most BLC lack markers of a normal inactive X chromosome (Xi). Duplication of the active X chromosome and loss of Xi characterized almost half of BLC cases tested. Others contained biparental but nonheterochromatinized X chromosomes or gains of X
206. YTOBAND, GENERIF SUMMARY, PUBMED ID, BIOCARTA, KEGG, PANTHER, REACTOME, BLOCKS, COG, INTERPRO, PFAM, SCOP, SMART, SSF, TIGRFAMS, BIND, NCICB CAPATHWAY, REACTOME, TFBS conserved, CGAP EST QUARTILE, CGAP SAGE QUARTILE, GNF U133A QUARTILE, PIR TISSUE SPECIFICITY, UNIGENE EST QUARTILE, UP TISSUE. Domains: Disease; General annotations; Literature; Pathways; Protein domains; Protein interactions; Tissue expression. TABLE 3.1: List of the main annotations contained in the DAVID knowledgebase, grouped by domain. 3.3 Functional annotation. [Figure 3.6 residue. Terms shown: biological process; cellular process; metabolic process; nitrogen utilization; macromolecule metabolic process; cellular metabolic process; biosynthetic process; cellular biosynthetic process; macromolecule biosynthetic process; primary metabolic process; nitrogen compound metabolic process; cellular nitrogen compound metabolic process; nucleobase-containing compound metabolic process; nucleic acid metabolic process; gene expression; RNA metabolic process; RNA biosynthetic process. Relation legend: Is a; Part of; Has part; Regulates; Positively regulates; Occurs in.] FIGURE 3.6: Example of the structure of the Gene Ontology. B
207. abundantly bound, such as RNA polymerase II, and for locating certain histone modifications such as trimethylation of lysine 4 of histone 3 (denoted H3K4me3), whereas very large quantities will be needed for less abundant proteins or more diffuse histone modifications. Kits exist for working with small numbers of cells, such as the MAGnify kit from Life Technologies, which can be used with 1x10 to 1x10 cells. One of the most crucial points in performing a ChIP is of course the choice of the antibody. It must have a high affinity and a high specificity for the epitope of the protein under study, in order to allow specific enrichment and to give the sequencing a statistically significant signal-to-noise ratio. A monoclonal antibody, which binds a single epitope, is preferred in order to avoid any spurious hybridization reaction. When the antibodies directed against a given transcription factor cannot be used because they do not give good results in ChIP, it is sometimes possible to have cells express the proteins of interest labeled with tags such as Myc, HA (hemagglutinin) or biotin. The antibody used for the ChIP is then specific to the tag and no longer to the prot
208. activity. doi:10.1371/journal.pone.0011671.t003
  PLoS ONE, www.plosone.org, July 2010, Volume 5, Issue 7, e11671. Molecular Mechanisms of DSS.
  Table 4. Pro-inflammatory innate-immunity-related genes present in the DSS gene signature.
  Function | Gene symbol | P value | Var | Main cellular origin | Ref
  microbicidal peptides | DEFA1, DEFA3, DEFA4 | <0.00001 to 0.00007 | 0.25 to 0.44 | PMN neutro, EpC | [49,50]
  microbicidal peptides | CAMP | <0.00001 | 0.34 | PMN neutro, Mo, mast cells, EpC | [49,50]
  microbicidal peptides | LTF | <0.00001 | 0.41 | PMN neutro, inflamed EpC | [50]
  calgranulin proteins | S100A8, S100A9 | <0.00001 to 0.00014 | 0.18 to 0.38 | PMN neutro, Mo, Mac | [51,52]
  calgranulin proteins | S100A12 | <0.00001 | 0.33 | PMN neutro | [51]
  granulocyte enzymes | RNASE2 | 0.00017 | 0.25 | Mo, Mac, Eo, EpC, PMN neutro | [53]
  granulocyte enzymes | MPO | 0.00024 | 0.25 | PMN neutro, Mo, subtypes of tissue Mac | [50]
  granulocyte enzymes | RNASE3 | <0.00001 | 0.29 | Eo, Mo, PMN neutro | [54]
  granulocyte enzymes | MMP8 | <0.00001 | 0.49 | PMN neutro | [50]
  granulocyte enzymes | CTSG | <0.00001 | 0.36 | PMN neutro | [50]
  granulocyte enzymes | ELANE | <0.00001 | 0.39 | PMN neutro | [50]
  pro-inflammatory cytokines and related molecules | IL18 | 0.00052 | 0.21 | Kupffer cells, activated Mac, Mo, DC, EpC | [55]
  pro-inflammatory cytokines and related molecules | IL18BP | 0.00710 | 0.20 | T cells, peripheral blood leukocytes, EC | [55]
  HUGO gene names are indicated. When genes were represented by several clones on the microarray, p-value and variance medians were calculated. Genes in regular and bold are respectively under- and over-expressed in dengue shock syndrome patients. DC: dendritic cell; EC: endothelial cell; Eo
209. ada T., Endo M., Singh M. B. & Bhalla P. L. (2005). Analysis of the histone H3 gene family in Arabidopsis and identification of the male-gamete-specific variant AtMGH3. The Plant journal: for cell and molecular biology, 44(4), 557-68.
  [Olguin-Lamas et al. 2011] Olguin-Lamas A., Madec E., Hovasse A., Werkmeister E., Callebaut I., Slomianny C., Delhaye S., Mouveaux T., Schaeffer-Reiss C., Van Dorsselaer A. & Tomavo S. (2011). A novel Toxoplasma gondii nuclear factor TgNF3 is a dynamic chromatin-associated component, modulator of nucleolar architecture and parasite virulence. PLoS pathogens, 7(3), e1001328.
  [Orphanides et al. 1996] Orphanides G., Lagrange T. & Reinberg D. (1996). The general transcription factors of RNA polymerase II. Genes & development, 10(21), 2657-83.
  [Paquet & Yang 2008] Paquet A. & Yang J. Y. H. (2008). arrayQuality: Assessing array quality on spotted arrays. R package version 1.24.0.
  [Pareek et al. 2011] Pareek C. S., Smoczynski R. & Tretyn A. (2011). Sequencing technologies and genome sequencing. Journal of applied genetics.
  [Parkinson et al. 2011] Parkinson H., Sarkans U., Kolesnikov N., Abeygunawardena N., Burdett T., Dylag M., Emam I., Farne A., Hastings E., Holloway E., Kurbatova N., Lukk M., Malone J., Mani R., Pilicheva E., Rustici G., Sharma A., Williams E., Adamusiak T., Brandizi M., Sklyar N. & Brazma A. (2011). ArrayExpress update: an ar
210. very-high-throughput sequencing offers many applications, ranging from the study of epigenetics to the transcriptome by way of genomics (Figure 1.14 and Table 1.4). However, the cost of very-high-throughput sequencing remains particularly high, which explains why, for transcriptome studies, the pangenomic DNA microarrays described earlier are still very widely used. Nevertheless, RNA-seq by Whole Transcriptome Shotgun Sequencing (WTSS) or Serial Analysis of Gene Expression (SAGE-seq) is useful for the global study of transcripts (mRNA, snRNA, lincRNA, miRNA) and for the detection of alternative transcripts and new genes. [Chapter 1: General introduction.] [Figure 1.13 panel labels: phospholinked hexaphosphate nucleotides; single molecule, polymerase immobilized; thousands of primed single-molecule templates (Pacific Biosciences); micro-reaction well, ion-sensitive layer, ion sensor (Ion Torrent, Life Technologies; adapted from Rothberg et al. 2011, Nature).] FIGURE 1.13: The new generation of sequencers. (A) The Pacific Biosciences technology, based on the SMRT principle. (B) The Ion Torrent and its semiconductor chip for reading a pH differential. Adapted from Metzker (2010) and Rothberg et al. (2011). ChIP-seq, Mnase-seq, FAIRE-seq, Euchro
211. aggregates when cultured in serum-free medium in the presence of EGF and bFGF (Figure 8B and C). Both control and FD hOE-MSCs were able to form spheres in approximately one week, and immunostaining with anti-βIII-tubulin (Figure 8D) and anti-nestin (Figure 8E) antibodies revealed similar staining of both markers in control and FD cells (Figure 8F). Total RNAs isolated from FD spheres, from FD cells cultured in serum during the same period, or from dissociated sphere cells that were reintroduced into serum medium for 24 h were subjected to RT-qPCR. We observed a significant increase of IKBKAP exon 20 inclusion in spheres when compared to hOE-MSCs in serum conditions, as well as a near-disappearance of IKBKAP exon 20 skipping (Figure 8G). Dissociated spheres re-exposed to serum rapidly returned to the initial levels of WT and MU transcripts (Figure 8G). We quantified the expression level of WT and MU transcripts in these three conditions using RT-qPCR and confirmed that sphere formation from FD hOE-MSCs induces IKBKAP mRNA splicing correction (Figure 8H). We also looked for exon 2 and exon 36 alternative splicing events but did not detect significant alterations of the splicing ratio resulting from sphere formation and dissociation (data not shown). Commitment of FD OE-MSCs into neuronal and glial lineages leads to a more severe IKBKAP exon 20 skipping. FD hOE-MSCs were treated for 7 days to induce neuronal di
212. ain entry, as it is used (i) to define the search method, (ii) to write the queries, (iii) to launch database interrogation and (iv) eventually to filter out some of the TS. Filters can be applied to select species of interest and to control the sizes (number of samples and number of genes) of the TS that one wants to analyze. The results area can display two panels: the list of queries the user launched during the session and the list of TS that correspond to the currently selected query. Double-clicking on one or several TS sends them to the selected plugin. The information area is used to display various information about the selected TS, whereas the plugin area is used to select one of the currently installed plugins. Finally, the plugin display panel manages the display of the currently selected plugin. To date, eight plugins have been developed; three of them are presented in this article. The Heatmap plugin is composed of two main panels: the heatmap on the left and the annotation panel on the right (Fig. 3). The Heatmap panel displays a color-coded image of TS expression values. In this representation, each row corresponds to a probe and each column to a sample. Additional information, such as external links, can be retrieved by a single click on genes or samples. Functional enrichment information is available on the right. The TBCommonGenes plugin was developed to compare the gene composition of several
213. al. 2002] Edgar R., Domrachev M. & Lash A. E. (2002). Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic acids research, 30(1), 207-10.
  [Eisen et al. 1998] Eisen M. B., Spellman P. T., Brown P. O. & Botstein D. (1998). Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America, 95(25), 14863-8.
  [Elnitski et al. 2006] Elnitski L., Jin V. X., Farnham P. J. & Jones S. J. M. (2006). Locating mammalian transcription factor binding sites: a survey of computational and experimental techniques. Genome research, 16(12), 1455-64.
  [Enright et al. 2002] Enright A. J., Van Dongen S. & Ouzounis C. A. (2002). An efficient algorithm for large-scale detection of protein families. Nucleic acids research, 30(7), 1575-84.
  [Ewing & Green 1998] Ewing B. & Green P. (1998). Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome research, 8(3), 186-94.
  [Ewing et al. 1998] Ewing B., Hillier L., Wendl M. C. & Green P. (1998). Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome research, 8(3), 175-85.
  [Fedorova & Zink 2008] Fedorova E. & Zink D. (2008). Nuclear architecture and gene regulation. Biochimica et biophysica acta, 1783(11), 2174-84.
  [Feng et al. 2009] Feng C., Araki M., Kunimoto R., Tamon
214. al. 2009 and Pekowska et al. 2010.
  Diagram of the hardware organization used on the IBiSA TGML platform of the TAGC for the acquisition and analysis of very-high-throughput sequencing data produced with SOLiD technology.
  Workflow of sequencing preparation and analysis using the various software packages. Adapted from the Applied Biosystems user manual, SOLiD Experimental Tracking Software (SETS) v4.0.1.
  Overview of the graphical interface of the ICS software controlling the run.
  Tomcat graphical interface of Bioscope installed on the offline cluster, with the available pipelines and an example of a pipeline launch.
  Bioscope .ini parameter file for command-line use.
  ChIP-seq data analysis pipeline.
  Standard raw file formats of the SOLiD, with (A) the notation of the bead identifier and (B) only a few lines of a csfasta file.
  Visualization of read quality using the SETS or FastQC software.
  Choice of the peak detection method and representation of artifacts. Adapted from Pepke et al. 2009 and Rye et al. 2011.
  The various peak-finding software packages. (A) Summary table of the main m
215. al. 2011), 2) exomes, which differ enormously between cell lines (Chang et al. 2011), with in particular 3) non-coding RNAs and their implications in cancer (Martens-Uzunova et al. 2011; Ferdin et al. 2010). This allows us to learn more about the genome, notably in the case of cancers (Meyerson et al. 2010; Aburatani 2011). Even though the main studies concern cancer, there are more fundamental studies, as well as studies of other pathologies such as bacterial or viral infections (Olguin-Lamas et al. 2011; Lu et al. 2010). These techniques have made large-scale projects possible. Therapeutically Applicable Research to Generate Effective Treatments (TARGET) integrates several techniques, from the transcriptome to resequencing by way of mutation detection, across several childhood cancers, in order to select new therapeutic molecules. The European Prospective Investigation into Cancer and Nutrition (EPIC) project addresses not only the determination of molecular signatures of cancers but also the impact of factors such as tobacco and nutrition. The development of very-high-throughput sequencing also enabled the creation of the 1000 Genomes project in 2008, which aims to characterize genomic variations
216. alyze a given experiment. I have contributed to this project since my M2BBSG internship in January 2008, under the supervision of Jean Imbert and Denis Puthier. The internship concerned the development of features for this application. I continued this project during my thesis with further developments and improvements. 4.4 Development of the application. We devised a new partitioning approach to extract, systematically and automatically, groups of co-expressed genes from hundreds of datasets taken from GEO and called GSE (for "Gene Serie Experiment"). To do so, we used the MCL (Markov CLustering) algorithm (Enright et al. 2002) and included a data-filtering step that keeps only the genes showing genuine variation within an experiment, thereby removing the noise inherent in this kind of experiment. This new analysis method was named DBF-MCL, for Density-Based Filtering and Markov CLustering (Figure 4.2). Transcriptional signatures (TS) correspond to groups of genes with similar profiles within an experiment. With this innovative strategy, we extracted 18,250 TS from 1,484 GSE originating from 70 Affymetrix platforms (GPL) of the GEO database. These experiments correspond to studies
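The Markov clustering step named in this excerpt alternates two operations on a column-stochastic matrix: expansion (matrix squaring) and inflation (element-wise power followed by column normalization). The sketch below is a minimal pure-Python illustration of that general scheme, not the implementation used for DBF-MCL; the function names, the added self-loops, the fixed iteration count and the toy graph are all assumptions made for the example. The default inflation of 2 mirrors the "Inflation = 2" setting quoted in excerpt 204.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def markov_clustering(adjacency, inflation=2.0, iterations=50):
    """Toy MCL: repeat expansion and inflation, then read clusters from rows.

    Assumptions: self-loops are added to every node and a fixed number of
    iterations is run instead of testing for convergence.
    """
    n = len(adjacency)
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: element-wise power, then renormalize the columns.
        m = normalize_columns([[value ** inflation for value in row] for row in m])
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:  # rows that kept mass act as cluster attractors
            clusters.add(members)
    return sorted(sorted(cluster) for cluster in clusters)


# Hypothetical graph: two disconnected triangles (nodes 0-2 and 3-5).
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = markov_clustering(graph)
print(clusters)
```

A higher inflation value sharpens the contrast between strong and weak flows and therefore tends to produce smaller clusters.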
217. alysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95:14863-8. [HUMAN MUTATION, Vol. 00, No. 0, 1-11, 2012]
  Esberg A, Huang B, Johansson MJ, Bystrom AS. 2006. Elevated levels of two tRNA species bypass the requirement for elongator complex in transcription and exocytosis. Mol Cell 24:139-148.
  Falk J, Bechara A, Fiore R, Nawabi H, Zhou H, Hoyo-Becerra C, Bozon M, Rougon G, Grumet M, Puschel AW, Sanes JR, Castellani V. 2005. Dual functional activity of semaphorin 3B is required for positioning the anterior commissure. Neuron 48:63-75.
  Fortes P, Bilbao-Cortes D, Fornerod M, Rigaut G, Raymond W, Seraphin B, Mattaj IW. 1999. Luc7p, a novel yeast U1 snRNP protein with a role in 5' splice site recognition. Genes Dev 13:2425-2438.
  Frotscher M. 2010. Role for Reelin in stabilizing cortical architecture. Trends Neurosci 33:407-414.
  Gibb SL, Jeanblanc J, Barak S, Yowell QV, Yaka R, Ron D. 2011. Lyn kinase regulates mesolimbic dopamine release: implication for alcohol reward. J Neurosci 31:2180-2187.
  Hawkes NA, Otero G, Winkler GS, Marshall N, Dahmus ME, Krappmann D, Scheidereit C, Thomas CL, Schiavo G, Erdjument-Bromage H, Tempst P, Svejstrup JQ. 2002. Purification and characterization of the human elongator complex. J Biol Chem 277:3047-3052.
  Hernandez-Montiel HL, Tamariz E, Sandoval-Minero MT, Varela-Echavarria A. 2008. Semaphorins 3A, 3C, and 3F in mesencephalic dopaminergic axon pathfinding. J Comp Neurol 506:387-397.
218. an nodes and empty compartments. An option called "Hide intercompartmental edges" allows users to remove several unlikely edges of the network, notably those involving physical interactions between distant compartments (e.g. an instance of gene A in the nucleus and an instance of gene B in the extracellular regions). When the mouse is over a node or an edge, the corresponding information is provided in the "Infos" tab on the left side of the application. Right-clicking on a node opens a context menu allowing users to (i) open the NCBI web page for this gene, (ii) add regulatory interactions involving this gene and other genes of the network, (iii) move the node to another compartment and (iv) connect to the UCSC genome browser. The action menu provides other tools to expand the network: (i) add all the interactors of the selected genes or (ii) add common interactors of the selected genes. IBrowser can be used with any user-defined gene list, for example genes of interest in a particular experiment. Additionally, the integration of this tool into the TranscriptomeBrowser suite facilitates the analysis of lists corresponding to pre-processed clusters of co-expressed genes stored in the database. The next part of the results and discussion section demonstrates the use of InteractomeBrowser for retrieving molecular interactions in the context of thymocyte differentiation analysis. Case study: early T-cell development in mouse. The development of
219. ance and allow better data management, I set up a MySQL database (version 5.0) with the MyISAM storage engine. This engine supports neither transactions (the grouping of several statements into a single one) nor foreign keys (an integrity constraint on the database). However, I chose this engine because it is simple to set up and because it is generally recommended for applications that mostly issue read queries and therefore few write queries. The absence of transactions was compensated for by functions written as stored procedures. The management of table integrity was implemented at the level of the TBrowser application. However, to limit data redundancy, foreign keys were created, such as expID to identify an experiment and signatureID for a transcriptional signature. Gene identifiers were all mapped to the most widely used identifier, which above all is available for all species in the same format and is the smallest in size: the Gene ID (a numeric value that is easy to store in the database), denoted entrezID in our database (Figure 4.4). The use of indexed flat files containing the expression matrices was retained because, to my knowledge, there is no more efficient way to store this type of
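The indexed flat file mentioned in this excerpt (and in excerpt 198, where a TS ID serves as the key) can be sketched as follows. This is an illustrative layout only: the record format, the in-memory index and the function names write_indexed and read_signature are assumptions, and the actual TBrowser file format is not described in the source. The idea shown is that an index of byte offsets lets a client seek directly to one signature without scanning the whole file.

```python
import io


def write_indexed(records, data_file):
    """Write one line per signature and return {ts_id: byte offset}.

    Hypothetical record format: "<ts_id>\t<row>;<row>;..." where each row is
    a comma-separated list of expression values.
    """
    index = {}
    for ts_id, rows in records.items():
        index[ts_id] = data_file.tell()
        payload = ";".join(",".join(str(value) for value in row) for row in rows)
        data_file.write(f"{ts_id}\t{payload}\n".encode("utf-8"))
    return index


def read_signature(ts_id, index, data_file):
    """Seek straight to the record of `ts_id` and parse its expression matrix."""
    data_file.seek(index[ts_id])
    line = data_file.readline().decode("utf-8").rstrip("\n")
    _key, payload = line.split("\t", 1)
    return [[float(value) for value in row.split(",")] for row in payload.split(";")]


# Made-up expression values; the TS IDs are the ones quoted in the excerpts.
signatures = {
    "3DE64836D": [[1.0, 2.5], [0.5, 4.0]],
    "0F2635383": [[7.0]],
}
flat_file = io.BytesIO()  # stands in for a file opened in binary mode
offsets = write_indexed(signatures, flat_file)
matrix = read_signature("0F2635383", offsets, flat_file)
print(matrix)
```

In a real deployment the offsets would be persisted alongside the data file; keeping them in a dictionary here only keeps the sketch short.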
220. ancer Institute, Vanderbilt Microarray Shared Resource, Genome Institute of Singapore, several of them being related to the MicroArray Quality Control (MAQC) project (GSE5350) [26]. However, to date, no systematic analysis of all experiments performed on these platforms has been carried out. The flexibility of our approach also makes it possible to integrate and compare data obtained through any kind of large-scale analysis technology, provided that the experiment can be represented by a single numerical matrix (ChIP-on-chip, protein arrays, large-scale real-time PCR, ChIP-seq, etc.). Three plugins (Heatmap, TBCommonGenes and TBMap) have been presented in this article, but seven new plugins have recently been developed (manuscript in preparation). In the near future, the ease of plugin development will make it possible to look for TS enriched in genes sharing transcription factor- and miRNA-specific motifs in their non-coding regions. As raw data are available for only some of the microarray datasets, we used the normalized data provided by submitters. These data were subsequently rank-transformed and used for classification. This procedure allowed us to re-analyze a very large number of datasets. The drawback, however, is that the quality status of individual samples or experiments could not be determined (computing the so-called 3'/5' ratio requires raw data). We plan to provide extensive quality control information through a dedicated plugin. H
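The rank transformation mentioned in entry 220 can be sketched as follows in Python. The source does not say how ties were handled or whether ranks were computed per sample, so applying the function to one sample at a time and giving tied values their average rank are assumptions made for this example.

```python
def rank_transform(values):
    """Replace each value by its 1-based rank; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks
```

Working on ranks makes samples comparable whatever normalization the submitters applied, at the cost of discarding the original intensity scale.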
221. anscription elongation due to impaired Elongator activity [Close et al., 2006]. Moreover, gene expression profiling studies have shown that most of the genes differentially expressed between control and FD samples are involved in nervous system development, which correlates with FD physiopathology and with findings from other cellular systems [Chen et al., 2009b; Cohen-Kupiec et al., 2011; Lee et al., 2009]. When we explored the transcriptome of spheres, we hypothesized that such cell populations, maintained in a more undifferentiated state, would likely reveal discriminating markers of the stem state. [HUMAN MUTATION, Vol. 00, No. 0, 1-11, 2012] Interestingly, rather than displaying a profile more consistent with stem cells, we identified nervous system-related genes in spheres. In fact, spheres contain a heterogeneous mixture of cells and progenitors whose identity and proportion still need to be characterized. However, this discrepancy with our hypothesis suggests that spheres can be a relevant model for predicting FD alterations, as also proposed for other diseases such as schizophrenia and Parkinson's disease [Cook et al., 2011; Matigian et al., 2010]. As in studies of all rare diseases, the sample size is unavoidably small, which may lead to moderate differences in gene expression variations. In addition, previous investigations at the genome-wide level aiming to identify transcriptional defects associated with FD used different cel
222. specific ...anscription. Their non-covalent binding to DNA takes place at specific sites called transcription factor binding sites (TFBS), in order to activate or inhibit the expression of a given gene. [Section 1.3, Regulation of gene expression] 1.3.2 Regulatory sequences and sequence-specific transcription factors. The modulation of gene expression is made possible by the assembly of proteins, such as transcription factors, bound to DNA at regulatory sequences. This spatiotemporal control of gene expression within the organism allows tissue specificity to be established and maintained, and involves numerous signaling pathways and transcriptional regulatory networks [Naef & Huelsken, 2005; Zhang et al., 2004; Visel et al., 2009a]. These regulatory regions are of several types: (1) promoters, when they are located close to the transcription start site of the coding region; (2) enhancers, when they lie at a distance from the transcription start site and potentiate the action of the promoter; (3) silencers, when they lie, like enhancers, at a distance from the gene but repress it; and (4) insulators, which correspond to a regulatory sequence affecting the interact
223. anscripts as well as LUC7L and ZNF280D expression were analyzed by RT-qPCR. Each gene was normalized using WOR59 as a reference gene. hOE-MSCs to two consecutive rounds of 24-hr treatment with 80 µM kinetin followed by a 24-hr wash-out. At each 24-hr time point with kinetin treatment, we analyzed gene expression by RT-qPCR and observed that WT IKBKAP transcripts and LUC7L expression increased, while MU IKBKAP transcripts and ZNF280D expression decreased (Fig. 5B). This variation in expression returned to basal levels during the wash-out period. As expected, in control cells kinetin treatment modulated expression of LUC7L and ZNF280D without acting on IKBKAP WT isoforms. These results strongly suggest that kinetin may increase the efficiency of 5' ss recognition in the FD context through the recruitment of U1 snRNP. Genes Involved in mRNA Splicing Display an IKBKAP-Like Pattern of Expression. When analyzing gene expression data, it is informative to include a clustering algorithm to find groups of genes that behave similarly over a number of experiments [Eisen et al., 1998; Slonim, 2002]. To better understand the FD physiopathology, and since IKBKAP represents the best biomarker to discriminate between control and FD samples as well as between samples with or without kinetin treatment, we wanted to identify genes with an expression pattern similar to that of IKBKAP. We used hierarchical clustering to create den
224. by HTS, at various levels of abstraction (adapted from [Fullwood et al., 2009])
1.15 Diagram of the objectives of the consortium working on deciphering human epigenomes, the IHEC (International Human Epigenome Consortium). This figure is taken from the IHEC consortium website.
2.1 The different types of representation: (A) scatter plot, (B) diagram, (C) histogram, (D) box-and-whisker plot
2.2 Principle of the quantile method
2.3 Example of the structure of an R library (here the R library limma), with (A) its file architecture at the source-code level and (B) its architecture after compilation and installation of the library
2.4 Summary diagram of DNA microarray data analysis, including the AgiND library
3.1 Distribution of Student's law
3.2 Volcano plot representation
3.3 Representation of the value of d obtained for each gene i, i.e. d(i), as a function of the simulated value dg(i)
3.4 Representation of a data set: (A) heatmap of the raw data, (B) hierarchical classification and (C) partitioning by the k-means method
3.5 Interpretation d
225. aracteristic of granulocyte/neutrophil activity [51] and involved in a diversity of inflammatory diseases [56], as well as the granulocyte-related metalloprotease MMP8, are also over-expressed. The increased abundance of those transcripts cannot be explained by an increase in granulocyte count, since DSS patients have lower relative granulocyte counts than their DF and DHF counterparts (median values: DF 3900, DHF 3950, DSS 2500; p-value 0.03, Kruskal-Wallis test); it thus more likely reflects cellular activation. Altogether, those results show that a transcriptional pattern of innate defense genes is activated in the whole blood of DSS children. [PLoS ONE, www.plosone.org] The second pro-inflammatory gene pattern identified is typical of altered cholesterol homeostasis in monocytes/macrophages that characterises inflammatory lipid-laden monocytes/macrophages (lipid-laden Mo/Mac), a subtype of foam cells initiating vascular lesions in metabolic inflammatory diseases [57-59] (Table 5: non-exhaustive list; individual p-values available in Table S2). Since the PPARG gene, which encodes a nuclear lipid receptor involved in lipid signaling and lipid homeostasis in inflammatory lipid-laden Mo/Mac [65], has a very strong association with the dengue disease phenotype, we searched whether other genes involved in cholesterol homeostasis in Mo/Mac had altered expression in the DSS gene signature. Remarkably, we found a large lipid-laden Mo/Mac-related gene
226. arrett et al., 2005. The first lets users visualize, gene by gene, the expression profile across the samples of the experiment (Figure 4.1 B). The second provides users with precomputed classifications of genes and samples for a large number of experiments, together with selection tools based on supervised analysis (Figure 4.1 D). However, these tools remain limited in terms of information retrieval as well as of its representation and interpretation. [Chapter 4. Mining DNA microarray data] FIGURE 4.1 Web interface of Gene Expression Omnibus (GEO). (A) GEO Profiles makes it possible to ret
227. ased layout that makes use of a subset of the Gene Ontology to map gene products onto relevant cell compartments. This layout is particularly powerful for the visual integration of heterogeneous biological information and is a productive avenue for generating new hypotheses. The second objective of InteractomeBrowser is to fill the gap between interaction databases and dynamic modeling. It is thus compatible with the network analysis software Cytoscape and with the Gene Interaction Network simulation software GINsim. We provide examples illustrating the benefits of this visualization tool for the analysis of large gene sets related to thymocyte differentiation. Conclusions: The InteractomeBrowser plugin is a powerful tool for quick access to a knowledge database that includes both predicted and validated molecular interactions. InteractomeBrowser is available through the TranscriptomeBrowser framework and can be found at http://tagc.univ-mrs.fr/tbrowser. Our database is updated on a regular basis. Introduction: In the last decade, the advent of high-throughput technologies led to the emergence of the systems biology era and prompted the research community to systematically define the expression levels of mRNAs and microRNAs (miRNAs) across thousands of cells and tissues under physiological and pathological conditions [1]. One of the crucial issues now is to define the biological mechanisms that drive gene expression, with the ultimat
228. at belong to dense regions (DKNN < cut-off) are conserved. Nodes: genes. An edge exists between two genes if one of them belongs to the k nearest neighbors of the other. Graph partitioning: Markov clustering (MCL, Stijn van Dongen). Transcriptional signature extraction. FIGURE 4.2 Principle of the DBF-MCL algorithm. The gene-gene distance matrix is generated for each pair of genes. It is then used to obtain the distances to the k nearest neighbors, where k = 150. These distances are then compared with those of a theoretical distribution obtained by resampling the observed distances to the k nearest neighbors. This makes it possible to compute a threshold value, for an FDR of 10% for example. A graph is then generated in which a node corresponds to a gene and an edge links a gene to its k nearest neighbors. Finally, the MCL (Markov Clustering) algorithm is used to partition this graph into groups of genes corresponding to transcriptional signatures. boolean matrices with, in rows, all the probes of a given DNA microarray platform and, in columns, the transcriptional signatures contained in TBrowser and obtained by the DBF-MCL algorithm. For a probe i and a transcriptional signature j, this matrix contains a 1 if the signature contains the probe and a 0 if this s
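The filtering and graph-building steps summarized in entry 228 can be sketched in Python as follows. This is not the DBF-MCL implementation: the Euclidean distance, the function names and the fact that the cut-off is passed in by the caller (rather than derived from simulated DKNN values and an FDR) are assumptions made to keep the example short, and the final MCL partitioning step is left out.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbours(profiles, k):
    """Map each gene to its k nearest neighbours as (distance, gene) pairs."""
    result = {}
    for gene, profile in profiles.items():
        distances = sorted(
            (euclidean(profile, other), name)
            for name, other in profiles.items()
            if name != gene
        )
        result[gene] = distances[:k]
    return result


def dense_knn_graph(profiles, k, cutoff):
    """Keep genes whose DKNN is below the cut-off and link them to their kNN."""
    neighbours = nearest_neighbours(profiles, k)
    # DKNN: distance to the k-th nearest neighbour of each gene.
    dknn = {gene: nb[-1][0] for gene, nb in neighbours.items()}
    kept = {gene for gene, d in dknn.items() if d < cutoff}
    edges = set()
    for gene in kept:
        for _, other in neighbours[gene]:
            if other in kept:
                edges.add(tuple(sorted((gene, other))))
    return dknn, kept, edges
```

The resulting edge set is what a graph-partitioning step such as MCL would then split into groups of genes.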
229. ation of insertion, deletion and inversion events (Figure 1.12 B). Paired-end sequencing, or PET (Paired End Tag), offers various advantages depending on the intended application (Table 1.3). For Chromatin ImmunoPrecipitation (ChIP), one speaks of ChIP-seq when the libraries are fragment libraries and of ChIP-PET [Wei et al., 2006] when they are paired-end. This technique increases the specificity and the delimitation of transcription factor binding sites. As shown in Table 1.3, this sequencing mode is commonly used for various applications because it clearly improves the efficiency and quality of read alignment. Mate pair: this sequencing mode sequences 2 fragments of the same size (50 nucleotides) that are 1 to 10 kb apart on the genome, a distance exceeding the size of the fragments required to build the libraries. It allows genome re-sequencing (Re-seq) in order to study long-range rearrangements [Shendure et al., 2005] such as indels (insertions or deletions), large duplications and deletions, inversions, translocations and ploidy anomalies. Building mate-pair libraries thus allows the oriented sequencing of large
230. spectacular ...ation of spot density, the most frequently used arrays are pangenomic, meaning that the probes interrogate all known transcripts of a genome as well as a few unannotated sequences. In addition to messenger RNA sequences, they sometimes include sequences corresponding to lincRNAs. Different formats and types of arrays, also called platforms, are distinguished according to spot density, the nature and manufacturing method of the probes (in situ synthesis by photolithography or ink-jet printing), the nature of the targets, the hybridization methods and the field of application. Several commercial companies have developed DNA microarrays, among them Agilent Technologies, Affymetrix, GE Healthcare, Life Technologies (Applied Biosystems) and Illumina. Since the UMR_S 928 TAGC unit chose in 2007 to install a commercial Agilent transcriptome platform, that platform is described below. [Chapter 1. General introduction] 1.2.2 The particular case of Agilent DNA microarrays. The technology developed in the 1990s by Agilent Technologies uses a rigid hybridization support, a glass slide, which allows a high density of probes to be deposited by an ink-jet printing technique. In a first step, cRNAs labelled with a fluorochrome, obtained
231. ative analysis of the gene patterns altered in DSS children, we deciphered part of the complex interactive molecular processes occurring during DSS, highlighting similarities between DSS and other major inflammatory processes. Finally, we identified unexpected pro-inflammatory innate immune responses activated in the whole blood cells of DSS children that may play a major role in DSS pathophysiology. [Molecular Mechanisms of DSS] The implications of the present findings for the improvement of DSS prognosis and treatment are discussed. Materials and Methods. Ethics statement: The global study and all protocols presented here were approved by the national Cambodian ethical committee. Written informed consent was obtained from the legal guardians of each child. To ensure strict anonymity of the patients, samples were encoded as PLxxx (Plasma Leakage). Patients and clinical data: The inclusion criteria were age 1 to 15 years, positive diagnosis of acute dengue infection assessed by different methods, and absence of known chronic inflammatory disease or ongoing acute co-infection at the time of inclusion. An eligible cohort of 83 dengue-infected children hospitalised at the Kampong Cham provincial hospital, Cambodia, was prospectively enrolled from July to September 2007, during the large 2007 dengue outbreak in Cambodia, which was characterized by a high number of DSS cases. Children diagnosed with acute dengue
232. auses a tissue-specific skipping of exon 20, resulting in lower synthesis of IKAP/hELP1 protein. To better understand the specificity of neuron loss in FD, we modeled the molecular mechanisms of IKBKAP mRNA splicing by studying human olfactory ecto-mesenchymal stem cells (hOE-MSCs) derived from FD patient nasal biopsies. We explored how the modulation of IKBKAP mRNA alternative splicing impacts the transcriptome at the genome-wide level. We found that the FD transcriptional signature was highly associated with biological functions related to the development of the nervous system. In addition, we identified target genes of kinetin, a plant cytokinin that corrects IKBKAP mRNA splicing and increases the expression of IKAP/hELP1. We identified this compound as a putative regulator of splicing factors and added new evidence for a sequence-specific correction of splicing. In conclusion, hOE-MSCs isolated from FD patients represent a promising avenue for modeling the altered genetic expression of FD, demonstrating a methodology that can be applied to a host of other genetic disorders to test the therapeutic potential of candidate molecules. Hum Mutat 00:1-11, 2012. © 2011 Wiley Periodicals, Inc. Familial dysautonomia (FD; Riley-Day syndrome, hereditary sensory and autonomic neuropathy type III; MIM# 223900) is a rare neurodegenerative disease with autosomal recessive inheritance and a carrier frequency of 1 in 31 in the Ashkenazi Jewish popu
233. ave = FALSE)
NULL
Once the field argument is set, one needs to provide a value as input. For instance, the following query uses a gene name as input, with value PCNA:
> res <- getSignatures(field = "gene", value = "PCNA")
> head(res)
Transcriptional signature IDs can also be obtained by selecting the relevant experiment IDs, platform IDs and probe IDs. To get all transcriptional signature IDs associated with the GSE2004 experiment, one should use the following syntax:
> res <- getSignatures(field = "experiment", value = "GSE2004")
23 signatures were found for the request GSE GSE2004
To get all signatures obtained on the GPL96 platform, use the following syntax:
> res <- getSignatures(field = "platform", value = "GPL96")
3377 signatures were found for the request GPL GPL96
Moreover, as all signatures were tested for functional enrichment using keywords from the DAVID knowledgebase, these terms can be used to query the database. DAVID collects a wide range of annotations from several databases, including GO, BIOCARTA, KEGG, PANTHER and BBID. The annotationList dataset contains the annotation terms:
> data(annotationList)
> names(annotationList)
[1] "Keyword"   "TableName"
> attach(annotationList)
> annotationList[1:4, ]
                       Keyword TableName
1               1.RBphosphoE2F      BBID
2  100.MAPK_signaling_cascades      BBID
3        104.Insulin_signaling      BBID
4 105.Signaling_glucose_uptake      BBID
> table(TableName)
Ta
234. bachata and kizomba very soon on the dance floor; let us hope that by then I will not have forgotten everything. I extend my thanks to all my collaborators, from whom I learned a great deal during these 4 years. In particular, I warmly thank Dr El Chérif Ibrahim for his advice, his invaluable help and the critical comments he provided during the laborious writing of this manuscript. To my friends from Prédiguard, Angela and Florence, and to all my friends in the laboratory, present and past (Jacky, Brigitte, Alex, Laura, Mimz, Luca, Sève, Nath, Cyrille and all the others), many thanks for your support, for all the good laughs and for the unique working atmosphere of the TAGC. To Jacques and Sam who, despite their very busy schedules, gave me their time during many constructive discussions, and for their advice. To my friends Martine and Jean Louis, for the good sandwiches that fed me during these thesis years at often unusual hours: thank you for your friendship. A thought for my large family and for Christophe's, in particular my parents and my brother Olivier. You have always been there for me, even you, Olivier, who left to live far from us in China. Thank you for your support and for the comfort you gave me during the moments of doubt and stress of these last years, and for having always pushed me
235. bleName
                 BBID              BIOCARTA      COG_KOG_ONTOLOGY
                   57                   468                    22
             CYTOBAND GENETIC_ASSOCIATION_DB        GOTERM_BP_ALL
                  526                    68                  1273
        GOTERM_CC_ALL         GOTERM_MF_ALL         INTERPRO_NAME
                  328                   639                   777
         KEGG_PATHWAY         KEGG_REACTION        OMIM_PHENOTYPE
                  334                    78                    10
      PANTHER_PATHWAY             PFAM_NAME   PIR_HOMOLOGY_DOMAIN
                  104                   531                    86
 PIR_SUPERFAMILY_NAME             PUBMED_ID            SMART_NAME
                  160                  4887                   235
      SP_PIR_KEYWORDS
                  568
The selected terms can be used to select TS IDs. In this case, the user should define a q-value. For instance, one can select TS enriched in genes related to the HSA04110 CELL CYCLE KEGG pathway, with the q-value threshold set through the qValue argument (20 in the call below):
> cc <- getSignatures(field = "annotation", value = "HSA04110:CELL CYCLE", qValue = 20)
66 signatures were found for the request annotation HSA04110 CELL CYCLE
Of note, one can also search for TS IDs containing genes located in the same chromosomal region. For instance, one can select TS IDs enriched in genes located in the 8q region, which is frequently amplified or deleted in tumors. This will point out the biological contexts in which sets of genes located in the 8q region share the same expression profile, suggesting amplifications or deletions in some biological samples:
> query <- paste(grep("8q", Keyword, val = T), collapse = "|")
> query
[1] "8q13|8q21|8q21.11|8q21.2|8q22.1|8q22.3|8q24|8q24.13|8q24.3"
> cc <- getSignatures(field = "annotation", value = query, qValue = 10)
4 signatures were found for the request annotation 8q13 8q21
236. bserved in our study was PMEPA1 (4.92-fold change), encoding the TMEPAI protein, which has recently been reported to be a direct target of the TGF-β signaling pathway and is involved in cell growth, cell differentiation and apoptosis [54]. Due to its important cellular function and repeated reports of its dysregulation in FD cells, it would be very interesting to test TMEPAI in further studies. In agreement with previous studies correlating a decreased expression of IKAP/hELP1 with defects in cell migration [10,11,22,34], the Boyden chamber assay shows that FD hOE-MSCs have decreased migration potential compared to control cells (Figure 5). [December 2010, Volume 5, Issue 12, e15590. OE-MSCs as a Model for FD] Figure 9. WT/MU ratio is decreased in differentiated hOE-MSCs. Phase contrast microscopy of FD hOE-MSCs cultured for 48 h in either serum (A) or N2B27 conditions (B), or with the rafnshh cocktail including retinoic acid, forskolin and Sonic hedgehog (C), or rafnshh for 7 days (D). Details of connections established between cells and extensive cellular arborization after 7 days in rafnshh condition are shown in (E) and (F). (G) Agarose gel electrophoresis of semi-quantitative RT-PCR
237. FIGURE 1.8 Principle of the three major very-high-throughput sequencing technologies (panels: pyrosequencing, Illumina sequencing by synthesis, and sequencing by ligation). Adapted from [Metzker, 2010]. Principle of the GS FLX Titanium chemistry from Roche. The technique marketed by Roche is based on PCR amplification
238. cal pathways identified using IPA, genes having the strongest association with the disease phenotype based on ANOVA analysis, and similarities to molecular patterns altered in other systemic inflammatory processes associated with endothelial dysfunction. Results. Patient characteristics: To identify gene patterns specifically altered in DSS patients, we compared three groups of carefully matched paediatric patients representing the main clinical forms of symptomatic dengue infections: DF (n = 16), DHF (n = 13) and DSS (n = 19), according to the 1997 WHO classification criteria of dengue severity [29]. Altogether, DF, DHF and DSS represent different subtypes of the "disease phenotype" variable further considered in this study. The clinical characteristics and values of haematological parameters are presented in Table 1 (median values from each patient group) and Table S1 (individual values from each of the 48 patients included). Supportive treatments provided to DSS patients are mentioned. As indicated, DSS children had significantly lower relative neutrophil counts (median values: DF 3900, DHF 3950, DSS 2500; p-value 0.03, Kruskal-Wallis test). Unsupervised hierarchical clustering discriminates DSS children from DF/DHF ones, revealing a DSS gene signature: Since microarray data analysis can be affected by a number of biases [40], we took particular care with the study design and analysed data from the 48 normalized microarrays using multi-way analysis of
239. ccess to a wide range of heterogeneous sources of gene annotation. 152,543 annotation terms were used for human, 105,207 for mouse and 39,787 for rat. DAVID ID mapping was obtained for 218,727 AffyID. A Perl script that integrates calls to the R software was run to load probe lists and iteratively calculate Fisher's exact test p-values on 2x2 contingency tables. [GEO Datamining with TBrowser] Bonferroni-adjusted p-values were calculated using the multtest Bioconductor library for all TS. Overall, 5 10 Fisher's exact tests were performed. User interface: TBrowser is accessible through a web browser at the TAGC web site (http://tagc.univ-mrs.fr/tbrowser). Of note, the TBrowser client is extensible through a plug-in architecture that allows rapid development of additional features. A developer's guide will be available soon on our website. Supporting Information. Figure S1: A schematic overview of the pipeline used in TBrowser. Found at: doi:10.1371/journal.pone.0004001.s001 (10.16 MB TIF). Figure S2: An illustration in two dimensions of the motivation behind the DBF-MCL filtering step. Arrows point out the 20th nearest neighbor for selected points. The length of each segment corresponds to a given DKNN value. Found at: doi:10.1371/journal.pone.0004001.s002 (8.22 MB TIF). Figure S3: Distributions of DKNN values. Observed DKNN values (solid line) and a set of simulated DKNN values S (dotted line) are shown for
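The enrichment test of entry 239 can be illustrated with a small Python sketch. The original pipeline used a Perl script calling R and the multtest Bioconductor library; the code below is only a stand-in that computes a one-sided Fisher's exact test from the hypergeometric distribution and a Bonferroni adjustment. The one-sided alternative (enrichment) and the function names are assumptions, since the source does not specify them.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    a: genes in the signature carrying the annotation
    b: genes in the signature without it
    c: annotated genes outside the signature
    d: remaining genes
    Returns P(X >= a) under the hypergeometric null.
    """
    row1 = a + b          # size of the signature
    col1 = a + c          # number of annotated genes
    n = a + b + c + d     # all genes on the platform
    upper = min(row1, col1)
    favourable = sum(
        comb(col1, x) * comb(n - col1, row1 - x) for x in range(a, upper + 1)
    )
    return favourable / comb(n, row1)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For a table such as [[3, 1], [1, 3]], the function sums the probabilities of observing 3 or 4 annotated genes in the signature. Because `math.comb` works on exact integers, the result does not suffer from overflow on platform-sized gene counts.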
240. blood cells, from whole-blood cells (PAXgene Blood RNA, Qiagen), in 48 young Cambodian patients prospectively recruited during the 2007 dengue epidemic and presenting distinct clinical outcomes, distributed as follows: DF (n = 16), DHF (n = 13) and DSS (n = 19). [Chapter 3. DNA microarray data analysis] [Figure: dengue virus infection is either asymptomatic or symptomatic; symptomatic forms comprise undifferentiated fever, dengue fever (DF) and dengue hemorrhagic fever, without shock (DHF, grades I and II) or with shock (DSS, grades III and IV).] FIGURE 3.8 Clinical classification of dengue established in 1997 by the WHO, and location of the region of origin of the young Cambodian patients. (Open access, freely available online.) Genome-Wide Expression Profiling Deciphers Host Responses Altered during Dengue Shock Syndrome and Reveals the Role of Innate Immunity in Severe Dengue. Stéphanie Devignot, Cédric Sapet, Veasna Duong, Aur
241. centred by the standard deviation of the corresponding sample. It is also possible to use the Median Absolute Deviation (MAD), a more robust estimator of data dispersion. However, this normalization assumes that the observed biases are due to global factors affecting all genes (label incorporation, hybridization quality, experimental protocols); it therefore does not account for any regional or intensity-dependent effects (local background noise), which are normally handled in earlier preprocessing. 2.2.3.2 Normalization by local regression. The LOWESS (LOcally WEighted Scatterplot Smoothing) method, proposed by Cleveland in 1979 [Cleveland, 1979] and developed by Cleveland and Devlin in 1988, specifically designates a locally weighted polynomial regression method. Depending on the degree of the polynomial used, one speaks of LOWESS or LOESS: for a polynomial of degree 1, i.e. a linear regression, the method is called LOWESS, whereas LOESS is used for degree 2. This type of normalization is the most commonly used for two-color DNA microarrays. It assumes that the expression of the majority of genes is unchanged. 2.2.3.3 Quantile normalization. Quantile normalization makes the intensity distributions uniform across a set of samples
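Quantile normalization, introduced at the end of entry 241, can be sketched in a few lines of Python. This is a generic textbook version written for illustration, not the code used in the thesis: each sample is given as a list of intensities of equal length, ties are broken by position rather than averaged, and the function name is hypothetical.

```python
def quantile_normalize(samples):
    """Give every sample the same intensity distribution.

    samples: list of equal-length lists, one list of intensities per array.
    The reference distribution is the mean, rank by rank, of the sorted
    samples; each value is replaced by the reference value of its rank.
    """
    if not samples:
        return []
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of values")
    sorted_samples = [sorted(s) for s in samples]
    reference = [sum(column) / len(samples) for column in zip(*sorted_samples)]
    normalized = []
    for sample in samples:
        # Indices of the sample ordered by increasing intensity.
        order = sorted(range(n), key=lambda i: sample[i])
        out = [0.0] * n
        for rank, index in enumerate(order):
            out[index] = reference[rank]
        normalized.append(out)
    return normalized
```

After normalization, every sample contains exactly the same set of values, only arranged according to the original ordering of its genes.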
242. chematic. This exon inclusion also induced a frameshift and resulted in a premature stop codon whose relative location may lead to NMD of this new isoform (Figure 6D). We confirmed this exon 36 inclusion with specific primers (Figure 6B, lower panel). Both new alternative splicing events we described were also observed in other cell types (fibroblasts, HeLa, peripheral mononuclear cells; data not shown), and we decided to focus on the two major splicing events: full exon 2 inclusion and exon 36 skipping. We derived the tools (plasmids, primers, probes) to perform absolute
Figure 4. Relative levels of expression of PMEPA1 and S100A16 transcripts determined by RT-qPCR. RT-qPCR using total RNAs extracted from 4 controls and 4 FD hOE-MSCs at cell passages 2, 4 and 7. Histograms represent the mean value of PMEPA1 (A) and S100A16 (B) transcript expression level relative to 3 reference genes (ABL1, HPRT1 and RPLP0) in control (grey) and FD samples (black). Error bars denote standard errors. *P<0.05, **P<0.01, ***P<0.001 using two-tailed Student's t test. doi:10.1371/journal.pone.0015590.g004
243. chive of microarray and high-throughput sequencing-based functional genomics experiments. Nucleic Acids Research, 39 (Database issue), D1002-4.
[Pastinen et al., 2000] Pastinen, T., Raitio, M., Lindroos, K., Tainola, P., Peltonen, L. & Syvänen, A.-C. (2000). A system for specific, high-throughput genotyping by allele-specific primer extension on microarrays. Genome Research, 10(7), 1031-42.
[Pekowska et al., 2010] Pekowska, A., Benoukraf, T., Ferrier, P. & Spicuglia, S. (2010). A unique H3K4me2 profile marks tissue-specific gene regulation. Genome Research, 20(11), 1493-502.
[Bibliography]
[Pepke et al., 2009] Pepke, S., Wold, B. & Mortazavi, A. (2009). Computation for ChIP-seq and RNA-seq studies. Nature Methods, 6(11 Suppl), S22-32.
[Perou et al., 2000] Perou, C. M., Sørlie, T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., Rees, C. A., Pollack, J. R., Ross, D. T., Johnsen, H., Akslen, L. A., Fluge, O., Pergamenschikov, A., Williams, C., Zhu, S. X., Lønning, P. E., Børresen-Dale, A. L., Brown, P. O. & Botstein, D. (2000). Molecular portraits of human breast tumours. Nature, 406(6797), 747-52.
[Place et al., 2008] Place, R. F., Li, L. C., Pookot, D., Noonan, E. J. & Dahiya, R. (2008). MicroRNA-373 induces expression of genes with complementary promoter sequences. Proceedings of the National Academy of Sciences of the United States of America, 105(5), 1608-13.
[Ponting et al., 2009] Ponting, C. P., Oliver
244. The latter has the same hardware organization and the same software environment as the online cluster (Figure 5.4). However, it offers greater computing power thanks to more recent processors. These 2 clusters are each composed of a master node (head node) and four slave nodes (node). This type of compute farm is called a Beowulf cluster. Their operating system, following the requirements of Life Technologies, is CentOS (Community ENTerprise Operating System), a free GNU/Linux distribution derived from Red Hat and mainly intended for servers. This system is physically installed only on the master node of the two servers; the four slave nodes load their system into memory at boot time using the Scyld software. Finally, the Torque software (Terascale Open-source Resource and QUEue Manager), a free version of PBS (Portable Batch System), combined with a task scheduler, lets the master node manage the distribution of tasks across the cluster nodes and handles the commands for job submission and monitoring. In total, these two clusters have 40 computing cores; by comparison, a desktop computer generally has 2 cores (dual core). For medium- and long-term data storage, the laboratory has 4 network storage units, also called network-attached storage
245. cost. Nevertheless, this drawback is mitigated by the fact that this sequencing mode yields much more information than ChIP-on-chip. Moreover, this cost difference is gradually fading with technological advances; a library preparation time longer than that of sample preparation for hybridization on DNA microarrays, with more complex protocols; the PCR amplification (Figure 1.7) used by these HTS sometimes leads to amplification biases, some reads being over-amplified while others are under-amplified [Mutter & Boynton 1995]; possible sequencing errors (regions with a low GC percentage [Siddiqui et al. 2006]) and alignment errors (repeated sequences and telomeric regions [Dohm et al. 2008]); a longer analysis time and a need for more computing resources to store and process a large volume of data (in Gb for ChIP-seq rather than in Mb for ChIP-on-chip); a more complex data analysis (statistical models, data normalization). 5.1.5 The theoretical model of sequence distribution. After alignment of the immunoprecipitated sequences to the genome or to a reference sequence, two types of distribution can be studied depending on the type of experiment: the localization of the binding sites of a transcription factor
246. coexpressed in given conditions, thus creating context-specific views of the interactome and regulome. IBrowser is intended both for biologists and bioinformaticians. On one hand, it is a graph-based knowledge browser that is intended to provide new insight into any user-defined gene list. On the other hand, it is also intended to fill the gap between heterogeneous genomic data and gene regulatory network analysis. In this regard, graphs produced inside IBrowser may be exported into Cytoscape and GINsim, a dynamic modeling software [23]. In the following sections we provide several examples underlining the benefits of this visualization tool for large gene set analysis. Implementation. We first used phylogenetic footprinting to predict regulatory elements in the human and mouse genomes. A dataset of 1,213 PWMs corresponding to mouse or human transcription factors was obtained from various sources (TRANSFAC 10.2, JASPAR 2010, UNIPROBE). The multiz28way (with hg18 as a reference) and the multiz30way (with mm9 as a reference) cross-species multiple alignments were obtained from UCSC [24]. We retained for analysis alignments flanking transcription start sites on both sides (-3000, +3000) of any RefSeq transcript and devoid of coding sequences. Sequences were scored following the commonly used formula [25]: $$\mathrm{SCORE} = \sum_{w=0}^{W-1} \log \frac{P(\text{seeing } S_{p+w} \text{ at position } w \mid \text{PWM})}{P(\text{seeing } S_{p+w} \text{ at position } w \mid \text{Background model})}$$ whe
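The log-odds scoring described in this excerpt can be illustrated with a minimal R sketch. It is not the IBrowser implementation: the 3-column PWM, the uniform background model, the test sequence and the helper names `score_window` and `scan_sequence` are all hypothetical, chosen only to show how a window starting at position p is scored against a PWM of width W.

```r
# Hypothetical 4 x W position weight matrix (rows = bases, columns = motif
# positions). Filled column by column, so the consensus motif is "AGC".
pwm <- matrix(c(0.7, 0.1, 0.1, 0.1,
                0.1, 0.1, 0.7, 0.1,
                0.1, 0.7, 0.1, 0.1),
              nrow = 4,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# Assumed background model: uniform base frequencies.
background <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)

# Log-odds score of the window of sequence s that starts at position p.
score_window <- function(s, p, pwm, background) {
  W <- ncol(pwm)
  bases <- strsplit(substr(s, p, p + W - 1), "")[[1]]
  stopifnot(length(bases) == W)
  sum(vapply(seq_len(W), function(w) {
    log(pwm[bases[w], w] / background[[bases[w]]])
  }, numeric(1)))
}

# Score every window of width W along the sequence.
scan_sequence <- function(s, pwm, background) {
  n_windows <- nchar(s) - ncol(pwm) + 1
  vapply(seq_len(n_windows),
         function(p) score_window(s, p, pwm, background),
         numeric(1))
}

scores <- scan_sequence("TTAGCAA", pwm, background)
best <- which.max(scores)  # start position of the best-scoring window
```

In a phylogenetic footprinting setting, the same scan would be run on each aligned species and only hits conserved across species would be retained; that step is not shown here.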
247. creased lipid peroxidation activity [113,114], insufficiently compensated by antioxidant mechanisms [121], as supported by related altered gene patterns identified in this study (Table 5), may result in high levels of circulating ox-LDL contributing to altered cholesterol metabolism. Differences in nutritional status [122 124] or host genetics may also contribute to altered homeostasis of cholesterol gene pattern in the blood cells of DSS patients. Interestingly, transcripts encoding molecules considered candidates to diseases characterized by impairment of cholesterol homeostasis, such as NPC1, PCSK9 and PPARG [66,67,74], have significantly altered abundance in the blood cells of DSS children (Table 5). Further investigations should consider possible associations between DSS and allelic variants of such genes. Whatever the determinants of cholesterol metabolism alterations in DSS patients, our results reinforce interest in considering sub-fractions and total cholesterol as putative biomarkers of DSS [115]. They also suggest that drugs used to treat metabolic disorders such as atherosclerosis deserve further attention for the control of such a pro-inflammatory process in dengue-infected patients, as now proposed for other critical illnesses [117]. Transcriptional activation of the lipid-related arachidonic acid pathway in the whole blood cells of DSS children at the time of shock was another pro-inflammatory mechanism relevant to the pathophysiology
248. croarrays. Note that AgiND requires at least R version 2.5.0. Briefly, the AgiND package contains high-level functions for:
• diagnosis: boxplot, color-coded images and MA plot for both gProcessedSignal/rProcessedSignal and gMeanSignal/rMeanSignal intensities;
• normalization: lowess or quantiles method;
• conversion of an AgilentNorm or an AgilentNormRG object into an ExpressionSet object, to ensure compatibility with other Bioconductor packages.
2 Getting started
2.1 Load the AgiND library
• After installation, load the AgiND package using the library function:
> library(AgiND)
• Example data are located in the R installation path of the AgiData package. For the purposes of this demonstration, the user should change the working directory using the following commands:
> library(AgiData)
> setwd(system.file(package = "AgiData"))
Note: if MIAME and phenoData files are provided, they should be located in the ExpData directory and named miame and phenodata.txt, respectively. Type the following commands to access help:
> help(AgiND)
> help.search("AgiND")
2.2 Note about quantification files
The library supports quantification files with a .txt extension derived from the Agilent Feature Extraction software. As users may perform both one-channel (one sample) and two-channel (two samples) hybridization, the AgiND package was developed to handle both approaches. For data acquisition, user sh
249. ctrophoresis of semi-quantitative RT-PCR products obtained from four control and four FD hOE-MSCs cultivated in sphere and kinetin-treated/untreated differentiation (rafnshh) conditions. IKBKAP transcripts are identified as WT for the correct transcript and MU for the exon 20-skipped transcript. B: Relative RT-qPCR was performed using cDNAs from the same samples of the three conditions. P < 0.01; P < 0.05. Ct mean values for all samples from each condition were used and normalized with Ct mean values of WDR59. we compared the control and FD sphere samples to the rafnshh-treated samples without kinetin. After conducting a significance analysis of microarrays (SAM), we visualized as a heatmap that more than 3,000 transcripts are differentially expressed (false discovery rate, FDR = 0) between spheres and neuroglial progenitors (Supp. Fig. S1). Of these genes, we analyzed only those with a fold change (FC) greater than 10 and grouped them under five types of biological processes: nervous system development, cell adhesion, WNT/Shh signaling pathway, proteolysis, and retinoic acid activity (Supp. Table S1). All of the processes appear to be related to the factors added in culture media. Indeed, genes that show the greatest fold changes are involved in retinoic acid activity (RARRES1, DHRS3, RARRES2, RARB) and the WNT/Shh signaling pathway (SFRP4, CP, WNT11). In general, genes related to the nervous system are more highly expres
250. cytokine storm is thought to be central to the systemic microcirculatory failure and massive plasma leakage leading to cardiovascular decompensation characterizing DSS [5]. However, controversies exist regarding the nature of pathogenic host immune responses supporting this life-threatening syndrome [6 8]. Indeed, reactivation of cross-reactive memory T lymphocytes and increased infection of monocytes mediated by cross-reactive antibodies acquired during previous infections by distinct dengue virus serotypes are the main hypothetical mechanisms proposed to explain the putative cytokine storm leading to plasma leakage [5 9]. However, those hypotheses fail to explain the occurrence of DSS in patients having primary dengue infection, and their relevance to the pathophysiology of DSS disease is discussed [8 10]. (July 2010, Volume 5, Issue 7, e11671) Efforts to identify soluble biomarkers of severe dengue, differentiating uncomplicated dengue infections from severe ones, have led to the identification of a diversity of cytokines, chemokines, endothelial agonists or soluble endothelial molecules [11-18]. However, discrepancies in the definition of dengue severity and variability in patient cohort characteristics, as well as in the techniques and markers investigated, have impaired the identification of reliable sets of DSS biomarkers and the possibility of getting a global overview of biological markers altered during DSS. Understanding the molecular basis of DSS and id
251. d Tag; RiboNucleic Acid (ARN in French); Significant Analysis of Microarrays; SOLiD Experimental Tracking Software; Small Nucleotide Polymorphism; Sequencing by Oligonucleotide Ligation and Detection; Terabytes; TranscriptomeBrowser's Transcriptional Signature; Transcription Start Site. Thesis summary. Following studies in biology carried out at the IUT (Génie Biologie, Analyses Biologiques et Biochimiques) and then at the Université de Toulon et du Var, I began my training in bioinformatics in 2006 by joining the first year of the master's program in Bioinformatique, Biochimie Structurale et Génomique (BBSG) at the Faculté des Sciences de Luminy (Université de la Méditerranée, Aix-Marseille II). During this program I completed two bioinformatics internships within the joint Inserm / Université de la Méditerranée unit UMR_S 928, named Technologies Avancées pour le Génome et la Clinique (TAGC), under the supervision of Dr Denis Puthier, then co-supervised with Dr Jean Imbert, who joined the TAGC in July 2007. The TAGC laboratory conducts research projects in genomics and bioinformatics, most of them with a medical application, and hosts an IBiSA-labelled Transcriptomics and Genomics platform based on DNA microarray technology and very-high-throughput sequencing, named Transcriptome Génomique Marseille Luminy (TGML). These projects combine analyses
252. d find a signature defining T cells that contained numerous cell surface markers (e.g. TCA, CD2, CD3G, CD6, IL2RB, IL2RG, IL7R, IL21R and ICOS), signaling genes (ZAP70, LAT, LCK, ITK) and cytotoxicity-related genes (GZMA, GZMB, GZMH, GZMK and PRF1). Concerning B cells, three clusters were observed. A large signature contains mature B-cell markers (CD19, CD22, CD72 and CD79B) and transcription factors important in B-cell development, such as PAX5 and TCL1A. A second signature contains POU2AF1 (OBF-1) together with its described targets, genes coding for immunoglobulins (IGHG1, IGHG3, IGHA1, IGHM, IGJ, IGKC and IGL), and the B-cell maturation factor TNFRSF17 (BCMA) [18,19]. The third B-cell signature contains cell surface markers found in immature B cells (CD24, VPREB1, IGLL1/CD179B and CR2/CD21) in addition to transcription factors known to play a crucial role during early B-cell development (TCF3, SPIB and CUTL1). The NK signature contains eight genes of the killer cell immunoglobulin-like receptor (KIR) family and 3 genes of the killer cell lectin-like receptor family, in addition to other markers whose expression has been reported on the surface of NK cells (CD160, CD244/2B4 and CD226) [20,21,22]. (December 2008, Volume 3, Issue 12, e4001) It also contains TBX21 (T-bet) together with IL18R1, IL18RAP, IL12RB2 and IFNG. Importantly, the IL12/IL18 combination has been shown to be a potent inducer of both TBX21 (T-bet) and IFNG in NK cells [23,24]. In addition
253. d from control and FD biopsies express the neural stem cell-specific marker nestin (Figure 1D and 1E) and the immature neuronal marker β-III tubulin (Figure 1F and 1G) in the same proportions (Figure 1H and 1I). A comparatively low GFAP staining was observed in every hOE-MSC (Figure 1J and 1K). In addition, cells were negative for a mature neuronal marker, MAP2 (Figure 1L and 1M). This analysis suggests that both control and FD hOE-MSCs display properties of neuroglial progenitor cells. Expression of IKBKAP transcripts is dramatically reduced in FD hOE-MSCs. IKBKAP mRNA expression was investigated in cultures of 5 control and 4 FD hOE-MSCs at early (P1, P2) and later cell passages (P5, P9). A semi-quantitative RT-PCR analysis revealed that, while control hOE-MSCs expressed exclusively the WT mRNA transcript (Figure 2A, left panel), FD hOE-MSCs expressed the WT but also the MU transcript (Figure 2A, right panel). We also demonstrated that long-term culture conditions and trypsin-EDTA-mediated cell passages did not affect the IKBKAP gene expression pattern. In order to determine more accurately the level of expression of IKBKAP alternative transcripts, we designed primers, probes and plasmid calibrators to perform absolute quantification using quantitative real-time RT-PCR (RT-qPCR) on the same samples. Strikingly, WT transcripts were [figure panel labels: CTRL hOE-MSC, FD hOE-MSC; GFAP]
254. in cancers. The analyses I carried out on ChIP-seq samples obtained from cancer cell lines (breast cancers or glioblastomas) showed that the number of peaks detected in amplified regions is higher than in other regions of the genome. Indeed, genomic amplification can be defined as a genetic process leading to the selective multiplication of the number of copies of a gene, or of a limited group of adjacent genes defining an amplicon, which contributes to oncogenesis in several tumor types (http://www.sanger.ac.uk/genetics/CGP/Census/amplification.shtml). It remains to be determined whether this is due to the genomic amplification itself, which artificially enriches the number of randomly immunoprecipitated fragments corresponding to background noise, or to important regulatory regions. To answer this question, the field is gradually moving toward resequencing tumor genomes [Ross & Cronin 2011]. Indeed, each cancer cell line has its own amplification, differing from the others in copy number and in genomic regions [Stephens et al. 2009]. Development perspectives. Development of the pipeline will continue as part of a position on the platform. Indeed, this pipeline, currently used in-house, needs to be harmonized with the other pipeline developed on the platform and
255. dcentral.com/imedia/1789905813646762/supp5.bed. Additional file 6: TBMC hs bed (4849K), http://www.biomedcentral.com/imedia/1455949074646762/supp6.bed. Additional file 7: Video tutorial doc (9K), http://www.biomedcentral.com/imedia/1319895762646762/supp7.doc. 4.6.3 Transcriptional maps for the TBMap plugin. I created stored procedures that generate transcriptional maps for various species from this new database, in which the information is no longer organized in the same way. I also created a script that generates a transcriptional map from a list of genes, for all signatures containing at least one gene of this list. Since our database now contains many more species, I also modified the original script to accept HomoloGene IDs as input, which makes it possible to obtain a map covering several closely related species. These maps can also be visualized with tools such as Treeview and TMeV, software for DNA microarray data analysis. 4.7 Programmatic access to the TBrowser database. To give expert users access to our database through programming tools, we developed web services and an R library that accesses them, with the aim of
256. major advances in understanding the mechanisms of development and the treatment of these diseases, although this is not always straightforward and sometimes requires statistical tests or more contested approaches, for example in studies with few or no replicates. As the analyses carried out in this chapter show, it yields good-quality, reproducible molecular signatures, whether for monogenic diseases such as Familial Dysautonomia or for more complex diseases such as infectious diseases like dengue. It should nevertheless be noted that, depending on the pathology studied, these signatures are more or less extensive and sometimes require additional experiments. From high-throughput to very-high-throughput techniques. The recent development of very-high-throughput sequencing techniques and the many discoveries concerning the regulation of gene expression, such as the role of miRNAs and lincRNAs, have allowed DNA microarrays to evolve and now offer the study of these non-coding RNAs. Microarrays dedicated to the study of the thousands of miRNAs discovered in the human genome have been created, whereas lincRNAs, present in smaller number (about 200), were simply added to expression microarrays. The field is therefore moving
257. of patients of different sexes S (female, male), which can be represented by the formula $$\log(y_{ijks}) = \mu + G_i + E_j + T_k + S_s + (GE)_{ij} + (GT)_{ik} + (GS)_{is} + (ET)_{jk} + (ES)_{js} + (TS)_{ks} + \varepsilon_{ijks}$$ where G, E, T and S represent, respectively, the effects due to genes, samples, cancer type and sex. Interactions between two of these factors are written in parentheses, as for example (TS), which corresponds to the interaction between cancer type k and sex s. The differentially expressed genes will be those for which the interaction with samples (GE) has the lowest p-values. (Chapter 3. DNA microarray data analysis.) 3.2 Unsupervised classification methods. Classification of genes or samples can be obtained by (1) supervised methods, if the differential expression of genes in different groups of samples according to their phenotype is taken into account, or (2) unsupervised methods, that is, without prior knowledge, based on the whole set of samples. Various unsupervised classification methods have been applied to the identification of profiles in gene expression data. They can be divided into 2 categories: grouping methods (hierarchical clustering) and partitioning methods (k-means, self-organizing maps) into n groups of genes, or clusters. Various free tools
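The two families of unsupervised methods named in this excerpt can be sketched in base R as follows. This is only an illustration on simulated data: the expression matrix, the choice of a correlation-based distance with average linkage, and k = 2 are assumptions made for the example, not settings taken from the thesis.

```r
set.seed(1)

# Simulated expression matrix: 20 genes x 6 samples, built from two
# groups of 10 genes with opposite profiles plus a little noise.
profile_a <- c(2, 2, 2, -2, -2, -2)
expr <- rbind(
  t(replicate(10, profile_a + rnorm(6, sd = 0.3))),
  t(replicate(10, -profile_a + rnorm(6, sd = 0.3)))
)
rownames(expr) <- paste0("gene", 1:20)

# Grouping method: hierarchical clustering of genes, using
# 1 - Pearson correlation as the distance and average linkage.
gene_dist <- as.dist(1 - cor(t(expr)))
tree <- hclust(gene_dist, method = "average")
hc_groups <- cutree(tree, k = 2)

# Partitioning method: k-means into k = 2 clusters of genes.
km <- kmeans(expr, centers = 2, nstart = 10)

# Cross-tabulate the two partitions to see whether they agree.
agreement <- table(hierarchical = hc_groups, kmeans = km$cluster)
```

Hierarchical clustering returns a full tree that can be cut at any level, whereas k-means needs the number of groups up front; which one fits best depends on whether the number of expected profiles is known in advance.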
258. from experimental design to data storage, including data processing and analysis. Schematic representation of the regulatory regions allowing transcriptional modulation of gene expression. Representation of covalent histone modifications with, in A, the chromatin structure with its histone octamers (adapted from http://www.mun.ca/biology/scarr/Histone_Protein_Structure.html); in B, the three-dimensional structure of a nucleosome with the position of the main histone modifications (taken from [Wolffe & Hayes 1999]); and finally, in C, the various N-terminal modifications of histones H2A, H2B, H3 and H4 (adapted from [Lacoste & Côté 2003]). Interaction of DNA methylation, histone modifications, nucleosome positioning and the other factors allowing regulation of gene expression, such as transcription factors and small RNAs. Distribution of the different very-high-throughput sequencing technologies worldwide in December 2011: A, geographical distribution; B, distribution in number and percentage of the main models of very-high-throughput sequencers (total number: 1670); and C, main sequencing centers (source: http://pathogenomics.bham.ac.uk/hts).
259. dgehog. 37 genes were identified by SAM (FDR 10%) as distinguishing all FD cultures from control cultures, with a predominance of genes playing a key role in nervous system function. Comparing our study with the 5 transcriptomes from GEO allowed us to identify about a hundred genes whose expression variations between control and FD (or IKAP/hELP1 knock-down) samples are conserved in at least two independent studies. Among the processes that appear to be recurrently altered in FD, we identified neuronal differentiation, cell migration and adhesion, and regulation of apoptosis. Finally, among the genes deregulated by kinetin, we highlighted for the first time two splicing factors involved in recognition of the 5' splice site (5'ss). This opens new avenues for deciphering the mode of action of kinetin on the splicing of the IKBKAP pre-mRNA. All of this work is the subject of a publication in press (Human Mutation). The source data are available on GEO under the identifier GSE27915. Olfactory Stem Cells, a New Cellular Model for Studying Molecular Mechanisms Underlying Familial
260. different lanes of high-throughput sequencing data. PloS one, 6(8), e23683. [Golub et al. 1999] Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D. & Lander, E. S. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science (New York, N.Y.), 286(5439), 531-7. [Gommans & Berezikov 2012] Gommans, W. M. & Berezikov, E. (2012). Sample preparation for small RNA massive parallel sequencing. Methods in molecular biology (Clifton, N.J.), 786, 167-78. [Good et al. 2006] Good, B. M., Kawas, E. A., Kuo, B. Y. L. & Wilkinson, M. D. (2006). iHOPerator: user-scripting a personalized bioinformatics Web, starting with the iHOP website. BMC bioinformatics, 7, 534. [Govin et al. 2004] Govin, J., Caron, C., Lestrat, C., Rousseaux, S. & Khochbin, S. (2004). The role of histones in chromatin remodelling during mammalian spermiogenesis. European journal of biochemistry / FEBS, 271(17), 3459-69. [Guttman et al. 2009] Guttman, M., Amit, I., Garber, M., French, C., Lin, M. F., Feldser, D., Huarte, M., Zuk, O., Carey, B. W., Cassady, J. P., Cabili, M. N., Jaenisch, R., Mikkelsen, T. S., Jacks, T., Hacohen, N., Bernstein, B. E., Kellis, M., Regev, A., Rinn, J. L. & Lander, E. S. (2009). Chromatin signature reveals over a thousand highly conserved large non-c
261. drograms that capture the degree of similarity for each gene. An illustrative set of selected genes is shown in Supp. Figure S3A. Next, we looked for the cluster of genes that includes IKBKAP (Supp. Fig. S3B). Significantly, among the few genes in the same cluster as IKBKAP, we identified DDX42, which encodes SF3b125, an RNA helicase involved in spliceosome assembly [Will et al., 2002], and NHP2L1 (nonhistone chromosome protein 2-like 1), which binds the 5' stem-loop of U4 snRNA and may play a role in late-stage spliceosome assembly [Nottrott et al., 1999]. Discussion. Genome-wide expression studies have been widely used in an effort to identify signatures that can define pathologies. In this study, we proposed to use properties of hOE-MSCs to perform a transcriptome analysis of FD. These cells have been used as nervous system replacement cells in mice [Nivet et al., 2011] and demonstrate a potential to differentiate into nervous cell types [Delorme et al., 2010; Murrell et al., 2005]. Importantly, this novel patient-derived cellular model has allowed us to modulate IKBKAP alternative splicing by exposing cells to different culture conditions [Boone et al., 2010]. In this study, we discuss the opportunity to use hOE-MSCs derived from FD patients to analyze the transcriptional differences due to the alteration or improvement of IKBKAP mRNA alternative splicing. We focused on identifying gene expression differences in FD using two different
262. duals as an experimental model. This allowed us to modulate the rate of IKBKAP exon 20 skipping in vitro by varying culture conditions, either to produce spheres with epidermal growth factor (EGF) and basic fibroblast growth factor (bFGF) or to stimulate neuroglial differentiation with a rafnshh cocktail including all-trans retinoic acid, forskolin, and sonic hedgehog [Boone et al., 2010]. In this study, we performed the comparative transcriptome analysis between spheres and rafnshh-treated hOE-MSCs and also investigated the effect of kinetin at the genome-wide level. Materials & Methods. Purification of hOE-MSCs. Human nasal mucosae were obtained by biopsying five FD patients (four females and one male, aged 10-16 years) at the Dysautonomia Treatment and Evaluation Center, New York. Biopsies from five healthy controls (four females and one male, aged 10-34 years) were collected by the ENT Department in Marseille (University Hôpital Nord, France). Samples were obtained under a protocol approved by the local ethical committees in New York and Marseille. Biopsies were harvested as previously described [Boone et al., 2010] to obtain an olfactory cell culture of hOE-MSCs. Cells are routinely cultivated with DMEM/HAM'S F12 containing 10% FBS at 37 °C in the presence of 5% CO2. Kinetin solution (1 mg/ml; Sigma-Aldrich, St. Louis, MO) was diluted in DMEM/HAM'S F12 at 100 µM concentration for dose-effect experiments and at 80 µM in ex
263. dysautonomia. Muscle Nerve 29: 352-63. 5. Anderson SL, Coli R, Daly IW, Kichula EA, Rork MJ, et al. (2001) Familial dysautonomia is caused by mutations of the IKAP gene. Am J Hum Genet 68: 753-8. 6. Slaugenhaupt SA, Blumenfeld A, Gill SP, Leyne M, Mull J, et al. (2001) Tissue-specific expression of a splicing mutation in the IKBKAP gene causes familial dysautonomia. Am J Hum Genet 68: 598-605. 7. Dong J, Edelmann L, Bajwa AM, Kornreich R, Desnick RJ (2002) Familial dysautonomia: detection of the IKBKAP IVS20+6T>C and R696P mutations and frequencies among Ashkenazi Jews. Am J Med Genet 110: 253-7. 8. Cuajungco MP, Leyne M, Mull J, Gill SP, Lu W, et al. (2003) Tissue-specific reduction in splicing efficiency of IKBKAP due to the major mutation associated with familial dysautonomia. Am J Hum Genet 72: 749-58. 9. Hawkes NA, Otero G, Winkler GS, Marshall N, Dahmus ME, et al. (2002) Purification and characterization of the human elongator complex. J Biol Chem 277: 3047-52. 10. Close P, Hawkes N, Cornez I, Creppe C, Lambert CA, et al. (2006) Transcription impairment and cell migration defects in elongator-depleted cells: implication for familial dysautonomia. Mol Cell 22: 521-31. 11. Creppe C, Malinouskaya L, Volvert ML, Gillard M, Close P, et al. (2009) Elongator controls the migration and differentiation of cortical neurons through acetylation of alpha-tubulin. Cell 136: 551-64. 12. Solinger JA, Paolinelli R, Kloss H, Scorza FB, Marchesi S, et al. (20
264. e innate immunity and TLRs play a central pathogenic role [134]. DSS pathophysiology: a secondary inflammatory loop hypothesis. To summarize, we report the identification of a specific gene expression profile in the blood cells of DSS children at the time of shock, characterizing DSS as a unique entity at the transcriptional level, whatever the immunological status of children regarding primary or secondary infection. Major immunological alterations identified at the time of shock are characterized by an altered balance between depressed T-lymphocyte responses and exacerbated compensatory and pro-inflammatory innate immune responses that may finally be detrimental to the host [135 137], while functional studies should confirm the contribution of those molecular mechanisms to DSS pathophysiology. Based on recent knowledge on molecular mechanisms altered in other systemic inflammatory diseases, DSS may result from a complex pro-inflammatory network involving a diversity of innate immune effectors sustaining a secondary systemic inflammatory loop, leading in turn to vascular homeostasis breakdown and the systemic microcirculatory failure characterizing DSS (Figure 4). We suggest that drugs available to treat metabolic and other systemic chronic inflammatory diseases could be considered for the treatment of dengue-infected patients before shock occurs, and that a number of biomarkers found altered in DSS patients' blood cells should be evaluated as putativ
265. e DSS gene signature, we provide an integrative overview of the transcriptional responses altered in DSS children. In particular, we show that the transcriptome of DSS children's blood cells is characterized by a decreased abundance of transcripts related to T and NK lymphocyte responses and by an increased abundance of anti-inflammatory and repair/remodeling transcripts. We also show that unexpected pro-inflammatory gene patterns at the interface between innate immunity, inflammation and host lipid metabolism, known to play pathogenic roles in acute and chronic inflammatory diseases associated with systemic vascular dysfunction, are transcriptionally active in the blood cells of DSS children. Conclusions/Significance: We provide a global, while non-exhaustive, overview of the molecular mechanisms altered in DSS children and suggest how they may interact to lead to final vascular homeostasis breakdown. We suggest that some mechanisms identified should be considered putative therapeutic targets or biomarkers of progression to DSS. Citation: Devignot S, Sapet C, Duong V, Bergon A, Rihet P, et al. (2010) Genome-Wide Expression Profiling Deciphers Host Responses Altered during Dengue Shock Syndrome and Reveals the Role of Innate Immunity in Severe Dengue. PLoS ONE 5(7): e11671. doi:10.1371/journal.pone.0011671. Editor: Patricia T. Bozza, Fundação Oswaldo Cruz, Brazil. Received January 20, 2010; Accepted June 22, 2010; Published July 20, 2010. Copyright 20
266. 1.3.3 Chromatin, histones and epigenetic marks. 1.3.4 Non-coding RNAs. 1.3.5 Epigenetics and epigenomes. 1.4 Very-high-throughput sequencing techniques. 1.4.1 Principles of very-high-throughput sequencing. 1.4.2 Analysis techniques based on HTS sequencing. 1.5 Contributions of DNA microarray and very-high-throughput sequencing techniques. 1.6 Programming languages for data analysis. 1.1 Study of pathologies. Pathology is a branch of medicine whose object is the study of diseases, in particular their causes, mechanisms, development and symptoms. A relatively recent and popular misuse of language treats the word pathology as a synonym of the word disease, using it for any pathological alteration of a biological mechanism or process. Most diseases are multifactorial, that is, they have several alterations or causes. Their occurrence depends on the environment (in the case of infection, for example), on the individual's life history, and also on the predispositions conferred by his or her genetic heritage (for hereditary diseases). In this case, genetic factors only predispose
267. e alternative solution. Indeed, user may fill the slots by invoking:
> dataPhenoD <- data.frame(x = 1:4, y = rep(c("Brain", "Heart"), 2), z = I(LETTERS[1:4]), row.names = paste("Sample", 1:4, sep = "_"))
> metaData <- data.frame(labelDescription = c("Numbers", "Tissue", "Condition"))
> PhenoD.myob <- new("AnnotatedDataFrame", data = dataPhenoD, varMetadata = metaData)
> PhenoD.myob
rowNames: Sample_1, Sample_2, Sample_3, Sample_4
varLabels and varMetadata: x: Numbers; y: Tissue; z: Condition
> Miame.myob <- new("MIAME", name = "Experience name", lab = "INSERM TAGC ERM206", title = "There is an exemple of MIAME file", contact = "Mr Dupond", url = "http://tagc.univ-mrs.fr", abstract = "an abstract describing the experiment")
> Miame.myob
Experiment data
Experimenter name: Experience name
Laboratory: INSERM TAGC ERM206
Contact information: Mr Dupond
Title: There is an exemple of MIAME file
URL: http://tagc.univ-mrs.fr
PMIDs:
Abstract: A 5 word abstract is available. Use 'abstract' method.
Furthermore, user can call the read.AnnotatedDataFrame and read.MIAME functions:
> PhenoD.myob <- read.AnnotatedDataFrame(filename = paste(getwd(), "phenoData.txt", sep = "/"), sep = "\t", head = T, fill = NA, quote = "")
> Miame.myob <- read.MIAME(filename = paste(getwd(), "miame", sep = "/"))
2.4 Object informations
2.4.1 Class description
Complete description of sl
268. ... significant difference (no biological effect) between two or more groups, at a risk α of being wrong. The result of a hypothesis test is a probability, called the p-value, which grows as the observed phenomenon (the variation in the expression of a gene between two or more conditions) becomes more likely to be due to chance. The main way of controlling the type I error, or risk α, is the FDR (False Discovery Rate). It estimates the proportion q of errors (false positives) among the genes considered differentially expressed. FDR methods are generally more powerful and less conservative than other approaches such as the family-wise error rate (FWER). Depending on the threshold chosen by the user, the selection of differentially expressed genes is more or less stringent; a standard FDR threshold of 5% is generally used. 3.1.1 t-test. Student's t-test compares the means of two groups of samples and determines, for a fixed risk α, whether these means are significantly different for each gene (Callow et al., 2000). This parametric test can be carried out in a paired or unpaired manner. Paired tests are more powerful because pairing by sample (when the same sample is used before and after a treatment, for example) reduces the variability in gene expression that differs from one sam
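As an illustration of the FDR-based selection described above, here is a minimal Python sketch of the Benjamini-Hochberg step-up adjustment. Choosing Benjamini-Hochberg is an assumption (the excerpt only speaks of "FDR methods"), and the function names `bh_adjust` and `select_genes` as well as the example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted in increasing order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(pvalues, fdr=0.05):
    """Indices of the genes called differentially expressed at the given FDR."""
    return [i for i, q in enumerate(bh_adjust(pvalues)) if q <= fdr]


if __name__ == "__main__":
    pvals = [0.001, 0.008, 0.039, 0.041, 0.60]  # hypothetical per-gene p-values
    print(bh_adjust(pvals))
    print(select_genes(pvals, fdr=0.05))
```

With a 5% threshold only the genes whose adjusted value is at most 0.05 are retained; raising the threshold makes the selection less stringent, as the text notes.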
269. ... is statistically enriched in genes associated with this biological process or signaling pathway. Several statistical methods can be used for this comparison (Draghici et al., 2003). They include (1) the chi-squared test, (2) Fisher's exact test, (3) the hypergeometric distribution and (4) the binomial test. The chi-squared test is simple to compute, but it gives only an approximate p-value and is limited to cases where the number of observations of each type (for example, the over-expressed genes that appear under the keyword) is greater than five. With fewer than five observations, an alternative is Fisher's exact test, which computes the exact probability of seeing the observed number of occurrences. Otherwise, the probability that a specific number of genes from a class occurs in a gene list can be computed with the hypergeometric distribution. This distribution is used for sampling from finite populations but approaches the binomial distribution for a large number of samples. Since microarrays generally contain probes representing tens of thousands of mRNAs, this binomial approximation can be used. These statistical tests yield p-values that describe the probability of obtaining the observed result. Permutations and corrections for ...
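The hypergeometric computation mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, its arguments and the toy numbers are hypothetical, and the upper tail P(X >= observed) is used as the enrichment p-value (which corresponds to a one-sided Fisher's exact test).

```python
from math import comb


def hypergeom_enrichment_pvalue(n_total, n_annotated, n_selected, n_overlap):
    """Probability of drawing at least n_overlap annotated genes when n_selected
    genes are drawn without replacement from n_total genes, n_annotated of
    which carry the annotation (upper tail of the hypergeometric law)."""
    upper = min(n_annotated, n_selected)
    denom = comb(n_total, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Toy numbers: 20 genes on the array, 5 annotated with the keyword,
    # 4 genes in the signature, 2 of which carry the annotation.
    print(hypergeom_enrichment_pvalue(20, 5, 4, 2))
```

Exact integer arithmetic (`math.comb`) keeps the sketch free of rounding problems for small tables; for arrays with tens of thousands of probes a dedicated statistics library would be a more practical choice.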
270. e for functional genomics data sets 10 years on. Nucleic Acids Res 2011, 39:D1005-1010. 2. Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES, Kellis M: Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 2005, 434:338-345. 3. Pique-Regi R, Degner JF, Pai AA, Gaffney DJ, Gilad Y, Pritchard JK: Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res 2011, 21:447-455. 4. Gehlenborg N, O'Donoghue SI, Baliga NS, Goesmann A, Hibbs MA, Kitano H, Kohlbacher O, Neuweger H, Schneider R, Tenenbaum D, Gavin A-C: Visualization of omics data for systems biology. Nat Methods 2010, 7:S56-68. 5. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003, 13:2498-2504. 6. Le Béchec A, Portales-Casamar E, Vetter G, Moes M, Zindy P-J, Saumet A, Arenillas D, Theillet C, Wasserman WW, Lecellier C-H, Friederich E: MIR@NT@N: a framework integrating transcription factors, microRNAs and their targets to identify sub-network motifs in a meta-regulation network model. BMC Bioinformatics 2011, 12:67. 7. Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, Doerks T, Stark M, Muller J, Bork P, Jensen LJ, von Mering C: The STRING database in 2011: functiona
271. e goal of reverse engineering gene regulatory networks (GRN) as a whole in order to predict the system outcome under molecular perturbations. One current limit for biologists interested in mining regulatory information, or for bioinformaticians interested in creating regulatory maps for modeling, is that this information is scattered over the Internet under various formats, making it difficult to handle. Thus, one needs to create a unified database that would list known and predicted molecular interactions. This information can be obtained from different sources: (i) from the literature, (ii) from large-scale experimental methods that allow genome-wide profiling of transcription factor (TF) binding sites on DNA, or (iii) from DNA sequence analysis, by searching 3' UTR regions for miRNA-specific motifs or by scanning gene promoters with transcription factor-specific position weight matrices (PWMs). In the latter case, the use of comparative genomics is known to greatly improve predictions of functional TF binding sites by limiting the number of false positives, though at the cost of a higher false-negative rate [2,3]. Another limit of GRN analysis is the intrinsic complexity of the data. In this regard, several graph-based tools have been developed to draw a global picture of the putative interactions taking place in the biological context of interest (for a review see reference [4]). In these, genes or proteins appear as nodes in a graph and functional relations
272. e is obtained by sampling n distance values from the gene-gene distance matrix D and by extracting the kth smallest value. This procedure is repeated n times to obtain a set of simulated DKNN values S [December 2008 | Volume 3 | Issue 12 | e4001]. As shown in Figure 3 (dotted line), the variance of the simulated DKNN values is very low compared to that observed using the real dataset. Indeed, we can think of simulated DKNN values as the distances to the kth element if no structure existed in the associated space. In this case, we would expect elements to be uniformly spread throughout the space and the variance of DKNN values to be low. In practice, several sets Si are computed and thus several distributions of simulated DKNN values are obtained. For each observed DKNN value d, a false discovery rate (FDR) value is estimated by dividing the mean number of simulated DKNN values below d by the number of observed values below d. The critical value of DKNN is the one for which a user-defined FDR value (typically 10%) is observed. Given a set of selected genes, the next issue is to partition them into homogeneous clusters. This step is achieved through a graph partitioning procedure. In the created graph, edges are constructed between two genes (nodes) if one of them belongs to the k nearest neighbors of the other. Edges are weighted based on the respective coefficient of correlation 16 similarity, and the graph obtained is partitioned using the Markov CLustering Al
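The filtering step described above (observed DKNN, simulated DKNN, FDR estimate, critical value) can be sketched in Python as follows. This is a hedged illustration, not the DBF-MCL implementation: all function names are hypothetical, sampling is assumed to be with replacement from off-diagonal distances, "below d" is read as "less than or equal to d", and the critical value is assumed to be the largest observed DKNN whose estimated FDR stays within the target.

```python
import random


def observed_dknn(dist, k):
    """Distance from each gene to its k-th nearest neighbour."""
    n = len(dist)
    values = []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        values.append(others[k - 1])
    return values


def simulated_dknn(dist, k, rng):
    """One set S of n simulated DKNN values drawn from the distance matrix."""
    n = len(dist)
    # Assumption: only off-diagonal distances are sampled, with replacement.
    pool = [dist[i][j] for i in range(n) for j in range(n) if i != j]
    return [sorted(rng.choice(pool) for _ in range(n))[k - 1] for _ in range(n)]


def estimate_fdr(d, observed, simulated_sets):
    """Mean count of simulated values <= d over the count of observed values <= d."""
    n_obs = sum(1 for x in observed if x <= d)
    if n_obs == 0:
        return 1.0
    counts = [sum(1 for x in s if x <= d) for s in simulated_sets]
    mean_sim = sum(counts) / len(simulated_sets)
    return min(1.0, mean_sim / n_obs)


def critical_dknn(observed, simulated_sets, fdr=0.10):
    """Largest observed DKNN whose estimated FDR is within the target, or None."""
    ok = [d for d in observed if estimate_fdr(d, observed, simulated_sets) <= fdr]
    return max(ok) if ok else None


if __name__ == "__main__":
    points = [0.0, 1.0, 2.0, 10.0]  # toy one-dimensional "expression profiles"
    dist = [[abs(a - b) for b in points] for a in points]
    rng = random.Random(0)
    obs = observed_dknn(dist, k=1)
    sims = [simulated_dknn(dist, k=1, rng=rng) for _ in range(3)]
    print(obs, critical_dknn(obs, sims))
```

Genes whose observed DKNN is at or below the critical value would be kept for the graph partitioning step; the partitioning itself (MCL) is not covered by this sketch.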
273. ... measured in gigabytes (GB), or even terabytes (TB). 5.2.1 Hardware and software organization. To align the sequencing reads and analyze the results generated by an experiment, it is essential to have a powerful compute farm (the English term "cluster" is commonly used) and dedicated storage units able to store the TB [Chapter 5. Study of transcriptional regulation by HTS] [Figure 5.2: Theoretical distribution of sequenced fragments after alignment to a reference sequence, with (A) the definition of a peak (reads on the sense and antisense strands of the DNA fragment enriched by ChIP), where d is the sonication size, and (B) the different peak profiles (transcription factor, e.g. CTCF; histone modification, e.g. H3K27me3; RNA polymerase II). Adapted from Wilbanks & Facciotti, 2010 and Kidder et al., 2011.] [5.2 HTS informatics] [Figure residue: (A) chromatin immunoprecipitation; genome browser tracks for preadipocytes and adipocytes.]
274. e on the microarray, and S100A16 (Figure 4B) were significantly underexpressed in FD samples. The expression pattern of these two candidate genes was essentially identical at all passages with the 3 reference genes, which demonstrates the validity and reliability of the array data. [PLoS ONE | www.plosone.org] FD OE-MSCs migration is altered compared to controls. To explore the functional consequence of a down-regulated expression of genes involved in cell migration in FD hOE-MSCs compared to control cells, we used the Boyden's chamber assay. After comparing the migration pattern of 3 control and 3 FD hOE-MSCs in serum medium and serum-free medium (ITS), we determined that FD cell invasion is significantly reduced compared to control cells, both in serum and in ITS medium (Figure 5). Confirmed down-expression of first and final IKBKAP exons in FD hOE-MSCs. Since we and others [22,40] did not detect IKBKAP among the significantly down-regulated transcripts in FD compared to control samples, we asked whether this discrepancy could be due to a lack of sensitivity of microarray compared to RT-qPCR. For this purpose, we decided to analyze IKBKAP levels of expression by investigating other exons distal from IKBKAP exon 20. By looking at the beginning of the IKBKAP transcript, we identified a second event of alternative splicing. After amplifying transcripts from exon to exon 5, we obtained 2 PCR products (Figure 6B, upper panel). The sequencing
275. ... by the modulation of epigenetic marks and (4) regulation by non-coding RNAs. 1.3.1 Basal transcription. Basal transcription of DNA into RNA takes place under the influence of RNA polymerases and numerous general transcription factors. RNA polymerases are said to be DNA-dependent, and each type transcribes different RNAs. Type I produces ribosomal RNAs, whereas type II produces messenger RNAs, most small nuclear RNAs (snRNA), small nucleolar RNAs (snoRNA) and microRNAs (Kornberg, 1999; Sims et al., 2004); finally, type III synthesizes transfer RNAs as well as the 5S ribosomal RNA. The general transcription factors, such as the TFII family (TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH and TFIIS) (Lee & Young, 2000), are required to recruit RNA polymerase II (PolII) to promoters, thereby forming the transcription pre-initiation complex (Orphanides et al., 1996) (Figure 1.3). Eukaryotic genes have regulatory sequences located close to the transcription start site (TSS), which constitute the proximal promoter. This is where the transcription pre-initiation complex forms. Modulation of DNA transcription by PolII is carried out by tr
276. e predictive markers of progression to DSS. Supporting Information. Figure S1: Validation of microarray results by RT-PCR. Pearson's correlation was calculated between microarray expression signals (horizontal axis) and Delta Ct values from real-time PCR (vertical axis) for nine genes highly associated with dengue shock syndrome. Correlation is significant at 0.01. Found at: doi:10.1371/journal.pone.0011671.s001 (4.94 MB TIF). Table S1: Clinical and biological characteristics of each DF, DHF and DSS patient. Found at: doi:10.1371/journal.pone.0011671.s002 (0.04 MB XLS). Table S2: List of the 3515 clones corresponding to the 2959 genes differentially expressed between DF, DHF and DSS patients, identified using the multi-way ANOVA at a false discovery rate of 10% [July 2010 | Volume 5 | Issue 7 | e11671]. Clones corresponding to the 2959 genes are listed according to their association with DSS, the first one being the gene whose expression level variance is the most influenced by the clinical phenotype. HUGO gene names are indicated. The variation is the one related to the DSS group relative to DF and DHF. ANOVA: analysis of variance; DF: dengue fever; DHF: dengue hemorrhagic fever; DSS: dengue shock syndrome; NA: not available; a: percentage of variance associated with disease phenotype. Found at: doi:10.1371/journal.pone.0011671.s003 (0.90 MB XLS). Acknowledgments. We greatly thank Pr Y Buisson for supporting this program, I Droue
277. ... results, with one line per probe (gene name, log ratio, intra-array normalized signal, mean signal) [Chapter 2. Quality control and normalization of DNA microarray data]. If the file corresponds to two-color microarray experiments, it contains columns for each of the colors. From this last table, only the information listed above is extracted. In addition, one file is generated per sample, so this information has to be combined into a single final file for a given experiment. For each sample, the extracted data are collected inside a single S4 object, which differs for raw and normalized data. It is a complex object composed of multiple simple objects (vector, scalar, matrix). For raw one-color data this object is of class AgilentBatch, whereas for two-color data it is named AgilentBatchRG. Once the data are normalized, the objects created are of class AgilentNorm and AgilentNormRG for one-color and two-color microarrays, respectively. The next step is the quality control of the raw and normalized data, using the object that holds the main information required (see the user manual for a description of the data contained in the object). To this end, graphical representations
278. ... of type S4, corresponding to a list of vectors and/or matrices, as well as methods and functions, each with an associated help page. These help pages, and also the library's user manual (called a vignette in R), are written in LaTeX, with Rd files for the help pages and Rnw files for the vignette. This allows markup and a standard formulation of the documents, easy extraction of information, and the inclusion of R code within the vignette. These libraries share a defined common structure (Figure 2.3). [2.5 Principle of the AgiND R library] [Figure 2.3 residue: directory tree of the limma library (panels A and B) with DESCRIPTION, inst/CITATION, changelog.txt and index.html; library documentation in various formats (limma.pdf, usersguide.pdf); a help page for each function and object of the library (Rd files such as 03reading.Rd, 04Background.Rd, 05Normalization.Rd, 06linearmodels.Rd); R code of the library (for example alias2Symbol.R, arrayWeights.R, arrayWeightsSimple.R), with other languages in src; functions and objects needed for the library to work listed in the NAMESPACE file.]
279. ects of the insulin receptor substrate (IRS) system in human metabolic disorders. Faseb J 15: 2099-2111. White MF (2002) IRS proteins and the common path to diabetes. Am J Physiol Endocrinol Metab 283: E413-422. Makowski L, Hotamisligil GS (2005) The role of fatty acid binding proteins in metabolic syndrome and atherosclerosis. Curr Opin Lipidol 16: 543-548. [PLoS ONE | www.plosone.org | Molecular Mechanisms of DSS] Howard JK, Flier JS (2006) Attenuation of leptin and insulin signaling by SOCS proteins. Trends Endocrinol Metab 17: 365-371. Qatanani M, Szwergold NR, Greaves DR, Ahima RS, Lazar MA (2009) Macrophage-derived human resistin exacerbates adipose tissue inflammation and insulin resistance in mice. J Clin Invest. Boot RG, van Achterberg TA, van Aken BE, Renkema GH, Jacobs MJ, et al. (1999) Strong induction of members of the chitinase family of proteins in atherosclerosis: chitotriosidase and human cartilage gp-39 expressed in lesion macrophages. Arterioscler Thromb Vasc Biol 19: 687-694. Horton JD, Cohen JC, Hobbs HH (2009) PCSK9: a convertase that coordinates LDL catabolism. J Lipid Res 50 Suppl: S172-177. Scatena M, Liaw L, Giachelli CM (2007) Osteopontin: a multifunctional molecule regulating chronic inflammation and vascular
280. ects of culture conditions on gene expression. We anticipated that a strong gene dysregulation observed in microarray would be more significant if this expression is stably maintained at any cell passage. Most of the differentially expressed genes were found to have a modest (<2) fold change (Table 1). Interestingly, like previous studies, we observed that a majority of genes were down-regulated in FD hOE-MSCs (Table 1, negative values) and only 4 genes were up-regulated (Table 1 and Table S1, positive values). This observation is in agreement with other studies [10,22,40] and may reflect a defect in transcription due to decreased Elongator activity, as previously proposed [10]. Importantly, 10 genes in our list (20%) appeared to be correlated with one or two previous investigations (Table 1 and Table S1, last column). In one of the past studies, IKBKAP expression level can be downregulated by RNAi in control cells [10], where there is no production of MU transcripts. Thus, different studies share dysregulated genes in different contexts of either constitutive or alternative splicing of IKBKAP mRNA. This suggests that IKBKAP alternative splicing may not be the only pathological alteration in FD. Similar to what was previously reported, our study revealed the downregulation of gelsolin (GSN), a protein involved in cell motility, that causes defects in cytoskeleton reorganization and cell migration in FD [10,11,22,34]. The most dysregulated gene o
281. ed IKBKAP mRNA alternative splicing in FD hOE-MSCs and identified 2 novel spliced isoforms also present in control cells. We observed a significantly lower expression of both IKBKAP transcript and IKAP/hELP1 protein in FD cells, resulting from the degradation of the transcript isoform skipping exon 20. We localized IKAP/hELP1 in different cell compartments, including the nucleus, which supports multiple roles for that protein. We also investigated cellular pathways altered in FD at the genome-wide level and confirmed that cell migration and cytoskeleton reorganization were among the processes altered in FD. Indeed, FD hOE-MSCs exhibit impaired migration compared to control cells. Moreover, we showed that kinetin improved exon 20 inclusion and restored a normal level of IKAP/hELP1 in FD hOE-MSCs. Furthermore, we were able to modify the IKBKAP splicing ratio in FD hOE-MSCs, increasing or reducing the WT (exon 20 inclusion):MU (exon 20 skipping) ratio, respectively, either by producing free-floating spheres or by inducing cells into neural differentiation. Conclusions/Significance: hOE-MSCs isolated from FD patients represent a new approach for modeling FD to better understand genetic expression and possible therapeutic approaches. This model could also be applied to other neurological genetic diseases. Citation: Boone N, Loriod B, Bergon A, Sbai O, Formisano-Tréziny C, et al. (2010) Olfactory Stem Cells, a New Cellular Model for Studying Molecular Mechanism
282. ed in DSS patients' blood cells are considered endogenous danger signals, or Danger-Associated Molecular Patterns (DAMPs) (Tables 3 and 4; molecules with DAMP activity are indicated), capable of triggering secondary systemic inflammatory responses through direct interaction with surface or intracellular receptors, such as TLRs or NODs, expressed in endothelial or innate immune cells [127]. DAMPs include a diversity of molecules without structural similarity, either actively produced by immune cells in the context of an infection or passively secreted by damaged tissues [128,129], now considered key inducers of secondary systemic inflammation in a number of acute inflammatory syndromes [130,131] or chronic diseases [132]. Amplification of inflammation during DSS through direct signalling by molecules harbouring DAMP activity via TLRs is also supported by the increased abundance of DAMP-induced transcripts, such as those encoding the pro-inflammatory IL-18 cytokine or the NLRC4/CARD12 intracellular sensor [55]. Interestingly, the association of allelic polymorphisms of TLR4 with DSS reported by De Kruif and colleagues [46] suggests that differential signalling through TLRs may contribute to the severity of dengue disease, as suspected for other pathologies [133]. Accordingly, anti-inflammatory drugs targeting Toll-like receptors are now under development for a number of inflammatory pathologies wher
283. ed through the knowledge-based IPA software. Indeed, IPA analysis identified that 163 canonical pathways were significantly associated with those genes (data not shown), with a large proportion of immune-related pathways in the top 30 (Figure 2). In particular, several under-expressed but partially redundant signaling canonical pathways related to T lymphocyte activation were identified, including the T cell receptor (TCR) signaling pathway (Figure 3), which has the strongest association with the DSS gene signature. Interestingly, a number of metabolic pathways, and particularly lipid signaling pathways, were significantly represented among the 163 DSS-related canonical pathways. When comparing our results to those of colleagues who reported gene or protein signatures associated with DSS, we identified some transcripts encoding proteins considered putative markers of severe dengue. These include, non-exhaustively, the acute-phase pentraxin-related protein PTX3 [15], the anti-inflammatory IL-10 [11] or the pro-inflammatory IL-18 [12] cytokine transcripts, which have increased abundance in the DSS gene signature while having intermediate to low statistical association with the disease phenotype variable according to the multi-way ANOVA (Table S2). IFN type I-related transcripts, whose abundance was shown to be decreased in DSS patients by others [27,28,42], represented only a limited number of genes associated with the DSS gene signature. Th
284. eleton regulation and cell motility/migration [10]. This role may underlie a cell motility deficiency in FD neurons, because of impaired transcriptional elongation of some genes coding for proteins involved in cell migration. Indeed, one study found that mouse neurons defective in Elongator exhibit reduced levels of acetylated α-tubulin, causing defects in radial migration and branching of cortical projection neurons [11]. Another study showed that the Caenorhabditis elegans Elongator complex is required for correct acetylation of microtubules and neuronal development [12]. IKAP/hELP1 protein is also involved in other cellular processes, including tRNA modifications [13-15], exocytosis [16] and zygotic paternal genome demethylation [17]. Recently, its homolog in fly (D-elp1) has also been suggested to be involved in RNA interference through an RNA-dependent RNA polymerase activity [18]. To better understand the molecular mechanisms leading to aberrant splicing of IKBKAP mRNA in FD, creation of model systems recapitulating the pathological development of neural cells is required. Because IKBKAP gene knock-out causes embryonic lethality [19], an animal model that exhibits the major phenotypic characteristics observed in FD humans has not yet been established. However, a humanized IKBKAP transgenic mouse model for FD has been created [20] that reproduces the tissue-specific splicing of IKBKAP mRNA in nervous tissues. Such a model is a notable progress in
285. ... new approaches to gene classification and selection were developed and tested on this dataset (Inza et al., 2004; Wu et al., 2011b; Wang & Simon, 2011; Moorthy & Mohamad, 2011). The same type of classification was later carried out by Sorlie and collaborators in 2003, then by Bertucci and collaborators in 2004, on breast cancer samples (Sorlie et al., 2003; Bertucci et al., 2004). These studies, together with a histological study of the same samples, underlie the classification of breast tumors into 5 groups: basal, luminal A, luminal B, ERBB2 and normal. They made it possible to define biomarkers such as the ERBB2 gene (for v-erb-b2 erythroblastic leukemia viral oncogene homolog 2), which characterize tumors and thus allow treatment to be adjusted to their characteristics (Bertos & Park, 2011). Like DNA microarrays, very-high-throughput sequencing is increasingly used. Numerous publications report the analysis of pathologies at the epigenetic level by ChIP-seq and at the transcriptional level by RNA-seq. HTS techniques have thus been used to study, for example, (1) the targets of transcription factors in different types of cancer, such as FOXA1, ER and CTCF in breast cancer cell lines (Ross-Innes et al., 2011; Hurtado et
286. ... of the tertiary analysis. Figure 5.5: Workflow of sequencing preparation and analysis using the different software tools. Adapted from the user manual of Applied Biosystems SOLiD Experimental Tracking Software (SETS) v4.0. [Screenshot residue: run-setup window of the SOLiD instrument control software, listing the samples ChIP_REIP_EL4_07pM, ChIP_ETS1_EL4_07pM and ChIP_TLX3_DN_Thymo_07pM on a 4-spot slide, with primary analysis setting "default primary" and protocol SOLiD Opti F3, 50 bases.]
287. ... included in the database. This pipeline can include any experiment, provided that the platform is present in our database and that the seriesMatrix file from GEO is available. In addition, the use of stored procedures lets us fill all the tables automatically. The functional enrichment of the transcriptional signatures is then validated with Fisher's exact test and a Benjamini-Hochberg correction. All these new data are accessible through the TBrowser graphical interface, but also through web services and an R/Bioconductor library, as well as through the new plugins presented below. [Chapter 4. Mining DNA microarray data] 4.6 Development of new features. 4.6.1 New query modes. To meet user requests, it is now possible to build Boolean queries based on gene ID and homologene ID identifiers. In addition, on the assumption that the whole set of genes is not necessarily found to be co-expressed every time, we implemented another, non-Boolean query mode based on lists. This variability of the signatures can also be visualized with the transcriptional maps in the TBMap plugin. This new query mode allo
288. ... at the 3' end of the gene. Taken from Schones & Zhao, 2008. ... made it possible to obtain a genome-wide view of the activity of cis-regulatory elements, the function of transcription factors and the epigenetic processes involved in controlling gene expression. 1.4 Very-high-throughput sequencing techniques. Very-high-throughput sequencing techniques (HTS, for High-Throughput Sequencing), commonly and improperly called NGS (Next-Generation Sequencing), have developed spectacularly since their commercial appearance in early 2006 (Margulies et al., 2005; Shendure et al., 2005; Hutchison, 2007; Chan, 2005). They constitute the third generation of sequencing, after the methods of Sanger and of Maxam and Gilbert in 1977 and pyrosequencing in 1988. DNA sequencing was invented in the second half of the 1970s. Two methods were developed independently: a selective chemical degradation method by Walter Gilbert's team (Maxam & Gilbert, 1977) and a selective enzymatic synthesis method by Frederick Sanger (Sanger et al., 1977; Prober et al., 1987). Pyrosequencing was then developed and remains a widely used technique today because it is faster than the classical
289. ... depending on the cells used, care must be taken to adapt various parameters such as the duration of crosslinking and sonication, the percentage of formaldehyde, the sonication intensity (which depends on the sonicator model), the final volume and the quantity of cells: all parameters that determine the quality of the ChIP and whether fragments of the desired size are obtained. The size of the DNA fragments for ChIP-seq is generally between 100 and 300 base pairs, depending on the sonication parameters (time, intensity). It is checked by migration on an SDS-PAGE gel or with the Agilent Bioanalyzer. [5.1 Principle of chromatin immunoprecipitation coupled with very-high-throughput sequencing (ChIP-seq)] The abundance of the bound proteins or modified histones, as well as the quality of the antibody, are criteria that must be taken into account to determine the optimal number of cells needed for the experiment. Because the signal-to-noise ratio is directly correlated with the quantity of DNA, using an excessive number of cells tends to increase the background noise (Kidder et al., 2011). Thus, for a ChIP-seq experiment, the number of cells used is generally between 1 x 10^[...] and 10 x 10^[...], which corresponds to 10-100 ng of immunoprecipitated DNA. Small quantities of cells are generally sufficient for the analysis of proteins
290. ... with a view to their amplification by PCR (Polymerase Chain Reaction), in emulsion or by bridging on a solid phase (Figure 1.7). Their sequence does not align to the genome, which allows specific PCR amplification of the target sequences to be sequenced. The DNA fragments are then selected according to their size. Special adapters can also be used that carry, in addition to the adapter sequence, a short specific identification sequence. This short sequence of 5 nucleotides is called a barcode. By using a unique set of barcodes for each sample, this technique, called multiplexing, makes it possible to sequence several samples in the same cell or the same lane. At this stage one no longer speaks of samples but of libraries. These libraries are finally sequenced simultaneously during a sequencing cycle, or run. The reads obtained are then automatically reassigned to each sample through computational identification of the barcode. [Figure residue: Life Technologies (ligation of dibase probes; universal primer, magnetic bead, P1 adapter, target sequence, colour code) and Roche 454 (pyrosequencing; dNTP flow).]
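The reassignment of reads to samples by barcode described above can be sketched as follows in Python. This is a simplified illustration under loud assumptions: the barcode is taken to be the first 5 nucleotides of each read and is matched exactly, which is not necessarily how a given sequencing platform stores or reads barcodes; the function name, barcodes and sample names are hypothetical.

```python
def demultiplex(reads, barcode_to_sample, barcode_length=5):
    """Group reads by sample according to their leading barcode.

    Returns (by_sample, unassigned): by_sample maps each sample name to the
    list of its reads with the barcode removed; unassigned holds the reads
    whose barcode matches no known sample.
    """
    by_sample = {sample: [] for sample in barcode_to_sample.values()}
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)  # unknown barcode: keep the read untouched
        else:
            by_sample[sample].append(insert)
    return by_sample, unassigned


if __name__ == "__main__":
    barcodes = {"ACGTA": "library_1", "TTGCA": "library_2"}  # hypothetical
    reads = ["ACGTAGGGTTT", "TTGCACCCAAA", "NNNNNAAAA"]
    print(demultiplex(reads, barcodes))
```

Exact matching is the simplest design; real pipelines usually tolerate one or more sequencing errors in the barcode, which this sketch deliberately leaves out.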
291. ... the sequences under the peaks from the ChIP and those from the control condition (input). Peaks, and also reads, can be visualized on the genome with a genome browser, which allows annotation tracks to be added, such as repeated sequences obtained with RepeatMasker, transcriptome data or CGH data, in order to interpret the data better. Another important criterion for motif discovery is conservation during evolution (Cai et al., 2010). Many motif discovery tools also use these conservation data to refine their analysis, for example ECRbase (Loots & Ovcharenko, 2007). [5.3 ChIP-seq data analysis] [Figure 5.12: Choice of the peak detection method and representation of artifacts. Panels: generate signal profile along each chromosome; define background (model or control data); identify peak regions in the ChIP signal; enrichment relative to background; assess significance. Axes: tag count versus position (bp). Adapted from Pepke et al., 2009 and Rye et al., 2011.] conditional binomial model
292. entifying relevant DSS biomarkers thus remains a major challenge [5,6]. Indeed, DSS occurs by the end of the acute infection in only a fraction of dengue-infected patients, and current severity criteria based on the 1997 World Health Organization (WHO) classification of dengue severity fail to predict a significant proportion of patients who progress to life-threatening DSS [19-21]. Attempting to decipher molecular mechanisms underlying DSS by analyzing circulating whole-blood cell genome-wide expression profiles is a relevant approach, given the study of other systemic inflammatory syndromes where a cognate cross-talk between endothelial vascular cells and blood cells occurs [22-24]. Whole blood represents a highly informative, though complex, cellular sample that may reflect host pathophysiological responses ongoing at the time of blood sampling [22]. Furthermore, whole-blood cells are easy to collect and store during field studies on large cohorts, reducing the sample volumes required and limiting technical bias due to cell purification. However, due to the high cellular complexity of whole-blood cell samples, whole gene expression patterns should be carefully analyzed and deciphered to allow a return to an integrative view of the molecular mechanisms altered during the pathophysiological process studied [25]. Such bench-to-bedside medical research has gained more and more interest in recent years. Indeed, it allowed improving the understa
293. eously altered in the blood cells of DSS children at the onset of shock. In particular, T and NK lymphocyte transcriptional responses are globally impaired, while genes implicated in compensatory anti-inflammatory and repair/remodeling responses and in innate immune responses are over-expressed. This highlights the complexity of biological responses at the time of dengue shock syndrome and points out similarities between DSS and other critical syndromes, such as severe sepsis or post-trauma SIRS, that are similarly characterized by depressed T lymphocyte responses but over-expressed innate immunity [94,95]. Reduced abundance of a number of T lymphocyte-related transcripts at the time of DSS may reflect a feedback mechanism aimed at limiting an initial early T lymphocyte activation reported to occur in patients who further progress to severe dengue [96,97]. Such a negative feedback may be sustained by the over-expression of a diversity of anti-inflammatory transcripts in DSS patients' blood cells at the time of shock. In particular, the two potent immunomodulating factors prostaglandin E synthase and VSIG4, which dampen both T and NK lymphocyte responses [43-45] and both have a strong statistical association with the DSS phenotype, could have such a negative effect. Based on those observations and previous clinical reports, the benefit of corticotherapy in DSS patients might thus be questioned [98,99]. Over-expression in the blood of DSS children of seve
294. epting a higher number of false-positive genes, also provide larger and enriched gene lists that should be more informative when seeking to identify molecular pathways. Based on this rationale, we chose to work with the gene list generated at FDR 10%, after verifying by a different statistical method currently used for the analysis of microarray data, SAM (Significance Analysis of Microarrays) [33], that most significant genes were found by both types of analysis (data not shown). The gene list generated at FDR 10% included 2959 genes differentially expressed between the DF, DHF and DSS patient groups (Table S2). The biological relevance of those differentially expressed genes was assessed using local ANOVA, which allows evaluating the contribution of the main variable (disease phenotype) and that of other putative confounding variables, related to patients (age, gender, day of blood sampling, viral serotype) and to technical steps (effect of independent RNA extractions, amplifications and hybridization), to variations of expression levels of those 2959 genes. This confirmed that the disease phenotype strongly influenced the variations of expression of the 2959 genes differentially expressed between the three patient groups, reinforcing the biological significance of this set of genes (Table S2). (PLoS ONE, www.plosone.org.) Unsupervised hierarchical clustering based on the 2959-gene signature identified was then applied to the 48 childre
295. equest by list: belD, platformID, experimentID, ES ID, ontologyID. Programmatic access: NONE | Webservice (SOAP/WSDL) and RTools4TB R/Bioconductor package (DBF-MCL algorithm and webservice interface). TABLE 4.1: Summary of the project's progress between its publication in 2008 and now. [Chapter 4: Mining DNA microarray data] It is thus possible to use gene coexpression data, such as those contained in our database, to improve the detection of good candidates. Other meta-analysis approaches based on data available in GEO. TBrowser is not the only project that aims to study gene coexpression from DNA microarray data available in public databases such as GEO. Other tools offer different approaches (Table 4.2), but all of them, unlike TBrowser, use the sample information deposited in GEO. They therefore rely not on GSE series, as TBrowser does, but on GDS datasets. Starting from a given gene, they return similarly expressed genes while informing us about the experimental context. GeneChaser identifies the different contexts in which a given gene is found to be differentially expressed, whereas MARQ returns a list of genes that are similarly differentially expressed. Others propose to build graphs of coexp
296. er, 2010] Li H & Homer N (2010). A survey of sequence alignment algorithms for next-generation sequencing. Briefings in Bioinformatics, 11(5), 473-83. [Li et al., 2011] Li J, Zhao Q & Bolund L (2011). Computational methods for epigenetic analysis: the protocol of computational analysis for modified methylation-specific digital karyotyping based on massively parallel sequencing. Methods in Molecular Biology (Clifton, N.J.), 791, 313-28. [Linsen et al., 2009] Linsen SEV, de Wit E, Janssens G, Heater S, Chapman L, Parkin RK, Fritz B, Wyman SK, de Bruijn E, Voest EE, Kuersten S, Tewari M & Cuppen E (2009). Limitations and possibilities of small RNA digital gene expression profiling. Nature Methods, 6(7), 474-6. [Bibliography] [Loots & Ovcharenko, 2007] Loots G & Ovcharenko I (2007). ECRbase: database of evolutionary conserved regions, promoters, and transcription factor binding sites in vertebrate genomes. Bioinformatics (Oxford, England), 23(1), 122-4. [Lopez et al., 2008] Lopez F, Textoris J, Bergon A, Didier G, Remy E, Granjeaud S, Imbert J, Nguyen C & Puthier D (2008). TranscriptomeBrowser: a powerful and flexible toolbox to explore productively the transcriptional landscape of the Gene Expression Omnibus database. PLoS ONE, 3(12), e4001. [Lopez-Romero, 2008] Lopez-Romero P (2008). Agi4x44PreProcess: PreProcessing of Agilent 4x44 array data. R
297. er each of these TS is rather
Table 2. Transcriptional signatures displaying high enrichment (q-value < 1×10^-20) for any of the human cytobands tested.
TS ID | Enrich. | Cytoband | q-value | Sample type | GSE ID | GPL ID | Authors | PubMed ID
3DA3C8345 | 24 | 17q12-q21 | [illegible] | Skin | GSE5667 | GPL97 | Plager DA et al., 2007 | 17181634
43CC3EF57 | 9 | 8q24.3 | 7.0×10^[exponent illegible] | Melanoma | GSE7153 | GPL570 | Unpublished, 2007 |
60E29DA83 | 16 | 8q24.3 | 6.8×10^[exponent illegible] | Melanoma | GSE7127 | GPL570 | Johansson P et al., 2007 | 17516929
60E581184 | 26 | 17q25.1 | 5.5×10^-23 | Melanoma | GSE7127 | GPL570 | Johansson P et al., 2007 | 17516929
60E6B4129 | 35 | 20p13 | 1.6×10^[exponent illegible] | Melanoma | GSE7127 | GPL570 | Johansson P et al., 2007 | 17516929
60E96FF1E | 28 | 6p21.3 | 1.2×10^[exponent illegible] | Melanoma | GSE7127 | GPL570 | Johansson P et al., 2007 | 17516929
60EC95F6A | 17 | 7q22.1 | 6.3×10^[exponent illegible] | Melanoma | GSE7127 | GPL570 | Johansson P et al., 2007 | 17516929
60EEBD669 | 32 | 11q23.3 | 1.4×10^[exponent illegible] | Melanoma | GSE7127 | GPL570 | Johansson P et al., 2007 | 17516929
B4C95CF18 | 42 | 8q24.3 | [illegible] | Ovary | GSE6008 | GPL96 | Hendrix ND et al., 2006 | 16452189
A93ED6519 | 16 | 11q23.3 | 6.9×10^[exponent illegible] | Melanoma | GSE7152 | GPL570 | Packer LM et al., 2007 | 17450523
A93DB01ED | 11 | 7q22.1 | 9.5×10^[exponent illegible] | Melanoma | GSE7152 | GPL570 | Packer LM et al., 2007 | 17450523
TS ID: transcriptional signature ID. Enrich.: proportion of non-redundant genes from the TS that are located in the corresponding cytoband. doi:10.1371/journal.pone.0004001.t002
PLoS ONE, www.plosone.org, December 2008, Volume 3, Issue 12, e4001
[Figure legend residue: NFKB pathway; AP1 pathway]
298. er specifically short regions of a few hundred bases identified beforehand. Thus the Human Genome Project, launched in 1990 with the mission of determining the sequence of the human genome by the Sanger method, could only be completed in April 2003, albeit two years ahead of schedule. Carrying out this project required a large number of biologists for the sequencing and of bioinformaticians for sequence assembly, as well as for the development and use of powerful computing resources. [Chapter 1: General introduction] With HTS, de novo sequencing or resequencing of the human genome is possible in only a few days in large sequencing centers (Figure 1.6 C). Indeed, automation of all experimental processes and the use of an automated analysis pipeline allow extremely fast sequencing and analysis of samples. In addition, companies dedicated to a specific type of analysis have been created; they operate 24 hours a day, 7 days a week, and produce on the order of one billion sequences per day (for example Complete Genomics or BGI). 1.4.1 Principles of very high-throughput sequencing. Since 2005, various technologies have been developed to enable very high-throughput sequencing of several million DNA sequences in
299. er the annotations shared by a list of signatures. KeggSearch, developed by Fabrice Lopez, displays the KEGG pathways associated with a list of genes, coming for example from TBNeighborhood. I wrote a stored procedure that generates the result table displayed in the plugin, rather than processing these data in the Java plugin itself, by including in our database the information on all KEGG pathways from their database (Figure 4.6). [4.6 Development of new features] TBConvertor, which returns all identifiers and information for a list of genes from a list of identifiers (gene ID, GeneSymbol, homologene ID). TBMotifsSearch, which queries the cis-regulatory motif discovery tools TFM-Explorer and DiRE from a signature or a list of genes. InteractomeBrowser, developed by Cyrille Lepoivre, which represents a list of genes as a protein-protein interactome, adding information on miRNA targets and on the target genes of transcription factors. I therefore integrated these annotation sources into the database (Figure 4.6). This last plugin was the subject of a publication accepted in BMC Bioinformatics and was used to represent results of a transcriptome analysis of infection by Coxiella
300. ered with the following order: A, Cheishvili et al. [2007]; B, current study; C, Lee et al. [2009]; D, Close et al. [2006]; E, Cohen-Kupiec et al. [2011]. Information about gene expression in spheres is provided in Supp. Table S3. Among nervous system-related genes, we identified genes such as SNCA that exhibited a 10-fold downregulation in FD. In addition to finding gene expression alterations for nervous system development in spheres, we also identified NOVA1 (neuro-oncological ventral antigen 1), encoding a neuron-specific RNA-binding protein [Jelen et al., 2007], as an upregulated gene in FD sphere hOE-MSCs. These results suggest that sphere-forming cells provide an FD-relevant signature even at an early undifferentiated state. Moreover, these results suggest that NOVA activity may be involved in the improvement of IKBKAP exon 20 inclusion in FD spheres. Comparative Transcriptome Analysis Identify Convergent Pathways Affected in FD. Four previous studies from other laboratories have generated a wealth of data on the transcriptome variations in either FD or IKBKAP knockdown samples [Cheishvili et al., 2007; Close et al., 2006; Cohen-Kupiec et al., 2011; Lee et al., 2009]. Therefore, we procured the raw data from all studies and reanalyzed the data in search of the common candidates that may be involved in FD physiopathology. For each study, we identified genes that are differentially expressed between control and FD/IKBKAP knockdown sampl
301. eries that can be done with an extended set of Boolean operators allow users to rapidly select sets of TS containing (or not) a given list of gene symbols. Based on these TS, a list of frequently observed neighbors can be created. As each TS is linked to a set of biological keywords derived from ontologies, users can also search for those enriched in genes involved in specific biological processes. We show that TBrowser can be used to productively mine hundreds of experiments and to reveal underlying gene networks. Furthermore, using this unprecedented collection of TS, we built the first synthetic transcriptional map of all human microarray data performed on the Affymetrix HG-U133A platform and currently available in the GEO database. Results. DBF-MCL algorithm. Conventional algorithms used for unsupervised classification of gene expression profiles suffer from two main limitations. First, they do not filter out uninformative profiles, and second, they are not able to find the actual number of natural clusters in a microarray dataset. We can consider genes as points located in a hyperspace whose number of axes equals the number of biological samples. As it is difficult to perceive high-dimensional spaces, a common way to illustrate classification methods is to use a 2D representation. In Supplemental Figure S2, each point represents a gene, and we are interested in isolating dense regions, as they are populated with genes that display weak
302. ers can sometimes be very large (several tens of Gb). Many alignment programs accept only the fastq format as input, hence the need to create format conversion tools. To achieve this, or to extract data or provide statistics on the elements they contain, various data manipulation tools have been created: BEDTools (Quinlan & Hall, 2010), SAMtools (Li et al., 2009), BamTools (Barnett et al., 2011), Picard, GATK (McKenna et al., 2010). FIGURE 5.11: Visualization of read quality with the SETS or FastQC software. A: signal saturation. B: bead density on the slide before sequencing starts. C: satay plot made on a single ligation, in which each point corresponds to a bead. Points representing monoclonal beads lie on one of the 4 axes measuring the intensity of the 4 fluo
303. ersham and films were digitized and analyzed using the Bio 1D software. RNA isolation and semi-quantitative reverse transcription polymerase chain reaction analysis. Total RNA was isolated using the RNeasy Mini Kit (Qiagen) with DNase treatment on the column, according to the manufacturer's recommendations. Total RNA was subjected to reverse transcription (RT) using the High Capacity cDNA Archive Kit (Applied Biosystems). End-point polymerase chain reaction (PCR) analysis was performed using the GoTaq polymerase system (Promega) and IKBKAP-specific primers listed in Table 2. PCR products were separated on a 1.7% agarose gel by electrophoresis in 1x TBE buffer (Tris 0.89 M, boric acid 0.89 M, and EDTA 0.02 M). DNA was visualized under UV light after ethidium bromide incorporation and documented using a BioVision camera. Plasmid calibrators. A fragment of WT IKBKAP cDNA containing exon 19, exon 20, exon 21 and the first 16 nt of intron 21 was cloned into the pcDNA 3.1 TOPO vector (Invitrogen) and named IKBKAP cDNA cal. Similarly, a piece of MU IKBKAP cDNA containing exon 19, exon 21 and the first 19 nt of exon 22 was cloned into a pcDNA 3.1 TOPO vector and named IKBKAP skipEx20cal. A piece of WT IKBKAP cDNA containing the last 103 nt of exon 35, exon 37, and exon 38 (first 90 nt) was cloned into the KpnI/XbaI cloning sites of the pcDNA 3.1 TOPO vector and named IKBKAP skipEx36cal. A piece of WT IKBKAP cDNA containing the last 30 nt of exon 1, exon 2, and exon 3 (first 110 n
304. es
4.5.1 Restructuring of the database
4.5.2 Integration of new data
4.6 Development of new features
4.6.1 New query modes
4.6.2 Improved and new plugins
ARTICLE 5: TRANSCRIPTOMEBROWSER 3.0: INTRODUCING A NEW INTERACTION DATABASE AND A NEW VISUALIZATION TOOL FOR THE STUDY OF GENE REGULATORY NETWORKS
4.6.3 Transcriptional maps for the TBMap plugin
4.7 Programmatic access to the TBrowser database
4.7.1 Development of web services
4.7.2 Implementation of an R/Bioconductor library: RTools4TB
4.8 Conclusions and perspectives
With the spectacular growth of transcriptome studies using DNA microarrays, it has become essential to store information about experiments so that they can be reanalyzed or combined in meta-analyses. Today, reanalysis and/or meta-analysis is part of an integrative genomics approach that ultimately aims to model living systems. The goal of this ambitious approach is
305. es In the case of Agilent™ arrays, the G2565CA scanner uses the Agilent Feature Extraction (AFE) quantification software to (1) quantify, for each fluorochrome studied, the fluorescence signal emitted by each spot; (2) quantitatively assess signal quality by determining the background noise; and (3) determine spot quality by detecting outliers and saturated spots. The software thus computes a normalized value per spot for each fluorochrome (gProcessedSignal and/or rProcessedSignal), then generates the quality report and the results file in text format. Note that for two-color DNA arrays, AFE also computes the Cy5/Cy3 ratio and its base-2 logarithm for each spot of each array, as well as a probability value (p-value). In addition to this results file, AFE generates a quality report for each array, called the QCreport. Other software or scanners can nevertheless be used, such as the GenePix 4400A from Molecular Devices, but it returns results in a different tabular format (gpr). Once the slides are scanned, a preprocessing and normalization step is required, using dedicated programs such as certain R libraries or the commercial software developed by Agilent, GeneSpring GX. This proc
306. es following reverse transcription and in vitro transcription of the sample RNAs. These cRNAs are then incubated with the array to allow their hybridization with the probes present on it (Figure 1.1). FIGURE 1.1: cRNA amplification procedure for a two-channel experiment; for a single-channel experiment, only the Cy3-labeled samples (B) are used. Recoverable steps from the diagram: mRNA from samples A and B; first- and second-strand cDNA synthesis with AffinityScript RT and an oligo dT promoter primer; cRNA synthesis with T7 RNA polymerase and Cy5-CTP or Cy3-CTP; cRNA purification; hybridization to oligo microarrays. Taken from the Agilent manual One-Color Microarray-Based Gene Expression Analysis, Low Input Quick Amp Labeling Protocol. These arrays carry long-oligonucleotide probes (60 nucleotides), unlike the Affymetrix technology, in which the probes are much shorter (25 nucleotides). The acquisition of the flu
307. es (Hyman, 1988; Ronaghi et al., 1998). This is one of the reasons why this technique was chosen by one of the HTS technologies, which will be briefly described later in this manuscript. Unlike the first generations of capillary sequencers, current very high-throughput sequencers allow massively parallel sequencing of several million DNA fragments, very quickly, at lower cost, and with a smaller amount of biological material. This advance required constant technological developments, both on the biological side (automation, revision and improvement of reagents and protocols) and on the computing side (algorithms, software, compute farms, memory, storage). Thanks to HTS techniques, studies that were not feasible for various reasons (too long, not enough biological material, too expensive) have been carried out (Hillier et al., 2008; Srivatsan et al., 2008). It is now possible to sequence several hundred gigabases (Gb) of genome with sufficient coverage to allow genetic linkage studies, such as the search for specific polymorphisms (for example SNPs, for Single Nucleotide Polymorphisms) present in several patients. Until then, the experimenter was forced to select genes of interest and to sequenc
308. es differentially expressed genes from a DNA microarray experiment. This interpretation depends on the study conducted and allows the generation of contextualized gene networks (Werner, 2008).
3.6 Example of the structure of the Gene Ontology Biological Process ontology. This figure outlines the parent terms of the term "transcription, DNA-dependent", obtained with the QuickGO tool (http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0006351).
3.7 Examples of annotation tools working from lists of genes or other identifiers: A, Gene Set Enrichment Analysis (GSEA); B, DAVID knowledgebase; C, Ingenuity Pathway Analysis (IPA).
3.8 Clinical classification of dengue established in 1997 by the WHO, and location of the region from which the young Cambodian patients come.
3.9 Consequence of alternative splicing of the IKBKAP gene on the different protein isoforms encoded by this gene.
3.10 Summary of the experimental and analytical design of the second microarray campaign.
4.1 Web interface of Gene Expression Omnibus (GEO).
4.2 Principle of the DBF-MCL algorithm.
4.3 Evolution of the number of samples available in Gene Expression Omnibus from 2000 to 2010. Adapted from Barrett et al. (2005).
4.4 Diagram of the new databas
309. es histone modifications, their replacement by variants, and DNA methylation at CG dinucleotides, which are often concentrated in short regions (> 200 bp, CpG > 60%) called CpG islands (Figures 1.4 and 1.5). Indeed, in humans, CpG dinucleotides are globally under-represented (about 20% of the expected frequency) and locally over-represented near promoter regions and enhancers (29,000 CpG islands predicted in the whole human genome). DNA methylation is a heritable epigenetic modification. Its presence is generally associated with repression of gene transcription. It is mainly located at CpG islands near genes (Figure 1.5). The degree of chromatin condensation is controlled by modifications of the N-terminal ends of histones, such as phosphorylation, acetylation, methylation, ubiquitination, and sumoylation (Kouzarides, 2007; Figure 1.4 B and C). [1.3 Regulation of gene expression] All these modifications are catalyzed by specific enzymes. Covalent histone modifications are thought to act either directly, by altering the compaction of the DNA wound around nucleosomes, or indirectly, by providing marks that recruit proteins able to locally remodel the structure of the chro
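The notion of CpG under-representation relative to the "expected frequency" can be made concrete with a small Python sketch. It assumes the commonly used observed/expected CpG ratio, (number of CpG × sequence length) / (number of C × number of G), which the text itself does not spell out; the function name and the toy sequence are hypothetical.

```python
def cpg_stats(sequence):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence.

    Assumption: expected CpG count = (count of C * count of G) / length,
    i.e. what would be seen if C and G were placed independently.
    """
    seq = sequence.upper()
    length = len(seq)
    if length == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c_count + g_count) / length
    expected = (c_count * g_count) / length
    ratio = cpg_count / expected if expected else 0.0
    return gc_content, ratio


# Toy sequence, for illustration only: a pure CG repeat.
gc_content, obs_exp_ratio = cpg_stats("CGCGCGCG")
```

A ratio around 0.2, as reported genome-wide in the text, means CpG occurs about five times less often than the base composition alone would predict; candidate islands are windows where this ratio and the GC content are locally high.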
310. es not published in GEO was implemented to allow an advanced user to create their own database. The database was installed at the CRG (Center for Genomic Regulation) in Barcelona and at the IGF (Institut de Génomique Fonctionnelle) in Montpellier. 4.5.2 Integration of new data. In order to include many more species, other platforms were retrieved from the GEO FTP server. Unlike in the first version of the database, the available information is centered not on probes but on genes, through gene IDs (a unique numeric value for each gene). Using gene IDs as the gene reference, following the NCBI example, allowed us to include other types of identifiers giving simplified access to other databases such as UniProt, Ensembl, UCSC, RefSeq, and OMIM. A table containing gene aliases was also created so that users can enter a gene list that does not necessarily contain only official gene identifiers (HUGO for Homo sapiens). Finally, this may eventually allow us to update gene information from the gene IDs, that is, geneSymbols and other gene aliases, without changing the composition of the signatures.
311. es by sequencing a large collection of genomes, and then of the 1000 epigenomes project, in 2010, by the International Human Epigenome Consortium (IHEC), whose objective is the characterization of at least 1000 epigenomes (1 per tissue of the human body), including histone modifications, positions of histone variants, nucleosome remodeling, DNA methylation, and the study of non-coding RNAs (Figure 1.15). [1.6 Programming languages for data analysis] FIGURE 1.15: Diagram of the objectives of the consortium working on deciphering human epigenomes, the IHEC (International Human Epigenome Consortium). This figure is taken from the IHEC consortium website. More recently, the BLUEPRINT project aims to establish the epigenomes of the different healthy and cancerous hematopoietic cell lineages. These studies are very important because they
312. es DNA microarrays, also called biochips or microarrays, is based on the hybridization of a sample of labeled complementary DNA or RNA sequences (cDNA, cRNA) with shorter complementary DNA strands or synthetic oligonucleotides fixed on a solid support. The first arrays, created in the mid-1980s, used a nylon membrane and radioactive labeling (they are sometimes called macroarrays, as opposed to today's microarrays). They were then supplanted in the 1990s by glass-slide technology with fluorescent labeling. Miniaturization on a solid support, the use of fluorescent labels, and advances in robotics now make it possible to manufacture arrays with a very high density of hybridization units, or spots. Each spot consists of probes, that is, oligonucleotides a few tens of nucleotides long, or PCR (Polymerase Chain Reaction) products (cDNA) a few hundred nucleotides long. These probes correspond to DNA sequences specific to a known or predicted coding transcript. The synthesized oligonucleotides are derived from databases such as GenBank or dbEST and therefore correspond to non-redundant sequences specific to a given transcript. At present, technological advances having enabled an increase
313. es with a FC > 1.5 and a P value < 0.05 and cross-compared the lists of candidate genes for each study (Fig. 3). We did not find genes that were consistently dysregulated in all studies. Among the 3,228 candidate genes differentially expressed in at least one of the five studies, including our own, we found 10 genes shared by three different studies with the same kind of dysregulation. Seven genes were underexpressed in FD (CXCR7, PFKFB3, IKBKAP, SEMA5A, SEPT3, SNAI2, and TNC) and three genes were overexpressed (ARHGAP28, MANIA, and XK) (Supp. Table S4). We also analyzed the GO of the 175 genes shared by at least two studies (Supp. Table 4). Nine processes emerged as significantly affected in FD: regulation of cell motion, guanyl ribonucleotide binding, contractile fiber part, neuron differentiation, regulation of protein kinase activity, regulation of apoptosis, cadmium ion binding, muscle tissue development, and osteoblast differentiation (Supp. Table S5). Kinetin Modulated the Expression of Genes Involved in mRNA Splicing. Our microarray data were next examined for evidence of genes targeted by kinetin. Indeed, this plant cytokinin reproducibly induces a rapid increase of IKBKAP transcripts with exon 20 inclusion through unknown mechanisms. To further understand the mechanism of kinetin in IKBKAP mRNA alternative splicing, we compared FD rafnshh untreated hOE-MSCs versus FD rafnshh hOE-MSCs treated with 100 µM of kinetin for 48 hr
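The selection and cross-comparison step described above (FC > 1.5, P < 0.05, then genes shared across studies with the same direction of dysregulation) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a signed fold change (positive for overexpression, negative for underexpression), and the study names, gene names and values are hypothetical.

```python
from collections import Counter


def candidates(study, min_fc=1.5, max_p=0.05):
    """Select genes passing the thresholds and record their direction.

    `study` maps gene -> (signed fold change, p-value). The sign convention
    (negative = underexpressed) is an assumption made for this sketch.
    """
    selected = {}
    for gene, (fold_change, p_value) in study.items():
        if abs(fold_change) > min_fc and p_value < max_p:
            selected[gene] = "up" if fold_change > 0 else "down"
    return selected


def shared_candidates(studies, min_studies):
    """Genes selected in at least `min_studies` studies with the same direction."""
    counts = Counter()
    for study in studies.values():
        for gene, direction in candidates(study).items():
            counts[(gene, direction)] += 1
    return sorted(gene for (gene, _), n in counts.items() if n >= min_studies)


# Hypothetical data, for illustration only.
studies = {
    "study_1": {"GENE_A": (-2.0, 0.01), "GENE_B": (1.8, 0.03), "GENE_C": (1.2, 0.01)},
    "study_2": {"GENE_A": (-1.7, 0.04), "GENE_B": (-1.9, 0.02)},
    "study_3": {"GENE_A": (-3.1, 0.001), "GENE_B": (2.2, 0.2)},
}
shared_in_three = shared_candidates(studies, min_studies=3)
```

Counting (gene, direction) pairs rather than genes alone is what enforces "the same kind of dysregulation": a gene up in one study and down in another is not counted as shared.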
314. ession profiles of a given gene in numerous curated experiments. Once a profile is selected, a list of similar profiles (i.e., neighbors) can be retrieved. Although GEO proposes several tools to refine the queries, cross-analysis through multiple experiments cannot be performed. The second solution involves an experiment-centered perspective, as developed in the GEO DataSets and ArrayExpress web interfaces [4]. This approach provides biologists with a set of classification tools to re-analyze selected experiments. Depending on the interface, supervised or unsupervised analysis (see below) can be pre-calculated or computed on demand. Again, as no meta-analysis tool is available, mining and compiling even a few GEO Series Experiments (GSE) remains a difficult and time-consuming task. We therefore lack efficient tools allowing productive data mining of microarray databases. For example, querying whole public microarray data using a single gene identifier is an ambiguous procedure to extract relevant co-regulated genes. Indeed, depending on the biological context, genes can be involved in different signaling pathways and may be associated with different neighbors. As a consequence, combined queries should be more appropriate to build relevant gene networks. Moreover, numerous uninformative genes exist in microarray experiments. They correspond most generally to those with low standard deviation
315. ess reduces the effects of technical biases without affecting that of the biological variation in gene expression. This step, which is essential whatever the technology used, cannot be automated because of the specific features inherent to each of these technologies. 2.2 Correction of raw data. 2.2.1 Data preprocessing. For Agilent™ DNA arrays, analysis software such as AFE offers various criteria for assessing the signal quality of each spot. Biases such as the variance of the spot pixel intensity, the variance of the background, the presence of stains or abnormally sized spots, or a low signal-to-noise ratio can thus be examined. The data filtering step then retains only spots above a predefined quality threshold, so as not to distort the results (Smyth et al., 2003). Note, however, that each laboratory has its own empirical method for filtering array data, because there is no standard method for Agilent™ arrays. It is at this step that background is taken into account, generally by subtraction from the signal, if it has not already been used by AFE to generate the processed signal (gProcessedSignal and rProcessedSignal for Cy3 and Cy5, respectively).
316. est. Indeed, the nucleus subnetwork contains several regulators (e.g. Runx1, Notch1, Hes and Xbp1), some of them colored in green, indicating available regulatory interactions for the transcription factor in our database. Figure 2B shows that several genes (Dtx1, Hes1, Il7r and Bcl2) have previously been shown to be under the positive control of Notch (this curated information is derived from LymphTF-DB). According to TargetScan predictions, Mirn17 (Mir17) does not seem to target any component of the Notch pathway. In contrast, it is predicted to affect the expression of several transcription regulators, including Mycn, Runx1, Smad7 and the H3K27 methyltransferase Ezh1 (by default, miRNAs are considered as having a negative effect on mRNA, and thus edges appear as T-shaped arrows). Moreover, it may also control key components of the cell cycle machinery (Ccnd2 and Cdkn1a). Figure 2D shows the information available from the ChIP-X database regarding Mycn. This information is derived from a ChIP-seq experiment performed on mouse embryonic stem cells by Chen et al. [38]. Note that, according to these results, Mycn could target several transcription factors and thus play a key role during the DN3 to DN4 transition. However, in this cellular context such results should be interpreted with caution, since no large-scale analysis of MYCN targets in DN3 thymocytes has been reported so far. Among Mycn potential targets, Notch1 is one master switch of early to late thymocy
317. [...] is called global normalization by the mean or by the median. It consists in subtracting, from the log2 of the intensities (or from the log-ratios), the log2 of the mean or of the median of the intensities (or of the ratios) of each array. This normalization centers the distribution of the intensities or log-ratios on 0. It is nevertheless of limited interest because of the often non-linear nature of the relationships between the observed intensities [Ramdas et al., 2001; Shoemaker et al., 2001]. [Chapter 2: Quality control and normalization of DNA microarray data] This method therefore only estimates the systematic error for each sample, controlling for proportional differences across arrays. It is also possible to use normalization by centering and scaling. This technique globally standardizes the distribution of the data (two-color and one-color) by centering the data on 0 and setting the standard deviation to 1, then computing the log-ratios in the case of two-color data. It thus makes it possible to compare on an equal footing the gene expression differences in several pairs of samples (for example tumor tissue vs reference tissue). Centering is obtained by subtracting from the log-ratios the median of the log-ratios of the corresponding sample. The data are then scaled, generally by dividing the values
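The two normalizations described in this excerpt can be sketched in a few lines of Python. This is an illustrative sketch only, using the standard library. The excerpt is cut off before stating what the values are divided by; dividing by the standard deviation is an assumption made here because the text says the standard deviation is set to 1. Function names are hypothetical.

```python
import math
import statistics


def median_normalize(intensities):
    """Global median normalization of one array: log2(x) - log2(median(x))."""
    log_median = math.log2(statistics.median(intensities))
    return [math.log2(x) - log_median for x in intensities]


def center_and_scale(values):
    """Center on the median, then divide by the standard deviation.

    Assumption: the scaling factor is the sample standard deviation, so that
    the scaled values have a standard deviation of 1.
    """
    med = statistics.median(values)
    centered = [v - med for v in values]
    sd = statistics.stdev(centered)
    return [c / sd for c in centered]


raw = [100.0, 200.0, 400.0, 800.0, 1600.0]
normalized = median_normalize(raw)   # distribution now centered on 0
scaled = center_and_scale(normalized)
print(normalized)
print(scaled)
```

Because both steps apply a single shift or a single factor per array, they correct proportional differences between arrays but not intensity-dependent (non-linear) biases, which is the limitation the text mentions.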
318. [...] multiple testing (Bonferroni, Benjamini) can be applied to strengthen the value of the statistical results. Permutation calculations require evaluating the enrichment scores of gene lists obtained by random selection. The resulting p-value reflects the probability of occurrence of this gene list compared with chance. Once established, the lists of genes differentially expressed in selected biological conditions can be analyzed by clustering. Groups of genes significantly linked to biological processes involved in the question and in the model under study can then be identified. To validate experimentally, by another approach, the expression level of these candidate genes, one generally resorts to a quantitative PCR experiment, commonly called qRT-PCR. [Section 3.3, Functional annotation] [Figure panels: GSEA enrichment plot (P53HYPOXIAPATHWAY; enrichment profile, hits, ranking metric scores), DAVID and IPA screenshots] FIGURE 3.7: Examples of annotation tools starting from lists of genes or other identifiers. A: Gene Set Enrichment Analysis (GSEA). B: DAVID knowledgebase. C: Ingenuity Pathway Analysis (IPA). [Chapter 3: Analysis of DNA microarray data] From a fo
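The multiple-testing corrections named in this excerpt can be illustrated with a short Python sketch. It assumes that "Benjamini" refers to the Benjamini-Hochberg false discovery rate procedure, which the excerpt does not state explicitly; function names are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed meaning of 'Benjamini')."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # stay monotone with respect to the ranking.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

Bonferroni controls the probability of any false positive and is the more conservative of the two, whereas Benjamini-Hochberg controls the expected proportion of false positives among the selected gene sets.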
319. esults indicate that rafnshh treatment influences the neural and glial lineage commitment. As a consequence, the splicing machinery in neuron- or astrocyte-differentiated cells is impaired for IKBKAP exon 20 recognition. Discussion. Deciphering the molecular basis of the tissue-specific pattern of IKBKAP mRNA splicing in FD nervous tissues is crucial for understanding the physiopathology of this genetic neurological disorder, which affects neuronal development and survival. In this study, we aimed to recapitulate different aspects of IKBKAP gene expression using FD hOE-MSCs cultured under different conditions. While other human cellular models, such as fibroblasts or iPS cells, have been investigated to understand FD, we believe that hOE-MSCs hold great promise to model FD pathology. hOE-MSCs are easily obtained by a simple biopsy, can be maintained for an extended period of time, and can be rapidly expanded in basic culture conditions without genetic manipulation. In addition, owing to the origin of hOE-MSCs in a peripheral tissue, these cells are able to express neuroglial markers in vitro (Figure 1) [42 47]. Thus, they constitute an efficient and simple method to derive neuronal cells in the original context of the genetic mutation studied. [December 2010, Volume 5, Issue 12, e15590; OE-MSCs as a Model for FD] [Figure panels: kinetin treatment (0, 25, 50, 100, 200 µM) and wash-out; WT and MU transcripts]
320. and Christophe Ginestier, from the team of Dr Daniel Birnbaum at the Centre de Recherche en Cancérologie de Marseille (CRCM), concerns the definition of the targets of the transcription factor ZNF703 in breast cancer; 2. the collaboration with Nathalie Sakakini, second-year PhD student at the TAGC under the co-supervision of Drs Jean Imbert and Thierry Virolle of Inserm unit U898 (stem cells, development and cancer) in Nice, concerns the study of the binding of the transcription factors EGR1 and β-catenin in two glioblastoma cell lines; 3. the collaboration with Dr Salvatore Spicuglia, then a member of the team of Dr Pierre Ferrier at the Centre d'Immunologie de Marseille-Luminy (CIML), concerning the study of the transcription factor TLX3 during T lymphocyte development in mouse; 4. the collaboration with Dr Saadi Khochbin, head of the "epigenetics and cell signaling" team at the Institut Albert Bonniot in Grenoble, concerning the analysis of the localization of the histone variant tH2B during spermatogenesis in mouse. In these collaborations, my work consisted in aligning the raw sequencing data, checking the sequencing quality, detecting peaks and, in most cases, analyzing them (distribution statistics, annotation, motif search). These collaborations have not led, for
321. and concepts to a controlled vocabulary called an ontology. The concepts are organized in a graph whose relations can be semantic relations or inclusion relations. The primary aim of an ontology is to model a body of knowledge in a given domain. The best-known ontology for annotating DNA microarray data is Gene Ontology (GO) [Ashburner et al., 2000]. It provides a controlled vocabulary of terms describing the properties of gene products. It is composed of 3 domains: cellular component, describing the localization of proteins within the cell (for example nucleus, cytoplasm, membrane); molecular function, describing activities at the molecular level such as binding (for example the GO term transcription factor binding, GO:0008134) or catalysis; biological process, the most interesting ontology for understanding protein function. It describes the processes in which proteins are involved, for example transcription (term transcription, DNA-dependent, GO:0006351) (Figure 3.6). 3.3.2 Some annotation tools. Several tools using this ontology have been created, such as AmiGO, GOToolBox [Martin et al., 2004], FatiGO [Al-Shahrour et al
322. [...] not only the study of inter-individual variations such as SNPs and small indels (insertions/deletions), but also that of large duplications and deletions, inversions, translocations, or ploidy anomalies (CNV, for Copy Number Variation). It is widely used to study chromosomal rearrangements in cancers. However, as for RNA-seq, it is sometimes necessary to select one or more genomic regions so that they are enriched during sequencing. This capture of DNA fragments located within a precise region of the genome is called target-seq. Target-seq allows the targeted analysis of candidate regions arising from genetic linkage studies, in order to identify new SNPs and/or indels associated with a disease or a particular phenotype. Sequencing also makes it possible to detect viral or bacterial genomes integrated into the genome of their host after infecting it. This metagenomics approach aims to study microbial organisms directly in their environment, without a laboratory culture step. In conclusion, the use of very high-throughput sequencing offers many advantages: the multiplexing of samples, the use of adapter-specific primers to carry out the PCR amplifications, [...]
323. etermined with the logRatio, log2(R/G). The different arguments of this function are: > args(agMAplot) A basic MA plot is obtained by the command: > agMAplot(myob, whichSlot = "gM", array = 1) Controls and flags can be added to the MA plot by the command: > agMAplot(myob, whichSlot = "gM", array = 1, ctr = TRUE, flag = 1:5) Controls and the distributions of A and M can be added to the MA plot by the command: > agMAplot(myob, whichSlot = "gM", array = 1, ctr = TRUE, hist = TRUE) Figure 3: The agMAplot function (array US45102986_251487911262_S01_GE1-v5_95_Feb07_1_1.txt; legend: positive controls, negative controls, gIsSaturated, gIsFeatNonUnif, gIsPosAndSignif, gIsFeatPopnOL, IsManualFlag; side panels: histograms of A and M). A: MA plot obtained with the gMeanSignal slot for the first array. B: Visualization of the different flags on the MA plot together with the controls. C: Visualization of the controls and of the distributions of A and M on the MA plot. The argument show.gene displays the gene name of all the probes whose M value
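The quantities plotted on an MA plot can be computed as in the following Python sketch. The excerpt only gives M as the log-ratio log2(R/G); the definition of A as the mean of the two log2 intensities is the usual convention and is an assumption here, as are the function and variable names. This sketch is independent of the AgiND package.

```python
import math


def ma_values(red, green):
    """Return (M, A) lists for paired red/green intensities.

    M = log2(R / G) is the log-ratio given in the text.
    A = (log2(R) + log2(G)) / 2 is the conventional average log intensity
    (assumed; the excerpt does not define A).
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    return m, a


red = [800.0, 100.0, 256.0]
green = [200.0, 100.0, 1024.0]
m, a = ma_values(red, green)
print(m)
print(a)
```

A spot with equal intensities in both channels has M = 0 whatever its A value, so a well-normalized array shows a cloud centered on the horizontal axis.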
324. ethods aimed at capturing informative dimensions (PCA). A filtering step is most generally applied prior to unsupervised classification. One can select genes with high standard deviations, those displaying a proportion of values above a user-defined threshold, or those having a given maximum or minimum value. However, this procedure is extremely subjective, and the number of selected genes may be over- or under-estimated. Finally, another limit of classical unsupervised methods resides in their inability to accurately identify the actual number of clusters if no further argument is provided to the algorithm. As a consequence, additional algorithms for unsupervised classification have been proposed, such as the Quality Cluster algorithm (QT_Clust) [6], CHAMELEON [7] or Markov CLustering (MCL) [8]. However, none of them addresses both the filtering and partitioning issues. MCL is a graph partitioning algorithm whose ability to solve complex classification problems has been underlined in many applications, including protein-protein interaction networks [9], sequence analysis (TRIBE-MCL) [10] or microarray analysis (geneMCL) [11]. In a graph representation of microarray data, nodes stand for genes and edges represent profile similarities between genes. As processing the full graph for partitioning is time-consuming and computer-intensive, the geneMCL algorithm has to be run on a subset of genes that are selected using classical filters (e.g. high standard deviat
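The classical filter mentioned in this excerpt (keeping genes with a high standard deviation) can be sketched as follows in Python. The gene names, expression values and threshold are made up for the illustration, and the function name is hypothetical.

```python
import statistics


def filter_by_sd(expression, min_sd):
    """Keep the genes whose expression profile has a standard deviation >= min_sd.

    expression: dict mapping a gene name to its list of values across samples.
    """
    return {
        gene: values
        for gene, values in expression.items()
        if statistics.stdev(values) >= min_sd
    }


expression = {
    "geneA": [5.0, 5.1, 4.9, 5.0],   # nearly flat profile
    "geneB": [2.0, 8.0, 3.0, 9.0],   # strongly varying profile
    "geneC": [7.0, 7.0, 7.0, 7.0],   # constant profile
}
selected = filter_by_sd(expression, min_sd=1.0)
print(list(selected))
```

The result depends entirely on the user-chosen `min_sd`, which is the subjectivity the text criticizes: a slightly different threshold can change the number of retained genes substantially.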
325. [...] for the analysis of Agilent DNA microarray data. In grey, the commercial software developed by Agilent, and in bold, the features of our R library AgiND. List of the main annotations contained in the DAVID knowledgebase tool, grouped by domain. Summary of the progress of the project between its publication in 2008 and now. Other approaches for the meta-analysis of DNA microarray data from GEO; in bold, the tool I developed. Greyed cells correspond to tools that are not free. Comparison of the ChIP-on-chip and ChIP-seq techniques. The [...] corresponds to the use of the MAGnify kit. The main data formats of very high-throughput sequencing. List of abbreviations. The abbreviations listed below are in English because these are the ones commonly accepted by the scientific community. AFE: Agilent Feature Extraction software; ANOVA: ANalysis Of VAriance; ChIP: Chromatin ImmunoPrecipitation; FDR: False Discovery Rate; Gb: Gigabytes; GEO: Gene Expression Omnibus; HTS: High Throughput Sequencing; ICS: Instrument Controller Software; LOWESS: LOcally WEighted Scatterplot Smoothing; nt: Nucleotide; PCR: Polymerase Chain Reaction; PET: Paired En [...]; RNA, SAM, SETS, SNP, SOLiD, Tb, TS, TSS: [...]
326. [...] for the detection of SNPs and small indels. These two pipelines already share the secondary analysis; it remains to integrate them into a user-friendly graphical interface for external use. To this end, the laboratory wishes to install on a server a local version of Galaxy [Giardine et al., 2005] with shared resources such as the RSATools suite, the peak detection tool PICOR, or the various analysis pipelines of the TGML platform. [Chapter 5: Study of transcriptional regulation by HTS] In order to partly continue my thesis work and to learn even more about these complex but very interesting mechanisms of gene expression regulation, I will be hired as a research engineer on a fixed-term contract at the TGML platform. My role will be to improve the ChIP-seq pipeline but, above all, to set up the complex analysis of RNA-seq data, for which nothing is available in the laboratory for the moment. Appendices. APPENDIX: User manual of the R library AgiND. The AgiND package. Aurélie Bergon and Denis Puthier, July 30, 2007. INSERM TAGC ERM206, Parc scientifique de Luminy, case 928, Marseille, France. puthier@tagc.univ-mrs.fr, http://tagc.univ-mrs.fr Contents: 1 Introduction; 2 Getting started; 2.1 Load the AgiND library; 2.2 Note about qua
327. exon 20 only in the nervous system. The IKAP/hELP1 protein synthesized from the transcripts including exon 20 must play an important role in the nervous system, a role that nevertheless remains very obscure. The IKBKAP transcript excluding exon 20 would encode a protein truncated by nearly 50% on the C-terminal side. However, the existence of this protein remains uncertain. To understand the molecular pathways whose alteration in the nervous system causes FD, we explored the transcriptional signature of this disease. To this end, the group of Dr El Chérif Ibrahim established cultures of undifferentiated olfactory stem cells (hOE-MSC, for human Olfactory Ecto-Mesenchymal Stem Cells) as a model for studying FD. These cells contribute continuously to the processes of proliferation, migration, differentiation, apoptosis and cell survival that characterize neurogenesis. A bank of human nasal stem cells was established from 10 control individuals and 6 FD patients, making it possible to produce differentiated neural cells, namely neurons and astrocytes. From 5 control individuals and 4 FD patients, cultures of these stem cells undergoing differentiation were followed at different times (1, 2, 5 and 9 weeks). For each of these time points and samples, the total RNA extracted from these cultures [...]
328. f protein complexes in the yeast Saccharomyces cerevisiae. Nature 440: 637-43. 10. Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575-84. 11. Samuel Lattimore B, van Dongen S, Crabbe MJC (2005) GeneMCL in microarray analysis. Comput Biol Chem 29: 354-9. 12. Sherman BT, Huang DW, Tan Q, Guo Y, Bour S, et al. (2007) DAVID Knowledgebase: a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis. BMC Bioinformatics 8: 426. 13. Pawitan Y, Bjöhle J, Amler L, Borg A, Egyhazi S, et al. (2005) Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Res 7: R953-64. 14. Lacroix M, Leclercq G (2004) About GATA3, HNF3A, and XBP1, three genes co-expressed with the oestrogen receptor-alpha gene (ESR1) in breast cancer. Mol Cell Endocrinol 219: 1-7. 15. Rogers MA, Langbein L, Winter H, Ehmann C, Praetzel S, et al. (2001) Characterization of a cluster of human high/ultrahigh sulfur keratin-associated protein genes embedded in the type I keratin gene domain on chromosome 17q12-21. J Biol Chem 276: 19440-51. [PLoS ONE, www.plosone.org; GEO Datamining with TBrowser] and columns to TS. The presence of a given gene in a given TS is indicated by 1 (default 0). Found at: doi:10.1371/journal.pone.00
329. factors, the microsomal prostaglandin E synthase PTGES (Agilent clone number A_24_P478940) and the complement regulatory protein CRIg (VSIG4), considered potent negative regulators of T and NK cell responses [43 45]. The decreased abundance of NFkB signal transduction related transcripts (Table 2), already reported in DSS patients by others [46], might be related to impaired expression of T and NK cell related genes. Our analysis also revealed that DSS whole blood cells from children over-expressed an enriched pattern of anti-inflammatory and repair/tissue remodeling genes (Table 3, non-exhaustive list; individual p-values available in Table S2). Over-expressed anti-inflammatory genes identified encode molecules with diverse functions: the anti-inflammatory cytokine IL-10, a putative marker [...] Pro-inflammatory innate defense and host lipid metabolism [...] transcriptional responses are activated in DSS children. When searching for pro-inflammatory gene patterns that may be relevant to DSS pathophysiology, and particularly to systemic inflammation and vascular dysfunction, we identified three major pro-inflammatory gene patterns. Interestingly, all are related to innate defense and host lipid metabolism, and considered major pathogenic mechanisms in other systemic inflammatory diseases. As shown in Table 4 (non-exhaustive list; individual p-values available in Table S2), the first one is defined by a set of over-expressed genes str
330. fferentiation with a protocol previously used in hOE-MSCs [41], which consists of adding retinoic acid, forskolin and Sonic hedgehog to the medium (called rafnshh medium). Cells were first cultured in serum-free medium supplemented with N2 and B27 until they became adherent, before being cultured in rafnshh (Figure 9B). The new culture medium induced a slight morphology change as compared to the serum condition (Figure 9A). When hOE-MSCs were first cultured in rafnshh, they began to form long fine processes and neural-like cells (Figure 9C). After 7 days of treatment, a majority of cells adopted neuron-like morphologies (Figure 9D) and established a wide range of connections (Figure 9E, F and M). Using end-point PCR on 3 different FD cell cultures, we observed that IKBKAP mRNA splicing in rafnshh-treated cells was more prone to exon 20 skipping as compared to untreated cells (Figure 9G). This change can be quantified by RT-qPCR (Figure 9H). In contrast, we did not detect significant variations in exon 2 and exon 36 alternative splicing during neuronal differentiation (data not shown). When assessing immunostaining on treated cells, we observed that rafnshh treatment increased the proportion of both GFAP- (Figure 9I and J) and MAP2-expressing cells (Figure 9K and L). Double labeling with β-III tubulin and nestin revealed a stronger expression of β-III tubulin compared to nestin during the differentiation process (Figure 9N-P). Collectively, these r
331. for some genes, the clustering results differed (Fig. 1E). Importantly, partition results were not very sensitive to inflation values. Indeed, 10 and 12 clusters were observed with I set to 1.5 and 2.5, respectively (data not shown). All signatures were then submitted to functional enrichment analysis. A summary of the results is given in Figure 1G. As expected for a breast cancer dataset, TS were found to be related to (i) immune response (T lymphocyte activation, B lymphocyte activation and interferon alpha); (ii) primary metabolism (cell cycle, ribosome biogenesis, nuclear phosphorylation and transcription), which is probably reminiscent of tumor aggressiveness; (iii) modification of the local environment (extracellular matrix and cell adhesion), which could sign the metastasis potential of each sample; and (iv) estrogen receptor status of breast tumors (estrogen response pathway). Altogether, these results underline the ability of the DBF-MCL algorithm to find natural gene clusters within a randomly selected dataset. Indeed, for numerous additional microarray datasets, hierarchical clustering results and DBF-MCL results were compared. As illustrated in Figure S5B for a representative set of experiments, setting k to 100 allows, in all cases, noisy elements to be deleted and only informative genes to be selected in a microarray dataset. Interestingly, in all cases meaningful partitioning results were obtained using an inflation parameter set to 2. Systematic extraction of TS
332. from object. The agExclude function replaces with NA the data to be excluded. There are different cases: low-quality spots, thanks to the Flag slot of the object; controls, if only the samples are to be observed; a list of gene names to exclude (this list can be obtained with the argument identify = TRUE of the agMAplot or agImage functions, which return a list of the identified gene names). > M <- agExclude(myob, type = "controls", toNA = TRUE) > M2 <- agExclude(myob, type = "flags", toNA = TRUE) > a <- c("DarkCorner", "GE_BrightCorner") > M3 <- agExclude(myob, type = "list", list = a, toNA = TRUE) All these commands return an object of the same class in which the controls, the flags or the data of the gene name list are replaced by NA. 3 Diagnostic plots. Several functions allow the array data to be visualised: agBoxplot, boxplot of a slot distribution; agMAplot, MA plot of a slot; agImage, virtual image of a slot; agPlot, intensity values along the different chromosomes. By default, if there is no whichSlot argument, the data used are: the gM slot for an AgilentBatch object; the SgNorm slot for an AgilentNorm object; logRatio for an AgilentBatchRG object, which is calculated as log2(rM/gM); logRatioNorm for an AgilentNormRG object, which is calculated as log2(rSgNorm/gSgNorm). Moreover, it is possible to save these plots with the arguments pdf and html = TRUE. These plots are saved in the working direc
333. g Molecular Mechanisms of DSS. [...] associated with leave-one-out cross-validation [39] was used to assess the robustness of the DSS gene signature. Real-time PCR validation of genes over- and under-expressed in DSS patients. Briefly, total RNA extracted from whole blood samples was reverse transcribed using the High Capacity cDNA RT kit (Applied Biosystems Inc.) and random primers. Real-time PCR was carried out using the FastStart Universal Probe Master (ROX) (Roche) and real-time PCR primers designed using the Universal Probe Library (UPL) Assay Design Center (Roche). Amplification products were run on an ABI PRISM 7900HT (Applied Biosystems). Cycle threshold (Ct) values were automatically calculated, and the value obtained for each amplified gene was normalized by subtracting the Ct corresponding to amplification of the HPRT1 gene (ΔCt) for the same sample. Correlation between the ΔCt values obtained by real-time PCR and the corresponding expression values from microarrays was estimated using the Spearman correlation coefficient. Comprehensive overview of functional patterns altered during DSS. Bioinformatics-based analysis using the demonstration version 7.1 of Ingenuity Pathway Analysis software (IPA, Ingenuity Systems, www.ingenuity.com), associated with manual and literature-based analysis, was carried out to identify the most relevant functional processes associated with the identified DSS gene signature. This was done by combining most informative canoni
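The ΔCt normalization and the Spearman correlation described in this excerpt can be sketched in plain Python as follows. The Ct and microarray values are invented for the illustration, and the function names are hypothetical; the rank correlation is implemented directly (Pearson correlation of average ranks) rather than through any statistics package.

```python
def delta_ct(ct_gene, ct_reference):
    """ΔCt per sample: Ct of the gene minus Ct of the reference gene (HPRT1 in the text)."""
    return [g - r for g, r in zip(ct_gene, ct_reference)]


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            result[order[j]] = mean_rank
        pos = end + 1
    return result


def spearman(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented values for four samples.
ct_gene = [24.0, 26.5, 22.1, 28.0]
ct_hprt1 = [20.0, 20.5, 19.9, 20.2]
microarray = [10.1, 8.0, 12.3, 6.5]

dct = delta_ct(ct_gene, ct_hprt1)
rho = spearman(dct, microarray)
print(dct, rho)
```

Since a higher transcript abundance gives a lower Ct, good agreement between the two platforms shows up as a strongly negative coefficient between ΔCt and microarray expression values, as in this made-up example.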
334. g1 <- new("graphAM", adjMat = adjMat) > nodes(g1) [1] "C6orf211" "GREB1" "WWP1" "JMJD2B" "KRT18" "RNF103" [7] "ROGDI" "SLC22A5" "THSD4" "NAT1" "SLC39A6" "ABAT" [13] "CA12" "CIRBP" "LOC400451" "MAGED2" "MCCC2" "MLPH" [19] "ANXA9" "ERBB4" "FOXA1" "ESR1" "GATA3" "TBC1D9" [25] "XBP1" > nAt <- makeNodeAttrs(g1) > nAt$fillcolor[match(rownames(as.matrix(nAt$fillcolor)), c("GATA3", "XBP1", "ESR1"), nomatch = F) != 0] <- "green" > nAt$fillcolor[match(rownames(as.matrix(nAt$fillcolor)), c("TBC1D9", "FOXA1"), nomatch = F) != 0] <- "yellow" > plot(g1, "fdp", nodeAttrs = nAt) As expected, the list of genes contains XBP1 & ESR1 & GATA3, but also FOXA1 (HNF3A), which was reported to be co-expressed with ESR1 in several experiments (see [4]). Other genes are also particularly relevant, such as TBC1D9 (MDR1, Multidrug Resistance 1) (figure 1). 2.4 Visualizing the expression matrix. The TS 3DE64836D is related to experiment GSE7904. In this experiment, the authors were interested in analysing several classes of breast cancer tumors, especially sporadic basal-like cancers. > a <- getTBInfo(field = "signature", value = "3DE64836D", verbose = FALSE) > exp <- a["Experiment", 1] > info <- getTBInfo(field = "experiment", value = exp, verbose = TRUE) A result was found for experiment GSE7904. Name: GSE7904. Organism: Homo sapiens. PMID: NULL. Nb samples: 62. Title: Expression data from
335. [...] also towards the screening of new molecules and the identification of new drugs and new diagnostic tools. Introduced in the 1980s, the high-throughput DNA microarray technique makes it possible to measure simultaneously the expression level of a large set of messenger RNAs contained in a sample, which makes it a tool of choice for studying the transcriptome. This method is still commonly used today in research laboratories for various applications, such as the identification of therapeutic targets, biomarkers or signaling pathways involved in a disease, the characterization of drug resistance mechanisms, or the identification of transcriptional signatures in various biological contexts. [Section 1.2, The transcriptome] Its intensive use has led to the development of many technologies for data acquisition. In addition, it has required the development of many bioinformatic and statistical tools and methods dedicated to processing the resulting mass of data. Currently, with the development of very high-throughput sequencing, new techniques for studying the transcriptome have emerged: RNA-seq and SAGE-seq. These techniques, detailed below, are not the ones I used during my thesis. 1.2.1 Principle of DNA microarrays. The principle of
336. [...] very high throughput. Thus, the in vivo identification by ChIP of transcription factor binding sites and of N-terminal histone modifications is now possible at very high throughput by sequencing the immunoprecipitated fragments (ChIP-seq). This method gives a better resolution of the potential binding sites of transcription factors than the earlier technique, which relies on hybridizing the immunoprecipitated DNA fragments on DNA or pan-genomic oligonucleotide microarrays and is known by the acronym ChIP-on-chip (see section 5.1.1). The ChIP-seq method will be described in detail in the next part of this manuscript, because processing the resulting data required the development of a specific analysis pipeline on the TGML platform, which formed part of my thesis work presented in Chapter 5. In addition to histone modifications, nucleosome positioning can be studied by digesting chromatin with DNase I or with MNase (Micrococcal Nuclease, or S7 Nuclease) in the presence of divalent cations. This endonuclease makes double-strand cuts in the DNA between nucleosomes. DNA fragments of about 146 nucleotides are thus obtained, i.e. the size of the DNA fragment wrapped around a nucleosome. By sequencing, it is possible to determine the positions of these nucleoso
337. ge where a list of genes can be located can be obtained by the command: > a <- c("DarkCorner", "GE_BrightCorner") > agImage(myob, "gM", array = 1, show.gene = a) Figure 6: The other arguments of the agImage function (array US45102986_251487911262_S01_GE1-v5_95_Feb07_1_1.txt; axes 1:nb_col and 1:nb_row; panel settings: threshold 25, quantiles; threshold 10, intensity; threshold 0, intensity). A: Visualization of the virtual image where only the intensities above the first quartile (i.e. the 25 percent quantile) are present. B: Visualization of the same image but with a threshold of 10 percent according to signal intensity. C: Localisation on the image of the spots corresponding to a list of gene names (blue circles), here DarkCorner and GE_BrightCorner, which are two controls. 3.4 The agPlot function. This function is more specific to two-colour arrays, because the plot obtained allows one to observe a variation of the signal (e.g. logRatio for an AgilentBatchRG object or gM for an AgilentBatch) along a chromosome or on a sorted SN slot. The arg
338. gorithm (MCL). Performances of DBF-MCL on the Complex9RN200 dataset. To test the performance of the DBF-MCL algorithm, we used a modified version of the complex9 dataset, which was used earlier by Karypis et al. [7]. Since DBF-MCL is designed to handle noisy datasets, 200% of normally distributed random noise was added to the original data. The resulting dataset, referred to as Complex9RN200 hereafter (see Figure S4A and S4B), is difficult to partition since it is composed of a noisy environment in which arbitrary geometric entities with various spacing have been placed. The two main parameters of DBF-MCL are k, which controls the size of the neighborhood, and the inflation I (range 1.1 to 5), which controls the way the underlying graph is partitioned. The effect of k on the selection of informative elements is shown in Figure S5A. Euclidean distance was used for this dataset. A steep ascending phase and a slowly increasing phase, starting from k values close to 40, were observed. This confirms the existence of areas with heterogeneous densities. In fact, the transition between the two phases reflects the transition from dense to sparse regions. Indeed, datasets produced with k values above 40 contain noisy elements (Fig. S4C). In contrast, choosing k values in the ascending phase ensures that noise-free datasets are obtained. In the case of artificial data, satisfying partitioning results were obtained with inflation values close to 1
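The role of k in this excerpt can be made concrete with a small Python sketch that computes, for each element, the Euclidean distance to its k-th nearest neighbor (the DKNN used by DBF-MCL to detect dense regions). This is only the observed-distance step on toy 2D points; the randomization procedure and FDR-based threshold of DBF-MCL are not reproduced, and the function name is hypothetical.

```python
import math


def dknn(points, k):
    """Distance from each point to its k-th nearest neighbor (Euclidean)."""
    result = []
    for i, p in enumerate(points):
        # Distances to every other point, smallest first.
        distances = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        result.append(distances[k - 1])
    return result


# Three close points (a dense region) and one isolated point (noise).
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
print(dknn(points, k=2))
```

Elements lying in a dense region get small DKNN values, while the isolated point gets a large one. Raising k enlarges the neighborhood that must be dense for an element to be retained, which is why the choice of k changes how many elements are selected.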
339. h a constant concentration of 80 µM over 24 h. After performing semi-quantitative RT-PCR analysis, the first significant increase of the WT:MU ratio was seen after 24 h of kinetin treatment (Figure 7D). [Figure 6, panels A-D: schematic of the IKBKAP exons and splice sites, RT-PCR and RT-qPCR quantification of the exon 2 and exon 36 events in control and FD samples, and the resulting IKAP/hELP1 protein isoforms] Figure 6. Expression and alternative splicing of IKBKAP mRNA at the extremities of the coding sequence. Two additional splicing events are described within the IKBKAP gene. The first one represents the alternative use of a 3'ss for exon 2 and the second one concerns exon 36 skipping, as represented by a schematic (A). (B) Semi-quantitative RT-PCR illustrates the relative amounts of both events on 2 control and 2 FD hOE-MSCs. (C) RT-qPCR analysis performed at 2 different
340. he transcriptome of these cells at very early (P1, P2) and later (P5, P9) cell passages, with the same samples used to quantify IKBKAP transcripts. Among the 8,780 cDNAs represented on the microarray, 46 were significantly decreased and only 4 increased in FD hOE-MSCs when compared to control hOE-MSCs (fold change > 1.4-fold, p-value < 6 10; Table 1 and Table S1), considering a false discovery rate (FDR) of 3% (Figure S1). Notably, the biological processes and the signaling pathways most significantly targeted by the effectors on our list were actin cytoskeleton organization, cell growth and apoptosis (Table 1). More specifically, we identified 10 genes (Table 1 and Table S1) that also exhibited significantly dysregulated expression in previous microarray studies [10,22]. Interestingly, 2 genes, PMEPA1 and GSN, encoding TMEPAI and gelsolin and involved in cell growth and cytoskeleton organization, respectively, were dysregulated in both the IKBKAP RNAi and FD iPS cell studies. In order to assess the robustness of our microarray analysis, RT-qPCR analysis was performed on independent RNAs extracted from 4 control and 4 FD hOE-MSCs harvested at the second, fourth and seventh cell passages. Since gene expression quantification using RT-qPCR requires a steady reference gene, we selected three genes frequently used for normalization of the data: ABL1, RPLP0 and HPRT1. We confirmed that PMEPA1 (Figure 4A), the most dysregulated gen
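To make the role of the reference genes concrete, the sketch below normalizes a target gene against several reference genes with the 2^-ΔΔCt method. This is an illustration only: the excerpt does not state which formula was applied, averaging the reference Ct values is an assumption, the function name is hypothetical, and the Ct values are invented.

```python
def relative_expression(ct_target_sample, ct_refs_sample,
                        ct_target_control, ct_refs_control):
    """Fold change of a target gene (sample vs control) by the 2^-ddCt method.

    Reference genes are combined by averaging their Ct values (an assumption).
    """
    # dCt: target Ct minus mean reference Ct, within each condition.
    d_sample = ct_target_sample - sum(ct_refs_sample) / len(ct_refs_sample)
    d_control = ct_target_control - sum(ct_refs_control) / len(ct_refs_control)
    # ddCt: difference between conditions; each cycle is a two-fold change.
    return 2.0 ** -(d_sample - d_control)

# Invented Ct values for one target and three reference genes.
fold = relative_expression(26.0, [20.0, 21.0, 22.0], 24.0, [20.0, 21.0, 22.0])
```

With these invented values the target needs two more cycles in the sample than in the control after normalization, so `fold` is 0.25, that is, a four-fold decrease.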
341. hh in the culture medium [44]. In these conditions, we observed that differentiated cells express the highest levels of MU IKBKAP transcript (Figure 9G and H). This result correlates with the specific low WT/MU IKBKAP isoform ratio in nervous tissues [8] and suggests that stem cells engaged in a neuronal lineage, with appropriate culture conditions, can rapidly switch their IKBKAP WT/MU transcript ratio. Previous studies have shown that (i) IKBKAP exon 20 is poorly defined in a healthy context, due to the presence of a weak 3' ss and exonic splicing silencers, and (ii) the FD mutation exacerbates the environment, leading to alternative exon 20 inclusion in FD tissues [59,60]. We propose that some transcription/splicing factors involved in IKBKAP exon 20 recognition are also downregulated in a tissue-specific manner. This would explain why the pattern of IKBKAP alternative mRNA splicing is more aberrant in the nervous system. Interestingly, Lee and colleagues determined that the neuron-specific splicing factor NOVA [61] was underexpressed in FD versus control iPS cell-derived neural crest precursors [22]. The new model described in this study will allow us to further test whether candidate splicing factors may be involved in the tissue-specific regulation of IKBKAP mRNA alternative splicing. Materials and Methods. Ethics Statement. All control and FD participants gave informed and written consent (provided by the parents for the children), and biops
342. graphical representations can be used to visualize the distribution of: negative controls (heterologous or heteroduplex DNA, that is, double-stranded DNA formed by the pairing of two strands of different origin, which shows loop domains in the regions where pairing does not occur correctly); positive controls (spiked-in controls), which are probes specific to cDNAs at various known concentrations, corresponding to a standard range produced by serial dilutions (these cDNAs are added to the samples during labeling, before hybridization on the DNA microarray, and are therefore good controls for sample quality and normalization); and reference genes, or housekeeping genes, such as ribosomal proteins, whose expression is considered constant under all conditions and therefore in all samples. This library can display the samples as box-and-whisker plots with the agBoxplot function, as MA plots with the agMAplot function, as images with the agImage function, and as histograms for one or more genes of interest with the agPlot function. Functions are also provided to remove, as the user requires, some of the probes from the S4 object, before or after normalization, with the agExclude function. This can also
343. human olfactory stem cells. Brain Res 890: 11-22. Zhang X, Klueber KM, Guo Z, Lu C, Roisen FJ (2004) Adult human olfactory neural progenitors cultured in defined medium. Exp Neurol 186: 112-23. Winstead W, Marshall CT, Lu CL, Klueber KM, Roisen FJ (2005) Endoscopic biopsy of human olfactory epithelium as a source of progenitor cells. Am J Rhinol 19: 83-90. Feron F, Perry C, McGrath JJ, Mackay-Sim A (1998) New techniques for biopsy and culture of human olfactory epithelial neurons. Arch Otolaryngol Head Neck Surg 124: 861-6. Othman M, Lu C, Klueber K, Winstead W, Roisen F (2005) Clonal analysis of adult human olfactory neurosphere forming cells. Biotech Histochem 80: 189-200. Viegas MH, Gehring NH, Breit S, Hentze MW, Kulozik AE (2007) The abundance of RNPS1, a protein component of the exon junction complex, can determine the variability in efficiency of the Nonsense Mediated Decay pathway. Nucleic Acids Res 35: 4542-51. Bateman JF, Freddi S, Nattrass G, Savarirayan R (2003) Tissue-specific RNA surveillance? Nonsense-mediated mRNA decay causes collagen X haploinsufficiency in Schmid metaphyseal chondrodysplasia cartilage. Hum Mol Genet 12: 217-25. Resta N, Susca FC, Di Giacomo MC, Stella A, Bukvic N, et al. (2006) A homozygous frameshift mutation in the E
344. i J. & Hayashizaki Y. (2005). The transcriptional landscape of the mammalian genome. Science (New York, N.Y.), 309(5740), 1559-63. [Chain et al., 2010] Chain B., Bowen H., Hammond J., Posch W., Rasaiyaah J., Tsang J. & Noursadeghi M. (2010). Error, reproducibility and sensitivity: a pipeline for data processing of Agilent oligonucleotide expression arrays. BMC Bioinformatics, 11, 344. [Chan, 2005] Chan E.Y. (2005). Advances in sequencing technology. Mutation Research, 573(1-2), 13-40. [Chang et al., 2011] Chang H., Jackson D.G., Kayne P.S., Ross-Macdonald P.B., Ryseck R.P. & Siemers N.O. (2011). Exome Sequencing Reveals Comprehensive Genomic Alterations across Eight Cancer Cell Lines. PLoS ONE, 6(6), e21097. [Chaouiya et al., 2012] Chaouiya C., Naldi A. & Thieffry D. (2012). Logical Modelling of Gene Regulatory Networks with GINsim. Methods in Molecular Biology (Clifton, N.J.), 804, 463-79. [Chen & Sharp, 2004] Chen H. & Sharp B.M. (2004). Content-rich biological network constructed by mining PubMed abstracts. BMC Bioinformatics, 5, 147. [Chen & Sadowski, 2005] Chen J. & Sadowski I. (2005). Identification of the mismatch repair genes PMS2 and MLH1 as p53 target genes by using serial analysis of binding elements. Proceedings of the National Academy of Sciences of the United States of America, 102(13), 4813-8. [Chen et al., 2008] Chen R., Mallelwar R., Thosar A.
345. ices; GRN, gene regulatory network; GO, Gene Ontology; miRNA, micro-RNA; TF, transcription factor; TFBS, transcription factor binding site; TBMC, TranscriptomeBrowser Motif Conservation. Authors' contributions: CL, AB, FL, CN, JI and DP conceived the project. CL, AB and FL developed the Java application. AB, CL and NBP developed the database. DP performed the TFBS analysis. DP, CN and JI supervised the project. DP wrote the manuscript. All authors read and approved the final manuscript. Acknowledgments: This work was supported by the Institut National de la Santé et de la Recherche Médicale (Inserm), the Canceropôle PACA and Marseille-Nice Genopole. The authors acknowledge financial support from the EU ERASysBio Plus ModHeart project. Fabrice Lopez was supported by a fellowship from the EU STREP grant Diamonds and through funding from the IntegraTCell project (ANR, National Research Agency). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The authors would like to thank the staff from the TAGC laboratory for helpful discussions and gratefully acknowledge Francois-Xavier Theodule for technical assistance. Figure legends. Figure 1: Functional enrichment analysis of predicted targets. Annotation terms obtained from various annotation databases were used to perform systematic annotation of all predicted target sets in the mouse. For each pa
346. ide Polymorphisms), which differ from the reference genome by only one nucleotide. A sequencing error (change of a single color, or mismatch) can thus easily be distinguished from a true SNP, which is detected by the successive change of two colors (Figure 1.11). From the succession of fluorochromes observed during sequencing for each bead, the SOLiD thus generates a color-coded sequence in csfasta format (see section 5.3.1), whose first letter corresponds to the last base of the P1 adapter (position n, Figure 1.8). [Figure 1.10 graphic: reading of a succession of fluorochromes (FAM, Cy3, TXR, Cy5); transformation into color code (0, 1, 2, 3): T230230212302132132223321; conversion into sequence: TCGGATTCAGCCTGCTGCTCTATCA.] Figure 1.10: Conversion of SOLiD reads into nucleotide sequences. Each color codes for a number between 0 and 3, which makes it possible, using the last base of the adapter (T in this example), to reconstitute the genomic sequence. (Section 1.4, Very-high-throughput sequencing techniques.) [Figure 1.11 graphic: SNP, change of 2 successive colors; read error, change of a single color; rows: reference in bases, reference in color code, read in color code, read in bases.]
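The conversion shown in Figure 1.10 can be sketched in a few lines of Python. The sketch assumes the standard two-base color code, in which bases are numbered A=0, C=1, G=2, T=3 and each color is the XOR of two successive bases; the function names are hypothetical and the read is the one from the figure.

```python
BASES = "ACGT"  # A=0, C=1, G=2, T=3, so that color = base XOR next base

def decode_colorspace(read):
    """Decode a color-space read such as 'T2302' into bases.

    The first character is the last base of the adapter; it is kept in the
    output, as in Figure 1.10.
    """
    seq = [read[0]]
    for color in read[1:]:
        previous = BASES.index(seq[-1])
        seq.append(BASES[previous ^ int(color)])
    return "".join(seq)

def encode_colorspace(seq):
    """Encode a base sequence into color space, keeping the first base."""
    colors = [str(BASES.index(a) ^ BASES.index(b)) for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(colors)

decoded = decode_colorspace("T230230212302132132223321")
# A single-base substitution (TCGGA -> TCTGA) changes two adjacent colors.
reference, variant = encode_colorspace("TCGGA"), encode_colorspace("TCTGA")
changed = sum(1 for a, b in zip(reference, variant) if a != b)
```

Here `decoded` reproduces the sequence given in the figure, and `changed` is 2, which matches the rule that a true SNP alters two successive colors whereas a read error alters only one.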
347. ies were obtained under a protocol which was approved by the local ethical committees in New York (Institutional Review Board of the New York University School of Medicine) and Marseille (Comité Consultatif de Protection des Personnes dans la Recherche Biomédicale, Marseille 2). Purification of hOE-MSCs. Human nasal mucosae were obtained from biopsies of 4 FD patients (3 females and 1 male, aged 12-16 years) at the Dysautonomia Treatment and Evaluation Center, New York. All four FD patients were homozygous for the splicing mutation. Biopsies from 5 healthy controls (3 females and 2 males, aged 18-39 years) were collected by the ENT Department in Marseille (Hopital Nord, France). Biopsies were harvested as previously described [30,46] to obtain a cell culture of hOE-MSCs. The cells were continuously cultured in DMEM/HAM'S F12 (Gibco) supplemented with 10% fetal bovine serum (FBS) and 50 µg/ml gentamicin (Gibco), and trypsinized once a week with 0.05% trypsin-EDTA (Gibco) at 60-80% confluence. Cycloheximide (Sigma), diluted in DMSO, was used at 50 µg/ml. Kinetin solution (Sigma, 1 mg/ml) was diluted in DMEM/HAM'S F12 at concentrations ranging from 25 to 200 µM for various incubation times, as specified in the text. Generation of spheres and cell differentiation. Cells were plated at 15,000 cells/cm² into 6-well plates pretreated with poly-L-lysine (5 µg/cm², Sigma) in a serum-free medium of DM
348. iew of gene regulation in the human and mouse. Data generated in this study were next integrated with a large set of molecular interactions from various sources, including (i) potential protein-DNA interactions derived from ChIP-seq experiments (ChIP-X database), (ii) curated regulatory interactions obtained from the literature (OregAnno, LymphTF-DB), (iii) predicted miRNA-target interactions (TargetScan), (iv) protein kinase-substrate interactions derived from multiple online sources (KEA) and (v) physical protein-protein interactions obtained from IntAct and HPRD [15-21]. Information related to these interactions was stored as MySQL tables that were integrated in the back-end database of TranscriptomeBrowser, our previously published microarray data mining software [22]. Finally, we developed InteractomeBrowser as a plugin for TranscriptomeBrowser. InteractomeBrowser was developed using the prefuse Java library and can be used to translate any gene list into a meaningful graph. The specificity of the InteractomeBrowser plugin relies on a new cell-compartment-based layout that makes use of a subset of the Gene Ontology to map gene products onto relevant cell compartments. This layout is particularly powerful for visual integration of heterogeneous biological information. Moreover, InteractomeBrowser is integrated into the TranscriptomeBrowser suite, which allows easy communication with other tools, for instance to retrieve lists of genes that are frequently
349. beads are in this case simply attached to a glass slide. The originality of this technology lies in the parallel, very-high-throughput sequencing of fragments by ligation of di-bases coupled to a fluorochrome. Thus, reading a fluorochrome does not code for one base, as in SBS, but for the ligation of two successive bases. Since the IBiSA TGML platform of the TAGC laboratory (Inserm UMR_S 928) has been equipped with a SOLiD sequencer since April 2009, this technology is described in more detail in this manuscript. Sequencing of the DNA fragment is carried out by hybridizing a primer complementary to the sequence of the P1 adapter (Figure 1.8) and adding probes 8 nucleotides long coupled to a fluorochrome. The degenerate bases correspond to an equimolar mixture of the 4 nucleotides at each position (Figure 1.9). The 8-nucleotide probes are complementary over 5 nucleotides, so there are 4^5 possible probes, that is, 1,024 probes in total. The inclusion of universal nucleotides in the probes allows efficient and rapid sequencing of longer nucleotide sequences. After ligation of a probe, the signal emitted by each bead is detected (high-resolution photograph). The probes are then cleaved at position 5 to allow a new ligation. The reading of the first two positions of the target DNA fragment
350. in A and regulates actin cytoskeleton organization and cell migration. J Cell Sci 121: 854-864. Katafiasz D, Smith LM, Wahl JK 3rd (2011) Slug (SNAI2) expression in oral SCC cells results in altered cell-cell adhesion and increased motility. Cell Adhes Migr 5: 315-322. Keren H, Donyo M, Zeevi D, Maayan C, Pupko T, Ast G (2010) Phosphatidylserine increases IKBKAP levels in familial dysautonomia cells. PLoS One 5: e15884. Kondo T, Johnson SA, Yoder MC, Romand R, Hashino E (2005) Sonic hedgehog and retinoic acid synergistically promote sensory fate specification from bone marrow-derived pluripotent stem cells. Proc Natl Acad Sci USA 102: 4789-4794. Kramer I, Sigrist M, de Nooij JC, Taniuchi I, Jessell TM, Arber S (2006) A role for Runx transcription factor signaling in dorsal root ganglion sensory neuron diversification. Neuron 49: 379-393. Lee G, Papapetrou EP, Kim H, Chambers SM, Tomishima MJ, Fasano CA, Ganat YM, Menon J, Shimizu F, Viale A, Tabar V, Sadelain M, Studer L (2009) Modelling pathogenesis and treatment of familial dysautonomia using patient-specific iPSCs. Nature 461: 402-406. Li Q, Fazly AM, Zhou H, Huang S, Zhang Z, Stillman B (2009) The elongator complex interacts with PCNA and modulates transcriptional silencing and sensitivity to DNA damage agents. PLoS Genet 5: e1000684. Livak KJ, Schmittgen TD (2001) Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods 25: 402-408. Lo
351. ina I., Alloza E., Montaner D. & Dopazo J. (2007). FatiGO: a functional profiling tool for genomic data. Integration of functional annotation, regulatory motifs and interaction data with microarray experiments. Nucleic Acids Research, 35(Web Server issue), W91-6. [Alston et al., 2010] Alston M.J., Seers J., Hinton J.C.D. & Lucchini S. (2010). BABAR: an R package to simplify the normalisation of common reference design microarray-based transcriptomic datasets. BMC Bioinformatics, 11, 73. [Altschul et al., 1990] Altschul S.F., Gish W., Miller W., Myers E.W. & Lipman D.J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215(3), 403-10. [Ashburner et al., 2000] Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., Harris M.A., Hill D.P., Issel-Tarver L., Kasarskis A., Lewis S., Matese J.C., Richardson J.E., Ringwald M., Rubin G.M. & Sherlock G. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics, 25(1), 25-9. [Bailey, 2011] Bailey T.L. (2011). DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics (Oxford, England), 27(12), 1653-9. [Bainbridge et al., 2011] Bainbridge M.N., Wang M., Wu Y.Q., Newsham I., Muzny D.M., Jefferies J.L., Albert T.J., Burgess D.L. & Gibbs R.A. (2011). Targeted enrichment
352. informatics and is found in tools such as RSATools or in databases such as KEGG. The documentation of our web service is available at http tagc univ mrs fr services T BService wsdl. The same queries as in the previous version have been developed, but this web service queries only the latest version of the database, through calls to stored procedures. It can be integrated into Taverna-type workflows or into tools such as Cytoscape, which allows our database to be used through other tools. (Chapter 4, DNA microarray data mining.) 4.7.2 Implementation of an R/Bioconductor library: RTools4TB. This library consists of a set of objects and functions coded in R and of a program written in C, making it possible (1) to query the new database through the web service and (2) to extract transcriptional signatures from an expression matrix with the DBF-MCL algorithm. The C program, called by the R code, carries out the first part of the algorithm and generates the data for MCL. This library also relies on other R libraries such as Biobase, limma, methods, XML, RCurl and SSOAP. Finally, the R code calls the mcl and cluster programs through system commands. It is thus possible to
353. iological Process. This figure diagrams the parent terms of the term transcription, DNA-dependent, obtained with the QuickGO tool (http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0006351). (Chapter 3, Analysis of DNA microarray data.) two biological states, this method computes functional enrichment scores using the Molecular Signature DataBase (MSigDB) [Subramanian et al., 2005]. A commercial application also exists, Ingenuity Pathway Analysis (IPA), which includes annotations verified by scientists and allows over-expressed genes (in red) and under-expressed genes (in green) to be visualized as contextualized gene networks (Figure 3.7). 3.3.3 Functional enrichment tests. At the functional annotation step, it is not enough to know which signaling pathway or which annotation characterizes at least one of the genes in the list of differentially expressed genes; one must also know whether the association of part of the genes in this list with a given annotation is significant [Draghici et al., 2003]. A functional enrichment test compares the list of differentially expressed genes with the genes involved in a signaling pathway, or associated with a particular functional annotation, to check whether the tested gene list
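One common way to carry out such a functional enrichment test is a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The excerpt does not say which test is used, so the Python sketch below is an assumption, with a hypothetical function name and invented counts.

```python
from math import comb

def enrichment_pvalue(n_universe, n_annotated, n_list, n_overlap):
    """P(X >= n_overlap) when drawing n_list genes from n_universe genes,
    of which n_annotated carry the annotation (hypergeometric upper tail)."""
    upper = min(n_annotated, n_list)
    total = comb(n_universe, n_list)
    tail = sum(
        comb(n_annotated, x) * comb(n_universe - n_annotated, n_list - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / total

# Invented counts: 10 genes on the array, 4 annotated to a pathway,
# 3 differentially expressed, 2 of which belong to the pathway.
p = enrichment_pvalue(10, 4, 3, 2)
```

With these counts, 40 of the 120 possible gene lists contain at least two annotated genes, so `p` is one third. In practice one p-value is computed per annotation term, and the resulting set is then corrected for multiple testing.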
354. ion ChIP-seq. Their analysis on the platform required the development of a data processing pipeline specific to the SOLiD sequencer. All these results are presented in chapter 5 of this manuscript. 1.4.1.1 Chemistry. Although the three main sequencing technologies each have a different chemistry and different sequencing characteristics (Table 1.2), they all sequence in parallel DNA fragments obtained after an amplification step (Figure 1.7) [Suzuki et al., 2011; Borgström et al., 2011; Shendure et al., 2005]. Table 1.1: Comparative table of the main very-high-throughput sequencing technologies (columns: company; support; amplification; sequencing technique; model and year). Life Technologies: glass slide; emulsion PCR; ligation; SOLiD v2 (2007), SOLiD v3 (2008), SOLiD v3.5 (2009), SOLiD v4 (2010), SOLiD 5500 XL (2011). Ion Torrent (2010): semiconductor chip; electrical potential differential. Illumina: glass slide; bridge amplification on solid phase; synthesis; GA I (2007), GA IIx (2008), HiScanSQ (2009), HiSeq 1000 (2009), HiSeq 2000 (2010), MiSeq (2011). Roche Diagnostics: picotiter plate (PTP); emulsion PCR; pyrosequencing; GS20 (2006), GS FLX (2007), GS FLX Titanium (2008), GS Junior (2011). The shaded cells correspond
355. ion data sets. Proceedings of the National Academy of Sciences of the United States of America, 100(14), 8418-23. [Splinter et al., 2004] Splinter E., Grosveld F. & de Laat W. (2004). 3C technology: analyzing the spatial organization of genomic loci in vivo. Methods in Enzymology, 375, 493-507. [Srivatsan et al., 2008] Srivatsan A., Han Y., Peng J., Tehranchi A.K., Gibbs R., Wang J.D. & Chen R. (2008). High-precision, whole-genome sequencing of laboratory strains facilitates genetic studies. PLoS Genetics, 4(8), e1000139. [Stephens et al., 2009] Stephens P.J., McBride D.J., Lin M.L., Varela I., Pleasance E.D., Simpson J.T., Stebbings L.A., Leroy C., Edkins S., Mudie L.J., Greenman C.D., Jia M., Latimer C., Teague J.W., Lau K.W., Burton J., Quail M.A., Swerdlow H., Churcher C., Natrajan R., Sieuwerts A.M., Martens J.W.M., Silver D.P., Langerød A., Russnes H.E.G., Foekens J.A., Reis-Filho J.S., van 't Veer L., Richardson A.L., Børresen-Dale A.L., Campbell P.J., Futreal P.A. & Stratton M.R. (2009). Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature, 462(7276), 1005-10. [Stoeckert et al., 2002] Stoeckert C.J., Causton H.C. & Ball C.A. (2002). Microarray databases: standards and ontologies. Nature Genetics, 32 Suppl, 469-73. [Strahl & Allis, 2000] Strahl B.D. & Allis C.D. (2000). The language
356. ion of the sex chromosome [Fernandez-Capetillo et al., 2003] and a specific condensation of male gamete cells [Okada et al., 2005; Govin et al., 2004]. These variants have a sequence that differs from that of conventional histones at only a few residues or over larger portions of the protein. 1.3.4 Non-coding RNAs. Recent very-high-throughput transcriptomic analyses have shown that more than 90% of the genome is transcribed, but that only 1-2% of these transcripts would code for proteins; the others would constitute a category of transcripts called non-coding RNAs (ncRNA). Sometimes well conserved during evolution, which suggests functional importance, they are nevertheless generally less strongly expressed than messenger RNAs. These ncRNAs can be divided into 2 groups: infrastructural ncRNAs, including ribosomal RNAs, transfer RNAs and small nuclear RNAs, and regulatory ncRNAs, such as micro-RNAs (miRNA), small interfering RNAs (siRNA) and long non-coding RNAs (lncRNA) [Ponting et al., 2009]. Beyond their role in the degradation of a target mRNA, miRNAs and siRNAs, as well as lncRNAs, have thus been identified as able to play a role in the regulation of gene expression through promoter targeting and translation activation [Krol et al
357. ion between the enhancer and the promoter [Geyer & Corces, 1992; Kellum & Schedl, 1992]. These regulatory regions are particularly well conserved during evolution because they are composed of short sequences of 6-15 base pairs (bp), called regulatory elements (RE), which allow the specific recruitment of transcription factors onto DNA (Figure 1.3). Transcription factors do not act independently but form complexes with other transcription factors and protein cofactors, as is also the case for general transcription factors [Fedorova & Zink, 2008; Ravasi et al., 2010]. These transcription factors bind to their specific binding sites, which are often grouped into cis-regulatory modules. 1.3.3 Chromatin: histones and epigenetic marks. Chromatin is composed of the chromosomal DNA double helix wound around histone nucleosomes, together with non-histone proteins. Chromatin is thus a polymer of nucleosomes whose degree of condensation affects the accessibility of DNA to the transcriptional machinery. Nucleosomes are histone octamers consisting of two H2A-H2B and H3-H4 heterodimers, around which 146 base pairs (bp) of DNA are wrapped (Figure 1.4 A). The linker histone H1 is located between 2 nucleosomes and
358. ion followed WHO recommendations [29]. Blood sample preparation. Whole blood samples (2.5 ml) were collected in PAXgene tubes (PreAnalytiX) and stored at -80 °C before being sent to France on dry ice. Extraction of series of 24 matched samples (DF, DHF and DSS) was done using PAXgene Blood RNA kits (PreAnalytiX) rapidly after collection. Purified total RNAs, kept at -80 °C, were processed for hybridization on genome-wide DNA microarrays within one month. cRNA preparation and microarray hybridization. All RNAs were checked for integrity using the 2100 BioAnalyzer (Agilent Technologies) and quantified using an ND-1000 spectrophotometer (NanoDrop Technologies). Cyanine-3-labeled cRNA was generated from 0.3 µg of RNA using the One-Color Low RNA Input Linear Amplification kit (Agilent) according to the manufacturer's instructions, followed by purification on RNeasy columns (QIAGEN). All amplified cRNAs were checked for dye incorporation, cRNA yield and amplification profile. Only those fitting all quality criteria were fragmented for further hybridization on microarrays. Samples from DF, DHF and DSS patients were then carefully matched and hybridized onto Agilent Whole Human Genome 4x44K Oligo Microarrays (G4112F). Microarrays were scanned using an Agilent DNA microarray scanner (G2505B). Microarray data analysis. All microarray data are MIAME compliant, and the raw and normalized data have been deposited in the MIAME-compliant
359. ion or fold change. As such a filtering procedure is not well suited for automated analysis of numerous experiments, we developed an adaptive density-based filter (DBF) whose goal is to automatically isolate informative genes from a dataset. Selected genes are next used to construct a graph that is subsequently partitioned using MCL. This modified version of the MCL algorithm was termed DBF-MCL, for Density-Based Filtering and Markov CLustering. In the present paper, we show that DBF-MCL provides very good results on both simulated and real datasets. The algorithm was run on 1,484 microarray datasets (46,564 biological samples) performed on various Affymetrix platforms (human, mouse and rat). (PLoS ONE, GEO Datamining with TBrowser.) This led to the identification of 18,250 transcriptional signatures (TS), whose corresponding gene lists were tested for enrichment in terms derived from numerous ontologies or curated databases using the DAVID knowledgebase [12] (Gene Ontology, KEGG, BioCarta, Swiss-Prot, BBID, SMART, NIH Genetic Association DB, COG/KOG, etc.; see Figure S1 for an overview of the data processing pipeline). Information related to biological samples, experiments, TS composition, TS-associated expression values and TS keyword enrichment scores was stored in a relational database. A Java application, TBrowser (TranscriptomeBrowser), was developed and deployed using Java Web Start technology. Combined qu
360. ion to physical interactions, it offers unified access to miRNA targets and results from ChIP-Seq experiments derived from CHEA. Presently, the data sources associated with the InteractomeBrowser plugin are restricted to human and mouse. Indeed, one of the main objectives of InteractomeBrowser is to help users create regulatory maps to study human gene regulatory networks in physiological and pathological conditions. Mouse is a natural choice as an additional organism supported by our database, as it is a widely used model of human physiopathology. However, we are already planning to add new organisms in the near future. As more and more experimentally validated interactions become available, we hope that this tool will prove very useful for researchers. Availability and requirements. InteractomeBrowser comes as a plugin for TranscriptomeBrowser and is available at http://tagc.univ-mrs.fr/tbrowser. Our database is updated on a regular basis. See supplementary material for a video tutorial. Project name: InteractomeBrowser; Project home page: http://tagc.univ-mrs.fr/tbrowser; Operating system(s): platform independent (Java); Programming language: Java; Other requirements: Java > 1.6.X; License: no license required; Any restrictions to use by non-academics: none. Competing interests. The authors declare that they have no competing interests. List of abbreviations used: PWM, Position Weight Matr
361. ions from a noisy environment. However, a range of optimal values for the inflation parameter needs to be defined to get the best results. Performances of DBF-MCL on the GSE1456 dataset. Next, DBF-MCL was tested with microarray data to explore its effectiveness in finding clusters of co-regulated genes. To this end, we used the microarray data from Pawitan et al. [13], who studied gene expression profiles in a large cohort of Swedish patients affected by breast cancer. This experiment is recorded as GSE1456 in the GEO database. All samples (n = 159) were hybridized onto the GPL96 platform (Affymetrix GeneChip Human Genome U133 Array Set, HG-U133A). The complete dataset (22,283 genes) was used for analysis. Figure S5B shows the number of informative genes obtained with various k values. Again, two phases were observed, suggesting that regions with heterogeneous densities exist in the GSE1456 dataset. As expected, the transition from dense to sparse regions was less marked than in the artificial dataset. A k value of 100 was chosen to allow the extraction of a large part of the data that can be considered noise free. This value led to the selection of 4,470 elements out of the whole dataset (Fig. 1A-B). The graph partitioning procedure, using default MCL parameters (I = 2), generated 11 highly homogeneous clusters (Fig. 1C-F). As with the Complex9RN200 dataset, the results were very consistent with those obtained using hierarchical clustering, although
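To show what the inflation parameter acts on, the sketch below runs the two alternating steps of Markov clustering (expansion by matrix squaring, then inflation by element-wise power and column renormalization) on a small graph. This is a simplified pure-Python illustration with hypothetical function names, not the mcl program used by DBF-MCL; the self-loops, the fixed number of iterations and the 1e-6 read-out threshold are assumptions.

```python
def normalize_columns(m):
    """Rescale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]

def mcl(adjacency, inflation=2.0, iterations=30):
    """Simplified Markov clustering of an undirected graph (adjacency matrix)."""
    n = len(adjacency)
    # Add self-loops so that no column is empty, then normalize.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to the power I, then renormalize columns.
        m = normalize_columns([[v ** inflation for v in row] for row in m])
    # Each non-empty row is an attractor; its non-zero columns form a cluster.
    clusters = {frozenset(j for j, v in enumerate(row) if v > 1e-6) for row in m}
    return sorted(sorted(c) for c in clusters if c)

# Toy graph: two disjoint triangles (nodes 0-2 and 3-5).
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(two_triangles, inflation=2.0)
```

On this toy graph the procedure returns the two triangles as separate clusters. On connected graphs, higher inflation values generally produce finer partitions, which is why a suitable range of I has to be explored for each type of data.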
362. ir of term/PWM, we computed the Fisher's exact test p-value (f). Each cell of a matrix with terms as rows and PWMs as columns was filled with a score defined as log f. (A-D) Representative biclusters found with BiMax are presented. Figure 2: The InteractomeBrowser plugin. (A) A global and zoom-in view of the InteractomeBrowser cell-compartment-based layout. The zoom-in view shows some sub-cellular compartments together with nodes corresponding to gene products. Note that the node corresponding to Esr1 appears in green, indicating that regulatory information is available for this gene. (B) Positive interactions (i.e., activations) appear as green edges with normal arrowheads (here Notch is the source). (C) Negative interactions (i.e., repressions) appear as red edges with T-shaped arrowheads (here Mirn17 is the source). (D) Ambiguous interactions (whose repressive or activating status is unknown) appear as violet arrows with dot arrowheads (here with Mycn as source). Table. Table 1: A comparison of web tools dedicated to molecular interactions. The table provides an overview of the types of molecular interactions and of the functionalities offered by representative web tools previously published. Information was obtained from the latest articles describing the servers. Row labels: physical protein-protein interactions; computationally predicted TF targets; experimentally observed TF targets; predicted miRNA targets; regulatory interactions from literature; Bi
363. is Of VAriance (ANOVA); 3.2 Unsupervised classification methods; 3.2.1 The hierarchical clustering method; 3.2.2 The k-means method; 3.2.3 Self-organizing maps (SOM); 3.3 Functional annotation; 3.3.1 The different sources of information; 3.3.2 Some annotation tools; 3.3.3 Functional enrichment tests; 3.4 Data analyses within collaborations; 3.4.1 […]; Article 1: Genome-wide expression profiling deciphers host responses altered during dengue shock syndrome and reveals the role of innate immunity in severe dengue; 3.4.2 Familial dysautonomia; Article 2: Olfactory stem cells, a new cellular model for studying molecular mechanisms underlying familial dysautonomia; Article 3: Genome-wide analysis of familial dysautonomia and kinetin target genes with patient olfactory ecto-mesenchymal stem cells; 3.5 Conclusions and perspectives; 4 DNA microarray data mining; 4.1 […]
364. is (p = 0.0334). FBXL15, 166240, 1.71, 0.00130, ubiquitin-dependent protein catabolic process; WSB1, 298983, 1.61, 0.00123, ubiquitination and proteasomal degradation of target proteins; PCSK7, 241130, 1.53, 0.00155, proteolysis, ubiquitous endoprotease activity; RNF115, 471834, 1.42, 0.00067, proteolysis, vesicle-mediated transport, vesicle traffic; MMP27, 767086, 1.35, 0.00067, proteolysis of fibronectin, laminin, gelatins and/or collagens. Clone ID represents the number assigned to the original clones produced by the I.M.A.G.E. Consortium. FC: fold change. Fold changes and p-values were calculated by SAM analysis as described in Methods. This list of genes was annotated with the Explain System from Biobase. Seven major processes are overrepresented in our list of genes, and for each process p-values were calculated and adjusted by the Bonferroni correction. The last column indicates the genes that were also found to be significantly dysregulated in two previous FD studies: (1) Lee et al. 2009; (2) Close et al. 2006. doi:10.1371/journal.pone.0015590.t001. When investigating expression at the end of the IKBKAP coding sequence, we again observed a third alternative splicing event. The amplification from exon 33 to the exon previously numbered exon 36, and now called exon 37, revealed 2 products (Figure 6B, middle panel) [PLoS ONE, www.plosone.org]. Sequencing of the barely detectable, longer PCR product revealed the inclusion of an additional exon (Figure 6A, right s
365. is was confirmed by the IPA analysis, which did not identify IFN type I-related pathways among those strongly associated with the DSS gene signature (Figure 2). DSS is associated with impaired expression of T and NK cell-related genes but increased expression of anti-inflammatory and repair/remodeling transcriptional responses. Integrative analysis of the most significant individual genes and canonical pathways extended the finding that a large and diverse set of genes related to T but also to NK lymphocyte activity is [July 2010, Volume 5, Issue 7, e11671. Molecular Mechanisms of DSS] [Figure 2 plot; axes: Log P value (left), Ratio (right)] Figure 2: Top 30 canonical pathways identified from the DSS gene signature using Ingenuity Pathway Analysis software. The significance of the association between data set and canonical pathway was estimated by the p-value (Fisher's exact test, left axis) and the ratio (right axis) of genes that
366. […] computing level (Bioscope version, modification of programs and pipelines). Currently, a sample generally yields about 35 to 40 million beads, and therefore as many reads. The pipeline takes about 6 hours per sample for the secondary and tertiary analyses. A log file makes it possible to follow the progress of the different steps and the behaviour of the tools, whose outputs are redirected to this file. 5.4.1 Choice of software and strategies. Since the SOLiD sequencing data from the primary analysis are in the csfasta and _QV.qual formats, the alignment software supplied with the Bioscope suite, called mapread, was the natural choice. The BOWTIE and BWA programs were nevertheless tested, with conversions to handle these formats, and showed no difference in quality compared with Bioscope's mapread. Moreover, by default these other tools do not parallelize their alignment tasks as the Bioscope suite does, so they prove markedly slower without offering any real advantage. Because the alignment result file is in the compressed BAM format, converting it with the samtools suite yields a flat file in SAM format, better suited to processing
367. ived from human embryonic stem cells as a platform for studying peripheral neuropathies PLoS One 5 e9290 Lee G Papapetrou EP Kim H Chambers SM Tomishima MJ et al 2009 Modelling pathogenesis and treatment of familial dysautonomia using patient specific iPSCs Nature 461 402 6 Saha K Jaenisch R 2009 Technical challenges in using human induced pluripotent stem cells to model disease Cell Stem Cell 5 584 95 Vierbuchen T Ostermeier A Pang ZP Kokubu Y Sudhof TC et al 2010 Direct conversion of fibroblasts to functional neurons by defined factors Nature 463 1035 41 Kim K Doi A Wen B Ng K Zhao R et al 2010 Epigenetic memory in induced pluripotent stem cells Nature 467 285 90 Polo JM Liu S Figueroa ME Kulalert W Eminli S et al 2010 Cell type of origin influences the molecular and functional properties of mouse induced pluripotent stem cells Nat Biotechnol 28 848 55 Ghosh Z Wilson KD Wu Y Hu S Quertermous T et al 2010 Persistent donor cell gene expression among human induced pluripotent stem cells contributes to differences with human embryonic stem cells PLoS One 5 e8975 16 17 18 19 20 21 22 23 24 26 27 December 2010 Volume 5 Issue 12 e15590 28 29 30 31 32 33 34 35 36 37 38 39 Graziadei PP Graziadei GA 1979 Neurogenesis and neuron regeneration in the olfactory system of mammals I Morpho
368. jima H Colonna M Chuang SS Stepp SE et al 1999 Molecular characterization of a novel human natural killer cell receptor homologous to mouse 2B4 Tissue Antigens 54 27 34 PubMed Snapshot n d Available http www ncbi nlm nih gov sites entrez Accessed 18 September 2008 Townsend MJ Weinmann AS Matsuda JL Salomon R Farnham PJ et al 2004 T bet regulates the terminal maturation and homeostasis of NK and Valphal4i NKT cells Immunity 20 477 94 Lauwerys BR Renauld JC Houssiau FA 1999 Synergistic proliferation and activation of natural killer cells by interleukin 12 and interleukin 18 Cytokine 11 822 30 van t Veer LJ Dai H van de Vijver MJ He YD Hart AAM et al 2002 Gene expression profiling predicts clinical outcome of breast cancer Nature 415 530 6 Shi L Reid LH Jones WD Shippy R Warrington JA et al 2006 The MicroArray Quality Control MAQC project shows inter and intraplatform reproducibility of gene expression measurements Nat Biotechnol 24 1151 61 Barrett T Troup DB Wilhite SE Ledoux P Rudnev D et al 2007 NCBI GEO mining tens of millions of expression profiles database and tools update Nucleic Acids Res 35 D760 5 Gentleman RC Carey VJ Bates DM Bolstad B Dettling M et al 2004 Bioconductor open software development for computational biology and bioinformatics Genome Biol 5 R80 20 22 23 24 26 27 28 December 2008 Volume 3 Iss
369. k N Segraves R Blackwood S Brown N Conroy J Hamilton G Hindle A K Huey B Kimura K Law S Myambo K Palmer J Ylstra B Yue J P Gray J W Jain A N Pinkel D amp Albertson D G 2001 Assembly of microarrays for genome wide measurement of DNA copy number Nature genetics 29 3 263 4 Solinas Toldo et al 1997 Solinas Toldo S Lampel S Stilgenbauer S Nickolenko J Benner A Dohner H Cremer T amp Lichter P 1997 Matrix based comparative genomic hybridization biochips to screen for genomic imbalances Genes chromosomes amp cancer 20 4 399 407 Song et al 2011 Song L Zhang Z Grasfeder L L Boyle A P Giresi P G Lee B K Sheffield N C Graf S Huss M Keefe D Liu Z London D McDaniell R M Shibata Y Showers K A Simon J M Vales T Wang T Winter D Zhang Z Clarke N D Birney E Iyer V R Crawford G E Lieb J D amp Furey T S 2011 Open chromatin defined by DNasel and FAIRE identifies regulatory elements that shape cell type identity Genome research 21 10 1757 67 Sorlie et al 2003 Sorlie T Tibshirani R Parker J Hastie T Marron J S Nobel A Deng S Johnsen H Pesich R Geisler S Demeter J Perou C M L nning P E Brown P O B rresen Dale A L amp Botstein D 2003 Repeated observation of breast tumor subtypes in independent gene express
370. kay F Schulz S Lopez Bendito G Stumm R Marin O 2011 Cxcr7 controls neuronal migration by regulating chemokine responsiveness Neuron 69 77 90 Scott SA Edelmann L Liu L Luo M Desnick RJ Kornreich R 2010 Experience with carrier screening and prenatal diagnosis for 16 Ashkenazi Jewish genetic diseases Hum Mutat 31 1240 1250 Shetty RS Gallagher CS Chen YT Hims MM Mull J Leyne M Pickel J Kwok D Slaugenhaupt SA 2011 Specific correction ofa splice defect in brain by nutritional supplementation Hum Mol Genet 20 4093 4101 Slaugenhaupt SA Blumenfeld A Gill SP Leyne M Mull J Cuajungco MP Liebert CB Chadwick B Idelson M Reznik L Robbins C Makalowska I Brownstein M Krappmann D Scheidereit C Maayan C Axelrod FB Gusella JF 2001 Tissue specific expression of a splicing mutation in the IKBKAP gene causes familial dysautonomia Am J Hum Genet 68 598 605 Slaugenhaupt SA Mull J Leyne M Cuajungco MP Gill SP Hims MM Quintero F Axelrod FB Gusella JF 2004 Rescue of a human mRNA splicing defect by the plant cytokinin kinetin Hum Mol Genet 13 429 436 Slonim DK 2002 From patterns to pathways gene expression data analysis comes of age Nat Genet Suppl 32 502 508 Solinger JA Paolinelli R Kloss H Scorza FB Marchesi S Sauder U Mitsushima D Ca puani F Sturzenbaum SR Cassata G 2010 The Caenorhabditis elegans Elongator complex regulates neuronal alpha tubulin acetylation PLoS Genet 6 e1000820
371. l 24 interaction networks of proteins globally integrated and scored Nucleic Acids Res 2011 39 D561 568 8 Xie X Rigor P Baldi P MotifMap a human genome wide map of candidate regulatory motif sites Bioinformatics 2009 25 167 174 9 Warde Farley D Donaldson SL Comes O Zuberi K Badrawi R Chao P Franz M Grouios C Kazi F Lopes CT Maitland A Mostafavi S Montojo J Shao Q Wright G Bader GD Morris Q The GeneMANIA prediction server biological network integration for gene prioritization and predicting gene function Nucleic Acids Res 2010 38 W214 220 10 Hernandez Toro J Prieto C De las Rivas J APID2NET unified interactome graphic analyzer Bioinformatics 2007 23 2495 2497 11 Barsky A Gardy JL Hancock REW Munzner T Cerebral a Cytoscape plugin for layout of and interaction with biological networks using subcellular localization annotation Bioinformatics 2007 23 1040 1042 12 Wingender E Dietze P Karas H Kniippel R TRANSFAC a database on transcription factors and their DNA binding sites Nucleic Acids Res 1996 24 238 241 13 A Sandelin JASPAR an open access database for eukaryotic transcription factor binding profiles Nucleic Acids Research 2004 32 91D 94 14 Newburger DE Bulyk ML UniPROBE an online database of protein binding microarray data on protein DNA interactions Nucleic Acids Res 2009 37 D77 82 15 Lachmann A Xu H Krishnan J Berger SI Mazloom AR Ma aya
372. l types. Some investigators treated HeLa or neuroblastoma cells with siRNAs, while others generated FD iPSCs or hOE-MSCs, or analyzed FD brains [Boone et al., 2010; Cheishvili et al., 2007; Close et al., 2006; Cohen-Kupiec et al., 2011; Lee et al., 2009]. It was thus expected, from such heterogeneity in cell types, genetic background, and methodologies, that important discrepancies would characterize those studies and ours. Despite such limitations, we were able to identify a common set of genes in our microarray data and data from four previous studies (Supp. Table S4) that could contribute to the FD disease process [Cheishvili et al., 2007; Close et al., 2006; Cohen-Kupiec et al., 2011; Lee et al., 2009]. Among the dysregulated genes shared by at least two studies, several are related to nervous system development and characterize common alterations of neuronal cells. Notable downregulated genes include SEMA5A and SEMA3C, which encode members of the semaphorin family involved in axonal guidance during neural development [Hernandez-Montiel et al., 2008; Hilario et al., 2009]; NRCAM, which encodes an adhesion molecule acting as a co-receptor for SEMA3B and 3F [Falk et al., 2005]; ALCAM, involved in axonal guidance [Buhusi et al., 2009]; RELN, which regulates the migration of neuroblasts [Frotscher, 2010]; FEZ1, which promotes neurite elongation [Maturana et al., 2010]; and DLX5, which encodes a homeobox transcriptional factor promoting neuro
373. lation [Scott et al., 2010]. The disease is characterized by anatomical selective depletion of sensory and autonomic neurons [Axelrod et al., 1981; Pearson and Pytel, 1978; Pearson et al., 1978], resulting in variable symptoms including decreased sensitivity to pain, lack of overflow tearing, inappropriate blood pressure control manifested as orthostatic hypotension and episodic hypertension, poor oral coordination resulting in poor feeding and swallowing, and gastrointestinal dysmotility [Axelrod, 2004]. FD is a disease for which no cure is currently available, and treatment is aimed at controlling symptoms and preventing complications. FD is caused by mutations in the IKBKAP gene (MIM 603722), which encodes a protein termed IKAP/hELP1 [Anderson et al., 2001; Slaugenhaupt et al., 2001]. The most prevalent mutation is the T-to-C transition in position six of the 5' splice site (5' ss) of intron 20 (c.2204+6T>C), occurring in >99.5% of cases of FD [Anderson et al., 2001; Dong et al., 2002; Scott et al., 2010; Slaugenhaupt et al., 2001]. This mutation leads to a tissue-specific skipping of exon 20 of IKBKAP mRNA (MU isoforms). The defective splicing leads to low levels of transcripts including exon 20 (WT isoforms) and reduced synthesis of IKAP/hELP1 protein, and appears to be more severe in sensory and autonomic nervous systems than in other tissues [Cuajungco et al., 2003]. KEY WORDS: familial dysautonomia; IKBKAP; RNA; IKAP/hELP1 was
374. […] remote control of the instruments. 4.1.2 MySQL databases. MySQL is a database management system (DBMS) that optimizes searching, sorting and visualization of large quantities of data. It is among the most widely used database management software in the world, alongside Oracle and Microsoft SQL Server. MySQL is a relational database server that uses the SQL (Structured Query Language) query language and was developed with high read performance in mind, which means it is geared more towards serving data already in place than towards frequent, heavily secured updates. It is multitasking, multithreaded and multi-user. It runs on the main operating systems, and the data it holds can be accessed from a great many programming languages, including Java, C, Perl and PHP; a specific application programming interface (API) is available for each of them. One distinctive feature of MySQL is that it can manage several storage engines within a single database: each table can use a different engine. The ease of using several storage engines in a single database gives enormous flexibility in optimizing the database: for each table, one will use
375. […] for the moment to publications. Finally, this pipeline was used in part to generate data from a new approach called Mnase-Cap, developed in collaboration with Dr Salvatore Spicuglia. This technique combines Mnase-seq with the capture of targeted genomic regions, either on a slide or in solution on magnetic beads. This increases the coverage of the targeted regions and allows a better study of nucleosome positioning in the regulatory regions of genes of interest. A publication on this new approach to nucleosome positioning is in preparation. 5.6 Discussion and perspectives. Many questions remain open. Very-high-throughput sequencing is not yet free of technical problems. A good number of questions are still unresolved and are the subject of international conferences, as DNA microarrays were in an earlier era. Indeed, the scientific community has not yet settled on a specific ChIP-seq protocol regarding the use of fragment mode versus paired-end mode, the read length to use for ChIP-seq, the optimal sonication size, the best methods for peak detection, or the impact of genomic amplification on ChIP-seq data. Amplification
376. […] R library has been included among the Bioconductor libraries since version 2.5. The library's web page on the Bioconductor site is available at http://www.bioconductor.org/packages/2.8/bioc/html/RTools4TB.html (Bioconductor version 2.8 at present). A new version of the library that uses the SOAP/WSDL web service is in preparation and will soon be released on the Bioconductor site through an SVN version control system. It will also make it possible to annotate a list of genes from the annotation data held in our database. A summary of how to use the RTools4TB library is given in the user manual, which can be downloaded with the library (see Appendix B). To allow joint development by all the project's developers, as well as archiving and maintenance of the project, an SVN version control system has been set up. An article summarizing the progress of the project (Figure 4.7 and Table 4.1) since its initial publication in 2008 is in preparation. 4.8 Conclusions and perspectives. Use of TBrowser. One way to measure the impact of TBrowser on the scientific community is to study its usage. The TBrowser article has been viewed 2396 times from the site of
377. ll data are MIAME compliant and have been loaded into the ArrayExpress database (http://www.ebi.ac.uk/microarray-as/ae/) under accession number E-MTAB-281. Statistical and gene ontology analysis. Significance Analysis of Microarrays (SAM, version 1.13, Stanford University) was applied to determine significant differential gene expression using the MultiExperiment Viewer (MEV) program. The data were analyzed using a two-class unpaired response type, which compared control samples versus FD samples. SAM calculated a significance score for each gene based on the gene expression change relative to the standard deviation of repeated values for that gene. We used 100 permutations and a false discovery rate (FDR) of 3%. A total of 50 genes appearing in the heat map generation were called significant, with a p-value < 0.006. For gene ontology analysis, we generated a set of human proteins associated with the genes appearing as significant with the SAM test by using the BioKnowledge Library (BKL) Retriever search tool (http://www.biobase-international.com). This set of proteins was analyzed for overrepresentation of Gene Ontology (GO) Biological Process (BP) terms. Boyden chamber-based cell migration assay. hOE-MSCs were detached by trypsin-EDTA, counted, and seeded into the upper chamber of transwell polyethylene terephthalate filter membranes with 8 µm diameter pores (BD Biosciences) at a density of 3x10
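The two-class unpaired comparison described in excerpt 377 can be sketched in Python for a single gene. This is a simplified, hedged illustration and not the SAM or MEV implementation: the statistic follows the general form of a SAM-style score (difference of group means divided by a standard error plus a small constant), but the constant `s0`, the per-gene permutation p-value, the function names, and the expression values are assumptions. SAM itself pools permutations across all genes to estimate the FDR.

```python
import random
from statistics import mean, stdev


def sam_like_score(group_a, group_b, s0=0.1):
    """Difference of means (b minus a) over pooled standard error plus s0."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2
    ) / (na + nb - 2)
    std_err = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (mean(group_b) - mean(group_a)) / (std_err + s0)


def permutation_pvalue(group_a, group_b, n_perm=100, seed=0):
    """Share of label permutations whose |score| reaches the observed |score|."""
    rng = random.Random(seed)
    observed = abs(sam_like_score(group_a, group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a = pooled[: len(group_a)]
        perm_b = pooled[len(group_a):]
        if abs(sam_like_score(perm_a, perm_b)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical log2 expression values for one gene.
control = [5.0, 5.1, 4.9, 5.2]
fd = [7.0, 7.2, 6.9, 7.1]
score = sam_like_score(control, fd)
p_value = permutation_pvalue(control, fd, n_perm=100)
```

The 100 permutations match the number quoted in the excerpt; everything else in the sketch is illustrative.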
378. ll files file 1 gt US45102986_251487911262_S01_GE1 v5_95_Feb07_1_1 txt file 2 gt US45102986_251487911262_S01_GE1 v5_95_Feb07_1_2 txt file 3 gt US45102986_251487911262_S01_GE1 v5_95_Feb07_1_3 txt file 4 gt US45102986_251487911262_S01_GE1 v5_95_Feb07_1_4 txt gt Creating an object of class AgilentBatch An object of class AgilentBatch one color Memory used 16717708 Number of samples 4 Number of spots 45018 Dimensions of arrays 532 rows x 85 columns This object contains the following informations gP gBGM _ gM fileNames PosX PosY CtrT PN GN SN Desc PhenoD Miame Flag Row Col Informations about object size can be obtained using the following commands gt ncol myob gt nrow myob gt dim myob gt length myob 2 2 2 Two channels hybridizations As in the case of single channel approach quantification files should be derived from the same microarray plateform and thus contain the same number of elements gt myobRG lt getAgilentBatch 1 RG TRUE flag 1 7 path TwoColors When RG is set to TRUE getAgilentBatch will construct an AgilentBatchRG object which differs slightly from the AgilentBatch object since it will contain rP rM and rBGM slots 2 3 Building phenoData and MIAME files Although the easiest way to create phenoData and Miame information is to provide a well formatted file in the ExpData Directory user may us
379. […] in very large numbers. This feature is very useful for applications that require a larger number of reads per position, that is, high coverage, such as quantitative study of the transcriptome or the specific detection of polymorphisms or epigenetic marks. [Figure 1.7: Monoclonal amplification of DNA fragments for library construction. Panels: emulsion PCR (Roche 454, Life Technologies), one DNA fragment per bead, 100 to 200 million beads; solid-phase bridge amplification (Illumina), one DNA fragment per cluster, 100 to 200 million clusters. Adapted from Metzker 2010.] Overview. After repair of the non-cohesive ends of the double-stranded DNA sequences obtained by fragmentation of genomic DNA or cDNA, the first step towards sequencing is the addition of a pair of adapter sequences. These adapters allow the DNA fragments to be attached to a bead or a slide
380. logical aspects of differentiation and structural organization of the olfactory sensory neurons J Neurocytol 8 1 18 Murrell W Feron F Wetzig A Cameron N Splatt K et al 2005 Multipotent stem cells from adult olfactory mucosa Dev Dyn 233 496 515 Delorme B Nivet E Gaillard J Haupl T Ringe J et al 2010 The human nose harbors a niche of olfactory ectomesenchymal stem cells displaying neurogenic and osteogenic properties Stem Cells Dev 19 853 66 Feron F Perry C Hirning MH McGrath J Mackay Sim A 1999 Altered adhesion proliferation and death in neural cultures from adults with schizophrenia Schizophr Res 40 211 8 McCurdy RD Feron F Perry C Chant DC McLean D et al 2006 Cell cycle alterations in biopsied olfactory neuroepithelium in schizophrenia and bipolar I disorder using cell culture and gene expression analyses Schizophr Res 82 163 73 Murrell W Wetzig A Donnellan M Feron F Burne T et al 2008 Olfactory mucosa is a potential source for autologous stem cell therapy for Parkinson s disease Stem Cells 26 2183 92 Johansen LD Naumanen T Knudsen A Westerlund N Gromova I et al 2008 IKAP localizes to membrane ruffles with filamin A and regulates actin cytoskeleton organization and cell migration J Cell Sci 121 854 64 Slaugenhaupt SA Mull J Leyne M Cuajungco MP Gill SP et al 2004 Rescue of a human mRNA splicing defect by the plant cytokinin kinetin Hum Mol Genet 13 429 36 Hims MM Ib
381. […] development of neurological diseases. Familial dysautonomia (FD), an orphan neurodegenerative disease, is a perfect example. FD, also called Riley-Day syndrome, is a disorder of the nervous system that affects neuron survival in the autonomic and sensory nervous systems. It affects almost exclusively the Jewish population of Eastern Europe, with an annual incidence of 1 in 3,600 births. It affects men and women alike, from birth, and is progressive. Its inheritance is autosomal recessive. FD is caused by mutations in the IKBKAP gene, located on the long arm of chromosome 9 (9q31). In this disease, a single T>C change at position 6 of the 5' splice site (5' ss) of exon 20 of the IKBKAP gene is responsible for the exon's exclusion during pre-mRNA splicing (Figure 3.9). This exclusion is not systematic, however, and alternative splicing of exon 20 is therefore observed, with, in FD patients, [Figure 3.9: Consequence of the alternative splicing of the IKBKAP gene on the protein isoforms encoded by this gene. Recoverable labels: IKBKAP, 38 exons; splice-site sequences CAAgtaagt and CAAgtaagC; IKAP/hELP1; 1332 aa, 612 aa; 150 kDa, 79 kDa.] a predominance of IKBKAP transcripts excluding
382. […] then according to the number of samples and the biological question asked [Oberthuer et al. 2010]. With the one-color approach, a single sample labelled with cyanine 3 (Cy3) is hybridized on the array, whereas the two-color approach allows two samples labelled with different fluorochromes (generally Cy3 and cyanine 5, Cy5) to be hybridized simultaneously on the same array. The main advantage of two-color arrays is that two samples hybridized on the same array can be compared directly, which reduces the technical biases inherent in the use of arrays. This two-color strategy not only removes some technical variability but also increases the sensitivity and precision with which differential expression levels between pairs of samples are determined. However, the incorporation efficiency of fluorescent nucleotides varies with the fluorochrome used: Cy5-labelled nucleotides are incorporated less efficiently than Cy3-labelled ones because of the steric hindrance of the different labelled nucleotides (dUTP-Cy5 and dUTP-Cy3), which induces intensity variations that are not attributable to differential gene expression. This approach therefore requires processing the samples in two steps, exchanging the fluorescent labels (dye swap), to correct this incorporation bias. D
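The dye-swap correction mentioned in excerpt 382 can be illustrated with a small Python sketch. It assumes, as a simplification that the excerpt does not state, that the dye bias is a constant multiplicative factor on the Cy5 signal of a given probe, so that it adds the same offset to the log ratio of both arrays and cancels when the two log ratios are subtracted. Function names and intensity values are hypothetical.

```python
from math import log2


def log_ratio(cy5, cy3):
    """M value of one spot: log2 of the Cy5 over Cy3 intensity."""
    return log2(cy5 / cy3)


def dye_swap_log_ratio(forward, swapped):
    """Combine a forward array and its dye-swapped replicate.

    forward -- (cy5, cy3) with sample A on Cy5 and sample B on Cy3
    swapped -- (cy5, cy3) with sample B on Cy5 and sample A on Cy3
    Returns an estimate of log2(A / B) with the dye offset cancelled.
    """
    m_forward = log_ratio(*forward)  # log2(A/B) + dye offset
    m_swapped = log_ratio(*swapped)  # log2(B/A) + dye offset
    return (m_forward - m_swapped) / 2


# Hypothetical spot: true signals A = 400 and B = 100 (log2 ratio of 2),
# with the Cy5 channel reading only half of the true signal.
forward = (400 * 0.5, 100)   # A on Cy5, B on Cy3
swapped = (100 * 0.5, 400)   # B on Cy5, A on Cy3
corrected = dye_swap_log_ratio(forward, swapped)
```

In this toy case each single array gives a biased log ratio (1 for the forward array, -3 for the swapped one), while the combined estimate recovers the log2 ratio of 2 that was built into the example.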
383. lot name / Description. SgNorm: normalised signal derived from the mean signal measured on the green channel (gM slot), for an AgilentNorm object. A: matrix of the A values for a normalized object (AgilentNorm or AgilentNormRG). M: matrix of the M values for a normalized object (AgilentNorm or AgilentNormRG). Type: normalization method used to obtain the normalized object (AgilentNorm or AgilentNormRG). Table 3: New slots specific to the normalized objects AgilentNorm and AgilentNormRG. After normalization, agMAplot can be used to observe the effect of the normalization on the data, and the agBoxplot function shows the new data distribution. Finally, a new virtual image can be visualized. 5 Data exportation. Data are exported by converting the object (normally the normalized object) into an ExpressionSet object (since R version 2.5.0). This new object can then be used with the other Bioconductor libraries, for example Biobase. 5.1 Creation of the ExpressionSet object. Before converting to an ExpressionSet, the flag matrix must be changed into a boolean matrix. If only the second type of flag is needed, users can use the command: > myob <- agConvFlag(myob, flag = 2) An AgilentBatch, AgilentNorm, AgilentBatchRG or AgilentNormRG object is converted into an ExpressionSet object with the following command: > es <- as(myob, Expre
384. ltgrewe M Emde A K Weese D amp Reinert K 2011 A novel and well defined benchmarking method for second generation read mapping BMC bioinfor matics 12 210 Homer et al 2009 Homer N Merriman B amp Nelson S F 2009 BFAST an alignment tool for large scale genome resequencing PloS one 4 11 e7767 292 Bibliographie Hong et al 2011 Hong L Z Li J Schmidt K ntzel A Warren W C amp Barsh G S 2011 Digital gene expression for non model organisms Genome research 21 11 1905 15 Hruz et al 2008 Hruz T Laule O Szabo G Wessendorp F Bleuler S Oertle L Wid mayer P Gruissem W amp Zimmermann P 2008 Genevestigator v3 a reference expres sion database for the meta analysis of transcriptomes Advances in bioinformatics 2008 420747 Huang et al 2009 Huang D W Sherman B T amp Lempicki R A 2009 Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources Nature proto cols 4 1 44 57 Hurtado et al 2011 Hurtado A Holmes K A Ross Innes C S Schmidt D amp Carroll J S 2011 FOXAI is a key determinant of estrogen receptor function and endocrine res ponse Nature genetics 43 1 27 33 Hutchison 2007 Hutchison C A 2007 DNA sequencing bench to bedside and beyond Nucleic acids research 35 18 6227 37 Hyman 1988 Hyman E D 1988 A new method of sequencing DNA Analytical
385. lutions exist. For instance, if one observes a cluster containing cells of the immune system, this will also frequently contain several sub-clusters that are reminiscent of cell types (B or T cells, for example) or activation status. Increasing MCL granularity (Inflation parameter) will most generally split the parent clusters and provide the user with another partitioning result. However, both results can be considered optimal, and we should consider all of them. To this end, we plan to propose multiple partitioning solutions for each dataset to provide a more exhaustive view of underlying biological pathways. [GEO Datamining with TBrowser] Although such an approach could appear computer-intensive, it should be practicable, taking into account that DBF-MCL is much faster than hierarchical clustering or MCL run on a whole dataset. In addition, although we routinely obtained very relevant results with DBF-MCL, we expect that even more accurate methods will be proposed in the future. The present work focuses on human, mouse and rat Affymetrix microarray data, but TBrowser can handle any type of microarray and organism. The current release of the database already contains data obtained using other commercial (e.g. Agilent, Illumina Inc., GE Healthcare, Applied Biosystems, Panomics, CapitalBio Corporation, TeleChem ArrayIt, Mergen LTD, Eppendorf Array Technologies) and non-commercial platforms (e.g. National C
386. ly answer these questions from the present study, in part because blood samples were collected at the onset of shock (14 out of the 19 DSS patients) or after (5 patients). Functional study of each individual pathway will be required to fully understand the role of each gene in a complex network of molecular interactions. The ability of some genes, transcripts or gene products to accurately predict progression to DSS should be evaluated by multivariate regression models [91] using blood samples collected before the onset of shock, although this proves to be difficult in the context of dengue outbreaks [28]. In the present study we chose to focus on those of the identified molecular mechanisms that made the most sense for DSS pathophysiology and systemic vascular dysfunction, referring to recent findings on the role of innate immunity in systemic inflammatory processes leading to shock, multi-organ dysfunction syndromes or other pejorative clinical outcomes. Second, while the present results confirm some putative DSS-related biomarkers, they also reveal unreported alterations that make sense for hypovolemic shock pathophysiology. This reinforces the ability of a global and open-minded approach to identify molecular processes relevant to the studied pathology. Blood cell transcriptional profiles clearly reveal alterations of different immune responses and the activation of a large pro-inflammatory response. A significant
387. […]matin. The model in which covalent histone modifications act as a code, the histone code, was proposed by Strahl and Allis in 2000 [Strahl & Allis 2000; Jenuwein & Allis 2001]. This code is far from universal. It appears to be more or less specific to the genes and cells considered, but seems to be evolutionarily stable among mammals [Lee & Mahadevan 2009]. However, the role of epigenetic marks in maintaining cell identity is not yet clearly defined [Natoli 2011]. The functional effect of the main histone marks depends at least in part on their location. Studying their profile along genes, and also along their regulatory sequences, has shown that active promoters carry H3K4me3 and H3K27ac modifications, whereas active enhancers tend instead to carry H3K4me1 and H3K27ac. Transcribed genes are thought to carry H3K36me3 modifications, while heterochromatin, which is inaccessible to regulatory elements, is thought to carry H3K9me3 and H3K27me3 marks [Visel et al. 2009b; Heintzman et al. 2009]. There are also histone variants that play major roles in various processes such as DNA repair [Klose & Zhang 2007; Billon & Côté 2011], centromere organization [Foltz et al. 2009], the inactivat
388. [Figure 1.14: The different studies made possible by HTS, at various levels of abstraction (adapted from Fullwood et al. 2009). Recoverable panel labels: heterochromatin, histone modification, deletion, inversion, insertion, mRNA, transcript fusion, chromatin interaction; RNA-seq, 3C-seq, DNA-seq, target-seq, re-seq, de novo seq.] Very-high-throughput sequencing still requires considerable experimental and bioinformatic development before the results it generates can be analysed efficiently and completely. The data produced amount to several gigabytes (Gb) per sample; exploiting them requires powerful compute servers, and archiving them requires large storage capacities. The sequencing technology is chosen according to the intended applications. The Roche platform is chosen for de novo sequencing because its longer reads facilitate genome assembly. SOLiD and Illumina technologies are preferred for epigenetic studies and for detecting polymorphisms such as SNPs, insertions and deletions. 1.4.2.1 Study of epigenetic regulation. Control of the dynamic structure of chromatin is
389. mation in vivo innate immunity elicited intracellular Loci involved in eicosanoid metabolism J Immunol 169 6498 6506 de Assis EF Silva AR Caiado LF Marathe GK Zimmerman GA et al 2003 Synergism between platelet activating factor like phospholipids and peroxi some proliferator activated receptor gamma agonists generated during low density lipoprotein oxidation that induces lipid body formation in leukocytes J Immunol 171 2090 2098 Castellheim A Brekke OL Espevik T Harboe M Mollnes TE 2009 Innate immune responses to danger signals in systemic inflammatory response syndrome and sepsis Scand J Immunol 69 479 491 Oppenheim JJ Yang D 2005 Alarmins chemotactic activators of immune responses Curr Opin Immunol 17 359 365 Bianchi ME 2007 DAMPs PAMPs and alarmins all we need to know about danger J Leukoc Biol 81 1 5 Cinel I Opal SM 2009 Molecular biology of inflammation and sepsis a primer Crit Care Med 37 291 304 Claus RA Otto GP Deigner HP Bauer M 2010 Approaching clinical reality markers for monitoring systemic inflammation and sepsis Curr Mol Med 10 227 235 Gill R Tsung A Billiar T 2010 Linking oxidative stress to inflammation Toll like receptors Free Radic Biol Med 48 1121 1132 Mockenhaupt FP Cramer JP Hamann L Stegemann MS Eckert J et al 2006 Toll like receptor TLR polymorphisms in African children Common TLR 4 variants predispose to severe malaria Proc Natl Acad Sci U S A 10
390. mature T cells from lymphoid progenitor cells involves a series of cell fate choices that direct differentiation. In the context of the Immunological Genome Project (ImmGen), M. W. Painter et al. used rigorously standardized conditions to analyze the expression levels of protein-coding genes in almost all defined T-cell populations of the mouse [36]. Using SAM analysis (FDR 15), we selected a set of 281 genes repressed during the transition from the thymic DN3 stage to the DN4 stage. Careful analysis indicated that this gene set was highly enriched in genes previously shown to be crucially involved in the first step of thymocyte development. These include cell-surface markers such as Il2ra (Cd25) and Il7r, together with several transcriptional regulators including Notch1, Smarca4 (Brg1), Dtx1 (Deltex1) and Hes1 (Hry). More recently, Neilson et al. identified specific miRNAs enriched at distinct stages of thymocyte development by deep sequencing [37]. The authors showed that transcripts of the mir17 family are up-regulated at the DN4 stage and thus could be involved in the repression of DN3-specific messenger RNAs during the DN3-to-DN4 transition. We thus combined one member of the mir17 family, Mirn17 (Mir17), with the mRNA gene list mentioned above. This gene list was provided as input to InteractomeBrowser. Figure 2A shows node placement according to cellular compartment. As shown in Figures 2A and 2B, this layout is extremely useful to directly focus on genes of inter
391. me of the chromosome
Fields contain the following information:
  chromStart: the starting position of the feature in the chromosome
  chromEnd: the ending position of the feature in the chromosome
  name: PWM identifier and representative names
  score: a score for the PWM hit
  strand: defines the strand, either + or -
  gene id: the gene id of the target gene
  geneSymbol: the gene symbol of the target gene

File name: TBMC hs bed
File format: bed
Title: TFBS predictions in the human genome
Description of data: a bed file containing TFBS predictions in the human genome. Fields contain the following information:
  chrom: the name of the chromosome
  chromStart: the starting position of the feature in the chromosome
  chromEnd: the ending position of the feature in the chromosome
  name: PWM identifier and representative names
  score: a score for the PWM hit
  strand: defines the strand, either + or -
  gene id: the gene id of the target gene
  geneSymbol: the gene symbol of the target gene

File name: Video tutorial doc
File format: doc
Title: InteractomeBrowser functionalities
Description of data: contains a web link to a screencast showing basic use of the InteractomeBrowser plugin

References
1. Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Muertter RN, Holko M, Ayanbule O, Yefanov A, Soboleva A: NCBI GEO archiv
392. conducted in human, rat or mouse and for which the number of samples is greater than 10. These 18,250 TS were then annotated using the DAVID database (Database for Annotation, Visualization and Integrated Discovery, version 2005). A functional enrichment with a p-value below 0.05 (Fisher's exact test with the Benjamini and Hochberg correction) was found for 84% of the TS. All information on the annotation of the Affymetrix platforms, the experiments, the signatures and their annotation was stored in a MySQL 5.0 relational database. TS expression data are kept in indexed flat files. We developed a modular and extensible Java application, TBrowser, in the form of a Java client distributed through Java Web Start, for browsing the information contained in the database. An executable jar file is also available for download from the TAGC FTP site. The tool consists of a graphical interface that supports Boolean queries, that is, queries using logical operators. Queries take the form gene1 | gene2 & gene3, where | and & mean OR and AND respectively, with parentheses used to structure the priori
393. mes ALOX15B lipoxygenase [84] and cytochrome P450 epoxygenase family members [85] involved in the arachidonic acid metabolic pathway are also significantly increased in the DSS gene signature, also reflecting activation of those sub-pathways during DSS. Thus, a transcriptional signature related to the arachidonic acid lipid metabolic pathway is activated in the whole blood cells of DSS children at the time of cardiovascular decompensation. Discussion. Numerous studies have addressed the pathophysiology of DSS, the most frequent and severe complication of dengue infections. Despite important findings, only a partial understanding of the cellular and molecular processes that may support this life-threatening syndrome has been obtained, and we still lack a comprehensive overview of the alterations that contribute to or reflect the onset of the shock syndrome. Such an overview could improve patient management and treatment, a major challenge for clinicians. We designed a study aimed at analysing the near-global transcriptome of whole blood cells from paediatric dengue patients, looking at every modification that could contribute to understanding the pathogenic process. The capacity of such an exhaustive approach to identify relevant host responses, including unsuspected pathways, has been demonstrated in other systemic inflammatory syndromes such as human se
394. mes on the genome. Finally, studying DNA methylation with the techniques called methyl-seq and Reduced Representation Bisulfite Sequencing (RRBS) makes it possible to map and quantify the methylation level of cytosines (methyl-C) at CpG islands or CpG dinucleotides across the whole genome (Wu et al. 2011a; Hansen et al. 2011; Lan et al. 2011). 1.4.2.2 Study of the transcriptome. HTS can also be used to study the transcriptome. The RNAs of interest are reverse-transcribed into cDNA, and each cDNA is then sequenced. These data provide information on the qualitative and quantitative RNA content of the samples. Being more sensitive than DNA microarrays, sequencing allows true quantification of transcripts, without signal saturation effects and over a wider dynamic range. This method can be used for various applications: identifying new genes (de novo transcriptome); identifying transcribed but untranslated regions (UTRs), intron-exon junctions, alternative transcripts produced by alternative splicing, and start codons; identifying non-coding units, including non-coding RNAs, precursor microRNAs and other untranslated RNAs; and determining the transcription level of genes. In HTS, the transcriptome is studied mainly by deu
395. n A ChEA transcription factor regulation inferred from integrating genome wide ChIP X experiments Bioinformatics 2010 26 2438 2444 25 16 Griffith OL Montgomery SB Bernier B Chu B Kasaian K Aerts S Mahony S Sleumer MC Bilenky M Haeussler M Griffith M Gallo SM Giardine B Hooghe B Van Loo P Blanco E Ticoll A Lithwick S Portales Casamar E Donaldson IJ Robertson G Wadelius C De Bleser P Vlieghe D Halfon MS Wasserman W Hardison R Bergman CM Jones SJM ORegAnno an open access community driven resource for regulatory annotation Nucleic Acids Res 2008 36 D107 113 17 Childress PJ Fletcher RL Perumal NB LymphTF DB a database of transcription factors involved in lymphocyte development Genes Immun 2007 8 360 365 18 Friedman RC Farh KK H Burge CB Bartel DP Most mammalian mRNAs are conserved targets of microRNAs Genome Research 2009 19 92 105 19 Lachmann A Ma ayan A KEA kinase enrichment analysis Bioinformatics 2009 25 684 686 20 Aranda B Achuthan P Alam Faruque Y Armean I Bridge A Derow C Feuermann M Ghanbarian AT Kerrien S Khadake J Kerssemakers J Leroy C Menden M Michaut M Montecchi Palazzi L Neuhauser SN Orchard S Perreau V Roechert B van Eijk K Hermjakob H The IntAct molecular interaction database in 2010 Nucleic Acids Research 2009 21 Keshava Prasad TS Goel R Kandasamy K Keerthikumar S Kumar S Mathivanan S Telikicherla D Raju R Shafreen B Ven
396. n E modulation of the mitogenic response of human T cells Differential response of T cell subpopulations J Clin Invest 64 1188 1203 Joshi PC Zhou X Cuchens M Jones Q 2001 Prostaglandin E2 suppressed IL 15 mediated human NK cell function through down regulation of common gamma chain J Immunol 166 885 891 Vogt L Schmitz N Kurrer MO Bauer M Hinton HI et al 2006 VSIG4 a B7 family related protein is a negative regulator of T cell activation J Clin Invest 116 2817 2826 5 de Kruif MD Setiati TE Mairuhu AT Koraka P Aberson HA et al 2008 Differential gene expression changes in children with severe dengue virus infections PLoS Negl Trop Dis 2 e215 Luplertlop N Misse D Bray D Deleuze V Gonzalez JP et al 2006 Dengue virus infected dendritic cells trigger vascular leakage through metalloproteinase overproduction EMBO Rep 7 1176 1181 Predescu D Predescu S Shimizu J Miyawaki Shimizu K Malik AB 2005 Constitutive eNOS derived nitric oxide is a determinant of endothelial junctional integrity Am J Physiol Lung Cell Mol Physiol 289 L371 381 Yang D Biragyn A Hoover DM Lubkowski J Oppenheim JJ 2004 Multiple roles of antimicrobial defensins cathelicidins and eosinophil derived neuro toxin in host defense Annu Rev Immunol 22 181 215 DiStasi MR Ley K 2009 Opening the flood gates how neutrophil endothelial interactions regulate permeability Trends Immunol 30 547 556 Foell D Wittko
397. n gene expression profiles. This allows clustering the patients whose gene expression profiles are the most similar, independently of their disease phenotype or subtype. As a result, the 48 patients' expression profiles were organized in two major subsets (Figure 1): subset 1 (first dendrogram branch) includes both DF and DHF patients without distinction; subset 2 (second dendrogram branch) encompasses a sub-group 2a of DF and DHF patients and a distinct sub-group 2b including 17 of the 19 DSS patients, whether or not they received plasma infusion, revealing a DSS gene signature common to most DSS patients. A few patients, however, clustered in unexpected subsets: two DSS patients (PL005, PL101) had gene expression profiles closer to those of the DF/DHF 2a subset, while one DF patient (PL064) and three DHF patients (PL037, PL058, PL070) had gene expression profiles that clustered within the DSS 2b subset. We confirmed the robustness of the DSS gene signature using the iterative Support Vector Machine (SVM) classifier learning method [39], which reclassified all 19 DSS patients together.
[PLoS ONE, July 2010, Volume 5, Issue 7, e11671. Molecular Mechanisms of DSS.]
Figure 1. Unsupervised hierarchical clustering of whole blood cell expression profiles from the 48 dengue-infected children. The clustering is based on the 2959-gene list (3515 clones) detailed in Table S2, discriminating dengue feve
398. n the perinuclear area. We could also detect the presence of IKAP/hELP1 in the nucleus of hOE-MSCs (Figure 3A-C). Significantly, FD hOE-MSCs exhibit weaker anti-IKAP/hELP1 immunofluorescence staining than control cells, with a similar distribution of the staining (Figure 3D-F). Collectively, our results are therefore consistent with a wide distribution of IKAP/hELP1 and with a much lower IKAP/hELP1 staining in FD hOE-MSCs, in agreement with the RT-qPCR and western blot analyses.
Transcriptome analysis identified fifty dysregulated genes
It is widely accepted that culture conditions alone may exert effects on gene expression, resulting in experimental inconsistencies [38, 39].
[PLoS ONE, December 2010, Volume 5, Issue 12, e15590. OE-MSCs as a Model for FD.]
Figure 3. IKAP/hELP1 distribution in hOE-MSCs (panel columns: IKAP/hELP1, Hoechst, Merged). Anti-IKAP/hELP1 immunofluorescence staining in control (A, B, C) and FD hOE-MSCs (D, E, F), and in FD hOE-MSCs treated with 100 µM kinetin for 24 h (G, H, I). The primary antibody used is a mouse monoclonal anti-IKAP/hELP1. Scale bars represent 20 µm. doi:10.1371/journal.pone.0015590.g003
Thus, to investigate the involvement of candidate disease mechanisms in FD and to test whether differences in gene expression are stably imprinted in FD compared to control hOE-MSCs, we explored t
399. nal differentiation (Perera et al. 2004). Therefore, we can speculate that in FD the dysregulation of these candidate genes will disrupt the precisely defined waves of migration, differentiation and axonal growth cone navigation for synapse formation, which are all essential for the formation of the peripheral nervous system. LYN is one of the genes found to be downregulated in our microarray data, as validated by RT-qPCR and by IKBKAP knockdown in HeLa cells. LYN encodes a Src-family tyrosine kinase that has many roles in oligodendrocyte differentiation (Colognato et al. 2004; Hossain et al. 2010) and in dopamine release in the mesolimbic system (Gibb et al. 2011). Importantly, we highlighted 10 genes, including IKBKAP, whose dysregulation is shared by three independent genome-wide transcriptional studies (Fig. 3). Notably, four of them (CXCR7, SEMA5A, SNAI2 and TNC) are closely related to cell migration (Katafiasz et al. 2011; Nishio et al. 2005; Sadanandam et al. 2010; Sanchez-Alcaniz et al. 2011). Since several studies previously suggested a contribution of altered migration pathways to the pathophysiology of FD (Close et al. 2006; Cohen-Kupiec et al. 2011; Creppe et al. 2009; Johansen et al. 2008; Lee et al. 2009; Naumanen et al. 2008), future experiments will aim to investigate the role of those four genes in functional migration assays using hOE-MSCs. Understanding the mechanisms underlying reg
400. nctional and analytical, it is then possible, and recommended, to put the results in context in order to generate gene networks. Building such networks makes it possible to understand and model the functional links between the discriminating genes identified previously. Ultimately, this will help decipher the mechanism of the pathology or model under study. Many tools can create this kind of network (IPA, Cytoscape). 3.4 Data analyses carried out in collaborations. Before glass-slide DNA microarrays were introduced on the Transcriptome platform, the TAGC used radioactive microarrays on nylon membranes, a technique that the laboratory pioneered. The TGML platform had the equipment and skills required to design the probes and spot them on the arrays, which were then used in various studies. The development of the R library AgiND (see Chapter 2) thus led to numerous collaborations involving both technologies: Agilent fluorescence microarrays and radioactive nylon microarrays. Two of these collaborations, in very different fields, resulted in publications: 1) the collaboration with Dr Patricia Paris of the IMTSSA (Institut de Médecine Tropicale du Service de Santé des Armées, Marseille) concerns the definition of a transcriptional signature characteristic d
401. nding of pathophysiological processes underlying systemic critical illnesses such as sterile and non-sterile systemic inflammatory response syndromes (SIRS), allowing the identification of relevant disease biomarkers and of new putative therapeutic targets [22, 24, 26]. Genome-wide expression studies aimed at deciphering molecular responses altered in the whole blood cells of adult [27] and paediatric DSS patients [28] have recently been carried out by colleagues. They reported a decreased type I IFN-induced response and a benign transcriptional response at the time of cardiovascular decompensation [27, 28], but failed to identify biological pathways relevant to DSS pathophysiology, particularly inflammatory ones that could sustain microvascular dysfunction [28]. We report here the results of a prospective study comparing the whole blood genome-wide expression profiles of 48 matched Cambodian children recruited during the large 2007 dengue outbreak, who presented with classical dengue fever (DF), dengue hemorrhagic fever grades I-II (DHF) or dengue shock syndrome (DSS) according to the 1997 WHO classification of dengue severity [29]. Based on careful study design and statistical treatment of microarray data, we identified a large and highly relevant gene signature of DSS, never reported before, that discriminates DSS children from paediatric patients with DF or DHF grades I-II who did not present severe clinical complications. Using an integr
402. neMAPP. We reasoned that the more frequently two genes fall in the same TS, the more likely these genes belong to the same core functional network. To test this hypothesis, we produced a Boolean matrix with the 22,215 probes of the GPL96 platform as rows and 3,114 GPL96-specific TS as columns (only TS containing 30 to 1500 probes were included). This matrix was filled with zeros, and elements were set to 1 if a given gene was observed in the corresponding TS. Hierarchical clustering with the uncentered Pearson's correlation coefficient was used to reveal genes frequently associated with the same TS. Given the order of the resulting matrix, it could not be visualized on a desktop computer using conventional software (i.e. Treeview, MeV). We thus developed the TBMap plugin, which allows one to visualize the map and also to superimpose a user-defined or KEGG-related gene list. As expected, most of the clusters were obviously enriched in genes involved in similar biological processes (protein biosynthesis, ribosome function, oxidative phosphorylation, cell cycle, fatty acid metabolism, valine, leucine and isoleucine degradation, extracellular matrix, breast cancer cells, structural constituent of muscles, neuronal processes, etc.). This was particularly clear when KEGG pathway information was superimposed (see Figure S6). Figure 4 presents some of the clusters that were identified as related to immune system functions. We coul
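The gene-by-TS Boolean matrix and the uncentered Pearson coefficient mentioned in this excerpt can be illustrated with a minimal Python sketch. It is not the TBMap implementation: the function names, the four genes and the three toy signatures are hypothetical, and the hierarchical clustering step itself is left out. For vectors that are not mean-centered, the uncentered Pearson coefficient is the cosine of the angle between the two membership vectors.

```python
import math

def build_membership_matrix(genes, signatures):
    """Boolean matrix: one row per gene, one column per signature (TS).

    A cell is 1 when the gene belongs to the signature, 0 otherwise.
    """
    return [[1 if gene in sig else 0 for sig in signatures] for gene in genes]

def uncentered_pearson(x, y):
    """Uncentered Pearson correlation (cosine similarity) of two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a gene found in no signature is similar to nothing
    return dot / (norm_x * norm_y)

# Hypothetical toy data: four genes and three signatures.
genes = ["CD3E", "CD3D", "RPL3", "RPS6"]
signatures = [{"CD3E", "CD3D"}, {"CD3E", "CD3D", "RPL3"}, {"RPL3", "RPS6"}]

matrix = build_membership_matrix(genes, signatures)
sim_same_ts = uncentered_pearson(matrix[0], matrix[1])  # CD3E vs CD3D
sim_partial = uncentered_pearson(matrix[0], matrix[2])  # CD3E vs RPL3
sim_disjoint = uncentered_pearson(matrix[0], matrix[3])  # CD3E vs RPS6
```

Genes that always fall in the same signatures get a similarity close to 1, and genes that never co-occur get 0; one minus this similarity is the kind of distance a hierarchical clustering routine would take as input.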
403. ng. Our next goal was to assess whether the production of both WT and MU IKBKAP mRNAs can be modulated in our model. In previous studies, one compound, kinetin (6-furfurylaminopurine), was found to correct IKBKAP splicing and increase IKAP/hELP1 production in FD cells [35]. We tested whether this drug could also modify the splicing defect of IKBKAP in FD hOE-MSCs. For this purpose, we applied increasing concentrations of kinetin (25 to 200 µM) to an FD hOE-MSC culture for 72 h. As expected, after semi-quantitative RT-PCR we observed a significant decrease of the MU transcript compared to non-treated cells on agarose gel electrophoresis (Figure 7A). The level of IKBKAP mRNA splicing correction increased proportionally to the concentration of kinetin, and the MU transcript almost vanished at 100 µM. The dose-dependent action of kinetin on increasing the WT/MU ratio was confirmed by RT-qPCR analysis (Figure 7B). A similar finding was observed when IKAP/hELP1 proteins were detected by western blot analysis (Figure 7C). Accordingly, when FD hOE-MSCs were incubated with 100 µM kinetin for 24 h, we observed a major increase of anti-IKAP/hELP1 staining in cytoplasmic as well as nuclear areas (Figure 3G-I). However, the same kinetin treatment could not rescue the migration defect observed in FD hOE-MSCs with the Boyden chamber assay (Figure 5). To determine how fast kinetin modulates IKBKAP mRNA splicing, we performed a time-course experiment wit
404. HTS data. NCBI and EBI have also set up databases for accessing very-high-throughput sequencing data, the Sequence Read Archive (SRA) and the European Nucleotide Archive (ENA) respectively. With the explosion of these techniques, however, the number of experiments is growing exponentially, which requires more storage capacity: where a few tens of Mb were enough for DNA microarrays, a single HTS experiment now needs several hundred Gb of storage. Indeed, these databases store 1) raw files (csfasta and qual for SOLiD, or fastq), which amount to several Gb, 2) alignments (bam files) and 3) data produced by tertiary analysis pipelines (sometimes in bed format, for the location of peaks from ChIP-seq experiments). In the same way as MIAME for DNA microarray data, the FGED developed the Minimum Information about a high-throughput Nucleotide SeQuencing Experiment (MINSEQE) for very-high-throughput sequencing data. 5.4 Development of tools and methods for analysing ChIP-seq data. To analyse the ChIP-seq data produced by the SOLiD very-high-throughput sequencer available on the TGML platform, a ChIP-seq data processing pipeline was devel
405. nnel
> args(agNormData)
function (object, whichSlot = c("Mean", "Processed"), bgCorrection = F, type = "quantiles", percent = 1)
NULL

4.1 Lowess method
This normalization can be obtained with the agLowessNorm function, which returns an object normalized by the Lowess (or Loess) method without background correction. The commands to obtain this normalized object are:
> norm1 <- agNormData(myob, "Mean", bgCorrection = FALSE, type = "lowess", percent = 1)
> norm1 <- agLowessNorm(myob, "Mean")
To apply the background correction, the command is:
> norm2 <- agNormData(myob, "Mean", bgCorrection = TRUE, type = "lowess", percent = 1)

4.2 Quantiles method
This normalization can be obtained with the agQuantilesNorm function, which returns an object normalized by the quantiles method without background correction. The commands to obtain this normalized object, with or without background correction, are:
> norm3 <- agNormData(myob, "Mean", bgCorrection = FALSE, type = "quantiles")
> norm3 <- agQuantilesNorm(myob, "Mean")
> norm4 <- agNormData(myob, "Mean", bgCorrection = TRUE, type = "quantiles")

Thus the normalized object class is:
  AgilentNorm for a normalized object from an AgilentBatch object class (one colour)
  AgilentNormRG for a normalized object from an AgilentBatchRG object class (two colours)
These normalized objects contain new slots: SgNorm, rSgNorm, gSgNorm, A, M and Type. S
406. ns K amp Blust R 2009 Best prac tices for hybridization design in two colour microarray analysis Trends in biotechnology 27 7 406 14 Kohonen 1997 Kohonen T 1997 Self Organizing Maps New York Springer Korbel et al 2007 Korbel J O Urban A E Affourtit J P Godwin B Grubert F Si mons J F Kim P M Palejev D Carriero N J Du L Taillon B E Chen Z Tanzer A Saunders A C E Chi J Yang F Carter N P Hurles M E Weissman S M Har kins T T Gerstein M B Egholm M amp Snyder M 2007 Paired end mapping reveals extensive structural variation in the human genome Science New York N Y 318 5849 420 6 294 Bibliographie Kornberg 1999 Kornberg R D 1999 Eukaryotic transcriptional control Trends in cell biology 9 12 M46 9 Kouzarides 2007 Kouzarides T 2007 Chromatin modifications and their function Cell 128 4 693 705 Krol et al 2010 Krol J Loedige I amp Filipowicz W 2010 The widespread regulation of microRNA biogenesis function and decay Nature reviews Genetics 11 9 597 610 Lacoste amp C t 2003 Lacoste N amp C t J 2003 The epigenetic code of histones M decine sciences M S 19 10 955 9 Lan et al 2011 Lan X Adams C Landers M Dudas M Krissinger D Marnellos G Bonneville R Xu M Wang J Huang T H M Meredith G amp Jin V X 2011 High res
407. ns. An arbitrary FDR threshold is chosen by the user (generally 5%) according to the number of false positives he or she is willing to accept. [Figure 3.3: the value of d obtained for each gene i, d(i), plotted against the simulated value; axes: expected relative difference (x), observed relative difference d(i) (y).] 3.1.3 ANalysis Of VAriance (ANOVA). Analysis of variance (ANOVA) is a parametric method used to compare the means of at least 3 groups of samples (Draghici et al. 2003). It assumes that the data follow a normal distribution and, for a global analysis, that genes are independent. This is generally not the case for DNA microarray data, because genes are not independent in terms of regulation. Nevertheless, it is commonly used to assess whether the differences observed between these means are significant, taking into account various categorical explanatory variables (cell type, treatment duration, sex). Depending on the number of factors considered, one speaks of one-way or multi-way ANOVA. Its principle is that the observed variance results from the contribution of one or more distinct sources. Consider, for example, cells from different types of cancer C (breast, prostate, ovary
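To make the variance decomposition behind a one-way ANOVA concrete, here is a minimal Python sketch that computes the F statistic for a single gene measured in three groups. The function name and the expression values are hypothetical and chosen only for illustration; turning F into a p-value (which needs the F distribution) is left out.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA.

    `groups` is a list of lists, one list of measurements per group.
    F = (between-group mean square) / (within-group mean square).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more values than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = 0.0  # variance explained by the group factor
    ss_within = 0.0   # residual variance inside each group
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Raises ZeroDivisionError if every group has zero internal variance.
    return ms_between / ms_within

# Hypothetical log2 expression values of one gene in three cancer types.
breast = [1.0, 2.0, 3.0]
prostate = [2.0, 3.0, 4.0]
ovary = [5.0, 6.0, 7.0]
f_stat = one_way_anova_f([breast, prostate, ovary])
```

A large F means that the differences between group means are large relative to the spread within groups, which is the intuition behind attributing the observed variance to distinct sources.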
408. ns between genomic loci and to identify the regulatory sites associated with the TSS during transcription initiation, to help solve these problems (Dekker et al. 2002; Dostie et al. 2006; Simonis et al. 2006; Zhao et al. 2006). Various annotation tools have been implemented: Savant (Fiume et al. 2010), ChIPpeakAnno (Zhu et al. 2010), CEAS (Shin et al. 2009), cisgenome (Barozzi et al. 2011) and GREAT (McLean et al. 2010). In mammals, nearly half of the binding sites identified are associated with inactive genes (Hatzis et al. 2008). Indeed, many transcription factors are either cofactors or repressors. It is therefore important to know whether a gene is truly functionally linked to the transcription factor with which it is associated. To provide evidence that the sites are functional, several methods can be used, such as 1) studying the differential expression of genes with and without an associated binding site (Johnson et al. 2007; Chen et al. 2008), 2) assessing the expression of target genes in cells in which expression of the factor of interest has been reduced or abolished, and 3) looking for concordance between binding sites and the histone modifications that delimit enhancer and promoter regions, such as H3K4me1 and H3K4me3 (Barski et al. 2007). 5.3.7 Databases dedicated to do
409. nscriptome variations resulting from a reduced level of IKBKAP transcripts using microarray technology [10, 22, 34, 40, 53]. However, poor correlations were observed between these studies. Several reasons can explain these discrepancies. First, various cell types at different stages of development and differentiation have been studied (brain tissue, fibroblasts, HeLa cells, HCT116 cells, iPS cells). The cells tested in the current study are likely at a stage between the iPS cells and the iPS-cell-derived neural crest precursors developed by the Studer group [22]. It is thus not surprising that most of our microarray results overlap with those of the iPS cell study. Second, a potential source of variability among transcriptome analyses derives from the technical manipulations employed to downregulate IKBKAP: presence of the FD mutation in its original context, compared to WT IKBKAP knockdown using different interfering RNAs, resulting in differential residual IKBKAP/IKAP expression. Third, in the context of a rare disease, a small sample size may cast doubt upon the validity of the conclusions drawn. To decrease statistical bias, we decided to increase the number of samples from our 5 control and 4 FD patients by collecting data from 4 different passages (P1, P2, P5 and P9) of each primary cell line. We hypothesized that such a method would allow us to i) increase the statistical power of our analysis and ii) explore the eff
410. also take into account the sequencing quality value of each base (or di-base) contained in the _QV qual or fastq files, so that each base is aligned to the reference with a weight that depends on its quality. Test (benchmark) sequence sets have been created to allow these programs to be compared (Holtgrewe et al. 2011). Alignment data are most often produced in bam format, which has become the near-standard format for this type of data. Besides being compressed, which saves storage space, this format is also indexed. Access to aligned data is therefore extremely fast, which makes it reasonably comfortable to browse them despite their size (sometimes several tens of GB). Once the fragments are aligned, they are selected on the basis of their quality. They can then be visualized directly with a genome browser such as the UCSC Genome Browser, the Integrated Genome Browser (IGB) or the Integrative Genomics Viewer (IGV). 5.3.4 Peak detection. The aim of a ChIP-seq experiment is to identify the regions enriched in fragments relative to the background noise and/or the input. These regions represent the binding sites of a transcription factor, or extended sites for histone marks
411. focus on transcriptional regulation, notably in breast cancer, lymphomas, glioblastomas and sepsis. This laboratory also carries out fundamental research on the control of the cell cycle and of the differentiation and activation of T lymphocytes in mammals. These projects combine analyses of transcription and of the regulation of gene expression, and involve the development of analysis tools and bioinformatic approaches. My pre-doctoral internships gave me experience in 1) DNA microarray data analysis, with the development of an R library for quality control and normalization of Agilent glass-slide microarrays, and 2) meta-analysis of DNA microarray data from Gene Expression Omnibus (GEO), through my participation in the TranscriptomeBrowser project initiated in 2007 by Dr Denis Puthier. This latter project allowed me to study gene co-expression and regulation in the context of T lymphocyte activation and differentiation. After obtaining an MRT research fellowship in July 2008, I wished to continue my research at the TAGC. I therefore carried out my thesis under the joint supervision of Drs Jean Imbert and Denis Puthier. In continuity with my master's work, my thesis focused on the development of tools e
412. …ntification files
2.2.1 One channel hybridizations
2.2.2 Two channels hybridizations
2.3 Building phenoData and MIAME files
2.4 Object informations
2.4.1 Class description
2.4.2 Accessing slots
2.4.3 Exclude data from object
3 Diagnostics plots
3.1 The agBoxplot function
3.1.1 Distribution inter arrays
3.1.2 Distribution intra array
3.2 The agMAplot function
3.3 The agImage function
3.4 The agPlot function
4 Normalization
4.1 Lowess method
4.2 Quantiles method
5 Data exportation
5.1 Creation of the ExpressionSet object
5.2 Examples using the ExpressionSet object
1 Introduction
This document is intended to provide a brief overview of the AgiND package. The library was developed for diagnosis and normalization of one-channel and two-channel Agilent mi…
413. … samples. It is therefore particularly effective for normalizing a series of samples whose distributions of expression values are close. It assumes that the distribution of gene abundance is nearly the same in all samples. The drawback of this method is that it can give substantial weight to low values. This method uses a so-called synthetic DNA microarray as its reference (Smyth et al., 2003). These synthetic data generally correspond to the means or medians of the expression values computed across all samples, quantile by quantile. These reference values are then used to replace the expression values, quantile by quantile (Figure 2.2).
[Figure 2.2: (A) raw log2 intensities for samples E1 to E5; (B) each column sorted by expression value; (C) each value replaced by the median of its row of sorted values (boxed in red).]
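The quantile-by-quantile replacement described in excerpt 413 can be sketched in a few lines. This is a minimal illustration in Python, not the R code of the AgiND library; the function name `quantile_normalize` is hypothetical, the median is used as the synthetic reference (the text also allows the mean), and ties are broken by row order, which is an assumption.

```python
from statistics import median


def quantile_normalize(matrix):
    """Quantile-normalize a probes x samples matrix given as a list of rows.

    Hypothetical helper: the median across samples at each rank plays the
    role of the "synthetic" reference array.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]

    # Sort each sample (column) independently.
    sorted_cols = [sorted(col) for col in columns]

    # Reference value for each rank: median of the sorted columns at that rank.
    reference = [median(col[r] for col in sorted_cols) for r in range(n_rows)]

    # Replace each original value by the reference value of its rank.
    normalized = [[None] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])  # stable: ties keep row order
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


raw = [[5, 4, 3],
       [2, 1, 4],
       [3, 4, 6],
       [4, 2, 8]]
norm = quantile_normalize(raw)
```

After normalization every column holds exactly the same set of values, which is why the method works best when the samples already have similar distributions.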
414. … to compare results obtained through microarray, ChIP-on-chip, ChIP-seq, CGH or protein-protein interaction experiments to those previously stored in the GEO database. In all tested experiments, we found that DBF-MCL gives very good results on both simulated datasets and real microarray datasets. Although Lattimore et al. proposed another MCL-based algorithm, geneMCL, we were unable to compare our results with their implementation, as the software is no longer available or maintained. However, DBF-MCL was run on the full van't Veer dataset [25] (117 biological samples) that was used by Lattimore and collaborators in the original paper. In their report, the authors used a subset of genes (5,730 out of 24,482) that were selected based on their associated variance. Our procedure, run on the full dataset, led to the selection of 5,932 genes that fall into 22 clusters, in contrast to 154 clusters using geneMCL. This discrepancy is likely due to the filtering step applied to the dataset. Indeed, a strong associated variance can also be reminiscent of punctual random artifacts. Thus, selecting those genes will generate small or singleton clusters. In this context, the MDNN statistic better handles these artifacts, as its purpose is to conserve genes that belong to dense regions in the hyperspace. To date, TBrowser provides users with only one partitioning solution for a dataset. However, as density is heterogeneous inside a dataset, several partitioning so…
415. …o-temporal, so that biological processes take place in a given cell type and at the appropriate developmental stage: this is the role of epigenetic regulation. This term, whose initial definition was introduced in 1942 by Conrad H. Waddington, refers to transmissible and reversible modifications of chromatin that are not accompanied by changes in the nucleotide sequence of the DNA. It is, however, important not to confuse epigenetics with the epigenome. The epigenome is the epigenetic state of the cell. An epigenome therefore refers to the epigenetic characteristics of a given cell, such as DNA methylation, histone modifications and chromatin accessibility, which allow access to the genome and thus the expression of messenger and non-coding RNAs (Bernstein et al., 2010). Each cell type, at a given state of differentiation, thus has its own epigenome, which defines its gene expression program. A better understanding of epigenetic regulatory mechanisms and of epigenomes has been made possible by the development of very-high-throughput sequencing, described in the next part of this chapter. The use of these approaches in a variety of tissues has thus …
[Figure residue: histone marks H3K4me3, H3K27me3, H3K4me1, H4K20me1; inactive gene.]
416. …obbin et al., 2003; Smyth et al., 2003; Knapen et al., 2009). The dye-swap method consists of reversing the labeling of the 2 samples, and therefore hybridizing each sample twice, once after labeling with each fluorochrome, which doubles the number of arrays run. This raises a cost issue and requires a sufficient quantity of biological material. This approach is very often used in cancer studies because it allows the direct comparison, on a single array, of a pathological sample with a so-called healthy reference sample. As for one-color arrays, their main advantages are the simplicity and flexibility of the experimental design: comparisons between the different arrays of an experiment are made easier, especially when the number of samples is large. In addition, this approach helps reduce the sources of variability in statistical tests through the use of biological and technical replicates. The MAQC (MicroArray Quality Control) consortium, created in February 2005, showed that under well-controlled conditions, inter- and intra-laboratory comparisons of DNA microarray results indicate good reproducibility (Irizarry et al., 2005; Shi et al., 2006; Shi et al., 2010). 2.1.2 Acquisition of raw data …
417. … the Sanger method have thus emerged: ChIP serial analysis of chromatin occupancy (SACO; Impey et al., 2004), ChIP serial analysis of binding elements (SABE; Chen & Sadowski, 2005), ChIP sequence tag analysis of genomic enrichment (STAGE; Bhinge et al., 2007) and the genome-wide mapping technique (GMAT; Roh et al., 2004). These approaches were recently supplanted by ChIP-seq, a technique combining chromatin immunoprecipitation with very-high-throughput sequencing of the immunoprecipitated sonication fragments (Barski et al., 2007; Johnson et al., 2007). Unlike earlier techniques, ChIP-seq can determine the binding site of a protein with a precision of only a few tens of bases, provided that the coverage (the number of fragments covering the region of interest) is sufficient (Ho et al., 2011) (Figure 5.1 and Table 5.1). In addition, the use of paired-end sequencing has further increased the specificity and precision of the results (ChIP-PET; Zeller et al., 2006). Finally, a new very-high-throughput technique called ChIA-PET (Chromatin Interaction Analysis using Paired-End Tag sequencing) is worth mentioning. Combining ChIP-PET and 3C-seq (see section 1.4.2), it was recently used to study enhancers, regulatory regions distant from promoter and genic regions (Fullwood et al., 2009). 5.1 …
418. …oding RNAs in mammals. Nature 458(7235):223-7.
[Hansen et al. 2011] Hansen KD, Timp W, Bravo HC, Sabunciyan S, Langmead B, McDonald OG, Wen B, Wu H, Liu Y, Diep D, Briem E, Zhang K, Irizarry RA, Feinberg AP (2011). Increased methylation variation in epigenetic domains across cancer types. Nature Genetics.
[Hatzis et al. 2008] Hatzis P, van der Flier LG, van Driel MA, Guryev V, Nielsen F, Denissov S, Nijman IJ, Koster J, Santo EE, Welboren W, Versteeg R, Cuppen E, van de Wetering M, Clevers H, Stunnenberg HG (2008). Genome-wide pattern of TCF7L2/TCF4 chromatin occupancy in colorectal cancer cells. Molecular and Cellular Biology 28(8):2732-44.
[Heintzman et al. 2009] Heintzman ND, Hon GC, Hawkins RD, Kheradpour P, Stark A, Harp LF, Ye Z, Lee LK, Stuart RK, Ching CW, Ching KA, Antosiewicz-Bourget JE, Liu H, Zhang X, Green RD, Lobanenkov VV, Stewart R, Thomson JA, Crawford GE, Kellis M, Ren B (2009). Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459(7243):108-12.
[Hillier et al. 2008] Hillier LW, Marth GT, Quinlan AR, Dooling D, Fewell G, Barnett D, Fox P, Glasscock JI, Hickenbotham M, Huang W, Magrini VJ, Richt RJ, Sander SN, Stewart DA, Stromberg M, Tsung EF, Wylie T, Schedl…
419. … of DSS [81]. Activation of this lipid metabolic pathway in innate cells such as neutrophils or lipid-laden monocytes during inflammatory processes or infection [125] results in the production of eicosanoid lipid mediators, which are not only physiological regulators of vascular tone and permeability [81] but also potent pro-inflammatory mediators involved in a number of pathologies such as asthma [81]. Interestingly, formation of lipid bodies, where eicosanoid synthesis takes place, can be induced by ox-LDL through activation of the PPARγ nuclear lipid receptor [126], thus suggesting a direct link between the three pro-inflammatory pathways identified in DSS children and a contribution of arachidonic-pathway-related inflammatory lipids and oxidative enzymes to the systemic vascular dysfunction leading to DSS. Fourth, DAMPs and TLRs could be a link from primary to secondary inflammation leading to DSS. Occurrence of DSS in only some patients at the late phase of infection is likely due to an inadequate control, or an amplification, of the primary inflammatory response aimed at fighting infection. The pro-inflammatory molecular responses activated in the blood cells of DSS children at the time of shock involve a diversity of innate immune mediators that may amplify a first-line inflammatory response mediated by TNF, IL-6 or IL-1, thus contributing to a secondary inflammatory loop. Indeed, a number of repair/remodeling and of defence gene products over-express…
420. … of covalent histone modifications. Nature 403(6765):41-5.
[Subramanian et al. 2005] Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102(43):15545-50.
[Suzuki et al. 2011] Suzuki S, Ono N, Furusawa C, Ying BW, Yomo T (2011). Comparison of sequence reads obtained from three next-generation sequencing platforms. PLoS ONE 6(5):e19534.
[Tamayo 1999] Tamayo P (1999). Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation. Proceedings of the National Academy of Sciences 96(6):2907-2912.
[Textoris et al. 2010] Textoris J, Ban LH, Capo C, Raoult D, Leone M, Mege JL (2010). Sex-related differences in gene expression following Coxiella burnetii infection in mice: potential role of circadian rhythm. PLoS ONE 5(8):e12190.
[Tomaru et al. 2009] Tomaru T, Steger DJ, Lefterova MI, Schupp M, Lazar MA (2009). Adipocyte-specific expression of murine resistin is mediated by synergism between peroxisome proliferator-activated receptor gamma and CCAAT/enhancer-binding proteins. The Journal of Biological Chemis…
421. …ofile. This suggests that a shift from a severe to an uncomplicated transcriptional profile may occur within a very short time, and could explain the uncomplicated and benign gene immune transcriptional responses reported by Long et al. [28]. Differences in strategies and methods used to filter genes differentially expressed between patient groups could also explain the finding that few IFN type I-related genes but a large diversity of other pathways were identified in the present study compared to other transcriptomic studies of DHF or DSS patients. Here, genes were selected considering only their statistical significance and their association with the disease phenotype. Unlike others [27,28,42], no fold-change cut-off filter was applied, since this non-statistically motivated criterion preferentially selects genes prone to high variations, such as the IFN type I-induced genes [92,93], thus excluding from subsequent bioinformatic analysis a diversity of transcripts exhibiting more subtle variations but strong associations and biological relevance with the considered disease phenotype. Third, unsuspected mechanisms identified in DSS patients could contribute importantly to the pathophysiology of this severe syndrome, as supported by similarities between those DSS-related alterations and other critical syndromes. Interestingly, a number of immune, repair/remodeling and metabolic-related pathways are simultan…
422. …ol 20:1240-1245.
Axelrod FB (2004). Familial dysautonomia. Muscle Nerve 29:352-363.
Axelrod FB, Iyer K, Fish I, Pearson J, Sein ME, Spielholz N (1981). Progressive sensory loss in familial dysautonomia. Pediatrics 67:517-522.
Axelrod FB, Liebes L, Gold-von Simson G, Mendoza S, Mull J, Leyne M, Norcliffe-Kaufmann L, Kaufmann H, Slaugenhaupt SA (2011). Kinetin improves IKBKAP mRNA splicing in patients with familial dysautonomia. Pediatr Res 70:480-483.
Boone N, Loriod B, Bergon A, Sbai O, Formisano-Treziny C, Gabert J, Khrestchatisky M, Nguyen C, Feron F, Axelrod FB, Ibrahim EC (2010). Olfactory stem cells, a new cellular model for studying molecular mechanisms underlying familial dysautonomia. PLoS One 5:e15590.
Buhusi M, Demyanenko GP, Jannie KM, Dalal J, Darnell EP, Weiner JA, Maness PF (2009). ALCAM regulates mediolateral retinotopic mapping in the superior colliculus. J Neurosci 29:15630-15641.
Burré J, Sharma M, Tsetsenis T, Buchman V, Etherton MR, Südhof TC (2010). Alpha-synuclein promotes SNARE-complex assembly in vivo and in vitro. Science 329:1663-1667.
Cheishvili D, Maayan C, Cohen-Kupiec R, Lefler S, Weil M, Ast G, Razin A (2011). IKAP/Elp1 involvement in cytoskeleton regulation and implication for familial dysautonomia. Hum Mol Genet 20:1585-1594.
Cheishvili D, Maayan C, Smith Y, Ast G, Razin A (2007). IKAP/hELP1 deficiency in the cerebrum of familial dysautonomia patients results in down regulation of genes involved in oligodendrocyte differentia…
423. [Table column headers: …ological pathways; Inferred functional interactions; Batch query; Add/remove/hide interactors and interactions; Movable nodes; Built-in graph visualizer; Compartment-based layout. Table cells not recoverable from the extraction; last row: InteractomeBrowser.]
Table footnotes: Refers to bioinformatic prediction of TFBSs using PWMs. Refers to results from large-scale experimental methods that profile the binding of TFs to DNA at the genome-wide level (e.g. ChIP-Seq, ChIP-chip). Refers to computational methods that aggregate various information (e.g. expression, genomic distance, conservation) to infer functional interactions. (d) Search Tool for the Retrieval of Interacting Genes/Proteins. MotifMap visualizer was not available during our tests; information related to the visualizer was obtained from documentation. Agile Protein Interaction DataAnalyzer.
Description of additional data files
File name: Fig S1.pdf
File format: pdf
Title: Number of predicted motifs versus GC content of PWMs
Description of data: Each point corresponds to the results obtained using one PWM on the mouse genome. The name of a representative transcription factor for each PWM is displayed together with the PWM identifier (the two are separated by a pipe character). The…
424. …olution detection and analysis of CpG dinucleotides methylation using MBD-Seq technology. PLoS ONE 6(7):e22226.
[Langmead et al. 2009] Langmead B, Trapnell C, Pop M, Salzberg SL (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10(3):R25.
[Lee & Mahadevan 2009] Lee BM, Mahadevan LC (2009). Stability of histone modifications across mammalian genomes: implications for epigenetic marking. Journal of Cellular Biochemistry 108(1):22-34.
[Lee & Young 2000] Lee TI, Young RA (2000). Transcription of eukaryotic protein-coding genes. Annual Review of Genetics 34:77-137.
[Lemoine et al. 2006] Lemoine S, Combes F, Servant N, Le Crom S (2006). Goulphar: rapid access and expertise for standard two-color microarray normalization methods. BMC Bioinformatics 7:467.
[Li et al. 2007] Li B, Carey M, Workman JL (2007). The role of chromatin during transcription. Cell 128(4):707-19.
[Li & Durbin 2009] Li H, Durbin R (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England) 25(14):1754-60.
[Li et al. 2009] Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, England) 25(16):2078-9.
[Li & Hom…
425. …ompared by multiplatform microarray profiling. J Immunol 2011, 186:3047-3057.
37. Neilson JR, Zheng GXY, Burge CB, Sharp PA: Dynamic regulation of miRNA expression in ordered stages of cellular development. Genes & Development 2007, 21:578-589.
38. Chen X, Xu H, Yuan P, Fang F, Huss M, Vega VB, Wong E, Orlov YL, Zhang W, Jiang J, Loh Y-H, Yeo HC, Yeo ZX, Narang V, Govindarajan KR, Leong B, Shahab A, Ruan Y, Bourque G, Sung W-K, Clarke ND, Wei C-L, Ng H-H: Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 2008, 133:1106-1117.
[Figure 1: content not recoverable from the extraction.]
Additional files provided with this submission:
Additional file 1: Fig S1.pdf, 20K. http www biomedcentral com imedia 1502059913646762 supp1 pdf
Additional file 2: Fig S2.pdf, 10K. http www biomedcentral com imedia 3065379596467629 supp2 pdf
Additional file 3: Fig S3.pdf, 21K. http Awww biomedcentral com imedia 1 156508138646762 supp3 pdf
Additional file 4: Fig S4.pdf, 18K. http www biomedcentral com imedia 2672681 166467629 supp4 pdf
Additional file 5: TBMC mm bed, 5802K. http www biome…
426. … the study of the distribution of epigenetic marks along the genes of the genome. Transcription factor binding sites are detected by observing a localized enrichment in immunoprecipitated sequences, forming a peak at the precise location where the factor is bound to DNA (Wilbanks & Facciotti, 2010; Pepke et al., 2009) (Figure 5.2). The chromatin fragments immunoprecipitated with the antibody specific to this protein vary in size, between 150 and 300 nucleotides. For each of these fragments, sequencing yields the sequence of the first 50 nucleotides when a SOLiD-type sequencer is used in fragment mode. After alignment to the reference, these 50 nucleotides should reveal a strand imbalance, with the transcription factor binding site at the center (Figure 5.2). The strand imbalance is schematically represented by the presence of one peak on each strand (+ and -) of the genome. These peaks are offset by a distance d corresponding to the sonication size. The transcription factor studied can either interact with chromatin or be bound directly to DNA at a sequence-specific site called a motif. Epigenetic marks (histone modifications, histone variants or RNA polymerase …
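The offset d between the plus-strand and minus-strand peaks described in excerpt 426 can be estimated with a simple cross-correlation scan. The sketch below is only an illustration in Python, not the method used by the thesis pipeline or by any particular peak caller; the function name `estimate_strand_shift`, the `max_shift` bound and the use of read 5' end coordinates are all assumptions.

```python
from collections import Counter


def estimate_strand_shift(plus_starts, minus_starts, max_shift=400):
    """Return the shift d (in bp) that best superimposes the two strand peaks.

    plus_starts / minus_starts: 5' end coordinates of reads aligned to the
    + and - strands (hypothetical inputs). For each candidate d, count how
    many minus-strand positions land on a plus-strand position once moved
    left by d, and keep the d with the highest count.
    """
    plus_counts = Counter(plus_starts)
    best_shift, best_score = 0, -1
    for d in range(max_shift + 1):
        # Counter returns 0 for coordinates that carry no plus-strand read.
        score = sum(plus_counts[pos - d] for pos in minus_starts)
        if score > best_score:  # strict comparison keeps the smallest d on ties
            best_shift, best_score = d, score
    return best_shift


plus = [100, 101, 102, 101]
minus = [300, 301, 302, 301]
shift = estimate_strand_shift(plus, minus)
```

With real data the positions would usually be binned or smoothed before scanning, since exact coordinate matches are rare at low coverage.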
427. …on, or ChIP; Gilmour & Lis, 1985) and very-high-throughput sequencing of the immunoprecipitated DNA fragments allows the precise in vivo identification of protein binding sites in the genome. These DNA-binding proteins include RNA polymerases, transcription factors and histones. 5.1 Principle of chromatin immunoprecipitation coupled with very-high-throughput sequencing (ChIP-seq). 5.1.1 Overview. The recent arrival of HTS has revolutionized the large-scale study of the mechanisms regulating gene expression. Combined with ChIP, this technique is an extremely effective tool for (1) determining the binding sites of transcription factors, directly or indirectly in the case of co-factors, (2) locating modifications of the N-terminal domains of histones, and (3) studying the binding of various proteins to DNA (RNA polymerase, histone variants). Formerly hybridized on pan-genomic DNA microarrays (tiling arrays) by the ChIP-on-chip technique (Blat & Kleckner, 1999; Ren et al., 2000; Robyr et al., 2002), immunoprecipitated DNA fragments are now sequenced, allowing transcription factor binding sites to be identified more precisely (Figure 5.1 and Table 5.1). Various techniques using sequencing by the …
428. [Table 1, flattened in extraction.
Row labels: …on hematocrit >20, n; white blood cells, median [IQR], number/mm3; neutrophils, median [IQR], number/mm3; lymphocytes, median [IQR], number/mm3; Supportive medical care: oxygen supplementation, n; perfusion of colloid (dextran 40), n; perfusion of human plasma, n.
First run of values: 113 100 124 n 14; 40 30 45 15 n 14; 36 5 35 39 n 14; 1 6; 6600 5500 9900 n 13; 3900 2900 7600 n 13; 1600 1400 2100 n 13; 0 0 0.
Second run of values: 120 112 120; 30 30 40 55 n 11; 39 75 38 42 n 12; 3 23; 6450 6200 7400 n 10; 3950 3500 4200 n 10; 1850 1500 1900 n 10.
Third run of values: Not perceptible n 15; 15 10 20 n 15; 94 n 17; 42 5 38 45 n 18; 17 89; 6900 4800 6900 n 17; 2500 2200 3800 n 17; 2200 1500 3200 n 17; 15 79; 14 74; 8 42.]
DENV: dengue virus; DF: dengue fever; DHF: dengue hemorrhagic fever; DSS: dengue shock syndrome; IQR: interquartile range; n: number; n = x, with x the number of patients for which the data is available. doi:10.1371/journal.pone.0011671.t001
Based on ANOVA analysis, lists of genes differentially expressed between DF, DHF and DSS groups were generated using different false discovery rates (FDR) ranging from 0.05 up to 10. Indeed, low FDR provide a more stringent statistical filter while they reduce the number, and thus the enrichment, of genes differentially expressed. At the opposite, higher FDR, while statistically acc…
429. … probe is absent. This matrix then undergoes clustering on rows and columns using a Pearson correlation distance. With the TBMap plugin, the maps thus generated can be visualized. Probes present in a given signature are shown in red, and those that are absent in black. Once annotated, they allow us to observe the grouping of genes according to biological processes and to identify new candidate genes. As a proof of concept, TBrowser was used in breast cancer studies (GSE1456) to identify genes specific to malignant breast tumors. Using the DBF-MCL algorithm, about ten TS specifically enriched in genes related to the cell cycle, cell adhesion and immunity were extracted (see Lopez et al., 2008, below). These results were published in PLoS ONE in December 2008.
PLoS ONE. Open Access, freely available online.
TranscriptomeBrowser: A Powerful and Flexible Toolbox to Explore Productively the Transcriptional Landscape of the Gene Expression Omnibus Database
Fabrice Lopez, Julien Textoris, Aurélie Bergon, Gilles Didier, Elisabeth Remy, Samuel Granjeaud, Jean Imbert, Catherine Nguyen, Denis Puthier
1 Inserm U928, TAGC, Parc Scientifique de Luminy, Marseille, France; 2 Université de la Méditerranée …
430. … correspond to the latest-generation very-high-throughput technologies (NGS), which will be described later in this manuscript. The … indicate benchtop sequencer models, small in size and with lower throughput but very fast.
[Figure 1.6: Distribution of the various very-high-throughput sequencing technologies worldwide in December 2011. (A) Geographic distribution. (B) Distribution, in number and percentage, of the main very-high-throughput sequencer models (total number: 1670): Illumina GA2x, Illumina HiSeq 2000, ABI SOLiD, Roche 454, Ion Torrent, Pacific Biosciences, others. (C) Main sequencing centers, by number of machines: Beijing Genomics Institute (BGI); The Genome Center at Washington University; Wellcome Trust Sanger Institute; Canada's Michael Smith Genome Sciences Centre; Broad Institute; DOE Joint Genome Institute; Yale Center for Genome Analysis; Human Genome Sequencing Centre, Baylor College of Medicine; Ontario Institute for Cancer Research. Source: http://pathogenomics.bham.ac.uk/hts]
1.4 Very-high-throughput sequencing techniques
[Table header: Characteristics; Life Technologies …; Illumina Hi…; Roche 454 GS…]
431. …ongly associated with the disease phenotype of severe dengue subtype DSS that encode highly pro-inflammatory microbicidal peptides and enzymes. This pattern includes, non-exhaustively, the alpha-defensins DEFA1, DEFA3 and DEFA4, the cathelicidin CAMP and lactoferrin LTF peptides, and the neutrophil enzymes myeloperoxidase (MPO), neutrophil RNASE2, RNASE3, cathepsin G and neutrophil elastase (ELANE). Transcripts encoding the potent pro-inflammatory calgranulin proteins S100A8/9 and S100A12 ch…
[Facing column:] … 11 serine proteases and metalloprotease inhibitors, IL-1B cytokine decoy receptor, free heme scavenger molecules or complement-regulating receptors. Repair and remodeling genes over-expressed in the DSS gene signature also …
Figure 3. T Cell Receptor Signaling canonical pathway from Ingenuity Pathway Analysis. Genes in green and red are respectively under- and over-expressed in the DSS gene signature. Genes in white are other genes present in the canonical pathway but absent from the DSS gene signature. DSS: Dengue Shock Syndrome. (Shape legend: chemical, enzyme, group or complex, member of a group or complex, kinase, peptidase, phosphatase, transcription regulator, transmembrane receptor, other. 2000-2009 Ingenuity Systems, Inc. All rights reserved.) doi:10.1371/journal.pone.0011671.g003
PLoS ONE, www.plosone.org, July 2010, Volume 5, Issue 7, e11671. Molecular Mechanisms of DSS.
432. … query the database in order to obtain information on an experiment, a DNA microarray platform or a signature using the getTBInfo function, and also to retrieve a list of signatures matching the result of a Boolean or list query using the getSignatures function. It is therefore possible to write R scripts that automate data extraction and make analysis easier, without going through the TBrowser graphical interface. This library also allows transcriptional signatures to be extracted with the DBF-MCL algorithm through the DBFMCL function. This function gives access to the various parameters of the algorithm, such as the inflation, the number k of nearest neighbors, the number of randomizations and the FDR (False Discovery Rate) value used. It takes an expression matrix as input and returns an S4 object of class DBFMCLresult containing the parameters of the algorithm, the expression matrix of each signature and the number of probes. As in the initial version, only signatures with more than 10 probes are kept. Users can thus run our algorithm on their own dataset and create new signatures. As for the previous R library, help pages were written for each function or object class created, along with a user manual (see below). This …
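To make the role of the parameter k and of the randomizations more concrete, the density filter of DBF-MCL, as summarized elsewhere in this document (distance to the k-th nearest neighbor, DKNN, compared with simulated DKNN values obtained by sampling n distances from the gene-gene distance matrix), can be sketched as follows. This is a hedged Python illustration, not the DBFMCL R function: the names `dknn` and `simulated_dknn` are hypothetical, Euclidean distance and sampling with replacement are assumptions, and the FDR computation and the MCL partitioning step are not shown.

```python
import random
from itertools import combinations
from math import dist


def dknn(profiles, k):
    """Distance from each expression profile to its k-th nearest neighbor."""
    if not 1 <= k < len(profiles):
        raise ValueError("k must be between 1 and n - 1")
    values = []
    for i, p in enumerate(profiles):
        neighbors = sorted(dist(p, q) for j, q in enumerate(profiles) if j != i)
        values.append(neighbors[k - 1])
    return values


def simulated_dknn(profiles, k, rng):
    """One simulated DKNN value: draw n pairwise distances, keep the k-th smallest.

    Sampling with replacement is an assumption made for this sketch.
    """
    pool = [dist(p, q) for p, q in combinations(profiles, 2)]
    sample = rng.choices(pool, k=len(profiles))
    return sorted(sample)[k - 1]


# Three profiles in a dense region and one isolated profile.
profiles = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (10.0, 10.0)]
observed = dknn(profiles, k=1)
simulated = [simulated_dknn(profiles, 1, random.Random(seed)) for seed in range(5)]
```

Profiles in the dense region get a small observed DKNN, while the isolated profile gets a large one; comparing observed values with the simulated distribution is what lets the algorithm decide which genes to keep.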
433. …onomia. Cell Adhes Migr 2:236-239.
Nelissen RL, Sillekens PT, Beijer RP, Geurts van Kessel AH, van Venrooij WJ (1991). Structure, chromosomal localization and evolutionary conservation of the gene encoding human U1 snRNP-specific A protein. Gene 102:189-196.
Newbern J, Birchmeier C (2010). Nrg1/ErbB signaling networks in Schwann cell development and myelination. Semin Cell Dev Biol 21:922-928.
Nishio T, Kawaguchi S, Yamamoto M, Iseda T, Kawasaki T, Hase T (2005). Tenascin-C regulates proliferation and migration of cultured astrocytes in a scratch wound assay. Neuroscience 132:87-102.
Nivet E, Vignes M, Girard SD, Pierrisnard C, Baril N, Deveze A, Magnan J, Lante F, Khrestchatisky M, Feron F, Roman FS (2011). Engraftment of human nasal olfactory stem cells restores neuroplasticity in mice with hippocampal lesions. J Clin Invest 121:2808-2820.
Nottrott S, Hartmuth K, Fabrizio P, Urlaub H, Vidovic I, Ficner R, Luhrmann R (1999). Functional interaction of a novel 15.5kD U4/U6.U5 tri-snRNP protein with the 5' stem-loop of U4 snRNA. EMBO J 18:6119-6133.
Okada Y, Yamagata K, Hong K, Wakayama T, Zhang Y (2010). A role for the elongator complex in zygotic paternal genome demethylation. Nature 463:554-558.
Pearson J, Pytel BA (1978). Quantitative studies of sympathetic ganglia and spinal cord intermedio-lateral gray columns in familial dysautonomia. J Neurol Sci 39:47-59.
Pearson J, Pytel BA, Grover-Johnson N, Axelrod F, Dancis J (1978). Quantitative studie…
434. …opp… It performs the secondary and tertiary analyses of ChIP-seq data produced in fragment as well as paired-end sequencing mode, the primary analysis being carried out on the sequencer's cluster. This recent development has not yet led to any publication. However, it is used routinely on the TGML platform and in collaborations. I also used it to analyze ChIP-seq data from an experiment that I performed myself. These results will not, however, be shown in this manuscript. Since this pipeline is mainly intended for the analysis of data from the platform, it was built on the tools provided by Applied Biosystems (Bioscope, Corona lite). Still under development in order to adapt to the needs of the platform, it is written in bash, a language particularly well suited to integrating software of diverse origins. It thus integrates various public tools as well as scripts and programs developed in the laboratory. Indeed, the technical evolution of very-high-throughput sequencers is very rapid (see Chapter 1 1 1). Since 2009 we have changed SOLiD version several times, moving from v3.0 to v3.5 in 2010 and finally to v4 in 2011. This has led to many changes, both at the experimental level (read length, sonication size) and at the …
435. …ord, England) 25(19):2605-6.
[Shinde et al. 2010] Shinde K, Phatak M, Johannes FM, Chen J, Li Q, Vineet JK, Hu Z, Ghosh K, Meller J, Medvedovic M (2010). Genomics Portals: integrative web-platform for mining genomics data. BMC Genomics 11:27.
[Siddiqui et al. 2006] Siddiqui AS, Delaney AD, Schnerch A, Griffith OL, Jones SJM, Marra MA (2006). Sequence biases in large scale gene expression profiling data. Nucleic Acids Research 34(12):e83.
[Sims et al. 2004] Sims RJ, Mandal SS, Reinberg D (2004). Recent highlights of RNA-polymerase-II-mediated transcription. Current Opinion in Cell Biology 16(3):263-71.
[Slaugenhaupt et al. 2004] Slaugenhaupt SA, Mull J, Leyne M, Cuajungco MP, Gill SP, Hims MM, Quintero F, Axelrod FB, Gusella JF (2004). Rescue of a human mRNA splicing defect by the plant cytokinin kinetin. Human Molecular Genetics 13(4):429-36.
[Smyth 2005] Smyth GK (2005). Limma: linear models for microarray data. In: Bioinformatics and Computational Biology Solutions using R and Bioconductor (Gentleman R, Carey V, Dudoit S, Irizarry R, Huber W, eds). Springer, New York, pp. 397-420.
[Smyth et al. 2003] Smyth GK, Yang YH, Speed T (2003). Statistical issues in cDNA microarray data analysis. Methods in Molecular Biology (Clifton, N.J.) 224:111-36.
[Snijders et al. 2001] Snijders AM, Nowa…
436. …ore, we propose kinetin as a new sequence-specific agent that can affect U1 snRNP-mediated 5' ss recognition. Further experiments considering the 11 other alternatively spliced mRNAs sharing a 5' ss identical to the one bordering IKBKAP exon 20 will also be of interest to understand the mechanism underlying kinetin activity on mRNA splicing. In conclusion, this study provides important clues to the physiopathology of FD. We identified several genes involved in nervous system development and differentiation that could represent the altered molecular signature unique to the abnormal FD neuronal function. Knowledge of the commonly expressed genes from different cell types should facilitate their further characterization and functional studies. Our results also identified kinetin as a compound that affects genes involved in mRNA maturation, and shed new light on its mechanism of action and its potential for therapeutic use.
Acknowledgments
We wish to thank the patients and their families for their contribution to this study. We also thank Jeanne Hsu for critical reading of the manuscript.
References
Anderson SL, Coli R, Daly IW, Kichula EA, Rork MJ, Volpi SA, Ekstein J, Rubin BY (2001). Familial dysautonomia is caused by mutations of the IKAP gene. Am J Hum Genet 68:753-758.
Aubert J, Dunstan H, Chambers I, Smith A (2002). Functional gene screening in embryonic stem cells implicates Wnt antagonism in neural differentiation. Nat Biotechn…
437. fluorescence for each spot is read with an Agilent G2565CA scanner equipped with a laser scanning system that excites each fluorochrome at its specific wavelength. The resulting array image is then processed by quantification software to compute the intensity of each spot, that is, the expression level of each transcript represented on the array. Many pangenomic array formats are available in single-channel and dual-channel versions (hereafter I will use the English terms one-color and two-colors, which are preferred in the scientific community) for the main model organisms such as human, mouse, rat and yeast. Their identifiers consist of the number of samples multiplied by the number of probes in thousands (k), or nowadays in millions (m). These formats differ according to the printing type: the SurePrint HD (8x15k, 4x44k, 2x105k, 1x244k), and also the new generation of arrays containing lincRNAs, the SurePrint G3 (8x60k, 4x180k, 2x400k, 1x1m). Custom DNA arrays can also be obtained to study the transcriptome of atypical species, using the eArray software. There are also other types of arrays, each with a well-defined application, such as CGH arrays (Comparative Genomic Hybrida
438. orton J C Croner L J Davies C Davison T S Delenstarr G Deng X Dorris D Eklund A C Fan X h Fang H Fulmer Smentek S Fuscoe J C Gallagher K Ge W Guo L Guo X Hager J Haje P K Han J Han T Harbottle H C Harris S C Hatchwell E Hauser C A Hester S Hong H Hurban P Jackson S A Ji H Knight C R Kuo W P LeClerc J E Levy S Li Q Z Liu C Liu Y Lombardi M J Ma Y Magnuson S R Maqsodi B McDaniel T Mei N Myklebost O Ning B Novoradovskaya N Orr M S Osborn T W Papallo A Patterson T A Perkins R G Peters E H Peterson R Philips K L Pine P S Pusztai L Qian F Ren H Rosen M Rosenzweig B A Samaha R R Schena M Schroth G P Shchegrova S Smith D D Staedtler F Su Z Sun H Szallasi Z Tezak Z Thierry Mieg D Thompson K L Tikhonova I Turpaz Y Vallanat B Van C Walker S J Wang S J Wang Y Wolfinger R Wong A Wu J Xiao C Xie Q Xu J Yang W Zhang L Zhong S Zong Y & Slikker W (2006). The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nature Biotechnology, 24(9), 1151-61.
[Shin et al. 2009] Shin, H., Liu, T., Manrai, A. K. & Liu, X. S. (2009). CEAS: cis-regulatory element annotation system. Bioinformatics, Oxf
439. protein-protein, across the different cellular compartments, starting from a list of genes.
FIGURE 4.6: Graphical interface of TBrowser with its query panel and its main plugins.
TranscriptomeBrowser 3.0: introducing a new compendium of molecular interactions and a new visualization tool for the study of gene regulatory networks
Cyrille Lepoivre, Aurélie Bergon, Fabrice Lopez, Narayanan B. Perumal, Catherine Nguyen, Jean Imbert and Denis Puthier
Inserm UMR_S 928, TAGC, Parc Scientifique de Luminy, Marseille, France; Université de la Méditerranée, Marseille, France; IBiSA Platform TGML, Parc Scientifique de Luminy, Marseille, France; Eli Lilly and Company, Indianapolis, Indiana, USA; ESIL, Universités de Provence et de la Méditerranée, Marseille, France
Corresponding author
Email addresses: CL: lepoivre@tagc.univ-mrs.fr; AB: bergon@tagc.univ-mrs.fr; FL: lopez@tagc.univ-mrs.fr; NBP: nperumal@iupui.edu; CN: nguyen@tagc.univ-mrs.fr; JI: jean.imbert@inserm.fr; DP: puthier@tagc.univ-mrs.fr
Abstract
Background: Deciphering gene regulatory networks by in silico approaches is a crucial step in the study of the molecular perturbations that occur in diseases. The development of regulatory maps is a tedious process requiring the comprehensive integration of various evidences scattered over biological databases. Thus, the research
440. ots for classes AgilentBatch and AgilentBatchRG is provided in Table 2.
2.4.2 Accessing slots
Different components, or slots, of the microarray may be accessed using the @ operator or, alternatively, using the slot function:
object@slot.name
slot(object, "slot.name")
Slot name | Column name of quantification file | Description
gP | gProcessedSignal | Matrix of the normalized signal obtained by the Feature Extraction software in the green channel
gM | gMeanSignal | Matrix of the mean signal measured in the green channel
gBGM | gBGMeanSignal | Matrix of the mean background signal measured in the green channel
rP | rProcessedSignal | Matrix of the normalized signal obtained by the Feature Extraction software in the red channel (for AgilentBatchRG and AgilentNormRG class objects)
rM | rMeanSignal | Matrix of the mean signal measured in the red channel (for AgilentBatchRG and AgilentNormRG class objects)
rBGM | rBGMeanSignal | Matrix of the mean background signal measured in the red channel (for AgilentBatchRG and AgilentNormRG class objects)
fileNames | - | Vector containing the names of the files used to build the AgilentBatch or AgilentBatchRG object
PosX | Col | Vector of the column location of the spot on the array
PosY | Row | Vector of the row location of the spot on the array
Desc | Description | Vector containing probe annotation
GN | GeneName | Vector containing gene names for corres
441. ou J Davison T S Delorenzi M Deng Y Devanarayan V Dix D J Dopazo J Dorff K C Elloumi F Fan J Fan S Fan X Fang H Gonzaludo N Hess K R Hong H Huan J Irizarry R A Judson R Juraeva D Lababidi S Lambert C G Li L Li Y Li Z Lin S M Liu G Lobenhofer E K Luo J Luo W McCall M N Nikolsky Y Pennello G A Perkins R G Philip R Popovici V Price N D Qian F Scherer A Shi T Shi W Sung J Thierry Mieg D Thierry Mieg J Thodima V Trygg J Vishnuvajjala L Wang S J Wu J Wu Y Xie Q Yousef W A Zhang L Zhang X Zhong S Zhou Y Zhu S Arasappan D Bao W Lucas A B Berthold F Brennan R J Buness A Catalano J G Chang C Chen R Cheng Y Cui J Czika W Demichelis F Deng X Dosymbekov D Eils R Feng Y Fostel J Fulmer Smentek S Fuscoe J C Gatto L Ge W Goldstein D R Guo L Halbert D N Han J Harris S C Hatzis C Herman D Huang J Jensen R V Jiang R Johnson C D Jurman G Kahlert Y Khuder S A Kohl M Li J Li L Li M Li Q Z Li S Li Z Liu J Liu Y Liu Z Meng L Madera M Martinez Murillo F Medina I Meehan J Miclaus K Moffitt R A Montaner D Mukherjee P Mulligan G J Neville P Nikolskaya T Ning B Page G P Parker J Parry R M
442. or genomic. It then generates one wig file per chromosome containing the number of reads every 10 nucleotides, on each strand separately. These wig files are what Picor analyses to find the peaks. The Picor algorithm starts from the premise that, for a sequence-specific transcription factor, the data should show a binding imbalance between strands. If the coverage of the two genome strands is examined with sliding windows of variable size, offset by a distance d corresponding to half the sonication size, a peak should be obtained on each strand, first on the + strand and then on the - strand (Figure 5.15). The output is a bed file giving the location of the peaks that exceed a given FDR, computed for each peak, and containing, in addition to the base-level location of the peaks, the window size and the Spearman correlation distance.
5.5 Collaborative data analyses
The pipeline is used routinely on the TGML platform for the analysis of data from the SOLiD sequencer. I thus analysed data from ChIP-seq experiments carried out entirely at the TGML platform but originating from collaborative projects with other research laboratories: 1. the collaboration with Drs Max Chaffanet
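As an illustration of the per-strand wig files described above, here is a minimal Python sketch (not Picor or the pipeline's actual code) that counts reads in 10-nucleotide bins on each strand and formats one strand as variableStep wig lines. Assigning a read to the bin that contains its 1-based start position, and the names bin_reads and wig_lines, are assumptions made for this example.

```python
from collections import Counter, defaultdict

BIN_SIZE = 10  # nucleotides per bin, as stated in the text


def bin_reads(reads, bin_size=BIN_SIZE):
    """Count reads per bin, separately for each (chromosome, strand) pair.

    `reads` is an iterable of (chrom, start, strand) with 1-based starts.
    """
    counts = defaultdict(Counter)
    for chrom, start, strand in reads:
        # Assumption: a read belongs to the bin containing its start position.
        bin_start = ((start - 1) // bin_size) * bin_size + 1
        counts[(chrom, strand)][bin_start] += 1
    return counts


def wig_lines(bin_counts, chrom, bin_size=BIN_SIZE):
    """Format the bins of one strand of one chromosome as variableStep wig."""
    lines = [f"variableStep chrom={chrom} span={bin_size}"]
    for bin_start in sorted(bin_counts):
        lines.append(f"{bin_start} {bin_counts[bin_start]}")
    return lines


# Toy input: three reads on the + strand, one on the - strand.
reads = [("chr1", 3, "+"), ("chr1", 9, "+"), ("chr1", 12, "+"), ("chr1", 15, "-")]
counts = bin_reads(reads)
plus_wig = wig_lines(counts[("chr1", "+")], "chr1")
minus_wig = wig_lines(counts[("chr1", "-")], "chr1")
```

Keeping the two strands in separate tracks is what allows a peak caller to look for the shifted + and - peaks described in the text.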
443. ould use the getAgilentBatch function, which uses the read.MIAME, read.phenoData, readAgilent and checkDim functions. In the case of the two-colors approach, the user must set the RG argument to TRUE.
> args(getAgilentBatch)
function (n = NULL, RG = FALSE, path, recursive = FALSE, flag = 2)
NULL
Number | Letter | Flag name | Flag description
1 | a | gIsSaturated | Feature is saturated
1 | b | rIsSaturated | Feature is saturated
2 | c | gIsFeatNonUnifOL | Feature is not uniform
2 | d | rIsFeatNonUnifOL | Feature is not uniform
3 | e | gIsPosAndSignif | Feature is not positive and significant
3 | f | rIsPosAndSignif | Feature is not positive and significant
4 | g | gIsFeatPopnOL | Feature is a population outlier
4 | h | rIsFeatPopnOL | Feature is a population outlier
5 | i | IsManualFlag | Feature is manually marked
6 | j | gIsBGNonUnifOL | Background is not uniform
6 | k | rIsBGNonUnifOL | Background is not uniform
7 | l | gIsBGPopnOL | Background reading is a population outlier
7 | m | rIsBGPopnOL | Background reading is a population outlier
Table 1: Table of the different flags.
2.2.1 One-channel hybridizations
As in the case of two-channel hybridization, the getAgilentBatch function will extract information from user-provided files and return an instance of class AgilentBatch.
> myob <- getAgilentBatch(1:4, path = OneColor, flag = 1:7)
There are 4 files in the working directory /home/aurelie/R/i686-pc-linux-gnu-library/2.5/AgiData/OneColor
Reading a
444. owever, we think that scientists should comply better with the MIAME guidelines and that they should systematically provide raw data when submitting a new experiment. Finally, we would like to acknowledge the GEO database team, whose efforts in providing a high-quality repository service made this work possible.
Materials and Methods
Microarray data retrieval
Human, mouse and rat microarray data derived from 30 Affymetrix microarray platforms (Supplementary Table S1) were downloaded from the GEO ftp site and retrieved in seriesMatrix file format (ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/). SeriesMatrix are summary text files related to a GEO series (Experiment, GSE) that include sample and experiment metadata together with a tab-delimited matrix that corresponds to normalized expression data. Each file (n = 2,869) was parsed using a Perl script to extract the gene expression matrix and metadata. Probes with missing expression values were excluded from analysis. Only expression matrices with at least ten columns (samples) were kept for subsequent analysis (n = 1,484; Supplementary Table S2).
DBF-MCL algorithm
The filtering step of DBF-MCL was implemented in C. The latest Markov Clustering algorithm version (1.006, 06-058) was obtained from http://micans.org/mcl/src/. [December 2008, Volume 3, Issue 12, e4001] The full pipeline of DBF-MCL, which integrates normalization, filtering and partitioning, was implemented in Bash Shell Scripting
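The series-matrix filtering rules given under "Microarray data retrieval" (drop probes with missing values, keep matrices with at least ten samples) can be sketched as follows. The original work used a Perl script that is not shown here; this Python version is only an illustrative stand-in, and the function name parse_series_matrix and the toy input are assumptions. It relies on the `!series_matrix_table_begin` and `!series_matrix_table_end` markers that delimit the data table in GEO series-matrix files.

```python
import math

MIN_SAMPLES = 10  # "at least ten columns (samples)" in the text


def parse_series_matrix(text):
    """Return (sample_ids, {probe_id: [values]}) from series-matrix text.

    Probes with any missing or non-numeric value are excluded.
    """
    in_table, header, rows = False, None, {}
    for line in text.splitlines():
        if line.startswith("!series_matrix_table_begin"):
            in_table = True
        elif line.startswith("!series_matrix_table_end"):
            break
        elif in_table:
            fields = [f.strip('"') for f in line.split("\t")]
            if header is None:
                header = fields[1:]  # first table row: ID_REF + sample ids
                continue
            try:
                values = [float(v) for v in fields[1:]]
            except ValueError:
                continue  # empty or non-numeric cell: probe excluded
            if len(values) != len(header) or any(math.isnan(v) for v in values):
                continue
            rows[fields[0]] = values
    return header or [], rows


# Toy series matrix: ten samples, one complete probe, one with a missing value.
samples = [f"GSM{i}" for i in range(1, 11)]
example = "\n".join([
    '!Series_title\t"toy series"',
    "!series_matrix_table_begin",
    "\t".join(['"ID_REF"'] + [f'"{s}"' for s in samples]),
    "\t".join(['"probe_a"'] + ["1.5"] * 10),
    "\t".join(['"probe_b"'] + ["2.0"] * 9 + [""]),
    "!series_matrix_table_end",
])
sample_ids, matrix = parse_series_matrix(example)
keep_matrix = len(sample_ids) >= MIN_SAMPLES
```

Here `keep_matrix` plays the role of the ten-sample threshold, and the resulting `matrix` would be the input handed to the filtering and partitioning steps.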
445. & Li X (2010). Systematic identification of conserved motif modules in the human genome. BMC Genomics, 11, 567.
[Callow et al. 2000] Callow, M. J., Dudoit, S., Gong, E. L., Speed, T. P. & Rubin, E. M. (2000). Microarray expression profiling identifies genes with altered expression in HDL-deficient mice. Genome Research, 10(12), 2022-9.
[Carninci et al. 2005] Carninci P Kasukawa T Katayama S Gough J Frith M C Maeda N Oyama R Ravasi T Lenhard B Wells C Kodzius R Shimokawa K Bajic V B Brenner S E Batalov S Forrest A R R Zavolan M Davis M J Wilming L G Aidinis V Allen J E Ambesi Impiombato A Apweiler R Aturaliya R N Bailey T L Bansal M Baxter L Beisel K W Bersano T Bono H Chalk A M Chiu K P Choudhary V Christoffels A Clutterbuck D R Crowe M L Dalla E Dalrymple B P de Bono B Della Gatta G di Bernardo D Down T Engstrom P Fagiolini M Faulkner G Fletcher C F Fukushima T Furuno M Futaki S Gariboldi M Georgii Hemming P Gingeras T R Gojobori T Green R E Gustincich S Harbers M Hayashi Y Hensch T K Hirokawa N Hill D Huminiecki L Iacono M Ikeo K Iwama A Ishikawa T Jakt M Kanapin A Katoh M Kawasawa Y Kelso J Kitamura H Kitano H Kollias G Krishnan S P T Kruger A Kummerfeld S K Kurochkin I V Lareau L F La
446. panel). LUC7L expression increased in both control and FD samples (Fig. 5A, middle panel). We also observed that increasing kinetin concentration leads to a dose-dependent inhibition of ZNF280D mRNA expression, supporting our hypothesis of sequence-specific targeting by kinetin (Fig. 5A, lower panel). To validate the action of kinetin on the expression of LUC7L and ZNF280D, we exposed
[Figure 5 panels. A: kinetin dose effect (K0, K25, K50, K100, K200). B: consecutive addition/removal of kinetin (K0, K80, W, K80, W). Rows: IKBKAP (WT and MU transcripts), LUC7L, ZNF280D, in CTRL and FD cells. Y axis: relative expression (AU).]
Figure 5. Changes in gene expression after different exposures of hOE-MSCs to kinetin. (A) Control and FD adherent hOE-MSCs were incubated for 48 hr with different concentrations of kinetin (25, 50, 100 and 200 µM) for the dose-effect experiment. (B) Cells were exposed to 80 µM kinetin for 24 hr (K80), followed by the removal of the drug for another 24 hr (i.e., W for washout). Two rounds of drug addition/removal were performed and RNA was extracted each time after 24 hr for each condition. Total RNAs were reverse transcribed and levels of expression of IKBKAP alternative tr
447. parallel. Three main technologies can be distinguished, offered by different vendors, each with its own characteristics and relying on specific techniques: Roche Diagnostics/454 Life Sciences, Illumina/Solexa and Life Technologies/Applied Biosystems (ABI) (Table 1.1) (Metzker 2010; Suzuki et al. 2011). At present, more than 1,800 next-generation sequencers have been sold worldwide (Figure 1.6 A). Of these, 93.3% belong to one of these four technologies, more than half of them being an Illumina model (Figure 1.6 B). Internationally renowned sequencing centres have equipped themselves with a large number of very-high-throughput sequencers (Figure 1.6 C) (Hum 2010). The range of sequencers under development is expanding very quickly, so only the most commonly used techniques and models will be described. During my thesis, the TGML platform chose to acquire a very-high-throughput sequencer, and I took part in the discussions with the various vendors. This is why I present the three main sequencing chemistries below. The platform was finally equipped, in April 2009, with a SOLiD very-high-throughput sequencer. I took part in many collaborations on the analysis of data from Chromatin ImmunoPrecipitat
448. pends on the type of data under study. Indeed, some normalization methods are dedicated to a given technology. For example, there are many normalizations dedicated to Affymetrix arrays, such as RMA, MAS 5.0, GCRMA and dChip, but these cannot be used for Agilent array data. In addition, the methods generally differ between one-color and two-colors data. In this part, only the main normalization methods usable for Agilent DNA microarrays are described. One must determine which method best corrects the biases without altering the signal under study. Generally, these methods are tried from the simplest to the most sophisticated, moving on if a particular quality criterion appears to improve. The choice of normalization method is guided by graphical representations such as the scatter plot, the MA diagram (MA plot or Bland-Altman plot), the histogram or density profile of intensities, or the box plot, which make it possible to visualize the distribution of the data (Smyth et al. 2003). The scatter plot allows two samples to be compared with each other. The intensity of each probe is plotted with the first sample on the x axis and the second on the y axis, the genes lying away from the diagonal being differentially exp
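For the MA plot mentioned above, the usual construction plots M (the log2 ratio of two intensities) against A (their mean log2 intensity), so that the diagonal of the scatter plot becomes the horizontal line M = 0. A minimal Python sketch, assuming strictly positive intensities; the function name ma_values and the choice of "sample 1 minus sample 2" for the sign of M are illustrative conventions, not taken from the thesis:

```python
import math


def ma_values(x, y):
    """Return (M, A) lists for paired, strictly positive intensities.

    M = log2(x) - log2(y)        (log ratio)
    A = (log2(x) + log2(y)) / 2  (mean log intensity)
    """
    m_values, a_values = [], []
    for xi, yi in zip(x, y):
        log_x, log_y = math.log2(xi), math.log2(yi)
        m_values.append(log_x - log_y)
        a_values.append((log_x + log_y) / 2)
    return m_values, a_values


# Toy intensities for three probes in two samples (or two channels).
sample1 = [8.0, 16.0, 4.0]
sample2 = [8.0, 4.0, 16.0]
M, A = ma_values(sample1, sample2)
```

Probes with M close to 0 are unchanged between the two samples; a trend of M with A is the kind of intensity-dependent bias that a normalization method is expected to remove.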
449. velop an analysis pipeline for Chromatin ImmunoPrecipitation experiments (ChIP-seq, see Chapter 5). This bioinformatics development then enabled me to collaborate on the analysis of experiments targeting transcription factors involved in breast cancer or glioblastomas, respectively with the team of Dr Daniel Birnbaum at the Centre de Recherche en Cancérologie de Marseille (CRCM) and with Dr Thierry Virolle of Inserm unit U898 "stem cells, development and cancer" in Nice. Finally, the pipeline and the scripts developed were also used to analyse nucleosome positioning data obtained with an approach developed by Dr Salvatore Spicuglia, from the team of Dr Pierre Ferrier at the CIML, in collaboration with our laboratory. This technique, named Mnase-Cap, is the subject of an article in preparation.
CHAPTER General introduction
Contents
1.1 Study of pathologies
1.2 The transcriptome
1.2.1 Principle of DNA microarrays
1.2.2 The special case of Agilent DNA microarrays
1.3 Regulation of gene expression
1.3.1 Basal transcription
1.3.2 Regulatory sequences and sequence-specific transcription factors
450. periments of consecutive addition and washout of kinetin. For transcriptome analysis, four of the five control and FD hOE-MSCs have been used.
Generation of Spheres and Induction of Cell Differentiation
Multipotent spheres were obtained after 1 week of culture with EGF and bFGF, as previously described (Boone et al., 2010). For cell differentiation, hOE-MSCs were treated with the rafnshh cocktail, consisting of 1% insulin-transferrin-selenium (ITS), 1 µM all-trans retinoic acid (Sigma-Aldrich), 5 µM Forskolin (R&D Systems, Minneapolis, MN), 15 nM Sonic hedgehog (R&D Systems), 1% B27 supplement (a serum substitute) and 0.5% N2 supplement (enhancing the growth and survival of neuronal cells), for 7 days without changing the medium. [HUMAN MUTATION, Vol. 00, No. 0, 1-11, 2012]
RNA Isolation
Total RNA was isolated using the RNeasy Mini Kit (Qiagen, Hilden, Germany) with DNAse treatment on the column, following the manufacturer's guidelines. RNA concentration was determined using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE). RNA integrity was assessed on an Agilent 2100 Bioanalyzer (Palo Alto, CA). All samples exhibited RIN > 9.
End-Point Reverse Transcription-Polymerase Chain Reaction Analysis
Total RNA was subjected to reverse transcription (RT) using the High Capacity cDNA Archive Kit (Applied Biosystems, Foster City, CA). End-point polymerase chain reaction (PCR) analysis was performed
451. pez Santiago LF Pertin M Morisod X Chen C Hong S Wiley J Decosterd I Isom LL 2006 Sodium channel beta2 subunits regulate tetrodotoxin sensitive sodium channels in small dorsal root ganglion neurons and modulate the response to pain J Neurosci 26 7984 7994 Marazziti D Mandillo S Di Pietro C Golini E Matteoni R Tocchini Valentini GP 2007 GPR37 associates with the dopamine transporter to modulate dopamine uptake and behavioral responses to dopaminergic drugs Proc Natl Acad Sci USA 104 9846 9851 Matigian N Abrahamsen G Sutharsan R Cook AL Vitale AM Nouwens A Bellette B An J Anderson M Beckhouse AG Bennebroek M Cecil R Chalk AM Cochrane J Fan Y Feron F McCurdy R McGrath JJ Murrell W Perry C Raju J Ravishankar S Silburn PA Sutherland GT Mahler S Mellick GD Wood SA Sue CM Wells CA Mackay Sim A 2010 Disease specific neurosphere derived cells as models for brain disorders Dis Model Mech 3 785 798 Maturana AD Fujita T Kuroda S 2010 Functions of fasciculation and elongation protein zeta 1 FEZ1 in the brain Sci World J 10 1646 1654 Murrell W Feron F Wetzig A Cameron N Splatt K Bellette B Bianco J Perry C Lee G Mackay Sim A 2005 Multipotent stem cells from adult olfactory mucosa Dev Dynam 233 496 515 Naumanen T Johansen LD Coffey ET Kallunki T 2008 Loss of function of IKAP ELP1 could neuronal migration defect underlie familial dysaut
452. physical regulatory interactions are represented as edges connecting the corresponding entities. The topology of the subsequent network can later be analyzed using advanced tools such as Cytoscape [5]. However, as data integration is a challenge that requires mapping various types of evidence onto a set of stable gene ids, most applications are oriented toward a single data type, mostly regulatory or physical interactions (see table 1 for an overview) [6-10]. Moreover, another challenge is the development of graph-based tools producing clear, meaningful and integrated visualizations from which users can draw new hypotheses without being overwhelmed by the density of the presented graphic information. In this regard, the Cytoscape plug-in Cerebral proposes an intuitive visualization method through a cell-compartment-based layout that shows interacting proteins on a layout resembling traditional signalling pathway/system diagrams [11]. Here we sought to create a compendium of predicted and validated molecular interactions in human and mouse. First, we used a large collection of PWMs obtained from TRANSFAC (n = 523), JASPAR (n = 303) and UNIPROBE (n = 387) to search in gene promoter regions for candidate transcription factor binding sites (TFBSs) conserved over human, mouse, rat and dog genomes [12-14]. Overall, our analysis of these PWMs, corresponding to 347 human and 475 mouse transcription factors (TFs), provides a systematic overv
453. ponding probes
PN | ProbeName | Vector of the probe names
SN | SystematicName | Vector of the systematic name of the gene corresponding to the probe
Flag | gIsFeatureNonUnifOL | Matrix indicating whether a spot is of good quality (feature is not uniform)
CtrT | ControlType | Vector of the control type: -1 negative control, 0 sample, or 1 positive control
PhenoD | - | The phenodata.txt file is in the ExpData directory of the working directory. This slot is a phenoData class object
Miame | - | The miame.txt file contains the Minimum Information About a Microarray Experiment and is in the ExpData directory of the working directory. This slot is a MIAME class object
Row | - | Vector of the number of array rows
Col | - | Vector of the number of array columns
Table 2: Table of the different slots contained in an AgilentBatch or AgilentBatchRG class object.
If implemented, the user may also use the corresponding method:
slot.name(object)
For a slot containing a matrix, the following commands are valid:
object@slot.name[i, j]
slot(object, "slot.name")[i, j]
slot.name(object)[i, j]
object[i, j]
where i corresponds to one or several spots and j corresponds to one or several arrays.
Examples for a matrix:
> myob@gP[1:20, 1:2]
> slot(myob, "gP")[1:20, 1:2]
> gP(myob)[1:20, 1:2]
Examples for a vector:
> myob@GN[1:20]
> slot(myob, "GN")[1:20]
> GN(myob)[1:20]
For data from the first array:
> myob[, 1]
2.4.3 Exclude data
454. pro-inflammatory mediators, highly damaging to host tissues and vascular endothelia [50,106] and poorly regarded in dengue [107], definitively play a role in DSS pathophysiology. Alteration of a gene pattern related to homeostasis of cholesterol in monocytes/macrophages (Mo/Mac) in the blood cells of DSS children (Table 5) was an unexpected finding, while it should be considered regarding recent knowledge on the role of monocytes as a pivotal link between inflammation, innate immunity and host lipid metabolism [108,109]. Indeed, under physiological conditions, monocytes maintain cholesterol homeostasis by clearing modified LDL, such as oxidized LDL (ox-LDL), from plasma. Under pathological conditions, the balance between uptake and efflux of those modified cholesterol molecules may be altered [59,60,62,66,67,86], resulting in the intracellular accumulation of modified cholesterol. This turns classical monocytes towards a pro-inflammatory phenotype: lipid-laden monocytes/macrophages (Mo/Mac), a sub-type of pro-inflammatory immune cells initially identified in vascular lesions of chronic inflammatory metabolic diseases [110]. Recent knowledge has shown that those atypical monocytes produce a large array of pro-inflammatory mediators, such as ROS, metalloproteases, eicosanoids and pro-inflammatory adipokines, making these cells potent contributors to vascular damage, systemic inflammation and major metabolic changes increase July 2010 Volume 5
455. psis or post-trauma sterile SIRS [23-25]. We compared the transcriptome of blood cells from DSS paediatric patients at time of shock to those of children classified as DF or DHF (grades I-II) [29], matched for important variables such as age, gender, immune status towards dengue infection (primary or secondary infection) and time of disease evolution after onset of fever. Our study has produced significant results, further discussed in the context of DSS pathophysiology. First, we identify a transcriptional signature of the DSS, differentiating DSS from the other forms of dengue infection and characterizing DSS as a unique and specific entity. Giving particular attention to study design and statistical analysis, we identify a large and robust gene expression profile of 2959 genes that discriminates DSS paediatric patients from other dengue patients (DF or DHF) who did not progress to shock, whatever the supportive treatment they received. Importantly, DSS children clustered together whether they were considered as having primary or secondary dengue infection, while secondary infections represented the majority of DF, DHF and DSS children recruited (see table S1), as expected in hyper-endemic areas. The robustness of the DSS-associated gene signature was established by showing that the disease phenotype variable significantly affected expression levels of all the genes identified (multi-way ANOVA and [July 2010, Volume 5, Issue 7, e11671] Molec
456. count the biases due to the experiment (different hybridization, labelling and extraction times). In this way, 2,959 genes were identified as differentially expressed between DSS patients and DF or DHF patients, with a False Discovery Rate (FDR) of 10%. The relevance of most of these genes was then confirmed by another approach (SAM) using the TmeV software. Subsequently, the ontological analysis tools David knowledgebase and Ingenuity Pathways Analysis (IPA) allowed us to conclude that our list of candidate genes under-expressed in DSS patients is enriched in markers of T lymphocytes and Natural Killer cells, this enriched group being more specifically associated with the TCR signaling pathways and the IFN-I related pathways. In addition, in DSS patients we observed an increase in the expression level of several markers involved in (1) the anti-inflammatory response, (2) tissue repair, (3) the response of the pro-inflammatory complex and (4) lipid metabolism. The team of Dr Patricia Paris was thus able to suggest that the mechanisms identified are strongly involved in the massive vascular leakage associated with the DSS syndrome. The data are available in the GEO database under the identifier GSE17924.
3.4.2 Familial dysautonomia
Deregulation of mRNA splicing is a crucial process in the deve
457. pts of IKBKAP gene for control (left panel) and FD hOE-MSCs (right panel) at cell passages 1, 2, 5, 9. (B) Graph of the mean level of expression of IKBKAP alternative transcripts in control (left panel) and FD hOE-MSCs (right panel) at cell passages 1, 2, 5, 9, determined by absolute RT-qPCR. ABL1 was used as a reference gene for normalization. Error bars denote standard error. (C) Western blot analysis of total lysate from 4 controls and 4 FD hOE-MSCs using monoclonal anti-IKAP/hELP1 antibody (upper panel). Anti-β-actin was included to show equal loading (lower panel). (D) The NMD pathway was blocked by the translation inhibitor cycloheximide, which results in an elevated expression of MU transcripts in FD cells: agarose gel electrophoresis (left panel). Results are confirmed with absolute qPCR normalized with ABL1 (right panel). doi:10.1371/journal.pone.0015590.g002
large amount of IKBKAP MU transcripts is degraded through the NMD pathway, resulting in much less IKBKAP transcripts and IKAP/hELP1 protein in FD compared to control cells.
Heterogeneous IKAP/hELP1 distribution in hOE-MSCs
Since the localization of IKAP/hELP1 remains controversial and is important to understand protein functions, we stained both control and FD hOE-MSCs with the monoclonal antibody directed against IKAP/hELP1 and previously used for detecting the protein by western blot analysis. In control cells, confocal imaging revealed a weak and diffuse signal with a dominant cytoplasmic staining withi
458. r (DF), dengue hemorrhagic fever (DHF) and dengue shock syndrome (DSS) patients. Each row represents a single transcript and each column represents a patient's sample. Color scale indicates the range of gene expression: black indicates median expression level, red greater expression, green lower expression. The 2 patient subsets identified are indicated. PLxxx: code relative to one patient. Black star: DSS patient sampled 3 days after shock. Orange star: patients who received perfusion of human plasma before collection of blood samples. doi:10.1371/journal.pone.0011671.g001
To validate microarray data, we carried out real-time RT-PCR focusing on nine genes strongly associated with the DSS gene signature, using 15 patients' samples, five from each disease phenotype subtype (DF, DHF and DSS). Results obtained strongly correlate with microarray data (Figure S1).
DSS gene signature analysis identifies a diversity of genes and canonical molecular pathways related to immunity, inflammation and host metabolism
Filtering genes from those having the highest to the lowest statistical association with the disease phenotype variable (Table S2), relying on results from multi-way and local ANOVA, revealed that the individual genes having the strongest association with the DSS phenotype subtype are for a large part related to innate immunity, inflammation and host lipid metabolism, a finding confirmed when the whole 2959 genes of the DSS gene signature were process
459. carried out the data processing and analysis part, as well as the submission of the data to GEO. Since there is no specific treatment for FD that reduces the symptoms or even counteracts the progression of the disease, research has been conducted to test a promising molecule, kinetin. This molecule was chosen because it corrects the aberrant alternative splicing of IKBKAP, although its mechanism of action is completely unknown (Boone et al. 2010; Hims et al. 2007; Keren et al. 2010; Lee & Mahadevan 2009; Slaugenhaupt et al. 2004). We therefore searched for its transcriptional signature in order to better understand its mode of action. The FD transcriptional signature obtained was then compared with the raw data of 5 other published studies available on GEO or ArrayExpress. Using SAM (FDR 0), 3000 genes were found to be differentially expressed between undifferentiated stem cells (spheres) and differentiated neuro-glial cells (neurons and astrocytes). We were thus able to clearly validate the transcriptional imprint induced by the 3 factors of the neuro-glial differentiation cocktail used in the previous microarray campaign, by recovering genes known for their response to (1) retinoic acid, (2) forskolin and (3) the morphogen Sonic he
460. FIGURE 5.6: Overview of the graphical interface of the ICS software controlling the run.
5.2.2 User interfaces for launching and managing sequencing
Before launching a sequencing session, the sequencer and the analyses must be configured using ICS and SETS. ICS is the instrument control software of the SOLiD, while SETS is the software for managing sequencing runs on the online cluster (Figure 5.6). It is a web application that allows the data to be visualized in real time and the analysis reports to be read once the run is finished.
5.2.3 Bioscope data processing pipeline
Bioscope is a software suite developed by Applied Biosystems and shipped as standard with SOLiD sequencers. It is used to perform the secondary analysis and some tertiary analyses, such as the search for SNPs, small and large indels, inversions and CNVs (Copy Number Variations), or the computation of transcript abundance after sequencing of a whole exome. It is run from the command line via configuration files (ini) containing all the
461. rahim EC Leyne M Mull J Liu L et al 2007 Therapeutic potential and mechanism of kinetin as a treatment for the human splicing disease familial dysautonomia J Mol Med 85 149 61 Gold von Simson G Goldberg JD Rolnitzky LM Mull J Leyne M et al 2009 Kinetin in familial dysautonomia carriers implications for a new therapeutic strategy targeting mRNA splicing Pediatr Res 65 341 6 Tanabe S Sato Y Suzuki T Suzuki K Nagao T et al 2008 Gene expression profiling of human mesenchymal stem cells for identification of novel markers in early and late stage cell culture J Biochem 144 399 408 Wagner W Horn P Castoldi M Diehlmann A Bork S et al 2008 Replicative senescence of mesenchymal stem cells a continuous and organized process PLoS One 3 e2213 Cheishvili D Maayan C Smith Y Ast G Razin A 2007 IKAP hELP1 deficiency in the cerebrum of familial dysautonomia patients results in down regulation of genes involved in oligodendrocyte differentiation and in myelination Hum Mol Genet 16 2097 104 Zhang X Klueber KM Guo Z Cai J Lu C et al 2006 Induction of neuronal differentiation of adult human olfactory neuroepithelial derived progenitors Brain Res 1073 1074 109 19 Wolozin B Sunderland T Zheng BB Resau J Dufy B et al 1992 Continuous culture of neuronal cells from adult human olfactory epithelium J Mol Neurosci 3 137 46 Roisen FJ Klueber KM Lu CL Hatcher LM Dozier A et al 2001 Adult
462. because of notable advantages (Massie & Mills 2008): the whole genome can be analysed, since the method does not depend on the probes present on the DNA microarray (available microarrays carry a limited number of sites that represent only a fraction of the total genome); better sensitivity (all the sequences present are sequenced) and reproducibility; multiplexing is possible (use of barcodes); sequencing in paired-end mode is possible, which improves the quality of fragment alignment; technical biases linked to DNA microarrays are removed, such as signal saturation, spot detection problems and partial washing of the array; a smaller initial quantity of DNA is needed, which is convenient for precious samples, with a minimum of 5 µg for ChIP-on-chip against 5 ng for ChIP-seq; less background noise, with a more precise dynamic range and signal-to-noise ratio (no spot background and no cross-hybridization between probes) (Johnson et al. 2007; Mardis 2007; Massie & Mills 2008); better spatial resolution of peaks or profiles: a transcription factor binding site can be identified precisely (10-30 bp centred on the peak) (Kharchenko et al. 2008). However, ChIP-seq also has a few drawbacks: son
463. ral repair and remodeling genes encoding extracellular matrix proteins, vasoactive mediators and matrix metalloproteases such as the MMP likely reflects a compensatory response to inflammatory insults, and a number of those genes' products are now considered putative biomarkers of systemic inflammatory syndromes such as severe sepsis [100]. Most proteins encoded by those genes are indeed secreted by activated immune cells such as monocytes/macrophages. They may have adverse effects towards the vascular endothelium when produced in excess, since they may promote immobilization of inflammatory mediators at the surface of endothelial cells [101], increase permeability of capillaries [102] or induce direct damage to endothelial tissues [103]. Recently, one of them, MMP9, has been proposed as a putative candidate in the occurrence of plasma leakage during dengue infection [47]. While previous transcriptional studies failed to identify pro-inflammatory gene patterns in the blood cells of DSS patients [27, 28, 42], our study is the first one to report that a diversity of pro-inflammatory transcriptional responses at the interface of innate immunity, inflammation and host lipid metabolism are activated at the time of cardiovascular failure. Since those mechanisms are considered pathogenic in other systemic inflammatory diseases where systemic vascular dysfunction does occur, we suggest that
464. Thank you to my friends at the TGML platform for your friendship and for all the relaxing moments and unforgettable fits of laughter we shared. To Fabrice and FX, my fellow bioinformaticians: we certainly spend a lot of time on sequencing data, and this is only the beginning; see you very soon to continue the development. And in particular to Fabrice for his help with programming during my thesis; we will soon be able to resume the pizza evenings. To Hélène, for all her help and her patience in explaining the experiments to me and above all in helping me get through them, ChIP-seq in particular, and for all the cell culture she carried out for me. To Valérie, for her kindness and the good chocolate cakes, a natural antidepressant that was very useful to me during this thesis. To Sophie and Véro, our crack management team: thank you, girls, for the coffee breaks, which were a great comfort to me. Thank you in particular to you, Sophie, for all the proofreading you did even though bioinformatics is not your cup of thesis, er, I mean of tea. To Clairette, thank you for encouraging me by showing me that every difficulty can be overcome. Thank you for encouraging me to take salsa lessons with you, without which I would never have met the love of my life, Christophe. I take this opportunity to thank all my salsa friends from the
465. re SCORE represents the PWM score for a PWM of length W in the DNA sequence of a species c between positions p and p + W - 1, and S_w represents the nucleotide observed at position p + w. The probability of observing each nucleotide under the background distribution was assumed to be 0.25. For each PWM m, a score threshold t_m with p-value below 5 10 was computed using matrix-distrib from RSAT, ensuring high stringency of sequence scoring [26]. A sequence in the reference genome was considered a putative TFBS if its score for PWM m at position p in the alignment was found above t_m in human, mouse, rat and dog. Each PWM was then linked to its corresponding transcription factors and putative targets. Information was stored in a MySQL relational database. We also integrated information obtained from several popular databases. Protein-DNA interactions (n = 174,168) derived from various genome-wide analyses (e.g. ChIP-on-chip, ChIP-seq and ChIP-PET) and encompassing interactions corresponding to 38 human TFs and 55 mouse TFs were obtained from the ChIP-X database. TFBS predictions were obtained from the present work (see below) and from the TFBSConserved UCSC track (367,829 and 686,936 respectively). A set of regulatory interactions curated from the literature was obtained from LymphTF-DB (392 directed interactions) and OregAnno (1,991 interactions). Protein-protein interaction datasets were obtained from HPRD and Intact (39,224 and 50,286 respectively).
466. reement with EST sequences found in alternative splicing databases such as ASD [55]. We revealed that the alternative use of a 3' ss downstream of the ATG start codon leads to a shorter exon 2, which can potentially induce the use of an alternative ATG start codon in exon 4, resulting in the synthesis of an N-terminal truncated IKAP/hELP1 protein. In addition, we detected the presence of intronic sequences at the end of the IKBKAP gene, leading to a supplementary exon in the mRNA, named exon 36. This exon inclusion also induced a frameshift and resulted in a premature stop codon whose relative location likely led to NMD of this new isoform, as observed by stabilization of the transcript after cycloheximide treatment (Figure S2). IKAP/hELP1 plays the role of a scaffold protein in Elongator complex assembly, and the C-terminal half of IKAP/hELP1 is responsible for this function [34]. It has also been shown that IKAP/hELP1 contains five WD-like repeat domains in the N-terminal part that may play a role in protein-protein interactions [56]. When comparing the different protein isoforms resulting from the 3 alternative splicing events we described (Figure 6D), only the isoform resulting from exon 20 skipping seems to lack a functional domain and may play a pathological role during FD progression. However, the protein domains of IKAP/hELP1 important for Elongator integrity have not been precisely mapped
467. ressions (GEM-TREND). These tools sometimes also accept as input two gene lists corresponding to over-expressed and under-expressed genes. The aim of these tools is to inform us about gene co-expression; they do not study the regulation around these genes by integrating other data sources, as TBrowser does. Tool: website. TranscriptomeBrowser (Lopez et al. 2008): http://tagc.univ-mrs.fr/tbrowser. GENE CHAnge browSER, GeneChaser (Chen et al. 2008): http://genechaser.stanford.edu. MARQ (Vazquez et al. 2010): http://marq.dacya.ucm.es. Gene Expression data Mining Toward RElevant Network Discovery, GEM-TREND (Feng et al. 2009): http://cgs.pharm.kyoto-u.ac.jp/services/network/index.php. COXPRESdb (Obayashi & Kinoshita 2011): http://coxpresdb.jp. GOEGLE (Yu et al. 2009): http://omics.biosino.org:14000/kweb/workflow.jsp?id=00020. Genevestigator (Hruz et al. 2008): https://www.genevestigator.com/gv/biomed.jsp. TABLE 4.2 Other approaches to the meta-analysis of DNA microarray data from GEO; in bold, the tool I developed. Greyed cells correspond to tools that are not free. 4.8 Conclusions and perspectives. Future developments and improvement of the TranscriptomeBrowser project. First, now that the proof of concept of our DBF-MCL algorithm has been published and that we have d
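The scoring rule described in excerpt 465 can be illustrated with a minimal Python sketch. The exact SCORE formula is cut off in the excerpt, so the sketch assumes the usual log-odds sum over the W positions against the uniform 0.25 background; the function names, the toy matrix, the toy alignment and the threshold are all hypothetical and are not taken from the original work or from RSAT. It also assumes an ungapped alignment, so that position p refers to the same column in every species.

```python
import math

BACKGROUND = 0.25  # uniform background probability stated in the text


def pwm_score(pwm, sequence, p):
    """Log-odds score of the window sequence[p : p + W] (assumed formula)."""
    width = len(pwm)
    window = sequence[p:p + width]
    if len(window) < width:
        raise ValueError("window runs past the end of the sequence")
    return sum(math.log2(column[base] / BACKGROUND)
               for column, base in zip(pwm, window))


def conserved_site(pwm, aligned, p, threshold):
    """True when the score at position p is above the threshold in every species."""
    return all(pwm_score(pwm, seq, p) > threshold for seq in aligned.values())


# Hypothetical 3-column matrix: one dict of nucleotide probabilities per position.
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]

# Hypothetical ungapped alignment; the dog sequence carries a mismatch in the site.
toy_alignment = {
    "human": "TTAGCTT",
    "mouse": "TTAGCTA",
    "rat": "TTAGCAA",
    "dog": "TTATCTT",
}

if __name__ == "__main__":
    for species, seq in toy_alignment.items():
        print(species, round(pwm_score(toy_pwm, seq, 2), 3))
    print(conserved_site(toy_pwm, toy_alignment, 2, threshold=4.0))
```

Requiring the score to exceed the threshold in all four species is what makes the filter stringent: in the toy data a single mismatch in one species is enough to reject the site.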
468. rful approach to multiple testing J Royal Stat Soc Ser B 57 289 300 Eisen MB Spellman PT Brown PO Botstein D 1998 Cluster analysis and display of genome wide expression patterns Proc Natl Acad Sci U S A 95 14863 14868 Saldanha AJ 2004 Java Treeview extensible visualization of microarray data Bioinformatics 20 3246 3248 Brown MP Grundy WN Lin D Cristianini N Sugnet CW et al 2000 Knowledge based analysis of microarray gene expression data by using support vector machines Proc Natl Acad Sci U S A 97 262 267 Kerr MK Martin M Churchill GA 2000 Analysis of variance for gene expression microarray data J Comput Biol 7 819 837 Pavlidis P 2003 Using ANOVA for gene selection from microarray studies of the nervous system Methods 31 282 289 Ubol S Masrinoul P Chaijaruwanich J Kalayanarooj S Charoensirisuthikul T et al 2008 Differences in global gene expression in peripheral blood mononuclear cells indicate a significant role of the innate responses in progression of dengue fever but not dengue hemorrhagic fever J Infect Dis 197 1459 1467 Stobo JD Kennedy MS Goldyne ME 1979 Prostaglandi
469. expressed. The MA plot displays the log ratios as a function of the mean intensity scale. The x-axis gives the mean intensity of the 2 samples (A) and the y-axis the base-2 logarithm of the ratio of the intensities of these samples (M), for each probe, with, in the case of two-color DNA microarrays: $M = \log_2(R/G)$ (the log ratio) and $A = \tfrac{1}{2}\log_2(R \cdot G)$, where R is the red fluorescence intensity (Cy5) and G the green fluorescence intensity (Cy3). FIGURE 2.1 The different types of representation: A, scatter plot (log intensities of sample 2 against log intensities of sample 1); B, MA plot (M, log ratio, against A, mean intensity); C, histogram of the log2 intensities; D, box-and-whisker plot of the log2 intensities. Genes over-expressed in sample 1 relative to sample 2 are shown in red and under-expressed genes in green. The data used here are those of the R/Bioconductor library ALL (Acute Lymphoblastic Leukemia; Chiaretti et al. 2004). 2.2.3.1 Global normalization. The simplest normalization method
470. main database technology in common use today is MySQL. 4.1.1 Quality and traceability. Data quality depends heavily on the ability of users to see problems and to learn from their mistakes through constant improvement of tools and techniques. In this ongoing effort of bioinformatics development, data traceability relies on tracking the changes made to programs and/or scripts with version control software such as SVN (Subversion). Such software makes it possible to share a development effort by storing the source code of a program and a file tree while preserving the chronology of all the changes that have been made. The system works by merging local and remote copies, not by overwriting the remote version with the local one. Setting up a laboratory information management system (LIMS, Laboratory Information Management System), an integrated management software, enables actions such as sample traceability; the management of users, instruments, stocks and supplies by supplier; the tracking of the products and equipment used; the recording of incidents; the definition of the analyses performed with their parameters; and sometimes even
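The M and A definitions given in excerpt 469 for two-color arrays can be checked on a few numbers with the short Python sketch below. The function name and the toy intensities are hypothetical; only the two formulas, M = log2(R/G) and A = log2(R·G)/2, come from the text.

```python
import math


def ma_values(red, green):
    """Return (M, A) for one probe from its red (Cy5) and green (Cy3) intensities."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be strictly positive")
    m = math.log2(red / green)        # log ratio, plotted on the y-axis
    a = 0.5 * math.log2(red * green)  # mean log intensity, plotted on the x-axis
    return m, a


# Hypothetical probe intensities (red, green), chosen as powers of two.
toy_probes = {
    "probe_up": (1024.0, 256.0),
    "probe_flat": (512.0, 512.0),
    "probe_down": (128.0, 512.0),
}

if __name__ == "__main__":
    for name, (red, green) in toy_probes.items():
        m, a = ma_values(red, green)
        print(f"{name}: M = {m:+.1f}, A = {a:.1f}")
```

A probe with equal red and green signal sits on the M = 0 line whatever its brightness, which is why an intensity-dependent drift away from that line is easy to spot on this kind of plot.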
471. ris K V Santoso S Turner A M Pastori C amp Hawkins P G 2008 Bidirectional transcription directs both transcriptional gene activation and suppres sion in human cells PLoS genetics 4 11 e1000258 Mutter amp Boynton 1995 Mutter G L amp Boynton K A 1995 PCR bias in amplification of androgen receptor alleles a trinucleotide repeat marker used in clonality studies Nucleic acids research 23 8 1411 8 Mutter et al 2004 Mutter G L Zahrieh D Liu C Neuberg D Finkelstein D Baker H E amp Warrington J A 2004 Comparison of frozen and RNALater solid tissue storage methods for use in RNA expression microarrays BMC genomics 5 88 Naef amp Huelsken 2005 Naef F amp Huelsken J 2005 Cell type specific transcriptomics in chimeric models using transcriptome based masks Nucleic acids research 33 13 e111 Nammo ef al 2011 Nammo T Rodriguez Seguf S A amp Ferrer J 2011 Mapping open chromatin with formaldehyde assisted isolation of regulatory elements Methods in molecu lar biology Clifton N J 791 287 96 Narurkar et al 1968 Narurkar M V Narurkar L M amp Sahasrabudhe M B 1968 A new technique of pH gradient electrophoresis as applied to the separation of nucleic acid bases Analytical biochemistry 26 1 174 7 Natoli 2011 Natoli G 2011 Specialized chromatin patterns in the control of inflammatory gene expression Current topics
472. make it possible to know the normal epigenome of tissues and thus to compare them with those of pathologies, in particular in the case of diseases that specifically affect one or more given tissues such as the heart, the brain, the liver or a given lineage of immune cells. 1.6 Programming languages for data analysis. As will be presented and discussed later, the analysis of data from DNA microarrays and from very-high-throughput sequencing requires constant bioinformatics development: programs with or without a graphical interface, web services and databases. This is made possible by programming techniques suited to the needs of biologists and bioinformaticians. Indeed, bioinformatics can be defined very simply as the automatic, programmatic analysis of biological data with the aim of extracting information from it. This discipline constitutes in silico biology, by analogy with in vitro or in vivo. It is a multidisciplinary research field in which biologists, physicians, computer scientists, mathematicians, physicists and bioinformaticians work together to solve a scientific problem posed by biology. By misuse of language, the term can also describe all the computer applications resul
473. fluorochromes; D, view of the graphical interface of the FASTQC software, which provides various analyses including the distribution of quality scores for each of the 50 sequenced nucleotides. TABLE 5.2 The main data formats of very-high-throughput sequencing (analysis level: format, description). Primary: csfasta, proprietary file format of Life Technologies; _QV.qual, proprietary file format of Life Technologies; fastq, main sequence format produced by HTS techniques; sff, proprietary file format of Roche. Secondary: ma, proprietary file format of Life Technologies; bam, binary alignment file format converted from the ma file; sam, file obtained by decompressing a bam file. Tertiary: gff, annotation file; wig, coverage file; bed, annotation file; txt, file containing results or statistics. They run from the command line, which allows them to be included in various programs and analysis pipelines. The file formats produced by the tertiary analysis depend on the tool used and on the type of analysis. There is no standard format adopted by all software; they generally output tab-delimited text files or bed files so that the results can be loaded into a genome browser. Il
474. roteins in cis to inhibit transcription Nature 454 7200 126 30 Wang amp Simon 2011 Wang X amp Simon R 2011 Microarray based cancer prediction using single genes BMC bioinformatics 12 391 Wasserman amp Sandelin 2004 Wasserman W W amp Sandelin A 2004 Applied bioinforma tics for the identification of regulatory elements Nature reviews Genetics 5 4 276 87 Wei et al 2006 Wei C L Wu Q Vega V B Chiu K P Ng P Zhang T Shahab A Yong H C Fu Y Weng Z Liu J Zhao X D Chew J L Lee Y L Kuznetsov V A Sung W K Miller L D Lim B Liu E T Yu Q Ng H H amp Ruan Y 2006 A global map of p53 transcription factor binding sites in the human genome Cell 124 1 207 19 Werner 2008 Werner T 2008 Bioinformatics applications for pathway analysis of microar ray data Current opinion in biotechnology 19 1 50 4 Wilbanks amp Facciotti 2010 Wilbanks E G amp Facciotti M T 2010 Evaluation of algo rithm performance in ChIP seq peak detection PloS one 5 7 e11471 Wilhite amp Barrett 2012 Wilhite S E amp Barrett T 2012 Strategies to Explore Functional Genomics Data Sets in NCBI s GEO Database Methods in molecular biology Clifton N J 802 41 53 Wolffe amp Hayes 1999 Wolffe A P amp Hayes J J 1999 Chromatin disruption and modifi cation Nucleic acids research 27 3 711 20 Wu ef al 20
475. find experiments or datasets; B, histogram representation of the expression profiles of a gene, with the normalized expression intensity value in red and the rank of the gene within each sample of the experiment in blue; each sample is also related to the experimental parameters (tissue, disease state); C, summary of the information on the experiment; D, hierarchical clustering of the dataset. Adapted from Barrett et al. 2005. 4.3 Project context. As we have seen, the intensive use of DNA microarrays to study the transcriptome generates a large amount of data. For some years these data have been publicly accessible through online databases such as Gene Expression Omnibus (GEO, NCBI). The size of these databases is growing very rapidly and calls for analysis strategies that allow these data to be re-analysed efficiently. In this context, I contributed to the development of our own tool, called TranscriptomeBrowser (TBrowser), under the supervision of Denis Puthier, in 2007. The project website is available at http://tagc.univ-mrs.fr/tbrowser. There were no tools allowing the meta-analysis of datasets. Only the tools available on the GEO site made it possible to rean
476. s 5 x 107 1 x 10 10 x 109 1 x 107 > 1 x 10 TABLE 5.1 Comparison of the ChIP-on-chip and ChIP-seq techniques. The mark corresponds to the use of the MAGnify kit. 5.1.2 Biological principle. The first step of ChIP, called crosslinking, fixes proteins to DNA covalently so that their interaction can be studied over the whole genome. Once produced in sufficient quantity, the cells undergo a formaldehyde treatment whose purpose is to create covalent bonds between the lysine residues of proteins and the cytosines of DNA while preserving the structural integrity of the cells. This reaction, which is stopped by adding glycine, is reversible. It is one of the essential steps of ChIP. The formaldehyde-treated cells undergo successive lyses in order to extract the chromatin. The chromatin is then fragmented, either by sonication or by enzymatic digestion, to obtain DNA fragments of the desired size crosslinked to the proteins. For the study of a transcription factor, the smaller the fragments (length generally between 150 and 300 nucleotides), the more precisely the positions of the binding sites are defined. For histone marks, the fragment size is 146
477. are studied by analysing their distribution at the gene level, by searching for differential profiles around the transcription start site (TSS) or along the whole gene, under various experimental conditions or for different cell types (Kidder et al. 2011; Barski et al. 2007) (Figure 5.3). This difference can also be studied globally (search for differential coverage over the whole genome) with the aim of identifying enhancers. Indeed, it has recently been shown that intergenic RNA polymerase II binding sites are preferentially located near enhancers (De Santa et al. 2010). 5.2 HTS computing. The rapid development of HTS technologies requires efficient tools and methods for data analysis, and their continual updating. Each sequencer has its own system architecture as well as specific proprietary software suites with non-standard data formats. The hardware needed to operate HTS sequencers and to analyse the data they generate is therefore very substantial. Indeed, a desktop computer is not sufficient, as it is for DNA microarray technologies [give an order of magnitude of the data files]; very-high-throughput sequencing data generate a data stream that s
478. s. Finally, taking advantage of our large collection of transcriptional signatures, we constructed a comprehensive map that summarizes gene-gene co-regulations observed through all the experiments performed on the HGU133A Affymetrix platform. We provide evidence that this map can extend our knowledge of cellular signaling pathways. Citation: Lopez F, Textoris J, Bergon A, Didier G, Remy E, et al. (2008) TranscriptomeBrowser: A Powerful and Flexible Toolbox to Explore Productively the Transcriptional Landscape of the Gene Expression Omnibus Database. PLoS ONE 3(12): e4001. doi:10.1371/journal.pone.0004001. Editor: Pamela A. Silver, Harvard Medical School, United States of America. Received July 8, 2008; Accepted November 25, 2008; Published December 23, 2008. Copyright: 2008 Lopez et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited. Funding: This work was supported by the Institut National de la Santé et de la Recherche Médicale (Inserm), the Canceropôle PACA and Marseille-Nice Genopole. Fabrice Lopez was supported by a fellowship from the EU STREP grant Diamonds and through funding from the IntegraTCell project (ANR, National Research Agency). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
479. s Nature biotechnology 26 12 1351 9 Kidder et al 2011 Kidder B L Hu G amp Zhao K 2011 ChIP Seq technical considera tions for obtaining high quality data Nature immunology 12 10 918 22 Kim et al 2008 Kim C Cheon M Kang M amp Chang I 2008 A simple and exact Lapla cian clustering of complex networking phenomena application to gene expression profiles Proceedings of the National Academy of Sciences of the United States of America 105 11 4083 7 Kim et al 2010 Kim T K Hemberg M Gray J M Costa A M Bear D M Wu J Harmin D A Laptewicz M Barbara Haley K Kuersten S Markenscoff Papadimitriou E Kuhl D Bito H Worley P F Kreiman G amp Greenberg M E 2010 Widespread transcription at neuronal activity regulated enhancers Nature 465 7295 182 7 Kircher et al 2011 Kircher M Heyn P amp Kelso J 2011 Addressing challenges in the production and analysis of illumina sequencing data BMC genomics 12 382 Kircher et al 2009 Kircher M Stenzel U amp Kelso J 2009 Improved base calling for the Illumina Genome Analyzer using machine learning strategies Genome biology 10 8 R83 Klose amp Zhang 2007 Klose R J amp Zhang Y 2007 Regulation of histone methylation by demethylimination and demethylation Nature reviews Molecular cell biology 8 4 307 18 Knapen et al 2009 Knapen D Vergauwen L Lauke
480. s Underlying Familial Dysautonomia. PLoS ONE 5(12): e15590. doi:10.1371/journal.pone.0015590. Editor: Carlo Gaetano, Istituto Dermopatico dell'Immacolata, Italy. Received September 2, 2010; Accepted November 13, 2010; Published December 20, 2010. Copyright: 2010 Boone et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited. Funding: The authors thank the Association Française de Recherche contre les Myopathies (AFM) for supporting their work. NB was supported by a PhD fellowship from the Ministère de l'Education Nationale, de la Recherche et de la Technologie (MENRT). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. E-mail: el cherif ibrahim univmed fr. Introduction. Familial dysautonomia (FD; Riley-Day syndrome, hereditary sensory and autonomic neuropathy type III; MIM 223900) is an autosomal recessive genetic disorder that occurs in 1:3600 live births, with a carrier frequency of 1 in 30 in the Ashkenazi Jewish population. The disease is characterized by incomplete development and the progressive depletion of autonomic and sensory neurons [1-3], resulting in variable symptoms including insensi ti
481. s aren't between, for this example, -1 and 1, i.e. M values below -1 or above 1. The command is: > agMAplot(myob, whichSlot = "gM", array = 1, show.gene = c(-1, 1)) > a <- c("DarkCorner", "GE_BrightCorner") > agMAplot(myob, whichSlot = "gM", array = 1, show.gene = a) Figure 4: The agMAplot function. Both panels show array US45102986_251487911262_S01_GE1 v5_95_Feb07_1_1 txt, with A on the x-axis and the subtitle "Slot log2 gM with densCols". (A) Visualization of gene names, taken from the GeneName slot, for M values below -1 and above 1. (B) Visualization of the position on the plot of the spots corresponding to the list a of gene names. 3.3 The agImage function. This function produces a virtual image of each array, so that patches of colour or anomalies can be observed. The arguments of this function are: > args(agImage) function (object, whichSlot = NULL, array = 1, type = "intensity", threshold = NULL, bar = TRUE, html = FALSE, pdf = FALSE, identify = FALSE, col.zoom = NULL, row.zoom = NULL, log = TRUE
482. made up of a summary describing the design of the array and of a complete annotation table of the sequences attached to it. Each platform is associated with a unique identifier (GPLxxx). A platform can be associated with many samples from various experiments conducted in independent laboratories. For each sample, which is associated with a unique identifier (GSMxxx), the conditions under which it was obtained are described according to the information required by the MIAME standard. A sample is referenced in a single platform but can be included in several experiments. An experiment (GSExxx) consists of a set of samples and describes precisely the experimental parameters of the different samples, so that the aim of the study can be understood. 4.2.4 Re-analyses and meta-analyses of datasets from GEO. Various approaches and tools have been developed to allow the re-analysis and/or meta-analysis of the datasets available in DNA microarray databases. Indeed, GEO also offers an original view of the data in the form of datasets (GDSxxx), which represent statistically and biologically comparable samples that have been manually verified by the GEO curators. GEO thus offers two types of tools: GEO Profiles (Figure 4.1 A, B) and GEO Datasets (Figure 4.1 C, D). B
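The four GEO record types named in excerpt 482 (GPL, GSM, GSE, GDS) can be told apart from the accession prefix alone. The following Python sketch illustrates this; the function name, the labels and the accession numbers in the usage lines are hypothetical illustrations, and the sketch only inspects the string, it does not query GEO.

```python
import re

# Prefix -> record type, following the description in the text.
GEO_RECORD_TYPES = {
    "GPL": "platform",
    "GSM": "sample",
    "GSE": "series",
    "GDS": "dataset",
}

# A prefix followed by digits only, e.g. a made-up "GSE12345".
ACCESSION_PATTERN = re.compile(r"^(GPL|GSM|GSE|GDS)(\d+)$")


def classify(accession):
    """Return the record type of a GEO accession string."""
    match = ACCESSION_PATTERN.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GEO accession: {accession!r}")
    return GEO_RECORD_TYPES[match.group(1)]


if __name__ == "__main__":
    # Made-up accession numbers, used only to exercise the function.
    for accession in ("GPL1", "GSM22", "GSE333", "GDS4444"):
        print(accession, "->", classify(accession))
```

Keeping the prefix table separate from the pattern makes it easy to attach the relationships described in the text later on, for example that a sample points to exactly one platform but may appear in several series.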
483. s of dorsal root ganglia and neuropathologic observations on spinal cords in familial dysautonomia J Neurol Sci 35 77 92 Perera M Merlo GR Verardo S Paleari L Corte G Levi G 2004 Defective neurono genesis in the absence of DIx5 Mol Cell Neurosci 25 153 161 Perez Otano I Lujan R Tavalin SJ Plomann M Modregger J Liu XB Jones EG Heinemann SF Lo DC Ehlers MD 2006 Endocytosis and synaptic removal of NR3A containing NMDA receptors by PACSIN1 syndapin1 Nat Neurosci 9 611 621 Probst S Kraemer C Demougin P Sheth R Martin GR Shiratori H Hamada H Iber D Zeller R Zuniga A 2011 SHH propagates distal limb bud development by enhancing CYP26B1 mediated retinoic acid clearance via AER FGF signalling Development 138 1913 1923 Rahl PB Chen CZ Collins RN 2005 Elp1p the yeast homolog of the FD disease syndrome protein negatively regulates exocytosis independently of transcriptional elongation Mol Cell 17 841 853 Ruggiu M Herbst R Kim N Jevsek M Fak JJ Mann MA Fischbach G Burden SJ Darnell RB 2009 Rescuing Z agrin splicing in Nova null mice restores synapse formation and unmasks a physiologic defect in motor neuron firing Proc Natl Acad Sci USA 106 3513 3518 Sadanandam A Rosenbaugh EG Singh S Varney M Singh RK 2010 Semaphorin 5A promotes angiogenesis by increasing endothelial cell proliferation migration and decreasing apoptosis Microvasc Res 79 1 9 Sanchez Alcaniz JA Haege S Mueller W Pla R Mac
484. s with or without the presence of 80 µM kinetin for 24 h. We first observed that the total amount of IKBKAP transcript detected was almost identical when probes at the extremities or in the middle of the transcript were used (Figure 7G). In addition, kinetin has no significant effect on IKBKAP transcript levels in control cells, which likely excludes a potential action of kinetin on IKBKAP transcription. Moreover, kinetin, by improving IKBKAP exon 20 recognition, restores IKBKAP transcript levels in FD hOE-MSCs to levels similar to those observed in control cells (Figure 7G). Kinetin did not modify the ratio of alternative splicing around exon 2 and exon 36, suggesting a specific mechanism of action on exon 20 inclusion (data not shown). Altogether, these results revealed that kinetin exerts a rapid and possibly long-lasting effect on IKBKAP mRNA splicing, which most likely occurs by increasing IKBKAP mRNA stability rather than by acting on transcription. FD sphere cells display a strongly reduced IKBKAP exon 20 skipping. One property of multipotent cells is their capacity to organize into spheres when cultured in an appropriate medium. Since FD hOE-MSCs express a significant amount of MU IKBKAP transcript, we asked whether induction of sphere formation could modify the WT/MU IKBKAP transcript ratio. Although hOE-MSCs proliferate as adherent cells when cultured in DMEM/F12 supplemented with serum (Figure 8A), they progressively organize into spherical
485. s TBrowserDBv2. This MySQL database consists of 47 tables using the MyISAM architecture, allowing fast access to the data. List of figures (continued): The various annotations available in the database used to generate the annotation of transcriptional signatures. Graphical interface of TBrowser, with its query panel and its main plugins. Summary of project progress, with the development of a new database, an R library and web services, and the integration of new data. ChIP-seq vs ChIP-on-chip: general process. Theoretical distribution of sequenced fragments after alignment to a reference sequence, with (A) the definition of a peak, where d corresponds to the sonication size, and (B) the different peak profiles. Adapted from Wilbanks & Facciotti 2010 and Kidder et al. 2011. Visualization of peak profiles, with (A) those obtained for a transcription factor or for methylation marks and (B) the difference in profiles between the different histone modifications (Barski et al. 2007; Tomaru et
486. … DNA fragments. 1.4 High-throughput sequencing techniques.
Table 1.3: Applications and advantages of paired-end (PET) approaches for high-throughput sequencing techniques.
Application | Advantage of PET | Technique and reference
Read alignment | Increased alignment efficiency; lower sample sequencing cost; information on the distance between the two sequenced fragments and on their relationship (deletion, insertion, inversion) | Paired-end ditag, PET (Ng et al. 2005; Wei et al. 2006); paired-end sequencing, PES (Holt & Jones 2008); paired-end mapping, PEM (Korbel et al. 2007); mate pairs (Shendure et al. 2005); paired-end genomic signature tags, PE-GST (Dunn et al. 2007)
Transcriptome | Identification of 5' and 3' UTRs | Gene identification signature, GIS-PET (Ng et al. 2005)
Transcriptome | Identification of alternative TSSs | Gene Scanning CAGE, GSC-PET (Carninci et al. 2005)
Epigenetics | Improved specificity and demarcation of the fragments containing the site of interest | ChIP-PET (Wei et al. 2006)
Genome structure variation | Required for de novo sequencing | DNA-PET (Hillmer et al. 2011)
Chapter 1: General introduction.
[Figure: (A) fragment, paired-end and mate-pair layouts on a DNA fragment; (B) reference sequence]
487. … limited by the probes fixed on the support (the case of microarrays); new transcripts never observed before (splice variants, lincRNA) can be identified. Depending on the application, either the fragment or the paired-end mode is chosen: the fragment sequencing strategy is preferred for tag counting, whereas paired-end allows the identification of transcript fusions and alternative splicing events. Note that kits now exist for exome sequencing and for re-sequencing a targeted region of the genome, on capture DNA arrays or on magnetic beads (Clark et al. 2011). These techniques rely on overlapping probes of fixed size (60 nucleotides, shifted by a step of 3 nucleotides for Agilent Technologies) covering the DNA regions under study, which can span up to 10 Mb. It is important to use a sequence free of repeated elements; to this end, the sequence can be masked with the RepeatMasker software, a step that is generally built into capture-array design software. The DNA or cDNA fragments hybridizing to the arrays or beads are then de-hybridized and sequenced. 1.4.2.3 Other types of applications. Various techniques allow data to be acquired on a genome: de novo sequencing (de novo-seq), re-sequencing (re-seq), or
488. s modulation regulates dengue viral replication. Virology 389: 8-19. Vaughn DW, Green S, Kalayanarooj S, Innis BL, Nimmannitya S, et al. (2000) Dengue viremia titer, antibody response pattern, and virus serotype correlate with disease severity. J Infect Dis 181: 2-9. PLoS ONE (www.plosone.org), Molecular Mechanisms of DSS. Murgue B, Roche C, Chungue E, Deparis X (2000) Prospective study of the duration and magnitude of viraemia in children hospitalised during the 1996-1997 dengue-2 outbreak in French Polynesia. J Med Virol 60: 432-438. Ray G, Kumar V, Kapoor AK, Dutta AK, Batra S (1999) Status of antioxidants and other biochemical abnormalities in children with dengue fever. J Trop Pediatr 45: 4-7. Kalayanarooj S, Nimmannitya S (2005) Is dengue severity related to nutritional status? Southeast Asian J Trop Med Public Health 36: 378-384. Nguyen TH, Nguyen TL, Lei HY, Lin YS, Le BL, et al. (2005) Association between sex, nutritional status, severity of dengue hemorrhagic fever, and immune status in infants with dengue hemorrhagic fever. Am J Trop Med Hyg 72: 370-374. Schaible UE, Kaufmann SH (2007) Malnutrition and infection: complex mechanisms and global impacts. PLoS Med 4: e115. Pacheco P, Bozza FA, Gomes RN, Bozza M, Weller PF, et al. (2002) Lipopolysaccharide-induced leukocyte lipid body for
489. s output mechanism, such as in linear pathway diagrams. As a consequence, it is suited for a more general study of various types of networks. Moreover, since visual zones correspond to Gene Ontology terms, this layout handles different levels of accuracy in the localisation of proteins: for instance, a precisely annotated protein might be placed in the zone corresponding to endoplasmic reticulum, while a less well annotated one can be placed in the more generic, higher-level zone intracellular. In Cerebral, each gene product is represented by one instance whose cell compartment may be defined by the user. In contrast, InteractomeBrowser displays by default several instances of a given gene product, which may be placed in several cell compartments according to information provided by the GO Cellular Component ontology. Although this may lead to a more complex graph, it provides a more exhaustive presentation of current knowledge and may draw the attention of users to unexpected locations of gene products in the cell. The user may choose to delete some of these instances, hence selecting a posteriori the most representative one. The main benefit of InteractomeBrowser resides in its direct interaction with the database described in this report. Indeed, it provides a ready-to-use web-based service that requires only a few manipulations to retrieve a network of interactions (see the video tutorial provided as an additional file). Notably, in addit
490. … interacting directly with DNA, and to highlight the presence of co-factors (in the case of p300, for example). In addition, starting from a list of good-quality peaks, it is also possible to improve motifs by building a collection of sequences, a type of motif representation such as those held in the JASPAR (Wasserman & Sandelin 2004) and TRANSFAC (Matys et al. 2003) databases, or UniPROBE for mouse. These collect information on potential binding sites as a position weight matrix or a logo. Various motif discovery tools have been adapted to ChIP-seq data, such as RSATools with peak-motifs, MEME with MEME-ChIP (Machanick & Bailey 2011), and DREME (Bailey 2011). Some analysis pipelines, such as the rGADEM library, use several R libraries to search for motifs from the peaks determined by PICS (Figure 5.13). These tools report the probability of obtaining a list enriched in potential binding sites for a factor, given its occurrence across the genome. The difficulty lies in determining the background model and the sequences finally used for motif discovery; MEME-ChIP uses only the best sequences to build the motif. These tools generally propose the use of two sets of sequ
491. scular endothelial dysfunction is indicated. Thin black arrow: release of. Bold black arrow: interaction between. Punctuated black arrow: chemotactic effect. Thin red arrow: biological activity. Bold red arrow: direct activity on endothelium. DAMPs: danger-associated molecular patterns; GAG: glycosaminoglycan; ROI: reactive oxygen intermediates; TLR: Toll-like receptor. doi:10.1371/journal.pone.0011671.g004 (PLoS ONE, www.plosone.org, July 2010, Volume 5, Issue 7, e11671). such as insulin resistance [109-111], which altogether characterize systemic inflammatory syndromes such as DSS or severe sepsis. While the existence of functional lipid-laden Mo/Mac during DSS should be established by functional studies, such a molecular mechanism could explain the decrease in circulating sub-fractions and total cholesterol previously reported in DSS [112-115] and in other critically ill patients, in whom low cholesterol levels are associated with poor clinical outcome [116,117]. Altered homeostasis of cholesterol in blood cells from DSS patients could also favour replication of dengue viruses in host cells [118], thus contributing to increased viremia in patients with severe dengue infection [119,120], although this could not be evaluated in this study since some of the patients had undetectable viremia at the time of blood sampling. The factors contributing to altered homeostasis of cholesterol in the blood cells of DSS children at the time of shock are numerous. In
492. sed in spheres in comparison to the differentiated samples. In addition, many genes involved in proteolysis were upregulated in sphere samples (MMP1, MMP10, ADAMTS14, MME, PRSS35 and ADAMTS5). Using the SAM analysis, we next compared the FD signature between control and FD samples. We assumed an FDR of 10% and characterized 35 differentially expressed genes with a FC > 2 (Fig. 2). Although most of the genes were downregulated in FD, IKBKAP appears as the second most discriminant marker between control and FD hOE-MSCs. Importantly, 10 differentially expressed genes encode proteins playing an important role in neural cells: CD40, FXYD1, GPR37, LYN, NRG1, PACSIN1, RUNX3, SCN2B, SFRP2, SNCA (Aubert et al. 2002; Burr et al. 2010; Deng et al. 2007; Gibb et al. 2011; Hossain et al. 2010; Kramer et al. 2006; Lopez-Santiago (HUMAN MUTATION, Vol. 00, No. 0, 1-11, 2012)
[Heatmap residue: sample groups SPHERES and RAFNSHH]
Symbol | FC | P-value
MGC9913 | 3.81 | 2.70E-4
CR622844 | 2.45 | 4.05E-5
LOC339803 | 4.62 | 9.87E-4
FXYD1 | 2.41 | 2.46E-3
MAGEL2 | 2.54 | 6.87E-3
SFRP2 | 9.39 | 2.74E-3
CA314185 | 2.83 | 2.74E-3
LOC100289550 | 2.67 | 4.74E-3
THC2724353 | 4.18 | 3.42E-3
SCN2B | 3.76 | 9.60E-3
TNNT1 | 3.37 | 6.68E-3
C2orf27A | 3.22 | 1.23E-3
PACSIN1 | 3.18 | 2.01E-3
SNCA | 10.02 | 8.14E-3
AJAP1 | 3.13 | 1.04E-2
BG118529 | 13.15 | 6.33E-8
FLJ39632 | 6.36 | 3.56E-5
IKBKAP | 2.43 | 2.91E-6
CD40 | 2.47 | 3.97E-4
BC017398 | 2.22 | 2.44E-4
ITIH5 | 22.03 | 7.26E-4
ENST00000443689 | 3.65 | 1.22E-3
LYN | 2.01 | 7.83E
493. … an individual to the onset and particular course of a pathology, but in no way imply that it will appear. These complex diseases are diverse: obesity, diabetes, asthma, cancer, autoimmune diseases, neurodegenerative diseases, etc. Medical examinations such as blood tests and other clinical examinations make it possible to detect them and to follow their progression, both at the level of a given tissue or organ and of the whole organism. Indeed, every disease has a different course and prognosis. Research on these diseases aims to understand their perturbations. Large-scale study of gene expression thus makes it possible to identify differentially expressed genes that may explain the perturbations observed. This work leads, among other things, to functional analyses of the products of these genes and of the regulation of their expression. The more is known about a disease, the better it can be fought, or at least its consequences limited. The study of the transcriptome and of transcriptional regulation is therefore a very important aspect of the study of pathologies. More and more publications focus, for example, on the study of non-coding ribonucleic acids (RNA; ARN in French), such as microRNAs or long intergenic non-coding RNAs
494. sers may also use other normalization routines such as doRankTransformation or limma's normalizeQuantiles.
> subNorm <- doNormalScore(sub)
The DBFMCL function allows one to extract TS from a data set. Its behaviour is controlled by several arguments:
> args(DBFMCL)
function (data = NULL, filename = NULL, path = ".", name = NULL, distance.method = c("pearson", "spearman", "euclidean", "spm", "spgm"), clustering = TRUE, silent = FALSE, verbose = TRUE, k = 150, random = 3, memory.used = 1024, fdr = 10, inflation = 2, set.seed = 123, returnRank = FALSE)
NULL
The DBFMCL function accepts as input a tab-delimited file (argument filename), or an ExpressionSet, a data.frame or a matrix (argument data). The input data must contain an expression matrix with genes as rows and samples as columns. Note that space characters inside gene names are not allowed, as they are not supported by the mcl command-line program. The two main parameters of DBF-MCL are k, which controls the size of the neighborhood, and inflation (range: 1.1 to 5), which controls the way the underlying graph is partitioned. In the following example, the neighborhood size k is set to 150 and the MCL inflation parameter is set to 2.0 (the default MCL setting). In general, these default parameters give very good results on microarray datasets. For a detailed discussion of these parameters, please read the section "Performances of DBF-MCL on Complex9RN200 dataset" in the article
495. served DKNN value. The critical value of DKNN is the one for which a user-defined FDR value (typically 10%) is observed. Genes with a DKNN value below this threshold are selected and used to construct a graph. In this graph, an edge is constructed between two genes (nodes) if one of them belongs to the k nearest neighbors of the other. Edges are weighted based on the respective coefficient of correlation (i.e. similarity), and the graph obtained is partitioned using the Markov CLustering algorithm (MCL).
3.1 Installation
With the current implementation, the DBFMCL function works only on UNIX-like platforms. MCL is required and can be installed using your package manager or using the following command lines pasted in a terminal:
# Download the latest version of mcl (the library has been tested successfully with the 06-058 version)
wget http://micans.org/mcl/src/mcl-latest.tar.gz
# Uncompress and install mcl
tar xvfz mcl-latest.tar.gz
cd mcl-xx-xxx
./configure
make
sudo make install
# You should get mcl in your path
mcl -h
3.2 Examples
We will search for transcriptional signatures in a subset of the ALL dataset.
> library(ALL)
> data(ALL)
> sub <- exprs(ALL)[1:3000, ]
First, we will normalize the data set using the doNormalScore function. This function performs a normal score transformation of a matrix: it transforms each sample to follow a normal distribution with mean 0 and sd 1. Alternatively, u
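The selection and graph-construction steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DBFMCL implementation: the function names `dknn` and `knn_edges` are hypothetical, the threshold is passed in by hand rather than derived from an FDR, and edges are left unweighted.

```python
from typing import List, Sequence, Set, Tuple


def dknn(dist: Sequence[Sequence[float]], k: int) -> List[float]:
    """Distance from each gene to its k-th nearest neighbour (self excluded)."""
    values = []
    for i, row in enumerate(dist):
        others = sorted(d for j, d in enumerate(row) if j != i)
        values.append(others[k - 1])
    return values


def knn_edges(dist: Sequence[Sequence[float]], k: int,
              keep: Set[int]) -> List[Tuple[int, int]]:
    """Undirected edges between selected genes: an edge exists when one
    gene is among the k nearest neighbours of the other."""
    edges = set()
    n = len(dist)
    for i in keep:
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist[i][j])
        for j in order[:k]:
            if j in keep:
                edges.add((min(i, j), max(i, j)))
    return sorted(edges)


# Toy distance matrix: four "genes" at positions 0, 1, 2 and 10 on a line.
positions = [0, 1, 2, 10]
D = [[abs(a - b) for b in positions] for a in positions]

observed = dknn(D, k=2)
threshold = 5  # hypothetical critical DKNN value, chosen by hand here
selected = {i for i, d in enumerate(observed) if d < threshold}
graph = knn_edges(D, k=2, keep=selected)
```

In this toy case the isolated fourth point has a large DKNN and is filtered out, while the three dense points remain connected. In the real workflow the resulting graph would then be passed to MCL.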
496. show.gene = NULL) NULL
For a basic image, use this command:
> agImage(myob, whichSlot = "gM", array = 1)
[Figure 5 plot: array US45102986_251487911262_S01_GE1-v5_95_Feb07_1_1.txt; axes 1:nb_col and 1:nb_row; colour scale: intensity; threshold 0]
Figure 5: Virtual image for the first array of the AgilentBatch object, obtained with the agImage function.
Another function, agThreshold, allows this image to be viewed with a threshold: all intensity values under the threshold are replaced by NA and are not shown on the image. This threshold can be defined in two ways:
- as a percentage of intensity between the minimum and maximum intensity values;
- from the intensity distribution, e.g. according to the quantiles (25, 50 or 75%) of the intensity values.
This function returns the same object, but with NA for the values below the threshold. Thresholding can also be applied directly in the agImage function through the type and threshold arguments:
> agImage(myob, "gM", array = 1, type = "quartiles", threshold = 25)
> agImage(myob, "gM", array = 1, type = "intensity", threshold = 10)
As mentioned previously, it is possible to exclude spots (controls, flags or a list of gene names) with the agEclude function, and a new virtual image can then be observed from this new object. Moreover an ima
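The two thresholding modes described above can be illustrated with a small Python sketch. It is not the AgiND code: the names `cutoff` and `mask_below` are hypothetical, `None` stands in for R's NA, and the nearest-rank percentile used for the quantile mode is an assumption about how the quantile is computed.

```python
import math
from typing import List, Optional, Sequence


def cutoff(values: Sequence[float], kind: str, threshold: float) -> float:
    """Turn a percentage into an absolute intensity cutoff.

    kind="intensity": threshold % of the way from the minimum to the maximum.
    kind="quantile":  threshold-th percentile (nearest-rank definition).
    """
    if kind == "intensity":
        low, high = min(values), max(values)
        return low + (high - low) * threshold / 100.0
    if kind == "quantile":
        ordered = sorted(values)
        rank = max(1, math.ceil(threshold / 100.0 * len(ordered)))
        return ordered[rank - 1]
    raise ValueError("kind must be 'intensity' or 'quantile'")


def mask_below(values: Sequence[float], kind: str,
               threshold: float) -> List[Optional[float]]:
    """Replace every value strictly below the cutoff by None (NA)."""
    limit = cutoff(values, kind, threshold)
    return [v if v >= limit else None for v in values]


intensities = [10, 20, 30, 40, 110]
by_range = mask_below(intensities, "intensity", 10)
by_quantile = mask_below(intensities, "quantile", 50)
```

The two modes behave differently on skewed data: a single bright spot stretches the min-max range, whereas the quantile cutoff depends only on rank.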
497. … simple [representation] makes it possible to interpret the results of the t-test: the volcano plot (Figure 3.2). This plot shows, on the y-axis, the negative base-10 logarithm of the p-values from the t-test and, on the x-axis, the base-2 logarithm of the fold change. Differentially expressed genes are those with low p-values (that is, the highest possible values on the y-axis, since these are on a -log10 scale) and with high absolute fold-change values. Generally, an arbitrary threshold is applied to keep absolute log2 values above 1, i.e. an expression twice as high in one of the samples. Chapter 3: Microarray data analysis. Figure 3.2: Volcano plot representation (axes: Log Odds vs Log Fold Change). Each point corresponds to a gene. Two filters are applied, one on each axis, to determine the discriminating genes. The pink areas correspond to regions in which genes have a log ratio greater than or equal to 1 and a t-test p-value below the threshold shown in the figure. 3.1.2 Significance Analysis of Microarrays (SAM). The SAM method is a non-parametric test that identifies genes differentially expressed between two groups of samples, without any prior assumption about their distribution. SAM assigns a score to each gene based on the change in expression of the gene relative to the standard deviation of the replica
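The two-axis filter of the volcano plot can be written as a short Python sketch. The function name `volcano_select` and the p-value cutoff of 0.05 are assumptions made for illustration; only the |log2 fold change| >= 1 cutoff comes from the text.

```python
import math
from typing import Iterable, List, Tuple


def volcano_select(rows: Iterable[Tuple[str, float, float]],
                   min_abs_log2_fc: float = 1.0,
                   max_p: float = 0.05) -> List[Tuple[str, float, float]]:
    """Keep genes passing both filters of a volcano plot.

    rows: (gene, log2 fold change, t-test p-value).
    Returns (gene, log2 fold change, -log10 p), i.e. plot coordinates.
    """
    selected = []
    for gene, log2_fc, p_value in rows:
        if abs(log2_fc) >= min_abs_log2_fc and p_value <= max_p:
            selected.append((gene, log2_fc, -math.log10(p_value)))
    return selected


# Made-up values, for illustration only.
example = [
    ("A", 2.0, 0.001),   # strong change, significant: kept
    ("B", 0.5, 0.0001),  # significant but small change: dropped
    ("C", -1.5, 0.2),    # large change but not significant: dropped
    ("D", -1.0, 0.01),   # two-fold decrease, significant: kept
]
hits = volcano_select(example)
```

Taking the absolute value of the log2 fold change is what makes the filter symmetric, so that two-fold decreases are retained as well as two-fold increases.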
498. … such as BLAST or FASTA (Altschul et al. 1990). Indeed, these were designed mainly to find a query sequence in a subject reference, and can be tuned very finely for this purpose to account for different cases (mismatches, gaps) depending on the context of the search. However, this complexity comes with relative slowness, which makes them unsuited to the problem posed by HTS. For this reason, various tools based on recent search algorithms have been developed to align a large number of short fragments (50 nt for SOLiD) to the reference genome in a reasonable time. A program called mapread (Corona Lite) was developed for this purpose by Life Technologies. It has the advantage of taking as input the two files produced by the primary analysis, namely the csfasta file (colour code) and the quality file, which few programs can yet do. Other short-read aligners exist, such as BOWTIE (Langmead et al. 2009), BWA (Li & Durbin 2009), BFAST (Homer et al. 2009), ELAND, SHRIMP2 (Rumble et al. 2009; David et al. 2011) and SOAP (Li & Homer 2010). Sequencing quality is such that reads with up to 2 mismatches against the reference are considered correctly aligned. These tools take
499. son J W Srinivasan M Tartaro K R Tomasz A Vogt K A Volkmer G A Wang S H Wang Y Weiner M P Yu P Begley R F amp Rothberg J M 2005 Genome sequencing in microfabricated high density picolitre reactors Nature 437 7057 376 80 Martens Uzunova et al 2011 Martens Uzunova E S Jalava S E Dits N F van Leen ders G J L H M ller S Trapman J Bangma C H Litman T Visakorpi T amp Jenster G 2011 Diagnostic and prognostic signatures from the small non coding RNA transcrip tome in prostate cancer Oncogene Martin et al 2004 Martin D Brun C Remy E Mouren P Thieffry D amp Jacq B 2004 GOToolBox functional analysis of gene datasets based on Gene Ontology Genome biology 5 12 R101 Massie amp Mills 2008 Massie C E amp Mills I G 2008 ChIPping away at gene regulation EMBO reports 9 4 337 43 Matys et al 2003 Matys V Fricke E Geffers R G ssling E Haubrock M Hehl R Hornischer K Karas D Kel A E Kel Margoulis O V Kloos D U Land S Lewicki Potapov B Michael H M nch R Reuter I Rotert S Saxel H Scheer M Thiele S amp Wingender E 2003 TRANSFAC transcriptional regulation from patterns to profiles Nucleic acids research 31 1 374 8 296 Bibliographie Maxam amp Gilbert 1977 Maxam A M amp Gilbert W 1977 A new method for sequencing DNA Proceedings of
500. … often by the English term Self-Organizing Map (SOM), or Kohonen map, after Teuvo Kohonen, the statistician who developed the concept in 1984 (Kohonen 1997; Tamayo 1999). It is used to classify data in a multidimensional space, as in the case of DNA microarrays. 3.3 Functional annotation. After identifying groups of differentially expressed genes, and in order to interpret the data, functional enrichment tests must be carried out. Indeed, co-expressed genes are generally involved in similar processes or signalling pathways (Eisen et al. 1998) (Figure 3.5). 3.3.1 The different sources of information. Various sources of information are useful for annotation and therefore for the interpretation of microarray data. A great many databases store information on the function, localization, tissue expression, regulation and interactions of genes or their products (Table 3.1). It is assumed here that the transcripts identified previously are translated in equivalent proportion into functional proteins. This therefore does not take into account post-transcriptional and post-translational regulation mechanisms. Sometimes the data are organized into a structured set of terms
501. ssion Markup Language, which is a tabular format. 4.2 Meta-analysis and data integration. There are many microarray databases, more or less specialized. A very good summary of these databases was compiled by Sophie Lemoine of the transcriptome platform of the Ecole Normale Supérieure (ENS) and is available at http://transcriptome.ens.fr/sgdb/tools/data_management.php. The main databases used are Gene Expression Omnibus (GEO; Edgar et al. 2002; Barrett et al. 2005; Wilhite & Barrett 2012) at the NCBI (United States) and ArrayExpress at the EBI (England) (Brazma et al. 2003; Parkinson et al. 2011). The amount of data in these databases is growing very rapidly because it is now mandatory, when publishing results, to deposit the raw and normalized data in them so that they can be re-analysed if needed. R packages have also been developed to extract the data held in GEO and ArrayExpress and allow their re-analysis: GEOquery (Sean & Meltzer 2007) and ArrayExpress (Kauffmann et al. 2009), respectively. 4.2.3 Data structure in Gene Expression Omnibus (GEO). In GEO (http://www.ncbi.nlm.nih.gov/geo), data are grouped into microarray platforms, samples and experiments. A platform is compo
502. ssionSet
This ExpressionSet object from an AgilentBatch object (one colour arrays) is saved in /home/aurelie/R/i686-pc-linux-gnu-library/2.5/AgiData/ExpressionSet.txt
Flags corresponding to the data from an AgilentBatch object (one colour arrays) are saved in /home/aurelie/R/i686-pc-linux-gnu-library/2.5/AgiData/Flags.txt
5.2 Examples using the ExpressionSet object
Other libraries, such as Biobase, can then be used.
- Display gene names:
> featureNames(es)[1:25]
 [1] "GE_BrightCorner" "DarkCorner" "DarkCorner"
 [4] "DarkCorner" "DarkCorner" "DarkCorner"
 [7] "DarkCorner" "DarkCorner" "DarkCorner"
[10] "DarkCorner" "DarkCorner" "AA892298"
[13] "AI232741" "Gmpr" "XM_236342"
[16] "RGD1309888" "XM_222163" "AA891661"
[19] "Plcb3" "Polr2b_predicted" "XM_225162"
[22] "F10" "Btbd5_predicted" "RGD1310717_predicted"
[25] "Prkr"
- Visualization of expression data:
> exprs(es)[1:25]
 [1] 20354.110000 11.306570 5.485073 5.550568 5.610573
 [6] 5.665902 7.786766 8.300380 5.807561 5.845899
[11] 5.880494 10.634210 10.221720 2621.380000 11.501160
[16] 1173.688000 2910.492000 16.168080 203.756200 690.069400
[21] 6.061691 18.307080 12.981660 6.073625 22.823440
APPENDIX B: User manual of the R/Bioconductor library RTools4TB
The RTools4TB package: data mining of public microarray data through connections to the TranscriptomeBrowser database. A. Bergon, F. Lopez, J. Textoris, S. Granjeaud and D. Puthier. October 28, 2009. T
503. … is thus obtained. This step is repeated 10 times to build an incomplete 50-nucleotide sequence (case of fragment mode with 50-nucleotide reads). Then 4 further cycles of 10 ligations are carried out from primers that also hybridize to the P1 adapter, at positions n-1, n-2, n-3 and finally n-4. Combining the 5 partial colour-code sequences reconstitutes the 50-nucleotide sequence. 1.4.1.2 The SOLiD colour code: advantages and drawbacks. One particularity of SOLiD sequencing is that each nucleotide is sequenced twice. Indeed, this sequencer does not read base by base like the models offered by Roche or Illumina, but defines the target sequence by reading di-bases (Figure 1.10). Figure 1.9: Probes of the SOLiD technology. Each 8-nucleotide probe is composed of 2 bases complementary to the target sequence (positions 1 and 2), then 3 degenerate bases (n), and finally three universal bases (z). [Figure labels: 4 colours, 4 di-bases per colour, 1024 probes; fluorochromes FAM, CY3, TXR, CY5; ligation site.] This strategy is particularly suited to the detection of SNPs (Single Nucleot
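The di-base reading described above can be made concrete with a small Python sketch of colour-space encoding and decoding. The colour table used here (colour = XOR of the 2-bit codes A=0, C=1, G=2, T=3, so that AA/CC/GG/TT give 0 and AT/TA/CG/GC give 3) is the standard SOLiD two-base encoding, stated from general knowledge rather than from this text. The function names are hypothetical, and the "." symbol for an unresolved colour follows the csfasta convention.

```python
BASES = "ACGT"  # index of each base is its 2-bit code: A=0, C=1, G=2, T=3


def encode(primer_base: str, sequence: str) -> str:
    """Encode a base sequence as a colour-space read.

    The read starts with the last adapter base, followed by one colour
    (0-3) per di-base; each base therefore contributes to two colours.
    """
    colours = []
    previous = BASES.index(primer_base)
    for base in sequence:
        current = BASES.index(base)
        colours.append(str(previous ^ current))
        previous = current
    return primer_base + "".join(colours)


def decode(read: str) -> str:
    """Decode a colour-space read back into bases.

    Decoding is a chain: each base depends on the previous one, so an
    unresolved colour ('.') makes every following base unknown ('N').
    """
    previous = BASES.index(read[0])
    bases = []
    for i, colour in enumerate(read[1:]):
        if colour == ".":
            bases.append("N" * (len(read) - 1 - i))
            break
        previous ^= int(colour)
        bases.append(BASES[previous])
    return "".join(bases)


colour_read = encode("T", "ACTA")
round_trip = decode(colour_read)
truncated = decode("A01.3")
```

Because decoding is a chain, a single miscalled colour changes every downstream base, whereas a true SNP changes exactly two adjacent colours. This is why alignment is usually done in colour space rather than after decoding.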
504. st cancer cells whose expression specificity was previously reported by others, notably ERBB3, XBP1, KRT18, IL6ST, CREB1, TFF1 and TFF3 (see Supplementary Table 3). Thus, TBrowser can be used to perform meta-analysis of microarray data in a platform-independent manner, providing high-confidence gene lists. However, one can also focus the analysis on a single platform. Indeed, the transcriptional signatures 3DE64836D, B79B1C0B9 and E2E620F40, which were derived from the GPL570 platform (which measures over 47,000 transcripts), share a list of 68 genes. Many of them correspond to poorly characterized genes, for example C17orf28, C1orf64, KIAA1370, KIAA1467, LOC143381, LOC400451, LOC92497 and ZNF703. This example clearly demonstrates the superiority of TBrowser over conventional approaches, as it can be used easily and productively to create robust sets of transcriptionally related genes whose subsequent analysis may be crucial in defining new therapeutic targets. Using annotation terms to mine public microarray data. Based on the systematic functional enrichment analysis, the vast majority of TS (84%) have a set of associated biological terms (only functional enrichments with q-value < 0.01 are stored in the database). One can search for TS related to functional terms of the DAVID knowledgebase (e.g. nervous system development). More interestingly, multiple terms can be combined with Boolean operators. Searching for
505. stello J F Ren B Milosavljevic A Meissner A Kellis M Marra M A Beaudet A L Ecker J R Farn ham P J Hirst M Lander E S Mikkelsen T S amp Thomson J A 2010 The NIH Roadmap Epigenomics Mapping Consortium Nature biotechnology 28 10 1045 8 Bertos amp Park 2011 Bertos N R amp Park M 2011 Breast cancer one term many enti ties The Journal of clinical investigation 121 10 3789 96 Bertucci et al 2004 Bertucci F Finetti P Rougemont J Charafe Jauffret E Nasser V Loriod B Camerlo J Tagett R Tarpin C Houvenaeghel G Nguyen C Maraninchi D Jacquemier J Houlgatte R Birnbaum D amp Viens P 2004 Gene expression profiling for molecular characterization of inflammatory breast cancer and prediction of response to chemotherapy Cancer research 64 23 8558 65 Bhinge et al 2007 Bhinge A A Kim J Euskirchen G M Snyder M amp Iyer V R 2007 Mapping the chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic Enrichment STAGE Genome research 17 6 910 6 Billon amp C t 2011 Billon P amp C t J 2011 Precise deposition of histone H2A Z in chro matin for genome expression and maintenance Biochimica et biophysica acta Blat amp Kleckner 1999 Blat Y amp Kleckner N 1999 Cohesins bind to preferential sites along yeast chromosome III with differential regulation along arms versus the
506. over- and under-expressed in one of the conditions. Once these data have been put into context, the next step of the analysis is the generation of gene networks. These graphs require integrating, in addition to the available transcriptional regulation data, other types of data such as protein-protein interactions. CHAPTER 4: Microarray data mining. Contents: 4.1 Data storage; 4.1.1 Quality and traceability; 4.1.2 MySQL databases; 4.1.3 Database optimizations; 4.2 Meta-analysis and data integration; 4.2.1 Biology databases; 4.2.2 Databases dedicated to microarray data; 4.2.3 Data structure in Gene Expression Omnibus (GEO); 4.2.4 Re-analyses and meta-analyses of datasets from GEO; 4.3 Project context; 4.4 Application development; ARTICLE 4: TranscriptomeBrowser: a powerful and flexible toolbox to explore productively the transcriptional landscape of the Gene Expression Omnibus database; 4.5 Database update and data integration
507. … csfasta is a fasta file whose quality data are ordered not by ligation cycle but according to the colour-code sequence. As described previously (see section 1.4.1.1), colour-format sequences begin with a base corresponding to the last base of the adapter, followed by a series of digits between 0 and 3 (0, 1, 2, 3), each corresponding to one of the 4 di-base colours. A "." is used for a colour-code position for which no data are available (fluorochrome call impossible, no signal). The second file, in _QV.qual format, contains quality scores, one for each di-base read. These are computed using the Sanger quality value calculation (Ewing & Green 1998; Ewing et al. 1998), phred, such that $QV = -10 \log_{10}(p)$, where QV is the quality value and p the predicted probability that a detected colour is incorrect. These quality values (QV), assigned to each base, range from 0 to 40. Unresolved positions in the sequences (noted ".") have a score of -1. These scores are taken into account during alignment. They characterize the quality of the DNA sequences and can be used to compare the efficiency of the different sequencing technologies. In paired-end sequencing mode, each fragment (denoted F3 and F5) is sequenced
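As a worked example of the phred relation QV = -10 log10(p), the following Python sketch converts between quality values and error probabilities. The function names are hypothetical. Treating a score of -1 as "no information" follows the description above, but skipping such positions when summing is an assumption of this sketch.

```python
import math
from typing import Iterable


def qv_from_error_prob(p: float) -> float:
    """Quality value for a predicted error probability p (0 < p <= 1)."""
    return -10.0 * math.log10(p)


def error_prob_from_qv(qv: float) -> float:
    """Inverse relation: p = 10 ** (-QV / 10)."""
    return 10.0 ** (-qv / 10.0)


def expected_errors(qvs: Iterable[int]) -> float:
    """Expected number of miscalled colours in a read.

    Positions scored -1 (unresolved colour) carry no probability
    estimate and are skipped.
    """
    return sum(error_prob_from_qv(q) for q in qvs if q >= 0)


q30 = qv_from_error_prob(0.001)   # one error in 1000 calls
p20 = error_prob_from_qv(20)      # QV 20 means a 1% error probability
read_errors = expected_errors([20, -1, 10])
```

On this scale every 10 points correspond to a ten-fold change in error probability, so the upper bound of 40 mentioned above corresponds to one expected error in 10,000 calls.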
508. … the user to enter a list of genes (for example by a simple copy-paste into the query area) and to ask which signatures contain at least a given proportion of the genes in this list. It can thus be used starting from a group of genes found to be differentially expressed in a microarray study, or from the target genes of a given transcription factor obtained from an epigenetic ChIP-seq study. 4.6.2 Improvements and new plugins. Since the publication of TBrowser in 2008, some improvements have been made to the existing plugins: Heatmap, developed by Fabrice Lopez (Figure 4.6); TBNeighborhood, formerly TBCommonGenes, with the addition of gene information (various identifiers and annotations) that was not accessible in the first version of the database (Figure 4.6); TBMap, developed by Fabrice Lopez: I modified the plugin to allow the inclusion of transcriptional maps from new species (other than human and mouse), and other improvements were also made (visualization of the genes belonging to a KEGG signalling pathway, zoom from a selection on the transcriptional map, gene-gene correlation). Several new plugins have been developed (Figure 4.6): AnnotationOverview, allowing the visualiz
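The gene-list query described above (signatures containing at least a given proportion of the genes in a list) amounts to a set-overlap filter. The Python sketch below illustrates the idea only. It does not reflect how TBrowser queries its database, and the function name, signature identifiers and gene sets are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def signatures_covering(signatures: Dict[str, Set[str]],
                        query: Set[str],
                        min_fraction: float) -> List[Tuple[str, float]]:
    """Return (signature id, fraction of query genes found), keeping only
    signatures that contain at least min_fraction of the query genes,
    best coverage first."""
    hits = []
    for sig_id, genes in signatures.items():
        fraction = len(query & genes) / len(query)
        if fraction >= min_fraction:
            hits.append((sig_id, fraction))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical signatures, for illustration only.
toy_signatures = {
    "TS_A": {"ESR1", "GATA3", "FOXA1", "KRT18"},
    "TS_B": {"ESR1", "CD3E"},
    "TS_C": {"CD14", "LYN"},
}
gene_list = {"ESR1", "GATA3", "FOXA1", "XBP1"}
result = signatures_covering(toy_signatures, gene_list, min_fraction=0.5)
```

The fraction is computed relative to the query list, not the signature, so a large signature is not penalized for containing many additional genes.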
509. … It is thus possible to access the transcriptional signatures containing breast cancer markers such as ESR1, GATA3 and FOXA1 with a query such as ESR1 & GATA3 & FOXA1. It is also possible to exclude genes by prefixing them with a negation character. This makes it possible to filter the signatures obtained, for example when querying with T-cell markers, in order to exclude signatures containing genes specific to another cell type (monocytes), as in CD3E & CD3D & CD14 with CD14 negated. Queries are not restricted to genes, however: the database can also be interrogated by annotation, probe, platform or experiment. One can thus retrieve all the signatures functionally enriched in cell-cycle genes (CELL CYCLE S 12 18). The results of a query are presented as a list of TS linked to the corresponding platforms and experiments, all of this information being loaded when the results are displayed. Various features have been added through the development of modules, or plugins, that make use of the query results. When TBrowser was published (Lopez et al. 2008), only 3 plugins were presented and made available: Heatmap, which allows signature-by-signature visualization of expression matrices
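The logic of such an AND/NOT gene query can be sketched with Python sets. This is an illustration of the Boolean semantics only, not of the TBrowser query parser. The function name and the signature contents are hypothetical, and the required and excluded genes are passed as two separate sets instead of being parsed from a query string.

```python
from typing import Dict, FrozenSet, List, Set


def query_signatures(signatures: Dict[str, Set[str]],
                     required: Set[str],
                     excluded: FrozenSet[str] = frozenset()) -> List[str]:
    """Ids of signatures that contain every required gene (AND)
    and none of the excluded genes (NOT)."""
    return sorted(
        sig_id
        for sig_id, genes in signatures.items()
        if required <= genes and not (excluded & genes)
    )


# Hypothetical signatures, for illustration only.
toy = {
    "TS1": {"CD3E", "CD3D", "CD2"},    # T-cell-like signature
    "TS2": {"CD3E", "CD3D", "CD14"},   # mixed with a monocyte marker
    "TS3": {"CD3E"},
}
t_cell_only = query_signatures(toy, {"CD3E", "CD3D"}, frozenset({"CD14"}))
any_t_cell = query_signatures(toy, {"CD3E", "CD3D"})
```

With the exclusion, only TS1 is returned. Without it, the mixed signature TS2 is also retrieved, which is the situation the negation is meant to filter out.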
510. t was cloned into the KpnI/XbaI cloning sites of the pcDNA 3.1 TOPO vector and named IKBKAP fullEx2cal. For ABL1, the last 37 nt of exon 2 and the first 102 nt of exon 3 were amplified from ABL1 cDNA, cloned into the KpnI/XbaI cloning sites of the pcDNA 3.1 TOPO vector and named ABL1 cal. All plasmid calibrators were linearized with XbaI and serially diluted in a (December 2010, Volume 5, Issue 12, e15590)
Table 2. Sequence of primers used for end-point and TaqMan real-time PCR.
Columns: Primer/Probe; Sequence; T (°C); Amplicon size (bp); Splicing events.
End-point PCR:
hIKBKAP ex17 18F; TCATCAATGACATTGAGGTTG; 55; 446 WT; ex20 incl skip
hIKBKAP ex22R; ATGATTCACAGAATCTATCTG; 372 MU
hIKBKAP ex1F; CCGGACGCACCTCTGTITG; 60; 485; alt 3 ss for ex2
hIKBKAP ex4 5R; TCAGGGTCTGTTGACCTGTG; 340; alt 3 ss ex2
hIKBKAP ex33 34F; TCCAGGATATCAGCGAGATC; 59; 449; ex36 incl skip
hIKBKAP ex37R; GCTGATAAGATGCCATGATAC; 346; ex36
hIKBKAP ex35 36R; TTGGGACCTAGAACACCTGT; 59; 414; ex36 incl
Real-time PCR:
hELP1 ex19F; GGTTCACGGATTGTCACTGTT; 60; 133; ex20 incl
hELP1 ex20 21R; ACATAAGTTTGTCCAACCACTTCC
P WTELP1 ex20R; AAACCAGGGCTCGATGATGAACA
hELP1 ex19 21F; GGACACAAAGCTTGTATTACAGACTTA; 60; 121; ex20 skip
hELP1 ex21 22R; CCACATTTCCAAGAAACACCT
P MUELP1 ex21F; AGAGGCATTTGAATGCATGAGAAAGC
hELP1 ex2F; CCAGGGAATCCTCAGTGCT; 60; 104; full length ex2 incl
hELP1 ex2 3R; TTCACTTCTCTTGAGACAGGGTCTAC
P WTELP1 ex2F; TCCGACTGAACAGGGGACGGT
hELP1 ex35 37F; CAGCTACCCCGGTTCTAGGT; 60; 128; ex36 skip
hELP1 ex38R
511. t 115 1111 1119 Schmitz G Grandl M 2008 Lipid homeostasis in macrophages implications for atherosclerosis Rev Physiol Biochem Pharmacol 160 93 125 Yagmur E Trautwein C Gressner AM Tacke F 2006 Resistin serum levels are associated with insulin resistance disease severity clinical complications and prognosis in patients with chronic liver diseases Am J Gastroenterol 101 1244 1252 van Gorp EC Suharti C Mairuhu AT Dolmans WM van Der Ven J et al 2002 Changes in the plasma lipid profile as a potential predictor of clinical outcome in dengue hemorrhagic fever Clin Infect Dis 34 1150 1153 Lee CY Seet RC Huang SH Long LH Halliwell B 2008 Different patterns of oxidized lipid products in plasma and urine of dengue fever stroke and Parkinsons disease patients Cautions in the use of biomarkers of oxidative stress Antioxid Redox Signal Soundravally R Sankar P Bobby Z Hoti SL 2008 Oxidative stress in severe dengue viral infection association of thrombocytopenia with lipid peroxidation Platelets 19 447 454 Suvarna JC Rane PP 2009 Serum lipid profile a predictor of clinical outcome in dengue infection Trop Med Int Health 14 576 585 Marik PE 2006 Dyslipidemia in the critically ill Crit Care Clin 22 151 159 viii Kruger PS 2009 Forget glucose what about lipids in critical illness Crit Care Resusc 11 305 309 Rothwell C Lebreton A Young Ng C Lim JY Liu W et al 2009 Cholesterol biosynthesi
512. t and H Puggelli for technical support and help in preparation of field work References 1 Gubler DJ 2002 Epidemic dengue dengue hemorrhagic fever as a public health social and economic problem in the 21st century Trends Microbiol 10 100 103 2 Khun S Manderson L 2008 Poverty user fees and ability to pay for health care for children with suspected dengue in rural Cambodia Int J Equity Health 7 10 3 Peters KG 1998 Vascular endothelial growth factor and the angiopoietins working together to build a better blood vessel Circ Res 83 342 343 4 Basu A Chaturvedi UC 2008 Vascular endothelium the battlefield of dengue viruses FEMS Immunol Med Microbiol 53 287 299 5 Pang T Cardosa MJ Guzman MG 2007 Of cascades and perfect storms the immunopathogenesis of dengue haemorrhagic fever dengue shock syndrome DHF DSS Immunol Cell Biol 85 43 45 6 Green S Rothman A 2006 Immunopathological mechanisms in dengue and dengue hemorrhagic fever Curr Opin Infect Dis 19 429 436 7 Lin CF Wan SW Cheng HJ Lei HY Lin YS 2006 Autoimmune pathogenesis in dengue virus infection Viral Immunol 19 127 132 8 Murgue B 2009 Severe dengue questioning the paradigm Microbes Infect 12 113 118 9 Fink J Gu F Vasudevan SG 2006 Role of T cells cytokines and antibody in dengue fever and dengue haemorrhagic fever Rev Med Virol 16 263 275 Libraty DH Acosta LP Tallo V Segubre Mercado E Bautista A et al 2009
513. … of bioinformatics methods for characterizing the mechanisms involved in various diseases through a genome-wide transcriptomic approach, but also an epigenetic one, and through the study of the transcriptional regulation of gene expression. I assessed the quality of Agilent DNA microarray data, then normalized and analyzed them as part of collaborations with other research teams. I thus worked with Dr Patricia Paris of the Institut de Médecine Tropicale du Service de Santé des Armées (IMTSSA, Marseille) on dengue (a viral infection), and with Dr El Chérif Ibrahim of the NICN (CNRS UMR 6184, Faculté de Médecine Nord, Marseille) on the study of an orphan neurodegenerative disease, familial dysautonomia. These collaborations led to three publications, two in PLoS ONE and one in Human Mutation (see Chapters 2 and 3). The TranscriptomeBrowser project, published in December 2008 in PLoS ONE, was continued. I developed new features and also restructured, optimized and updated our database. This work is the subject of one article accepted in BMC Bioinformatics and another in preparation (see Chapter 4). In April 2009, the technological shift marked by the arrival of a very-high-throughput sequencer (SOLiD v3) on the TGML platform allowed me to develop…
514. t gP V1 Min 4 487e 00 ist Qu 6 402e 00 Median 5 353e 01 Mean 1 997e 03 3rd Qu 5 759e 02 Max 1 594e 05 V4 Min 8 833e 00 ist Qu 5 578e 00 Median 5 169e 01 Mean 1 980e 03 3rd Qu 5 575e 02 Max 1 573e 05 slot gM ist Qu Median Mean 3rd Qu Max FON OTD amp 535e 00 802e 00 704e 01 170e 03 300e 02 697e 05 V3 Min ist Qu Median Mean 3rd Qu Max FONDA oO amp 415e 00 763e 00 347e 01 138e 03 970e 02 521e 05 V1 V2 V3 Min 61 09 Min 57 8 Min 63 43 Min ist Qu 81 17 ist Qu 82 9 ist Qu 83 61 ist Qu Median 131 14 Median 135 5 Median 141 11 Median Mean 2076 09 Mean 2252 2 Mean 2217 22 Mean 3rd Qu 652 93 3rd Qu 706 1 3rd Qu 772 31 3rd Qu Max 128480 40 Max 128453 9 Max 128744 30 Max slot gBGM V1 V2 V3 Va Min 39 17 Min 86 41 Min 40 54 Min 38 ist Qu 46 12 ist Qu 46 32 ist Qu 46 50 ist Qu 46 Median 48 00 Median 48 21 Median 48 33 Median 47 Mean 49 07 Mean 49 32 Mean 52 24 Mean 48 3rd Qu 50 48 3rd Qu 50 75 3rd Qu 50 65 3rd Qu 49 Max 180 31 Max 226 48 Max 25870 13 Max 152 slot CtrT 1 0 1 153 43379 1486 slot Flag name array ok ace gi j l total flags Array 1 251487911262_1_1 42357 115 5 O 16 0 834 2570 2661 Array 2 251487911262_1_2 42421 128 3 0 28 0 872 2489 2597 Array 3 251487911262_1_3 42244 113 3 0 45 0 774 2661 2774 Array 4 251487911262_1_4 42921 113 2 0 31 0
515. … from this research. The development of these applications required the creation of numerous languages according to programmers' needs; these are grouped into styles, or paradigms: procedural languages are languages in which a procedure (also called a function) is a sequence of instructions that must be carried out without error in a precise order (a procedure is sometimes distinguished from a function by the fact that a procedure does not return a result); object-oriented languages use objects, which are independent semantic structures that bring together both data and processing; query languages are designed to query and manipulate databases; data definition languages do not process data but only describe their structure, as lists or trees, and the instances of these structures. A language can, however, be associated with several paradigms. Table 1.5 below lists the languages used during my thesis. Language; Procedural; Object; Query; Definition: bash x; C x; gawk x; java x x; LaTeX x; perl x; R x x; SQL x x; XML x. TABLE 1.5: Paradigms associated with the languages used during this thesis. Results. CHAPTER 2: Quality control and normalisa…
516. Figure 3. The TBrowser 2.0 interface. The main window of TBrowser is made of five panels (highlighted in red): the search panel (1), the results panel (2), the information panel (3), the plugins panel (4) and the plugin display panel (5). This example shows the expression profiles of genes contained in the TS CBE3881EB, derived from GSE469 (Temporal profiling in muscle regeneration). The annotation panel shows that this TS is highly enriched in genes related to ATP synthesis. doi:10.1371/journal.pone.0004001.g003 be almost instantaneously processed by the server. With the current database release, this produces a list of 16 TS (see Table 1) containing on average 508 probes (range: 82 to 1,572), which were obtained using various microarray platforms (GPL96, GPL570, GPL91). Interestingly, all these TS are related to experiments performed on breast cancer cells, underlining the high specificity of this gene list (Table 1). The TBCommonGenes plugin indicates that, in addition to ESR1, GATA3 and FOXA1, two genes (ANXA9 and ERBB4) are found in all 16 TS. Importantly, 63 genes are found in at least 10 of the 16 selected TS (63%). As expected, this list contains numerous markers of brea
517. te developmental transition. Thus, one could hypothesize that Mirn17 (Mir17) may indirectly affect Notch1 by negatively regulating Mycn. Although these hypotheses rely on predictions and on the assumption that Mycn binding to the Notch promoter is effective in DN3 thymocytes, this clearly underlines the potential of this software in helping researchers to draw new hypotheses using data integration. Conclusions. InteractomeBrowser and its underlying approach can be compared to the Cerebral (Cell Region-Based Rendering And Layout) plugin of Cytoscape, which also combines molecular interactions with a cell-compartment-based layout [11]. But there are qualitative differences in the design of Cerebral and InteractomeBrowser which make the latter an interesting alternative for exploring networks. On the one hand, Cerebral uses a layered representation of the cell to create a pathway-like view of the network of interacting proteins. This layout thus provides a linear organisation of the network. On the other hand, the layout of InteractomeBrowser is based on a schematic view of the entire cell and displays the hierarchical structure of the underlying Gene Ontology subset as nested zones. First, this helps to visually separate different parts of the network corresponding to different cellular localisations, as in Cerebral. But this is a more generic visualisation method, in the sense that it does not restrict the visual message to an input intermediate
518. ted with the 4,470 selected genes. (D) The graph after MCL partitioning. Each point is colored according to its associated class. (E) Correspondence between hierarchical clustering and DBF-MCL results. (F) TS obtained for GSE1456. (G) Functional enrichment associated with these TS. doi:10.1371/journal.pone.0004001.g001 Spearman's rank correlation coefficient was used for k-nearest-neighbor computation. This rank-based distance is known to be clearly more resistant to outlying data points than Pearson-based distance and thus ensured the selection of genes belonging to unmistakable clusters. The full pipeline was run on a server equipped with 6 CPUs and took about 4 days to complete. For the sake of clarity, only results obtained with GPL96, which is the most widely used Affymetrix microarray platform, will be presented in this section. 311 experiments related to GPL96 were analyzed (12,752 hybridized samples). On average, 4,341 probes (min: 832, max: 5,849) per expression matrix were declared informative by DBF-MCL, suggesting that routinely 20% of the 22,283 probes measured on the HG-U133A array belong to a natural cluster. Graph partitioning generated on average 10.8 clusters (min: 2, max: 29) for each experiment, and each cluster contained approximately 400 probes, corresponding on average to 370 distinct gene symbols. Figure 2 shows a summary of these results. As expected, no clear correlation was observed between the number of selected genes and the
519. terested in retrieving all TS IDs that contain the CD4 gene, as they likely contain additional T-cell markers. Comparing these TS IDs should help you to define frequent CD4 neighbors, very likely related to the TCR signaling cascade. Your request should therefore be:
> res <- getSignatures(field = "gene", value = "CD4")
371 signatures were found for the request gene CD4
This gene is found in 371 TS with the current database release. Obtaining the associated gene lists would be time-consuming and would not be as specific as expected. Indeed, the CD4 marker is also expressed by macrophages. Another solution would be to search for TS containing two T-cell markers (CD4 and CD3E, for instance) and to exclude, using the NOT operator, those containing the CD14 marker (a macrophage marker). The syntax should be the following:
> res <- getSignatures(field = "gene", value = "CD4 & CD3E & !CD14")
55 signatures were found for the request gene CD4 & CD3E & !CD14
In the same way, one can try to exclude TS containing B-cell markers by discarding those containing the CD19 or IGHM markers. The resulting query would be the following:
> res <- getSignatures(field = "gene", value = "CD4 & CD3E & !(CD19 | IGHM)")
33 signatures were found for the request gene CD4 & CD3E & !(CD19 | IGHM)
2.2 Finding the biological contexts in which sets of genes are co-expressed. As mentioned by Lacroix et al., ESR1, GATA3
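The Boolean gene queries described above (AND, OR, NOT over the gene content of each signature) can be illustrated with a small stand-alone sketch. This is not the RTools4TB or TBrowser implementation: the operator characters `&`, `|` and `!`, the function names and the toy signature contents are all assumptions made for illustration, and the real service evaluates queries against its own database.

```python
import re

# A token is a gene symbol or one of the operators & | ! ( ).
# The operator set is an assumption made for this sketch.
_TOKEN = re.compile(r"\s*([A-Za-z0-9_.\-]+|[&|!()])")


def tokenize(query):
    """Split a query such as 'CD4 & CD3E & !CD14' into tokens."""
    query = query.strip()
    tokens, pos = [], 0
    while pos < len(query):
        match = _TOKEN.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def matches(query, genes):
    """Return True if the gene set `genes` satisfies the Boolean query."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse the right-hand side
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        token = take()
        if token == "!":
            return not parse_not()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected operator {token!r}")
        return token in genes

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in query")
    return result


def find_signatures(query, signatures):
    """Return the sorted IDs of the signatures whose gene set matches."""
    return sorted(ts for ts, genes in signatures.items() if matches(query, genes))


if __name__ == "__main__":
    # Toy signature contents, invented for the example.
    toy = {
        "TS1": {"CD4", "CD3E", "CD3D"},
        "TS2": {"CD4", "CD3E", "CD14"},
        "TS3": {"CD4", "CD3E", "CD19"},
    }
    print(find_signatures("CD4 & CD3E & !(CD19 | IGHM)", toy))
```

Evaluating the expression directly while parsing keeps the sketch short; a real query engine would more likely translate the expression into set operations or SQL so that the database indexes can be used.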
520. that are outside any natural gene cluster. These genes should be discarded from analysis, as they are inevitably associated with false-positive neighbors. These considerations motivated the present work and the development of a new approach that follows a transcriptional-signature-centered perspective. The goal was to build an application that would interact with a large database of transcriptional signatures and would implement efficient tools to analyze and visualize the results. The first issue resided in the construction of a database containing high-quality transcriptional signatures obtained in an automated fashion. Both supervised and unsupervised classification algorithms can be used in microarray data analysis [5]. Supervised methods aim at finding a set of genes whose expression profiles best correlate with a known phenotype. They provide a way to select informative genes by choosing the top k genes according to the results of a statistical test (e.g. Student's t-test, Significance Analysis of Microarrays, Signal-to-Noise Ratio, ANOVA) and by controlling the false discovery rate (FDR). In contrast, unsupervised classification approaches achieve clustering of genes based on their respective expression profiles but are not intended to filter out uninformative genes. Some popular approaches in microarray analysis use either agglomerative methods (hierarchical clustering), partitioning methods (k-medoids, k-means, PAM, SOM, etc.) or m
521. the comprehension of this complex rare disease and offers a potential system for testing therapeutic agents. However, transgenic animals do not reproduce phenotypic features of FD, as they maintain normal development. Alternatively, FD patient fibroblasts are an informative model of mRNA splicing regulation. However, a recent study suggests that IKAP/hELP1 expression is much higher in neurons compared to fibroblasts [21], and fibroblasts do not exhibit the same ratio of IKBKAP exon 20-including to exon 20-skipping transcripts (named WT and MU, respectively, for simplicity) as observed in nervous-system-derived tissues [8]. This finding narrows the understanding of disease mechanisms in a neural context. Finally, generation of neural cells through the production of induced pluripotent stem (iPS) cells from FD fibroblasts has been recently established [22]. Neural cells derived from iPS cells have potential to be used for studies of neuropathologies [23]. However, the labor-intensive reprogramming required to induce iPS cells erases the developmentally relevant epigenetic signature specific to the disease state. As a consequence, some important information may be lost, impeding recreation of an accurate disease model. The demonstration that fibroblasts can be converted directly into neurons without an initial reprogramming, as recently evidenced in mouse [24], is very attractive. Nevertheless, during their reprogramming, human iPS cells do not pass through
522. they may altogether contribute to DSS pathophysiology. Activation of a pro-inflammatory defence gene pattern in DSS patients' blood cells (Table 4) has relevance to the pathophysiology of systemic vascular dysfunction, since most microbicidal peptides and enzymes have recognized pro-inflammatory and pathogenic effects towards vascular endothelial tissues [50]. Among them, the neutrophil microbicidal peptides alpha-defensins and the highly pro-inflammatory calgranulin proteins S100A8/A9 and S100A12 are now considered putative pathogenic factors in sepsis, cardiovascular diseases, rheumatoid arthritis or atherosclerosis [51-56, 104]. While neutrophils are considered the main source of those defence molecules, this cellular origin cannot be established from the present study, due to the cellular complexity of unfractionated whole-blood samples and to the possibility that other circulating cell types may express a neutrophil-like inflammatory repertoire under pathologic conditions [105]. A putative neutrophil origin of this gene expression pattern is, however, supported by the over-expression in DSS patients' blood cells of transcripts encoding other neutrophil-related molecules, such as the MMP8 matrix metalloprotease and the CEACAM6, CEACAM8 and CD99L2 adhesion molecules (Tables 4 and S2), involved in the recruitment of neutrophils to vascular endothelia. Functional studies should confirm whether those first-line defence immune cells, which produce an array of
523. … methods. (B) Representation of the peaks obtained by these different methods at a given position in the genome. Adapted from Wilbanks & Facciotti (2010).
Main steps of the analysis pipeline for Chromatin ImmunoPrecipitation (ChIP-seq) data on the TGML platform.
Principle of the algorithm and analysis pipeline of the peak detection program developed at the TAGC laboratory.
List of tables
1.1 Comparison of the main very-high-throughput sequencing technologies. Shaded cells correspond to latest-generation very-high-throughput technologies (NGS), which will be described later in this manuscript. The marked entries are benchtop sequencer models, small in size and low in throughput but very fast.
1.2 Characteristics of the three most widespread sequencer models.
1.3 Applications and advantages of paired-end reads for very-high-throughput sequencing techniques.
1.4 The main applications of very-high-throughput sequencing. The application developed in more detail in Chapter 5 of this manuscript is shown in bold.
1.5 Paradigms associated with the languages used during this thesis.
2.1 Summary of the main tools allow…
524. … from one sample to the other. This is generally used to reduce or eliminate the effects of confounding factors that are independent of membership of one of the two groups. An unpaired t-test can normally be used when two distinct sets of independent and identically distributed samples are compared. One prerequisite of this test is that the variance of the two samples is identical (homoscedasticity). Unlike Student's t-test, Welch's t-test takes into account the inequality of variance between the two groups of samples, and can therefore be applied when the homoscedasticity assumption does not hold, which is often the case for DNA microarray data. Welch's t-test defines the statistic t by the following formula:
$$ t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}} $$
where $\bar{x}$, $s$ and $n$ correspond, in the case of DNA microarrays, to the mean of the intensities of a gene, the standard deviation and the size of the sample groups (A or B), respectively. For each gene, a p-value is estimated either from the distribution of the t statistic, i.e. Student's distribution (Figure 3.1), or from permutations, which makes it possible to define an FDR.
[Figure 3.1 labels: probability density as a function of t; rejection of H0 in the tails, non-rejection of H0 in the center]
FIGURE 3.1: Student's distribution. A graphical representation
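As a worked illustration of the Welch statistic defined above, here is a minimal sketch for a single gene. The function name and the example intensities are invented for the illustration. The Welch-Satterthwaite degrees of freedom are not given in the text; they are added here because they are what would be needed to read a p-value from Student's distribution.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom for one gene.

    group_a, group_b: expression intensities of the gene in groups A and B
    (each group needs at least two values).
    """
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the unbiased sample variance s^2
    var_a, var_b = variance(group_a), variance(group_b)
    se2 = var_a / n_a + var_b / n_b  # squared standard error of the difference
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation (an addition, not from the text)
    df = se2 ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, df


if __name__ == "__main__":
    # Invented log2 intensities for one gene in two groups
    t, df = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
    print(round(t, 4), round(df, 3))
```

Because the two variances are kept separate in the denominator, the statistic stays valid when one group is much more dispersed than the other, which is the situation the text describes for microarray data.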
525. ting human model to investigate gene networks and cellular pathways altered in diseases like FD. For example, cell migration defects have been observed in cells lacking normal expression of IKAP/hELP1 [10, 11, 22, 34], and we show here that FD hOE-MSCs exhibit impaired migration compared to control cells. Additionally, hOE-MSCs are an appropriate model for validating the potency of therapeutic agents such as kinetin, a cytokinin that has been shown to increase IKBKAP mRNA and protein expression in FD cell lines and in vivo models [20, 22, 35, 36], as well as in leukocytes of healthy carriers of the FD mutation [37]. Results. FD hOE-MSCs express stem cell, glial and immature neuronal markers. To establish a human cellular model of FD, we collected 4 olfactory mucosa biopsies from patients homozygous for the IVS20+6T>C FD mutation. As previously demonstrated with control biopsies [30], after about 2 weeks of culture, microscopic examination of the tissue crushed under a glass coverslip revealed stem cell proliferation (Figure 1A and 1B). After reaching confluency in a 4-well plate, the cells attached to the glass coverslip were further expanded by transfer into a 6-well plate (Figure 1C). As with control hOE-MSCs, we observed that FD hOE-MSCs could be cultured for long periods (at least 15 cycles of trypsin-EDTA treatment and expansion on a larger plastic surface), with a doubling time of about 30-48 h. When subjected to immunostaining, all hOE-MSCs derive
526. … to study genomic amplifications and deletions across the whole genome (Solinas-Toldo et al., 1997; Snijders et al., 2001); arrays covering the entire genome with overlapping fragments (tiling arrays), for applications such as ChIP-on-chip (see section 5.1.1); SNP (Single Nucleotide Polymorphism) genotyping arrays, which allow polymorphisms to be analyzed (Pastinen et al., 2000). A DNA microarray experiment proceeds through the following steps: design of the experiment, labeling and hybridization, data acquisition and processing, analysis and interpretation of the results (Figure 1.2). The specific features of the acquisition, correction and normalization of Agilent™ technology data will be presented and discussed in Chapters 2 and 3 of this manuscript. The meta-analysis of microarray data will be presented through the TranscriptomeBrowser project, developed at the TAGC and to which I contributed (see Chapter 4). FIGURE 1.2: Course of a DNA microarray experiment, from experimental design to data storage, by way of data processing and analysis. Each of these steps required the development of bioinformatics tools, which are detailed in Chapters 2 to 4 of this manuscript. Chapter 1. General introduction. 1.3 Regulation of gene expression. The regulation of…
527. tion and in myelination Hum Mol Genet 16 2097 2104 Chen C Tuck S Bystrom AS 2009a Defects in tRNA modification associated with neurological and developmental dysfunctions in Caenorhabditis elegans elongator mutants PLoS Genet 5 e1000561 Chen YT Hims MM Shetty RS Mull J Liu L Leyne M Slaugenhaupt SA 2009b Loss of mouse Ikbkap a subunit of elongator leads to transcriptional deficits and embryonic lethality that can be rescued by human IKBKAP Mol Cell Biol 29 736 744 Close P Hawkes N Cornez I Creppe C Lambert CA Rogister B Siebenlist U Merville MP Slaugenhaupt SA Bours V Svejstrup JQ Chariot A 2006 Transcription impairment and cell migration defects in elongator depleted cells implication for familial dysautonomia Mol Cell 22 521 531 Cohen Kupiec R Pasmanik Chor M Oron Karni V Weil M 2011 Effects of IKAP hELP1 deficiency on gene expression in differentiating neuroblastoma cells implications for familial dysautonomia PLoS One 6 e19147 Colognato H Ramachandrappa S Olsen IM ffrench Constant C 2004 Integrins direct Src family kinases to regulate distinct phases of oligodendrocyte development J Cell Biol 167 365 375 Cook AL Vitale AM Ravishankar S Matigian N Sutherland GT Shan J Sutharsan R Perry C Silburn PA Mellick GD Whitelaw ML Wells CA Mackay Sim A Wood SA 2011 NRF2 activation restores disease related metabolic deficiencies in olfactory neurosphere derived cells from patients with
528. …tion of DNA microarray data. Contents: 2.1 Obtaining raw expression data; 2.1.1 Experimental design and technical biases; 2.1.2 Acquisition of raw data; 2.2 Correction of raw data; 2.2.1 Data preprocessing; 2.2.2 Base-2 logarithm transformation; 2.2.3 Data normalization; 2.3 Project context; 2.4 Choice of developing an R library; 2.5 Principle of the AgiND R library; 2.6 Discussion and perspectives. The aim of a DNA microarray experiment is to identify the transcripts whose expression level varies between different biological conditions of interest. However, these variations may also be due in part, or even almost entirely, to experimental biases. In order to analyze DNA microarray data as well as possible, and above all to be able to compare them with one another during data analysis (see Chapter 3), it is important to take great care at every experimental step, and during preprocessing and normalization of the raw data, in order to limit and/or correct these biases. This is why, at the TAGC laboratory (Inserm…
529. …tion and quantity (Figure 5.13). The choice of algorithm is dictated by the type of transcription factor studied. Thus, for a site-specific transcription factor, software giving narrow peaks is preferred. By contrast, for factors such as Cbp (Creb Binding Protein) or its homolog p300, which is a co-activator for a large number of transcription factors (notably Creb, E2F, Jun, Fos), one looks for broader regions, as is possibly also the case for histone marks. A test alignment was also created to evaluate the algorithms. It corresponds to an alignment containing peaks as well as artifacts such as pile-ups (Figure 5.12). Once identified, the peaks can be included and visualized alongside the alignments using a genome browser, which makes it possible to judge their quality. Finally, this information is now collected in dedicated tools or databases such as Epigraph (http://epigraph.mpi-inf.mpg.de/WebGRAPH). 5.3.5 Motif discovery and search. Once the potential binding peaks of the transcription factor have been located, one can search for transcription factor binding motifs in the sequences lying under the peaks. This can reinforce the results obtained for the transcription fac…
530. tory. 3.1 The agBoxplot function. This function displays the distribution of a slot. The result differs depending on whether one or several arrays are given. The arguments of this function are:
> args(agBoxplot)
function (x, whichSlot = NULL, array = NULL, log = TRUE, centered = FALSE, reduced = FALSE, html = FALSE, pdf = FALSE, horizontal = FALSE)
NULL
3.1.1 Inter-array distribution (i.e. for several arrays). One boxplot is obtained for each array.
> agBoxplot(myob, array = 1:4)
3.1.2 Intra-array distribution (i.e. for just one array). Four boxplots are obtained, showing the distribution of the negative controls, the samples, the positive controls and DarkCorner, which is the most important positive control on the array.
> agBoxplot(myob, array = 1)
[Figure 1 panels: (a) log2 gM per array, not centered and not reduced; (b) Ctrneg, samples (no flags), Ctrpos and DarkCorner, Log2 of slot gM, array 1]
Figure 1: The agBoxplot function. (A) Boxplot obtained for the gMeanSignal slot (the default) and for all the arrays of the AgilentBatch object. (B) Visualization of the distributions of the negative and positive controls for the first array of the AgilentBatch object.
The distributions can also be obtained with the summary function, using the command:
> summary(myob)
Summary of slo
531. try 284 10 6116 25 Tusher et al 2001 Tusher V G Tibshirani R amp Chu G 2001 Significance analysis of microarrays applied to the ionizing radiation response Proceedings of the National Academy of Sciences of the United States of America 98 9 5116 21 Vazquez et al 2010 Vazquez M Nogales Cadenas R Arroyo J Botias P Garcia R Carazo J M Tirado F Pascual Montano A amp Carmona Saez P 2010 MARQ an online tool to mine GEO for experiments with similar or opposite gene expression signatures Nucleic acids research 38 Web Server issue W228 32 Bibliographie 303 Velculescu et al 1995 Velculescu V E Zhang L Vogelstein B amp Kinzler K W 1995 Serial analysis of gene expression Science New York N Y 270 5235 484 7 Visel et al 2009a Visel A Blow M J Li Z Zhang T Akiyama J A Holt A Plajzer Frick I Shoukry M Wright C Chen F Afzal V Ren B Rubin E M amp Pennacchio L A 2009a ChIP seq accurately predicts tissue specific activity of enhancers Nature 457 7231 854 8 Visel et al 2009b Visel A Rubin E M amp Pennacchio L A 2009b Genomic views of distant acting enhancers Nature 461 7261 199 205 Wang et al 2008 Wang X Arai S Song X Reichart D Du K Pascual G Tempst P Rosenfeld M G Glass C K amp Kurokawa R 2008 Induced ncRNAs allosterically modify RNA binding p
532. … To do so, it uses permutations of the measurements to estimate the FDR. The statistic of the SAM test, d, is defined by (Tusher et al., 2001):
$$ d(i) = \frac{\bar{x}_A(i) - \bar{x}_B(i)}{s(i) + s_0} $$
where $\bar{x}_A(i)$ and $\bar{x}_B(i)$ are the mean expression levels of gene $i$ in conditions A and B, and $s(i)$ is an estimate of the variance, representing the standard deviation for gene $i$, such that
$$ s(i) = \sqrt{a \left\{ \sum_m \left[ x_m(i) - \bar{x}_A(i) \right]^2 + \sum_n \left[ x_n(i) - \bar{x}_B(i) \right]^2 \right\}} \quad \text{and} \quad a = \frac{1/n_A + 1/n_B}{n_A + n_B - 2} $$
with $\sum_m$ and $\sum_n$ the sums over the expression values of the samples in groups A and B, respectively, and $n_A$ and $n_B$ the number of samples in groups A and B. This statistic is essentially identical to that of the t-test. The essential difference is the presence, in the denominator, of a corrective factor $s_0$. This factor is a small positive value computed to minimize the effect of the variance. Indeed, low-abundance genes have a low variance. This score is computed for each gene from the two groups supplied by the user. The same computation is carried out, starting from the initial data, over a defined number of permutations of a set of samples, in order to generate a simulated distribution of values $d_E$. The observed $d$ values are then compared with the simulated values $d_E$ (Figure 3.3). Differentially expressed genes are then selected according to the FDR computed from the permutatio…
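The SAM statistic and its permutation-based reference values can be sketched as follows for a single gene. This is a simplified illustration, not the SAM software: the function names are invented, s0 is passed in as a fixed number rather than estimated from the data, and the permutation step simply shuffles the sample labels of one gene.

```python
import math
import random


def sam_d(group_a, group_b, s0):
    """Relative difference d(i) = (mean_A - mean_B) / (s(i) + s0) for one gene."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Pooled sum of squared deviations around each group mean
    ss = sum((x - mean_a) ** 2 for x in group_a) + sum(
        (x - mean_b) ** 2 for x in group_b
    )
    a = (1.0 / n_a + 1.0 / n_b) / (n_a + n_b - 2)
    s = math.sqrt(a * ss)
    return (mean_a - mean_b) / (s + s0)


def permuted_d(values, n_a, s0, n_perm, seed=0):
    """Simulated d values obtained by shuffling the sample labels.

    values: all expression values of the gene (groups A and B pooled)
    n_a:    number of samples assigned to group A in each permutation
    s0 should be > 0 here so that a zero-variance split cannot divide by zero.
    """
    rng = random.Random(seed)
    pooled = list(values)
    simulated = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        simulated.append(sam_d(pooled[:n_a], pooled[n_a:], s0))
    return simulated


if __name__ == "__main__":
    a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]  # invented values
    print(sam_d(a, b, s0=0.0))  # same value as the pooled-variance t statistic
    print(sam_d(a, b, s0=0.5))  # s0 shrinks the score toward zero
    print(len(permuted_d(a + b, n_a=3, s0=0.5, n_perm=100)))
```

With s0 set to 0 the score reduces to the equal-variance t statistic, which makes the role of the corrective factor easy to see: any positive s0 reduces the magnitude of the score, and the reduction is strongest for genes with a small s(i).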
533. uct the standard curves. The number of IKBKAP and ABL transcripts was extrapolated automatically by the Sequence Detection System v2.2.2 software (Applied Biosystems). Microarray analysis and normalization. RNA integrity was assessed using an Agilent 2100 Bioanalyser (Palo Alto, CA). Samples with an RNA integrity number (RIN) < 9 were excluded from the analysis; the samples concerned were C2P5, C3P5 and FD2P5. (PLoS ONE, www.plosone.org) Gene expression analyses were carried out with cDNA nylon microarrays containing 8,780 spotted cDNA clones and radioactive detection, as previously described [64], with 5 µg of RNA reverse-transcribed (oligo-dT priming) in the presence of α-P-dCTP (Amersham Pharmacia Biotech). Details about microarray construction, clone lists, probe preparations, hybridizations and washes have been previously described [65]. After image acquisition, signal intensities were quantified using BZScan software (http://tagc.univ-mrs.fr/bioinformatics/bzscan) [66]. A specific R library that uses the S4 system of formal classes and methods was used to process and normalize nylon microarray data [67]. Quantile normalization was applied to vector probe data (V) and complex probe data (C) to correct for global intensity and dispersion. Correction by the vector signal was made for each spot signal by calculating a C/V ratio before log transformation (base 2). No background correction or overshining correction was used. A
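The two normalization steps named above, quantile normalization across arrays followed by a log2-transformed C/V ratio per spot, can be sketched in a few lines. This is an illustration under simplifying assumptions, not the R library cited in the text: the function names are invented, ties are broken by position rather than averaged, and all intensities are assumed to be strictly positive.

```python
import math


def quantile_normalize(columns):
    """Give every array (column) the same empirical distribution.

    columns: list of equal-length lists, one list of intensities per array.
    Ties are broken by position, which is a simplification.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across arrays
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)
    ]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def log2_ratio(complex_signal, vector_signal):
    """Per-spot log2(C / V), assuming strictly positive intensities."""
    return [math.log2(c / v) for c, v in zip(complex_signal, vector_signal)]


if __name__ == "__main__":
    # Invented intensities: two arrays, three spots each
    print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
    print(log2_ratio([8.0, 4.0], [2.0, 4.0]))
```

Dividing the complex-probe signal by the vector signal before the log transform expresses each spot relative to the amount of DNA actually present on the membrane, which is the correction the text describes.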
534. ue 12 e4001
Chapter 4. Mining DNA microarray data
4.5 Updating the database and integrating data
At the start of my thesis, the database had to be updated to improve performance and data consistency and to include more experiments, species and annotations. The data in the database had been retrieved in 2007, at the start of the project, yet the number of available samples practically doubled in 2 years (Figure 4.3). Finally, once the proof of concept was established, we wanted to strengthen our observations by including other data sources, to help the user build gene networks contextualized by a pathology, a signalling pathway or a given tissue.
4.5.1 Restructuring the database
Given the need to improve the performance of TBrowser, in particular the execution speed of database queries, the schema of this database was completely redefined. The first version of the database contained redundant information and nearly 200 unindexed tables, including one table per platform. I therefore had to normalize the existing tables to reduce redundancy and allow faster access to the data. To reduce this redund
535. ugopal A, Balakrishnan L, Marimuthu A, Banerjee S, Somanathan DS, Sebastian A, Rani S, Ray S, Harrys Kishore CJ, Kanth S, Ahmed M, Kashyap MK, Mohmood R, Ramachandra YL, Krishna V, Rahiman BA, Mohan S, Ranganathan P, Ramabadran S, Chaerkady R, Pandey A: Human Protein Reference Database 2009 update. Nucleic Acids Res 2009, 37:D767-772.
22. Lopez F, Textoris J, Bergon A, Didier G, Remy E, Granjeaud S, Imbert J, Nguyen C, Puthier D: TranscriptomeBrowser: a powerful and flexible toolbox to explore productively the transcriptional landscape of the Gene Expression Omnibus database. PLoS ONE 2008, 3:e4001.
23. Naldi A, Berenguier D, Fauré A, Lopez F, Thieffry D, Chaouiya C: Logical modelling of regulatory networks with GINsim 2.3. BioSystems 2009, 97:134-139.
24. Dreszer TR, Karolchik D, Zweig AS, Hinrichs AS, Raney BJ, Kuhn RM, Meyer LR, Wong M, Sloan CA, Rosenbloom KR, Roe G, Rhead B, Pohl A, Malladi VS, Li CH, Learned K, Kirkup V, Hsu F, Harte RA, Guruvadoo L, Goldman M, Giardine BM, Fujita PA, Diekhans M, Cline MS, Clawson H, Barber GP, Haussler D, James Kent W: The UCSC Genome Browser database: extensions and updates 2011. Nucleic Acids Research 2011.
25. Stormo GD: DNA binding sites: representation and discovery. Bioinformatics 2000, 16:16-23.
26. Thomas-Chollier M, Sand O, Turatsinze J-V, Janky R, Defrance M, Vervisch E, Brohée S, van Helden J: RSAT: regulatory sequence analysis tools. Nucleic Acids Res 2008, 36:W119-127.
536. uits make it possible to generate, or only to display, these classifications (TMeV, Cluster, TreeView, R functions and libraries).
3.2.1 The hierarchical clustering method
Hierarchical clustering has the advantage of being simple to implement, and its result is easy to visualize. It has become one of the most widely used methods for analyzing gene expression data. It is an agglomerative approach in which expression profiles are simply grouped on the basis of their similarity. The groups obtained are then joined until the process is complete, forming a single hierarchical tree, also called a dendrogram (Figure 3.4). Hierarchical clustering is used to display the matrix of normalized expression intensities so that genes with similar profiles can be seen at a glance. In this matrix, whose distribution is median-centered on the genes, each column corresponds to an experiment and each row to the probe of a transcript. Ratios or normalized intensity values are generally shown on a color scale ranging from green (repressed genes) to red (induced genes). This representation is commonly called a heatmap (Figure 3.4).
3.2.2 The k-means method
In the k-means partitioning method, the
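The agglomerative procedure described in section 3.2.1 can be sketched as below. The excerpt does not say which distance or linkage rule is used, so Euclidean distance and average linkage are assumptions made for this illustration, and the function names are hypothetical.

```python
import statistics


def median_center(rows):
    """Subtract each gene's median from its own row (median-centering on genes)."""
    return [[value - statistics.median(row) for value in row] for row in rows]


def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def agglomerate(profiles):
    """Average-linkage agglomerative clustering.

    Returns the list of merges as (members_1, members_2, distance), in the
    order they happen; the last merge is the root of the dendrogram.
    """
    clusters = [[i] for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two groups.
                total = sum(
                    euclidean(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                distance = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or distance < best[0]:
                    best = (distance, i, j)
        distance, i, j = best
        merges.append((clusters[i], clusters[j], distance))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


centered = median_center([[1.0, 2.0, 3.0]])
merges = agglomerate([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
```

The two close profiles (rows 0 and 1) are joined first, and the distant profile is attached last, which is the order a dendrogram would show from the leaves to the root. This brute-force loop is only suitable for a handful of profiles.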
537. ular Mechanisms of DSS
Table 5. Pro-inflammatory lipid-related genes present in the DSS gene signature.
Function | Gene Symbol | P value | Var | Disease | Ref
Lipid-laden Mo/Mac related genes
scavenger receptors of modified LDL in Mo/Mac | OLR1, CD36, MSR1 | <0.00001 to 0.00013 | 0.21 to 0.32 | metabolic diseases | 60 64
lipid nuclear receptor signalisation by lipids | PPARG, PPARA | 0.00007 to 0.00732 | 0.21 to 0.34 | metabolic diseases | 65
efflux of modified cholesterol from Mo/Mac | NPC1 | 0.00005 | 0.32 | Niemann-Pick disease, atherosclerosis | 66 67
| ABCA10 | 0.00016 | 0.14 | none |
migrating Mo / resident Mac chemokine receptors | CCR2, CX3CR1 | 0.00001 to 0.00099 | 0.22 to 0.40 | atherosclerosis | 57
other lipid-laden related Mo/Mac genes | FABP4, SOCS6, RETN, IRS2 | <0.00001 to 0.00092 | 0.20 to 0.26 | metabolic diseases | 68 72
| CHIT1 | <0.00001 | 0.48 | Gaucher's disease, atherosclerosis | 73
| PCSK9 | 0.00001 | 0.42 | familial hypercholesterolemia | 74
| SPP1 | <0.00001 | 0.49 | metabolic and inflammatory diseases | 75 76
antioxidant enzymes | LCAT, PAFAH2 | 0.00196 to 0.00461 | 21 to 26 | metabolic diseases | 77
Arachidonic acid pathway related genes
phospholipase | PLA2G4A | 0.00003 | 0.21 | rheumatoid arthritis | 78
eicosanoid synthesis enzymes | PTGES, LTA4H, PTGDS, TBXAS1, PTGDR | <0.00001 to 0.00123 | 0.22 to 0.63 | metabolic and inflammatory diseases, asthma, cancer | 79 82
leukotriene conversion enzyme | MGST2 | 0.00003 | 0.32 | none |
leukotriene transporter | SLCO2B1 | 0.00010 | 0.31 | asthma | 83
lipid oxidation | ALOX15B | 0.00
538. ulation of tissue-specific gene expression remains a challenging problem. So far, the only candidate gene that may explain the increased aberrant splicing of IKBKAP mRNA in the nervous system is NOVA1, identified by Lee et al. as a downregulated gene in FD iPSC-derived neural crest precursors (Lee et al., 2009). NOVA is a tissue-specific factor regulating alternative splicing in the brain of a large number of genes that function primarily at synapses (Ule et al., 2005). Thus, it has been suggested that this splicing factor may participate in the balance of neuronal excitation and inhibition and is necessary for proper synaptic development and function (Ruggiu et al., 2009). In addition, one of the roles of NOVA proteins may be to enable neurons to adapt their synaptic inhibition in response to neuronal activity (Jelen et al., 2010). In our system, we confirmed a NOVA dysregulation in FD hOE-MSC-derived spheres, supporting this gene's potentially critical role in modulating IKBKAP mRNA alternative splicing. Therefore, we can speculate that NOVA1 may act as a master candidate not only in regulating IKBKAP pre-mRNA splicing in FD but also in the regulation of many other targets involved in the progression of this neurodegenerative disease. To understand the precise role of NOVA1 in mRNA splicing, further experiments modulating its expression in human control and FD cells will be necessary. In addition, it is clear from the initial analysis of postmortem
539. uments of this function are:
> args(agPlot)
function (object, array = 1, whichSlot = NULL, log = TRUE, chrm = NULL, scale = NULL, barplot = TRUE, identify = FALSE, html = FALSE, pdf = FALSE)
NULL
For example, for the first 50 probes of the first chromosome, the plot obtained is:
> agPlot(myobRG, barplot = TRUE, log = FALSE, chrm = 1, scale = 1:50)
[Figure 7: The agPlot function for the first 50 probes of the first chromosome. Bar plot of logRatio for array US45102986_251470610922_S01_CGH-v4_95_Feb07.txt, from chr1:000005748-000005792 to chr1:000931329-000931373.]
4 Normalization
Two normalization methods are used in this library:
• Lowess method
• Quantile method
With both methods, the background signal can be corrected to improve the normalization of the data. This correction is controlled by the bgCorrection argument of the agNormData function. The normalized slots obtained are in base-2 logarithm. In addition, two other slots are created, A and M, which hold the values of A and M used when the MA plot is drawn. Furthermore, normalization differs between one-colour and two-colour arrays. For a one-colour array, the reference corresponds to the median of each spot across the arrays of the AgilentBatch or AgilentBatchRG object, whereas for a two-colour array the reference of the red channel corresponds to the green cha
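To make the A and M slots concrete, here is a small sketch using the usual MA-plot definitions, M = log2(signal) - log2(reference) and A = (log2(signal) + log2(reference)) / 2. It is written in Python for illustration only and does not reproduce the internals of agNormData, which are not shown in this excerpt; the function names and intensities are assumptions.

```python
import math
import statistics


def one_colour_reference(arrays):
    """Reference for one-colour data: per-spot median across all arrays."""
    return [statistics.median(spot_values) for spot_values in zip(*arrays)]


def ma_values(signal, reference):
    """Return (M, A) lists for one array against a reference.

    For two-colour data, `signal` would be the red channel and `reference`
    the green channel of the same array.
    """
    m = [math.log2(s) - math.log2(r) for s, r in zip(signal, reference)]
    a = [0.5 * (math.log2(s) + math.log2(r)) for s, r in zip(signal, reference)]
    return m, a


# Three hypothetical one-colour arrays with two spots each.
arrays = [[1.0, 4.0], [3.0, 8.0], [5.0, 6.0]]
reference = one_colour_reference(arrays)
m, a = ma_values([8.0], [2.0])
```

M measures the log fold change of a spot relative to the reference, while A is its average log intensity; plotting M against A is what reveals the intensity-dependent bias that Lowess normalization is designed to remove.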
540. an essential component of transcriptional regulation in eukaryotic cells. Various very-high-throughput techniques have been developed to study the epigenetic regulation of the genome at different levels: chromosome conformation (3C-seq), chromatin opening (FAIRE-seq or DNase I treatment), nucleosome positioning (MNase-seq), histone modifications and transcription factor binding (ChIP-seq), and DNA methylation (methyl-seq). Note that the ChIP-seq technique requires covalent binding of proteins to DNA (cross-linking), achieved by formaldehyde fixation, in order to map transcription factor binding sites (see section 5.1.2). The expression of a gene can be controlled by the direct interaction of its promoter with regulatory elements located a long distance away on the chromosome or, in rare cases, on other chromosomes. The 3C-seq technique thus allows the capture of chromosome conformation (Chromosome Conformation Capture, or 3C). It was developed to analyze chromatin at a higher scale. Regions of the genome, although distant, can be brought together by chromatin looping and thus become contiguous, provided that the chromatin is open. Indeed,
541. retrieved using the file names. Only the barcode and the position on the microarray are kept afterwards. For example, the file US83700202_252800413012_S01_GE1_107_Sep09_1_1.txt is given the simplified name 252800413012 121. For genes, the feature number and the gene symbol are joined by a separator, for example 4 U2AF1L4. This yields a unique identifier, since a transcript may be present several times on the microarray. Probe references, gene descriptions and other information can also be added programmatically. A summary of the installation and use of the AgiND library is given in the user manual, which can be downloaded with the library (see Appendix A). Although unpublished at present, this library is already used routinely by users of the TGML transcriptome platform. It has also led to several collaborations, which are presented in the next chapter of this manuscript. The library has not been submitted to R package repositories such as Bioconductor; however, it is available on the laboratory website (http://tagc.univ-mrs.fr/AgiND).
2.6 Discussion and Perspectives
At the start of this project, the only free tools available were the R libraries marray and limma (Smyth, 2005). They make it possible to contr
542. using the GoTaq DNA polymerase system (Promega, Madison, WI) and IKBKAP-specific primers (hIKBKAP 17 18F and hIKBKAP 22R; see Boone et al., 2010). PCR products were separated on a 1.7% agarose gel by electrophoresis in 1X TBE buffer (Tris 0.89 M, boric acid 0.89 M and EDTA 0.02 M). DNA was visualized under UV light after ethidium bromide incorporation and documented using a BioVision camera.
Real-Time PCR Assay
The PCR reactions were performed in duplicate in a final volume of 25 µl including 300 nM primers, 200 nM TaqMan probe, 12.5 µl of TaqMan universal PCR master mix (Applied Biosystems) and 25-50 ng of cDNA, in an ABI Prism 7900 HT thermocycler with 50 cycles and the protocol recommended by the manufacturer. Primers hELP1 ex19F and hELP1 ex20 21R and probe P WTELP1 ex20R were used for detection of IKBKAP transcripts containing exon 20, while primers hELP1 ex19 21F and hELP1 ex21 22R and probe P MUELP1 ex21F were used for detection of IKBKAP transcripts skipping exon 20 (Boone et al., 2010). To determine the level of expression of candidate genes (dysregulated genes in FD), the following primer/TaqMan probe assays were obtained from Applied Biosystems: Hs_00176719m1 (LYN), Hs_01103338m1 (SNCA), Hs_01374916m1 (MAP1LC3C), Hs_00359592m1 (NOVA1), Hs_01120488m1 (SPON1), Hs_00216077m1 (LUC7L), Hs_00214302m1 (ZNF280D), and Hs00296608_m1 (WDR59) was used as a reference gene to normalize the data. Results were calculated using the 2^(-ΔΔCt) method
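As a reminder of how the comparative Ct calculation works, the short sketch below computes a fold change from four threshold-cycle values. The Ct numbers are invented for the example and the function name is hypothetical; only the formula, fold change = 2^(-ΔΔCt) with ΔCt = Ct(target) - Ct(reference gene), is the standard method named in the text.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_control, ct_reference_control):
    """Fold change of a target gene in a sample versus a control.

    Each delta Ct normalizes the target gene to the reference gene
    (WDR59 in the excerpt above); delta-delta Ct then compares the
    sample of interest with the control sample.
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses the threshold 2 cycles
# earlier (relative to the reference gene) in the sample than in the control.
fold_change = relative_expression(24.0, 20.0, 26.0, 20.0)
```

A difference of two cycles corresponds to a fourfold change because, under the method's assumption of near-perfect amplification efficiency, the amount of product doubles at each cycle.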
543. using other R libraries would allow us to carry out meta-analyses more simply and to include data analyzed in R that come from very-high-throughput techniques such as RNA-seq or ChIP-seq. We are thus moving towards the integration of several types of experiments, as the Genomics Portals tool already offers (Shinde et al., 2010).
CHAPTER 5 Study of transcriptional regulation by HTS
Contents
5.1 Principle of chromatin immunoprecipitation coupled with very-high-throughput sequencing (ChIP-seq)
5.1.1 General points
5.1.2 Biological principle
5.1.3 Biases and background noise
5.1.4 Advantages and drawbacks
5.1.5 The theoretical model of sequence distribution
5.2 Computing for HTS
5.2.1 Hardware and software organization
5.2.2 User interfaces for launching and managing sequencing
5.2.3 Bioscope data processing pipeline
5.3 Analysis of ChIP-seq data
5.3.1 Raw data and sequencing quality
5.3.2 Standard formats and data manipulation tools
5
544. their promoter and H3K36me3 along the transcribed region. They are thought to act as guides for chromatin modifications, thereby taking part in establishing an epigenetic state specific to each cell type (Khalil et al., 2009; Guttman et al., 2009). The size of eRNAs ranges from 100 to 900 nt (De Santa et al., 2010; Ørom et al., 2010). Unlike lincRNAs, eRNAs have the epigenetic characteristics specific to enhancers, hence their name. They are enriched in H3K4me1 marks and in co-regulators such as the co-activator p300, and they are transcribed by RNA polymerase II, unlike the other non-coding RNAs transcribed by RNA polymerase III. In contrast, they are weakly enriched in H3K4me3. Finally, PARs are a category of smaller size, between 16 and 200 nt. These ncRNAs are characterized by their location: some are expressed close to TSSs, while others are expressed at promoters. A growing number of studies seem to indicate that PARs play a role in regulating gene expression, in both activation and repression (Morris et al., 2008; Wang et al., 2008; Kaikkonen et al., 2011).
1.3.5 Epigenetics and epigenomes
The information contained in the genome is thus specifically regulated by epigenetic marks in a spati
545. coming from multiple heterogeneous sources are integrated: gene expression, gene annotation, the literature, protein domain structure, protein interactions, etc. All these data are used separately to rank a list of genes on the basis of their similarity to a reference gene list specific to the disease under study. The rank obtained for each type of information is then merged for each gene to obtain an overall ranking.
[Figure 4.7: rotated figure text not recoverable. Legible fragment: development of a new database, an R library and web services, and the integration of new data.]
FIGURE 4.7 Summary
546. [Figure 2.3, legible file names: vignette.rds, NAMESPACE, normexp.c, tests, limma-Tests.R, limma-Tests.Rout.save, limma, limma.rdb, limma.rdx]
FIGURE 2.3 Example of the structure of an R library, here the limma R library, with (A) its file architecture at the source-code level and (B) its architecture after compilation and installation of the library.
2.5 Principle of the AgiND R library
This library consists of a set of objects and functions coded in R and a program written in C that extracts data from AFE more quickly. This C program is called from the R code. AgiND also relies on other R libraries such as Biobase, limma, marray, geneplotter, annotate, AnnotationDbi and lattice. There are also help pages for each function or object class created, as well as a user manual. The initial raw data are obtained from AFE and are tab-delimited text files containing three tables: two of experimental parameters and one of results. The first table, FEPARAMS, contains the input parameters and the options chosen by the user in accordance with the parameters of the protocol used, for example GE1_105_Jan09, while the second table, STATS, contains parameters determined by the scanner. The last table, FEATURES, is a table containing more than 90 columns of
547. vity to pain, lack of overflow tearing, inappropriate blood pressure control manifested as orthostatic hypotension and episodic hypertension, poor oral coordination resulting in poor feeding and swallowing, and gastrointestinal dysmotility [4]. No cure is available for this disorder, and treatment is aimed at controlling symptoms and avoiding complications. FD is caused by mutations in the IKBKAP gene, which encodes a protein termed IKAP/hELP1 [5,6]. The most prevalent mutation is a splice mutation: the T-to-C transition in position 6 of the 5' splice site (5' ss) of intron 20 (IVS20 75) of this gene. All FD cases have at least one copy of this mutation; >99.5% are homozygous [5,7]. This mutation leads to variable tissue-specific skipping of exon 20 of IKBKAP mRNA, with the central and peripheral nervous systems more prone to complete skipping than other tissues, which leads to reduced IKAP/hELP1 protein levels [8]. [PLoS ONE, www.plosone.org, December 2010, Volume 5, Issue 12, e15590] Although the exact function of the IKAP/hELP1 protein is not clearly understood, researchers have identified IKAP/hELP1 as the scaffold protein required to assemble a well-conserved six-protein complex (ELP1-6), called the holo-Elongator complex, that possesses histone acetyltransferase activity directed against histones H3 and H4 in vitro [9]. IKAP/hElongator is recruited to the transcribed regions of some human genes essentially involved in actin cytosk
548. wski H, Vogl T, Roth J (2007) S100 proteins expressed in phagocytes: a novel group of damage-associated molecular pattern molecules. J Leukoc Biol 81: 28-37.
Vogl T, Tenbrock K, Ludwig S, Leukert N, Ehrhardt C, et al. (2007) Mrp8 and Mrp14 are endogenous activators of Toll-like receptor 4, promoting lethal, endotoxin-induced shock. Nat Med 13: 1042-1049.
Yang D, Chen Q, Rosenberg HF, Rybak SM, Newton DL, et al. (2004) Human ribonuclease A superfamily members, eosinophil-derived neurotoxin and pancreatic ribonuclease, induce dendritic cell maturation and activation. J Immunol 173: 6134-6142.
Sur S, Glitz DG, Kita H, Kujawa SM, Peterson EA, et al. (1998) Localization of eosinophil-derived neurotoxin and eosinophil cationic protein in neutrophilic leukocytes. J Leukoc Biol 63: 715-722.
Pedra JH, Cassel SL, Sutterwala FS (2009) Sensing pathogens and danger signals by the inflammasome. Curr Opin Immunol 21: 10-16.
Foell D, Frosch M, Sorg C, Roth J (2004) Phagocyte-specific calcium-binding S100 proteins as clinical laboratory markers of inflammation. Clin Chim Acta 344: 37-51.
Gordon S (2007) Macrophage heterogeneity and tissue lipids. J Clin Invest 117: 89-93.
Barlic J, Murphy PM (2007) Chemokine regulation of atherosclerosis. J Leukoc Biol 82: 226-236.
Mosig S, Rennert K, Buttner P, Krause S, Lutjohann D, et al. (2008) Monocytes of patients with familial hypercholesterolemia show alterations in cholesterol metabolism. BMC Med Genomics 1: 60.
549. x techniques: RNA-seq (Whole Transcriptome Shotgun Sequencing, WTSS) is a transcriptomics tool that sequences all the transcripts in a sample; SAGE-seq (Serial Analysis of Gene Expression), also called DGE-seq for Digital Gene Expression, was previously used for sequencing ESTs through the serial cloning of very short cDNA fragments into a plasmid vector (Velculescu et al., 1995). [Chapter 1. General introduction] This application analyzes the expression level of a large number of genes by identifying sequences in the 5' UTR, called tags, and counting them. Transcriptome analysis by very-high-throughput sequencing has quickly become a valuable asset for the study of diseases such as cancer (Morin et al., 2008). This approach acquires gene expression data on a genome-wide scale, in the same way as earlier approaches based on DNA microarrays. It nevertheless has advantages over microarrays: in particular, it yields much more information in a single run, such as gene fusions, alternative transcripts, post-transcriptional mutations, or the study of non-coding RNAs (miRNA, lincRNA) (Linsen et al., 2009). Moreover, since the detection of transcripts is no
550. y material for a video tutorial. Available molecular interactions are derived from various sources: our predictions (TBMC) and numerous databases, including ChIP-X, LymphTF-DB, ORegAnno, HPRD, IntAct, TargetScan and KEA. However, InteractomeBrowser may also accept additional interaction datasets that users can provide through a tabulated flat file. InteractomeBrowser relies on a mixed graph that contains both directed and undirected edges depicting various types of interactions, ranging from protein complex formation to transcriptional regulation. Thus, nodes represent both genes and gene products. InteractomeBrowser uses a subset of terms of the Cellular Component ontology (supplementary figure S4) to map nodes onto a schematic and hierarchical view of cell compartments (users may choose to disable this option). As a consequence, each gene product may be represented by several instances (e.g. one in the nucleus and one in the cytosol). Node placement is controlled by a force-directed layout: the nodes repel each other, they are attracted to their respective compartments, and edges act like springs (the force-directed layout can be switched off or on at any moment through the Display menu). Once a graph has been drawn, one can easily add or delete nodes. InteractomeBrowser provides several filters that are intended to focus on the most interesting part of the network. Users can filter out orph
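The three forces named in this excerpt (mutual repulsion, attraction to a compartment, spring-like edges) can be illustrated with one update step of a generic force-directed layout. This sketch is not InteractomeBrowser's implementation: the function name, the force laws (inverse-square repulsion, linear springs, linear pull towards a compartment centre) and every constant are assumptions chosen for the example.

```python
def layout_step(pos, edges, anchors, step=0.01,
                repulsion=1.0, spring=1.0, rest=1.0, pull=0.5):
    """Move every node once along the net force acting on it.

    pos:     {node: (x, y)} current positions
    edges:   list of (node_u, node_v) pairs acting as springs
    anchors: {node: (x, y)} centre of the compartment attracting the node
    """
    force = {node: [0.0, 0.0] for node in pos}
    nodes = list(pos)

    # 1. Every pair of nodes repels each other (inverse-square law).
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            dist = max((dx * dx + dy * dy) ** 0.5, 1e-9)
            f = repulsion / dist ** 2
            force[u][0] += f * dx / dist
            force[u][1] += f * dy / dist
            force[v][0] -= f * dx / dist
            force[v][1] -= f * dy / dist

    # 2. Edges act like springs with a preferred (rest) length.
    for u, v in edges:
        dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
        dist = max((dx * dx + dy * dy) ** 0.5, 1e-9)
        f = spring * (dist - rest)
        force[u][0] += f * dx / dist
        force[u][1] += f * dy / dist
        force[v][0] -= f * dx / dist
        force[v][1] -= f * dy / dist

    # 3. Each node is pulled towards the centre of its compartment.
    for node, (cx, cy) in anchors.items():
        force[node][0] += pull * (cx - pos[node][0])
        force[node][1] += pull * (cy - pos[node][1])

    return {
        node: (pos[node][0] + step * force[node][0],
               pos[node][1] + step * force[node][1])
        for node in pos
    }


start = {"a": (0.0, 0.0), "b": (4.0, 0.0)}
after = layout_step(start, edges=[("a", "b")], anchors={})
```

Calling `layout_step` repeatedly until positions stop changing gives the layout. In the example the two linked nodes start farther apart than the rest length, so the spring outweighs the repulsion and they move closer together.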
551. zarevic D, Lipovich L, Liu J, Liuni S, McWilliam S, Madan Babu M, Madera M, Marchionni L, Matsuda H, Matsuzawa S, Miki H, Mignone F, Miyake S, Morris K, Mottagui-Tabar S, Mulder N, Nakano N, Nakauchi H, Ng P, Nilsson R, Nishiguchi S, Nishikawa S, Nori F, Ohara O, Okazaki Y, Orlando V, Pang KC, Pavan WJ, Pavesi G, Pesole G, Petrovsky N, Piazza S, Reed J, Reid JF, Ring BZ, Ringwald M, Rost B, Ruan Y, Salzberg SL, Sandelin A, Schneider C, Schönbach C, Sekiguchi K, Semple CAM, Seno S, Sessa L, Sheng Y, Shibata Y, Shimada H, Shimada K, Silva D, Sinclair B, Sperling S, Stupka E, Sugiura K, Sultana R, Takenaka Y, Taki K, Tammoja K, Tan SL, Tang S, Taylor MS, Tegner J, Teichmann SA, Ueda HR, van Nimwegen E, Verardo R, Wei CL, Yagi K, Yamanishi H, Zabarovsky E, Zhu S, Zimmer A, Hide W, Bult C, Grimmond SM, Teasdale RD, Liu ET, Brusic V, Quackenbush J, Wahlestedt C, Mattick JS, Hume DA, Kai C, Sasaki D, Tomaru Y, Fukuda S, Kanamori-Katayama M, Suzuki M, Aoki J, Arakawa T, Iida J, Imamura K, Itoh M, Kato T, Kawaji H, Kawagashira N, Kawashima T, Kojima M, Kondo S, Konno H, Nakano K, Ninomiya N, Nishio T, Okada M, Plessy C, Shibata K, Shiraki T, Suzuki S, Tagami M, Waki K, Watahiki A, Okamura-Oho Y, Suzuki H, Kawa