IBM Massively Parallel Blue Gene

1.
      icount = 0
      do … = 1, numvdw
#include "ew_directp.h"
      enddo
c
c     calculation starts: loop over the data gathered in the temporary
c     array caches
c
#ifdef MASSLIB
      call vrsqrt(cache_df, cache_r2, icount)
#else
      do im_new = 1, icount
         delr2 = cache_r2(im_new)
         delrinv = 1.0/sqrt(delr2)
         cache_df(im_new) = delrinv
      enddo
#endif

short_ene.f, optimized (cont.)
BICB: Biomedical Informatics & Computational Biology

      do im_new = 1, icount
         j = cache_bckptr(im_new)
         delr2 = cache_r2(im_new)
         delrinv = cache_df(im_new)
c
c     cubic spline on switch
c
         delr = delr2*delrinv
         delr2inv = delrinv*delrinv
         x = dxdr*delr
         ind = eedtbdns*x
         dx = x - ind*del
         ind = 4*ind

Single-Processor Optimization

POWER3 375 MHz, vector MASS
             without MASS    with MASS
   Elapsed   25?9.95         2226.20
   User      2574.65         2224.06
   Sys       0.50            0.47
   Speedup                   1.15

Running Xprofiler on Silver (lab 2)

   $ cd /scratch1/cpsosa/bicb8510/fortran/dgemm
   $ module load 1
   $ make -f make.ibml
   xlf -pg -c -O3 -qhot -qarch=pwr6 -qtune=pwr6 -q64 matmul.f
   ** matmul   === End of Compilation 1 ===
   1501-510  Compilation successful for file matmul.f.
   xlf -pg -c -O3 -qhot -qarch=pwr6 -qtune=pwr6 -q64 dgemm.f
   ** dgemm   === End of Compilation 1 ===
   "dgemm.f", 1500-036 (I) The NOSTRICT
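The optimized short_ene.f listing above first gathers squared distances into a cache and then makes a single vector call (vrsqrt) instead of one sqrt and one divide per loop iteration. The Python below is only a sketch of that two-phase pattern. The function inverse_distances, the cutoff test used for the gather phase, and the pure-Python vrsqrt stand-in are assumptions made for illustration; they are not taken from the Amber source or from the MASS library.

```python
import math


def vrsqrt(values):
    """Stand-in for a vector-library call: reciprocal square root of a batch."""
    return [1.0 / math.sqrt(v) for v in values]


def inverse_distances(squared_distances, cutoff2):
    """Two-phase evaluation: gather first, then one batched call."""
    # Phase 1: gather the squared distances that pass the (hypothetical)
    # cutoff into a contiguous cache, like cache_r2 in the listing.
    cache_r2 = [r2 for r2 in squared_distances if r2 < cutoff2]
    # Phase 2: a single batched call fills the cache of inverse distances,
    # like cache_df in the listing.
    cache_df = vrsqrt(cache_r2)
    return cache_r2, cache_df


if __name__ == "__main__":
    r2, df = inverse_distances([4.0, 100.0, 0.25], cutoff2=81.0)
    print(r2, df)
```

Separating the gather from the arithmetic is what lets a tuned vector routine replace the scalar loop; the later loop then only reads the cached inverse distances.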
2. [Chart: growth of the sequence database, 1986 to 2002, showing base pairs of DNA (billions) and number of sequences]

Levels of Parallelism and Potential I/O Bottlenecks

[Figure: fine-, medium-, and coarse-grained parallelism over subject(s) and target(s)]

R. C. Braun, K. T. Pedretti, T. L. Casavant, T. E. Scheetz, C. L. Birkett, C. A. Roberts, "Three Complementary Approaches to Parallelization of Local BLAST Service on Workstation Clusters," Future Generation Computer Systems 17, 745 (2001)

BLAST to mpiBLAST-PIO Evolution

o mpiBLAST: the DB is partitioned and BLAST is executed in parallel
o pioBLAST: uses parallel I/O to improve mpiBLAST
   Dynamic virtual DB partitioning
   Improved result merging
o mpiBLAST-pio: incorporates the parallel I/O performance enhancements of pioBLAST into mpiBLAST
o A. Darling, L. Carey, and W. Feng, "The design, implementation, and evaluation of mpiBLAST," in Proceedings of the ClusterWorld Conference and Expo, in conjunction with the 4th International Conference on Linux Clusters: The HPC Revolution, 2003
o H. Rangwala, E. Lantz, R. Musselman, K. Pinnow, B. Smith, and B. Wallenfelt, Massively Parallel BLAST for the Blue Ge
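The database partitioning that mpiBLAST relies on can be pictured with a small sketch. The Python below is not mpiBLAST code: the name partition_database and the greedy "largest sequence into the smallest fragment" rule are assumptions chosen only to show how a sequence database might be split into fragments of similar size for parallel searching.

```python
def partition_database(sequences, n_fragments):
    """Split sequences into n_fragments lists of roughly equal total length.

    Hypothetical helper: balances fragments by residue count using a
    greedy rule, which is one of several possible strategies.
    """
    if n_fragments < 1:
        raise ValueError("n_fragments must be at least 1")
    fragments = [[] for _ in range(n_fragments)]
    sizes = [0] * n_fragments
    # Placing long sequences first gives a tighter balance.
    for seq in sorted(sequences, key=len, reverse=True):
        target = sizes.index(min(sizes))  # currently smallest fragment
        fragments[target].append(seq)
        sizes[target] += len(seq)
    return fragments


if __name__ == "__main__":
    db = ["ACGTACGT", "ACG", "TTGACA", "GG", "ACGTT"]
    for i, frag in enumerate(partition_database(db, 2)):
        print(i, sum(len(s) for s in frag), frag)
```

Each fragment could then be searched by a different worker, with the per-fragment hits merged afterwards.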
3. Storing Files in Memory

o The original BLAST version uses mmap to store the database in memory
o mmap is not implemented as a part of the Blue Gene/L operating system
o With all nodes sharing the same file system, I/O contention severely limits the scaling of this application
o Solution: virtual file manager (VFM)

Oystein Thorsen, Karl Jiang, Amanda Peters, Brian Smith, Heshan Lin, Wu-chun Feng, Carlos P. Sosa, "Parallel Genomic Sequence-Search on a Massively Parallel System," Conference on Computing Frontiers: Proceedings of the 4th International Conference on Computing Frontiers, Ischia, Italy, 59-68, 2007

Virtual File Manager

o VFM is used to store
   database fragments in memory
   query files in memory
   various temporary files in memory
o Eliminates disk I/O
o Allows file distribution using MPI when workers need the same file

Multiple Masters

o Second level of management
   Limit the number of worke
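The idea behind the virtual file manager described above (keep database fragments, query files, and temporary files in memory and hand out file-like objects instead of touching disk) can be sketched in a few lines. This is a toy illustration, not the VFM from the cited paper: the class name, its methods, and the file name used in the example are all hypothetical.

```python
import io


class VirtualFileManager:
    """Toy in-memory file store (hypothetical, for illustration only)."""

    def __init__(self):
        self._files = {}

    def write(self, name, data):
        # Keep the whole file as an immutable byte string in memory.
        self._files[name] = bytes(data)

    def open(self, name):
        # Return a file-like object backed by memory instead of disk.
        return io.BytesIO(self._files[name])

    def export(self, name):
        # Raw bytes that could be shipped to other workers, for example
        # with an MPI broadcast, when several of them need the same file.
        return self._files[name]


if __name__ == "__main__":
    vfm = VirtualFileManager()
    vfm.write("db.frag0", b">seq1\nACGT\n")
    print(vfm.open("db.frag0").read().decode())
```

Because every read is served from memory, many workers no longer contend for the shared file system once the files have been distributed.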
4. … yes ) setenv LOADLIB "-L$MASSLIBDIR -lmassvp4"
   if ( $VENDOR_BLAS == "yes" ) setenv LOADLIB "$LOADLIB -lblas"
   if ( $VENDOR_LAPACK == "yes" ) setenv LOADLIB "$LOADLIB -lessl"
   # little or no optimization:
   setenv L0 "xlf90 … -qfixed"
   # modest optimization (local scalar):
   setenv L1 "xlf90 -pg … -O2 -c"
   # high scalar optimization (but not vectorization):
   setenv L2 "xlf90 … -qfixed -O3 -qmaxmem=-1 -qarch=auto -qtune=auto"
   # high optimization (may be vectorization, not parallelization):
   setenv L3 "xlf90 … -qfixed -O3 -qmaxmem=-1 -qarch=auto -qtune=auto"

Xprofiler: Calling Tree

[Screenshot: Xprofiler main window (menus File, View, Filter, Report, Utility, Help) with call arcs and function boxes. Program: sander. Total CPU usage: 30.03 seconds (summary of 1 gmon.out profile file). Display status: showing 210 out of 210 nodes and 217 out of 217 arcs.]

Xprofiler: Zoom In

[Screenshot: Xprofiler v1.1 (IBM POWER Parallel System), zoomed view of the same call tree for program sander; labeled function boxes include coords, timer_stop, timer_start, and short_ene.]

Functions: Labels
5. bash: export TRACE_SEND_PATTERN=yes
   csh:  setenv TRACE_SEND_PATTERN yes

o Wrappers keep track of the number of bytes that are sent to each task, and a binary file, send_bytes.matrix, is written during MPI_Finalize, which lists the number of bytes that were sent from each task to all other tasks. The binary file has the following format:

   D00, D01, ..., D0n, D10, ..., Dij, ..., Dnn

In this format, the data type of Dij is double, and it represents the size of MPI data that is sent from rank i to rank j. This matrix can be used as input to external utilities that can generate efficient mappings of MPI tasks onto torus coordinates. The wrappers also provide the average number of hops for all flavors of MPI_Send. The wrappers do not track the message-traffic patterns in collective calls, such as MPI_Alltoall. Only point-to-point send operations are tracked. AverageHops for all communications on a given processor is measured as follows:

   AverageHops = sum(Hops_i x Bytes_i) / sum(Bytes_i)

Hops_i is the distance between the processors for MPI communication, and Bytes_i is the size of the data that is transferred in this communication. The logical concept behind this performance metric is to measure how far each byte has to travel for the communication, on average. If the communicating processor pair is close to each other in the coordinates, the AverageHops value tends to be small.

Output (plain text)
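The AverageHops definition and the send-bytes matrix layout above translate directly into a few lines of code. The sketch below is an illustration, not part of the IBM tracing library: the function names are made up, and reading the matrix in native byte order is an assumption, since the text only states that each Dij is a double stored row by row.

```python
import struct


def average_hops(hops, nbytes):
    """AverageHops = sum(Hops_i * Bytes_i) / sum(Bytes_i)."""
    total = sum(nbytes)
    if total == 0:
        return 0.0  # no point-to-point traffic recorded
    return sum(h * b for h, b in zip(hops, nbytes)) / total


def read_send_matrix(raw, ntasks):
    """Decode ntasks*ntasks doubles laid out as D00, D01, ..., Dnn.

    Assumption: native byte order with no padding between values.
    """
    values = struct.unpack(f"={ntasks * ntasks}d", raw)
    return [list(values[i * ntasks:(i + 1) * ntasks]) for i in range(ntasks)]


if __name__ == "__main__":
    # Three sends: 100 bytes over 1 hop, 100 over 4 hops, 200 over 7 hops.
    print(average_hops([1, 4, 7], [100, 100, 200]))
    raw = struct.pack("=4d", 0.0, 8.0, 16.0, 0.0)
    print(read_send_matrix(raw, 2))
```

Weighting each hop count by the bytes sent means that a few large messages between distant nodes raise the metric more than many small ones.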
6. http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.cmds/doc/aixcmds2/hpmcount.htm

hpmcount examples

o To run the ls command and write information concerning events in set 5 from hardware counters, enter:
   hpmcount -s 5 ls
o To run the ls command and write information concerning events in sets 5, 2, and 9 from hardware counters using the counter multiplexing mode, enter:
   hpmcount -s 5,2,9 ls

Lab 3: hpmcount exercise

   #!/bin/csh
   # Very simple serial code set up to execute under HPMCOUNT control
   cat << EOF > it.f
         program main
         implicit none
         integer i
         real sum
         common sum
         sum = 0.0
         do i = 1, 1000000
            5 00000001 1
         end do
         print *, 'sum = ', sum
         stop
         end
   EOF
   # Compile and build program "it" from it.f; use "-g" option and no
   # optimization to support source debugging of all Fortran statements
   …r -O4 -qarch=auto -qrealsize=8 -o it it.f
   # Execute program "it" with HPMCOUNT
   /usr/bin/hpmcount it

http://www.cisl.ucar.edu/docs/ibm/hpm.toolkit/hpmcount.html

Lab 3 (2): hpmcount output

   HPMCOUNT output
   Execution time (wall clock time): 0.057595 seconds
   ######## Resource Usage Statistics ########
   Total amount of time in user mode   : 0.015934 seconds
   Total amount of time in system mode
7. mpi_profile.0 (4)

[Listing residue: per-task rows with columns zcoord, procid, total_comm(sec), avg_hops; individual values are not recoverable]

mpi_profile.0 (5)

MPI tasks sorted by communication time:

   taskid  xcoord  ycoord  zcoord  procid  total_comm(sec)  avg_hops
        0       0       0       0       0            0.015      1.00
        2       2       0       0       0            4.039      1.00
        ?       ?       ?       ?       ?            4.039      1.00
        ?       ?       ?       ?       ?            4.039      1.00
        ?       ?       ?       ?       ?            4.039      1.00
        ?       ?       ?       ?       ?            4.039      1.00
       17       1       0       1       0            4.039      1.00
        5       1       1       0       0            4.039      1.00
       23       3       1       1       0            4.039      4.00
        4       0       1       0       0            4.039      1.00
       29       1       3       1       0            4.039      1.00
       21       1       1       1       0            4.039      1.00
       15       3       3       0       0            4.039      7.00
       19       3       0       1       0            4.039      4.00
       31       3       3       1       0            4.039      7.00
       20       0       1       1       0            4.039      1.00
        6       2       1       0       0            4.039      1.00
        7       3       1       0       0            4.039      4.00
        8       0       2       0       0            4.039      1.00
        3       3       0       0       0            4.039      4.00
       16       0       0       1       0            4.039      1.00
       11       3       2       0       0            4.039      4.00
       13       1       3       0       0            4.039      1.00
       14       2       3       0       0            4.039      1.00
       24       0       2       1       0            4.039      1.00
       27       3       2       1       0            4.039      4.00
       22       2       1       1       0            4.039      1.00
       25       1       2       1       0            4.039      1.00
       28       0       3       1       0            4.039      1.00
       12       0       3       0       0            4.039      1.00
       18       2       0       1       0            4.039      1.00
       30       2       3       1       0            4.039      1.00

MPI Trace (lab 4)

   $ cd /scratch1/cpsosa/bicb8510/c/mpi
   $ module load hpct
   silver$ make -f make.pi
   /opt/ibmhpc/ppe.poe/bin/mpcc -g -o pi pi.c -L/opt/ibmhpc/ppe.hpct/lib -lmpitrace -lm
   silver$ po
8. An Efficient Parallel of the Hidden Markov Methods for Genomic Sequence Search on a Massively Parallel System, IEEE Transactions on Parallel and Distributed Systems, Vol. 19, No. 1, January 2008

PVM to MPI

o PVM calls
   pvm_initsend
   pvm_pk* (pack instructions) or pvm_upk* (unpack)
   pvm_send or pvm_recv
o Replaced with calls
   MPI_Send for each pvm_send
   MPI_Recv for every pvm_recv
   memcpy for every pvm_pk and pvm_upk
   MPI_Send and MPI_Recv to send and receive the entire package
o Functions were constructed to pack the HMM data along with other control structures, in parallel with the PVM-to-MPI conversions

K. Jiang, O. Thorsen, A. Peters, B. Smith, and C. P. Sosa, "An Efficient Parallel Implementation of the Hidden Markov Methods for Genomic Sequence-Search on a Massively Parallel System," IEEE Transactions on Parallel and Distributed Systems, Vol. 19, No. 1, January 2008

hmmsearch Parallel Scheme

[Figure: hmmsearch parallel scheme, with an AddToHistogram step and four SLAVE processes. Image source: K. Jiang, O. Thorsen, A. Peters, B. Smith, and C. P. Sosa, IEEE Transactions on Parallel and Distributed Systems, Vol. 19, No. 1]
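The PVM-to-MPI conversion above replaces each pvm_pk/pvm_upk call with a memcpy into one buffer, so that a single send and receive moves the entire package. The sketch below mimics that "pack everything, then send once" idea with Python's struct module. The record layout (an integer model length followed by a list of double scores) and the function names are hypothetical; the real HMM structures are not described in the text.

```python
import struct

# Hypothetical header layout: model length, number of scores (two ints).
HEADER = "=ii"


def pack_model(length, scores):
    """Copy all fields into one contiguous buffer (the memcpy step)."""
    header = struct.pack(HEADER, length, len(scores))
    body = struct.pack(f"={len(scores)}d", *scores)
    return header + body  # one buffer, suitable for a single send


def unpack_model(buf):
    """Reverse of pack_model, as done on the receiving side."""
    length, count = struct.unpack_from(HEADER, buf, 0)
    scores = struct.unpack_from(f"={count}d", buf, struct.calcsize(HEADER))
    return length, list(scores)


if __name__ == "__main__":
    message = pack_model(135, [0.5, -1.25, 3.0])
    print(len(message), unpack_model(message))
```

Sending one packed buffer trades many small messages for a single larger one, which is usually the cheaper pattern on a message-passing network.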
9. o The standard GNU compilers and libraries, which are also located on the front-end node, will NOT produce Blue Gene compatible binary code. The standard GNU compilers can only be used for utility or front-end code development that your application may require.
o GNU compilers (Fortran, C, C++) for Blue Gene are located in /opt/blrts-gnu/
   Fortran: /opt/gnu/powerpc-bgp-linux-gfortran
   C:       /opt/gnu/powerpc-bgp-linux-gcc
   C++:     /opt/gnu/powerpc-bgp-linux-g++
o It is recommended not to use the GNU compilers for Blue Gene, as the IBM XL compilers offer significantly higher performance. The GNU compilers do offer more flexible support for things like inline assembler.

Messaging Software Stack

[Figure: messaging software stack on top of the BGP network hardware; legend: IBM supported software, externally supported software]

Source: C. P. Sosa and B. Knutson, IBM System Blue Gene Solution: Blue Gene/P Application Development, SG24-7278-03, Redbooks, Draft Redbooks, last update 25 August 2009

Library Location

o The MPI implementation on Blue Gene is based on MPICH-2 from Argonne National Laboratory
o Include files mpi.h and mpif.h are at the location:
   -I/bgsys/drivers/ppcfloor/comm/include

Compile and Link MPI Programs

The following scripts are provided to compile and link MPI programs:
   mpicc: C
10. [Figure: campus map]

UNIVERSITY OF MINNESOTA ROCHESTER

Workshops

o Introductory: Unix, Linux, remote computing, job submission, queue policy
o Programming & Scientific Computation: code parallelization, programming languages, math libraries
o Computational Physics: fluid dynamics, space physics, structural mechanics, material science
o Computational Chemistry: quantum chemistry, classical molecular modeling, drug design, cheminformatics
o Computational Biology: structural biology, computational genomics, proteomics, bioinformatics

The institute's web page: https://www.msi.umn.edu
Getting started: https://www.msi.umn.edu/support/start.html
Software: https://www.msi.umn.edu/sw
Password reset: https://www.msi.umn.edu/password
Tutorials: https://www.msi.umn.edu/tutorial

Grants and Traineeships

o Hormel Institute: Blue Gene
o Mayo Clinic
o IBM: Blue Gene, Rochester and Watson
o University of Minnesota Supercomputing Institute (MSI)

Thank You
11. [Profiler partial output (cont.): call-graph entries for P7Viterbi [3], ReadSeq [5], toupper [12], free [15], log [17], exp [54], malloc [20], AddToHistogram, and pthread_exit; the call counts are not reliably recoverable]

Selected Techniques

o Maximize expressions: transforming an "if"-based max calculation into one using the operator
o Use of registers: using registers to carry values to the next iterations, eliminating a large number of load operations
o Fusion: helped increase register reuse
o Better array access

Maximizing Expressions

[Code listing, garbled in extraction: a loop over k in which "if" tests are replaced, after maximizing expressions, by pairwise comparisons of the form sc2 > sc1, sc4 > sc3, sc6 > sc5]

hmmsearch Timings with a Single Query and the NR Database

[Chart: original versus optimized timings on POWER3 375 MHz and POWER4 1.3 GHz]

Massively Parallel Version

I. Single node optimization
II. Port from PVM
III. BG parallel optimizations

K. Jiang, Thorsen, A. Peters, B. Smith, and C. P. Sosa
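The "maximize expressions" technique listed above rewrites a compare-and-assign "if" into a single expression that yields the larger value. The sketch below shows the two forms side by side and checks that they agree. It is a generic illustration in Python, not the HMMER inner loop, and using a conditional expression as "the operator" is an assumption, because the operator name did not survive in the slide text.

```python
def best_score_if(candidates, start):
    """Original style: a separate 'if' statement updates the running maximum."""
    best = start
    for sc in candidates:
        if sc > best:
            best = sc
    return best


def best_score_expr(candidates, start):
    """Rewritten style: the maximum is computed as one expression."""
    best = start
    for sc in candidates:
        best = sc if sc > best else best  # conditional expression
    return best


if __name__ == "__main__":
    scores = [-3, 7, 2]
    print(best_score_if(scores, 0), best_score_expr(scores, 0))
```

In compiled code, the expression form can make it easier for the compiler to emit a select or max instruction instead of a branch; whether that happens depends on the compiler and target.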
12. o Used to synchronize timebases
o Single clock source for all racks

High Throughput Computing Mode

High Throughput Computing (HTC) modes on Blue Gene/P:
o BG/P with HTC looks like a cluster for serial and parallel apps
o Hybrid environment: standard HPC (MPI) apps plus now HTC apps
o Enables a new class of workloads that use many single-node jobs
o Easy administration using the web-based Navigator
[Figure label: 1024 nodes]

HPC versus HTC

High Performance Computing (HPC) Mode: best for Capability Computing
o Parallel, tightly coupled applications
   Single Instruction, Multiple Data (SIMD) architecture
o Programming model: typically MPI
o Apps need a tremendous amount of computational power over a short time period

High Throughput Computing (HTC) Mode: best for Capacity Computing
o Large number of independent tasks
   Multiple Instruction, Multiple Data (MIMD) architecture
o Programming model: non-MPI
o Apps need a large amount of computational power over a long time period
o Traditionally run on large clusters

HTC and HPC modes co-exist on Blue Gene
o Determined when the resource pool (partition) is allocated

Outline

o Part I: Hardware
   Historical perspective
   Why do we need MPPs?
   Overview of ma
13. database searches to match an HMM

S. R. Eddy, HMMER User's Guide: Biological Sequence Analysis Using Profile Hidden Markov Models, Version 2.3.2, Oct 1998, http://hmmer.wustl.edu

Single Node Optimization: Data Set

o Queries: The two queries consisted of gi|1174687|sp|P42461|THIX_CORGL (Thiamine biosynthesis protein X) and 50 aligned globin sequences as provided in HMMER version 2.2 (globins50). The first query corresponds to a single sequence of a small protein with 135 characters (amino acids).
o Databases: a small protein database, SWISS-PROT (49787460 characters or 108891 sequences), and a second database, NR (459219939 characters, 929420 sequences). The second database is larger than the first one by almost a factor of 10.

Profiler Partial Output

[Call-graph profile listing; columns: index, %time, self, descendents, called/total (parents), called+self, name, index, called/total (children). Legible entries include TraceScoreCorrection, PostprocessSignificantHit, pthread_mutex_lock, pthread_mutex_unlock, [1] pthread_body, and [2] worker_thread; the numeric columns are not reliably recoverable.]
14. last update 25 August 2009

Shared GPFS Filesystem

[Figure: shared GPFS filesystem built from IBM p520 GPFS NSD servers, a Force10 Networks E1200 10GbE switch, and a DataDirect Networks couplet with 5 disk enclosures attached]

BG/P Application-Specific Integrated Circuit (ASIC) Diagram

[Figure: Blue Gene/P node diagram. Legible annotations:]
o L1 data cache: 32 KB total size, 32-byte line size, 64-way associative, round-robin replacement, write-through for cache coherency, 4-cycle load to use
o L2 data cache (prefetch buffer): holds 15 128-byte lines, can prefetch up to 7 streams
o L3 data cache: 2x4 MB, 50 cycles latency, on-chip

Source: Sosa and Knutson, IBM System Blue Gene Solution: Blue Gene/P Application Development, SG24-7278-03, Redbooks, Draft Redbooks, last update 2
15. [Figure: Blue Gene/P packaging hierarchy]

   System:        cabled 8x8x16, up to 512 TB
   Rack:          32 node cards, 13.9 TF/s, 2 TB
   Node card:     32 compute cards, 0-2 I/O cards, 435.2 GF/s, 64 GB
   Compute card:  1 SoC, 40 DRAMs, 13.6 GF/s, 2 GB DDR
   SoC:           13.6 GF/s, 8 MB EDRAM

Source: C. P. Sosa and B. Knutson, IBM System Blue Gene Solution: Blue Gene/P Application Development, SG24-7278-03, Redbooks, Draft Redbooks, last update 25 August 2009

Hierarchy

o Compute nodes are dedicated to running user applications and almost nothing else: simple compute node kernel (CNK)
o I/O nodes run Linux and provide a more complete range of OS services: files, sockets, process launch, debugging, and termination
o The service node performs system management services (e.g., heart beating, monitoring errors), largely transparent to application and system software

Looking inside Blue Gene

[Figure: front-end nodes, file servers, service node (system console, scheduler), collective network, 10 Gbps Ethernet, Gigabit Ethernet, I/O nodes, and JTAG control network]

Source: Sosa and Knutson, IBM System Blue Gene Solution: Blue Gene/P Application Development, SG24-7278-03, Redbooks, Draft Redbooks
16. [Figure: rack naming convention, with fan assemblies 0-9 (bottom front to top rear), midplane 0-1 (bottom to top), rack column 0-F, and rack row 0-F]

03, Redbooks, Draft Redbooks, last update 25 August 2009

Note: The fact that this illustration shows numbers 00 through 77 does not imply this is the largest configuration possible. The largest configuration possible is 256 racks numbered 00

Cards: Naming Convention

[Figure: naming convention for service cards, link cards, node cards, I/O cards, and compute cards. Legible fields: rack row 0-F, rack column 0-F, midplane 0-1, node card 00-15, compute cards J04 through J35; example element name with components R23, M1, N02, J09. The note identifying the master service card for a rack is not legible.]
17. 0 … "Number of tasks" << size << "My rank" << rank << …

Hello World: Fortran

   C     Fortran example
         program hello
         include 'mpif.h'
         integer rank, size, ierror, tag, status(MPI_STATUS_SIZE)
         call MPI_INIT(ierror)
         call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
         call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
         print*, 'node', rank, ': Hello world'
         call MPI_FINALIZE(ierror)
         end

Debugging on Blue Gene

o The Compute Node Kernel, which provides the low-level primitives that are necessary to debug an application
o The control and I/O daemon (CIOD) running on the I/O Nodes, which provides control and communications to Compute Nodes
o A debug server running on the I/O Nodes, which is vendor-supplied code that interfaces with the CIOD
o A debug client running on a Front End Node, which is where the user does their work interactively
   GNU Project debugger
   Core processor debugger
   addr2line utility

Outline

o Part I: Hardware
   Historical perspective
   Why do we need MPPs?
   Overview of massively parallel processing (MPP) architecture
o Part II: Software
   Overview
   Compilers
   MPI
   Building and Running Example
18. 0.003379 seconds
   Maximum resident set size                     : 8532 Kbytes
   Average shared memory use in text segment     : 0 Kbytes*sec
   Average unshared memory use in data segment   : 77 Kbytes*sec
   Number of page faults without I/O activity    : 2073
   Number of page faults with I/O activity       : 2
   Number of times process was swapped out       : 0
   Number of times file system performed INPUT   : 0
   Number of times file system performed OUTPUT  : 0
   Number of IPC messages sent                   : 0
   Number of IPC messages received               : 0
   Number of signals delivered                   : 0
   Number of voluntary context switches          : 13
   Number of involuntary context switches        : 52

http://www.cisl.ucar.edu/docs/ibm/hpm.toolkit/hpmcount.html

Lab 3 (3): hpmcount output

   ####### End of Resource Statistics ########
   Set: 1
   Counting duration: 0.019886103 seconds
   PM_FPU_1FLOP (FPU executed one flop instruction)          :  4000225
   PM_FPU_FMA (FPU executed multiply-add instruction)        : 11000076
   PM_FPU_FSQRT_FDIV (FPU executed FSQRT instruction)        :        0
   PM_CYC (Processor cycles)                                 : 26428653
   PM_RUN_INST_CMPL (Run instructions completed)             : 47657875
   PM_RUN_CYC (Run cycles)                                   : 93529315
   Utilization rate                  :    9.755 %
   Flop                              :   26.000 Mflop
   Flop rate (flops / WCT)           :  451.435 Mflop/s
   Flops / user time                 : 4627.772 Mflop/s
   FMA percentage                    :  146.665 %

Instrument
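Three of the derived lines in the hpmcount output above can be reproduced from the raw counters, which is a useful check when reading such reports. The formulas in the sketch below are inferred from the printed numbers (each multiply-add counted as two flops, and the FMA percentage taken relative to the number of floating-point instructions); they are assumptions, not quotations from the hpmcount documentation. The wall-clock time of 0.057595 seconds is the value printed at the top of the same run.

```python
def derived_metrics(fpu_1flop, fpu_fma, wall_clock_s):
    """Recompute hpmcount-style derived metrics from raw counter values.

    Assumed formulas (inferred from the sample output):
      flops          = PM_FPU_1FLOP + 2 * PM_FPU_FMA
      flop rate      = flops / wall-clock time
      FMA percentage = 100 * 2 * PM_FPU_FMA / (PM_FPU_1FLOP + PM_FPU_FMA)
    """
    flops = fpu_1flop + 2 * fpu_fma
    return {
        "mflop": flops / 1e6,
        "mflop_per_s": flops / wall_clock_s / 1e6,
        "fma_percentage": 100.0 * 2 * fpu_fma / (fpu_1flop + fpu_fma),
    }


if __name__ == "__main__":
    metrics = derived_metrics(4000225, 11000076, 0.057595)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

With these assumptions the three values agree with the "Flop", "Flop rate (flops / WCT)", and "FMA percentage" lines of the report to the printed precision.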
19. mpi_profile.0 (3)

Message size distributions:

   MPI_Irecv    #calls    avg. bytes    time(sec)
                     3           2.3        0.000
                     1           8.0        0.000
                     1          16.0        0.000
                     1          32.0        0.000
                     1          64.0        0.000
                     1         128.0        0.000
                     1         256.0        0.000
                     1         512.0        0.000
                     1        1024.0        0.000
                     1        2048.0        0.000
                     1        4096.0        0.000
                     1        8192.0        0.000
                     1       16384.0        0.000
                     1       32768.0        0.000
                     1       65536.0        0.000
                     1      131072.0        0.000
                     1      262144.0        0.000
                     1      524288.0        0.000
                     1     1048576.0        0.000

Communication summary for all tasks:
   minimum communication time = 0.015 sec for task 0
   median communication time  = 4.039 sec for task 20
   maximum communication time = 4.039 sec for task 30

[Listing residue: per-task rows of the profile (task IDs, coordinates, and communication times of 0.015 and 4.039 sec); individual rows are not recoverable]
20. [Chart: computer performance by year introduced, 1940 to 2010; legible labels include ILLIAC IV and 7090]

http://www.reed-electronics.com/electronicnews/article/CA508575.html?industryid=21365
From Kilobytes to Petabytes in 50 Years: http://www.eurekalert.org/features/doe/2002-03/dlnl-fkt062102.php

What Are MPP Systems Good At?

o Grand challenge problems are a key part of high-performance computing applications
o Grand challenges are fundamental problems in science and engineering with broad economic and scientific impact, and whose solution can be advanced by applying high-performance computing techniques and resources

Source: Pete Beckman, Director, ALCF, Argonne National Lab

Different from the Rest

[Figure]

Pushing the Technology

[Figure]

Source: Pete Beckman, Director, ALCF, Argonne National Lab

Machine for Protein Folding

December 1999: IBM Announces $100 Million Research Initiative to Build World's Fastest Supercomputer. "Blue Gene" to Tackle Protein Folding Grand Challenge.

YORKTOWN HEIGHTS, NY, December 6, 1999: IBM today announc
21. o Functions are represented by green, solid-filled boxes in the function call tree
o The size and shape of each function box indicates its CPU usage
o The height of each function box represents the amount of CPU time it spent on executing itself
o The width of each function box represents the amount of CPU time it spent on executing itself, plus its descendant functions
o Function label (with cycle):
   the total amount of CPU time (in seconds) this function spent on itself plus descendants (the number to the left of the x)
   the amount of CPU time (in seconds) this function spent only on itself (the number to the right of the x)
o Call arc labels show the number of calls that were made between the two functions (from caller to callee)

Library Filters (before)

[Screenshot]

Library Filters (after)

[Screenshot]

Looking at the Source Code

[Screenshot: Xprofiler source-code view of the dgemm routine with the columns "no. ticks per line", "line", and "source code"; tick marks accumulate on statements of the inner loops, such as C(I,J) = C(I,J) + TEMP*A(I,L)]
22. CHALLENGE
o Database fragmentation and distribution
o Parallelization of very large databases versus very large queries
USAGE
o Database searches (homology)
o Data-intensive applications

Quantum Chemistry

CHARACTERISTICS
o These methods have traditionally been used for computing very accurate properties of small molecules, up to complex systems with 1000s of atoms
CHALLENGE
o Parallel scalability to a large number of processors
o Parallelization of linear algebra based algorithms
USAGE
o Small to medium molecule properties
o Compute-intensive applications

Bioinformatics: Areas of Interest

o SEQUENCE ANALYSIS AND ALIGNMENT
o COMPARATIVE GENOMICS
o EVOLUTION AND PHYLOGENY
o GENE REGULATION AND TRANSCRIPTOMICS
o PROTEIN STRUCTURE AND FUNCTION
o PROTEIN INTERACTIONS AND MOLECULAR NETWORKS
o TEXT MINING
o DATABASES AND ONTOLOGIES
o OTHER BIOINFORMATICS APPLICATIONS AND METHODS
o BIOINFORMATICS OF DISEASE

Source: ISMB 2008, Toronto, Canada

Bioinformatics: Selected Applications

o HMMER
o mpiBLAST-PIO
o PBPI

Hidden Markov models (HMMs)
23. cos, tan, atan, atan2, sinh, cosh, tanh, dnint, x**y

o Vector Library: the general vector library, libmassv.a, contains vector functions that will run on the entire IBM pSeries and Blue Gene families

short_ene.f, unoptimized

c     Loop over the 12-6 LJ terms for eedmeth = 1
c
      icount = 0
      do … = 1, numvdw
#include "ew_directp.h"
      enddo
c
c     calculation starts: loop over the data gathered in the temporary
c     array
c
C     NO FUSION
      do im_new = 1, icount
         j = tempint(im_new)
         delr2 = tempre(5 … im_new)
c
c     cubic spline on switch
c

short_ene.f, unoptimized (cont.)

         delrinv = 1.0/sqrt(delr2)
         delr = delr2*delrinv
         delr2inv = delrinv*delrinv
         x = dxdr*delr
         ind = eedtbdns*x
         dx = x - ind*del
         ind = 4*ind
         e3dx = dx*eed_cub(3+ind)
         e4dx = dx*dx*eed_cub(4+ind)
         switch = eed_cub(1+ind) + dx*(eed_cub(2+ind) +
     $        (e3dx + e4dx*third)*half)
         d_switch_dx = eed_cub(2+ind) + e3dx + e4dx*half

short_ene.f, optimized

c     Loop over the 12-6 LJ terms for eedmeth = 1
24. January 2008

hmmsearch: Plain MPI Port

[Chart: results for the plain MPI port versus number of processors (32, 64, 128, 256, 512, 1024)]

Blue Gene Optimizations

o Alternate Sequence File Indexing
   Open file and skip to offset
o Multiple Master Configuration
   Single master not enough to handle communication
   Use current infrastructure and include another management level
   Multiple master structure is able to do an intermediate processing step
o Dynamic Data Collection
   Eliminate gather operation
   Introduce buffer and tolerance threshold
o Database Caching in hmmpfam
   Eliminate excessive I/O
o Load balancing
   Index file and offset

K. Jiang, O. Thorsen, A. Peters, B. Smith, and C. P. Sosa, An Efficient Parallel of the Hidden Markov Methods for Genomic Sequence Search on a Massively Parallel System, IEEE Transactions on Parallel and Distributed Systems, Vol. 19, No. 1, January 2008

Master-Supermaster Scheme

[Figure: results flow from the slaves to several masters and on to a post-processing stage; four MASTER and four SLAVE boxes are legible]
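The "dynamic data collection" item above replaces a gather operation with a buffer and a tolerance threshold. One way to picture this is a worker-side buffer that ships results upward only once enough of them have accumulated. The class below is a hypothetical illustration of that pattern: its name, the meaning given to the threshold (a count of buffered results), and the send callback are assumptions, not details from the cited paper.

```python
class ResultBuffer:
    """Collect results and forward them in batches (illustrative sketch)."""

    def __init__(self, threshold, send):
        self.threshold = threshold  # flush once this many results are pending
        self.send = send            # callable that ships one batch to a master
        self.pending = []

    def add(self, hit):
        self.pending.append(hit)
        if len(self.pending) >= self.threshold:
            self.flush()

    def flush(self):
        # Ship whatever is pending; called at the threshold and at shutdown.
        if self.pending:
            self.send(list(self.pending))
            self.pending.clear()


if __name__ == "__main__":
    batches = []
    buffer = ResultBuffer(threshold=3, send=batches.append)
    for hit in range(7):
        buffer.add(hit)
    buffer.flush()  # final partial batch
    print(batches)
```

Batching keeps the master from being interrupted for every single hit while still bounding how much data sits on a worker.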
25. MPI, specifically MPICH? Of the several programming APIs, the only one supported on the Blue Gene/P system that is portable is MPI. OpenMP is supported only on individual nodes.
o Is the memory requirement per MPI task less than 4 GB?
o Is the code computationally intensive? That is, is there a small amount of I/O compared to computation?
o Is the code floating-point intensive? This allows the double floating-point capability of the Blue Gene/P system to be exploited.
o Does the algorithm allow for distributing the work to a large number of nodes?
o Have you ensured that the code does not use FLEXlm licensing? At present, FLEXlm library support for Linux on IBM System p is not available.

If you have answered yes to all of these questions, then answer the following questions:
o Has the code been ported to Linux on System p?
o Is the code Open Source Software (OSS)? These types of applications require the use of the GNU standard configure, and special considerations are required.
o Can the problem size be increased with increased numbers of processors?
o Do you use standard input? If yes, can this be changed to single file input?

What is Performance Tuning?

o Application software optimization is the process of making it work more efficiently
   Executes faster
   Uses less memory
   Perfo
26.
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   tag = 100;
   if (rank == 0) {
      strcpy(message, "Hello, world");
      for (i = 1; i < size; i++)
         rc = MPI_Send(message, 13, MPI_CHAR, i, tag, MPI_COMM_WORLD);
   } else
      rc = MPI_Recv(message, 13, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
   printf("node %d : %.13s\n", rank, message);
   rc = MPI_Finalize();

Compiling on Blue Gene: C++

   $ cat make.hello
   XL_CC  = mpixlcxx_r
   OBJ    = hello
   SRC    = hello.cc
   FLAGS  = -O3 -qarch=450 -qtune=450
   LIBS   =

   $(OBJ): $(SRC)
           ${XL_CC} $(FLAGS) $(SRC) -o $(OBJ) $(LIBS)
   clean:
           rm *.o hello

Hello World: C++

   $ cat hello.cc
   // Include the MPI version 2 C++ bindings:
   #include <mpi.h>
   #include <iostream>
   #include <string.h>
   using namespace std;

   int main(int argc, char* argv[])
   {
      MPI::Init(argc, argv);
      int rank = MPI::COMM_WORLD.Get_rank();
      int size = MPI::COMM_WORLD.Get_size();
      char name[MPI_MAX_PROCESSOR_NAME];
      int len;
      memset(name, 0, MPI_MAX_PROCESSOR_NAME);
      MPI::Get_processor_name(name, len);
      memset(name+len, 0, MPI_MAX_PROCESSOR_NAME-len);
      cout << "hello_parallel.cc: … name " << name << … << endl;
      MPI::Finalize();
      return
27. Comparison between mpirun and LoadLeveler llsubmit command

MPIRUN syntax:
   mpirun -partition R011
          -exe /gpfs/fs2/frontend-id/my_dir/my_prog
          -cwd `pwd`
          -args "arg0 arg1"

LoadLeveler command file syntax:
   # @ job_type = bluegene
   # @ requirements = (Machine == "$(host)")
   # @ arguments = -exe /gpfs/fs2/frontend-id/my_dir/my_prog -cwd `pwd` -args "arg0 arg1" -verbose 2
   # @ bg_partition = R011
   # @ queue
   /bgsys/drivers/ppcfloor/bin/mpirun

The job_type and requirements tags must ALWAYS be specified as listed above.

Outline

o Part I: Hardware
   Historical perspective
   Why do we need MPPs?
   Overview of massively parallel processing (MPP) architecture
o Part II: Software
   Overview
   Compilers
   MPI
   Building and running examples on Blue Gene
   Hands-on session 1
o Applications
   MPP architecture and its impact on applications
   Performance tools
   Introduction to code optimization
   Hands-on session 2
   Mapping applications on a massively parallel architecture
   Applications landscape
   Challenges and characteristics of Life Sciences applications
   Selected Bioinformatics applications
   Selected Structural Biology applications
   Hands-on session 3
o Future Directions
o Summary

American Chemical Society, Chemical & Engineering News, April 13
28. has no effect on the content of any of the Xprofiler reports.

Xprofiler Options (3)

o -e
  Xprofiler -e function1 -e function2 a.out gmon.out
  This option de-emphasizes the appearance of the function box or boxes for the specified function or functions in the function call tree. It also limits the number of entries for these functions in the Call Graph Profile report. This also applies to the specified functions' descendants, as long as they have not been called by non-specified functions.
  In the function call tree, the function box or boxes for the specified function or functions appear to be unavailable. Their size and the content of the label remain the same. This also applies to descendant functions, as long as they have not been called by non-specified functions.
  In the Call Graph Profile report, an entry for the specified function only appears where it is a child of another function, or as a parent of a function that also has at least one non-specified function as its parent. The information for this entry remains unchanged. Entries for descendants of the specified function do not appear unless they have been called by at least one non-specified function in the program.

Xprofiler Options (4)

o -E
  Xprofiler -E function1 -E function2 a.out gmon.out
  Thi
29. on Blue Gene/P

o Self-comparison of the Microbial Genome database
  - 5.2 GB raw size, 16 million sequences
o Scalability tests
  - Search a quarter million randomly sampled sequences against the database itself
  - Achieve 93% parallel efficiency on 32768 cores (8-rack BG/P)
o Complete genome-to-genome comparison
  - Finish searching 16 million vs. 16 million sequences within 12 hours

[Figure: scaling plot against Number of Cores (VN mode)]

P. Balaji, Poole, Sosa, X. Ma, and W. Feng, "Massively Parallel Genomic Sequence Search on the Blue Gene/P Architecture," IEEE/ACM International Conference for High Performance Computing, Networking and Analysis (SC), 2008.

What is PBPI?

o PBPI is an open-source implementation of Parallel Bayesian Phylogenetic Inference
o Combines sequential optimization and parallel processing to reduce execution times
o Supports large problem sizes

http://www.pbpi.org

PBPI uses MPI (Message Passing Interface) and runs under Linux. Its parallel algorithm can be summarized as:
1. Multi-dimensional data and task distribution across multi-dimensional grid
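The 93% parallel efficiency quoted for the 32768-core run can be related to measured run times with the usual relative-efficiency formula. The following is a minimal sketch, not code from mpiBLAST: the function names and the timings in the usage note are hypothetical, and it assumes efficiency is measured relative to a smaller baseline run.

```c
#include <assert.h>

/* Speedup of a larger run relative to a baseline run.
 * t_base: elapsed time of the baseline run (seconds)
 * t_par:  elapsed time of the larger run (seconds) */
double speedup(double t_base, double t_par)
{
    assert(t_base > 0.0 && t_par > 0.0);
    return t_base / t_par;
}

/* Parallel efficiency relative to a baseline of n_base cores:
 *   E = (t_base * n_base) / (t_par * n_par)
 * E == 1.0 means perfect scaling from n_base to n_par cores. */
double parallel_efficiency(double t_base, int n_base, double t_par, int n_par)
{
    assert(n_base > 0 && n_par > 0);
    return speedup(t_base, t_par) * (double)n_base / (double)n_par;
}
```

For example, with made-up timings of 1000 s on 1024 cores and 33.6 s on 32768 cores, parallel_efficiency(1000.0, 1024, 33.6, 32768) evaluates to roughly 0.93.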
30. option default at OPT 3 has the potential to alter the semantics of a program Please refer to documentation on the STRICT NOSTRICT option for more information 1501 510 Compilation successful for file dgemm f xlf pg c 03 qhot qarch pwr6 qtune pwr6 q64 lsame f lsame End of Compilation 1 1501 510 Compilation successful for file lsame f xlf pg c 03 qhot qarch pwr6 qtune pwr6 q64 xerbla f xerbla End of Compilation 1 1501 510 Compilation successful for file xerbla r 1 pg o matmul q64 matmul o dgemm o lsame o xerbla o 88 BaD running Xprofiler on Silver lab 2 32 B Biomedical Informatics amp Computational Biology 5 matmul Lp mflops are 964 993481618238206 15 dgemm f gmon out lsame o matmul o xerbla o dgemm o lsame f make ibm1 matmul f xerbla f Zmodule load hpct ife matmul gmon out e 89 DO BICB File Running Xprofiler lab 2 3 View Filter k Report Biomedical Informatics amp Computational Biology AL YS TTT d 3 Program matmul Total CPU Usage 8 32 seconds summary of 1 gmon out profile files Display Status showing 3 out of 3 nodes and 2 out of 2 arcs 90 BUD Hardware Performance Monitor HPM amp Prerequisi B I B Biomedical Informatics amp Computational Biology o Hardware Performance Counter SQ s
31. option adds alternative paths to search for source code and library files, or changes the current path search order. When using this command-line option, you can use the at sign (@) to represent the default file path, in order to specify that other paths be searched before the default path.

o -c
  Xprofiler a.out gmon.out -c config_file_name
  This option loads the specified configuration file. If the -c option is used on the command line, the configuration file name specified with it is displayed in the Configuration File (-c) text field in the Load Files window and in the Selection field of the Load Configuration File window. When both the -c and -disp_max options are specified on the command line, the -disp_max option is ignored. However, the value that was specified with it is displayed in the Initial Display (-disp_max) field in the Load Files window the next time it is opened.

o -disp_max
  Xprofiler -disp_max 50 a.out gmon.out
  This option sets the number of function boxes that Xprofiler initially displays in the function call tree. The value supplied with this flag can be any integer between 0 and 5,000. Xprofiler displays the function boxes for the most CPU-intensive functions, up to the number that you specify. For instance, if you specify 50, Xprofiler displays the function boxes for the 50 functions in your program that consume the most CPU. After this, you can change the number of function boxes that are displayed through the menu options. This flag
32. out gmon.out
  This option changes the general appearance and label information of all function boxes in the function call tree, except for that of the specified function or functions and its descendants. In addition, the number of entries in the Call Graph Profile report for the non-specified and non-descendant functions is limited, and the CPU data associated with them is changed. The -F flag overrides the -E flag.
  In the function call tree, all function boxes except for that of the specified function or functions and its descendant or descendants appear to be unavailable. The size and shape of these boxes change so that they are displayed as squares of the smallest allowable size. In addition, the CPU time shown in the function box label appears as zero.
  In the Call Graph Profile report, an entry for a non-specified or non-descendant function is displayed only where it is a parent or child of a specified function or one of its descendants. When this is the case, the time in the self and descendants columns for this entry is set to zero. As a result, be aware that the value listed in the % time column for most profiled functions in this report will change.

o -L
  Xprofiler -L pathname
  This option sets the path name for locating shared libraries. If you plan to specify multiple paths, use the Set File Search Paths option of the File menu on the Xprofiler GUI.

Appendix
33. performance improvement 164 l BO BLAST B I B Biomedical Informatics amp Computational Biology Emu Database gt BLAS BLAST Basic Local Alignment Search Tool A set of similarity search programs for searching available sequence databases regardless of whether the query is protein or DNA The most popular tool in bioinformatics NCBI BLAST server 500 000 query submissions per day 165 What is the Problem Biomedical Informatics amp Computational Biology BICB Hexokinase from the yeast species Saccharomyces cerevisiae 5 10 15 20 25 30 1 AASXDXSLVEVHXXVFIVPPXILQAVVSIA 31 TTRXDDXDSAAASIPMVPGWVLKQVXGSQA 61 GSFLAIVMGGGDLEVILIXLAGYQESSIXA 91 SRSLAASMXTTAIPSDLWGNXAXSNAAFSS 121 151 TXQAXAFSLAXLXKLISAMXNAXFPAGDXX 181XXVADIXDSHGILXXVNYTDAXIKMGIIFG 211SGVNAAYWCDSTXIADAADAGXXGGAGXMX 241 VCCXQDSFRKAFPSLPQIXYXXTLNXXSPX 271 AXKTFEKNSXAKNXGQSLRDVLMXYKXXGQ 301XHXXXAXDFXAANVENSSYPAKIQKLPHFD 331 LRXXXDLFXGDQGIAXKTXMKXVVRRXLFL 3611 AAYAFRLVVCXIXAICQKKGYSSGHIAAX 391GSXRDYSGFSXNSATXNXNIYGWPQSAXXS 421KPIXITPAIDGEGAAXXVIXSIASSQXXXA 451XXSAXXA Growth of GenBank 1982 2005 54 52 50 46 4 44 4 42 4 38 4 36 4 34 4 32 4 30 4 28 4 26 4 24 4 22 4 20 4 18 4 16 4 14 4 124 10 4 Sequences millions ONA 1982 Database size increasing faster than our ability to compute on it
34. some cases there might be deeply nested layers on top of MPI, and you might need to profile higher up the call chain (functions in the call stack). You can do this by setting this environment variable (the default value is 0). For example, TRACEBACK_LEVEL=1 indicates that the library must save addresses starting with the parent in the call chain (level 1), not with the location of the MPI call (level 0).

o SWAP_BYTES
  The event trace file is binary, and therefore it is sensitive to byte order. For example, Blue Gene/L is big endian, and your visualization workstation is probably little endian (for example, x86). The trace files are written in little-endian format by default. If you use a big-endian system for graphical display (such as Apple OS X, AIX on a System p workstation, and so on), you can set an environment variable by using one of the following commands, depending on your shell:
    export SWAP_BYTES=no    (bash)
    setenv SWAP_BYTES no    (csh)
  Setting this variable results in a trace file in big-endian format when you run your job.

o TRACE_SEND_PATTERN (Blue Gene/L and Blue Gene/P only)
  In either profiling or tracing mode, there is an option to collect information about the number of hops for point-to-point communication on the torus network. This feature can be enabled by setting the TRACE_SEND_PATTERN environment variable as follows, depending on your shell
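The byte-order issue behind SWAP_BYTES can be illustrated with a few lines of C. This is a generic sketch under stated assumptions, not code from the MPI tracing library: the function names are hypothetical, and the trace file layout is not described in these slides, so only a single 32-bit field is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 on a little-endian host (for example x86), 0 on a big-endian host. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Reverses the byte order of a 32-bit value. */
uint32_t swap_bytes32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Decodes a 32-bit little-endian field from raw bytes.
 * Works the same way on big-endian and little-endian hosts. */
uint32_t read_le32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

A reader that decodes fields byte by byte, as read_le32 does, does not depend on the byte order of the machine that displays the trace; a reader that copies raw words into memory does, which is the situation the environment variable addresses.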
35. the 4th international conference on Computing frontiers Ischia Italy 59 68 2007 BU mpiBLAST 1 5 Performance Improvements 4 B I B Biomedical Informatics amp Computational Biology Partition i o DB frags cached in workers queries streamed across 4 4 qi o One output file per partition Results merged and written to GPFS through I O nodes is Gist Gist 4 2 2 4 2 Partition 1 Partition 2 Partition i Config example 3 PSize 128 RG 4 ves partition aot SINE CIE TIS MERE 32768 128 7 256 partitions 179 5339 f BUD Compare I O Strategies Single Partition B I B Biomedical Informatics amp Computational Biology o Experimental setup Database NT over 6 million seqs 23 GB raw size Query 512 sequences randomly sampled from the database Metric Overall execution time Computer to IO 128 1 Computer to IO 64 1 WM outperforms WC 00 m WorkerIndividual and WI by a factor E WorkerCollective of 2 7 and 4 9 SS Fs A e e 5 WorkerMerge Execution Time secs m wm c E 5 a u 54 128 Number of Workers Number of Workers Da Performance of Latest Research Prototype
36. the processor is traded in favor of dense packaging and low power consumption per processor.

Blue Gene Technology Roadmap
o Blue Gene/L (PPC 440 @ 700 MHz), scalable to 360 TF, 2004
o Blue Gene/P (PPC 450 @ 850 MHz), scalable to 3 PF, 2007
o Blue Gene/Q (Power multi-core), scalable to 10 PF, 2011

Most Power, Space and Cooling Efficient Supercomputer
[Figure: bar chart comparing Racks/TF, kW/TF, Sq Ft/TF and Tons/TF; systems shown include Sun Constellation and Cray XT4. Published specs per peak performance.]

Areas of Application
o Life Sciences: In-Silico Trials, Drug Discovery
o Geophysical Data Processing, Upstream Petroleum
o Biological Modeling, Brain Science
o Financial Modeling, Streaming Data Analysis
o Physics, Materials Science, Molecular Dynamics
o Environment and Climate Modeling
o Life Sciences: Sequencing

Outline
o Part I: Hardware
  - Historical perspective: Why do we need MPPs?
  - Overview of massively parallel processing (MPP) architecture
o Part II: Software
  - Overview
  - Comp
37. were initially introduced for pattern recognition in digitized acoustics of the human voice L R Rabiner A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition Proceedings of the IEEE Vol 77 257 286 1989 Oystein Thorsen ie ved A Krogh M Brown I S Mian K Sjolander Michigan Technological U ef Mer K and Haussler D 1994 Hidden Markov BLAST models in computational biology Applications to protein modeling J Mol Biol vol pp 1501 1531 SR Eddy Multiple Alignment Using Hidden Markov Models Proc Third Int l Conf Intelligent Systems Molecular Biology ISMB 95 vol 3 pp 114 120 1995 http hmmer janelia org 144 BaD Hidden Markov Models in Bioinformatics B I C B Biomedical Informatics amp Computational Biology f The UC Santa Cruz profile HMM software SAM probably the closest relative of HMMER Philipp Bucher s PFTOOLS package implements generalized profiles which are substantially similar to profile HMMs commercial HMMpro package from Pierre Baldi and Yves Chauvin at NetID Inc implements more general HMM architectures than just profile HMMs and also comes with a nifty Java display Andy Neuwald s PROBE software implements models based on multiple ungapped HMM motifs and includes an implementation of training models by Gibbs sampling The UC San Diego META MEME package from Michael Gribskov Bill Grun
38. [Figure: Blue Gene hardware naming convention for compute modules, link modules, link ports (TA through TF), link data cable connectors (J00 through J15, as labeled on the link card), node Ethernet connectors, service connectors, control FPGA, and clock outputs (Output 0 through Output 13)]

Submitting Jobs: mpirun

o Job submission using mpirun
  Users can submit jobs with mpirun. The Blue Gene mpirun is located in /usr/bin/mpirun.
o Typical use of mpirun:
    mpirun -np <# of processes> -partition <block id> -cwd <working directory> -exe <executable>
o Where:
    -np         Number of processors to be used. Must fit in the available partition.
    -partition  A partition of the Blue Gene rack on which a given executable will execute, e.g., R000.
    -cwd        The current working directory; generally used to specify where any input and output files are located.
    -exe        The actual binary program that the user wishes to execute.
o Example:
    mpirun -np 32 -partition R000 -cwd /gpfs/fs2/frontend-11/myaccount -exe /gpfs/fs2/frontend-11/myaccount/hello

mpirun Standalone Versus mpirun in LoadLeveler (LL) Environment
39. 2 Informatics database data mining and computing 3 Mathematics biostatistics and statistics 4 Chemistry chemical engineering and physics 5 Biophysics and structural biology 6 Imaging information theory and signal processing 7 Computational chemistry medicinal chemistry and drug design 8 Clinical and translational science BICB AAA UNIVERSERBOF MINNESOTA ROCHESTER r 7 BICB Graduate Program ePersonalized degree program to meet the needs of full time and part time students e M S Degree eCourse work plus capstone or course work plus thesis Ph D Degree eInterdisciplinary and collaborative research environment Internships Professional development leadership and management skills Mentoring CONTACT INFORMATION Professor Claudia Neuhauser Director of Graduate Studies Vice Chancellor for Academic Affairs UMR Telephone 507 281 7791 E mail neuha001 umn edu BOL BICB c UNIVE OF MINNESOTA ROCHESTER UivERS PEOF Mt INN The Breadth of Research in BICB o Data mining of biomedical data o Metabolic pathways o Mining of unstructured biomedical data o Screening for drug development BICB JOf MINNESOTA ROCHESTER AN 2 Ebola Virus Therapeutics Prof Kaznessis U of MN Dr Kocher s Group Mayo Clinic RNA Catalysis Prof York s Group U of MN Geroge Giambasu Ph D Candidate Chemistry Andrew Norgan Ph D Candidate Mayo Clinic Metabolic
40. 2009 Issue 4

Chemical & Engineering News, April 13, 2009

o The Looming Petascale
o Chemists gear up for a new generation of supercomputers
o "The new petascale computers will be 1,000 times faster than the terascale supercomputers of today, performing more than 1,000 trillion operations per second. And instead of machines with thousands of processors, petascale machines will have many hundreds of thousands that simultaneously process streams of information. This technological sprint could be a great boon for chemists, allowing them to computationally explore the structure and behavior of bigger and more complex molecules."

http://pubs.acs.org/cen/science/87/8715sci3.html

What is the Challenge?

Applications: are we there?

Porting Applications to Blue Gene

Answer the following questions to help you in the decision-making process of porting and in estimating the level of effort required. Answering "yes" to most of the questions is an indication that your code is already enabled for distributed-memory systems and is a good candidate for Blue Gene/P.

o Is the code already running in parallel?
o Is the application addressing 32-bit?
o Does the application rely on system calls, for example, system()?
o Does the code use the Message Passing Interface
41. 5 August 2009

Blue Gene/P Job Modes Allow Flexible Use of Node Memory

o Virtual Node Mode
  - Previously called Virtual Node Mode
  - All four cores run one MPI process each
  - No threading
  - Memory per process = 1/4 node memory
  - MPI programming model
o Dual Node Mode
  - Two cores run one MPI process each
  - Each process may spawn one thread on a core not used by the other process
  - Memory per process = 1/2 node memory
  - Hybrid MPI/OpenMP programming model
o SMP Node Mode
  - One core runs one MPI process
  - Process may spawn threads on each of the other cores
  - Memory per process = full node memory
  - Hybrid MPI/OpenMP programming model

[Figure: application and memory address space layout for each mode]

Blue Gene Integrated Networks

o Torus
  - Interconnect to all compute nodes
  - Torus network is used for point-to-point communication
o Collective
  - Interconnects compute and I/O nodes
  - One-to-all broadcast functionality
  - Reduction operations functionality
o Barrier
  - Compute and I/O nodes
  - Low-latency barrier across system (< 1 usec for 72 rack
o 10 Gb Functional Ethernet
  - I/O nodes only
o 1 Gb Private Control Ethernet
  - Provides JTAG, i2c, etc. access to hardware; accessible only from the Service Node system
  - Boot, monitoring and diagnostics
o Clock network
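The trade-off between the three job modes is simple arithmetic on the four cores and the memory of one node. The sketch below is illustrative only: the enum and function names are hypothetical, and the node memory size is passed in by the caller rather than queried from the system.

```c
#include <assert.h>

/* Job modes; the enum value is the number of MPI processes per node. */
enum bgp_mode { BGP_SMP = 1, BGP_DUAL = 2, BGP_VN = 4 };

enum { BGP_CORES_PER_NODE = 4 };

/* MPI processes placed on each compute node in the given mode. */
int processes_per_node(enum bgp_mode mode)
{
    return (int)mode;
}

/* Memory available to each MPI process, in MB. */
int memory_per_process_mb(int node_memory_mb, enum bgp_mode mode)
{
    assert(node_memory_mb > 0);
    return node_memory_mb / processes_per_node(mode);
}

/* Cores (and so threads, at one thread per core) available to each process. */
int cores_per_process(enum bgp_mode mode)
{
    return BGP_CORES_PER_NODE / processes_per_node(mode);
}
```

On a node with 2048 MB, for instance, this gives 512 MB per process in Virtual Node mode, 1024 MB in Dual mode, and the full 2048 MB in SMP mode, which is why memory-hungry codes often favor the hybrid MPI/OpenMP modes.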
42. Biomedical Informatics amp Computational Biology BICB 194 BO B I C B Biomedical Informatics amp Computational Biology Building Partnerships BICB Biomedical Informatics and Computational Biology mo A P THE HORMEL INSTITUTE i En UNIVERSITY OF MINNESOTA UNIVERSITY OF MINNESOTA ROCHESTER TWIN CITIES MINNESOTA ROCHESTER PEE Objectives Establish world class academic and research programs in bioinformatics and computational biology at UM Rochester Leverage the University of Minnesota s academic and research capabilities in partnership with IBM Mayo Clinic Hormel Institute and other industry leaders Build academic and research programs that complement southeast Minnesota s existing leadership roles in health sciences biosciences engineering and technology Create academic and research programs that provide applications to economic activities via innovation translational research and clinical experiences BICB A UNIVERSBEOF MINNESOTA 4 Overview Biomedical Informatics and Computational Biology BICB Interdisciplinary all University graduate program University of Minnesota Twin Cities University of Minnesota Rochester administrative home D and Master of Science M S Plan A and Plan degrees and a inor Gradu
43. Engine regular expressions supported

[Screenshot: Xprofiler view of matmul]

Looking at Assembler Code
[Screenshot: Xprofiler disassembler window for matmul (menus: File, View, Filter, Report, Utility) with columns: no. ticks per instr., address, instruction, assembler code, source code]

Program: matmul. Total CPU Usage: 39.33 seconds (summary of 1 gmon.out profile files). Display Status: showing 6 out of 75 nodes and 3 out of 70 arcs.

Xprofiler Flat Format

    cumulative    self                self      total
     seconds     seconds    calls    ms/call   ms/call   name
      16.53       16.53    235580      0.07      0.07    short_ene [7]
      19.27        2.74     23558      0.12      0.12    pack_nb_list [11]
      21.71        2.44        10    244.00    244.00    grad_sumrc [12]
      23.57        1.86        10    186.00    190.00    fill_charge_grid

MASS Library

o Mathematical Acceleration Subsystem (MASS) consists of libraries of tuned mathematical intrinsic functions
o Scalar Library
  The MASS scalar library, libmass.a, contains an accelerated set of frequently used math intrinsic functions in the AIX and Linux system library libm.a (now called libxlf90.a in IBM XL Fortran): sqrt, exp, log, sin
44. II: Computer Performance

    Name          FLOPS
    yottaFLOPS    10^24
    zettaFLOPS    10^21
    exaFLOPS      10^18
    petaFLOPS     10^15
    teraFLOPS     10^12
    gigaFLOPS     10^9
    megaFLOPS     10^6
    kiloFLOPS     10^3

http://en.wikipedia.org/wiki/FLOPS

Outline
o Part I: Hardware
  - Historical perspective: Why do we need MPPs?
  - Overview of massively parallel processing (MPP) architecture
o Part II: Software
  - Overview
  - Compilers
  - MPI
  - Building and Running Examples on Blue Gene
  - Hands-on session 1
o Part III: Applications
  - MPP architecture and its impact on applications
  - Performance tools
  - Introduction to code optimization
  - Hands-on session 2
  - Mapping applications on a massively parallel architecture
  - Applications landscape
  - Challenges and characteristics of Life Sciences applications
  - Selected Bioinformatics applications
  - Selected Structural Biology applications
  - Hands-on session 3
o Summary

Scientific and Engineering Applications Landscape
[Figure: application areas grouped by numerical method. Recoverable labels: Petroleum Reservoir, Rational Materials, Weather and Climate, Partial Diff. Equations, N-Body Methods, Semiconductors, Molecular Dynamics, Multiphase, Reaction-Diffusion, Fluid Dynamics, Fracture, Flo]
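The prefix table above can be applied mechanically when reporting machine performance. The following is a small sketch with a hypothetical helper name; it simply divides by 1000 until the value fits under the next prefix.

```c
#include <assert.h>
#include <stddef.h>

/* Picks the SI prefix for a rate given in FLOPS.
 * Returns the prefix ("" for plain FLOPS) and stores the scaled value
 * in *scaled, so that value == *scaled * 1000^k for the chosen prefix. */
const char *flops_prefix(double flops, double *scaled)
{
    static const char *const prefix[] = {
        "", "kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"
    };
    int k = 0;

    assert(scaled != NULL && flops >= 0.0);
    while (flops >= 1000.0 && k < 8) {
        flops /= 1000.0;
        k++;
    }
    *scaled = flops;
    return prefix[k];
}
```

For example, a rate of 3.0e15 FLOPS yields the prefix "peta" with a scaled value of 3, and 3.6e14 FLOPS yields "tera" with a scaled value of 360.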
45. PP architecture and its impact on applications
  - Performance tools
  - Introduction to code optimization
  - Hands-on session 2
  - Mapping applications on a massively parallel architecture
  - Applications landscape
  - Challenges and characteristics of Life Sciences applications
  - Selected Bioinformatics applications
  - Selected Structural Biology applications
  - Hands-on session 3
o Future Directions
o Summary

High Performance Toolkit

o High Performance Computing Toolkit
  - Xprofiler for CPU profiling
  - Hardware Performance Monitoring (HPM)
  - Message Passing Interface (MPI) Profiler and Tracer tool
  - Threading performance: OpenMP profiling
  - I/O performance
o GUI of the High Performance Computing Toolkit (HPCT)

HPC Toolkit Flow
[Figure: HPC Toolkit flow; recoverable labels: Fortran, Binary, Output, Analysis]

CPU Profiling using Xprofiler

o Xprofiler
  - Used to analyze your application performance
  - It uses data collected by the -pg compiler option to construct a graphical display
  - It identifies the functions that are the most CPU intensive
o The GUI manipulates the display in order to focus on the critical areas of the application
o Important factors
  - Sampling interval is on the order of ms
  - Profiling introduces overhead du
46. Pathways Prof Boley s Group U of MN Emilia Wu Post Doc Chemical Engineering Dimitrije Jevremovic Ph D Candidate Computer Science Life Sciences Environment for Blue Gene Kinases Small Molecules Inhibitors Dr Dong amp Dr Bode s Group Hormel Institute Rashed Ferdous Ph D Candidate IBM Madhusanan Mottamal Post Doc Hormel Institute 74 SENE o SOTA ROCHESTER UwivERSEDEOF Mi DK BICB Resource IBM JS22 IBM Blue Gene P IBM Blue Gene Center Rochester AAA UNIVERS PEOF MINNESOTA ROCHESTER 1 Silver IBM JS22 QS22 Hardware and Configuration 7 compute blades 1 interactive blade 1 file server management node 30 total compute processors 72 TB total memory Specifications for the compute blades are as follows Six JS22 blades each with four 4 0 GHz processors and 8 GB of memory One QS22 blade with two 3 2 GHz PowerXCell 8i processors and 16 GB of memory Specifications for the interactive blade are as follows One JS22 blade with four 4 0 GHz Power6 processor and 8 GB of memory Network All of the blades within the cluster are interconnected with a 4X InfiniBand DDR network https www msi umn edu labs umbcl techinfo html ington St u Johnston Hall L Walter ha Washington Ave Parking Ramp 5 Transportation amp safety Bldg 2 Art Museum
47. Series

o 2048 MB or 4096 MB memory per node
  - 32-bit memory addressing
o Compute node kernel does not have full Linux compatibility
  - Limited system calls: no fork() or system() calls

IBM XL Compilers

o Compilers for Blue Gene are located on the front end under /opt/ibmcmp
o Fortran:
    /opt/ibmcmp/xlf/bg/11.1/bin/bgxlf
    /opt/ibmcmp/xlf/bg/11.1/bin/bgxlf90
    /opt/ibmcmp/xlf/bg/11.1/bin/bgxlf95
o C:
    /opt/ibmcmp/vac/bg/9.0/bin/bgxlc
o C++:
    /opt/ibmcmp/vacpp/bg/9.0/bin/bgxlC

Language Scripts

o C: bgc89, bgc99, bgxlc
o C++: bgxlC, bgxlC_r
o Fortran: bgf2003, bgf95, bgxlf2003, bgxlf90, bgxlf, bgf77, bgfort77, bgxlf95, bgf90

Unsupported Options

The following compiler options, although available for other IBM systems, are not supported by the Blue Gene/P hardware:
o -q64: The Blue Gene/P system uses a 32-bit architecture; you cannot compile in 64-bit mode.
o -qaltivec: The 450 processor does not support VMX instructions or vector data types.

GNU Compilers
48. al Biology

o Function menu
  Perform a number of operations for any of the functions shown in the function call tree by using the Function menu. You can access statistical data, look at source code, and control which functions are displayed. The Function menu is not visible from the Xprofiler window. To access it, right-click the function box of the function in which you are interested.
o Arc menu
  Locate the caller and callee functions for a particular call arc. The Arc menu is not visible from the Xprofiler window. You access it by right-clicking the call arc in which you are interested.
o Cluster Node menu
  Control the way your libraries are displayed by Xprofiler. The Cluster Node menu is not visible from the Xprofiler window. You access it by right-clicking the edge of the cluster box in which you are interested.
o Display Status Field
  At the bottom of the Xprofiler window is a single field that tells you:
  - The name of your application
  - The number of gmon.out files used in this session
  - The total amount of CPU used by the application
  - The number of functions and calls in your application, and how many are currently displayed

Building AMBER7 with Xprofiler

    # LOADER/LINKER
    # Use standard options
    setenv LOAD "xlf90 -bmaxdata:0x80000000"
    # Load with the IBM & ESSL libraries
    setenv LOADLIB
    if 5 5 MASSLIB
49. architecture

Registers per Architecture

    Processor architecture    Number of performance counter registers
    PowerPC 970               8
    POWER4                    8
    POWER5                    8
    POWER5+                   6
    POWER6                    6
    Blue Gene/L               52
    Blue Gene/P               256

Counting Registers

o User sees private counter values for the application
o Counting of the special CPU registers is frozen, and the values are saved, whenever the application process is taken off the CPU and another process is scheduled
o Counting is resumed when the user application is scheduled on the CPU
o The special CPU registers can count different events
o There are restrictions on which registers can count which events

Performance Monitor Counters

    Processor        Events    Event groups
    PowerPC 970      230       49
    PowerPC 970 MP   230       51
    POWER4           244       63
    POWER4 II        244       63
    POWER5           474       163
    POWER5+          483       188
    POWER6           553       202

HPM Metrics

o Measured quantities: Cycles, Instructions, Floating point instructions, Integer instructions
o Useful derived metrics: IPC (instructions per cycle), Floating point rate (Mflip/s), Computation intensity, Instructions p
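Two of the derived metrics named above, IPC and the floating-point rate in Mflip/s, follow directly from raw counter values. The sketch below is an illustration, not the HPM toolkit's implementation: the struct and function names are hypothetical, and it assumes the rate is taken over wall-clock seconds, which may differ from the time base the toolkit actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Raw values as they might be read from hardware counters for one
 * instrumented section (hypothetical layout). */
struct hpm_counts {
    double cycles;           /* processor cycles */
    double instructions;     /* instructions completed */
    double fp_instructions;  /* floating-point instructions */
    double wall_seconds;     /* elapsed time of the section */
};

/* Instructions per cycle. */
double hpm_ipc(const struct hpm_counts *c)
{
    assert(c != NULL && c->cycles > 0.0);
    return c->instructions / c->cycles;
}

/* Floating-point instruction rate in millions per second (Mflip/s). */
double hpm_mflips(const struct hpm_counts *c)
{
    assert(c != NULL && c->wall_seconds > 0.0);
    return c->fp_instructions / (c->wall_seconds * 1.0e6);
}
```

With made-up counts of 2.0e9 cycles, 1.0e9 instructions and 5.0e8 floating-point instructions over 2 seconds, these functions give an IPC of 0.5 and a rate of 250 Mflip/s.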
50. ate faculty are from University of Minnesota Twin Cities University of Minnesota Rochester Hormel Institute Mayo Clinic IBM Students are in residence on either the Rochester or Twin Cities campus The program is suitable for full time and part time students Bw BICB UNIVE MINNESOTA ROCHESTER if SN Admission Requirements Strong background in the quantitative sciences and varied backgrounds in the life health sciences Calculus 1 year Introduction to computer science or programming 1 semester Chemistry 1 year General Biology 1 semester Background in either two of the areas 1 3 or one of the areas 1 3 and one of the areas 4 5 1 Multivariable calculus differential equations linear algebra 2 Algorithms amp data structure discrete mathematics 3 Statistics or biostatistics probability theory 4 Biochemistry genetics and cell biology 5 Health sciences pharmacology physiology or related areas Deficiencies must be made up during the first year Da BICB AAA UNIVERSEDEOF MINNESOTA ROCHESTER UN BICB Graduate Program CORE AREAS 1 Biochemistry molecular and cell biology 2 Database data mining and computing 3 Informatics analysis and machine learning 4 Mathematics biostatistics and statistics 5 Computational and systems biology ELECTIVE AREAS 1 Biochemistry molecular and cell biology
51. ation Library

o Libhpm
  - Provides instrumented programs with a summary output for each instrumented region in a program
  - This library supports serial and parallel (Message Passing Interface (MPI), threaded, and mixed-mode) applications written in Fortran, C, and C++
  - Provides a programming interface to start and stop performance counting for an application program
  - The part of the application program between the start and stop of performance counting is called an instrumentation section
  - Any such instrumentation section is assigned a unique integer number as a section identifier

Libhpm Template

    hpmInit( taskID, "my program" );
    hpmStart( 1, "outer call" );
    do_work();
    hpmStart( 2, "computing meaning of life" );
    do_more_work();
    hpmStop( 2 );
    hpmStop( 1 );
    hpmTerminate( taskID );

Calls to hpmInit and hpmTerminate embrace the instrumented part. Every instrumentation section starts with hpmStart and ends with hpmStop. The section identifier is the first parameter to the latter two functions.

Events and Groups

o The hardware performance counter information is the value of special CPU registers that are incremented at certain events
o The number of such registers is different for each
52. buffer is full.

Tracing All Events: Finer Granularity

o Control the time-history measurement within the application by calling routines to start or stop tracing
o Fortran syntax:
    call trace_start()
    do work + mpi ...
    call trace_stop()
o C syntax:
    void trace_start(void);
    void trace_stop(void);
    trace_start();
    do work + mpi ...
    trace_stop();
o C++ syntax:
    extern "C" void trace_start(void);
    extern "C" void trace_stop(void);
    trace_start();
    do work + mpi ...
    trace_stop();

TRACE_ALL_EVENTS disabled

o To use one of the previous control methods, the TRACE_ALL_EVENTS variable must be disabled. Otherwise, it traces all events.
o You can use one of the following commands, depending on your shell, to disable the variable:
    export TRACE_ALL_EVENTS=no    (bash)
    setenv TRACE_ALL_EVENTS no    (csh)

Environmental Variables

o TRACE_ALL_TASKS
  When saving MPI event records, it is easy to generate trace files that are too large to visualize. To reduce the data volume when you set TRACE_ALL_EVENTS=yes
o TRACE_MAX_RANK
  To provide more control, you can set TRACE_MAX_RANK=#

Environmental Variables (2)

o TRACEBACK_LEVEL
  In
53. compiler; mpicxx: C++ compiler; mpif77: Fortran 77 compiler; mpif90: Fortran 90 compiler; mpixlc: IBM XL C compiler; mpixlc_r: thread-safe version of mpixlc; mpixlcxx: IBM XL C++ compiler; mpixlcxx_r: thread-safe version of mpixlcxx; mpixlf2003: IBM XL Fortran 2003 compiler; mpixlf2003_r: thread-safe version of mpixlf2003; mpixlf77: IBM XL Fortran 77 compiler; mpixlf77_r: thread-safe version of mpixlf77; mpixlf90: IBM XL Fortran 90 compiler; mpixlf90_r: thread-safe version of mpixlf90; mpixlf95: IBM XL Fortran 95 compiler; mpixlf95_r: thread-safe version of mpixlf95; mpich2version: prints MPICH2 version information.
Compiling on Blue Gene (C):
    $ make -f make.hello
    mpixlc_r -O3 -qarch=450 -qtune=450 hello.c -o hello
    $ cat make.hello
    XL_CC = mpixlc_r
    OBJ = hello
    SRC = hello.c
    FLAGS = -O3 -qarch=450 -qtune=450
    LIBS =
    $(OBJ): $(SRC)
            ${XL_CC} $(FLAGS) $(SRC) -o $(OBJ) $(LIBS)
    clean:
            rm *.o hello
Hello World (C):
    $ cat hello.c
    #include <stdio.h>   /* Headers */
    #include "mpi.h"
    main(int argc, char **argv)   /* Function main */
    {
      int rank, size, tag, rc, i;
      MPI_Status status;
      char message[20];
      rc = MPI_Init(&argc, &argv);
      rc
54. d Running Examples on Blue Gene; Hands-on session 1. Part III: Applications: MPP architecture and its impact on applications; Performance tools; Introduction to code optimization; Hands-on session 2; Mapping applications on a massively parallel architecture; Applications landscape; Challenges and characteristics of Life Sciences applications; Selected Bioinformatics applications; Selected Structural Biology applications; Hands-on session 3; Summary. Technological challenges: The point to which we can shrink transistors has an absolute limit. The shrinking of transistors yields difficult side effects: electromagnetic interference and power leakage. Multi-processor shared-memory machines. Fast, sophisticated interconnects with multiple processors. The 1990s: Commodity computing. Large-scale machines could be achieved using individual CPUs networked or clustered to function together as a single unit. Massively parallel processing (MPP) systems. Source: "From Kilobytes to Petabytes in 50 Years," http://www.eurekalert.org/features/doe/2002-03/dlnl-fKt062102.php. Supercomputer Peak Performance: [Figure: peak performance over time, doubling time 1.5 yr.]
55. The DNA of mpiBLAST-PIO on BG: Approach: Exploit the distributed processing power and memory of supercomputing systems, particularly for large datasets. Software environment: Operating system: Linux. Programming language: C and MPI (Message Passing Interface). Overview of parallel algorithm: Segment the query file into individual queries (only one query shown below). Fragment the database and distribute it to the worker nodes. [Diagram: the master node broadcasts the query to the workers; the worker nodes query the DB and generate results; results are sent to the master node.] mpiBLAST-PIO 1.4 Performance on BG/L: [Figure: parallel speedup versus nodes in co-processor mode, both axes from 0 to 8192.] Performance scaling: thick line: ideal speedup; thin solid line: speedup for a large query against nr; dashed line: speedup for a medium query against nr; dotted line: speedup for a small query against nr. Oystein Thorsen, Karl Jiang, Amanda Peters, Brian Smith, Heshan Lin, Wu-chun Feng, Carlos P. Sosa, "Parallel Genomic Sequence Search on a Massively Parallel System," Conference On Computing Frontiers 485 of
56. dy, Tim Bailey, and others, implements multiple ungapped HMM motif models similar to PROBE. NCBI's PSI-BLAST server implements a stripped-down but ultra-fast version of iterative profile HMM searches. This is a convenient Web server for folks who don't want to hassle with installing software locally. Ewan Birney's WISETOOLS package can take a HMMER model and search it against EST or genomic DNA sequence, doing six-frame translation and allowing for frameshifts and introns. Hidden Markov Models References: Several software packages are currently available: HMMER, http://hmmer.janelia.org; SAM, http://www.cse.ucsc.edu/research/compbio/sam.html; PFTOOLS, http://www.isrec.isb-sib.ch/profile/profile.html, http www bio net hypermail bio software 1999 January 020107 html; GENEWISE, http://www.ebi.ac.uk/Wise2; META-MEME, http://metameme.sdsc.edu; PSI-BLAST, http://blast.ncbi.nlm.nih.gov/Blast.cgi. HMMs and Applications: HMM: Profile HMMs are statistical models of multiple sequence alignments. They capture position-specific information on how conserved each column of the alignment is and which residues are likely. Applications: Evolutionary homology in families of proteins. Automated annotation of the domain structure of proteins. Automated constructio
57. e pi -hfile hostfile -procs 4; 20; Enter the number of intervals: (0 quits) pi is approximately 3.1418009868930938, Error is 0.0002083333033007; 0; Enter the number of intervals: (0 quits) wrote trace file single_trace. Appendix I: Xprofiler Options. Xprofiler Options 1: -b: Xprofiler -b a.out gmon.out. This option suppresses the printing of the field descriptions for the Flat Profile, Call Graph Profile, and Function Index reports when they are written to a file with the Save As option of the File menu. -s: Xprofiler -s a.out gmon.out.1 gmon.out.2 gmon.out.3. If multiple gmon.out files are specified when Xprofiler is started, this option produces the gmon.sum profile data file. The gmon.sum file represents the sum of the profile information in all the specified profile files. Note that if you specify a single gmon.out file, the gmon.sum file contains the same data as the gmon.out file. -z: Xprofiler -z a.out gmon.out. This option includes functions that have both zero CPU usage and no call counts in the Flat Profile, Call Graph Profile, and Function Index reports. A function will not have a call count if the file that contains its definition was not compiled with the -pg option, which is common with system library files. Xprofiler Options 2: Xprofiler pathA pathB This
58. e is measured as elapsed time or wallclock. Efficiency: Parallel efficiency is defined as how well a program (your code) utilizes multiple processors (cores): $\text{Efficiency} = \dfrac{\text{Sequential run time}}{N \times \text{Parallel run time}}$, where $N$ is the number of processors defined by the user. Parallel Efficiency Dependencies: sequential code; parallel code; communication overhead and redundancy. Example Parallel Speedup: Completion time = computation time + communication time. Serial time > Parallel time. Serial: 1 processor, serial time 100, parallel time 100, speedup 1. Programmer A: 4 processors, parallel time 25, speedup 4. Programmer B: 4 processors, parallel time 35, speedup 2.9. Programmer C: 4 processors, parallel time 45, speedup 2.2. Optimization Comparison: [Figure: time reduction versus processors for Programmer A, Programmer B, and Programmer C.] Outline: Part I: Hardware: Historical perspective; Why do we need MPPs; Overview of massively parallel processing (MPP); Architecture. Part II: Software: Overview; Compilers; MPI; Building and Running Examples on Blue Gene; Hands-on session 1. Part III: Applications: M
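The speedup and efficiency definitions in this excerpt can be checked against the worked example with a few lines of arithmetic. The following is a minimal sketch rather than anything from the original slides: the function names are illustrative, the run times are the ones quoted in the example (serial time 100, four processors), and the 45-unit run is assumed to belong to Programmer C.

```python
def parallel_speedup(sequential_time, parallel_time):
    """Ratio of sequential run time to parallel run time."""
    return sequential_time / parallel_time


def parallel_efficiency(sequential_time, parallel_time, processors):
    """Speedup divided by the number of processors used."""
    return sequential_time / (processors * parallel_time)


SERIAL_TIME = 100.0   # serial run time quoted in the example
PROCESSORS = 4        # processor count used by each programmer

# Parallel run times from the example; the label "Programmer C" is an assumption.
parallel_times = {
    "Programmer A": 25.0,
    "Programmer B": 35.0,
    "Programmer C": 45.0,
}

results = {
    name: (
        parallel_speedup(SERIAL_TIME, t),
        parallel_efficiency(SERIAL_TIME, t, PROCESSORS),
    )
    for name, t in parallel_times.items()
}

for name, (speedup, efficiency) in results.items():
    print(f"{name}: speedup {speedup:.1f}, efficiency {efficiency:.0%}")
```

Only Programmer A reaches ideal speedup on four processors; the other two spend part of the completion time on communication or redundant work, which is what the efficiency figure makes visible.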
59. e to function calls. Starting Xprofiler: Start Xprofiler by issuing the Xprofiler command from the command line. Specify the executable, the profile data file or files, and options. Specify them on the command line with the Xprofiler command, or issue the Xprofiler command alone and then specify the options from within the GUI. $ Xprofiler a.out gmon.out... [options], where a.out is the name of your binary executable file, gmon.out is the name of your profile data file or files, and options Xprofiler versus gprof: Xprofiler gives a graphical picture of the CPU consumption of your application in addition to textual data. Xprofiler displays your profiled program in a single main window. It uses several types of graphic images to represent the relevant parts of your program. Functions are displayed as solid green boxes, called function boxes. Calls between them are displayed as blue arrows, called call arcs. The function boxes and call arcs that belong to each library within your application are displayed within a fenced-in area called a cluster box. When Xprofiler first opens, by default, the function boxes for your application are clustered by library. This type of clustering means that a cluster box appears around each library, and the function boxes and call arcs within the cluster box are reduced in size. If you want to see more detai
60. ed a new $100 million exploratory research initiative to build a supercomputer 500 times more powerful than the world's fastest computers today. The new computer, nicknamed "Blue Gene" by IBM researchers, will be capable of more than one quadrillion operations per second (one petaflop). This level of performance will make Blue Gene 1,000 times more powerful than the Deep Blue machine that beat world chess champion Garry Kasparov in 1997, and about 2 million times more powerful than today's top desktop PCs. Blue Gene's massive computing power will initially be used to model the folding of human proteins, making this fundamental study of biology the company's first computing "grand challenge" since the Deep Blue experiment. Learning more about how proteins fold is expected to give medical researchers better understanding of diseases, as well as potential cures. MPP Constraints: limits of physical size (floor space); power consumption; cooling needed to house and run the aggregated equipment. Design Considerations: Widening gap between processor and DRAM clock rates. Excessive heat generated by dense packaging and high switching frequency. Disparity between processor clock rate and immediate-vicinity peripheral devices (memory, buses, etc.). Network performance. The speed of
61. enabled on a massively parallel system. Sequence alignment: Bioinformatics applications can be mapped onto a parallel architecture and take advantage of its architectural features. Multiple optimization techniques were required to improve performance on a single node. Multiple optimization techniques were required for extreme scalability: alternate sequence file indexing; multiple master configuration; dynamic data collection; database caching; load balancing. Extreme scalability enables us to complete a large-scale bioinformatics problem (sequence searching a microbial genome database against itself to support the discovery of missing genes in genomes) in only a few hours on BG/P. Previously, this problem was viewed as computationally intractable in practice. Applications Development Book: IBM System Blue Gene Solution: Blue Gene/P Application Development, Carlos Sosa, Brant Knudson, ibm.com/redbooks. References: C. P. Sosa and B. Knudson, IBM System Blue Gene Solution: Blue Gene/P Application Development, SG24-7278-03, Redbooks, Draft Redbooks, last update 25 August. G. Lakner, IBM System Blue Gene Solution: Blue Gene/P System Administration, SG24-7417-03, Redbooks, published 1 Septembe
62. er. Hardware events: load/stores; cache misses; TLB misses; branches taken/not taken; branch mispredictions. Derived metrics: load/stores per cache miss; cache hit rate; loads per load miss; stores per store miss; loads per TLB miss; branches mispredicted. Derived metrics allow users to correlate the behavior of the application to one or more of the hardware components. One can define threshold values acceptable for metrics and take actions regarding program optimization when values are below the threshold. [Diagram: send, receive.] Task: a program with local memory and I/O ports. Channel: a message queue that connects two tasks. Computation. Communication. Profiler and Tracer: The MPI profiling and tracing library collects profiling and tracing data for MPI programs. Library name and usage: libmpitrace.a: library for both the C and Fortran applications; mpt.h: header files. Compiling and Linking: To use the library, the application must be compiled with the -g option. You might consider turning off or having a lower level of optimization (-O2, -O1) for the application when linking with the MPI profiling and tracing library. High-level optimization affects the correctness of the debu
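The derived metrics named earlier in this excerpt are simple ratios of raw counter values, compared against a threshold chosen by the user. A minimal sketch of that idea follows; the counter names, the sample counts, and the 0.97 threshold are hypothetical values picked for illustration, not output from any real counter run or part of the libhpm interface.

```python
def derived_metrics(counts):
    """Turn raw event counts into a few of the derived metrics listed above."""
    load_stores = counts["loads"] + counts["stores"]
    return {
        "cache_hit_rate": 1.0 - counts["cache_misses"] / load_stores,
        "load_stores_per_cache_miss": load_stores / counts["cache_misses"],
        "loads_per_tlb_miss": counts["loads"] / counts["tlb_misses"],
        "branch_misprediction_rate": (
            counts["branch_mispredictions"] / counts["branches"]
        ),
    }


# Hypothetical raw counts for one instrumented section (illustrative only).
sample_counts = {
    "loads": 800_000,
    "stores": 200_000,
    "cache_misses": 50_000,
    "tlb_misses": 1_000,
    "branches": 100_000,
    "branch_mispredictions": 5_000,
}

metrics = derived_metrics(sample_counts)

# Hypothetical acceptance threshold: flag the section when the hit rate is lower.
MIN_CACHE_HIT_RATE = 0.97
needs_tuning = metrics["cache_hit_rate"] < MIN_CACHE_HIT_RATE

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print("cache behaviour below threshold:", needs_tuning)
```

Keeping the threshold separate from the metric calculation makes it easy to apply different acceptance levels to different sections of the same program.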
63. files. Subset of potential drug candidates. Store docking score results into database. Andrew P. Norgan (1), Paul S. Coffman (2), Jean-Pierre Kocher (1), David J. Katzmann (1), Carlos P. Sosa (2, 3). (1) Mayo Clinic; (2) IBM Corporation, Rochester, MN; (3) Biomedical Informatics and Computational Biology, UMR. BICB Research Symposium, University of Minnesota Rochester, June 25, 2010, Rochester, MN. Outline: Part I: Hardware: Historical perspective; Why do we need MPPs; Overview of massively parallel processing (MPP); Architecture. Part II: Software: Overview; Compilers; MPI; Building and Running Examples on Blue Gene; Hands-on session 1. Part III: Applications: MPP architecture and its impact on applications; Performance tools; Introduction to code optimization; Hands-on session 2; Mapping applications on a massively parallel architecture; Applications landscape; Challenges and characteristics of Life Sciences applications; Selected Bioinformatics applications; Selected Structural Biology applications; Hands-on session 3; Summary. Summary: Multiple applications in the area of Life Sciences have been
64. gging information and can also affect the call stack behavior. To link the application with the library, use: the option -L/path/to/libraries, where /path/to/libraries is the path where the libraries are located; the option -lmpitrace, which should be before the MPI library -lmpich in the linking order; the option -llicense to link the license library.
Compiling on AIX on POWER, C example:
    CC = /usr/lpp/ppe.poe/bin/mpcc_r
    TRACE_LIB = -L</path/to/libmpitrace.a> -lmpitrace
    mpitrace.ppe: mpi_test.c
            $(CC) -g -o $@ $< $(TRACE_LIB) -lm
Compiling on AIX on POWER, Fortran example:
    FC = /usr/lpp/ppe.poe/bin/mpxlf_r
    TRACE_LIB = -L</path/to/libmpitrace.a> -lmpitrace
    swim.ppe: swim.f
            $(FC) -g -o $@ $< $(TRACE_LIB)
Compiling on Linux on POWER, C example:
    CC = /opt/ibmhpc/ppe.poe/bin/mpcc
    TRACE_LIB = -L</path/to/libmpitrace.a> -lmpitrace
    mpitrace: mpi_test.c
            $(CC) -g -o $@ $< $(TRACE_LIB) -lm
Compiling on Linux on POWER, Fortran example:
    FC = /opt/ibmhpc/ppe.poe/bin/mpfort
    TRACE_LIB = -L</path/to/libmpitrace.a> -lmpitrace
    statusesf_trace: statusesf.f
            $(FC) -g -o $@ $< $(TRACE_LIB)
Tracing All Events: Wrappers can save a record of all MPI events after MPI_Init until the application completes or until the trace
65. Biomedical Informatics & Computational Biology. Carlos P. Sosa, IBM and Biomedical Computation. The Hormel Institute, Mayo Clinic, University of Minnesota. Outline: Part I: Hardware: Historical perspective; Why do we need MPPs; Overview of massively parallel processing (MPP); Architecture. Part II: Software: Overview; Compilers; MPI; Building an
66. ilers; MPI; Building and Running Examples on Blue Gene; Hands-on session 1. Part III: Applications: MPP architecture and its impact on applications; Performance tools; Introduction to code optimization; Hands-on session 2; Mapping applications on a massively parallel architecture; Applications landscape; Challenges and characteristics of Life Sciences applications; Selected Bioinformatics applications; Selected Structural Biology applications; Hands-on session 3; Summary. How is BG/P Configured: [Diagram: service and front-end (login) nodes on a 1GbE service network, running SLES10, DB2, XLF, XLC, GPFS, ESSL, TWS, LL; Blue Gene core rack with 1024 compute nodes per rack and up to 64 nodes per rack; storage subsystem with file servers on a 10GbE functional network.] Source: Sosa and Knutson, IBM System Blue Gene Solution: Blue Gene/P Application Development, SG24-7278-03, Redbooks, Draft Redbooks, last update 25 August 2009. IBM System Blue System-on-Chip (SoC): [Diagram: quad PowerPC 450 with double FPU; memory controller with ECC; L2/L3 cache; DMA and PMU; torus network; collective network; global barrier network; 10GbE control network; JTAG monitor; up to 256 racks; up to 3.5 PF/s.]
67. l, you must uncluster the functions by selecting Filter > Uncluster Functions. Xprofiler Main Menus: File menu: With the File menu, you specify the executable (a.out) files and profile data (gmon.out) files that Xprofiler will use. You also use this menu to control how your files are accessed and saved. View menu: You use the View menu to help you focus on portions of the function call tree in the Xprofiler main window in order to have a better view of the application's critical areas. Filter menu: With the Filter menu, you add, remove, and change parts of the function call tree. By controlling what Xprofiler displays, you can focus on the objects that are most important to you. Report menu: The Report menu provides several types of profiled data in a textual and tabular format. With the options of the Report menu, you can display textual data, save it to a file, view the corresponding source code, or locate the corresponding function box or call arc in the function call tree, in addition to presenting the profiled data. Utility menu: The Utility menu contains one option, Locate Function By Name, with which you can highlight a particular function box in the function call tree. Xprofiler Main Menus 2:
68. mainly of biological interest. CHALLENGE: Microsecond-scale simulations require increases on the order of $10^3$ in the computing power of contemporary high-end systems. Improving code performance and scalability for simulations at longer time and length scales. Novel algorithms: to reduce the performance and scaling bottlenecks; to minimize memory requirements for large systems. USAGE: Protein modeling: structure, folding, dynamics, and function. Compute-intensive applications. Ligand-Protein Interactions: CHARACTERISTICS: Molecular docking is used in structure-based drug design. The computational aspects can be divided into two parts: locating the ligand atoms inside the cavity or binding pocket of a receptor, which is a large biomolecule, and scoring, or identifying the most favorable interactions. CHALLENGE: Improving code performance and scalability for virtual screening of millions of ligands. USAGE: Drug discovery. Compute-intensive applications. Bioinformatics: Database. CHARACTERISTICS: Science necessary to manage, process, and understand large amounts of data, for instance from the sequencing of the human genome or from large databases containing information about plants and animals
69. me listed under the descendants column for the profiled function. As a result, be aware that the value listed in the time column for most profiled functions in this report will change. Xprofiler Options 5: -f: Xprofiler -f function1 [-f function2] a.out gmon.out. This option de-emphasizes the general appearance of all function boxes in the function call tree, except for that of the specified function or functions and its descendant or descendants. In addition, the number of entries in the Call Graph Profile report for the non-specified functions and non-descendant functions is limited. The -f flag overrides the -e flag. In the function call tree, all function boxes except for that of the specified function or functions and its descendant or descendants appear to be unavailable. The size of these boxes and the content of their labels remain the same. For the specified function or functions and its descendant or descendants, the appearance of the function boxes and labels remains the same. In the Call Graph Profile report, an entry for a non-specified or non-descendant function only appears where it is a parent or child of a specified function or one of its descendants. All information for this entry remains the same. Xprofiler Options 6: -F: Xprofiler -F function1 [-F function2] a
70. n and maintenance of large multiple alignment databases. Source: HMMER's User Guide 2.3.2. HMMER 2.3.2: hmmalign: Align sequences to an existing model. hmmbuild: Build a model from a multiple sequence alignment. hmmcalibrate: Takes an HMM and empirically determines parameters that are used to make searches more sensitive, by calculating more accurate expectation value scores (E-values). hmmconvert: Convert a model file into different formats, including a compact HMMER 2 binary format and best-effort emulation of GCG profiles. hmmemit: Emit sequences probabilistically from a profile HMM. hmmfetch: Get a single model from an HMM database. hmmindex: Index an HMM database. hmmpfam: Search an HMM database for matches to a query sequence. hmmsearch: Search a sequence database for matches to an HMM. S. R. Eddy, HMMER: Biological Sequence Analysis Using Profile Hidden Markov Models, Version 2.3.2, http://hmmer.wustl.edu, Oct 19. HMMER 2.3.2 Parallel Modules: Three have been parallelized: hmmcalibrate takes an HMM and empirically determines parameters that are used to make searches more sensitive, by calculating more accurate expectation value scores (E-values); hmmpfam is used to search a profile HMM database with a query sequence; hmmsearch is used to carry out sequence
71. nd bar shows the multiple master implementation. The third bar shows the dynamic data collection implementation; the right bar shows the load balancing implementation. K. Jiang, O. Thorsen, A. Peters, B. Smith, and C. P. Sosa, "An Efficient Parallel Implementation of the Hidden Markov Methods for Genomic Sequence Search on a Massively Parallel System," IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 1, January 2008. hmmpfam: Opposite of hmmsearch, but similar in program structure; same optimizations. In addition to the other optimizations, data caching allowed fast processing of thousands of query sequences. Also scales close to linearly up to 1000 nodes. [Figure: speedup versus processors, 0 to 1000, with a linear reference line.] hmmsearch Performance Improvements on BG/P: [Figure: time in seconds for Viterbi default and Viterbi optimized on 128, 256, 512, and 1024 processors.] HMM profile: globins. UniProt database: 2.9 million sequences. Jobs were submitted using LoadLeveler to Blue Gene/P.
72. ne/L. In High Availability and Performance Workshop, 2005. Oehmen and J. Nieplocha, "ScalaBLAST: A scalable implementation of BLAST for high-performance data-intensive bioinformatics analysis," IEEE Trans. Parallel Distrib. Syst. 17(8), 2006. Importance of mpiBLAST-PIO (http://www.mpiblast.org): Completion of the sequencing of the human genome. New organisms being sequenced at a rapid rate. NCBI BLAST server: 500,000 query submissions per day; queries per day doubling approximately every year. Trend in GenBank database: doubling in size every year. Consequence: Database size increasing faster than our ability to compute on it. What to do: faster and more scalable parallel algorithms, i.e., mpiBLAST-PIO; more efficient use of state-of-the-art hardware, i.e., BG/L and BG/P. [Figure: GenBank growth, 1982 to 2002, in sequences (millions) and base pairs of DNA (billions); source: http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html.] Initial Problems: Disk I/O overload (DB: 2.4M). Master overworked. [Diagram: master, worker.] X. Ma, P. Chandramohan, A. Geist, and Samatova, "Efficient data access for parallel BLAST," in IPDPS 2005: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium
73. ology. Ligand-receptor docking: Goal: Given a protein and a ligand, determine the pose(s) and conformation(s) minimizing the total energy of the protein-ligand complex. http www Iso ft com. Source: http://www.cs.princeton.edu/courses/archive/fall07/cos597A/index.html. Energetics and Conformation: Challenges: predicting energetics of protein-ligand binding; searching the space of possible poses and conformations. [Figure: conformational coordinates.] Source: http://www.cs.princeton.edu/courses/archive/fall07/cos597A/index.html. DOCK6: DOCK 5: D. Moustakas and P. T. Lang et al., J. Comput. Aided Mol. Des. 2006, 20, 601-619. DOCK 5: D. Moustakas, S. C. H. Pegg, and Kuntz, in Virtual Screening in Drug Discovery, edited by J. Alvarez and B. Shoichet, Taylor & Francis, Inc. DOCK 6: P. T. Lang et al., in preparation. MPP DOCK: A. Peters, M. E. Lundberg, C. P. Sosa, and P. Therese Lang, High Throughput Computing Validation for Drug Discovery Using the DOCK Program on a Massively Parallel System, REDP-4410-00, Redpapers, published 16 April 2008. Embarrassingly Parallel: Create database of potential drug candidates. midplane 1. For each independent node, load: DOCK binary; receptor input
74. pecial purnocs ene built into mode counts X within 4 Adv con tunn http en wikipedia org wiki Har pdf pdf http download boulder ibm com ibmdl ptc De Registers, Microprocessors, and Tuning: A processor register (or general-purpose register) is a small amount of storage available on the CPU whose contents can be accessed more quickly than storage available elsewhere (http://en.wikipedia.org/wiki/Processor_register). A microprocessor incorporates most or all of the functions of a central processing unit (CPU) on a single integrated circuit (http://en.wikipedia.org/wiki/Microprocessor). Performance tuning is the improvement of system performance (http://en.wikipedia.org/wiki/Performance_tuning). Software Profilers versus Hardware Counters: Hardware counters provide low-overhead access to a wealth of detailed performance information related to the CPU's functional units, caches, and main memory. With hardware counters, no source code modifications are needed in general. The meaning of hardware counters varies from one kind of architecture to another due to the variation in hardware organizations. It is difficult to correlate the low-level performance metrics back to source code. The limited number of registers to store the counters often forces users to conduct multiple measurements to collect all desired performance metrics. Modern s
75. r 2009. P. T. Lang, M. E. Lundberg, A. Peters, and C. P. Sosa, High Throughput Computing Validation for Drug Discovery Using the DOCK Program on a Massively Parallel System, REDP-4410-00, Redpapers, published 16 April 2008. IBM Rochester Blue Gene Center, BG/P User Guide, Rochester, MN. IBM System Blue Gene Solution: Performance Analysis Tools, REDP-4256-01, Redpapers, published 24 November 2008, last updated 4 June 2009, http://www.redbooks.ibm.com/abstracts/redp4256.html?Open. IBM High Performance Computing Toolkit: MPI Profiling User Manual, Advanced Computing Technology Center, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, April 4, 2008. http://www.redbooks.ibm.com
76. Other Software Support: Parallel file systems: Lustre at LLNL, PVFS2 at ANL. Job schedulers: SLURM at LLNL; Cobalt at ANL; Altair PBS Pro; Platform LSF (for BG/L only); Condor HTC (porting for BG/P). Parallel debuggers: Etnus TotalView (for BG/L as of now, porting for BG/P); Allinea DDT and OPT (porting for BG/P). Libraries: FFT library, tuned functions (TU Vienna); VNI (porting for BG/P). Performance tools: HPC Toolkit (MP_Profiler, Xprofiler, HPM, PeekPerf); PAPI; Tau; Paraver; Kojak. Understanding Performance on Blue Gene/P: Theoretical floating-point performance: 1 fpmadd per cycle, for a total of 4 floating-point operations per cycle. 4 floating-point operations/cycle $\times$ 850 cycles/s $\times 10^6$ = 3,400 $\times 10^6$ = 3.4 GFlop/s per core. Peak performance = 13.6 GFlop/s per node (4 cores). Two Generations of Blue Gene (full system, 72-rack comparison): Memory per node: 5128 165. Bandwidth: 5.6 GB/s; 13.6 GB/s. Peak performance: 5.6 GFlop/s per node; 13.6 GFlop/s per node. Source: C. P. Sosa and B. Knutson, IBM System Blue Gene Solution: Blue Gene/P Application Development, SG24-7278-03, Redbooks, Draft Redbooks, last update 25 August 2009. Blue Gene Key Difference with pSeries x
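The peak-performance arithmetic quoted in this excerpt can be reproduced directly. The sketch below is illustrative only: the per-core figures (4 floating-point operations per cycle, 850 MHz, 4 cores per node) and the 72-rack system size come from the text, while the figure of 1024 compute nodes per rack is taken from the configuration slide elsewhere in the deck and is treated here as an assumption.

```python
FLOPS_PER_CYCLE = 4      # one fused multiply-add pair per cycle on the double FPU
CLOCK_MHZ = 850          # core clock quoted in the text
CORES_PER_NODE = 4
NODES_PER_RACK = 1024    # assumption: taken from the configuration slide
RACKS = 72               # full-system size used in the comparison

# Per-core peak: operations per cycle times cycles per second, in GFlop/s.
per_core_gflops = FLOPS_PER_CYCLE * CLOCK_MHZ * 1e6 / 1e9

# Per-node, per-rack, and full-system peaks follow by multiplication.
per_node_gflops = per_core_gflops * CORES_PER_NODE
per_rack_tflops = per_node_gflops * NODES_PER_RACK / 1e3
system_pflops = per_rack_tflops * RACKS / 1e3

print(f"per core: {per_core_gflops:.1f} GFlop/s")
print(f"per node: {per_node_gflops:.1f} GFlop/s")
print(f"per rack: {per_rack_tflops:.1f} TFlop/s")
print(f"{RACKS} racks: {system_pflops:.2f} PFlop/s")
```

Under these assumptions a 72-rack system lands at roughly one petaflop of theoretical peak, which is a ceiling and not a prediction of sustained application performance.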
77. rms, less I/O. Better use of resources. Robert Sedgewick, Algorithms, 1984, p. 84. Programming Optimization, http://www.azillionmonkeys.com/qed/optimize.html. Application Flow Analysis: [Diagram: tasks.] Application Optimization: performance analysis: CPU bound; memory bound; I/O bound. Optimization Steps: 1. Tune for compiler optimization flags. 2. Locate hot spots in the code. 3. Use highly tuned libraries (MASS/ESSL). 4. Manually optimize the code. 5. Determine if I/O plays a role and tune if needed. Two Key Concepts: Speedup; Efficiency. Speedup: Speedup is defined as the ratio between the run time of the original code and the run time of the modified code: $\text{Speedup} = \dfrac{\text{Original code run time}}{\text{Modified code run time}}$. Parallel Speedup: Parallel speedup is defined as the ratio between the run time of the sequential code and the run time of the parallel code: $\text{Parallel Speedup} = \dfrac{\text{Sequential run time}}{\text{Parallel run time}}$. Run tim
78. ...rom master
    [Figure: master/worker communication diagram; recoverable label: MASTER.]
    Source for the slides in this excerpt: K. Jiang, O. Thorsen, A. Peters, B. Smith, and C. P. Sosa, "An Efficient Parallel Implementation of the Hidden Markov Methods for Genomic Sequence Search on a Massively Parallel System," IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 1, January 2008.

    hmmcalibrate
    [Figure: time (sec, axis up to 12000) versus number of processors (32, 64, 128, 256, 512, 1024, 2048).]
    Parallel performance using the first 327 entries of the Pfam database.

    hmmsearch
    [Figure: HMMSearch parallel optimizations; normalized search time versus number of processors (32, 64, 128, 256, 512, 1024); series: Plain, Multiple master, With dynamic data collection, Load balanced.]
    hmmsearch parallel performance using 50 proteins of the globin family. For each processor count, the left bar shows the original PVM-to-MPI port. The seco
79. ...rs for a single master
    o Groups of nodes: one master for each group, working on separate query sequences

    Divide into Groups
    [Figure: Query (28k) and DB (2.4M) distributed through a hierarchy of super master, masters, and workers.]
    o Query fragmentation
    o Load balancing
    o Multiple output

    Source: Oystein Thorsen, Karl Jiang, Amanda Peters, Brian Smith, Heshan Lin, Wu-chun Feng, and Carlos P. Sosa, "Parallel Genomic Sequence Search on a Massively Parallel System," Proceedings of the 4th International Conference on Computing Frontiers, Ischia, Italy, pp. 59-68, 2007.

    Distribution
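The grouping scheme on the "Divide into Groups" slide above (one super master, one master per group, workers under each master) could be laid out over MPI-style ranks roughly as follows. This is only a sketch under stated assumptions: the contiguous rank layout, the function name assign_roles, and the role labels are hypothetical, and the cited paper's actual assignment may differ.

```python
def assign_roles(n_ranks, n_groups):
    """Map each rank to (role, group_id).

    Assumed layout, for illustration only: rank 0 is the super master;
    the remaining ranks are split into n_groups contiguous groups whose
    first rank is the group master and whose other ranks are workers.
    """
    if n_groups < 1:
        raise ValueError("need at least one group")
    if n_ranks < 1 + 2 * n_groups:
        raise ValueError("each group needs a master and at least one worker")

    roles = {0: ("super_master", None)}
    base, extra = divmod(n_ranks - 1, n_groups)
    rank = 1
    for group in range(n_groups):
        # Spread any leftover ranks over the first 'extra' groups.
        size = base + (1 if group < extra else 0)
        roles[rank] = ("master", group)
        for worker in range(rank + 1, rank + size):
            roles[worker] = ("worker", group)
        rank += size
    return roles


# Hypothetical example: 9 ranks divided into 2 groups.
for rank, (role, group) in sorted(assign_roles(9, 2).items()):
    print(rank, role, group)
```

With this layout each master hands out work only to its own group, which is the property the slide relies on to relieve the single-master bottleneck.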
80. ...s on Blue Gene
      Hands-on session 1
    Part III: Applications
      MPP architecture and its impact on applications
      Performance tools
      Introduction to code optimization
      Hands-on session 2
      Mapping applications on a massively parallel architecture
      Applications landscape
      Challenges and characteristics of Life Sciences applications
      Selected Bioinformatics applications
      Selected Structural Biology applications
      Hands-on session 3
      Future Directions
      Summary

    Hardware Naming Convention
    [Figure: hardware naming convention diagrams. Recoverable labels: racks (rack row 0-F, rack column 0-F), bulk power supply (Rxx-B), power cable, power modules (Rxx-B-Px, modules 0-7), midplane (0-1), fan assemblies (0-9), fans (0-2).]
    Source: C. P. Sosa and B. Knutson, IBM System Blue Gene Solution: Blue Gene/P Application Development, SG24-7278.
81. ... This option changes the general appearance and label information of the function box or boxes for the specified function or functions in the function call tree. In addition, this option limits the number of entries for these functions in the Call Graph Profile report and changes the CPU data that is associated with them. These results also apply to the specified function's descendants, as long as they have not been called by non-specified functions in the program.
    In the function call tree, the function box for the specified function appears to be unavailable, and its size and shape also change so that it appears as a square of the smallest allowable size. In addition, the CPU time shown in the function box label appears as zero. The same applies to function boxes for descendant functions, as long as they have not been called by non-specified functions. This option also causes the CPU time spent by the specified function to be deducted from the left-side CPU total in the label of the function box for each of the specified function's ancestors.
    In the Call Graph Profile report, an entry for the specified function only appears where it is a child of another function, or as a parent of a function that also has at least one non-specified function as its parent. When this is the case, the time in the self and descendants columns for this entry is set to zero. In addition, the amount of time that was in the descendants column for the specified function is subtracted from the ti
82. ...Massively parallel processing (MPP) architecture
    Part II: Software Overview
      Compilers
      MPI
      Building and Running Examples on Blue Gene
      Hands-on session 1
    Part III: Applications
      MPP architecture and its impact on applications
      Performance tools
      Introduction to code optimization
      Hands-on session 2
      Mapping applications on a massively parallel architecture
      Applications landscape
      Challenges and characteristics of Life Sciences applications
      Selected Bioinformatics applications
      Selected Structural Biology applications
      Hands-on session 3
      Summary
    Biomedical Informatics and Computational Biology (BICB)

    IBM Software Stack
    o XL Fortran, C, and C++ compilers
      > Externals preserved
      > Optimized for specific BG functions
      > OpenMP support
    o LoadLeveler scheduler
      > Same externals for job submission and system query functions
      > Backfill scheduling to achieve maximum system utilization
    o GPFS parallel file system
      > Provides high-performance file access, as in current pSeries and xSeries clusters
      > Runs on I/O nodes and disk servers
    o ESSL/MASSV libraries
      > Optimization library and intrinsics for better application performance
      > Serial static library supporting 32-bit applications
      > Callable from Fortran, C, and C++
    o MPI library
      > Message passing interface library, based on MPICH2, tuned for the Blue Gene architecture

    Software Stack
83. o mpi_profile.taskid has the timing summaries
    o The mpi_profile.0 file contains a timing summary from each task
    o Currently, for scalability reasons, only four ranks (rank 0 and the ranks with the minimum, median, and maximum MPI communication time) generate a plain-text file by default
    o To change this default setting, one simple function can be implemented and linked into the compilation:

      control.c:
        int MT_output_trace(int rank) {
            return 1;   /* nonzero: every rank writes its trace output */
        }

      Makefile rule:
        mpitrace: mpi_test.c
                $(CC) $(CFLAGS) control.o mpi_test.o $(LIB) -lm -o $@

    mpi_profile.0
      elapsed time from clock cycles using freq = 700.0 MHz

      MPI routine      #calls   avg. bytes   time (sec)
      MPI_Comm_size         1          0.0        0.000
      MPI_Comm_rank         1          0.0        0.000
      MPI_Isend            21      99864.3        0.000
      MPI_Irecv            21      99864.3        0.000
      MPI_Waitall          21          0.0        0.014
      MPI_Barrier          47          0.0        0.000

      total communication time = 0.015 seconds
      total elapsed time       = 4.039 seconds

    mpi_profile.0: message size distributions
      MPI_Isend        #calls   avg. bytes   time (sec)
                            3          2.3        0.000
                            1          8.0        0.000
                            1         16.0        0.000
                            1         32.0        0.000
                            1         64.0        0.000
                            1        128.0        0.000
                            1        256.0        0.000
                            1        512.0        0.000
                            1       1024.0        0.000
                            1       2048.0        0.000
                            1       4096.0        0.000
                            1       8192.0        0.000
                            1      16384.0        0.000
                            1      32768.0        0.000
                            1      65536.0        0.000
                            1     131072.0        0.000
                            1     262144.0        0.000
                            1     524288.0        0.000
                            1    1048576.0        0.000

    ...profile.0
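A common first reading of the mpi_profile.0 summary above is the share of wall-clock time spent in MPI. The sketch below computes that share from the two "total" lines. It is written in Python for illustration only: the helper name is hypothetical, and the exact line format it parses is an assumption reconstructed from the slide, so the pattern may need adjusting for a real profile file.

```python
import re

# The two summary values shown on the slide; the line layout is assumed.
SAMPLE = """\
total communication time = 0.015 seconds
total elapsed time = 4.039 seconds
"""


def communication_fraction(profile_text):
    """Return MPI communication time divided by total elapsed time."""
    comm = re.search(r"total communication time\s*=?\s*([0-9.]+)", profile_text)
    elapsed = re.search(r"total elapsed time\s*=?\s*([0-9.]+)", profile_text)
    if comm is None or elapsed is None:
        raise ValueError("summary lines not found in profile text")
    return float(comm.group(1)) / float(elapsed.group(1))


fraction = communication_fraction(SAMPLE)
print(f"MPI communication: {100 * fraction:.2f}% of elapsed time")
```

For the numbers on the slide the share is well under one percent, which points to computation rather than message passing as the place to look for further tuning.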
84. ...topology organization of processors
    2. Context-aware synchronization across the whole grid and sub-grid
    o PBPI significantly reduces phylogenetic inference time by exploiting distributed processing power and memory, especially for large data sets
    o For suitably sized phylogenetic problems, PBPI can scale up to thousands of processors on Blue Gene

    Scalability on BG/L
    [Figure: relative speedup versus number of processors (256, 512, 1024, 2048, 4096); series: 32 chains, 64 chains.]
    X. Feng, D. A. Buell, J. Rose, and P. J. Waddell, "Parallel algorithms for Bayesian phylogenetic inference," Journal of Parallel and Distributed Computing (JPDC), vol. 63, issues 7-8, 2003.
    X. Feng, K. W. Cameron, and D. A. Buell, "PBPI: a high performance implementation of Bayesian phylogenetic inference," ACM/IEEE SC 2006, the International Conference on High Performance Computing, Networking, Storage and Analysis, November 2006, Tampa, FL.
    X. Feng, K. W. Cameron, B. Smith, and C. Sosa, "Building the Tree of Life on Terascale Systems," the 21st International Parallel and Distributed Processing Symposium (IPDPS '07), April 2007, Long Beach, CA.

    Molecular Docking & Virtual Screening
85. ...Superscalar processors schedule and execute multiple instructions at one time
    http://en.wikipedia.org/wiki/Hardware_performance_counter

    Summary of Hardware Counters
    o Extra logic inserted in the processor to count specific events
    o Updated at every cycle
    o Strengths
      > Non-intrusive
      > Very accurate
      > Low overhead
    o Weaknesses
      > Provide only hard counts
      > Specific to each processor
      > Access is not well documented
      > Lack of standards and documentation on what is counted

    hpmcount
    o The hpmcount command provides, for the application named by command:
      > Execution wall-clock time
      > Hardware performance counter information
      > Derived hardware metrics
      > Resource utilization statistics (obtained from the getrusage system call)

    hpmcount options
      -a        Aggregates the counters on POE runs
      -d        Adds detailed set counts for counter multiplexing mode
      -H        Adds hypervisor activity on behalf of the process
      -h        Displays help message
      -k        Adds system activity on behalf of the process
      -o file   Output file name
      -s set    Lists a predefined set of events or a comma-separated list of sets (1 to N, or 0 to select all)
86. Scientific and Engineering Applications Landscape
    [Figure: applications landscape chart. Recoverable labels include flows in porous media, structural mechanics, molecular modeling, transport, nanotechnology, VLSI design, pattern processing, symbolic processing, aerodynamics, network flow, bioinformatics, cryptography, elementary particle physics, data mining, genome processing, and proteomics.]

    MPP Challenges for Applications Developers
    o MPP flops do not depend only on the individual performance of the CPU
    o Performance depends on the system as a whole
      > Memory system
      > File access
      > Network messaging
    o This type of system is not appropriate for every application
    o It is harder to take advantage of all processors
    o Applications that can take advantage of a large number of processors need access to larger systems

    Mapping Applications
    [Figure: diagram with the labels New Hardware Architectures, Applications Enablement, Develop New Software Package, Port and Optimize Existing Application, Multidisciplinary Team, Collaboration, Developers.]

    Classical Molecular Dynamics
    CHARACTERISTICS
    o These models were developed to describe molecular structures and properties in as practical a manner as possible for very large systems
