CodeAnalyst User's Manual
Contents
1. [Figure: source and disassembly view of the inline function multiply_matrices in classic.cpp, lines 55 to 71]
3.7 Call Stack Sampling Analysis
Call stack sampling (CSS) collects function call information, including caller-to-callee relationships. CSS is used in conjunction with time-based profiling (TBP), event-based profiling (EBP), and instruction-based sampling (IBS). Compared to other techniques such as instrumentation, CSS is a relatively low-overhead approach to collecting function call information. However, CSS is based on sampling, and its results are subject to statistical variation.
CSS collects information from the run-time call stack to identify caller-to-callee relationships between functions. When CSS is enabled, CodeAnalyst collects information from the run-time call stack of a monitored application process whenever a regular time, event, or instruction-based samp
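The caller-to-callee bookkeeping described for call stack sampling can be illustrated with a minimal sketch. The input format (each sample as a list of function names from outermost caller to innermost callee) and the helper name `count_call_edges` are assumptions made for illustration; they are not CodeAnalyst's internal representation.

```python
from collections import Counter

def count_call_edges(stack_samples):
    """Count caller->callee pairs across sampled call stacks.

    Each sample is a list of function names ordered from the outermost
    caller to the innermost callee (a hypothetical input format).
    """
    edges = Counter()
    for stack in stack_samples:
        # Adjacent frames in a stack form one caller->callee relationship.
        for caller, callee in zip(stack, stack[1:]):
            edges[(caller, callee)] += 1
    return edges

# Three hypothetical samples taken while the classic example was running.
samples = [
    ["main", "multiply_matrices"],
    ["main", "multiply_matrices"],
    ["main", "initialize_matrices"],
]
edges = count_call_edges(samples)
```

Because the counts come from samples, an edge that is rarely on the stack when a sample fires may be under-represented or missing, which is the statistical variation the text warns about.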
2. [Screenshot: CodeAnalyst main window with the project my_project.caw open]
CodeAnalyst starts data collection and launches the application program that is specified in the session settings. Session status is displayed in the status bar in the lower left corner of the CodeAnalyst window. Session progress is displayed in the lower right corner.
After data collection is complete, CodeAnalyst processes the data and displays results in the System Data tab. A new session appears under TBP Sessions in the sessions area at the left-hand side of the CodeAnalyst window.
The System Data table displays a module-by-module breakdown of timer samples. This breakdown shows the distribution of execution time across modules that were active while CodeAnalyst was collecting data. CodeAnalyst is a system-wide profiling tool, and it collects data on all active software components. System-wide profiling assists the analysis of multi-process applications, drivers, operating system modules, libraries, and other software components. Each sample represents 1 millisecond of execution time.
[Screenshot: System Data tab showing the time-based profile of my_project.caw, aggregated by module]
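As a rough illustration of how a module-by-module breakdown follows from raw timer samples, the sketch below turns per-module sample counts into time and percentage columns. The function name, the input dictionary, and the 1 ms default interval are assumptions for this example, not values read from a real session.

```python
def module_breakdown(samples_per_module, ms_per_sample=1.0):
    """Return (module, samples, milliseconds, percent) rows, hottest first.

    ms_per_sample is the assumed timer interval; adjust it if the
    session used a different one.
    """
    total = sum(samples_per_module.values())
    rows = []
    ordered = sorted(samples_per_module.items(), key=lambda kv: kv[1], reverse=True)
    for module, count in ordered:
        percent = 100.0 * count / total if total else 0.0
        rows.append((module, count, count * ms_per_sample, percent))
    return rows

# Hypothetical sample counts for two modules.
rows = module_breakdown({"classic": 900, "libc": 100})
```

Sorting by sample count puts the modules that consumed the most execution time at the top, which is where drill-down usually starts.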
3. 8.5.2 Changing Contents of a View
CodeAnalyst provides a way to change the contents of a view.
[Screenshot: view management dialog for platform CPU Family 0x10, view "All Data", with the Available data and Columns shown lists]
   1. Click the Manage button to change the contents of the currently selected view. A dialog box appears showing the name of the view, a description of the view, the available data that can be shown, and the columns (data) that are shown.
      • To add data for an event to the current view, select an event in the Available data list and click the right arrow button.
      • To remove data for an event from the current view, select an event in the Columns shown list and click the left arrow button.
   2. Remove all events except Retired instructions and Data cache accesses from the Columns shown list.
   3. Click the OK button to confirm and accept the changes.
After making these changes, CodeAnalyst updates the System Data table and eliminates the columns for the event data that were removed from the view.
[Screenshot: System Data tab of my_project.caw after the view was changed]
4. [Screenshot: System Data tab for the current instruction-based profile, aggregated by module]
Use the drop-down list of views to select a different view of the IBS data. The drop-down list is located next to the Manage button. For instance, choose "IBS fetch instruction cache" to see a summary of IBS attempted fetches, completed fetches, instruction cache (IC) misses, and the IC miss ratio, or choose "IBS All ops" from the drop-down list to see a summary of the number of all IBS op samples, IBS branch (BR) samples, and IBS load/store samples that were collected.
Figure 5.17. IBS All Ops
[Screenshot: CodeAnalyst window with the IBS All ops view selected for ClassicTest.caw]
5. --event=IBS_OP_ALL:250000:0:1:1
Please see opcontrol list for the list of available IBS-derived events and option masks. Also see Section 9.3, Instruction-Based Sampling Derived Events, for a description of IBS-derived events.
NOTE: All IBS fetch events must have the same count and option mask. The rule also applies to IBS op events.
2.8.2 Importing and Viewing Profile Data
Profiles taken with the command-line utility can be imported into a CodeAnalyst project. With the default configuration, opcontrol creates profiles in the /var/lib/oprofile/samples/current directory, which contains the samples collected by the OProfile daemon. The profile data can be imported into the CodeAnalyst GUI for further review and interpretation. Profile data are imported into a CodeAnalyst project. See Section 8.3, Tutorial: Creating a CodeAnalyst Project, to create a project. Section 2.5, Importing Profile Data into CodeAnalyst, illustrates the process of importing profile data.
2.8.3 OProfile Daemon/Driver Monitoring Tool
Figure 2.41. OProfile Daemon/Driver Monitoring Tool
[Screenshot: OProfile Daemon Monitor (lock file, command line, file descriptor count, last dump timestamp) and OProfile Driver Monitor (Driver Info, Driver Stats, CPU Stats, IBS Config, PMC Config tabs)]
6. 2 4 Note Tab
Figure 2.36. Session Settings Dialog: Note Tab
[Screenshot: Session Settings dialog showing the Launch Control, Profile Control, and Profile Configuration groups]
This tab contains a field where users can enter a note for the profile session.
2.7.5 OProfiled Log
Figure 2.37. Session Settings Dialog: OProfiled Log Tab
[Screenshot: Session Settings dialog showing the Launch Control, Profile Control, and Profile Configuration groups]
7. 7.3.1.1 Platform Name
The Platform Name field shows the name of the selected AMD processor family. Processor families are selected from a drop-down list, which is enabled only in the Global View dialog. In the Local View dialog, the processor type is determined by the type of processor on which the session's profile data was collected.
7.3.1.2 View Name
The View Name field shows the name of the selected view configuration. View configurations are selected from a drop-down list.
7.3.1.3 Description
The description changes according to the selection in View Name.
7.3.1.4 Columns
CodeAnalyst can only display items listed in the Available data list.
• Available data: The items in the Available data list are preselected according to the selected view configuration. Each item is available for inclusion in a view as a column.
• Columns shown: Selects which columns are shown. The selection can be changed by using the directional arrows to move an item into or out of the list.
The platform selected in the Platform Name drop-down list determines the AMD processor family, which in turn determines the views offered in the View Name drop-down list. The availability of views is determined by the performance counters available on a particular AMD processor family. The selected view determines the choices displayed under Available data and under Columns shown. The performance data to be shown in the selected view can be
8. A.1 Preamble
The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too.
When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you
9. [Screenshot: call graph view of CSS data with the Callers, All Callers, Callee Map, and Source Code tabs]
Chapter 4. Configure Profile
4.1 Profile Data Collection
Profile configurations control the kind of performance data that is collected during a session. CodeAnalyst provides several predefined profile configurations (see Section 4.4, Predefined Profile Configurations) that support the most common kinds of performance analysis. Profile configurations eliminate the tedium of configuring data collection by hand for each session.
To quickly change to a different profile configuration, select it from the drop-down list of profile configurations in the toolbar.
[Screenshot: toolbar drop-down list of profile configurations, including Assess performance, Current event-based profile, Current instruction-based profile, Current time-based profile, IBS Op Branch, IBS Op Load-Store, IBS Op Load-Store (expert), IBS Op Load-Store DC Miss, and IBS Op Load-Store DTLB]
10. executed on core 2 (C2).
[Screenshot: System Data table with per-core timer samples for each module]
   3. Double-click on a module in the System Data table to drill down into the data for the module.
   4. When viewing the System Data tab, double-click on a process to drill down. A new tab is created displaying a function-by-function breakdown of timer samples. The distribution of timer samples across all of the functions in the module is shown. The functions with the most timer samples take the most time to execute and are the best places to look for optimization opportunities.
[Screenshot: function-level breakdown listing sample addresses within multiply_matrices and initialize_matrices]
   5. Double clic
11. [Screenshot: CodeAnalyst Options dialog with the Source Code Display and Data Aggregation settings (aggregate samples into instance of inline function, into original inline function, or into basic blocks)]
[Screenshot: Module Data tab with samples aggregated per basic block for multiply_matrices and initialize_matrices; columns CS:EIP, Basic Block, Load, Store, CPU clocks]
In Disassembly View, consecutive basic blocks are distinguished by alternating white and gray background colors. Users can navigate along the code execution path from one basic block to the previous or to the next basic block. Right-click at the beginning of a basic block and the pop-up menu lists the source addresses that are usually the destination address of a control transfer i
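Aggregating samples into basic blocks amounts to mapping each sampled address to the block that contains it. The sketch below is one way to do that, assuming each block is identified only by its start address and extends up to the next block's start; the function name and the addresses are hypothetical and do not come from a real profile.

```python
from bisect import bisect_right

def aggregate_by_block(block_starts, samples):
    """Sum sample counts per basic block.

    block_starts: start addresses of the basic blocks (assumption: a block
    runs until the next start address).
    samples: iterable of (address, count) pairs.
    Samples below the first block start are ignored.
    """
    starts = sorted(block_starts)
    totals = {start: 0 for start in starts}
    for address, count in samples:
        # Index of the last block start that is <= address.
        index = bisect_right(starts, address) - 1
        if index >= 0:
            totals[starts[index]] += count
    return totals

totals = aggregate_by_block(
    [0x400A92, 0x400AA2, 0x400AAE],
    [(0x400A95, 3), (0x400AA2, 2), (0x400AB0, 5), (0x400000, 7)],
)
```

A real tool also knows where the last block ends; this sketch does not, so any sample above the final start address is attributed to the final block.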
12. Analysis usually begins with time-based profiling in order to find time-critical and time-consuming software components. Event-based profiling or instruction-based sampling is usually employed next in order to determine why a section of code is running more slowly than it should.
Flexible, System-Wide Data Collection
CodeAnalyst's data collection is system-wide, so performance data is collected about all software components that are executing on the system, not just the application program itself. CodeAnalyst collects data on application programs, dynamically loaded libraries, device drivers, and the operating system kernel. CodeAnalyst can be configured to monitor the system as a whole by not specifying an application program to be launched when data collection is started. Time-based profiling, event-based profiling, and instruction-based sampling collect data from multiple processors in a multiprocessor system. CodeAnalyst can also be used to analyze Java just-in-time (JIT) code.
Summarized Results with Drill-down
CodeAnalyst summarizes and displays performance information in a hierarchical fashion. The CodeAnalyst graphical user interface organizes and displays information at each of these levels and provides drill-down. Thus, CodeAnalyst provides an overview of available performance data by process or by module, followed by drill-down to functions within a module, to source lines within a function, or even the instructions that are associated w
13. Edit Event-based and Instruction-based Sampling Configuration for more detail.
   3. Click OK to apply the changes.
Chapter 6. Data Collection Configuration
6.1 Data Collection Configuration
CodeAnalyst provides predefined profile (data collection) and view configurations to set up data collection and the presentation of results. The CodeAnalyst GUI provides the ability to change profile and view configurations. CodeAnalyst stores these predefined configurations and custom user configurations in XML files. Expert users may choose to create or modify their own profile and view configurations by editing the XML files directly. The two sections below describe the XML format and tags for profile and view configuration files:
• Section 6.2, Profile Configuration File Format
• Section 7.5, View Configuration File Format
6.2 Profile Configuration File Format
A data collection configuration file describes how CodeAnalyst is to be configured for data collection. Through the CodeAnalyst GUI, the user chooses one of several data collection configurations, and the GUI configures data collection according to the selected configuration. The user may also modify a data collection configuration to fit their needs using the CodeAnalyst GUI. The modified data collection configuration can be saved to a file. A data collection configuration file contains a single configuration. The file representation of a data collection configura
14. appropriate starting points for customization. See Section 4.5, Manage Profile Configurations, for detailed information.
Figure 2.22. Configuration management
[Screenshot: Configuration Management dialog listing the profile configurations (Current event-based profile, Current instruction-based profile, Current time-based profile, the IBS Op configurations, Instruction-based sampling, Investigate L2 cache access, Investigate branching) with Rename, Edit, Import, and Remove buttons]
2.2.17 CodeAnalyst Options Dialog Box
The CodeAnalyst options dialog box has tabs for setting the following options:
• General: Use to assign types of source code displayed, module enumeration, and hot keys.
• Directories: Select various directories.
For details on using this dialog box, refer to Section 2.3, CodeAnalyst Options.
Figure 2.23. CodeAnalyst Options
[Screenshot: CodeAnalyst Options dialog, General tab, with the Source Code Display and Data Aggregation settings]
15. [Screenshot: Session Settings dialog with the profile configuration set to 1000000 IBS all fetch samples (0x00) and 1000000 IBS all op samples (0x01)]
   4. Click OK to apply the selections.
Click the Start icon to launch the application and begin profiling. The task bar at the bottom of the screen displays "Sampling Session Started" and the percent completed. The Pause and Stop icons become active.
When the sampling session is complete, the application under test terminates and the performance data is processed. The work space then displays a module-by-module breakdown of the results in the System Data table. Select the System Tasks tab to see a task-by-task breakdown of the results. Double-click on a module or process to drill down into the data.
Figure 5.15. Output from IBS Profile Session
[Screenshot: CodeAnalyst window showing the System Data tab for the ClassicTest.caw IBS session]
16. Once finished, the profiling information can be transferred to another system and viewed in the CodeAnalyst GUI. CodeAnalyst provides a tool called capackage.sh. It gathers the information necessary for analyzing a profiling session and compresses it into an easily managed tarball, capacked.tar.gz. This tarball can then be transferred onto another system and imported into the CodeAnalyst GUI.
If you choose Remote Profiling and click Next:
[Screenshot: Import Wizard, Choose Import Type page, with the options "Import Local Profile from Oprofile Sample Files" and "Import Remote Profile from capackage.sh Output"]
The wizard prompts the user to enter the location of the tarball (capacked.tar.gz) output from capackage.sh.
[Screenshot: Import Wizard, "Choose packaged tar.gz file" page with a Browse button]
Once completed, click Finish and CodeAnalyst will untar capacked.tar.gz into the /tmp/CAxxxxxx/capacked directory, which contains the following sub-directories:
• binary: Stores the executable and modules used in profiling.
• current: Stores OProfile samples.
• Java: Stores information related to Java profiling.
Then the importing process continues as described in Section 2.5.1, Import Local Profiling.
2.5.3 Import CodeAnalyst Session Directory
CodeAnalyst stores profile data from each profile session in a directory. This directory generally contains TBP or EBP files and other intermediate data files. Importing the session directo
17. To stop the profiling session, simply run opcontrol with the corresponding option flags as listed in opcontrol --help.
2.8.1.3 Instruction-Based Sampling
Instruction-Based Sampling collects performance data on instruction fetch (IBS fetch sampling) and macro-op execution (IBS op sampling). IBS fetch sampling provides information about instruction cache (IC) behavior, instruction translation lookaside buffer (ITLB) behavior, and other aspects of the process of fetching instructions. IBS op sampling provides information about the execution of macro-ops that are issued from AMD64 instructions. IBS op data is wide-ranging and covers branch and memory access operations. See Section 3.4, Instruction-Based Sampling Analysis, for more information.
IBS fetch sampling and IBS op sampling are controlled by specifying:
opcontrol --event=<IBS Fetch event name>:<Fetch count>:<Fetch option mask>:1:1
or
opcontrol --event=<IBS Op event name>:<Op count>:<Op option mask>:1:1
The count takes a single decimal value, which is the fetch/op interval (sampling period) for IBS fetch/op sampling. IBS fetch sampling counts completed fetches to determine the next fetch operation to monitor and sample. IBS op sampling counts processor cycles to determine the next macro-op to monitor and sample. IBS fetch and op sampling may be enabled independently or at the same time. For instance:
opcontrol --event=IBS_FETCH_ALL:250000:0:1:1
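To make the structure of the event specification concrete, here is a small sketch that assembles the argument string from its parts. It assumes the colon-separated form used by OProfile's opcontrol (name, count, option mask, then two flags that OProfile treats as kernel and user profiling); the helper name `format_ibs_event` is hypothetical, and the sketch only builds the string without running opcontrol.

```python
def format_ibs_event(name, count, option_mask, kernel=1, user=1):
    """Build the value passed to opcontrol's --event option.

    Assumed layout: name:count:option_mask:kernel:user.
    count is the sampling period and must be a positive integer.
    """
    if count <= 0:
        raise ValueError("sampling period must be positive")
    return f"{name}:{count}:{option_mask}:{kernel}:{user}"

# Event names and periods taken from the examples in the text.
fetch_spec = format_ibs_event("IBS_FETCH_ALL", 250000, 0)
op_spec = format_ibs_event("IBS_OP_ALL", 250000, 0)
```

Building both strings from one helper makes it harder to give two IBS fetch events (or two IBS op events) different counts or masks by accident.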
18. / retired instructions
• Misaligned access rate = Misaligned accesses / retired instructions
• Mispredict rate = Retired mispredicted branch instructions / retired instructions
In general, when the term "rate" appears in a computed performance measurement, the rate is expressed as events per retired instruction. A rate indicates how frequently an event is occurring. A high rate, such as a high DC miss rate, may indicate the presence of a performance problem and an opportunity for optimization.
The specific combination of events and computed performance measurements that are shown in a table is a view. CodeAnalyst may offer more than one view, depending upon the kinds of data (e.g., events) that were collected. The drop-down list immediately above the System Data tab contains the available views. The All Data view is always available. To switch to the All Data view, select it from the drop-down list.
[Screenshot: System Data tab with the All Data view selected, listing modules such as ld, sudo, libXft, libX11, libldap, killall5, CodeAnalyst, libpthread, libpam, libglib, irqbalance, and libxcb]
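Since every rate here is an event count divided by the retired-instruction count, the computation can be sketched in a few lines. The function name and the numbers are hypothetical; they are not taken from a real profile.

```python
def event_rate(event_count, retired_instructions):
    """Events per retired instruction, or None when no instructions retired."""
    if retired_instructions <= 0:
        return None
    return event_count / retired_instructions

# Hypothetical counts: 50 mispredicted branches in 1000 retired instructions.
mispredict_rate = event_rate(50, 1000)
```

Returning None instead of dividing by zero keeps modules with no retired-instruction samples from producing a misleading rate.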
19. [Screenshot: module list]
Figure 3.5. Retired instructions, DC accesses and misses for source-level hot spot
[Figure: source view of the inline function multiply_matrices (source lines 59 to 68) with event-based profile data]
AMD processors are equipped with performance monitoring counters (PMC). Each counter may count exactly one hardware event at a time. A hardware event is a condition (or change in hardware condition) like CPU clocks, retired x86 instructions, data cache accesses, or data cache misses. The number of counters and the hardware events that can be measured are processor-dependent. The CodeAnalyst online help provides a quick guide to the events that are available for each AMD processor family. See Performance Monitoring Events (PME) for descriptions of the events supported by AMD processors. However, you should consult the BIOS and Kernel Developer's Guide for the AMD processor in your test platform for the latest information. The number of events, and in some cases the event behavior, may vary by revision within a processor family as well.
How Event
20. An ebp element has the following attributes:
• name: Configuration name (string)
• mux period: Event multiplexing period in milliseconds (integer)
The mux period attribute specifies the event multiplexing period. If the mux period is zero, event multiplexing is disabled. Event multiplexing should not be used when all events can be programmed onto the hardware counters in a profile run.
The number of hardware counters can differ on each platform on which measurements are to be taken. For example, AMD Athlon64 and Opteron processors have four performance counters, and thus the number of event elements is limited to a maximum of four. Information from extra event elements may be discarded.
An EBP collection configuration element contains exactly one of the following elements: tool tip and description.
An event element has the following attributes:
• select: Event select value (integer)
• mask: Unit mask value (integer)
• os: Enables OS sampling (Boolean)
• user: Enables user-level sampling (Boolean)
• count: Sampling period (integer)
• edge detect: Enables edge detect when counting events (Boolean)
• host: Enables host mode event counting (Boolean)
• guest: Enables guest mode event counting (Boolean)
The values must be validated against the events and specific capabilities supported by the measurement platform.
An event-based profiling data collection configuration has the form: <dc_confi
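The interaction between the number of events, the number of hardware counters, and the multiplexing period can be checked mechanically. The sketch below is one reading of the rules in the text, not CodeAnalyst's own validation logic: the function name is hypothetical, and the default of four counters is the Athlon64/Opteron figure quoted above.

```python
def check_ebp_config(num_events, mux_period_ms, num_counters=4):
    """Return a list of warnings for an event-based profiling setup.

    mux_period_ms == 0 means event multiplexing is disabled.
    num_counters is an assumption about the target processor.
    """
    warnings = []
    multiplexing = mux_period_ms != 0
    if multiplexing and num_events <= num_counters:
        # All events fit on the counters, so multiplexing is unnecessary.
        warnings.append("multiplexing enabled although all events fit on the counters")
    if not multiplexing and num_events > num_counters:
        # Without multiplexing, events beyond the counter count cannot be measured.
        warnings.append("more events than counters; extra events may be discarded")
    return warnings

ok = check_ebp_config(num_events=2, mux_period_ms=0)
unneeded_mux = check_ebp_config(num_events=2, mux_period_ms=10)
too_many = check_ebp_config(num_events=6, mux_period_ms=0)
```

The counter count should come from the documentation for the processor actually being measured, since it varies between platforms.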
21. [Screenshot: source view of LU.java for a Java session in my_project]
Launching a Java Program from the Command Line
You may also use CodeAnalyst to collect data on a Java program, or any other application program, that is launched from a command line. This is sometimes more convenient than launching the application from within CodeAnalyst.
   1. Click the Session Settings button in the toolbar to change the session settings, or alternatively select Tools > Session Settings from the menu. A dialog box appears asking for changes to the session settings.
   2. Leave the Launch and Working directory fields empty. CodeAnalyst disables the "Stop data collection when the app exits" and "Profile the duration of the app execution" options.
   3. Change the Profile duration field to 40 seconds.
   4. Click OK to confirm the changes and to dismiss the dialog box.
[Screenshot: Session Settings dialog for JavaSession with empty Launch and Working directory fields and Profile duration set to 40 seconds]
22. 8.6 Tutorial: Analysis with Instruction-Based Sampling Profile
This section is a brief introduction to using Instruction-Based Sampling (IBS). A CodeAnalyst project must already be open, either by following the directions under Section 8.3, Tutorial: Creating a CodeAnalyst Project, or by opening an existing CodeAnalyst project. This tutorial also assumes that session settings have been established and that CodeAnalyst is ready to profile an application.
8.6.1 Collecting IBS Data
[Screenshot: Session Settings dialog with the classic sample application as the launch target and the Profile Configuration drop-down list open]
   1. Open the Session Setting dialog from the toolbar o
23. developer can still produce the debug information so that AMD CodeAnalyst can perform its analysis.
Compiling with the GNU GCC Compiler
When using GNU GCC to compile the application, in general specify the option -g to produce debugging information. Please refer to the section "Options for Debugging Your Program or GCC" of the gcc Linux manual page (man gcc) for more detail.
Chapter 2. Features
2.1 Overview of AMD CodeAnalyst
Program Performance Tuning
The program performance tuning cycle is an iterative process:
   1. Measure program performance.
   2. Analyze the results and identify program hot spots.
   3. Identify the cause for any performance issues in the hot spots.
   4. Change the program to remove performance issues.
AMD CodeAnalyst assists all four steps by collecting performance data, by analyzing and summarizing the performance data, and by presenting it graphically in many useful forms (tables, charts, etc.). CodeAnalyst directly associates performance information with software components such as processes, modules, functions, and source lines. CodeAnalyst helps to identify the cause of a performance issue and where changes need to be made in the program.
The performance tuning cycle resembles the classic scientific method, where a hypothesis about performance is made and then the hypothesis is tested through measurement. Measurement and analysis provide an objective basis for tuning decisions. Performance anal
24. [Screenshot: System Data table listing event sample counts per module]
Examine the list of available events. The predefined IPC assessment view is offered because data is available for both the retired instruction and CPU clocks not halted events. The decision to offer a view is data-driven: if the right type of event data is available to display a view, CodeAnalyst offers the view.
   9. Select the IPC assessment view to display a module-by-module breakdown of IPC and CPI measurements in the System Data table.
[Screenshot: System Data table for Session 01 in the IPC assessment view, with IPC and CPI values per module]
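The IPC and CPI figures in such a view are reciprocal ratios of two event counts. A minimal sketch, assuming the inputs are event counts (that is, sample counts already multiplied by each event's sampling period) and using a hypothetical function name:

```python
def ipc_and_cpi(retired_instructions, cpu_clocks):
    """Return (IPC, CPI) for one module, or None if either count is missing.

    Inputs are event counts. Raw sample counts must first be scaled by
    each event's sampling period when the two periods differ.
    """
    if retired_instructions <= 0 or cpu_clocks <= 0:
        return None
    ipc = retired_instructions / cpu_clocks
    cpi = cpu_clocks / retired_instructions
    return ipc, cpi

# Hypothetical counts: 500 instructions retired in 1000 unhalted clocks.
ipc, cpi = ipc_and_cpi(500, 1000)
```

This is also why the view is only offered when both events were collected: with either count missing, neither ratio can be formed.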
25. samples accumulate for as long as data is collected. The measurement period depends upon the overall execution time of the workload and the way in which CodeAnalyst data collection is configured. Using either the Session Settings or command-line utility options, CodeAnalyst can be configured to collect samples for all or part of the time that the test workload executes. If program run time is short (less than 15 seconds), it may be necessary to increase program run time by using a larger data set or more loop iterations to obtain a statistically useful result. It may also be necessary to extend the measurement period by changing the Session Settings or the options to the CodeAnalyst command-line utility.

Deciding how many samples are enough requires judgment and a working knowledge of the characteristics of the workload under test. Scientific applications often have tight inner loops that are executed repeatedly. In these situations, samples accumulate rapidly within the inner loops, and even a fairly short run time yields a statistically useful number of samples. Other workloads, like transaction processing, have few intense inner loops and their profiles are relatively flat. For flat workloads, a longer measurement period is required to build up samples in code regions of interest.

Predefined Profile Configurations

CodeAnalyst provides predefined profile configurations to make configuration of time-based profiling and other kinds of analy
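As a rough illustration of the relationship between run time, sampling interval, and sample count discussed above, the following Python sketch estimates how many timer samples a run can yield. The function names, the default 1 ms interval, and the 1/sqrt(n) noise estimate are illustrative assumptions and are not part of CodeAnalyst.

```python
import math


def expected_samples(run_time_s: float, interval_ms: float = 1.0, cores: int = 1) -> int:
    """Upper bound on timer samples: one sample per interval per busy core.

    The 1 ms default interval is an assumption for illustration.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return round(run_time_s * 1000.0 / interval_ms) * cores


def relative_error(samples_in_region: int) -> float:
    """Rough 1/sqrt(n) estimate of sampling noise for one code region.

    This treats sample arrival as a counting process, which is an assumption,
    not a statement about how CodeAnalyst reports accuracy.
    """
    if samples_in_region <= 0:
        return float("inf")
    return 1.0 / math.sqrt(samples_in_region)


if __name__ == "__main__":
    total = expected_samples(15)            # a 15 second run on one core
    print(total, relative_error(total))     # whole-program estimate
    print(relative_error(total // 100))     # a region holding 1% of the samples
```

Under these assumptions, a region that receives only a small share of the samples has a much noisier count than the program as a whole, which is one way to see why flat profiles need a longer measurement period.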
26. this view can be configured to show sample counts aggregated per inlined instance, or to aggregate sample counts from all inlined instances in the module into each inline function. It can also aggregate sample counts in a per-basic-block fashion.

Figure 2.14. Single Module Data tab

[Screenshot: Single Module Data tab for /root/classic/classic with the "Aggregate samples into instance of inline function" option selected, listing sample counts for each address and offset within multiply_matrices.]
27. [Screenshot: function-by-function breakdown of timer samples, listing sample counts for each offset within the LU factor function.]

2. Expand the line jnt scimark2 LU factor and double-click on the offset with the highest number of samples (0x7f978b704e0d in this example) to drill down into the function. CodeAnalyst displays a new tabbed panel containing the annotated source and assembly code for the function LU factor. The number of timer samples for each instruction is shown.

[Screenshot: Src/Dasm tab showing annotated source and disassembly for the LU factor function.]
28. 2 752 Gelieral RAD pan tese EE dod Edo MP DU andi MELLE 37 2 s xd Vallce A E 39 age NN A H 4 A DEO ioexecueu siclos A atero du O IA 4 2 50 t hanpime The PU ATNI nits omi di ida 42 2 Oh PLIOCOSS EIHOE dias 46
2.8 CodeAnalyst and OProfile ... 47
2.8.1 Profiling with OProfile Command-Line Utilities ... 48
2.8.2 Importing and Viewing Profile Data ... 52
2.8.3 OProfile Daemon/Driver Monitoring Tool ... 52
3 Types of Analysis ... 54
3.1 Types of Analysis ... 54
3.2 Time-Based Profiling Analysis ... 54
3.2.1 How Time-Based Profiling Works ... 56
3.2.2 Sampling Period and Measurement Period ... 56
3.2.3 Predefined Profile Configurations ... 57
3.3 Event-Based Profiling Analysis ... 57
3.3.1 How Event-Based Profiling Works ... 59
3.3.2 Sampling Period and Measurement Period ... 59
3.3.3 Event Multiplexing ... 59
3.3.4 Predefined Profile Configurations ... 60
3.4 Instruction-Based Sampling Analysis ... 60
3.4.1 IBS Fetch Sampling ... 61
3.4.2 IBS Op Sampling ... 62
3.4.3 IBS-Derived Events ... 64
3.4.4 Predefined Profil
29. [Screenshot: System Data table aggregated by modules, listing processes and modules such as /root/classic/classic, /usr/bin/xterm, /usr/sbin/sshd, and no-vmlinux.]

5.4.2 Changing the Current View of the Data

Similar to event-based profiling, an IBS profile can produce a broad range of information about program behavior in a single run. IBS fetch samples provide information about instruction fetch, while IBS op samples provide information about the execution of operations (ops) that are issued from AMD64 instructions. Several views provide a more focused look at different aspects of fetch and execution behavior. The All Data view displays sample counts for all IBS-derived events. Predefined views are provided to narrow down the displayed performance data to the most useful groups of IBS-derived events.

Figure 5.16. IBS Profile with All Data view when selecting a large number of IBS-derived events

[Screenshot: CodeAnalyst main window with the ClassicTest project open.]
30. [Screenshot: System Data table listing modules such as libpthread, libX11, and /usr/sbin/sshd.]

2.7 Session Settings

The Session Settings specify information that is needed to control performance data collection. Session Settings are persistent and apply to future data collection sessions that are initiated within a project until the Session Settings are changed again.

Figure 2.33. Session Settings Dialog

[Screenshot: Session Settings dialog with General, Advance, and Note tabs. The General tab contains Launch Control options (launch path, terminate the application when profiling stops, CPU affinity, show the application in a terminal, process filtering), Profile Control options (profile duration, profile start delay, stop the profile when the application exits, start with profiling paused, profile the duration of the application execution), and the Profile Configuration selection.]
31. [Screenshot: profile configuration drop-down list with Assess performance selected. The list includes the Current event-based, Current instruction-based, and Current time-based profiles, the IBS Op configurations, Instruction-based sampling, the Investigate configurations (L2 cache access, branching, data access, instruction access), and Time-based profile.]

2. Click the Start button in the toolbar or select Profile > Start to begin profiling. CodeAnalyst starts data collection and launches the application program that was specified in the session settings. The session status displays in the status bar in the lower left corner of the CodeAnalyst window. Session progress displays in the lower right corner. The blank window is the console window in which the application program classic is running.

[Screenshot: System Data table in the Overall assessment view, aggregated by modules.]
32. Based Profiling Analysis for further information.

To successfully use EBP, the user needs to consult the performance monitor event tables. See Section 9.1, Performance Monitoring Events (PME), or the BIOS and Kernel Developer's Guide (BKDG) for the AMD processor in your test platform. For a general description of how to use these performance monitoring features, refer to the "Debug and Performance Resources" section of the AMD64 Architecture Programmer's Manual, Volume 2 (order# 24593).

The Event Select, Unit Mask, and Event Count (sampling period) must be specified for each event to be measured. The OProfile utility accepts event specifications that are formatted in the following manner:

OPROFILE_EVENT_NAME:Count:Unit-mask:Kernel:User

• OPROFILE_EVENT_NAME specifies the name of the event to be profiled.
• Unit-mask is a two-digit hexadecimal value that specifies the Unit Mask value for the event.
• Count is a decimal number that specifies the Event Count (sampling period).
• Kernel is 0 or 1 to specify kernel-space profiling.
• User is 0 or 1 to specify user-space profiling.

A complete list of events can be viewed using the command opcontrol -l.

Figure 2.40. Listing events (opcontrol -l)

[Screenshot: terminal output of opcontrol -l listing the available events for the CPU type, beginning with DISPATCHED_FPU_OPS and its unit masks.]
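The event specification format described above can be sketched in a few lines of Python. This is a minimal illustration, not part of OProfile or CodeAnalyst: the helper name format_event_spec and its validation rules are assumptions, and the event name used in the example is the one this manual uses for data cache refills.

```python
def format_event_spec(name: str, count: int, unit_mask: int = 0x00,
                      kernel: bool = True, user: bool = True) -> str:
    """Build a NAME:Count:Unit-mask:Kernel:User string (hypothetical helper)."""
    if not name:
        raise ValueError("event name must not be empty")
    if count <= 0:
        raise ValueError("count (sampling period) must be positive")
    if not 0 <= unit_mask <= 0xFF:
        raise ValueError("unit mask must fit in two hexadecimal digits")
    # Unit mask is written as two hex digits; kernel/user become 0 or 1.
    return f"{name}:{count}:0x{unit_mask:02x}:{int(kernel)}:{int(user)}"


if __name__ == "__main__":
    spec = format_event_spec("DATA_CACHE_REFILLS_FROM_L2_OR_NORTHBRIDGE",
                             count=25000, unit_mask=0x01)
    # The resulting string is what would be passed to opcontrol with -e.
    print("opcontrol -e " + spec)
```

Building the string in one place makes it harder to swap the count and unit mask fields by accident, which is an easy mistake when typing the specification by hand.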
33. DC miss samples is not necessarily better than a larger quantity of EBP DC miss samples. See Section 9.3, Instruction-Based Sampling Derived Events, for descriptions of the IBS-derived events.

Predefined Profile Configurations

AMD CodeAnalyst provides a predefined profile configuration called Instruction-based sampling, which collects both IBS fetch and IBS op samples. It also provides the configuration named Current Instruction-based profile, which can be changed and customized.

3.5 Basic Block Analysis

A basic block is a section of code that represents a serialized execution path with no control transfer instruction (i.e., jump or call) in its interior. A basic block usually begins at the destination of one or more control transfer instructions and ends with a control transfer instruction.

In the Module Data view, users can aggregate samples by basic block. This mode of aggregation is enabled by selecting the "Aggregate samples into basic blocks" option in the CodeAnalyst Options dialog. CodeAnalyst examines each function to identify basic blocks and aggregates samples accordingly. Each basic block is denoted by a range of addresses together with its load and store counts, in a format that gives StartAddr, StopAddr, number of loads, and number of stores. The number of load and store instructions within the basic block is also displayed on the disassembly view tab.

Figure 3.8. CodeAnalyst Options
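To make the idea of basic block aggregation concrete, here is a minimal Python sketch. It is not CodeAnalyst's implementation: the instruction tuple layout, the mnemonic prefixes treated as control transfers, and the function names are all assumptions chosen for illustration. A block starts at the first instruction, at any branch target, and at the instruction following a control transfer.

```python
from typing import Dict, List, Optional, Tuple

# (address, mnemonic, branch target or None); this layout is an assumption.
Instr = Tuple[int, str, Optional[int]]

# Mnemonic prefixes treated as control transfers (assumption, x86-style names).
CONTROL_TRANSFER = ("j", "call", "ret")


def find_basic_blocks(instrs: List[Instr]) -> List[Tuple[int, int]]:
    """Return (start, stop) pairs; stop is the address of the block's last instruction."""
    if not instrs:
        return []
    addrs = [addr for addr, _, _ in instrs]
    known = set(addrs)
    leaders = {addrs[0]}
    for i, (_, mnemonic, target) in enumerate(instrs):
        if mnemonic.startswith(CONTROL_TRANSFER):
            if target in known:
                leaders.add(target)          # a branch destination starts a block
            if i + 1 < len(addrs):
                leaders.add(addrs[i + 1])    # so does the instruction after the branch
    blocks = []
    start = addrs[0]
    for i, addr in enumerate(addrs):
        is_last = i + 1 == len(addrs) or addrs[i + 1] in leaders
        if is_last:
            blocks.append((start, addr))
            if i + 1 < len(addrs):
                start = addrs[i + 1]
    return blocks


def aggregate_by_block(samples: Dict[int, int],
                       blocks: List[Tuple[int, int]]) -> Dict[Tuple[int, int], int]:
    """Sum per-address sample counts into the block that contains each address."""
    totals = {block: 0 for block in blocks}
    for addr, count in samples.items():
        for start, stop in blocks:
            if start <= addr <= stop:
                totals[(start, stop)] += count
                break
    return totals


if __name__ == "__main__":
    code = [(0x00, "mov", None), (0x04, "cmp", None), (0x08, "jne", 0x14),
            (0x0C, "add", None), (0x10, "jmp", 0x04), (0x14, "ret", None)]
    blocks = find_basic_blocks(code)
    print(blocks)
    print(aggregate_by_block({0x04: 10, 0x08: 5, 0x10: 3}, blocks))
```

In this toy example the loop body and its back edge fall into separate blocks, so samples taken on the compare and the conditional jump are reported together rather than per instruction.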
34. <!-- Data cache misses -->
<event select="0x41" mask="0x00" os="T" user="T" count="5000"></event>
<!-- L1 DTLB and L2 DTLB miss -->
<event select="0x46" mask="0x00" os="T" user="T" count="5000"></event>
<!-- Misaligned accesses -->
<event select="0x47" mask="0x00" os="T" user="T" count="5000"></event>
<tool_tip>Perform quick assessment</tool_tip>
<description>Collect data needed to quickly identify possible performance issues. Good candidates for further investigation are frequently executed, time-consuming parts of the program. Use analysis configurations for follow-up investigations.</description>
</ebp>

Chapter 7. View Configuration

7.1 Viewing Results

AMD CodeAnalyst can collect many different kinds of performance data in a single measurement session. This is especially true of event-based profiling and Instruction-Based Sampling, where a large number of events may be collected in a single performance experiment. The CodeAnalyst view feature organizes performance data into groups of logically related kinds of data in order to aid interpretation and analysis. Select a view from the drop-down list of views to change to a different view.

Figure 7.1. List of Views

[Screenshot: drop-down list of views, including All Data, Branch assessment, Data access assessment, and IPC assessment.]
35. [Screenshot, continued: terminal output of opcontrol -l listing FPU, SSE, and move events, each with its minimum count and unit masks.]

Consider, for example, the DCache Refill From L2 or System event, which can be used to measure only refills from system memory through the use of a Unit Mask that qualifies the event. The event name is DATA_CACHE_REFILLS_FROM_L2_OR_NORTHBRIDGE, and a Unit Mask value of 0x01 measures only refills from system memory. Using an Event Count of 25,000, the full opcontrol event specification is:

opcontrol -e DATA_CACHE_REFILLS_FROM_L2_OR_NORTHBRIDGE:25000:0x1:1:1

The Retired Instructions event (Event Select 0x0C0) does not require a Unit Mask. Using an Event Count of 250,000, the full o
36. [Screenshot: System Data table organized by process, PID, and module for the Java process, listing modules such as jnt.scimark2.commandline, libjvm.so, and no-vmlinux.]

1. Double-click on the module jnt.scimark2.commandline to drill down into the performance data for the benchmark program. CodeAnalyst displays a new tabbed panel containing a function-by-function breakdown of timer samples. Use this table to identify functions that consume the most execution time. The hottest functions are the best candidates for optimization.

[Screenshot: jnt.scimark2.commandline Data tab listing timer samples for each offset within the LU factor function.]
37. Timer Configuration and Section 4.3, Edit Event-based and Instruction-based Sampling Configuration. The user can then change the timer interval, which determines how often samples are taken.

Figure 4.10. Edit timer configuration

[Screenshot: profile configuration edit dialog showing the profile name, the list of events in the profile configuration with their counts and unit masks, and the multiplexing interval.]

The Session Settings dialog box (see Section 2.7, Session Settings) contains an Edit button that also opens a profile configuration edit dialog box. Selecting a profile configuration and clicking the Edit button in the Session Settings dialog box is equivalent to opening the Configuration Management dialog box, selecting a profile configuration, and clicking the Edit button. By default, new profile configurations are stored in the $HOME CodeAnalyst Configs DC Confi
38. [Screenshot: profile configuration edit dialog listing eight events (0x40, 0x41, 0x46, 0x47, 0x76, 0xc0, 0xc2, 0xc3) with a multiplexing interval of 1 msec.]

This profile configuration uses event counter multiplexing to measure eight performance events with a one millisecond multiplexing interval. Assume that the number of performance counters on the running processor is four. When data collection is started with this event configuration, CodeAnalyst separates the events into two groups: Group A (0x40, 0x41, 0x46, 0x47) and Group B (0x76, 0xc0, 0xc2, 0xc3). In this scenario, CodeAnalyst samples the events in Group A for one millisecond before reprogramming the hardware to sample the events in Group B for the same duration. This process repeats until the sampling session ends according to the run control criteria set in the Session Settings dialog box.

Ensure that the run time is long enough to build up a statistically accurate picture of program behavior. Lengthening the duration of the data collection period may be necessary to increase the number of samples taken.

2.5 Importing Profile Data into CodeAnalyst

AMD CodeAnalyst can import profile data into a project. Typically this feature is u
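Returning to the event counter multiplexing scenario described above, the grouping and rotation can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and splitting the event list into consecutive chunks is an assumption that happens to match the Group A and Group B example, not a statement of how CodeAnalyst assigns events to groups.

```python
from itertools import cycle, islice
from typing import List, Tuple


def group_events(events: List[int], counters: int = 4) -> List[List[int]]:
    """Split events into consecutive groups, one group per set of hardware counters."""
    if counters < 1:
        raise ValueError("at least one performance counter is required")
    return [events[i:i + counters] for i in range(0, len(events), counters)]


def schedule(groups: List[List[int]], interval_ms: int,
             run_ms: int) -> List[Tuple[int, List[int]]]:
    """Return (start time in ms, active group) pairs in round-robin order."""
    if interval_ms < 1:
        raise ValueError("interval_ms must be at least 1")
    slots = run_ms // interval_ms
    return [(i * interval_ms, group)
            for i, group in enumerate(islice(cycle(groups), slots))]


if __name__ == "__main__":
    events = [0x40, 0x41, 0x46, 0x47, 0x76, 0xC0, 0xC2, 0xC3]
    groups = group_events(events, counters=4)
    for start, group in schedule(groups, interval_ms=1, run_ms=4):
        print(start, [hex(e) for e in group])
```

With two groups, each event is only observed for about half of the run, which is why the text recommends a longer data collection period when multiplexing is in use.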
39. [Screenshot: Session Settings dialog with the Time-based profile configuration selected, using a 1 msec timer interval.]

5. Click the Start button in the toolbar or select Profile > Start from the menu. CodeAnalyst starts to collect data and will do so for the next 40 seconds, as specified by the Profile duration field in the Session Settings dialog box.

6. At the shell command line, enter the following command to start the benchmark program:

java -agentpath:/opt/CodeAnalyst/bin/libCAJVMTIA64.so jnt.scimark2.commandline

You must use the CodeAnalyst profiling agent that is appropriate for the JVM. After 40 seconds have elapsed, CodeAnalyst stops data collection and displays results in its workspace.

Some versions of the JVM have deprecated the use of the -XrunCAJVMPIA option. Java -X options are non-standard and are subject to change without notice.

Chapter 9. Performance Monitoring Events

9.1 Performance Monitoring Events (PME)

AMD processors provide many performance monitoring events to help analyze the performance of programs. The performance monitoring events available for use depend upon the underlying AMD processor that executes the program or system under analysis. Each processor family, and possibly each revision within a family, offers a specific range of performance monitoring events. Use the links below to browse the performance monitoring events for each specific processor family.

• BIOS and Kernel Developer's Guide (BKDG) For AMD Family 11h Processor h
40. as poor data access patterns that cause cache misses. An event-based profile can identify the reason for a performance issue as well as the code regions that may be performance culprits. Event-based profiling can test hypotheses about a performance issue to identify and resolve it. When multiple events are sampled, an event profile shows the proportion of one event to another. See Section 9.1, Performance Monitoring Events (PME), for descriptions of the events supported by AMD processors. See Section 3.3, Event-Based Profiling Analysis, for more detail.

Instruction-based sampling (IBS) also uses the performance monitoring hardware. This kind of analysis identifies the likely cause of certain performance issues and associates those issues precisely with specific source lines and instructions. See Section 3.4, Instruction-Based Sampling Analysis, for more detail.

Basic Block Analysis statically analyzes the assembly instructions to identify basic blocks and aggregates data accordingly. See Section 3.5, Basic Block Analysis, for more detail.

In-line Analysis allows users to aggregate samples into either in-line functions or in-line instances. See Section 3.6, In-Line Analysis, for more detail.

Call Stack Sampling (CSS) Analysis allows users to identify hot call paths in the application. See Section 3.7, Call Stack Sampling Analysis, for more detail.
41. by the availability of the performance data required for generating the view. Any view that can be displayed from available performance data can be selected for display. For example, the misaligned accesses view is offered when data is collected using either the basic assessment or the data access profile configuration, because both of these predefined profile configurations collect the events needed to display the misaligned accesses view.

The following table summarizes the predefined view configurations associated with each of the predefined profile configurations.

Profile Configuration: View Configurations

Assess performance: Overall assessment; IPC assessment; Branch assessment; Data access assessment; DTLB assessment; Misaligned access assessment
Investigate branching: Branch assessment; Near return report; Taken branch report
Investigate data access: Data access assessment; Data access report; DTLB assessment; DTLB report; Misaligned access assessment
Investigate instruction access: Instruction cache report; ITLB report
Investigate L2 cache access: L2 access report
Instruction-based sampling: IBS fetch overall; IBS fetch instruction cache; IBS fetch instruction TLB; IBS fetch page translations; IBS All ops; IBS BR branch; IBS BR return; IBS MEM all load/store; IBS MEM data TLB; IBS MEM data cache; IBS MEM forward
42. easier to review and interpret results
• Section 2.4, Event Counter Multiplexing, that extends Section 3.3, Event-Based Profiling Analysis, and makes it possible to measure more than four events in a single run

Bibliography

BIOS and Kernel Developer's Guide (BKDG)

BIOS and Kernel Developer's Guide (BKDG) For AMD Family 11h Processor. http://support.amd.com/us/Processor_TechDocs/41256.pdf
BIOS and Kernel Developer's Guide (BKDG) For AMD Family 10h Processor. http://support.amd.com/us/Processor_TechDocs/31116.pdf
BIOS and Kernel Developer's Guide for AMD Athlon and AMD Opteron Processors (Rev. A-E). http://support.amd.com/us/Processor_TechDocs/26094.PDF
BIOS and Kernel Developer's Guide for AMD NPT Family 0Fh Processors (Rev. F-G). http://support.amd.com/us/Processor_TechDocs/32559.pdf

General Documentation

Basic Performance Measurements for AMD Athlon 64, AMD Opteron and AMD Phenom Processors. http://developer.amd.com/Assets/Basic_Performance_Measurements.pdf
Increased performance with AMD CodeAnalyst software and Instruction-Based Sampling on Linux. http://developer.amd.com/Assets/amd_ca_linux_june_2008.pdf
An introduction to analysis and optimization with AMD CodeAnalyst Performance Analyzer. http://developer.amd.com/Assets/Introduction_to_CodeAnalyst.pdf
Improving program performance with AMD CodeAnalyst for Linux. http://developer.amd.com/assets/Linux_Summit_PJD_2007_v2.pdf
Instruction Based S
43. entities separately. IBS fetch sampling and IBS op sampling may be enabled and collected separately, or both may be enabled and collected together.

3.4.1 IBS Fetch Sampling

IBS fetch sampling is a statistical sampling method that counts completed fetch operations. When the number of completed fetch operations reaches the maximum fetch count (the sampling period), IBS tags the fetch operation and monitors that operation until it either completes or aborts. When a tagged fetch completes or aborts, a sampling interrupt is generated and an IBS fetch sample is taken. An IBS fetch sample contains a timestamp, the identifier of the interrupted process, the virtual fetch address, and several event flags and values that describe what happened during the fetch operation.

As with time-based profiling and event-based profiling, CodeAnalyst uses the IBS sample data and information from the executable images, debug information, and source to build an IBS profile for software components executing on the system. Instruction-Based Sampling is also system-wide.

The event data reported in an IBS sample includes:

• Whether the fetch completed or aborted
• Whether address translation initially missed in the level one (L1) or level two (L2) instruction translation lookaside buffer (ITLB)
• The page size of the L1 ITLB address translation (4K, 2M)
• Whether the fetch initially missed in the instruction cache (IC) a
44. [Screenshot: disassembly view of multiply_matrices showing addresses, instruction bytes, and instructions, with a summary line reporting the number of instructions, the total samples, and the percentage of samples in the module and in the session.]

2.2.12 Code Density Chart

Selecting Show Density Charts under the Windows toolbar menu displays a code density chart on the Source/Disassembly (Src/Dasm) tab. The chart on the Src/Dasm tab shows the number of samples relative to the location within the function or a user-specified area. The initial zoom level is the function level, and a partial view can also be shown.

Figure 2.17. Code Density Chart

[Screenshot: code density chart on the Src/Dasm tab for /root/classic/classic.]

Figure 2.18. A drop-down list provides choices for selecting code density

[Screenshot: drop-down list with the choices Function and Partial.]

2.2.13 Ses
45. imported into different CodeAnalyst projects.

Opreport's XML File: Opreport is an OProfile command-line utility for viewing profile data in tabular form. It can also export data into an XML file.

[Screenshot: Import Wizard, Choose Import Type page, with the options Import Local Profile from OProfile Sample Files (recommended), Import Remote Profile from capackage.sh Output (recommended), Import CodeAnalyst Session Dir, and Opreport XML Output File.]

2.5.1 Import Local Profiling

If you choose Local Profiling and click Next, the wizard prompts you to enter the location where the profiling data is stored. The default location is /var/lib/oprofile/samples/current. Then click Finish.

Figure 2.29. Import Wizard

[Screenshot: Import Wizard, Choose Import Path page, with the OProfile sample path set to /var/lib/oprofile/samples/current and an Apply Process Filter option.]

CodeAnalyst creates a new session for the imported data (Session import). The GUI displays the imported data in the System Data tab. Users can also choose to import profile data for a particular process and its dependent modules by specifying the full path of each binary and using the Advanced Filter.

[Screenshot: imported session shown in the System Data tab, aggregated by modules.]
46. intensive application that performs few memory access operations will cause relatively few data cache miss events, simply because it does not access memory very often. The characteristics of the workload may even vary by phase, where the phase setting up a computation behaves differently from the computation phase itself. Thus, the workload behavior determines the frequency of certain kinds of events, and changes to the sample period may be necessary in practice.

3.3.3 Event Multiplexing

As mentioned earlier, the number of available performance counters is processor dependent. AMD Family 10h processors, for example, provide four performance counters. The number of available performance counters determines the number of events that can be measured together at one time. Ordinarily, this would limit the number of events that can be measured in a single experimental run. However, CodeAnalyst for Linux allows users to measure more than four events within a profiling session. See Section 2.4, Event Counter Multiplexing, for further details.

NOTE: The semantics of Event Multiplexing in CodeAnalyst for Linux differ from those in CodeAnalyst for Windows.

3.3.4 Predefined Profile Configurations

To make the process of configuration easier, CodeAnalyst provides several predefined profile configurations in which the choice of events and sampling periods has already been made. These predefined profile configurations a
47. [Screenshot: System Data table listing modules such as libqt-mt.so.3.3.8, /usr/bin/sudo, /opt/CodeAnalyst/bin/oprofiled, and /opt/CodeAnalyst/bin/CodeAnalyst.]

8.4.1 Changing the View of Performance Data

The CodeAnalyst GUI offers one or more views of performance data. A view shows one or more kinds of performance data, or computed performance measurements that are calculated from the performance data. Select the current view from the drop-down list that appears directly above the result tabs. The Timer-based profile view is offered for TBP data.

1. Click the Manage button. A dialog box appears in which aspects of the currently selected view can be changed. The tutorial in Section 8.5, Tutorial: Analysis with Event-Based Sampling Profile, revisits Section 7.3, View Management.

One aspect of a view is the separation and aggregation of performance data. An aggregated sample count is the sum total of the samples taken across all CPUs, processes, or threads. Aggregated data is shown by default. Data may be separated by CPU using the check boxes in the view management dialog box.

[Screenshot: view management dialog box.]

2. When the Separate CPUs option is enabled for a view, CodeAnalyst displays sample data broken out for each core. The following screenshot shows sample data for each module by individual core. The application program classic
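The difference between aggregated and per-CPU sample counts described above can be illustrated with a short Python sketch. The record layout (module, CPU, sample count), the function names, and the sample numbers are hypothetical and are used only to show the arithmetic.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (module name, CPU number, sample count); this record layout is an assumption.
Sample = Tuple[str, int, int]


def aggregate(samples: Iterable[Sample]) -> Dict[str, int]:
    """Sum samples across all CPUs, giving one total per module."""
    totals: Dict[str, int] = defaultdict(int)
    for module, _cpu, count in samples:
        totals[module] += count
    return dict(totals)


def separate_by_cpu(samples: Iterable[Sample]) -> Dict[str, Dict[int, int]]:
    """Keep a separate sample count per module and per CPU."""
    per_cpu: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for module, cpu, count in samples:
        per_cpu[module][cpu] += count
    return {module: dict(cpus) for module, cpus in per_cpu.items()}


if __name__ == "__main__":
    data = [("classic", 0, 120), ("classic", 1, 30), ("libc", 0, 5), ("libc", 1, 7)]
    print(aggregate(data))          # one number per module
    print(separate_by_cpu(data))    # one number per module and core
```

The separated form shows whether a module ran mostly on one core, which the aggregated total hides.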
48. <column>

A <column> element is non-empty and contains exactly one of the following empty elements:

<value>: Show the selected event data as a simple value
<sum>: Show the sum of two selected event data
<product>: Show the product of two selected event data
<ratio>: Show the ratio of two selected event data
<difference>: Show the difference of two selected event data

These elements select and combine the event data to be shown in the column. A <value> element has a single attribute:

id: Data identifier to select event data

The <sum>, <difference>, <product>, and <ratio> elements take two attributes:

left: Data identifier to select data for the left operand
right: Data identifier to select data for the right operand

In the case of <ratio>, for example, the left identifier specifies event data for the numerator and the right identifier specifies event data for the denominator.

Note: Simple <value> elements using an event must appear before any <sum>, <ratio>, and similar elements in the <output> section that use the same event. This ordering is required by the CodeAnalyst user interface.

With respect to future enhancement, additional elements like <sum> can be defined to combine event data. One such addition may be a <formula> element that defines a formula to be evaluated using event data. The formula may use identifiers to refe
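The way a <column> element selects and combines event data can be sketched with Python's standard xml.etree.ElementTree module. This is an illustration under stated assumptions, not the CodeAnalyst implementation: the data identifiers "instructions" and "clocks" are hypothetical, the function name is invented, and returning 0.0 for a zero denominator is a choice made for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Binary operations named after the element tags described in the text.
OPERATIONS = {
    "sum": lambda left, right: left + right,
    "difference": lambda left, right: left - right,
    "product": lambda left, right: left * right,
    # A zero denominator yields 0.0 here; this is an assumption for the sketch.
    "ratio": lambda left, right: left / right if right else 0.0,
}


def evaluate_column(column_xml: str, data: Dict[str, float]) -> float:
    """Evaluate one <column> element against event data keyed by data identifier."""
    column = ET.fromstring(column_xml)
    children = list(column)
    if column.tag != "column" or len(children) != 1:
        raise ValueError("a <column> must contain exactly one element")
    element = children[0]
    if element.tag == "value":
        return float(data[element.get("id")])
    if element.tag in OPERATIONS:
        left = float(data[element.get("left")])
        right = float(data[element.get("right")])
        return OPERATIONS[element.tag](left, right)
    raise ValueError(f"unsupported element <{element.tag}>")


if __name__ == "__main__":
    counts = {"instructions": 500.0, "clocks": 1000.0}  # hypothetical identifiers
    ipc = evaluate_column('<column><ratio left="instructions" right="clocks"/></column>', counts)
    cpi = evaluate_column('<column><ratio left="clocks" right="instructions"/></column>', counts)
    print(ipc, cpi)
```

Swapping the left and right identifiers of a <ratio> turns an instructions-per-clock figure into clocks-per-instruction, which is why the numerator and denominator roles of the two attributes matter.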
49. 1. Right-click on the session name for the imported data. 2. CodeAnalyst displays a pop-up context menu. Select Rename from the pop-up menu to rename the session. The Rename Session dialog prompts "Enter new name for the session". CodeAnalyst changes the name of the session in the session management area of the workspace. 2.5.2 Import Remote Profiling: When the graphical user interface is not available on the system, users can generate profile data using the opcontrol command-line tool. Please see Section 2.8, CodeAnalyst and OProfile
50. operation that caused the miss. Improved precision makes it easier to pinpoint specific performance issues. • IBS collects a wide range of hardware event information in a single measurement run. • IBS collects new information such as retire delay and data cache miss latency. See Section 3.4, Instruction-Based Sampling Analysis, for more information about IBS. Section 8.6, Tutorial - Analysis with Instruction-Based Sampling Profile, and Section 5.4, Collecting an Instruction-Based Sampling Profile, give step-by-step directions for collecting IBS profile data. CodeAnalyst reports IBS performance information as derived events. Derived events are similar to the events reported by Section 3.3, Event-Based Profiling Analysis. The IBS derived events are described in Section 9.3, Instruction-Based Sampling Derived Events. • CodeAnalyst supports the new performance monitoring events provided by AMD Family 10h processors. See the BIOS and Kernel Developer's Guide (BKDG) For AMD Family 10h Processors (http://support.amd.com/us/Processor_TechDocs/31116.pdf) for a complete list of performance events. • The online help has been restructured and revised. The new structure follows the CodeAnalyst workflow. See Chapter 1, Introduction. B.5 New Features in CodeAnalyst 2.6: • Chapter 4, Configure Profile, that assist the configuration of profile data collection • Chapter 7, View Configuration, that make it
51. readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. Section 4: You may not copy, modify, sublicense, o
52. specify its location. 3. Enter the name of the new project into the Project Name field. 4. Enter the path to the directory that will hold the new project into the Project location field. You may also browse to the project location by clicking the Browse button next to the Project location field. 5. Click OK to create the new project. A dialog box appears asking for basic session settings. The session settings name the session to be created, control the kind of data to be collected, and specify the application program to be launched along with its working directory. 6. Enter the name of the new session into the Template Name field. 7. Enter the Launch path to the application program to be launched. 8. Enter the path to the Working directory. You may browse to either the application program or the working directory by clicking the appropriate Browse button. This tutorial uses the example program that is installed with CodeAnalyst. The profile configuration controls the method of data collection and the kind of performance data to be collected.
53. straightforward textbook algorithm for matrix multiplication. Matrix multiplication is performed in the function multiply_matrices. This function provides an opportunity for optimization. The classic implementation takes long, non-unit strides through one of the operand matrices. These long strides cause frequent data translation lookaside buffer (DTLB) misses that penalize execution. To prepare the example program for the tutorial: 1. Go to the directory path_to_CodeAnalyst_source_directory/samples/classic. 2. At a command prompt, run g++ -g -o classic classic.cpp. This should compile classic.cpp into the classic binary with debugging information. Alternatively, run make with the supplied makefile to generate the executable. 8.3 Tutorial - Creating a CodeAnalyst Project: This section demonstrates how to create a new CodeAnalyst project. All work takes place in the context of a CodeAnalyst project. You must either create a new project or open an existing project in order to collect and analyze performance data. Performance data is organized and saved as a session. A project may contain multiple sessions. Sessions are saved with a project. 1. Launch CodeAnalyst. The CodeAnalyst window displays. 2. To create a new project, click the New project icon in the toolbar or select File > New from the File menu. A dialog box appears asking for the project name and the project location. You must name the project and you must
54. the launch target application radio button. Otherwise, select Specify TGID and specify the TGID of the application to be monitored. Figure 3.15. Enabling CSS Profiling in Advanced tab of the Session Settings dialog. 3.7.2 Viewing CSS Profile: CodeAnalyst leverages a third-party utility called kcachegrind (http://kcachegrind.sourceforge.net/html/Home.html). In the Session Explorer, right-click on the session name and select View CSS in kcachegrind. This launches kcachegrind with the CSS profile data for the session. Figure 3.16. Invoke kcachegrind from Session Explorer. Figure 3.17. Launch kcachegrind with CSS data.
55. the AMD64 instruction and the op types. 9.3.4.2 Event 0xF201. Abbreviation: IBS load. The number of IBS op samples for ops that perform a load operation. 9.3.4.3 Event 0xF202. Abbreviation: IBS store. The number of IBS op samples for ops that perform a store operation. 9.3.4.4 Event 0xF203. Abbreviation: IBS L1 DTLB hit. The number of IBS op samples where either a load or store operation initially hit in the L1 DTLB (data translation lookaside buffer). 9.3.4.5 Event 0xF204. Abbreviation: IBS DTLB L1M L2H. The number of IBS op samples where either a load or store operation initially missed in the L1 DTLB and hit in the L2 DTLB. 9.3.4.6 Event 0xF205. Abbreviation: IBS DTLB L1M L2M. The number of IBS op samples where either a load or store operation initially missed in both the L1 DTLB and the L2 DTLB. 9.3.4.7 Event 0xF206. Abbreviation: IBS DC miss. The number of IBS op samples where either a load or store operation initially missed in the data cache (DC). 9.3.4.8 Event 0xF207. Abbreviation: IBS DC hit. The number of IBS op samples where either a load or store operation initially hit in the data cache (DC). 9.3.4.9 Event 0xF208. Abbreviation: IBS misalign acc. The number of IBS op samples where either a load or store operation caused a misaligned access (i.e., the load or store operation crossed a 128-bit boundary). 9.3.4.10 Event 0xF209. Abbreviation: IBS bank conf load
56. you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. A.2.12 NO WARRANTY: Section 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. A.2.13 Section 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM INCLU
57. 2.2.15 View Management Dialog Box: The View Management dialog box (Section 7.3, View Management) allows customization of views. After performance data is collected or imported, it is managed as a pool of available performance data. A CodeAnalyst view specifies the kinds of data to be displayed. This feature allows users to choose and focus on the performance information that is most relevant to the issue under investigation. For further explanation, refer to Chapter 7, View Configuration. There are two types of View Management dialog: • The Local View Management dialog is opened by clicking the Manage button displayed in an open session window. • The Global View Management dialog is opened by selecting the Views icon from the toolbar or from the Tools menu. For details on using this dialog box, go to Section 7.3, View Management. Figure 2.21. Global View Management.
58. 2.7.1 Setting Templates: Once users finish configuring a data collection session, the settings can be stored as a setting template. The currently selected template is used to configure future data collection sessions. Stored templates are listed in the Setting Templates field. When the template selection changes, the dialog repopulates each field with the previously stored settings. Templates are convenient when performing multiple data collection runs with different settings. Right-click on the selected template to show options to Rename, Duplicate, or Delete the template. The current template can also be renamed by modifying the Template Name field and clicking the Save or OK button. Click the Remove button to remove the currently selected template. Note that a modified template must be saved by clicking the Save or OK button before selecting another template. 2.7.2 General Tab. Figure 2.34. Session Settings Dialog - General Tab.
59. When data collection is complete, CodeAnalyst processes the performance data and creates a new session under EBP Sessions in the session management area on the left side of the CodeAnalyst window. Results are shown in the System Data tab. The System Data table resembles and behaves like its TBP counterpart; however, the type and number of event-based samples are shown instead of timer samples. The Overall assessment view displays an overview of software performance. The System Data table shows the number of events and computed performance measurements for each module that was active during data collection. The Overall assessment view shows: • CPU clocks (CPU clocks not halted event) • Instructions per clock cycle (IPC): Retired instructions / CPU clocks • DC (data cache) miss rate: DC misses / retired instructions • DTLB (data translation lookaside buffer) miss rate: DTLB miss
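The first two ratios listed for the Overall assessment view can be worked through with a short Python sketch. The function name and the event counts below are hypothetical, and guarding against a zero denominator is a choice made for this example rather than documented CodeAnalyst behavior.

```python
def overall_assessment(cpu_clocks, retired_instructions, dc_misses):
    """Compute two of the measurements named for the Overall assessment view.

    All arguments are event counts for one module (hypothetical inputs).
    """
    # IPC = retired instructions / CPU clocks (not halted)
    ipc = retired_instructions / cpu_clocks if cpu_clocks else 0.0
    # DC miss rate = DC misses / retired instructions
    dc_miss_rate = dc_misses / retired_instructions if retired_instructions else 0.0
    return {"IPC": ipc, "DC miss rate": dc_miss_rate}


if __name__ == "__main__":
    # Illustrative counts only, not measured data.
    print(overall_assessment(cpu_clocks=2_000_000,
                             retired_instructions=1_000_000,
                             dc_misses=50_000))
```

With these illustrative counts the module retires one instruction every two clocks and misses the data cache on 5% of retired instructions.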
60. When viewing the Source/Disassembly tab in disassembly-only mode, CodeAnalyst also displays basic block information by alternating white and gray background colors. Users can navigate through code execution paths from one basic block to the previous or the next basic block. Right-clicking at the beginning of a basic block opens a menu that lists the source addresses, which are usually the destination address of a control transfer instruction in some basic blocks. Right-click at the end of a basic block to open a list with the destination address of the control transfer instruction. 2. When selecting multiple instructions on the Assembly tab, AMD CodeAnalyst displays a summary of the selection in the status bar of the Assembly window.
61. How Time-Based Profiling Works: Time-based profiling uses statistical sampling to collect and build a program profile. CodeAnalyst handles all of the details and mechanics of collecting and building a profile. However, basic knowledge of the TBP sampling process is helpful. When time-based profiling is enabled and started, CodeAnalyst configures a timer that periodically interrupts the program executing on a processor core. When a timer interrupt occurs, a sample is created and saved for post-processing by the CodeAnalyst GUI. Post-processing builds up a kind of histogram, which is a profile of what the system and its software components were doing. The most time-consuming parts of a program will have the most samples because the program is most likely executing in those regions when a timer interrupt is generated and a sample is taken. On CodeAnalyst for Linux, time-based profiling uses the CPU_CLK_UNHALTED event (performance counter event 0x76), which represents the amount of running time of a processor, i.e., the time the CPU is not in a halted state. This event allows system idle time to be automatically factored out from IPC or CPI measurements, provided that the OS halts the CPU when going idle. The time representation in seconds or milliseconds can be calculated from the processor clock speed. For instance, on a processor runn
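The conversion from CPU_CLK_UNHALTED samples to time mentioned in this excerpt can be sketched as follows. The clock speed, sampling period, and sample count are assumed values chosen for illustration, and the function name is hypothetical.

```python
def unhalted_time_seconds(samples, sampling_period, clock_hz):
    """Estimate unhalted CPU time represented by a number of samples.

    samples         -- number of CPU_CLK_UNHALTED samples collected
    sampling_period -- unhalted clock cycles counted per sample
    clock_hz        -- processor clock speed in cycles per second
    """
    cycles = samples * sampling_period
    return cycles / clock_hz


if __name__ == "__main__":
    # Assumed values: a 2 GHz clock and one sample every 2,000,000 cycles,
    # so each sample stands for about 1 millisecond of unhalted time.
    seconds = unhalted_time_seconds(samples=1000,
                                    sampling_period=2_000_000,
                                    clock_hz=2_000_000_000)
    print(f"{seconds * 1000:.0f} ms of unhalted CPU time")
```

Because the event counts only unhalted cycles, the estimate covers running time and leaves out idle time when the OS halts the CPU.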
62. To change the type of data displayed in the current view, click Manage. The View Management dialog box opens. Refer to Section 7.3, View Management, for details. The items listed in the Columns part of the View Management dialog box depend on the view configuration that is currently open for use. Figure 5.18. View Management.
63. Based Profiling Works: Like time-based profiling, event-based profiling relies upon statistical sampling to build a program profile. CodeAnalyst handles the details of PMC configuration and sampling. However, the following short description of how CodeAnalyst performs event-based profiling may help you use CodeAnalyst more effectively. Each counter must be configured with: • An event to be measured (specified by event select and unit mask values), • An event count (sampling period), • Choice of OS-space sampling, user-space sampling, or both, and • Choice of edge or level detect. Some AMD processor families support additional configuration parameters, and CodeAnalyst offers these parameters when they are supported on the test platform. Once a PMC has been configured and sampling is started, the counter counts the event to be measured until the event count (sampling period) is reached. At that time, an interrupt is generated and an EBP sample is taken. The EBP sample contains a timestamp, the kind of event that triggered the interrupt, the identifier of the process that was interrupted, and the instruction pointer where the program will restart after the return from the sampling interrupt. CodeAnalyst uses this data along with information from the executable images, debug information, and source code to accumulate and display a profile for each executing software component (process, module, function, source line, or instruction). Sampling
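The way EBP samples accumulate into a profile can be illustrated with a minimal Python sketch. The record layout mirrors the four fields listed in this excerpt, but the class name, field names, event labels, and addresses are all hypothetical; this is not CodeAnalyst's internal data format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EbpSample:
    """One EBP sample with the four fields described in the text."""
    timestamp: int  # when the sampling interrupt occurred
    event: str      # kind of event that triggered the interrupt
    pid: int        # identifier of the interrupted process
    ip: int         # instruction pointer where execution resumes


def build_profile(samples, event):
    """Count samples of one event per (process, instruction pointer)."""
    return Counter((s.pid, s.ip) for s in samples if s.event == event)


if __name__ == "__main__":
    # Made-up samples for illustration only.
    samples = [
        EbpSample(1, "CPU_CLK_UNHALTED", 100, 0x400B1D),
        EbpSample(2, "CPU_CLK_UNHALTED", 100, 0x400B1D),
        EbpSample(3, "RETIRED_INSTRUCTIONS", 100, 0x400B20),
        EbpSample(4, "CPU_CLK_UNHALTED", 200, 0x400B25),
    ]
    for (pid, ip), count in build_profile(samples, "CPU_CLK_UNHALTED").most_common():
        print(f"pid {pid} ip {ip:#x}: {count} samples")
```

Grouping by module, function, or source line instead would require mapping each instruction pointer through the executable images and debug information, which this sketch leaves out.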
64. CodeAnalyst provides a utility to monitor OProfile daemon and driver activities. The tool accesses information in the /dev/oprofile and /var/lib/oprofile directories, then presents statistics in real time. This tool helps provide insight into the data collection subsystem. The tool is a dock window which can be moved around the main window or pulled out as a stand-alone dialog. Chapter 3. Types of Analysis. 3.1 Types of Analysis: AMD CodeAnalyst provides several types of analysis. Each kind of analysis provides information about certain aspects of program behavior and performance. The following sections provide details about the kinds of analysis provided by AMD CodeAnalyst: • Section 3.2, Time-Based Profiling Analysis • Section 3.3, Event-Based Profiling Analysis • Section 3.4, Instruction-Based Sampling Analysis • Section 3.5, Basic Block Analysis • Section 3.6, In-Line Analysis • Section 3.7, Call Stack Sampling Analysis. Time-based, event-based, and instruction-based sampling (IBS) analysis require different types of profile data to be collected at run time. The section on time-based profiling has tips that apply to the other forms of analysis that use statistical sampling (EBP and IBS). Call stack sampling ana
65. 2.2.16 Configuration Management Dialog Box: Performance data collection is controlled by profile configurations. A profile configuration specifies basic run control parameters, the types of data to collect, and how data is to be collected. Certain configurations can be managed by the user to create new profile configurations. Specifications for basic run control, types of data collected, and how data is to be collected are determined through the configuration's profile. Predefined profile configurations and user-defined profile configurations are found in the Configuration Management dialog box or in the toolbar list of profile configurations. Configuration management allows for customizing existing profiles and for creating new ones. Configurations for both profiles and views are stored in files when CodeAnalyst is not running. Saving in this manner allows for easy sharing of files. Each user-created configuration is permanently stored as a file until it is removed by using the Remove button. See Section 4.5, Manage Profile Configurations, for additional details. Use the Configuration icon to open the dialog box. 2.2.16.1 Current Type Profiles: The profile configuration list contains three Current configurations: Current time-based profile, Current event-based profile, and Current instruction-based profile. These configurations are considered as
66. CodeAnalyst. To get an overall impression of the CodeAnalyst workflow, please read the following sections of the tutorial: • Section 8.2, Tutorial - Prepare Application • Section 8.3, Tutorial - Creating a CodeAnalyst Project • Section 8.4, Tutorial - Analysis with Time-Based Sampling Profile • Section 8.5, Tutorial - Analysis with Event-Based Sampling Profile • Section 8.6, Tutorial - Analysis with Instruction-Based Sampling Profile. The following sections describe different areas of the CodeAnalyst configuration and workflow in detail: • Section 2.3, CodeAnalyst Options • Section 2.5, Importing Profile Data into CodeAnalyst • Section 2.6, Exporting Profile Data from CodeAnalyst • Section 2.7, Session Settings • Chapter 3, Types of Analysis • Chapter 4, Configure Profile • Chapter 5, Collecting Profile • Chapter 7, View Configuration. Preparing an Application for Profiling: AMD CodeAnalyst uses debug information produced by a compiler. Debug information is not required for CodeAnalyst profiling, but it is required for source-level annotation. Performance data can be collected for an application program that was compiled without debug information, but the results displayed by CodeAnalyst are less descriptive. For example, CodeAnalyst will not be able to display function names or source code. Assembly code is displayed instead. When compiling an application in release mode, the
67. CodeAnalyst User's Manual. AMD CodeAnalyst. Publication 26833, Revision 1.2. Publication date: April 2012. Copyright 2003-2012 Advanced Micro Devices, Inc. All rights reserved. Disclaimers: The contents of this documentation are provided in connection with Advanced Micro Devices, Inc. ("AMD") products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication. The information contained herein may be of a preliminary or advance nature and is subject to change without notice. No license, whether express, implied, arising by estoppel, or otherwise, to any intellectual property rights is granted by this publication. Except as set forth in AMD's Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right. AMD's products are not designed, intended, authorized, or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD's product could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reser
68. DING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS, EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS. A.3 How to Apply These Terms to Your New Programs: If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. <one line to give the program's name and a brief idea of what it does.> Copyright (C) <year> <name of author> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for m
69. Data Tab: The System Data tab can be viewed in two modes. When aggregated by module, the tab lists the modules and sample counts in descending sample-count order. If a module is common to several processes, the processes are listed at the second level along with their PIDs. When aggregated by process, the tab lists the executables and sample counts in descending sample-count order. If several processes run the same executable, the PIDs are listed at the second level. For each PID, the executable and dependency modules are listed at the third level. In both modes, double-clicking on a module drills down to the TBP or EBP samples within the selected module, which are displayed in the Single Module Data tab. Figure 2.12. System Data tab - Aggregated by Module.
70. Detailed information about the L3 cache state when the L3 cache services a Northbridge request. • IBS NB local/remote access: Breakdown of local and remote memory accesses via the Northbridge. • IBS NB request breakdown: Breakdown of requests made through the Northbridge. • IBS fetch instruction TLB: Detailed information about instruction translation lookaside buffer (ITLB) behavior. • IBS fetch instruction cache: Detailed information about instruction cache (IC) behavior. • IBS fetch overall: Breakdown of attempted, killed, completed, and aborted fetch operations. • IBS fetch page translations: Breakdown of ITLB address translations by page size. Most software developers will be interested in the overall breakdown of IBS ops, branch operations, load/store operations, data cache behavior, and data translation lookaside buffer behavior. The breakdown of local/remote accesses through the Northbridge can provide information about the efficiency of memory access on non-uniform memory access (NUMA) platforms. 1. Select IBS fetch instruction cache from the drop-down list of views. IBS information about instruction cache behavior is displayed. IC-related IBS derived events are shown, including the number of IBS fetch samples for attempted and completed fetch operations, the number of fetch samples which indicated an IC miss, and the total IBS fetch latency. An attempted fetch is a request to obtain instruction data from cache or system memory. A fetch attemp
71. Figure 2.7. Active and Inactive Icons. The following table summarizes the Tools menu items and associated toolbar icons. Hovering the mouse over an icon displays the name as indicated in parentheses. Links under Menu Command go to the corresponding section for more information. The rollover text is included to assist with similar naming conventions used. • Session Settings: Use this icon to change the session settings (e.g., time-based and event-based). Please see Section 2.7, Session Settings, for more detail. • CodeAnalyst Options: Opens the CodeAnalyst Options dialog box. All of the application-level configuration options can be changed. Please see Section 2.3, CodeAnalyst Options, for more detail. • Configuration Management: Profile configurations determine how performance data is collected. Profile configurations can be defined by the user, or predefined configurations can be selected. For more information, see Chapter 4, Configure Profile. • View Management: A view consists of a set of event data and computed performance measurements displayed together in a table or a graph. Use View Management to open the View Configuration dialog box and to change the contents displayed in the view. Exchange content from th
72. IBS branch op derived event reports the number of branches. The sampling basis is different. The All Data view shows the number of occurrences of all IBS derived events. Instruction-Based Sampling collects a wide range of performance data in a single run. For instance, when all derived events from both IBS fetch and op data are collected, the All Data view displays information for over 60 IBS derived events. However, this is not recommended since it could introduce overhead when processing the profile data. CodeAnalyst provides several predefined views that display IBS derived events in logically related groups. The available views are: • IBS All ops: Breakdown of all IBS op samples by major type. • IBS BR branch: Detailed information about branch operations. • IBS BR return: Detailed information about subroutine return operations. • IBS MEM all load/store: Breakdown of memory access operations by major type. • IBS MEM data TLB: Detailed information about data translation lookaside buffer (DTLB) behavior. • IBS MEM data cache: Detailed information about data cache (DC) behavior. • IBS MEM forwarding and bank conflicts: Detailed information about store-to-load data forwarding and bank conflicts. • IBS MEM locked ops and access by type: Detailed information about locked operations and UC/WC memory access. • IBS MEM translations by page size: Breakdown of DTLB address translations by page size. • IBS NB cache state
74. Period and Measurement Period

Since EBP is a statistical method, it depends on a statistically significant quantity of samples in order to support reasoning about program behavior. As discussed in Section 3.2, Time-Based Profiling Analysis, the number of samples collected depends upon the sampling period (the event count parameter) and the measurement period (the length of time during which samples are collected). The number of samples collected can be increased by using a smaller sampling period or by increasing the length of time over which samples are taken.

Use of a smaller sampling period increases data collection overhead. Since data collection must be performed on the same platform as the test workload, more frequent sampling increases the intrusiveness of event-based profiling, and the sampling process adversely affects shared hardware resources such as instruction and data caches, translation lookaside buffers, and branch history tables. Extremely small sampling periods may also cause system instability. Start off conservatively and slowly decrease the sampling period for an event until the appropriate volume of samples is generated.

An additional complicating factor when choosing the sampling period for an event is the behavior of the workload itself. Some workloads are CPU-intensive, while other workloads are memory-intensive. Some workloads may be CPU-intensive and also require high memory bandwidth to stream data into the CPU. For example a CPU
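The relationship between sampling period, measurement period, and sample volume described above can be sketched numerically. This is a rough illustration only, assuming that the event occurs at a steady average rate and that one sample is taken every `sampling_period` occurrences; the function names and the example event rate are hypothetical and are not part of CodeAnalyst.

```python
def expected_samples(event_rate_per_sec, sampling_period, duration_sec):
    """Estimate the sample count: one sample per `sampling_period` events."""
    if sampling_period <= 0:
        raise ValueError("sampling_period must be positive")
    total_events = event_rate_per_sec * duration_sec
    return total_events // sampling_period


def period_for_target(event_rate_per_sec, duration_sec, target_samples):
    """Estimate the sampling period that yields about `target_samples` samples."""
    if target_samples <= 0:
        raise ValueError("target_samples must be positive")
    total_events = event_rate_per_sec * duration_sec
    return max(1, total_events // target_samples)


# Hypothetical workload: 2e9 events per second, measured for 10 seconds.
rate = 2_000_000_000
print(expected_samples(rate, 1_000_000, 10))  # baseline period
print(expected_samples(rate, 500_000, 10))    # half the period, twice the samples
print(period_for_target(rate, 10, 20_000))    # period needed for ~20,000 samples
```

Under these assumptions, halving the sampling period doubles the number of samples, which is also why overhead grows as the period shrinks.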
75. Prepare Application
8.3 Tutorial - Creating a CodeAnalyst Project
8.4 Tutorial - Analysis with Time-Based Sampling Profile
8.4.1 Changing the View of Performance Data
8.5 Tutorial - Analysis with Event-Based Sampling Profile
8.5.1 Assessing Performance
8.5.2 Changing Contents of a View
8.5.3 Choosing Events for Data Collection
8.6 Tutorial - Analysis with Instruction-Based Sampling Profile
8.6.1 Collecting IBS Data
8.6.2 Reviewing IBS Results
8.6.3 Drilling Down Into IBS Data
8.7 Tutorial - Profiling a Java Application
8.7.1 Reviewing Results
8.7.2 Launching a Java Program from the Command Line
9 Performance Monitoring Events
9.1 Performance Monitoring Events (PME)
9.2 Unit masks for PMEs
9.3 Instruction-Based Sampling Derived Events
76. Size: Measure data load/store page size performance.

The listed configurations are not modifiable. However, CodeAnalyst provides three modifiable profile configurations, which can be used as templates for creating customized profile configurations:
• Current time-based profile
• Current event-based profile
• Current instruction-based profile

4.5 Manage Profile Configurations

CodeAnalyst calls the process of creating, modifying, and saving a custom, user-defined profile configuration "Configuration Management". The predefined configurations (see Section 4.4, Predefined Profile Configurations) are preset and cannot be changed. However, a predefined profile configuration can be saved under a different name, and the new profile configuration can then be modified. Three modifiable profile configurations are also available:
• Current time-based profile
• Current event-based profile
• Current instruction-based profile

These profile configurations act as templates for creating new user-defined profile configurations. The "Current" profile configurations are persistent and can be used as temporary scratchpad profile configurations.

To start configuration management:
1. Open the Configuration Management dialog box by clicking the Configuration Management button in the toolbar, or select Tools > Configuration Management from the menu bar.
2. The Configuration Management dialog box appears. Select a profile configuration in the Profile configuration list.
3. Click one of the confi
77. The commands available for controlling data collection and for selecting the profile configuration are shown in the following table.

• Activates the sampling process in the same way as choosing Start from the Sampling toolbar. This icon is in the active state when a project is open and the sampling process has not yet started.
• Suspends the sampling process in the same way as choosing Pause from the Sampling toolbar. This icon is red in the active state, when a project is open and the sampling process is in progress.
• Terminates the sampling process in the same way as choosing Stop from the Sampling toolbar. This icon is red in the active state, when a project is open and the sampling process has started or has been paused.
• Profile configurations (second list): Provides a list of predefined profile configurations for data collection. The profile configuration determines the performance data to be collected (time-based, event-based, etc.). The list changes according to which mode is selected. See Chapter 4, Configure Profile, for more details.

2.2.8 Tools Menu and Icons

The Tools menu contains icons for modifying project options for the current project and application-level CodeAnalyst options. These icons also appear as toolbar group icons that float or can be docked. As the last illustration shows, the Session Settings icon is not active unless a session is open.

Figure 2.6 Tools menu and toolbar
78. The number of IBS op samples where either a load or store operation caused a bank conflict with a load operation.

9.3.4.11 Event 0xF20A
Abbreviation: IBS bank conf store
The number of IBS op samples where either a load or store operation caused a bank conflict with a store operation.

9.3.4.12 Event 0xF20B
Abbreviation: IBS forwarded
The number of IBS op samples where data for a load operation was forwarded from a store operation.

9.3.4.13 Event 0xF20C
Abbreviation: IBS cancelled
The number of IBS op samples where data forwarding to a load operation from a store was cancelled.

9.3.4.14 Event 0xF20D
Abbreviation: IBS UC mem acc
The number of IBS op samples where a load or store operation accessed uncacheable (UC) memory.

9.3.4.15 Event 0xF20E
Abbreviation: IBS WC mem acc
The number of IBS op samples where a load or store operation accessed write-combining (WC) memory.

9.3.4.16 Event 0xF20F
Abbreviation: IBS locked op
The number of IBS op samples where a load or store operation was a locked operation.

9.3.4.17 Event 0xF210
Abbreviation: IBS MAB hit
The number of IBS op samples where a load or store operation hit an already allocated entry in the Miss Address Buffer (MAB).

9.3.4.18 Event 0xF211
Abbreviation: IBS L1 DTLB 4K
The number of IBS op samples where a load or store operation produced a valid linear (virtual) address and a 4 KByte page entry in the L1 DTLB wa
79. ampling: A New Performance Analysis Technique for AMD Family 10h Processors. http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf
80. and misses for source-level hot spot (EBP)
3.6 IBS op samples for each software module
3.7 Attribution of IBS op samples to source-level hot spot
3.8 CodeAnalyst Options
3.10 Basic block information in Disassembly View
3.11 Basic block pop-up menu
3.15 Enabling CSS Profiling in Advanced tab of the Session Settings dialog
3.16 Invoke kcachegrind from Session Explorer
4.3 Edit EBS/IBS configuration: Selected Events Tab
4.4 Edit EBS/IBS configuration: Description Tab
4.5 Edit EBS/IBS configuration: Perf Counter Tab
4.6 Edit EBS/IBS configuration: IBS Fetch/Op Tab
4.7 Edit EBS/IBS configuration: Import Tab
4.8 Edit EBS/IBS configuration: Info Tab
4.9 Configuration Mana
81. application at the completion of the profile sampling duration.
• Show app in terminal: Runs the target application in a terminal. This option allows users to access stdin, stdout, and stderr from the command line.
• Enable CPU Affinity: Specifies the list of CPUs allowed to run the target application. See Section 2.7.6, Changing the CPU Affinity.
• Enable Process Filter: Filters out processes during data processing unless they are specified in Advance Filter. See Section 2.7.7, Process Filter.

2.7.2.3 Profile Control
• Stop data collection when the app exits: Terminates the sampling process if the application terminates. This option is convenient when profiling small applications or if the shutdown sequence is desired in the profile. Selecting this option enables the Profile duration (sec) option, which sets the profile run time in seconds. Profiling large applications over a long time period with this option could create very large profile data files.
• Profile the duration of the app execution: Allows profiling to continue as long as the specified application is running. When this option is selected, the Profile duration (sec) option is disabled and no specific profiling time limit is needed.
• Profile duration (sec): Sets the profile sampling duration in seconds.
• Start with profiling paused: This option is intended for use with the profiler API, which controls the Pause and Resume functionality programmatically.
• Start delay (sec): set
82. arate CPUs, Show Percentage

5.4.3 Changing How IBS Data is Collected

The predefined profile configuration named Instruction-Based Sampling collects both IBS fetch and op data. The way IBS data is collected can be changed by editing the "Current instruction-based profile" configuration.

1. Click the Session Settings button in the toolbar, or select Tools > Session Settings from the menu bar. Under Profile configuration, select the profile configuration "Current instruction-based profile".

Figure 5.19 Edit EBS/IBS configuration

2. In the Edit EBS/IBS configuration dialog, users can select IBS Fetch/Op events from the available performance event lists, remove events from the selected list of events, or change the sampling periods and options of each type of event. Please see Section 4.3
83. 8.6.2 Reviewing IBS Results

CodeAnalyst reports IBS performance data as IBS derived events. See Section 9.3, Instruction-Based Sampling Derived Events, for descriptions of the IBS derived events. Although IBS derived events look similar to performance monitoring counter (PMC) events, the sampling method is quite different. PMC event samples measure the actual number of particular hardware events that occurred during the measurement period. IBS derived events report the number of IBS fetch or op samples for which a particular hardware event was observed.

Consider the three IBS derived events:
• IBS all op samples
• IBS branch op
• IBS mispredicted branch op

The IBS all op samples derived event is a count of all IBS op samples that were taken. The IBS branch op derived event is the number of IBS op samples where the monitored macro-op was a branch. These samples are a subset of all the IBS op samples. The IBS mispredicted branch op derived event is the number of IBS op sample branches that mispredicted. These samples are also a subset of the IBS branch op samples. Unlike PMC events that count the actual number of branches (event select 0x0C2), it would be incorrect to say that the
84. ate samples into original inline function; Aggregate samples into basic blocks; Show Data Aggregation Controller in Module Data Tab Toolbar

2.2.18 Profiling Java Applications

CodeAnalyst supports profiling of Java applications. The target Java application must be invoked with the Java -agentpath option to specify the use of the CodeAnalyst Java Profiling Agent. To launch a Java application from within AMD CodeAnalyst:

1. In the launch field of the Session Settings dialog, enter the Java command line with -agentpath:/opt/CodeAnalyst/lib/libCAJVMTIA32.so or -agentpath:/opt/CodeAnalyst/lib/libCAJVMTIA64.so.
2. In the working directory field, type the path of the Java application to be launched.

For example:
/usr/bin/java -agentpath:/opt/CodeAnalyst/lib/libCAJVMTIA32.so example1
where example1.class is in your working directory.

2.3 CodeAnalyst Options

CodeAnalyst options control how AMD CodeAnalyst displays profile data and its toolbar, and how it finds source and debug information. These options are persistent and are effective across projects and sessions. They affect the CodeAnalyst application as a whole.

CodeAnalyst options are changed using the CodeAnalyst Options dialog box. The dialog box contains the following tabs:
• Section 2.3.1, General Tab
• Section 2.3.2, Directories Tab

Open the CodeAnalyst Options dialog box by clicking the CodeAnalyst Options icon in th
85. 9.3.1 IBS Fetch Derived Events
9.3.2 IBS Op Derived Events
9.3.3 IBS Op Branches Derived Events
9.3.4 IBS Op Load-Store Derived Events
9.3.5 IBS Op Northbridge Derived Events
10 Support
10.1 Enhancement Requests
10.2 Problem Report
A GNU General Public License
A.1 Preamble
A.2 TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION (numbered sections, including NO WARRANTY Section 11)
A.3 How to Apply These Terms to Your New Programs
B Features List
B.1 New Features in CodeAnalyst 3.0
B.2 New Features in CodeAnalyst 2.9
B.3 New Features in CodeAnalyst 2.8
B.4 New Features in CodeAnalyst 2.7
86. ccess assessment, Overall assessment

The All Data view is displayed by default after data collection. The data is displayed in as many columns as there are distinct kinds of sample data.

The list of available views is determined by the kind of performance data that was collected. A view is only offered if the right kind of performance data is available to produce the view. CodeAnalyst provides several predefined views. See Section 7.4, Predefined Views, for more details.

The choice of performance data and the aggregation or separation of data by CPU core, process, and thread is controlled through view management (see Section 7.3, View Management). View management aids interpretation by filtering out unneeded performance data and by breaking out or summarizing performance data across CPUs, processes, and threads.

The CodeAnalyst GUI does not provide a way to directly create and save new views. Advanced users may choose to create their own views and should see Section 7.2, View Configurations, for more information. Expertise with XML is required.

7.2 View Configurations

Information that defines a view is stored in a view configuration. View configurations, like profile configurations, are stored as XML-format files. This approach allows for:
• Sharing view configurations between users
• Saving customized view configurations
• Creating and distributing new view configurations more easily

Predefined views are install
87. changed by moving items between the Available data list and the Columns shown list.

To add an item to the Columns shown list and make it visible in the view:
1. Select the item in the Available data list.
2. Click the right arrow (>) button.

To remove an item from the Columns shown list and remove it from the view:
1. Select the item in the Columns shown list.
2. Click the left arrow (<) button.

NOTE: Changes made to view configurations are persistent.

7.3.1.5 Separate CPUs

When selected, event data is shown for each processor in separate columns in the System Data table. For example, on a dual-core processor, timer-based samples display in two columns with the headings:
• Time-based samples - C0
• Time-based samples - C1

7.3.1.6 Show Percentage

Enabling Show Percentage displays the percentage column in the view.

7.4 Predefined Views

AMD CodeAnalyst provides many predefined views. These predefined views cover the most common kinds of analysis and directly support the predefined profile configurations that control collection of performance data. Profile (data collection) configurations choose which performance data are collected during a session. Coordinated, predefined view configurations provide purpose-specific views of this performance data. A view may also include computed performance measurements (rates and ratios of events, for example) that are calculated from the raw performance data. View availability is determined
88. column
</output>
<tool_tip>Show instructions per cycle</tool_tip>
<description>Use this view to find hot spots with low instruction level parallelism</description>
</view>
</view_configuration>

Chapter 8. Tutorial

8.1 AMD CodeAnalyst Tutorial

This tutorial demonstrates how to use AMD CodeAnalyst to analyze the performance of an application program. The tutorial provides step-by-step directions for using AMD CodeAnalyst. We recommend reading the tutorial sections in the order listed below:
• Section 8.2, Tutorial - Prepare Application
• Section 8.3, Tutorial - Creating a CodeAnalyst Project
• Section 8.4, Tutorial - Analysis with Time-Based Sampling Profile
• Section 8.5, Tutorial - Analysis with Event-Based Sampling Profile
• Section 8.6, Tutorial - Analysis with Instruction-Based Sampling Profile
• Section 8.7, Tutorial - Profiling a Java Application

Related Topics: For quick reference to the options available in the CodeAnalyst workspace, see Section 2.2, Exploring the Workspace and GUI.

8.2 Tutorial - Prepare Application

This tutorial uses the example program that is distributed with AMD CodeAnalyst. Source code for the example program is installed with CodeAnalyst. To find the source code, locate the directory into which CodeAnalyst was installed and then find the samples/classic directory. The example program, classic, implements the
89. currently using the same view. The changes made to a view configuration within the dialog propagate to all open sessions that are currently displaying the view. The Global View Management dialog also allows the user to manage platform-specific view configurations. Changes made in this dialog are saved to the view configuration file. The user can reset a predefined view configuration to its original values in this dialog.
• Local: Manages the view of an individual session. The changes made to a view configuration within the dialog are applied only to an individual session and are not saved to the view configuration file.

View management can be initiated in any of three ways:
• Click the view management icon in the toolbar. (Global)
• Select Tools > View Management from the menu bar. (Global)
• Click the Manage button next to the drop-down list of views. (Local)

7.3.1 View Management Dialog

Figure 7.2 Global View Management Dialog (Platform name: CPU Family 0x10; View name: Branch assessment; Description: Use this view to find code with a high branch density and poorly predicted branches; Columns shown: Ret inst, Ret branch, Ret misp branch, Branch rate, Mispredict rate, Mispredict ratio; Options: Separate CPUs, Show Percentage)
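The Branch assessment view in Figure 7.2 lists three raw sample columns (Ret inst, Ret branch, Ret misp branch) next to three computed columns (Branch rate, Mispredict rate, Mispredict ratio). The sketch below shows one plausible way such columns could be derived. The formulas are assumptions made for illustration (rates taken per retired instruction, the ratio taken per retired branch), the function name is hypothetical, and the sample counts are made up; this excerpt of the manual does not define the calculations.

```python
def branch_assessment(ret_inst, ret_branch, ret_misp_branch):
    """Compute illustrative branch metrics from raw sample counts.

    Assumed definitions (not taken from the manual):
      branch rate      = retired branches / retired instructions
      mispredict rate  = mispredicted branches / retired instructions
      mispredict ratio = mispredicted branches / retired branches
    """
    def safe_div(numerator, denominator):
        # Avoid dividing by zero when a module has no samples for an event.
        return numerator / denominator if denominator else 0.0

    return {
        "branch_rate": safe_div(ret_branch, ret_inst),
        "mispredict_rate": safe_div(ret_misp_branch, ret_inst),
        "mispredict_ratio": safe_div(ret_misp_branch, ret_branch),
    }


# Made-up sample counts for a single module.
metrics = branch_assessment(ret_inst=1000, ret_branch=200, ret_misp_branch=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Computing the ratios per module or per function, rather than for the whole system, is what makes it possible to spot code with a high branch density and poorly predicted branches.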
90. d in the IC (instruction cache).

9.3.1.10 Event 0xF009
Abbreviation: IBS IC hit
The number of IBS attempted fetch samples where the fetch operation initially hit in the IC.

9.3.1.11 Event 0xF00A
Abbreviation: IBS 4K page
The number of IBS attempted fetch samples where the fetch operation produced a valid physical address (i.e., address translation completed successfully) and used a 4 KByte page entry in the L1 ITLB.

9.3.1.12 Event 0xF00B
Abbreviation: IBS 2M page
The number of IBS attempted fetch samples where the fetch operation produced a valid physical address (i.e., address translation completed successfully) and used a 2 MByte page entry in the L1 ITLB.

9.3.1.13 Event 0xF00E
Abbreviation: IBS fetch lat
The total latency of all IBS attempted fetch samples. Divide the total IBS fetch latency by the number of IBS attempted fetch samples to obtain the average latency of the attempted fetches that were sampled.

9.3.2 IBS Op Derived Events

9.3.2.1 Event 0xF100
Abbreviation: IBS all ops
The number of all IBS op samples that were collected. These op samples may be branch ops, resync ops, ops that perform load/store operations, or undifferentiated ops (e.g., those ops that perform arithmetic operations, logical operations, etc.). IBS collects data for retired ops. No data is collected for ops that are aborted due to pipeline flushes, etc. Thus, all sampled ops are architecturally signi
91. This tab contains the information from /var/lib/oprofile/samples/oprofiled.log. This is available only in property mode of the Session Settings dialog.

Changing the CPU Affinity

CPU affinity limits the execution of a program or process to selected cores in a multicore system. CPU affinity is set through a CPU affinity mask, in which each bit of the mask specifies whether the program or process may execute on a particular core. The number of available cores is system-dependent. CPU affinity can be used to perform scalability analysis by limiting the number of cores available to a multi-threaded program.

CPU affinity is defined in the CodeAnalyst Session Settings dialog box. The CPU affinity mask can be specified directly as a hexadecimal value in the CPU affinity mask field, as shown in the screen shot below. The CPU affinity mask determines the CPU affinity for the application program that is launched by CodeAnalyst.

It may be more convenient to set CPU affinity using the Select Affinity button located to the r
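As described above, each bit of the CPU affinity mask corresponds to one core. The following sketch converts between a list of core numbers and the hexadecimal mask value. It assumes that bit N of the mask maps to core N (the usual Linux convention; this excerpt does not state the bit order), and the helper names are hypothetical rather than part of CodeAnalyst.

```python
def affinity_mask(cores):
    """Build an affinity mask with one bit set per allowed core."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core numbers must be non-negative")
        mask |= 1 << core  # assumed mapping: bit N <-> core N
    return mask


def cores_from_mask(mask):
    """List the core numbers whose bits are set in the mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Restrict a program to cores 0 and 2, then decode a four-core mask.
print(hex(affinity_mask([0, 2])))
print(cores_from_mask(0xF))
```

For a scalability study, the same multi-threaded program could be profiled repeatedly with masks that allow one, two, and then four cores.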
92. d or store event information includes:
• Whether a load was performed
• Whether a store was performed
• Whether address translation initially missed in the L1 and/or L2 data translation lookaside buffer (DTLB)
• Whether the load or store initially missed in the data cache (DC)
• The virtual data address for the memory operation
• The DC miss latency when a load misses the DC

Requests made through the Northbridge produce additional event information:
• Whether the access was local or remote
• The data source that fulfilled the request

A full list of IBS op event information appears in the section on IBS events. Hardware-level details can be found in the BIOS and Kernel Developer's Guide (BKDG) for the AMD processor in your test platform.

IBS Derived Events

CodeAnalyst translates the IBS information produced by the hardware into derived event sample counts that resemble EBP sample counts. All IBS derived events have "IBS" in the event name and abbreviation. Although IBS derived events and sample counts look similar to EBP events and sample counts, the source and sampling basis for the IBS event information are quite different. Arithmetic should never be performed between IBS derived event sample counts and EBP event sample counts. It is not meaningful to directly compare the number of samples taken for events that represent the same hardware condition. For example fewer IBS
93. data that is passed to the decoder stages in the pipeline. The decoder identifies AMD64 instructions in the fetch block. These AMD64 instructions are translated to one or more macro-operations (called macro-ops or ops) that are executed in the execution phase.

Figure 3.6 IBS op samples for each software module (columns: Module Name, 64-bit, IBS load/store, IBS DC miss, IBS DC load lat, IBS load, IBS store)

Figure 3.7 Attribution of IBS op samples to source-level hot spot (source view of multiply_matrices in classic.cpp)

Instruction-Based Sampling provides separate means to sample fetch operations and macro-ops, since the fetch and execution phases of the pipeline treat these
94. e Available Data list and the Column shown list. Items in the Column Shown list appear in the View Management window. Some views have prerequisites that must be met before they can be selected from the available data list. For more information, see View Configurations. The Manage button opens the view management dialog for the currently selected view. See Section 7.3, View Management, for more detail.

2.2.9 Windows Menu

The Windows menu controls the display attributes of the Data window. These icons become active when a session is open in the work area.

Figure 2.8 Windows Menu

• Cascade: Displays open windows as overlapping and cascading downward from the upper left of the work area.
• Tile: Displays the open windows in a non-overlapping, tiled fashion.
• Close All: Closes all open windows.
• Session: Displays open windows. A check mark indicates the current window with focus. Each session is assigned a number and extension to differentiate between sessions with the same name. File extensions further define the file as timer-based (tbp), event-based (ebp), or as a session imported from OProfile (import tbp).

2.2.9.1 Cascading Session Panes

When two or more sessions are open, session panes can be ca
95. e Configurations
3.5 Basic Block Analysis
3.6 In-Line Analysis
3.6.1 Aggregate samples into in-line instance
3.6.2 Aggregate samples into original in-line function
3.6.3 Source inlined annotation
3.7 Call-Stack Sampling Analysis
3.7.1 Enabling CSS Profile
3.7.2 Viewing CSS Profile
4 Configure Profile
4.1 Profile Data Collection
4.1.1 Modifying a Profile Configuration
4.2 Edit Timer Configuration
4.3 Edit Event-based and Instruction-based Sampling Configuration
4.3.2 Select and Modify Events in Profile Configuration
4.3.3 Available Performance Events
4.4 Predefined Profile Configurations
4.5 Manage Profile Configurations
5 Collecting Profile
5.1 Collecting Profiles and Performance Data
5.2 Collecting a Time-Based Profile
5.2.1 Collecting a Time-Based Profile
5.2.2 Changing the Current View of the Data
5.3 Collecting an Event-Based Profile
5.3.1 Collecting an Event-Bas
96. e changes and to dismiss the dialog box.

7. Click the Start button in the toolbar to begin data collection, or select Profile > Start from the Profile menu. Results are displayed in the System Data tab when data collection is finished. A new session, Session 01, is added to the list of EBP sessions in the session management area. Notice that CodeAnalyst auto-generates new session names when necessary.

8. Click on the System Data tab and select the All Data view from the list of available views. Three columns are displayed, containing the number of samples taken for the CPU clocks not halted, retired instructions, and retired micro-ops (uops) events.
97. e specifically designed to help analysis of in-line functions.

Aggregate samples into instance of in-line function: This is the default aggregation mode. When samples belong to an in-line instance, CodeAnalyst aggregates them into the caller function and uses blue text to identify the in-line instance together with the in-line function name.

Figure 2.25. Aggregate samples into instance of in-line function

[Figure: Module data for /root/classic/inlined_classic with columns CS:EIP, Symbol + Offset, CPU clocks, and DC misses. Samples from the in-line function appear under the caller as entries of the form "main > multiply_matrices + offset", followed by initialize_matrices and Unknown Sample.]

Aggregate sampl
98. e text </description> </view> </view_configuration>

The <data> element specifies the event data that is needed to produce the view. CodeAnalyst uses this information to determine if a view is producible from a data set, and will not offer the view to the user if it cannot be produced from the available event data in the session. The <data> element must appear before the <output> element, since the <output> element makes use of identifiers which are defined in the <data> element.

7.5.1.2 <data>

The <data> element contains one or more <event> elements. An <event> element:

• Indicates that event data of that particular type is needed to produce the view, and
• Defines a mnemonic (symbolic) identifier which is used to refer to that event data in the output section of the XML file.

The <data> element may contain <event> elements for event data which is not used in the <output> section. Such <event> elements can be used to further control the association of a view with certain data collection configurations. For example, a view can be associated with a specific data collection configuration by listing exactly the events measured by the data collection configuration.

7.5.1.2.1 <event>

An <event> element has three attributes:

id: Symbolic name given to the event data
select: Event selection value (hexadecimal integer)
mask: Event selection unit
99. e toolbar. Or you may open the CodeAnalyst Options dialog box by selecting Tools > CodeAnalyst Options from the pull-down menu.

General Tab

The General tab controls the display of source code and disassembler instructions. CodeAnalyst follows the usual Windows conventions for accepting or canceling changes to options. Clicking the Apply button activates the new options. Clicking the OK button activates the new options and closes the dialog box. Clicking the Cancel button closes the dialog box without applying changes.

Figure 2.24. CodeAnalyst Options

[Figure: General tab of the CodeAnalyst Options dialog. Source Code Display group: Show disassembly only by default, Alert when no source is available, Font Size. Data Aggregation group: Default Aggregation Mode (Aggregate samples into instance of inline function, Aggregate samples into original inline function, Aggregate samples into basic blocks) and Show Data Aggregation Controller in Module Data Tab Toolbar.]

Source Code Display

If the Show disassembly only by default check box is selected, double-clicking a sample address in a module opens the Source/Disassembly tab in Disassembly only mode by default. Selecting the Alert when no source is available check box displays an alert message when CodeAnalyst cannot find the source for a module.

2.3.1.2 Data Aggregation

CodeAnalyst provides three different methods for aggregating data. Two modes ar
100. ed, but users can save the changes as a new user-defined configuration. Users can manage the list of configurations by clicking the Manage Profiles button, which brings up the Configuration Management dialog. See Section 4.5, "Manage Profile Configurations" for more information.

4.3.2 Select and Modify Events in Profile Configuration

4.3.2.1 Selected Events Tab

Figure 4.3. Edit EBS/IBS configuration: Selected Events Tab

[Figure: Selected Events tab listing the events in the profile configuration with columns Event, Source, Count, Unit mask, Usr, Os, and Name. The listed events are Data cache accesses, Data cache misses, L1 DTLB and L2 DTLB miss, Misaligned accesses, CPU clocks not halted, Retired instructions, Retired branch instructions, and Retired mispredicted branch instructions. The Event Setting panel shows Count, Unit Masks, Usr, Os, and the multiplexing interval in msec.]

This tab contains a table listing the performance events in the currently selected profile configuration. For instance, when you select the Assess performance profile configuration, eight events are shown in the table along with their counts, unit masks, and other information. To modify an event, simply highlight the event you want to modif
101. ed Profile
5.3.2 Changing the Current View of the Data
5.3.3 Aggregate by Processes
5.4 Collecting an Instruction-Based Sampling Profile
5.4.1 Collecting an IBS Profile
5.4.2 Changing the Current View of the Data
5.4.3 Changing How IBS Data is Collected
6 Data Collection Configuration
6.1 Data Collection Configuration
6.2 Profile Configuration File Format
6.2.1 XML File Format
6.2.2 Examples of XML Files
7 View Configuration
7.1 Viewing Results
7.2 View Configurations
7.3 View Management
7.3.1 View Management Dialog
7.4 Predefined Views
7.5 View Configuration File Format
7.5.1 XML File Format
7.5.2 Example XML File
8 Tutorial
8.1 AMD CodeAnalyst Tutorial
8.1.1 Related Topics
8.2 Tutorial
102. ed as XML files and cannot be permanently changed. Any changes made in a session only appear while the session is open. Advanced customization is possible by making a copy of the XML file containing the view configuration and then modifying the XML representation itself. The view configuration files that are installed with CodeAnalyst provide a good starting point for the creation of new views.

Predefined view configuration files are in the /opt/CodeAnalyst/share/codeanalyst/Configs/ViewConfig directory. In this directory, CodeAnalyst stores the common view configuration files. Platform-specific configuration files are stored in subdirectories named after the corresponding AMD processor family.

NOTE: Since the unit masks for some performance monitoring events vary across AMD processor families, view configurations that use these events are listed under the platform-specific directories.

See Section 7.5, "View Configuration File Format" for more information about the XML representation of a view.

7.3 View Management

The content and format of a view can be modified using view management. Use view management to change the kinds of performance data shown in a view, to aggregate or separate data by core (CPU), to aggregate or separate data by task, to aggregate or separate data by thread, and to control the display of a percentage column.

Two types of View Management dialog are available:

Global: Manage a collection of session dialogs that are
103. em Info: Displays the System Information dialog box and reports system characteristics such as processor model, operating system version, memory characteristics, and video resolution.

2.2.11 Data and Source Display

The profile results are shown in the form of data tables and annotated source code. In a time-based or event-based profile session, the initial tabbed panel shows tabular system-wide profile data, which can be aggregated by module or process. From this table, users can navigate further into the module and function of interest. The source code display is organized to allow drilling down into the selected function at the source and disassembly level.

Additional interface elements are available:

• A drop-down list of preset views allows quick selection of a new view. A view is a set of event data and computed performance measurements that are displayed in the columns of the table. The All Data view is the default view, which shows all available data in columns. Each column represents a distinct type of data or a computed performance measurement. Use the View Management dialog box (see Section 7.3, "View Management") to change view configurations.
• Profile data can be shown as raw samples or as percentages.
• For multi-core systems, samples are shown for each core.
• Each column can be sorted in ascending or descending order by clicking the column header.
• The 64-bit column indicates whether the samples belong to 32-bit or 64-bit modules.

2.2.11.1 System
104. ent Counter Multiplexing

The number of performance counters in AMD processors is limited to a small number (i.e., four counters in most processor families). The number of performance events that can be measured in a single profiling run is limited by this hardware constraint. Without multiplexing, only four events can be measured per run, so a minimum of two runs is needed to collect data for five or more events. Event counter multiplexing removes this burden.

Event multiplexing is accomplished by re-programming the performance counters with the next set of events when a timer interrupt is generated by the driver. The interval between timer interrupts can be specified in the Edit EBS/IBS Profile Configuration dialog box. See Section 4.3, "Edit Event-based and Instruction-based Sampling Configuration" for more information.

2.4.1 Example of Event Counter Multiplexing

In the Edit Event Configuration dialog (see Section 4.3, "Edit Event-based and Instruction-based Sampling Configuration"), select the predefined profile configuration Assess performance.

Figure 2.28. Assess Performance Configuration with 1 msec MUX Interval

[Figure: Events in the Assess performance profile configuration with columns Event, Source, Count, Unit mask, Usr, Os, and Name. The listing is cut off after Data cache accesses, Data cache misses, L1 DTLB and L2 DTLB miss, and Misaligned accesses.]
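The grouping behind event counter multiplexing can be sketched in a few lines of Python. This is only an illustration of the idea described above, not CodeAnalyst's driver code: the function names are hypothetical, and the round-robin order is an assumption (the manual only says the counters are re-programmed with "the next set of events" on each timer interrupt). The event names are the eight events of the Assess performance configuration.

```python
from typing import List, Sequence

# The eight events of the "Assess performance" configuration, as listed in this manual.
ASSESS_PERFORMANCE = [
    "Data cache accesses",
    "Data cache misses",
    "L1 DTLB and L2 DTLB miss",
    "Misaligned accesses",
    "CPU clocks not halted",
    "Retired instructions",
    "Retired branch instructions",
    "Retired mispredicted branch instructions",
]


def counter_groups(events: Sequence[str], num_counters: int = 4) -> List[List[str]]:
    """Split the event list into groups that each fit the hardware counters."""
    if num_counters < 1:
        raise ValueError("num_counters must be at least 1")
    return [list(events[i:i + num_counters])
            for i in range(0, len(events), num_counters)]


def group_for_tick(groups: List[List[str]], tick: int) -> List[str]:
    """Return the group programmed after `tick` timer interrupts.

    Assumption: groups are rotated round-robin.
    """
    return groups[tick % len(groups)]


groups = counter_groups(ASSESS_PERFORMANCE)
for tick in range(4):
    print(tick, group_for_tick(groups, tick))
```

Without multiplexing, each group returned by `counter_groups` would need its own profiling run. With multiplexing, a single run alternates between the groups at the configured interval.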
105. er of load/store operations are also displayed.

[Figure: System Data tab, IBS MEM data cache view, aggregated by modules, with columns IBS load/store, IBS load, IBS store, and IBS DC miss.]

4. Select the IBS MEM data TLB view from the drop-down list of views. The IBS MEM data TLB view is displayed. This view shows information related to data translation lookaside buffer (DTLB) behavior. The number of sampled IBS load/store operations is shown along with a breakdown of the number of load operations and the number of store operations.

AMD processors use a two-level DTLB. Address translation may hit in the L1 DTLB, miss in the L1 DTLB and hit in the L2 DTLB (L1M L2H), or miss in both levels of the DTLB (L1M L2M). The performance penalty for a miss in both levels is relatively high. Nearly half of the sampled load/store operations incurred a miss at both levels of the DTLB. This is the performance culprit in the sa
106. er the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you".

Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does.

Section 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program.

You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee.

Section 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work
107. es. The Aggregate by Processes mode shows the sample counts broken down by modules within each process.

Figure 5.6. Aggregate by Processes

[Figure: System Data tab in Aggregate by Processes mode, with the modules sampled in each process listed under that process.]

5.3 Collecting an Event-Based Profile

Event-based profiling (EBP) collects data based on how a program behaves on processors and memory. Information from this type of profile can be used to develop and test hypotheses about performance issues. AMD processors provide a wide range of hardware events that can be monitored and measured.

5.3.1 Collecting an Event-Based Profile

To collect an event-based profile:

1. Create a new project or choose a previously opened project. Select an event-based profiling configuration, such as Assess performance, from the drop-down profile configuration list in the toolbar.

2. If a new project is to be created, the New project properties dialog box opens. Assign a project name and location, or browse for an existing file.

Figure 5.7. New Project Properties

[Figure: New Project Properties dialog with Project Name and Project location fields.]
108. es into original in-line function: When samples belong to an in-line instance, CodeAnalyst aggregates them into each in-line instance. CodeAnalyst groups all in-line instances and lists them together under the in-line function, which is presented in red text.

Figure 2.26. Aggregate samples into original in-line function

[Figure: Module data with columns CS:EIP, Symbol + Offset, CPU clocks, and DC misses. The in-line function multiply_matrices is listed at the top level with 47236 CPU clocks samples and 8514 DC misses samples. Its in-line instance in main and the individual sample addresses are listed beneath it, followed by initialize_matrices, Unknown Sample, and main.]

Aggregate samples into basic blocks: This mode of aggregation is designed to aid basic block analysis (BBA). CodeAnalyst examines each function to identify basic blocks and aggregates samples accordingly. Each basic block is denoted by a range of addresses as follows:

StartAddr StopAddr Number of load Number of store

Figure 2.27. Aggregate samples into ba
109. escribes the derived events for Instruction-Based Sampling (IBS). IBS is available on AMD Family 10h processors.

9.3.1 IBS Fetch Derived Events

9.3.1.1 Event 0xF000

Abbreviation: IBS fetch

The number of all IBS fetch samples. This derived event counts the number of all IBS fetch samples that were collected, including IBS killed fetch samples.

9.3.1.2 Event 0xF001

Abbreviation: IBS fetch killed

The number of IBS sampled fetches that were killed fetches. A fetch operation is killed if the fetch did not reach ITLB or IC access. The number of killed fetch samples is not generally useful for analysis, and killed fetch samples are filtered out in the other derived IBS fetch events (except Event Select 0xF000, which counts all IBS fetch samples including IBS killed fetch samples).

9.3.1.3 Event 0xF002

Abbreviation: IBS fetch attempt

The number of IBS sampled fetches that were not killed (fetch attempts). This derived event measures the number of useful fetch attempts and does not include the number of IBS killed fetch samples. This event should be used to compute ratios such as the ratio of IBS fetch IC misses to attempted fetches. The number of attempted fetches should equal the sum of the number of completed fetches and the number of aborted fetches.

9.3.1.4 Event 0xF003

Abbreviation: IBS fetch comp

The number of IBS sampled fetches that completed. A fetch is completed if the attempted fetch delivers instruction data
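The relationships among the IBS fetch derived events described above reduce to simple arithmetic. The following Python sketch is illustrative only: the function names and sample counts are hypothetical, and CodeAnalyst computes these derived events itself from the raw IBS fetch samples.

```python
def fetch_attempts(all_fetches: int, killed: int) -> int:
    """IBS fetch attempt (0xF002) = IBS fetch (0xF000) - IBS fetch killed (0xF001)."""
    if not 0 <= killed <= all_fetches:
        raise ValueError("killed must be between 0 and all_fetches")
    return all_fetches - killed


def aborted_fetches(attempts: int, completed: int) -> int:
    """Attempted fetches are the sum of completed and aborted fetches."""
    if not 0 <= completed <= attempts:
        raise ValueError("completed must be between 0 and attempts")
    return attempts - completed


def ratio_per_attempt(event_samples: int, attempts: int) -> float:
    """Ratio of an IBS fetch event (e.g. IC misses) to attempted fetches."""
    return event_samples / attempts if attempts else 0.0


# Hypothetical sample counts, for illustration only.
attempts = fetch_attempts(all_fetches=1000, killed=40)
print(attempts)
print(aborted_fetches(attempts, completed=900))
print(ratio_per_attempt(48, attempts))
```

Using attempted fetches rather than all fetch samples as the denominator keeps killed fetches, which never reached the ITLB or instruction cache, from diluting the ratio.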
110. To select a profile configuration via Session Settings:

1. Open the Session Settings dialog box by clicking the session settings button in the toolbar, or by selecting Tools > Session Settings from the menu bar.

2. Select one of the profile configurations from the drop-down list of profile configurations in the Session Settings dialog box.

[Figure: Session Settings dialog. Launch Control group: Launch, Terminate app when stop profile, Enable CPU Affinity, Show app in terminal, Enable Process Filtering. Profile Control group: Profile duration (sec), Profile start delay (sec), Stop profile when the app exits, Start with the profiling paused, Profile the duration of the app execution. The Profile Configuration drop-down list is open and shows entries such as Assess performance, Current event-based profile, Current instruction-based profile, Current time-based profile, and several IBS Op configurations.]

4.1.1 Modifying a Profile Configuration

The predefined profile configurations provide starting points for program analysis. Eventually, however, it becomes necessary to change the kind of data collected (such as the specific events collected through event-based profiling), the sampling period, or some other aspect of data collection. Custom profile config
111. [Figure: System Data tab listing the sampled modules for the session.]

4. Select the IPC assessment view from the drop-down list of views. CodeAnalyst updates the System Data table, which now shows the IPC assessment view. This view consists of:

• The number of retired instruction samples
• The number of CPU clock samples
• The ratio of instructions per clock cycle (IPC)
• The ratio of clock cycles per instruction (CPI)

The ratio of instructions per clock cycle is a basic measure of execution efficiency and is a good indicator of instruction-level parallelism (ILP).

[Figure: System Data tab, IPC assessment view, aggregated by modules, showing IPC and CPI values for each module.]
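The IPC and CPI columns of the IPC assessment view are reciprocals of each other. The Python sketch below shows one way to derive both from sample counts. It is a hedged illustration, not CodeAnalyst's implementation: the function name is hypothetical, and scaling each sample count by its sampling period is an assumption. The default periods of 250000 are the counts listed for CPU clocks not halted and Retired instructions in the Assess performance configuration.

```python
from typing import Tuple


def ipc_and_cpi(instruction_samples: int,
                clock_samples: int,
                instruction_period: int = 250_000,
                clock_period: int = 250_000) -> Tuple[float, float]:
    """Estimate IPC and CPI from event sample counts.

    Assumption: each sample stands for `period` occurrences of its event,
    so estimated event count = samples * period.
    """
    instructions = instruction_samples * instruction_period
    clocks = clock_samples * clock_period
    if instructions <= 0 or clocks <= 0:
        raise ValueError("both events need at least one sample")
    ipc = instructions / clocks
    return ipc, 1.0 / ipc


# Hypothetical sample counts, for illustration only.
ipc, cpi = ipc_and_cpi(instruction_samples=4, clock_samples=3)
print(f"IPC = {ipc:.6f}, CPI = {cpi:.6f}")
```

When both events use the same sampling period, the periods cancel and IPC is simply the ratio of the two sample counts.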
112. essors can measure a wide range of event types. These events are described in more detail in Section 9.1, "Performance Monitoring Events (PME)". Consult the BIOS and Kernel Developer's Guide (BKDG) for the AMD processor in your test platform for in-depth information about the performance monitoring events that it supports, including any differences between processor revisions. For IBS-derived events, please see Section 9.3, "Instruction-Based Sampling Derived Events".

Figure 4.2. Edit EBS/IBS configuration

[Figure: Edit EBS/IBS configuration dialog with a list of available events, the Event Setting panel (Count, Unit Masks, Options), the multiplexing interval in msec, and the events in the profile configuration: Data cache accesses, Data cache misses, L1 DTLB and L2 DTLB miss, Misaligned accesses, CPU clocks not halted, Retired instructions, Retired branch instructions, and Retired mispredicted branch instructions.]

4.3.1 Profile Name

This drop-down menu shows the name of the profile configuration currently selected for editing. Changing the selection re-populates the settings with the newly selected configuration. Once changes are made to the current configuration, users must click OK or Save to store the changes to the current configuration, or Save As to store the changes as a new configuration. Note that the predefined configurations cannot be modifi
113. fic events to be measured are determined by the profile configuration that is used to set up data collection. CodeAnalyst provides five predefined profile configurations to collect performance data using event-based sampling. These profile configurations are:

• Assess performance: Assess overall program performance.
• Investigate data access: Investigate how well software uses the data cache (DC).
• Investigate instruction access: Investigate how well software uses the instruction cache (IC).
• Investigate L2 cache access: Investigate how well software uses the unified level 2 (L2) cache.
• Investigate branching: Identify mispredicted branches and near returns.

These profile configurations cover the most common program performance issues of interest. Later in this section, we demonstrate how to select events and configure data collection in order to investigate issues that are not covered by the predefined profile configurations.

8.5.1 Assessing Performance

A drop-down list of the available profile configurations is included in the CodeAnalyst toolbar.

1. Select the Assess performance profile configuration. This profile configuration is a good starting point for analysis because it generates an overview of program performance. The overall assessment may indicate one or more potential issues to be investigated in more detail by using one of the other predefined configurations or by using a custom profile configuration of your own.
114. ficant and contribute to the successful forward progress of executing programs.

9.3.2.2 Event 0xF101

Abbreviation: IBS tag to ret

The total number of tag-to-retire cycles across all IBS op samples. The tag-to-retire time of an op is the number of cycles from when the op was tagged (selected for sampling) to when the op retired.

9.3.2.3 Event 0xF102

Abbreviation: IBS comp to ret

The total number of completion-to-retire cycles across all IBS op samples. The completion-to-retire time of an op is the number of cycles from when the op completed to when the op retired.

9.3.3 IBS Op Branches Derived Events

9.3.3.1 Event 0xF103

Abbreviation: IBS BR

The number of IBS retired branch op samples. A branch operation is a change in program control flow and includes unconditional and conditional branches, subroutine calls, and subroutine returns. Branch ops are used to implement AMD64 branch semantics.

9.3.3.2 Event 0xF104

Abbreviation: IBS misp BR

The number of IBS samples for retired branch operations that were mispredicted. This event should be used to compute the ratio of mispredicted branch operations to all branch operations.

9.3.3.3 Event 0xF105

Abbreviation: IBS taken BR

The number of IBS samples for retired branch operations that were taken branches.

9.3.3.4 Event 0xF106

Abbreviation: IBS misp taken BR

The number of IBS samples for retired branch operations that were mispredicted
115. g EBS profile. It collects data based on how a program behaves on processors and memory. This section demonstrates how to configure CodeAnalyst to collect an instruction-based sampling profile.

5.4.1 Collecting an IBS Profile

1. Select Create new project or choose a previously opened project from the Welcome window. Select Instruction-based sampling from the second drop-down list in the toolbar. If a new project is to be created, the New project properties dialog box opens. Assign a project name and location, or browse for an existing file.

Figure 5.13. New Project Properties

[Figure: New Project Properties dialog with Project Name set to ClassicTest and a Project location field.]

2. The Session settings dialog box opens.

• In the Launch field, enter the path to the application program to be started, or browse to find the executable program.
• The Working directory field fills automatically. You may also set the working directory to a different location, either by entering the path directly or by browsing to it.
• Under Profile configuration, select the predefined profile configuration Instruction-based sampling.

3. Advanced step: If editing the IBS profile configuration, click Edit to open the Edit EBS/IBS Configuration dialog box. See Section 4.3, "Edit Event-based and Instruction-based Sampling Configuration" for more detail.

Figure 5.14. Session Settings

[Figure: Session Settings dialog, General tab.]
116. gement
[two illegible entries]
5.1 New Project Properties
5.2 [illegible]
5.3 [illegible]
5.4 [illegible]
5.5 View [illegible]
5.6 [illegible]
5.7 New Project Properties
5.8 [illegible]
5.9 Launch [illegible]
5.10 Performance Data Results
5.11 View Management
5.12 Aggregate by Processes
5.13 New Project Properties
5.14 [illegible]
5.15 Output from IBS Profile Session
5.16 IBS Profile with All Data view when selecting large number of IBS derived events
5.17 [illegible]
5.18 View Management
5.19 Edit EBS/IBS configuration
7.1 [illegible]
7.2 Global View Management Dialog

Chapter 1. Introduction

1.1 Overview

AMD CodeAnalyst is a suite of tools to assist performance analysis and tuning.

• Chapter 2, Features provides a summary of CodeAnalyst features and concepts. It is essential reading for all CodeAnalyst users.
• Chapter 8, Tutorial provides step-by-step directions for using
117. gs subdirectory.

Chapter 5. Collecting Profile

5.1 Collecting Profiles and Performance Data

Information about program performance can be collected in several ways, depending upon the kind of analysis to be performed. The following sections discuss ways to collect program profiles and performance information:

• Section 5.2, "Collecting a Time-Based Profile"
• Section 5.3, "Collecting an Event-Based Profile"
• Section 5.4, "Collecting an Instruction-Based Sampling Profile"

Collection and analysis of profiles are also covered in Chapter 8, Tutorial.

5.2 Collecting a Time-Based Profile

In time-based profiling, the application to be analyzed is run at full speed on the same machine that is running AMD CodeAnalyst. Time-based samples, collected at predetermined intervals, can be used to identify possible bottlenecks, execution penalties, or optimization opportunities. The time-based profiling feature can be used with all AMD processors. This section describes how to collect a time-based profile. The predefined profile configuration Time-based profile is used to enable collection of a time-based profile.

5.2.1 Collecting a Time-Based Profile

To collect a time-based profile:

1. Create a new project or choose a previously opened project. Select Time-based profile from the drop-down profile configuration list in the toolbar.

2. If a new project is to be created, the New project properties dialog box opens. As
118. guration>
      <ebp name="" mux_period="10">
        <tool_tip></tool_tip>
        <description></description>
        <event select="0x00" mask="0x00" os="T" user="T" host="F" guest="F" edge_detect="F" count="50000"></event>
      </ebp>
    </dc_configuration>

6.2.1.4 Tool tip

The tags <tool_tip> and </tool_tip> mark the beginning and end of a short tool tip description of a configuration. The text between the tags is the tool tip description.

    <tool_tip> ... </tool_tip>

A tool tip is usually only a few key words with no line breaks.

6.2.1.5 Description

The tags <description> and </description> mark the beginning and end of a short description of a configuration. The text between the tags is the description.

    <description> ... </description>

A description is usually only a few sentences long. It may contain line breaks. Line breaks and extra space will be simplified: line breaks will be replaced by spaces, and runs of space characters will be replaced by single space characters.

6.2.1.6 Attributes

The value of a numeric attribute is returned as a string that can be converted to internal integer and floating point representation. The actual values may be further constrained for the attribute's purpose (e.g., the number of simulated instructions must be greater than 0). The default value for a numeric attribute is zero.

The value of a Boolean attribute is returned as one of t
119. guration management buttons on the right hand side of the dialog box The Configuration Management dialog box selects a profile configuration to be modified Edit Rename to be deleted Remove or to be written to a file Export A new profile configuration can be read from a file Import A profile configuration is stored as a sharable XML file Export writes an existing profile configuration to a file while import adds a new profile configuration by reading it from a file New profile configurations appear at the bottom of the profile configuration list Only user defined and imported configurations can be deleted from the list the predefined profile configurations that are installed with CodeAnalyst cannot be deleted 80 Configure Profile Figure 4 9 Configuration Management LA Configuration Management Profile configuration list Rename Current event based prafile Current instruction based profile Current time based prafile IBS Op Branch IBS Op Load Store Edit IBS Op Load Store expert IBS Op Load Store DC Miss Import IBS Op Load Store DTLB IBS Op Load Store Memory Access IBS Op Load Store Page Size IBS Op Northbridge Access IBS Op Northbridge Cache Access IBS Op Northbridge Services IBS Op Overall Assessment IBS Op Return Instruction based sampling Investigate L2 cache access Investigate branchina Remove The following actions are available through the Configuration Management dialog box Cl
120. gure 3 13 Aggregate into in line function system Data root classic inline d classic Data Aggregate samples into symbol Offset multiply matrices lt main Ux4009ec 0x4008f3 0x4 009 ef 04 00 9fe 0400203 0x400a07 Ox4009e8 i OX4 00917 Ox400a0c 0x400ala i 0x400a09 main Ox9c main xa3 main 0x9f main Oxae main Oxb3 main xb7 main 0x958 main xa7 main xbc main Oxca main xbS9 H 0x4008b0 0x4006e8 F 0x400950 initialize matrices Unknown Sample main 3 6 3 Source inlined annotation When viewing a function in the Source Disassembly tab CodeAnalyst annotates the inlined instances with the source lines of the inlined function Furthermore the source lines of the inlined instances are annotated with assembly instructions accordingly This is useful when trying to analyze the samples count within inlined functions which could be from multiple source files 68 Types of Analysis Figure 3 14 Annotated inlined instances inlined classic inlined classic ebp system Data froot classic inlined classic Data 10950 froot classic inlined_classic Src Dasm x Function 03400960 0x400380 main Pia Pid Tid Tid Type Source and Dasm Search for A Go Es my pid getpid aer 0 BERN 88 10 P qe m in fprintf result file Process ID z 101 Dasm 102 initialize matrices
121. he misses; 25000 L1 DTLB and L2 DTLB miss; 25000 Misaligned accesses; 250000 CPU clocks not halted (cycles); 250000 Retired instructions; 25000 Retired branch instructions. 9. Select Time-based profile or another profile configuration from the drop-down list of profile configurations. A brief description of the selected profile configuration is displayed. (Screenshot: the drop-down list, with entries such as IBS Op Northbridge Cache Access, IBS Op Overall Assessment, Instruction-based sampling, Investigate L2 cache access, Investigate branching, Investigate data access, Investigate instruction access, and Time-based profile.) 10. Click the check box to the left of Terminate app when stop profile to select that option. When this option is enabled, CodeAnalyst terminates the application program after collecting data. The Profile duration option specifies the maximum time period for data collection. 11. Enter 10 into the Profile duration field to collect data for a maximum of 10 seconds. 12. Click OK to confirm the session settings. CodeAnalyst is now ready to collect data. (Screenshot: the Session Settings dialog box, General tab, with Launch set to /root/AMD/CodeAnalyst/samples/classic/classic and Working directory set to /root/AMD/CodeAnalyst/samples/classic.)
122. he strings T or F, which denote the truth values true and false, respectively. The default value of a Boolean attribute is F. 6.2.2 Examples of XML Files. 6.2.2.1 TBP example. Here is an example of a data collection configuration for time-based profiling: <dc_configuration> <tbp name="Time-based profile" interval="10.0"> <tool_tip>Measure time in program regions</tool_tip> <description>Monitor the program and produce a profile showing where the program spends its time. Use this configuration to identify hot spots.</description> </tbp> </dc_configuration> 6.2.2.2 EBP example. Here is an example of a data collection configuration for event-based profiling: <dc_configuration> <ebp name="Quick tune" mux_period="100000"> <!-- CPU clocks unhalted --> <event select="0x76" mask="0x00" os="T" user="T" count="50000"></event> <!-- Retired instructions --> <event select="0xC0" mask="0x00" os="T" user="T" count="50000"></event> <!-- In order to detect branch mispredictions --> <event select="0xC2" mask="0x00" os="T" user="T" count="5000"></event> <!-- Mispredicted branches --> <event select="0xC3" mask="0x00" os="T" user="T" count="5000"></event> <!-- Data cache accesses --> <event select="0x40" mask="0x00" os="T" user="T" count="…000"></event> <
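As a rough illustration of how a tool might read such a file, the sketch below parses a small EBP configuration with Python's standard `xml.etree.ElementTree`. The sample document is a hand-written approximation of the example above (its second event deliberately omits the `os` attribute to show the documented default of F); `parse_bool` and `read_events` are hypothetical helper names, not part of CodeAnalyst.

```python
import xml.etree.ElementTree as ET

# Hand-written sample modeled on the EBP example; not copied from a real file.
SAMPLE = """
<dc_configuration>
  <ebp name="Quick tune" mux_period="100000">
    <event select="0x76" mask="0x00" os="T" user="T" count="50000"></event>
    <event select="0xC0" mask="0x00" user="T" count="50000"></event>
  </ebp>
</dc_configuration>
"""


def parse_bool(text):
    """Map 'T'/'F' to True/False; a missing attribute takes the default F."""
    if text is None:
        return False
    if text not in ("T", "F"):
        raise ValueError(f"not a Boolean attribute value: {text!r}")
    return text == "T"


def read_events(xml_text):
    """Return one dictionary per <event> element found in the document."""
    root = ET.fromstring(xml_text)
    events = []
    for event in root.iter("event"):
        events.append({
            "select": int(event.get("select"), 16),  # hexadecimal integer
            "mask": int(event.get("mask"), 16),      # hexadecimal integer
            "os": parse_bool(event.get("os")),
            "user": parse_bool(event.get("user")),
            "count": int(event.get("count")),
        })
    return events


events = read_events(SAMPLE)
```

Rejecting anything other than T or F is a design choice of this sketch; the manual only states which strings denote true and false.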
123. ick the Rename button to rename the selected user-defined profile configuration (e.g., my config configuration) and to rename the associated file. Only user-defined profile configurations may be renamed. Click the Remove button to remove the selected user-defined profile configuration (e.g., my config configuration) and to delete the associated file. Only user-defined profile configurations may be removed from the list of profile configurations. Click the Edit button to modify the selected configuration. Click the Import button to import a profile configuration from an existing file. This adds a new profile configuration to the list. The import action displays a standard file selection dialog box. Use this dialog box to select a file to import. Click the Export button to write the selected profile configuration to a file. The export action displays a standard file selection dialog box. Use this dialog box to specify the name and location of the file to which the profile configuration is written. Click OK to close the window without taking further action. The Edit button opens an edit dialog box. The kind of analysis used by the selected profile configuration determines the type of dialog box that is displayed, since each kind of analysis has its own settings. For example, selecting Current time-based profile and then clicking the Edit button opens the Edit timer configuration dialog box. See Section 4.2, Edit
124. ight of the CPU affinity mask field. To change the CPU affinity: 1. Open the Session Settings dialog box. 2. Click the Select Affinity button. 3. The CPU Affinity Configuration dialog box appears. 4. Check a box to enable execution on a core. 5. Click the Select All button to check all boxes, enabling execution on all cores. 6. Click the Clear All button to remove checks from all boxes. 7. Click the OK button to activate the CPU affinity settings. (Screenshot: the CPU Affinity Configuration dialog box, with check boxes for Core 0 through Core 3 on Socket 0 through Socket 3 and the Select All and Clear All buttons.) The CPU affinity configuration below limits execution of the application program under test to a single CPU, Core 1. (Screenshot: the CPU Affinity Configuration dialog box, with check boxes for Core 0 and Core 1 on Socket 0 through Socket 3.) The Session Settings dialog box below reflects the change to the CPU affinity mask, 0x2. (Screenshot: the Session Settings dialog box, with the Enable CPU Affinity option and the Select Affinity button.)
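The relationship between the checked cores and the hexadecimal mask can be sketched as follows. This assumes that bit i of the mask corresponds to core index i, which matches the example above (Core 1 gives 0x2) but is otherwise an assumption; how cores on different sockets are numbered is not covered here, and `affinity_mask` and `cores_in_mask` are hypothetical helpers rather than CodeAnalyst functions.

```python
def affinity_mask(cores):
    """Return an integer mask with bit i set for every core index i in cores."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << core
    return mask


def cores_in_mask(mask):
    """Return the sorted list of core indices whose bits are set in mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


print(hex(affinity_mask([1])))     # core 1 only -> 0x2
print(hex(affinity_mask([0, 1])))  # cores 0 and 1 -> 0x3
print(cores_in_mask(0x2))          # -> [1]
```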
125. ing and bank conflicts; IBS MEM locked ops and access by type; IBS MEM translations by page size; IBS NB cache state; IBS NB local/remote; IBS NB request breakdown. 7.5 View Configuration File Format. View configurations control the content and formatting of CodeAnalyst views. The CodeAnalyst GUI offers predefined view configurations, which can be selected by a user. View configurations are stored in XML-formatted files. A new customized view can be created by writing an XML file for the view. Only the most advanced CodeAnalyst users should ever need to create views at the XML level. This section describes the format and semantics of a view configuration in XML format. 7.5.1 XML File Format. A view configuration file contains a single non-empty <view_configuration> element. This element, in turn, contains a single non-empty <view> element. This format allows the possibility of representing multiple views within a single configuration file. The current implementation assumes that only one <view> will be defined in a configuration file. The <view> element describes a view and has the following attributes: name: displayable symbolic name given to the view (string); separate_cpus: controls aggregation/separation by CPU (Boolean); separate_processes: controls aggregation/separation by process (Boolean); separate_threads: controls aggregation/separation by thread (Boolean); default
126. ing at clock speed 800 MHz to specify a 1 millisecond time interval of time-based profiling. Sampling Period and Measurement Period. Because TBP relies upon statistical sampling, collecting enough samples to reason about program behavior is important. The number of TBP samples collected during an experimental run depends upon how frequently samples are taken (the sampling period) and the length of time during which samples are taken (the measurement period). The frequency of sample taking is controlled by the timer interval. This quantity is sometimes called the sampling period. The default timer interval is 1 millisecond, meaning that a TBP sample is taken on a processor core approximately every millisecond of wall-clock time. The timer interval can be changed by editing the Current time-based profile configuration. With the default timer interval, roughly 1,000 TBP samples are taken per second for each processor core. With a shortened interval, samples are taken more often and more samples are taken within a given fixed-length measurement period. However, the amount of effort expended to take samples, known as overhead, will increase, placing the test system under a higher load. The process of taking samples and the overhead have an intrusive effect that perturbs the test workload and may bias statistical results. As to the second factor, the length of time during which samples are taken
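The arithmetic behind the "roughly 1,000 samples per second per core" figure can be written out as a small estimate. This is only a back-of-the-envelope sketch: it assumes every core is sampled for the whole measurement period, and `expected_tbp_samples` is a hypothetical helper, not a CodeAnalyst function. Real sample counts are subject to statistical variation.

```python
def expected_tbp_samples(interval_ms, duration_s, cores=1):
    """Estimate how many timer samples a run produces.

    interval_ms: timer interval (sampling period) in milliseconds
    duration_s:  measurement period in seconds
    cores:       number of processor cores being sampled
    """
    if interval_ms <= 0:
        raise ValueError("timer interval must be positive")
    samples_per_second = 1000.0 / interval_ms
    return samples_per_second * duration_s * cores


# Default 1 ms interval, 10 second run, 4 cores.
print(expected_tbp_samples(1.0, 10, cores=4))  # 40000.0
# A 10 ms interval collects a tenth as many samples per second.
print(expected_tbp_samples(10.0, 1))           # 100.0
```

Halving the interval doubles the estimate, which is the trade-off described above: more samples in the same measurement period, at the cost of more sampling overhead.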
127. ith a line of source code. Graphical User Interface. The CodeAnalyst graphical user interface (GUI) provides an interactive workspace for the collection and analysis of program and system performance data. Users can also run profiling via a command line interface and subsequently import the profile results into the CodeAnalyst GUI for viewing. Projects and Sessions. The CodeAnalyst GUI uses a project- and session-oriented user interface. A project retains important settings to control a performance experiment, such as the application program to launch and analyze, settings that control data collection, etc. A project also organizes performance data into sessions. A CodeAnalyst session is created when performance data is collected through the GUI or when profile data is imported into the project. The OProfile command line utility is an alternative method for collecting data. Session data is persistent and can be recalled at a later time. Sessions can be renamed and deleted. Basic Steps for Analysis. The CodeAnalyst graphical user interface provides features to set up a performance experiment, run the experiment while collecting data, and display the results. The basic steps are: 1. Open an existing project or create a new project. 2. Set up basic run parameters like the program to launch, the working directory, etc. 3. Select a predefined profile (data collection) configuration. 4. Collect a time-based profile, event-based
128. k on a function to drill down into the data for that function. CodeAnalyst displays a new tab containing the source for the function along with the number of timer samples taken for each line of code. A code density chart is drawn at the top of the source panel. (Screenshot: the source panel for multiply_matrices, showing source lines 59 through 65 with their disassembly.) 6. Some users may choose to hide the code density chart in order to devote more screen area within a source panel to code. To hide the code density chart, select Windows > Show Density Charts from the menu bar. (Screenshot: the Windows menu, including Cascade, Close All, and Show Density Charts.) 7. Click on an expand square to expand and display a region of source or assembly code, or click on a collapse square to hide a region of source or assembly code. Timer samples for each source line and assembly instruction are shown. The source panel shows the program regions that are taking the most execution time. These hot spots are the best candidates for optimization. (Screenshot: the CodeAnalyst window with a time-based profile session open.)
129. (Screenshot: the System Data tab listing modules such as /usr/bin/xterm, /usr/bin/sudo, and /sbin/killall5.) Figure 2.13. System Data tab, Aggregated by Process (screenshot: samples for PID 1813, /root/classic/classic, broken down by module). 2.2.11.2 Single Module Data Tab. The Single Module Data tab drills down into a single module. It illustrates how the data samples are distributed within a module. Like the System Data tab, this view shows the distribution of samples per core in a multicore system. The data samples can be expanded and collapsed around the available symbols. If debug information is available, double-clicking on an address navigates to the Source View. If debug information is not available, double-clicking on an address navigates to the Disassembly View. When a function contains inlined instance
130. le is taken for the process. CodeAnalyst analyzes the call stack information and produces the following information: a summary of function call relationships (who called whom); the number of CSS samples associated with individual functions; and the children called by each function. Compared to regular time-, event-, or instruction-based sampling, CSS has higher overhead because the call stack must be processed by unwinding whenever a CSS sample is taken. The unwind operation, combined with the larger amount of data that must be written to the trace file, creates the higher overhead. However, two parameters help reduce the overhead: the unwind level, which controls the depth to which the stack is explored; and the CSS interval, which determines how often a CSS sample is taken together with a regular sample. Note: Call stack sampling requires the thread group ID (TGID) of the target process. If the target application is specified in the Launch field of the Session Settings dialog, CodeAnalyst will use the TGID of the launched application. Otherwise, users must provide the TGID at the start of the profile. 3.7.1 Enabling CSS Profile. To enable CSS profiling, open the Session Settings dialog, navigate to the Advanced tab, select the Enabling Call Stack Sampling (CSS) check box, and specify the desired CSS depth and interval. If the target application is specified in the Launch field in the General tab, check the Use TGID of
131. leted fetch is an attempted fetch that successfully delivered instruction data to the decoder. An aborted fetch is an attempted fetch that did not complete. Note: Instruction fetch is an aggressive, speculative activity, and even instruction data produced by a completed fetch may not be used. IBS Op Sampling. IBS op sampling selects, tags, and monitors macro-ops as issued from AMD64 instructions. Two options are available for selecting ops for sampling. Cycles-based selection counts CPU clock cycles. The op is tagged and monitored when the count reaches a threshold (the sampling period) and a valid op is available. Dispatched op-based selection counts dispatched macro-ops. When the count reaches a threshold, the next valid op is tagged and monitored. In both cases, an IBS sample is generated only if the tagged op retires. Thus, IBS op event information does not measure speculative execution activity. The execution stages of the pipeline monitor the tagged macro-op. When the tagged macro-op retires, a sampling interrupt is generated and an IBS op sample is taken. An IBS op sample contains a timestamp, the identifier of the interrupted process, the virtual address of the AMD64 instruction from which the op was issued, and several event flags and values that describe what happened when the macro-op executed. CodeAnalyst uses this and other information to build an IBS profile. 3.4.2.1 Bias in Cycles-Based IBS O
132. ling paused (screenshot, continued: the Profile the duration of the app execution option and the Profile Configuration list set to Current event-based profile, with 1000000 CPU clocks not halted and 500000 Retired instructions). 2.7.2.1 Template Name. This is a name that is assigned to the session. If the session name is not changed, CodeAnalyst will auto-generate new session names by appending a number to the end of the base session name. 2.7.2.2 Launch Control. 2.7.2.2.1 Launch and Working directory. Launch: Users can specify the application program to launch in the Launch field. Enter the path to the executable program to be launched. You may also enter the path to a shell script file to be started instead of an executable program. You may also leave this field blank in order to perform system-wide data collection when overall system monitoring is required. Working directory: This is the working directory for the application to be launched. Enter the path to the working directory in this field. In addition to directly entering path names into the Launch and Working directory fields, you may browse to the desired location by clicking the appropriate Browse button. Each field also offers a drop-down list of the most recently used path names. The drop-down lists retain the last 10 application paths and the last 10 working directory paths, respectively. 2.7.2.2.2 Options. Terminate the app after the profile terminates the
133. lling to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. Section 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. A.2.10 Section 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. A.2.11 Section 10. If
134. lysis can be combined with all other types of analysis mentioned above. Basic block and in-line analysis are static analyses that inspect binary modules and allow users to better analyze the compiler-generated assembly code by identifying basic blocks and in-line instances. 3.2 Time-Based Profiling Analysis. Time-based profiling (TBP) identifies the hot spots in a program that are consuming the most time. Hot spots are good candidates for further investigation and optimization. Time-based profiling is system-wide, so a hot spot can be identified anywhere in the system, not just in the application under test. Any software component (an executable image, dynamically loaded library, device driver, or even the operating system kernel) that executes during the measurement period may be sampled. System-wide data collection conveniently handles applications consisting of parallel processes without any special configuration. Figure 3.1. Time spent in each software module (TBP) (screenshot: the System Data tab listing CPU clocks per module, led by /root/classic/classic). CodeAnalyst reports performance results in one or more
135. mandline 3643 is shown along with the TGID in the Module -> Process column. (Screenshot: the System Data tab for JavaSession, aggregated by modules.) When aggregated by process, the view displays the distribution of timer samples across the software processes that were active during data collection. Each process can be expanded to show how its samples are distributed over each process ID and the modules it depends on. (Screenshot: the System Data tab for JavaSession, aggregated by processes.)
136. mask: hexadecimal integer. The select and mask attributes take values as specified by the definition of performance events in the BIOS and Kernel Developer's Guide (BKDG). CodeAnalyst may choose to ignore the mask attribute when selecting event data. With respect to future expansion and enhancement, new elements like the <event> element may be defined to accommodate new hardware performance measurement features. 7.5.1.3 <output>. The <output> element specifies how event data will be shown. The <output> element contains one or more <column> elements. Each <column> element specifies a column of event data in a table. A <column> element has the following attributes: title: title to be displayed along with the data (string); sort: controls sorting of data in the column (string); visible: controls column visibility (Boolean). The title attribute is a descriptive string that is used to label the data in a table. The sort attribute specifies whether the data should be sorted or not. The sort attribute has only three permitted values: ascending, descending, and none. At most one column should have a sort attribute of ascending or descending; all other columns should be none. The visible attribute is optional and has the default value true. A column may be hidden by setting the visible attribute to false. Generally, you should not need to define a hidden column. 7.5.1.3.1
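The two rules for the sort attribute (only three permitted values, and at most one sorted column) lend themselves to a quick check before a hand-written view file is loaded. The sketch below is one possible check using Python's standard `xml.etree.ElementTree`; `check_columns` is a hypothetical helper, the column titles in the samples are made up, and treating a missing sort attribute as none is an assumption of this sketch rather than documented behavior.

```python
import xml.etree.ElementTree as ET

ALLOWED_SORT = {"ascending", "descending", "none"}

# Made-up <output> fragments for illustration only.
GOOD = """
<output>
  <column title="CPU clocks" sort="descending"></column>
  <column title="Ret inst" sort="none"></column>
</output>
"""

BAD = """
<output>
  <column title="CPU clocks" sort="descending"></column>
  <column title="Ret inst" sort="ascending"></column>
  <column title="DC misses" sort="up"></column>
</output>
"""


def check_columns(output_xml):
    """Return a list of problems found in the <column> children of <output>."""
    output = ET.fromstring(output_xml)
    problems = []
    sorted_columns = 0
    for column in output.findall("column"):
        title = column.get("title", "(untitled)")
        sort = column.get("sort", "none")  # assumption: missing means none
        if sort not in ALLOWED_SORT:
            problems.append(f"{title}: invalid sort value {sort!r}")
        elif sort != "none":
            sorted_columns += 1
    if sorted_columns > 1:
        problems.append("more than one column is sorted")
    return problems


print(check_columns(GOOD))
print(check_columns(BAD))
```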
137. mple program classic, which performs a textbook implementation of matrix multiplication. (Screenshot: the System Data table for the sessions Session, Session 01, and Session 02, led by /root/AMD/CodeAnalyst/samples/classic/classic.) 8.6.3 Drilling Down Into IBS Data. In order to find the source of the performance issue in the example program, we need to drill down into the classic module. 1. Double-click on the module name classic in the System Data table. A list of functions within classic is displayed with the IBS information for each function. CodeAnalyst retains the IBS MEM data TLB view. The function multiply_matrices has the most load/store activity and incurs the bulk of the DTLB misses. (Screenshot: the function list for classic: multiply_matrices at 0x400abd, initialize_matrices at 0x400824, and Unknown Sample at 0x400708.) 2. Double-click on the function multiply_matrices in the Module Data table, i.e., the table of functions within classic. The source code for
138. mpling Session Started. 9. When the sampling session is complete, the application under test terminates and the performance data is processed. The work space then displays a module-by-module breakdown of the results in the System Data table. Select the System Tasks tab to see a task-by-task breakdown of the results. Double-click on a module or task to drill down into the data. Figure 5.10. Performance Data Results (screenshot: the System Data table for the project my_project, listing samples by module and process). 5.3.2 Changing the Current View of the Data. To change the type of data displayed in the current view, click Manage. The View Management dialog box opens. Refer to Section 7.3, View Management, for details. The items listed in the Columns part of the View Management dialog box depend on the view configuration that is currently open for
139. must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. A.2 TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION. A.2.1 Section 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means eith
140. nalyst/samples/classic/classic Src/Dasm (screenshot: the source and disassembly view for the function multiply_matrices, address range 0x400abd to 0x400b99). 8.5 Tutorial - Analysis with Event-Based Sampling Profile. This section is a brief introduction to analysis with event-based profiling. A CodeAnalyst project must already be opened by following the directions under Section 8.3, Tutorial - Creating a CodeAnalyst Project, or by opening an existing CodeAnalyst project. This section also assumes that session settings have been established and CodeAnalyst is ready to profile an application. Event-based profiling uses the hardware performance event counters to measure the number of specific kinds of events that occur during execution. Processor clock cycles, retired instructions, data cache accesses, and data cache misses are examples of events. The speci
141. nd The fetch latency (the number of processor cycles from when the fetch was initiated to when the fetch completed or aborted). Event-based profiling would require several different counters in order to collect as much information as IBS. Further, the fetch address precisely identifies the fetch operation associated with the hardware events. The IBS fetch address may be the address of a fetch block, the target of a branch, or the address of an instruction that is the fall-through of a conditional branch. A fetch block does not always start with a complete, valid AMD64 instruction. This situation occurs when an AMD64 instruction straddles two fetch blocks. In this case, CodeAnalyst associates the IBS fetch sample with the AMD64 instruction in the first (preceding) fetch block. The terms killed, attempted, completed, and aborted refer to specific hardware conditions. A fetch operation may be abandoned before it delivers data to the decoder. A fetch may be abandoned due to a control flow redirection, and it may be abandoned at any time during the fetch process. A fetch abandoned before initial access to the ITLB (before address translation) is not regarded as useful for analysis. These early abandoned fetches are called killed fetches. CodeAnalyst filters out killed fetches. The fetch operations remaining after killed fetches are eliminated are called attempted fetches, since these fetches represent valid attempts to obtain instruction data. A comp
142. nstruction in some basic blocks. Right-click at the end of a basic block and the pop-up menu lists the destination address of the control transfer instruction (see the following image). Figure 3.10. Basic block information in Disassembly View (screenshot: the disassembly of multiply_matrices, with the columns Address, Code Bytes, Instruction, Symbol, and CPU clocks). Figure 3.11. Basic block pop up me
143. nter the name of the CSV file to which the data from the System Data tab is to be written. 4. Click the Save button. The data is converted to CSV format and is written to the file. 5. Launch a spreadsheet program like Microsoft Excel or OpenOffice.org Calc. 6. Import the CSV file into the spreadsheet. Figure 2.30. Select Export System Data (screenshot: the File menu, with the Export System Data entry). Figure 2.31. Specify output CSV file (screenshot: the file selection dialog box). Figure 2.32. Import CSV into a spreadsheet (screenshot: the exported data opened in Microsoft Excel, with columns including Module Name, CPU clocks, DC accesses, and DC misses).
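A spreadsheet is not the only consumer of the exported file; any CSV reader will do. The sketch below ranks modules by one column using Python's standard `csv` module. It assumes the export begins with a single header row containing a Module Name column, which is suggested by Figure 2.32 but not confirmed here; the sample rows, numbers, and the `top_modules` helper are made up for illustration.

```python
import csv
import io

# Hypothetical export: header names and values are invented for this sketch.
SAMPLE_CSV = """Module Name,CPU clocks,DC accesses,DC misses
/root/classic/classic,17926,9000,1200
/no-vmlinux,421,150,10
/lib64/libc-2.12.so,110,40,2
"""


def top_modules(csv_file, column, limit=3):
    """Return (module, value) pairs with the largest values in the given column."""
    reader = csv.DictReader(csv_file)
    rows = [(row["Module Name"], int(row[column])) for row in reader]
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows[:limit]


ranking = top_modules(io.StringIO(SAMPLE_CSV), "DC misses", limit=2)
print(ranking)
```

To read a real export, pass an open file object instead of the `io.StringIO` wrapper, and adjust the column names to whatever the header row of that file actually contains.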
144. nto original in-line function; 2.27 Aggregate samples into basic blocks; 2.28 Assess Performance Configuration with 1 msec MUX Interval; 2.29 Import Wizard; 2.30 Select Export System Data; 2.31 Specify output CSV file; 2.32 Import CSV into a spreadsheet; 2.33 Sessions; 2.34 Session Settings Dialog General Tab; 2.35 Session Settings Dialog Advance Tab; 2.36 Session Settings Dialog Note Tab; 2.37 Session Settings Dialog OProfiled Log Tab (Property Mode Only); 2.38 Process Filter Dialog; 2.39 Command line switches to opcontrol; 2.40 Listing events (opcontrol); 2.41 OProfile Daemon/Driver Monitoring Tool; 3.1 Time spent in each software module (TBP); 3.2 Time spent in each function within an application (TBP); 3.3 Time spent at source-level hotspot (TBP); 3.4 Retired instructions, DC accesses and misses per software module (EBP); 3.5 Retired instructions, DC accesses
145. nu [Screenshot: disassembly view of multiply_matrices with Address, Code Bytes, Instruction, and Symbol columns, and a context menu offering Copy Selection, Page Up, and Page Down.]
3.6 In-Line Analysis
CodeAnalyst provides two modes to help analyze in-line functions.
3.6.1 Aggregate samples into in-line instance
This is the default aggregation mode. An in-line instance typically resides in the caller function. When samples belong to an in-line instance, CodeAnalyst aggregates them into the caller function and uses blue text to identify the in-line instance together with the in-line function name. In the example in the pre
146. ocess Filtering, Advance Filter [Screenshot: Profile Control options (profile duration, profile start delay, stop profile when the app exits, start with the profiling paused, profile the duration of the app execution) and a Profile Configuration entry for CPU clocks not halted with a count of 1700000 (1 msec).]
When CodeAnalyst launches the application program root classic using the CPU affinity mask, execution is restricted to core 1. The following screen shot shows that execution was indeed limited to core 1: all timer samples for classic are attributed to core 1, and no samples were collected for classic on any other core.
[Screenshot: table with Symbol + Offset, CPU clocks C0, CPU clocks C1, and CPU clocks C7 columns, listing sample addresses within multiply_matrices.]
2.7.1 Process Filter
When the Apply Process Filter checkbox is selected, CodeAnalyst filters out all processes except the ones specified in Launch. Users can also specify a list of processes to be included in post-processing.
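The CPU affinity setting described above is entered as a hexadecimal mask. As a minimal sketch, assuming the usual convention that bit n of the mask corresponds to core n (the manual text here does not spell this out), the following Python helper lists the cores a mask permits. The function name `cores_in_mask` is hypothetical and is not part of CodeAnalyst.

```python
def cores_in_mask(mask: int) -> list[int]:
    """Return the core numbers enabled by an affinity bit mask.

    Assumption: bit n of the mask corresponds to core n.
    """
    if mask < 0:
        raise ValueError("affinity mask must be non-negative")
    cores = []
    core = 0
    while mask:
        if mask & 1:          # lowest remaining bit set: this core is allowed
            cores.append(core)
        mask >>= 1            # move on to the next core's bit
        core += 1
    return cores


if __name__ == "__main__":
    # Under the stated assumption, a mask of 0x2 restricts execution to core 1.
    print(cores_in_mask(0x2))
    print(cores_in_mask(0x5))
```

Under that assumption, a mask of 0x2 selects only core 1, which matches the behavior described for the classic example.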
147. ogram is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Library General Public License instead of this License.
Appendix B. Features List
B.1 New Features in CodeAnalyst 3.0
• New System Data View tab
• New Source/Dasm View tab
• Annotate inlined instances when viewing source
• Aggregation and filtering by PIDs and TIDs
B.2 New Features in CodeAnalyst 2.9
• Redesign Session Setting dialog
• Redesign Event Selection dialog
• Improve JAVA profile to handle JIT modules unloading
• Improve security and permission management for non-root users
• OProfile daemon/driver monitoring tool
• New CA-OProfile based on OProfile 0.9.6
B.3 New Features in CodeAnalyst 2.8
• Support for AMD processor Family 10h Revision 4
• New mode for IBS-OP
• Section 3.5, Basic Block Analysis
• Section 3.6, In-Line Analysis
• True event multiplexing
• Session Import by TBP/EBP file
B.4 New Features in CodeAnalyst 2.7
CodeAnalyst 2.7 added support for Instruction-Based Sampling (IBS). Instruction Sampling is a new performance measurement technique supported by AMD Family 10h processors. IBS has these advantages:
• IBS precisely associates hardware event information with the instructions that cause the events. A data cache miss, for example, is associated with the AMD64 instruction performing the memory read or write
148. ols Windows Help [Screenshot: CodeAnalyst window for my_project.caw with the status bar showing "Sampling Session Started".]
    8. When the sampling session is complete, the application under test terminates and the performance data is processed. The workspace then displays a module-by-module breakdown of the results in the System Data table. Select the System Tasks tab to see a task-by-task breakdown of the results. Double-click a module or task to drill down into the data.
Figure 5.4. System Data results [Screenshot: Module -> Process table with a CPU clocks column; status bar showing "Sampling Session Idle".]
5.2.2 Changing the Current View of the Data
To change the type of data displayed in the current view, click Manage. The View Management dialog box opens. Refer to Section 7.3, View Management, for details. The items listed in the Columns part of the View Management dialog box depend on the view configuration currently in use.
Figure 5.5. View Management [Screenshot: View Management dialog with Platform name, View name, Description, Columns (Available data, Columns shown), and Options (Separate CPUs, Show Percentage).]
5.2.3 Aggregate by Process
149. on 1 with OProfile Command Line Utilities 48 Index profiling importing remote 32 project panel 4 projects and sessions 3 R Report Problems 157 review data java app 141 S separate CPUs 107 Session Settings 17 session settings 37 show percentage 107 single module data tab 13 source dasm tab 14 status bar 4 system data tab 12 T TBP collecting 84 configuring 73 how 1t works 56 predefined profile 57 sampling and measuring period 56 TBS 87 changing view 87 Template Name 38 Templates Setting 37 TGID for CSS 41 tiling session panes 10 Time Based Profiling Analysis 54 Toolbar 5 toolbar float or dock 5 tools 6 tuning cycle 2 tutorial 113 create project 113 EBS Profile 122 IBS Profile 131 prepare application 113 profiling java app 139 TBS Profile 117 Types of analysis 2 V view available data 106 column shown 106 168 Index columns 106 description 106 name 106 view configuration 104 file format 109 view management 105 viewing results 104 vmlinux enable 40 W Working Directory 38 X xml file 109 example 102 111 xml format 100 169
150. on of the AMD BIOS and kernel developer's guide for support details.
3.4.2.2 IBS Op Data
IBS op sampling reports a wide range of event data. The following values are reported for all ops:
• The virtual address of the parent AMD64 instruction from which the tagged op was issued,
• The tag-to-retire time: the number of processor cycles from when the op was tagged to when the op retired, and
• The completion-to-retire time: the number of processor cycles from when the op completed to when the op was retired.
Attribution of event information is precise because the IBS hardware reports the address of the AMD64 instruction causing the events. For example, branch mispredicts are attributed exactly to the branch that mispredicted, and cache misses are attributed exactly to the AMD64 instruction that caused the cache miss. IBS makes it easier to identify the instructions that are performance culprits.
Some ops implement branch semantics. Branches include unconditional and conditional branches, subroutine calls, and subroutine returns. Event information reported for branch ops includes:
• Whether the branch was mispredicted, and
• Whether the branch was taken.
IBS also indicates whether a branch operation was a subroutine return and whether the return was mispredicted.
Some ops may perform a load (memory read), a store (memory write), or a load and a store to the same memory address, as in the case of a read-op-write sequence. When an op performs a load an
151. oncerts cal AS tee Desc ewe db MEA ut 164 Bo New Features 10 Code Analyst 2 0 solet altre cen dats iautienenmo ten lta Du deett bo etit aR E uf 165 DID MO STPS 166 io WEE 167 vi List of Figures DA al AAA o 4 D2 SOLAS cR 4 2 9 TOOlDats MACU VS ando AC UNE dat lan lia estrias 5 24 Floating ang docking tOOIDOlS AAA O T 5 2 3 Protile ment and LOO Dal aria sopas al 20 TOOlS meni and OOIDaL sis des 8 24 sau and Indc uve ICONS o radios 8 228 WAC OWS INI CIA end E A AA onder oes 9 29 sCasCadine SESSION Dalles aer tara ratita 10 2 10 Tile SESSIOM A A E 11 SABES UERSUM 11 212 ystem Data tab Xebsresadted by Module a 12 2 15 ystemDatz t b Asore ated by PrOCESS seras tos lo dira 13 Z 14 S me e Module Dato io aisl 14 2 15 Source and Disassmbly Mode in Src Dasm mode sse 15 2 16 Source and Disassembly Tab in Disassembly only Mode oooocccccconcnconcnconcnnoncnnonnononnononnos 15 PANNE Code Density CALL astron LUTTE 16 2 18 A drop down list provides choices for selecting code density cccceccec eee eceeeeeeeeeeeneenes 16 Z9 TS O 17 220 Edit event COMES MAON a AAA maton 18 2 21 Global View Mana sometida AAA cda 19 2222 Conteuratonna a semen ios 20 2 29 COdeANalysMODUONS PE E EE 0 O Get iuateteniaet ad alam ieee tet 21 QA BEGIN Qi CONS AA A do 22 2 25 Aggregate samples into instance of in line function ccc cece cence nee c ee eee ee eeaeneenennenen 23 2 26 Aggregate samples i
152. one or more coherent HyperTransport links and through a remote memory controller.
9.3.5.8 Event 0xF247, Abbreviation: IBS NB local other
The number of IBS op samples where a load operation was serviced from local MMIO, configuration or PCI space, or from the local APIC.
9.3.5.9 Event 0xF248, Abbreviation: IBS NB remote other
The number of IBS op samples where a load operation was serviced from remote MMIO, configuration or PCI space.
9.3.5.10 Event 0xF249, Abbreviation: IBS NB cache M
The number of IBS op samples where a load operation was serviced from local or remote cache, and the cache hit state was the Modified (M) state.
9.3.5.11 Event 0xF24A, Abbreviation: IBS NB cache O
The number of IBS op samples where a load operation was serviced from local or remote cache, and the cache hit state was the Owned (O) state.
9.3.5.12 Event 0xF24B, Abbreviation: IBS NB local lat
The total data cache miss latency (in processor cycles) for load operations that were serviced by the local processor.
9.3.5.13 Event 0xF24C, Abbreviation: IBS NB remote lat
The total data cache miss latency (in processor cycles) for load operations that were serviced by a remote processor.
Chapter 10. Support
10.1 Enhancement Requests
Please email the following information about a desired enhancement or change to CodeAnalyst.support@amd.com:
• State which
153. ore details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this when it starts in an interactive mode:
    Gnomovision version 69, Copyright (C) year name of author
    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type 'show w'.
    This is free software, and you are welcome to redistribute it under certain conditions; type 'show c' for details.
The hypothetical commands 'show w' and 'show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than 'show w' and 'show c'; they could even be mouse clicks or menu items, whatever suits your program.
You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names:
    Yoyodyne, Inc., hereby disclaims all copyright interest in the program 'Gnomovision' (which makes passes at compilers) written by James Hacker.
    <signature of Ty Coon>, April 1989
    Ty Coon, President of Vice
This General Public License does not permit incorporating your program into proprietary programs. If your pr
154. p 2 4 so 2 5 2 [Screenshot: module list ending with usr bin sudo and lib64 libglib; status bar showing "Sampling Session Idle".]
8.5.3 Choosing Events for Data Collection
The predefined profile configurations cover the most common kinds of performance analysis. AMD processors, however, are able to monitor a wide range of performance events.
[Screenshot: Session Settings dialog, General tab, with Launch Control (launch path, working directory, terminate app when stop profile, CPU affinity in hex, process filtering), Profile Control (profile duration, profile start delay), and Profile Configuration set to Current event-based profile with CPU clocks not halted at a count of 1000000.]
    1. To configure data collection using events of your own choice, click the Session Settings button in the toolbar. A dialog box appears asking for session settings.
    2. Choose the Current event-based profile configuration in the list of profile configurations. You may freely edit and change this profile configuration, and may use it as a scratchpad fo
155. p Sampling
A cycles-based selection generally produces more IBS samples than dispatched-op-based selections. However, the statistical distribution of IBS op samples collected with a cycles-based selection may be affected by pipeline stalls and other time-dependent hardware behavior. The statistical bias is due to stalls at the decoding stage of the pipeline. If a macro-op is not available for tagging when the maximum op count is reached, the hardware skips the opportunity to tag a macro-op and starts counting again from a small pseudo-random initial count. From a practical perspective, the distribution of cycles-based IBS op samples may not be uniform across instructions with the same execution frequency, i.e., across instructions within the same basic block.
The statistical distribution of IBS op samples collected with dispatched-op-based selection is generally more uniform across instructions with the same execution frequency. This is a useful property in practice, as IBS op statistics can be more readily used to compare the behavior of instructions. The dispatched-op-based selection is the preferred collection mode and should be used when available.
Note: The cycles-based selection is supported on all IBS-capable AMD processors. The dispatched-op-based selection is a newer IBS feature that is not supported on all IBS-capable AMD processors; it is available only in AMD Family 10h processors, revision 4 and beyond. Refer to the relevant versi
156. parate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program.
In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.
Section 3
You may copy and distribute the Program (or a work based on it, under Section A.2.3, "Section 2") in object code or executable form under the terms of Section A.2.2, "Section 1" and Section A.2.3, "Section 2" above, provided that you also do one of the following:
    a. Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,
    b. Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine
157. pcontrol event specification is:
    opcontrol --event=RETIRED_INSTRUCTIONS:250000::1:1
The Event Count field of the event specification determines how many times the event is counted before an EBP sample is collected. This field is important because some events happen so frequently that processing the samples becomes very slow, or collecting the samples can cause the system to stop responding. The latter scenario may occur if the specified Event Count is too small for an event. For example, event 0x4000 (DCache Access) happens quite frequently, so specifying a count of 10000 will generate a significant number of samples during a normal 5-second profiling session. It is advisable to specify a large count when profiling an event for the first time, and to adjust the count later to get more statistically useful data.
The following example command measures the Data Cache Accesses event (Event Select 0x040) using an Event Count of 100,000:
    opcontrol --event=DATA_CACHE_ACCESSES:100000::1:1
Up to four events can be specified on the command line. For example, to specify the events DATA_CACHE_ACCESSES and RETIRED_INSTRUCTIONS in the same session with reasonable counts, enter the following command:
    opcontrol --event=DATA_CACHE_ACCESSES:100000::1:1 --event=RETIRED_INSTRUCTIONS:25000::1:1
After profiling has begun, OProfile prints confirmation messages and returns to the command prompt.
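To see why a small Event Count can swamp the system, it helps to estimate how many samples a session will produce. The sketch below is illustrative arithmetic only: the function name `expected_samples` is hypothetical, and the event rate of 200 million data cache accesses per second is an assumed figure, not a measurement from this manual.

```python
def expected_samples(events_per_second: int, event_count: int, duration_s: int) -> int:
    """Estimate the number of EBP samples collected in one session.

    One sample is taken each time the event has occurred `event_count` times,
    so the estimate is total events divided by the Event Count.
    """
    if event_count <= 0:
        raise ValueError("event_count must be positive")
    total_events = events_per_second * duration_s
    return total_events // event_count


if __name__ == "__main__":
    assumed_rate = 200_000_000  # hypothetical data cache accesses per second
    for count in (10_000, 100_000):
        samples = expected_samples(assumed_rate, count, duration_s=5)
        print(f"Event Count {count}: about {samples} samples in 5 seconds")
```

With this assumed rate, raising the Event Count from 10,000 to 100,000 reduces the sample volume tenfold, which is the reason for starting with a large count and lowering it gradually.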
158. ples for ops that performed a memory load and/or store operation. Tag-to-retire time is the number of cycles from when an op was selected (tagged) for sampling to when the op retired. Completion-to-retire time is the number of cycles from when an op completed (finished execution) to when the op retired. Total and average tag-to-retire and completion-to-retire times are shown in the next view.
[Screenshot: per-module sample counts for Session 01 and Session 02, including classic, no-vmlinux, libc, and oprofiled.]
    3. Select the IBS MEM data cache view from the drop-down list of views.
The IBS MEM data cache view is displayed. This view shows information related to data cache (DC) behavior. The number of sampled IBS load/store operations is shown along with a breakdown of the number of loads and the number of stores. The number of IBS samples where the load/store operation missed in the data cache is shown. The DC miss rate (DC misses divided by the total number of op samples) and DC miss ratio (DC misses divided by the numb
159. profile, or IBS-based profile, as selected by the profile configuration.
    5. View and explore the results.
    6. Save the project and session data to review it later or to share it.
2.2 Exploring the Workspace and GUI
The Exploring the Workspace and GUI section serves as a visual guide to options and related screens. All options that have a separate information section contain links to that section.
2.2.1 Projects Panel
The Project Panel opens initially with no projects open. Projects consist of sessions created by using profile configurations and by importing performance data captured using the CodeAnalyst command line utility program, OProfile. The kinds of sessions displayed are TBP (time-based sampling) sessions, EBP (event-based sampling) sessions, and IBS (Instruction-Based Sampling) sessions.
Figure 2.1. Project Panel
2.2.2 Status Bar
The status bar displays the current operation taking place. For example, while a profile is being collected, the Sampling Sessions Started status bar displays the amount of time left to run as a percentage. The status bar displays some of the following examples:
Figure 2.2. Status bar [Screenshot: status bar showing "Sampling session Idle".]
2.2.3 Toolbars
The CodeAnalyst tools consist of menu items and corresponding icons or drop-down lists. Most menu items and icons are not active until after a session is opened or a profile session is running, as sho
160. r Tools > Session Settings. In the dialog, select the Current Instruction-based profile profile configuration and click Edit.
    2. In the Edit EBS/IBS Profile Configuration dialog, there might be some events in the field Events in this profile configuration. Remove all existing events by selecting each event and clicking the Remove Event button.
    3. In the IBS Fetch tab, select all events by pressing Ctrl+A on the keyboard, then click the Add Events button.
    4. In the IBS Op tab, select all events by pressing Ctrl+A on the keyboard, then click the Add Events button.
[Screenshot: Edit EBS/IBS Profile Configuration dialog for the Current instruction-based profile, with Perf Counter, IBS Fetch, IBS Op, Import, and Info tabs; the IBS fetch and IBS op events are listed with a count of 500000 each; IBS Op Options include Enable Dispatch count mode and Enable Mem Access Log.]
161. r customized EBP configurations.
    3. Click the Edit button. A dialog box appears that allows you to edit the current EBS/IBS profile configuration. The Current event-based profile configuration in this example already contains the CPU clocks not halted event.
    4. In the Perf Counter tab, scroll through the list of individual events to find the Retired instructions event. Select Retired instructions and click Add Event. The 0x00c0 Retired instructions event is added to the list of events in the configuration. Set the Count field to 1,000,000 and click Apply Setting.
    5. Find and select the 0x00c1 Retired uops event and repeat the previous step. If you make a mistake and need to remove an event from the configuration, select the event and then click the Remove Event button.
The Event Count field specifies the sampling period for the event, that is, how often a sample is taken. If the Event Count is N, a sample is taken after every N occurrences of that event. Use smaller Event Count values to sample an event more often. However, more frequent sampling increases measurement overhead.
Caution: Choose the Event Count value conservatively. Start with a large value, then decrease it until the desired measurement accuracy is achieved. Very small values may cause the system to hang under certain workload conditions.
    6. Click OK to confirm th
162. r distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.
Section 5
You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it.
A.2.7 Section 6
Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License.
A.2.8 Section 7
If, as a consequence of a court judgment or allegation of pa
163. r family, model, etc. It also shows the XML events file currently in use.
4.4 Predefined Profile Configurations
AMD processors offer a wide range of event types for collection, which can overwhelm new users. Predefined profile configurations address the more common aspects of program and system performance analysis. CodeAnalyst provides configurations that support various methods of data collection, including time-based profiling (TBP), event-based profiling (EBP), and Instruction-Based Sampling (IBS).
• Time-based profile (TBS): Collect a time-based profile.
• Assess performance (EBS): Collect a profile that provides an overall assessment of performance.
• Investigate data access (EBS): Investigate data cache (DC) and data translation lookaside buffer (DTLB) performance.
• Investigate instruction access (EBS): Investigate instruction cache (IC) and instruction translation lookaside buffer (ITLB) performance.
• Investigate L2 cache access (EBS): Investigate access to the unified L2 cache.
• Investigate branching (EBS): Investigate branch behavior, including branch misprediction.
• Instruction-based sampling (IBS): Investigate instruction fetch and macro-op execution performance.
• IBS Op Overall Assessment: Get an overall assessment of performance using IBS Op.
• IBS Op Load-Store DTLB: Measure data load/store DTLB performance.
• IBS Op Load-Store Memory Access: Measure data load/store memory access performance.
• IBS Op Load Store Page
164. r of IBS op samples where a load operation was serviced from the local processor.
Northbridge IBS data is only valid for load operations that miss in both the L1 data cache and the L2 data cache. If a load operation crosses a cache line boundary, then the IBS data reflects the access to the lower cache line.
9.3.5.2 Event 0xF241, Abbreviation: IBS NB remote
The number of IBS op samples where a load operation was serviced from a remote processor.
9.3.5.3 Event 0xF242, Abbreviation: IBS NB local L3
The number of IBS op samples where a load operation was serviced by the local L3 cache.
9.3.5.4 Event 0xF243, Abbreviation: IBS NB local cache
The number of IBS op samples where a load operation was serviced by a cache (L1 data cache or L2 cache) belonging to a local core which is a sibling of the core making the memory request.
9.3.5.5 Event 0xF244, Abbreviation: IBS NB remote cache
The number of IBS op samples where a load operation was serviced by a remote L1 data cache, L2 cache, or L3 cache after traversing one or more coherent HyperTransport links.
9.3.5.6 Event 0xF245, Abbreviation: IBS NB local DRAM
The number of IBS op samples where a load operation was serviced by local system memory (local DRAM via the memory controller).
9.3.5.7 Event 0xF246, Abbreviation: IBS NB remote DRAM
The number of IBS op samples where a load operation was serviced by remote system memory after traversing
165. r scripts Stop profiling shutdown
2.8.1.1 Time-Based Profiling
Time-based profiling works by collecting samples at specified time intervals. Over a period of time, the samples collected can show which blocks of code use the most processor time. See for further information.
The timer interval determines how often a TBP sample is taken. On Linux, time-based profiling uses the CPU_CLK_UNHALTED event (performance counter event 0x76), which represents the amount of running time of a processor, i.e., time during which the CPU is not in a halted state. This event allows system idle time to be automatically factored out of IPC (or CPI) measurements, provided that the OS halts the CPU when going idle. A time interval in seconds or milliseconds can be converted to an event count using the processor clock speed. For instance, on a processor running at a clock speed of 800 MHz, a 1 millisecond time-based profiling interval is specified to the opcontrol tool as follows:
    opcontrol --event=CPU_CLK_UNHALTED:800000::1:1
2.8.1.2 Event-Based Profiling
Event-based profiling works by using performance counters in the processor to count the number of times a specific processor event occurs. When the specified counter threshold of an event is reached, OProfile collects a sample from the processor. Up to four events can be profiled in a given session, and each event can be assigned a different counter threshold. EBP requires an APIC-enabled system. See Section 3.3 Event
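The conversion from a time interval to a CPU_CLK_UNHALTED count in the time-based profiling discussion is simple arithmetic: clock cycles per second multiplied by the interval length. A minimal sketch follows; the function name `tbp_event_count` is hypothetical and not part of OProfile or CodeAnalyst.

```python
def tbp_event_count(clock_mhz: float, interval_ms: float) -> int:
    """Return the unhalted-clock event count that yields one sample per interval.

    A processor running at `clock_mhz` MHz executes clock_mhz * 1,000 cycles
    per millisecond, so the count is that figure times the interval length.
    """
    if clock_mhz <= 0 or interval_ms <= 0:
        raise ValueError("clock speed and interval must be positive")
    cycles_per_ms = clock_mhz * 1_000
    return round(cycles_per_ms * interval_ms)


if __name__ == "__main__":
    # The 800 MHz, 1 millisecond case from the text.
    print(tbp_event_count(800, 1))
```

For the 800 MHz, 1 millisecond case in the text, this gives the count of 800000 used in the opcontrol command.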
166. r to event data.
7.5.1.4 <tool_tip>
A <tool_tip> element is a non-empty XML element that contains the text of a tool tip to be displayed for the view. A <description> element is a non-empty XML element that contains the text of a short description to be displayed for the view. Leading and trailing space is trimmed from tool tip and description text. Spaces are compressed and new line characters are replaced by a single space.
7.5.2 Example XML File
Here is an example view configuration file. Note that several optional attributes are not used in the example and they take on the appropriate default values.
    <!--
      Show instructions per cycle
      View configuration
      Date: 17 May 2006
      Version: 0.1
    -->
    <view_configuration>
      <view name="IPC" separate_cpus="F" separate_processes="F" separate_threads="F">
        <data>
          <event id="CPU_clocks" select="76" mask="00" />
          <event id="Ret_instructions" select="c0" mask="00" />
        </data>
        <output>
          <column title="Instructions" sort="none">
            <value id="Ret_instructions" />
          </column>
          <column title="Cycles" sort="none">
            <value id="CPU_clocks" />
          </column>
          <column title="IPC" sort="descending">
            <ratio left="Ret_instructions" right="CPU_clocks" />
          </column>
          <column title="CPI" sort="none">
            <ratio left="CPU_clocks" right="Ret_instructions" />
          </column>
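The IPC and CPI columns in the example view configuration are ratios of the retired-instruction and CPU-clock data. The Python sketch below shows the arithmetic under two labeled assumptions: that each sample stands for "Event Count" occurrences of its event, and that a zero denominator is reported as 0.0. Neither detail is confirmed by this section, and the function names are hypothetical.

```python
def ratio(left: float, right: float) -> float:
    """Divide left by right; return 0.0 for a zero denominator (assumed policy)."""
    return left / right if right else 0.0


def ipc_and_cpi(ret_samples: int, clk_samples: int,
                ret_count: int, clk_count: int) -> tuple[float, float]:
    """Compute instructions per cycle and cycles per instruction.

    Assumption: each sample represents `count` occurrences of its event,
    so samples are scaled by the Event Count before taking the ratio.
    """
    instructions = ret_samples * ret_count
    cycles = clk_samples * clk_count
    return ratio(instructions, cycles), ratio(cycles, instructions)


if __name__ == "__main__":
    # Hypothetical sample totals, both events collected with a count of 1,000,000.
    ipc, cpi = ipc_and_cpi(1200, 1000, 1_000_000, 1_000_000)
    print(f"IPC = {ipc:.2f}, CPI = {cpi:.2f}")
```

When both events use the same Event Count, the scaling cancels and the ratio of raw sample counts gives the same result.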
167. ranch op Add Events
These tabs contain the list of IBS Fetch and IBS Op derived performance events. These tabs are available only if the currently running system supports this performance monitoring feature. Users can select multiple events and click Add Event to add them to the current profile configuration.
NOTE: All selected IBS Fetch derived events must have the same settings, i.e., the same counts and option mask. The same rule applies to IBS Op.
4.3.3.3 Import Tab
Figure 4.7. Edit EBS/IBS configuration: Import Tab [Screenshot: Import tab with a drop-down list of existing profile configurations, a description field, and an Import Events button.]
Besides selecting individual events, users can import event selections from an existing profile configuration. This tab lists the available profile configurations in the drop-down menu. Once you select a configuration, its description is shown in the field below. To add events, click Import Events.
4.3.3.4 Info Tab
Figure 4.8. Edit EBS/IBS configuration: Info Tab [Screenshot: System Information showing the model name, family, model, stepping, and the events file in use.]
This tab shows information about the currently running system, such as processo
168. re
• Assess performance
• Investigate data access
• Investigate instruction access
• Investigate L2 cache access
• Investigate branching
CodeAnalyst also provides a configuration named Current event-based profile that allows you to choose your own events to measure, change sampling periods (event counts), and change other EBP configuration parameters. See Section 4.4, Predefined Profile Configurations, for more information.
3.4 Instruction-Based Sampling Analysis
Instruction-Based Sampling (IBS) uses a hardware sampling technique to generate event information similar to that produced by event-based profiling (see Section 3.3, Event-Based Profiling Analysis). Instruction-Based Sampling can be used to identify and diagnose performance issues in program hot spots. IBS has these advantages:
• Events are precisely attributed to the instructions that caused the events.
• IBS produces a wealth of event data in a single test run.
• Latency is measured for key performance factors such as data cache miss latency.
The information provided through Instruction-Based Sampling covers the most common kinds of information needed for program performance analysis. Event-based profiling, however, offers a wider range of events that can be monitored, such as events related to HyperTransport links.
The processor pipeline stages can be categorized into two main phases: instruction fetch and execution. Each instruction fetch operation produces a block of instruction
169. rojects profile actions and creating configurations.

2.2.6 File Menu and File Icons

The commands available in the File menu are shown in the following table. These icons also appear as toolbar group icons that can float or be docked.

Menu Command          Fast Key   Description
New                   Ctrl+N     Opens a new project. This opens the Project Options dialog box.
Open                  Ctrl+O     Opens an existing project. This opens a dialog box for navigating to CodeAnalyst workspace files. Recently opened projects are listed at the bottom of the File menu.
Save                  Ctrl+S     Saves an open project.
Export System Data               Exports the project data to a comma-separated value (CSV) formatted file.
Import                           Imports performance data files generated by OProfile command line tools.
Close                 Ctrl+W     Closes an open project.
Quit                  Ctrl+Q     Closes the application.

2.2.7 Profile Menu and Toolbar Icon Group

The Profile menu and icons are used to start, pause, and stop data collection. These options are also available in the Profile toolbar group, which includes the profile configuration to be used for collecting performance data.

Figure 2.5 Profile menu and toolbar
170. ry allows users to easily move profiling sessions between different CodeAnalyst projects. However, session properties are not available in this case.

CodeAnalyst Profile Session Dir: A CodeAnalyst profile session directory is generated as a result of each profile session and is stored in the project directory, i.e. /home/<user>/AMD/CodeAnalyst/<project dir>/<session dir>.

2.5.4 Import Opreport's XML Output Files

This mode of import allows users to import the OProfile XML output file generated by the opreport utility using the options -X -w -d -l -o.

2.6 Exporting Profile Data from CodeAnalyst

AMD CodeAnalyst can export profile data from a table, such as the System Data tab or Processes tab, or from the source view. The data is exported as a file containing comma-separated values (CSV). This section illustrates the process of exporting data to a CSV file.

1. With an open project and session, click the System Data tab.
2. Select the Export System Data item from the File menu.
3. Select or e
171. 2.2.11.3 Source/Disassembly Tab

The Source/Dasm tab displays the source lines annotated with assembly instructions and/or inline instances and sample counts. A source line can be expanded or collapsed to show or hide the assembly instructions associated with it. When debug information is not available, only disassembly and sample counts are displayed.

Right-clicking a source or disassembly line reveals additional options for the view, such as selecting lines and copying the selection to the clipboard. The information may then be pasted into another document.

Figure 2.15 Source and Disassembly Mode in Src/Dasm mode (screenshot: function multiply_matrices, 0x400aed to 0x400bc9, of /root/classic/classic, shown as source lines with expandable disassembly)
172. 2.2.6 File Menu and File Icons
2.2.7 Profile Menu and Toolbar Icon Group
2.2.8 Tools Menu and Icons
2.2.9 Windows Menu
2.2.10 Help Menu
2.2.11 Data and Source Display
2.2.12 Code Density …
2.2.13 Session Settings
2.2.14 Edit Event Configuration
2.2.15 View Management Dialog Box
2.2.16 Configuration Management Dialog Box
2.2.17 CodeAnalyst Options Dialog Box
2.2.18 Profiling Java Applications
2.3 CodeAnalyst Options
2.3.1 …
2.3.2 Directories Tab
2.4 Event Counter Multiplexing
2.4.1 Example of Event Counter Multiplexing
2.5 Importing Profile Data into CodeAnalyst
2.5.1 …
2.5.2 Import Remote Profiling …
2.5.3 Import CodeAnalyst Session Directory
2.5.4 Import Opreport's XML Output Files
2.6 Exporting Profile Data from CodeAnalyst
2.7 Session Settings
2.7.1 Setting Templates
173. s a field which lets users describe the purpose of the current profile configuration.

4.3.3 Available Performance Events

These sections describe how to add performance events to the current profile configuration. Users can select any type of event to add to the configuration, i.e. EBS events and IBS derived events can be added to the same configuration.

4.3.3.1 Perf Counter Tab

Figure 4.5 Edit EBS/IBS configuration: Perf Counter Tab (screenshot: event list with Event Source and Name columns, for example 0x0000 Dispatched FPU ops, 0x0001 Cycles in which the FPU is Empty, 0x0002 Dispatched fast flag FPU operations, 0x0003 Retired SSE Ops, 0x0004 Retired move ops)

This tab contains the list of event-based sampling performance events. Only performance events available for the currently running system are shown. Users can select multiple events and click Add Event to add them to the current profile configuration.

4.3.3.2 IBS Fetch/Op Tab

Figure 4.6 Edit EBS/IBS configuration: IBS Fetch/Op Tab (screenshot: IBS Fetch events 0xf001 IBS fetch killed, 0xf002 IBS fetch attempted, 0xf003 IBS fetch completed, 0xf004 IBS fetch aborted; IBS Op events 0xf101 IBS tag-to-retire cycles, 0xf102 IBS completion-to-retire cycles, 0xf103 IBS branch op, 0xf104 IBS mispredicted b
174. s the time, in seconds, by which profile sampling is delayed after the target application is launched.

2.7.2.4 Profile Configuration

This is the kind of analysis to be performed. Choose a predefined profile configuration from the drop-down list. After a profile configuration is selected, its performance events, along with their count and unit mask settings, are shown in the list. Please see Chapter 4, Configure Profile, and Section 4.5, Manage Profile Configurations, for more detail. Click the Edit button to edit the selected profile configuration. However, predefined profile configurations cannot be changed unless they are saved under a new name.

2.7.3 Advance Tab

Figure 2.35 Session Settings Dialog: Advance Tab

2.7.3.1 Enable vmlinux

This setting allows users to
175. s used for address translation.

9.3.4.19 Event 0xF212
Abbreviation: IBS L1 DTLB 2M
The number of IBS op samples where a load or store operation produced a valid linear (virtual) address and a 2 MByte page entry in the L1 DTLB was used for address translation.

9.3.4.20 Event 0xF213
Abbreviation: IBS L1 DTLB 1G
The number of IBS op samples where a load or store operation produced a valid linear (virtual) address and a 1 GByte page entry in the L1 DTLB was used for address translation.

9.3.4.21 Event 0xF215
Abbreviation: IBS L2 DTLB 4K
The number of IBS op samples where a load or store operation produced a valid linear (virtual) address, hit the L2 DTLB, and used a 4 KByte page entry for address translation.

9.3.4.22 Event 0xF216
Abbreviation: IBS L2 DTLB 2M
The number of IBS op samples where a load or store operation produced a valid linear (virtual) address, hit the L2 DTLB, and used a 2 MByte page entry for address translation.

9.3.4.23 Event 0xF219
Abbreviation: IBS DC load lat
The total DC miss latency (in processor cycles) across all IBS op samples that performed a load operation. The miss latency is the number of clock cycles from when the data cache miss was detected to when data was delivered to the core. Divide the total DC miss latency by the number of sampled load operations to obtain the average DC miss latency.

9.3.5 IBS Op Northbridge Derived Events

9.3.5.1 Event 0xF240
Abbreviation: IBS NB local
The numbe
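The division described for event 0xF219 (IBS DC load lat) can be sketched as follows. This is a minimal illustration, not part of CodeAnalyst: the function name and the sample counts are hypothetical, and the caller is assumed to have read both totals from the IBS op view.

```python
def average_dc_miss_latency(total_latency_cycles: int, load_samples: int) -> float:
    """Average data cache miss latency in processor cycles per sampled load.

    total_latency_cycles: the IBS DC load lat total (event 0xF219).
    load_samples: the number of sampled load operations.
    """
    if load_samples <= 0:
        raise ValueError("need at least one sampled load operation")
    return total_latency_cycles / load_samples


# Hypothetical totals chosen for illustration, not measured data.
print(average_dc_miss_latency(total_latency_cycles=120_000, load_samples=1_500))
```

Because IBS is a sampling technique, the resulting average is an estimate and is subject to statistical variation, especially when the number of sampled loads is small.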
176. 8.7 Tutorial: Profiling a Java Application

This section demonstrates how to analyze a Java program using a time-based profile. The application program used in this section is the SciMark 2.0 benchmark from the National Institute of Standards and Technology (NIST). Source code for the benchmark can be downloaded from http://math.nist.gov/scimark2/scimark2src.zip. You will need a Java compiler in order to compile the benchmark. CodeAnalyst supports both the Sun Microsystems and IBM versions of the Java Virtual Machine (JVM).

Before starting, ensure that a CodeAnalyst project is open and ready for use. Section 8.3, Tutorial: Creating a CodeAnalyst Project, demonstrates how to create a new project.

1. Click the New button in the toolbar or select File > New to create a new project, or click the Open button in the toolbar or select File > Open to open an existing project. The session settings are slightly different for a Java application.
2. With an open CodeAnalyst project, select Time-based profile from the drop-down list of profile configurations in the toolbar.
177. scaded. Following are examples of cascading panes.

Figure 2.9 Cascading session panes

2.2.9.2 Tiling Session Panes

When two or more sessions are open, session panes can be tiled for viewing more than one pane at a time. Following are examples of tiled panes.

Figure 2.10 Tiling session panes

2.2.10 Help Menu

The Help menu displays the following:

Figure 2.11 Help menu

The commands available in the Help menu are:

About (F1): Displays the application splash screen and the version information.
Syst
178. sed when profile data is collected using the OProfile command line utility and the review and analysis of the profile data is to be done using the GUI. Data may come from a time-based profiling, event-based profiling, or Instruction-Based Sampling session, from TBP/EBP files, or from an opreport XML file. The data to be imported must be generated by the OProfile command line tool. A new session is created for the data. This section illustrates the process of importing.

1. Collect profile data using OProfile. Please refer to the OProfile documentation (http://oprofile.sourceforge.net/docs) on how to collect profile data. After a profiling session, profile data is usually stored in the /var/lib/oprofile/samples/current directory.
2. Select the Import menu item from the File menu.
3. An Import Wizard dialog box appears. CodeAnalyst can import four types of profile data:

Remote Profiling: In this mode, the profile data from a remote system is packaged into a compressed tarball, which is generated by a script called capackage.sh. CodeAnalyst can then import the packaged file for analysis. (Advanced user only)

Local Profile: In this mode, the profile data is generated on the local machine.

TBP/EBP File: TBP/EBP files store profile data for each CodeAnalyst session. This can be
179. sic blocks (screenshot: samples for /root/classic_inlined/classic aggregated into basic blocks, with CS:EIP, Basic Block, Load, Store, CPU clocks, and DC misses columns)

2.3.2 Directories Tab

This tab allows the specification of directory paths to help CodeAnalyst find the information that it needs for analysis. An additional search path for finding source can be specified in the Source File Search Paths field.

(screenshot: CodeAnalyst Options dialog, Directories tab, with the Default Project Dir and Source File Search Paths fields)

2.4 Ev
180. sign a project name and location, or browse for an existing file.

Figure 5.1 New Project Properties

3. The Session settings window opens. Assign a session name (optional), enter the path to the application program to be launched, and set the working directory.
4. Under Profile configuration, select Time-based profile.
5. Select other desired profiling options and click OK.

Figure 5.2 Session Settings

6. Click the Start icon to launch the application and begin profiling. The Pause and Stop icons become active.
7. The task bar at the bottom of the screen displays Sampling Session Started and the percent completed.

Figure 5.3 Task Bar Display
181. sion Settings

The Session Settings dialog box sets and supports changes to the session parameters. Please see Section 2.7, Session Settings, for more detail.

Figure 2.19 Session Settings

2.2.14 Edit Event Configuration

The Edit button is located in two dialog boxes:

• Session Settings
• Configuration Management

Clicking Edit opens the Edit Event Configuration dialog box. For details on using the dialog box, read Section 4.3, Edit Event-based and Instruction-based Sampling Configuration.

Figure 2.20 Edit event configuration
182. sis convenient. The predefined configuration for time-based profiling is called Time-based profile. CodeAnalyst also provides a configuration named Current time-based profile, in which the timer interval (sampling period) can be changed.

3.3 Event-Based Profiling Analysis

Event-based profiling (EBP) helps identify the root cause of CPU and memory related performance issues. EBP uses the performance monitoring hardware in AMD processors to count the number of occurrences of hardware events. The kind and frequency of these events may indicate the presence of a pipeline bottleneck, a poor memory access pattern, poorly predicted conditional branches, or some other performance issue. Once hot spots are found through time-based profiling, EBP is used to follow up and investigate the hot spots in order to identify and exploit opportunities for optimization.

Figure 3.4 Retired instructions, DC accesses and misses per software module (EBP)
183. specify the kernel image used for kernel and kernel module profiling. Uncheck this setting to profile without the kernel image; this is equivalent to the no-vmlinux option in OProfile. Check this setting to specify the vmlinux file. If vmlinux is compressed, uncompress it first. Please note that this is not the vmlinuz file.

2.7.3.2 OProfiled Buffer Configuration

• Event Buffer Watershed Size: Sets the kernel buffer watershed to the given number of samples (2.6 kernels only). When only (buffer size minus buffer watershed) free entries remain in the kernel buffer, data is flushed to the daemon. The most useful values are in the range of 0.25 to 0.5 times the buffer size.
• Event Buffer Size: The number of samples in the kernel buffer. When using a 2.6 kernel, the buffer watershed needs to be adjusted when this value is changed.
• CPU Buffer Size: The number of samples in the kernel per-CPU buffer (2.6 kernels only). If you profile at a high rate, increasing this value can help when the log file shows an excessive count of samples lost due to CPU buffer overflow.

2.7.3.3 Enable Call Stack Sampling (CSS)

• Call Stack Depth: Specifies the maximum depth of call stack unwinding.
• Call Stack Unwinding Interval: Specifies the frequency of stack unwinding, for example, perform stack unwinding every 1000 samples.
• TGID for CSS: Users can specify the TGID to use for CSS instead of using the TGID of the launched target application. Note that if you select Use TGID of the launched target application, the Launch field must be specified.
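The relationship between the event buffer size and the watershed described in Section 2.7.3.2 can be sketched numerically. This is a small illustration under assumptions: the function names are hypothetical, and the buffer size used in the example is an arbitrary value, not a CodeAnalyst or OProfile default.

```python
from typing import Tuple


def watershed_range(buffer_size: int) -> Tuple[int, int]:
    """Return the (low, high) watershed values for the 0.25 to 0.5 guideline."""
    if buffer_size <= 0:
        raise ValueError("buffer size must be positive")
    return buffer_size // 4, buffer_size // 2


def free_entries_at_flush(buffer_size: int, watershed: int) -> int:
    """Free kernel buffer entries remaining when data is flushed to the daemon."""
    if not 0 < watershed <= buffer_size:
        raise ValueError("watershed must be between 1 and the buffer size")
    return buffer_size - watershed


# Arbitrary example size, chosen only to make the arithmetic easy to follow.
low, high = watershed_range(65536)
print(low, high)
print(free_entries_at_flush(65536, low))
```

A smaller watershed flushes earlier relative to the buffer filling up, which is why the text advises revisiting the watershed whenever the event buffer size is changed.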
184. start and stop data collection. Users can invoke opcontrol from within a script, which is useful when the task requires multiple profile runs. To see the list of command line options, run opcontrol --help at the command prompt. To identify the CodeAnalyst-provided version of the opcontrol tool, run opcontrol --version.

The profiling results from opcontrol can be imported into the CodeAnalyst graphical user interface (GUI) for analysis. A CodeAnalyst project must be created to import and view performance data that was collected using the command line utility.

2.8.1 Profiling with OProfile Command Line Utilities

The command line switches to opcontrol set up and control performance data collection. Analysis often concentrates on the performance of a particular program. The ability to control profiling from a command file offers considerable flexibility when writing test scripts. Specific test cases can be encapsulated into individual script files. The following is an example of how to use opcontrol in a script.

Figure 2.39 Command line switches to opcontrol (script listing: a bash script that resets OProfile, specifies no vmlinux, specifies the merge option separate with lib, kernel, and cpu, specifies the events to collect, including RETIRED INSTRUCTIONS, starts profiling, and runs the target application)
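The sequence of opcontrol calls outlined for Figure 2.39 can also be driven from a small wrapper. The sketch below only builds and prints the command lines by default. The event string, its count, and the unit mask, kernel, and user fields are assumptions made for illustration, since the original listing is not fully legible; check opcontrol --help on the target system before running anything.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_opcontrol_commands(events: Sequence[str]) -> List[List[str]]:
    """Build the opcontrol setup and start commands as argument lists.

    Each event is given in opcontrol's name:count:unitmask:kernel:user form.
    """
    commands = [
        ["opcontrol", "--reset"],                    # clear previous sample data
        ["opcontrol", "--no-vmlinux"],               # profile without a kernel image
        ["opcontrol", "--separate=lib,kernel,cpu"],  # merge option from the figure
        ["opcontrol"] + ["--event=" + e for e in events],
        ["opcontrol", "--start"],                    # begin data collection
    ]
    return commands


def run_all(commands: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(shlex.quote(arg) for arg in argv))
        else:
            subprocess.run(list(argv), check=True)


# Assumed event specification; the count of 100000 is illustrative only.
run_all(build_opcontrol_commands(["RETIRED_INSTRUCTIONS:100000:0:1:1"]))
```

Keeping the commands as data makes it straightforward to generate one script per test case, which matches the per-test-case scripting approach described in this section.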
185. 3. The Session settings dialog box opens. Assign a session name (optional), enter the path to the application program to be launched, and set the working directory.
4. Under Profile configuration, select a predefined event-based profile configuration, such as Assess performance.
5. Select other desired profiling options.

Figure 5.8 Session Settings

6. Click OK to apply the selections.
7. Click the Start icon to launch the application and begin profiling. The Pause and Stop icons become active.
8. The task bar at the bottom of the screen displays Sampling Session Started and the percent completed.

Figure 5.9 Launch Application
186. t may be speculative. A completed fetch actually delivers instruction data to the decoder. The delivered data may go unused if a branch operation redirects the pipeline at a later time. Finally, the view also includes two computed performance measurements: the IC miss ratio (the number of IBS IC misses divided by the number of IBS attempted fetches) and the average fetch latency. Fetch latency is the number of cycles from when a fetch is initiated to when the fetch is either completed or aborted. An aborted fetch is a fetch operation that does not complete and deliver instruction data to the decoder.

(screenshot: IBS fetch view listing per-module fetch counts, IC miss ratio, and average fetch latency)

2. Select IBS All ops from the drop-down list of views. The IBS All ops view displays. This view is an overall summary of the collected IBS op samples. It shows the total number of IBS op samples, the number of op samples taken for branch operations, and the number of sam
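The IC miss ratio defined above is a simple quotient of two sample counts. The following sketch shows the computation; the function name and the counts are hypothetical, and returning 0.0 when no fetches were attempted is a choice made here, not documented CodeAnalyst behavior.

```python
def ic_miss_ratio(ic_misses: int, attempted_fetches: int) -> float:
    """IBS IC misses divided by IBS attempted fetches.

    Returns 0.0 when no fetches were attempted (an assumption of this sketch).
    """
    if ic_misses < 0 or attempted_fetches < 0:
        raise ValueError("sample counts cannot be negative")
    if attempted_fetches == 0:
        return 0.0
    return ic_misses / attempted_fetches


# Hypothetical sample counts for one module.
print(round(ic_miss_ratio(ic_misses=5, attempted_fetches=30), 6))
```

Modules with very few attempted fetch samples can show a large ratio from only a handful of misses, so the ratio is best read together with the underlying sample counts.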
187. tabs in the workspace. Results are broken out and summarized by module, process, function, source line, or instruction.

Figure 3.2 Time spent in each function within an application (TBP) (screenshot: CPU clocks samples per address within multiply_matrices in /root/classic/classic)

CodeAnalyst supports drill-down to source lines and instructions. Source-level presentation of performance results requires symbolic debug information. See Section 1.1.1, Preparing an Application for Profiling.

Figure 3.3 Time spent at source-level hot spot (TBP) (screenshot: source lines 59 to 64 of multiply_matrices with Address, Line, Source, Code Bytes, and CPU clocks columns)
188. taken branches.

9.3.3.5 Event 0xF107
Abbreviation: IBS RET
The number of IBS retired branch op samples where the operation was a subroutine return. These samples are a subset of all IBS retired branch op samples.

9.3.3.6 Event 0xF108
Abbreviation: IBS misp RET
The number of IBS retired branch op samples where the operation was a mispredicted subroutine return. This event should be used to compute the ratio of mispredicted returns to all subroutine returns.

9.3.3.7 Event 0xF109
Abbreviation: IBS resync
The number of IBS resync op samples. A resync op is only found in certain microcoded AMD64 instructions and causes a complete pipeline flush.

9.3.4 IBS Op Load-Store Derived Events

9.3.4.1 Event 0xF200
Abbreviation: IBS load/store
The number of IBS op samples for ops that perform either a load and/or store operation. An AMD64 instruction may be translated into one (single fastpath), two (double fastpath), or several (vector path) ops. Each op may perform a load operation, a store operation, or both a load and store operation (each to the same address). Some op samples attributed to an AMD64 instruction may perform a load/store operation while other op samples attributed to the same instruction may not. Further, some branch instructions perform load/store operations. Thus, a mix of op sample types may be attributed to a single AMD64 instruction, depending upon the ops that are issued from
189. tent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances.

It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is wi
190. 13. After the new project is created, it becomes the current project, and the new project name is displayed in the sessions area on the left side of the CodeAnalyst window. No sessions are listed under TBP Sessions, EBP Sessions, or IBS Sessions until data has been collected and processed.
14. Click Session Settings in the toolbar, or select Tools > Session Settings, to change session settings.

8.4 Tutorial: Analysis with Time-Based Sampling Profile

After creating a new project or opening an existing project, CodeAnalyst is ready to collect and analyze data. The currently selected profile configuration is shown in the toolbar. You may choose a different profile configuration from this list without affecting the other session settings. This is a fast way to collect and analyze the program from a different perspective while keeping the run control options the same. This section uses the Time-based profile configuration.

Click the Start button in the toolbar to start data collection. You may also select Profile > Start from the Profile menu.
191. the function multiply_matrices is displayed with the IBS information for each source line in the function. Most load/store activity occurs at line 66, which is the statement within the nested loops. This is the statement that reads an element from each of the two operand matrices and computes the running sum of the product of the elements. The DTLB misses are caused by the large strides taken through matrix_b. With nearly every iteration, the program touches a different page, thereby causing a miss in the DTLB.

(screenshot: source view of multiply_matrices, 0x400abd to 0x400b99, with IBS load/store sample counts per source line; line 66 carries the largest counts)

3. Click the expand box to the left of line 66. The disassembled instructions for source line 66 are displayed along with the IBS data for each instruc
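The stride effect described for matrix_b can be made concrete by counting how many distinct 4 KByte pages an inner loop touches. The sketch below is a model, not a measurement: it assumes row-major matrices of 4-byte floats with 1000 columns (consistent with the 0x3e8 multiplier visible in the disassembly), a matrix base aligned to a page boundary, and an inner loop that walks a row of matrix_a and a column of matrix_b.

```python
PAGE_SIZE = 4096    # bytes in a 4 KByte page
ELEMENT_SIZE = 4    # bytes in a C float
COLUMNS = 1000      # assumed row length, in elements


def pages_touched(start_index: int, stride_elements: int, count: int) -> int:
    """Count the distinct pages touched by `count` strided element accesses."""
    pages = set()
    for n in range(count):
        byte_offset = (start_index + n * stride_elements) * ELEMENT_SIZE
        pages.add(byte_offset // PAGE_SIZE)
    return len(pages)


# Walking along a row: consecutive elements, stride of one element.
row_walk = pages_touched(0, 1, COLUMNS)

# Walking down a column: each step jumps a full row (4000 bytes).
column_walk = pages_touched(0, COLUMNS, COLUMNS)

print(row_walk, column_walk)
```

Under these assumptions the row walk stays within a single page, while the column walk lands on a new page in nearly every iteration, which is the access pattern that the DTLB miss counts at line 66 point to.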
192. (screenshot, continued: source lines 59 to 66 of multiply_matrices with expanded disassembly and per-instruction sample counts)

Figure 2.16 Source and Disassembly Tab in Disassembly-only Mode (screenshot: function multiply_matrices, 0x400aed to 0x400bc9, of /root/classic/classic, shown as disassembly only with per-instruction sample counts)
193. tion. IBS load/store information is attributed to each instruction that performs either a memory read or write operation. Sources of performance-robbing DTLB misses are precisely identified.
[Screenshot: Session 02, disassembly for source line 66 (sample counts 15907, 13909, 1998), showing the instructions from 0x400aea to 0x400b1b with per-instruction IBS counts.]
4. Select IBS BR branch from the drop-down list of views. The IBS BR branch view displays. This view shows the number of IBS branch op samples and indicates if the branch operation mispredicted and/or was taken. Note that only the conditional jump instruction at the end of the innermost loop is marked as a branch instruction. This example further illustrates the precision offered by Instruction-Based Sampling.
[Screenshot: IBS BR branch view of the loops in multiply_matrices, with the disassembly for the innermost loop expanded.]
194. tion-based sampling.
[Screenshot: profile configuration drop-down list with entries including Investigate L2 cache access, Investigate branching, Investigate data access, Investigate instruction access, and Time-based profile.]
3. Click the Session Settings button in the toolbar to change the session settings. Alternatively, select Tools > Session Settings from the menu. A dialog box appears asking for session settings.
4. Change the session name to JavaSession. Java programs are executed by the JVM, which is started by the Java application launcher tool (<path-to-java-installation>/bin/java).
5. In the Launch field, enter the path <path-to-java-installation>/bin/java, or browse to it by clicking the Browse button located next to the Launch field.
6. After the path to java, enter jnt.scimark2.commandline, the name of the Java application program to be launched.
Note: CodeAnalyst automatically checks for the Java application launcher tool and inserts the profiling agent library with the option -agentpath into the Launch field. You do not need to enter this option yourself. The agent connects the JVM to CodeAnalyst.
7. Enter the path to the working directory, or browse to the working directory by clicking the Browse button next to the working directory field.
8. Ensure the checkboxes are selected to enable the "Terminate the app after the profile", "Stop data collection when the app exits", and "Profile the duration of the app execution" options. CodeAnal
195. tion is in XML.
6.2.1 XML file format
6.2.1.1 Collection configuration
The following tags mark the beginning and end of configuration information within a data collection configuration file:
<dc_configuration> ... </dc_configuration>
A collection configuration element contains one <tbp>, <ebp>, or <sim> element. Each such element describes how to configure CodeAnalyst to collect data of the type indicated by its element name.
6.2.1.2 TBP collection configuration
The <tbp> and </tbp> tags mark the beginning and end of a time-based profiling data collection configuration. A <tbp> element has the following attributes:
name: Configuration name (string)
interval: Sampling interval given in milliseconds (float)
A TBP collection configuration element contains exactly one of each of the following elements: <tool_tip> and <description>. The <tool_tip> and <description> elements have a common form and are described below. A time-based profiling configuration has the form:
<dc_configuration>
  <tbp name="..." interval="10.0">
    <tool_tip> ... </tool_tip>
    <description> ... </description>
  </tbp>
</dc_configuration>
6.2.1.3 EBP collection configuration
The <ebp> and </ebp> tags mark the beginning and end of an event-based profiling data collection configuration
196. to the instruction decoder. Although the instruction data was delivered, it may still not be used (e.g., the instruction data may have been on the wrong path of an incorrectly predicted branch).
9.3.1.5 Event 0xF004
Abbreviation: IBS fetch abort
The number of IBS sampled fetches that aborted. An attempted fetch is aborted if it did not complete and deliver instruction data to the decoder. An attempted fetch may abort at any point in the process of fetching instruction data. An abort may be due to a branch redirection as the result of a mispredicted branch. The number of IBS aborted fetch samples is a lower bound on the amount of unsuccessful, speculative fetch activity. It is a lower bound because the instruction data delivered by completed fetches may not be used.
9.3.1.6 Event 0xF005
Abbreviation: IBS L1 ITLB hit
The number of IBS attempted fetch samples where the fetch operation initially hit in the L1 ITLB (Instruction Translation Lookaside Buffer).
9.3.1.7 Event 0xF006
Abbreviation: IBS ITLB L1M L2H
The number of IBS attempted fetch samples where the fetch operation initially missed in the L1 ITLB and hit in the L2 ITLB.
9.3.1.8 Event 0xF007
Abbreviation: IBS ITLB L1M L2M
The number of IBS attempted fetch samples where the fetch operation initially missed in both the L1 ITLB and the L2 ITLB.
9.3.1.9 Event 0xF008
Abbreviation: IBS IC miss
The number of IBS attempted fetch samples where the fetch operation initially misse
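The three ITLB outcomes listed above are mutually exclusive, which a short sketch can make explicit. The event codes come from the section text; the FetchSample struct, its field names, and classify_itlb are hypothetical and do not reflect how CodeAnalyst or the IBS hardware actually represents a sample.

```cpp
#include <cassert>
#include <cstdint>

// Derived-event codes as listed in the manual text.
enum class IbsItlbEvent : std::uint16_t {
    L1Hit = 0xF005,        // IBS L1 ITLB hit
    L1MissL2Hit = 0xF006,  // IBS ITLB L1M L2H
    L1MissL2Miss = 0xF007  // IBS ITLB L1M L2M
};

// Hypothetical decoded fetch sample; the fields are illustrative only.
struct FetchSample {
    bool l1_itlb_miss;
    bool l2_itlb_miss;  // only meaningful when l1_itlb_miss is true
};

// Maps one attempted-fetch sample to exactly one of the three ITLB events.
IbsItlbEvent classify_itlb(const FetchSample& sample) {
    if (!sample.l1_itlb_miss) {
        return IbsItlbEvent::L1Hit;
    }
    return sample.l2_itlb_miss ? IbsItlbEvent::L1MissL2Miss
                               : IbsItlbEvent::L1MissL2Hit;
}
```

Because every attempted fetch falls into exactly one of these categories, the three counts taken together should add up to the number of attempted fetch samples that performed an ITLB lookup, assuming the sketch's model of a sample holds.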
197. ttp://support.amd.com/us/Processor_TechDocs/41256.pdf
• BIOS and Kernel Developer's Guide (BKDG) for AMD Family 10h Processors: http://support.amd.com/us/Processor_TechDocs/31116.pdf
• BIOS and Kernel Developer's Guide for AMD Athlon and AMD Opteron Processors (Rev. A-E): http://support.amd.com/us/Processor_TechDocs/26094.PDF
• BIOS and Kernel Developer's Guide for AMD NPT Family 0Fh Processors (Rev. F-G): http://support.amd.com/us/Processor_TechDocs/32559.pdf
• Section 9.3, Instruction-Based Sampling Derived Events
9.2 Unit masks for PMEs
The following three tables show how to construct a unit mask and express combinations of unit masks for a given event.
[Tables not recoverable from the extracted text. Surviving column headings: bit position in mask; Unit Mask (bits 15:8); Combined Masks; combined bit positions; Hex Value in Event Unit Mask. Surviving rows: xxxx_x111 combines bits 0, 1, and 2; xxx1_1111 combines bits 0, 1, 2, 3, and 4, giving 0x1F (all of the above; all MOESI states).]
9.3 Instruction-Based Sampling Derived Events
This section d
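Combining unit masks amounts to OR-ing one bit per selected option, as the surviving rows of the tables suggest. The following is a minimal sketch of that arithmetic; combine_unit_mask and place_in_event_select are hypothetical helper names, and the shift by 8 is based only on the "Unit Mask bits 15:8" heading above.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: ORs together one bit per selected position (0..7)
// to form the 8-bit unit mask value.
std::uint8_t combine_unit_mask(std::initializer_list<unsigned> bit_positions) {
    unsigned mask = 0;
    for (unsigned bit : bit_positions) {
        assert(bit < 8 && "unit mask bit positions run from 0 to 7");
        mask |= 1u << bit;
    }
    return static_cast<std::uint8_t>(mask);
}

// Hypothetical helper: moves the unit mask into bits 15:8 of an
// event-select value, the field position named in the table heading.
std::uint32_t place_in_event_select(std::uint8_t unit_mask) {
    return static_cast<std::uint32_t>(unit_mask) << 8;
}
```

For example, combine_unit_mask({0, 1, 2, 3, 4}) yields 0x1F, matching the xxx1_1111 row, and combine_unit_mask({0, 1, 2}) yields 0x07 for the xxxx_x111 pattern.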
198. under the terms of Section A.2.2, Section 1 above, provided that you also meet all of these conditions:
a. You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change.
b. You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License.
c. If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: If the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as se
199. urations can be created, modified, and saved through Configuration Management. See Section 4.5, Manage Profile Configurations, for further details.
4.2 Edit Timer Configuration
The Current time-based profile configuration opens a dialog box for changing:
• The profile configuration name
• The timer interval that determines how often samples are taken
Figure 4.1. Edit timer configuration
[Screenshot: Edit timer configuration dialog. Profile name: Time-based profile. Timer interval: 1 ms.]
4.3 Edit Event-based and Instruction-based Sampling Configuration
This section discusses how to configure an event-based and instruction-based sampling profile. The Edit EBS/IBS Profile Configuration dialog box allows for changing a profile configuration that uses event-based sampling and/or instruction-based sampling for data collection. This dialog corresponds to the dialog box opened by using the Edit button displayed in the Session Settings dialog box. This dialog box allows for defining the performance events to be measured. It is important to understand the difference between event-based sampling and instruction-based sampling, as they use different data collection techniques and different performance monitoring hardware. For more technical detail about event-based sampling and instruction-based sampling, please see Section 3.3, Event-Based Profiling Analysis, and Section 3.4, Instruction-Based Sampling Analysis. AMD proc
200. use
Figure 5.11. View Management
[Screenshot: View Management dialog. Description: "This view gives an overall picture of performance. Use it to find possible issues for deeper investigation." The Available data and Columns shown lists include Ret inst, CPU clocks, DC misses, IPC, DTLB L1M L2M, DC miss rate, Misalign access, and DTLB L1M L2M rate. Options: Separate CPUs.]
5.3.3 Aggregate by Processes
The Aggregate by Processes mode shows the sample counts broken down by modules within the process.
Figure 5.12. Aggregate by Processes
[Screenshot: CodeAnalyst window in Aggregate by Processes mode; processes such as /root/classic/classic (PID 4551) expand into the modules sampled within them.]
5.4 Collecting an Instruction-Based Sampling Profile
An instruction-based sampling (IBS) profile is similar to Event based samplin
201. using the Advance Filter.
Figure 2.38. Process Filter Dialog
[Screenshot: Process Filter Dialog. "Please specify processes to be included in data processing."]
Note that a profile is collected for all processes system-wide. However, only the processes specified in the Advance Filter are processed. This can also help reduce the post-processing wait time in some cases. The following figure shows a profiling session with the process filter enabled. Here, only /root/classic and /lib64/libc-2.4.so are shown, since the application classic depends on the standard libc library.
[Screenshot: System Data view with the process filter enabled, listing the classic application and the shared libraries it uses.]
2.8 CodeAnalyst and OProfile
CodeAnalyst leverages a third-party profiling tool called OProfile (http://oprofile.sourceforge.net/news/). CodeAnalyst provides a graphical user interface which communicates with the OProfile kernel module and the OProfile daemon in order to collect profile data. OProfile also provides command-line utilities for CodeAnalyst. CodeAnalyst is designed to work with the original OProfile, which is available on several Linux distributions. CodeAnalyst also provides a modified OProfile kernel module, daemon, and utilities, which include additional features and support for the latest AMD processors. Opcontrol is a command-line tool from OProfile used to control a profiling session. It allows users to configure
202. version of AMD CodeAnalyst you are using. Choose Help > About to view the About AMD CodeAnalyst dialog box.
• Describe the desired enhancement or change.
• Indicate to us how important this is to you, using a scale of 1 to 5, where 1 is most important and 5 least important.
10.2 Problem Report
If a problem is found, take the following action:
1. Run the careport.sh script, which is located in the CodeAnalyst root directory of the source tree or at /opt/CodeAnalyst/bin/careport.sh. This script generates a report file called CAReport.txt.
2. Please provide the following information:
• Give a description of the problem or issue.
• Briefly describe the steps or sequence of events leading to the observation.
• State how frequently the problem occurred.
• Describe the messages AMD CodeAnalyst displayed.
• State which version of AMD CodeAnalyst was used (under Help > System Info or opcontrol --version).
• Describe the application analyzed.
Please send the report file CAReport.txt from step 1 and the information from step 2 to CodeAnalyst.support@amd.com.
Appendix A. GNU General Public License
Version 2, June 1991
Copyright 1989, 1991 Free Software Foundation, Inc.
Free Software Foundation, Inc. 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
Version 2, June 1991
203. ves the right to discontinue or make changes to its products at any time without notice.
Trademarks
AMD, the AMD Arrow logo, AMD Athlon, AMD Opteron, and combinations thereof, and 3DNow! are trademarks of Advanced Micro Devices, Inc. HyperTransport is a licensed trademark of the HyperTransport Technology Consortium. Linux is a registered trademark of Linus Torvalds. Microsoft, Windows, and Windows Vista are registered trademarks of Microsoft Corporation. MMX is a trademark of Intel Corporation. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
Table of Contents
Introduction
Overview
Preparing an Application for Profiling
Compiling with the GNU GCC Compiler
[entry not recoverable]
Overview of AMD CodeAnalyst
Program Performance Tuning
Types of Analysis
Flexible System-Wide Data Collection
Summarized Results with Drill-down
Graphical User Interface
Projects and Sessions
Basic Steps of Analysis
Exploring the Workspace and GUI
[entries not recoverable]
Floating and Docking Toolbar Groups
Menus, Tools, and Icons
204. view: Identifies this view as a candidate default view (Boolean)
show_percentage: Toggle the format of the profile data shown between raw and percentage (Boolean)
Boolean attributes are restricted to the string values T and F. The view's name will appear in a list of available views and should suggest the purpose of the view to the user. The separation/aggregation attributes are optional and default to false. CodeAnalyst shows a view after data collection, a so-called default view. The default view attribute is a hint to CodeAnalyst that the view could or should be selected as a default view. Please use this attribute sparingly. The default view attribute is optional and its default value is false.
7.5.1.1 view
A view element contains one each of the following elements:
<data>: Describes the event data needed to show the view
<output>: Describes how the event data will be shown
<tool_tip>: Tool tip to be shown for the view
<description>: Short description to be shown for the view
Combining these elements, a view configuration file has the overall structure shown here:
<view_configuration>
  <view name="View name" separate_cpus="T" separate_processes="F" separate_threads="F">
    <data>
      <!-- List of event data needed -->
    </data>
    <output>
      <!-- ... columns to be shown -->
    </output>
    <tool_tip> Tool tip text </tool_tip>
    <description> Short descriptiv
205. vious image, multiply_matrices is an in-line function called by main. This mode aggregates samples belonging to the instance of multiply_matrices as part of main.
Figure 3.12. Aggregate into in-line instance
[Screenshot: System Data view with "Aggregate samples into instance of inline function" selected. Rows are listed by symbol, offset, and CPU clocks; samples from the inlined code appear as main > multiply_matrices with their offsets, alongside main, initialize_matrices, and an Unknown Sample entry.]
3.6.2 Aggregate samples into original in-line function
When samples belong to an inline instance, CodeAnalyst aggregates them into each inline instance. CodeAnalyst groups all inline instances and lists them together under the inline function, which is presented in red text. In the figure above, the inline function multiply_matrices has one inline instance in function main.
Fi
206. wn in the following illustrations.
Figure 2.3. Toolbars: inactive and active
[Screenshot: the CodeAnalyst menu bar and toolbars in their inactive and active states.]
2.2.4 Floating and Docking Toolbar Groups
Toolbar groups can be floated or docked in the workspace. Drag and drop a toolbar group using the grip located to the left of the group. Drag it into the work area until the border darkens, and release. The toolbar displays its group name in a header. Double-click the toolbar name to automatically return it to the toolbar area. To replace it in its original position, drag and realign the group until the shape changes (elongates), and then release the mouse. Following are examples of toolbars being docked and floated.
Figure 2.4. Floating and docking toolbars
[Screenshot: the Sampling Tool Bar floating over the workspace and docked in the toolbar area.]
Any single icon or group of icons that includes the grip bar can be moved around the work area. This includes tools found only on specific tab windows.
2.2.5 Menus, Tools, and Icons
CodeAnalyst provides a menu bar and three toolbars. The toolbars can float in any area of the workspace. The following sections give descriptions and definitions of menus, tools, and icons available for creating p
207. y Setting of the selected event will get populated into the Event Setting field on the right.
Count: The Count field specifies the event count (sampling period) for an event.
Unitmasks/Options: Check boxes allow for specifying the unit mask or option for the selected event.
• Usr: Selecting Usr enables collection of user-level samples for an event.
• Os: Selecting Os enables collection of operating-system-level samples for an event.
Users can make the desired changes and then click the Apply Setting button to save them.
On an AMD processor, there are a limited number of performance event counters. If the number of events selected is greater than the number of available hardware performance counters, event multiplexing is enabled to work around the hardware limitation. In that case, the Multiplexing Interval field is enabled and defaults to 1 msec. This parameter specifies the profiling period before the next group of events is multiplexed during the profile run, as described in Section 2.4, Event Counter Multiplexing.
To remove an event, select the event and click the Remove Event button.
4.3.2.2 Description Tab
Figure 4.4. Edit EBS/IBS configuration: Description Tab
[Screenshot: Description tab with the prompt "Please enter description for this profile" and the text "Use this configuration to get an overall assessment of performance and to find potential issues for investigation." Tabs: Selected Events, Description.]
This tab contain
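The multiplexing idea described above can be sketched as splitting the selected events into groups no larger than the number of hardware counters and rotating through the groups once per multiplexing interval. This is only an illustration of the concept, not CodeAnalyst's or OProfile's implementation: make_event_groups and active_group are hypothetical helpers, and the counter count is a parameter because the manual text here does not state it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using EventGroup = std::vector<std::uint16_t>;

// Hypothetical helper: splits the selected event codes into consecutive
// groups, each small enough to fit the available hardware counters.
std::vector<EventGroup> make_event_groups(const std::vector<std::uint16_t>& events,
                                          std::size_t counters) {
    assert(counters > 0);
    std::vector<EventGroup> groups;
    for (std::size_t start = 0; start < events.size(); start += counters) {
        const std::size_t end = std::min(start + counters, events.size());
        groups.emplace_back(events.begin() + static_cast<std::ptrdiff_t>(start),
                            events.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return groups;
}

// Hypothetical helper: index of the group being counted after elapsed_ms,
// assuming a simple round-robin rotation every interval_ms.
std::size_t active_group(std::size_t group_count, std::uint64_t elapsed_ms,
                         std::uint64_t interval_ms) {
    assert(group_count > 0 && interval_ms > 0);
    return static_cast<std::size_t>((elapsed_ms / interval_ms) % group_count);
}
```

With six selected events and an assumed four counters, make_event_groups returns two groups (four events, then two), and with a 1 msec interval active_group alternates between them. Each event is therefore only observed for part of the run, which is the trade-off multiplexing makes to get around the counter limit.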
208. [Screenshot: profile configuration dialog with Apply Setting, Selected Events, and Cancel controls.]
5. Click the Start button in the toolbar, or select Profile > Start, to begin profiling. CodeAnalyst starts data collection and launches the application program previously specified in the session settings. The session status displays in the status bar in the lower left corner of the CodeAnalyst window. Session progress displays in the lower right corner. The blank window is the console window in which the application program classic is running.
When data collection completes, CodeAnalyst processes the IBS performance data and creates a new session under EBP Sessions in the session management area at the left-hand side of the CodeAnalyst window. Results are displayed in the System Data tab. This tab behaves like its TBP and EBP counterparts. The table displays the number of IBS-derived events that were sampled by the performance monitoring hardware.
[Screenshot: System Data tab (Aggregate by Modules) for the Current instruction-based profile, Session 02; columns include IBS 2M page, IBS 4K page, and IBS fetch abort.]
209. ysis and tuning with CodeAnalyst consists of six steps:
1. Prepare the application for analysis by compiling with debug information turned on (an optional step).
2. Select the kind of data to be gathered by choosing one of several predefined profile configurations.
3. Configure run options, such as the application program to be launched, the duration of data collection, etc.
4. Start and perform data collection.
5. Review and interpret the summarized results produced by CodeAnalyst.
6. Make changes to the program's algorithm and source code, recompile/link, and analyze again.
Types of Analysis
AMD CodeAnalyst is a suite of tools that help improve the performance of an application program or system. CodeAnalyst provides several different ways of collecting and analyzing performance data.
Time-based profiling (TBP) shows where the application program or system is spending most of its time. This kind of analysis identifies hot spots that are good candidates for tuning and optimization. After changes are made to the code, time-based profiling can verify that the modifications improved execution speed and measure by how much. Please see Section 3.2, Time-Based Profiling Analysis, for more detail.
Event-based profiling (EBP) uses the performance monitoring hardware in AMD processors to investigate hot spots. This kind of analysis identifies potential performance issues such
210. yst collects data for the entire duration of the benchmark run. Click OK to confirm the session settings and to dismiss the dialog box.
[Screenshot: Session Settings dialog for the JavaSession template. The Launch field shows the java launcher path, the -agentpath option pointing to /opt/CodeAnalyst/lib/libCAJVMTIA64.so, and the jnt.scimark2 application class. The Profile Control and Profile Configuration (Time-based profile) options are also shown.]
9. Click the Start button in the toolbar, or select Profile > Start from the menu. CodeAnalyst starts data collection and launches the Java application program through the Java application launcher tool. When the benchmark program terminates and data collection is finished, CodeAnalyst displays results in three tabbed panels, including System Data and Processes. The System Data table shows a module-by-module breakdown of timer samples. Each timer sample represents approximately 1 millisecond of execution time when using the default timer interval of 1 millisecond.
8.7.1 Reviewing Results
The target Java application jnt.scimark2.com
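Since each timer sample stands for roughly one timer interval of execution, sample counts can be turned into approximate times and percentages with simple arithmetic. The sketch below shows that conversion; estimated_time_ms and module_share_percent are hypothetical helper names, and the results are statistical estimates rather than exact measurements.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: approximate execution time represented by a number
// of timer samples. The default of 1.0 ms matches the default timer
// interval mentioned in the text.
double estimated_time_ms(std::uint64_t timer_samples, double interval_ms = 1.0) {
    assert(interval_ms > 0.0);
    return static_cast<double>(timer_samples) * interval_ms;
}

// Hypothetical helper: share of all samples attributed to one module,
// as a percentage.
double module_share_percent(std::uint64_t module_samples,
                            std::uint64_t total_samples) {
    assert(total_samples > 0 && module_samples <= total_samples);
    return 100.0 * static_cast<double>(module_samples) /
           static_cast<double>(total_samples);
}
```

For instance, a module with 250 of 1000 timer samples at the default interval accounts for about 250 ms of execution, or 25 percent of the sampled time.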