User's Guide

Contents

1. localMemoryPerThread, localMemoryTotal, parentBlockX, parentBlockY, parentBlockZ, registersPerThread, requested, sharedMemoryConfig, staticSharedMemory
    CUpti_ActivityContext
    CUpti_ActivityDevice: computeCapabilityMajor, computeCapabilityMinor, constantMemorySize, globalMemoryBandwidth, globalMemorySize
    www.nvidia.com CUPTI DA-05679-001_v5.5
2. CUpti_ActivityObjectKindId
    CUpti_ActivityOverhead: overheadKind
    CUpti_ActivityPreemption
    CUpti_ActivitySourceLocator: fileName
    CUpti_CallbackData: correlationId, functionName
3. localMemoryPerThread, localMemoryTotal, registersPerThread, reserved, runtimeCorrelationId, staticSharedMemory
    CUpti_ActivityKernel2: dynamicSharedMemory, localMemoryPerThread, localMemoryTotal
4. CUpti_ActivityEvent: correlationId, domain, kind
    CUpti_ActivityEventInstance: correlationId
    CUpti_ActivityGlobalAccess: sourceLocatorId
    CUpti_ActivityKernel: blockY, blockZ, cacheConfigRequested, correlationId, deviceId, dynamicSharedMemory
5. correlationId, deviceId, dstContextId, srcContextId, srcDevice
    CUpti_ActivityMemset: contextId, correlationId, runtimeCorrelationId
    CUpti_ActivityMetricInstance: correlationId, value
6. F
    fanSpeed: CUpti_ActivityEnvironment
    fileName: CUpti_ActivitySourceLocator
    flags: CUpti_ActivityMemcpy2, CUpti_ActivityDevice, CUpti_ActivityMarker, CUpti_ActivityMetric, CUpti_ActivityMarkerData, CUpti_ActivityMemcpy, CUpti_ActivityMetricInstance, CUpti_ActivityGlobalAccess
    functionName: CUpti_NvtxData, CUpti_CallbackData
    functionParams: CUpti_CallbackData, CUpti_NvtxData
    functionReturnValue: CUpti_CallbackData
    G
    globalMemoryBandwidth: CUpti_ActivityDevice
    globalMemorySize: CUpti_ActivityDevice
    gpuTemperature: CUpti_ActivityEnvironment
    gridId: CUpti_ActivityKernel2, CUpti_ActivityCdpKernel, CUpti_ActivityPreemption
    gridX: CUpti_ActivityKernel2, CUpti_ActivityCdpKernel, CUpti_ActivityKernel
    gridY: CUpti_ActivityKernel2, CUpti_ActivityKernel, CUpti_ActivityCdpKernel
    gridZ: CUpti_ActivityKernel, CUpti_ActivityKernel2, CUpti_ActivityCdpKernel
    id: CUpti_ActivityEvent, CUpti_ActivityEventInstance, CUpti_ActivityMetricInstance, CUpti_ActivityMarkerData, CUpti_ActivityMarker, CUpti_ActivityDevice, CUpti_ActivitySourceLocator, CUpti_ActivityMetric
    instance: CUpti_ActivityEventInstance, CUpti_ActivityMetricInstance
    kind: CUpti_Activity, CUpti_ActivityEnvironment, CUpti_ActivityOverhead, CUpti_ActivityMarkerData, CUpti_ActivityMarker, CUpti_ActivityName, CUpti_ActivityContext, CUpti_ActivityDevice, CUpt
7. CUptiResult cuptiDeviceGetNumEventDomains(CUdevice device, uint32_t *numDomains)
    Get the number of domains for a device.
    Parameters:
    device: The CUDA device.
    numDomains: Returns the number of domains.
    Returns:
    > CUPTI_SUCCESS
    > CUPTI_ERROR_NOT_INITIALIZED
    > CUPTI_ERROR_INVALID_DEVICE
    > CUPTI_ERROR_INVALID_PARAMETER if numDomains is NULL
    Description: Returns the number of domains in numDomains for a device. Thread-safety: this function is thread safe.

    CUptiResult cuptiDeviceGetTimestamp(CUcontext context, uint64_t *timestamp)
    Read a device timestamp.
    Parameters:
    context: A context on the device from which to get the timestamp.
    timestamp: Returns the device timestamp.
    Returns:
    > CUPTI_SUCCESS
    > CUPTI_ERROR_NOT_INITIALIZED
    > CUPTI_ERROR_INVALID_CONTEXT
    > CUPTI_ERROR_INVALID_PARAMETER if timestamp is NULL
    Description: Returns the device timestamp in timestamp. The timestamp is reported in nanoseconds and indicates the time since the device was last reset. Thread-safety: this function is thread safe.

    CUptiResult cuptiDisableKernelReplayMode(CUcontext context)
    Disable kernel replay mode.
    Parameters:
    context: The context.
    Returns:
    > CUPTI_SUCCESS
    Description: Set profiling mode for the context to non-replay (default) mode. Eve
8. Description: The identifier for the activity object. objectKind indicates which ID is valid for this record.
    CUpti_ActivityObjectKind CUpti_ActivityOverhead::objectKind
    Description: The kind of activity object that the overhead is associated with.
    CUpti_ActivityOverheadKind CUpti_ActivityOverhead::overheadKind
    Description: The kind of overhead: CUPTI, DRIVER, COMPILER, etc.
    uint64_t CUpti_ActivityOverhead::start
    Description: The start timestamp for the overhead, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the overhead.

    3.23. CUpti_ActivityPreemption Struct Reference
    The activity record for a preemption of a CDP kernel. This activity record represents a preemption of a CDP kernel.
    uint32_t CUpti_ActivityPreemption::blockX
    Description: The X dimension of the block that is preempted.
    uint32_t CUpti_ActivityPreemption::blockY
    Description: The Y dimension of the block that is preempted.
    uint32_t CUpti_ActivityPreemption::blockZ
    Description: The Z dimension of the block that is preempted.
    int64_t CUpti_ActivityPreemption::gridId
    Description: The grid ID of the block that is preempted.
    CUpti_ActivityKind CUpti_ActivityPreemption::kind
    Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_PREEMPTION.
    uint32_t CUpti_ActivityPreemption::pad
    Description
9. uint32_t CUpti_ActivityCdpKernel::contextId
    Description: The ID of the context where the kernel is executing.
    uint32_t CUpti_ActivityCdpKernel::correlationId
    Description: The correlation ID of the kernel. Each kernel execution is assigned a unique correlation ID that is identical to the correlation ID in the driver API activity record that launched the kernel.
    uint32_t CUpti_ActivityCdpKernel::deviceId
    Description: The ID of the device where the kernel is executing.
    int32_t CUpti_ActivityCdpKernel::dynamicSharedMemory
    Description: The dynamic shared memory reserved for the kernel, in bytes.
    uint64_t CUpti_ActivityCdpKernel::end
    Description: The end timestamp for the kernel execution, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the kernel.
    uint8_t CUpti_ActivityCdpKernel::executed
    Description: The cache configuration used for the kernel. The value is one of the CUfunc_cache enumeration values from cuda.h.
    int64_t CUpti_ActivityCdpKernel::gridId
    Description: The grid ID of the kernel. Each kernel execution is assigned a unique grid ID.
    int32_t CUpti_ActivityCdpKernel::gridX
    Description: The X dimension grid size for the kernel.
    int32_t CUpti_ActivityCdpKernel::gridY
    Description: The Y dime
10. Data Structures
    CUpti_ActivityMemcpy: The activity record for memory copies.
    CUpti_ActivityMemcpy2: The activity record for peer-to-peer memory copies.
    CUpti_ActivityMemset: The activity record for memset.
    CUpti_ActivityMetric: The activity record for a CUPTI metric.
    CUpti_ActivityMetricInstance: The activity record for a CUPTI metric with instance information. This activity record represents a CUPTI metric value for a specific metric domain instance (CUPTI_ACTIVITY_KIND_METRIC_INSTANCE). This activity record kind is not produced by the activity API but is included for completeness and ease of use. Profile frameworks built on top of CUPTI that collect metric data may choose to use this type to store the collected metric data. This activity record should be used when metric domain instance information needs to be associated with the metric.
    CUpti_ActivityName: The activity record providing a name.
    CUpti_ActivityObjectKindId: Identifiers for object kinds as specified by CUpti_ActivityObjectKind.
    CUpti_ActivityOverhead: The activity record for CUPTI and driver overheads.
    CUpti_ActivityPreemption: The activity record for a preemption of a CDP kernel.
    CUpti_ActivitySourceLocator: The activity record for source locator.
    CUpti_CallbackData: Data passed into a runtime or driver API callback function.
    CUpti_EventGroupSet: A set of event groups.
    CUpti_EventGroupSets: A set of event group sets.
    CUpti_MetricValue: A metric value.
    CUpti_NvtxData: Data passed into a NVTX
11. CUPTI_CBID_RESOURCE_FORCE_INT = 0x7fffffff

    enum CUpti_CallbackIdSync
    Callback IDs for the synchronization domain, CUPTI_CB_DOMAIN_SYNCHRONIZE. This value is communicated to the callback function via the cbid parameter.
    Values:
    CUPTI_CBID_SYNCHRONIZE_INVALID = 0: Invalid synchronize callback ID.
    CUPTI_CBID_SYNCHRONIZE_STREAM_SYNCHRONIZED = 1: Stream synchronization has completed for the stream.
    CUPTI_CBID_SYNCHRONIZE_CONTEXT_SYNCHRONIZED = 2: Context synchronization has completed for the context.
    CUPTI_CBID_SYNCHRONIZE_SIZE
    CUPTI_CBID_SYNCHRONIZE_FORCE_INT = 0x7fffffff

    typedef void (*CUpti_CallbackFunc)(void *userdata, CUpti_CallbackDomain domain, CUpti_CallbackId cbid, const void *cbdata)
    Function type for a callback. The type of the data passed to the callback in cbdata depends on the domain. If domain is CUPTI_CB_DOMAIN_DRIVER_API or CUPTI_CB_DOMAIN_RUNTIME_API, the type of cbdata will be CUpti_CallbackData. If domain is CUPTI_CB_DOMAIN_RESOURCE, the type of cbdata will be CUpti_ResourceData. If domain is CUPTI_CB_DOMAIN_SYNCHRONIZE, the type of cbdata will be CUpti_SynchronizeData. If domain is CUPTI_CB_DOMAIN_NVTX, the type of cbdata will be CUpti_NvtxData.

    typedef uint32_t CUpti_CallbackId
    An ID for a driver API, runtime API, resource or synchronization callback.
12. CUpti_MetricID *metricArray)
    Get all the metrics available on any device.
    Parameters:
    arraySizeBytes: The size of metricArray in bytes, and returns the number of bytes written to metricArray.
    metricArray: Returns the IDs of the metrics.
    Returns:
    > CUPTI_SUCCESS
    > CUPTI_ERROR_INVALID_PARAMETER if arraySizeBytes or metricArray are NULL
    Description: Returns the metric IDs in metricArray for all CUDA-capable devices. The size of the metricArray buffer is given by *arraySizeBytes. The size of the metricArray buffer must be at least numMetrics * sizeof(CUpti_MetricID), or else not all metric IDs will be returned. The value returned in *arraySizeBytes contains the number of bytes returned in metricArray.

    CUptiResult cuptiGetNumMetrics(uint32_t *numMetrics)
    Get the total number of metrics available on any device.
    Parameters:
    numMetrics: Returns the number of metrics.
    Returns:
    > CUPTI_SUCCESS
    > CUPTI_ERROR_INVALID_PARAMETER if numMetrics is NULL
    Description: Returns the total number of metrics available on any CUDA-capable device.

    CUptiResult cuptiMetricCreateEventGroupSets(CUcontext context, size_t metricIdArraySizeBytes, CUpti_MetricID *metricIdArray, CUpti_EventGroupSets **eventGroupPasses)
    For a set of metrics, get the grouping that indicates the number of passes and the event groups necessary to collect the events required for t
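The buffer-size rule in the excerpt above (at least numMetrics * sizeof(CUpti_MetricID) bytes in, bytes written out) can be sketched as two small helpers. This is only an illustration: ExampleMetricID is a local stand-in that assumes the metric ID is a 32-bit integer, and both function names are hypothetical, not part of CUPTI.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for CUpti_MetricID; assumed here to be a 32-bit integer. */
typedef uint32_t ExampleMetricID;

/* Minimum buffer size, in bytes, needed to receive numMetrics metric IDs. */
size_t example_metric_array_bytes(uint32_t numMetrics)
{
    return (size_t)numMetrics * sizeof(ExampleMetricID);
}

/* Number of IDs actually returned, given the byte count written back
 * through arraySizeBytes. */
uint32_t example_metric_ids_returned(size_t bytesWritten)
{
    return (uint32_t)(bytesWritten / sizeof(ExampleMetricID));
}
```

In a real tool the count would come from cuptiGetNumMetrics, and the byte count from the value the enumeration call writes back into arraySizeBytes.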
13. Name of the symbol operated on by the runtime or driver API function which issued the callback. This entry is valid only for driver and runtime launch callbacks, where it returns the name of the kernel.

    3.26. CUpti_EventGroupSet Struct Reference
    A set of event groups. When returned by cuptiEventGroupSetsCreate and cuptiMetricCreateEventGroupSets, a set indicates the event groups that can be enabled at the same time (i.e. all the events in the set can be collected simultaneously).
    CUpti_EventGroup *CUpti_EventGroupSet::eventGroups
    Description: An array of numEventGroups event groups.
    uint32_t CUpti_EventGroupSet::numEventGroups
    Description: The number of event groups in the set.

    3.27. CUpti_EventGroupSets Struct Reference
    A set of event group sets. When returned by cuptiEventGroupSetsCreate and cuptiMetricCreateEventGroupSets, a CUpti_EventGroupSets indicates the number of passes required to collect all the events, and the event groups that should be collected during each pass.
    uint32_t CUpti_EventGroupSets::numSets
    Description: Number of event group sets.
    CUpti_EventGroupSet *CUpti_EventGroupSets::sets
    Description: An array of numSets event group sets.

    3.28. CUpti_MetricValue Union Reference
    A metric value. Metric values can be one of several different kinds. Corresponding to each kind is a me
14. CUPTI_ERROR_INVALID_EVENT_VALUE if any of the event values required for the metric is CUPTI_EVENT_OVERFLOW
    > CUPTI_ERROR_NOT_COMPATIBLE if the computed metric value cannot be represented in the metric's value type. For example, if the metric value type is unsigned and the computed metric value is negative
    > CUPTI_ERROR_INVALID_PARAMETER if metricValue, eventIdArray or eventValueArray is NULL
    Description: Use the events and properties collected for a metric to calculate the metric value. Metric value evaluation depends on the evaluation mode (CUpti_MetricEvaluationMode) that the metric supports. If a metric has evaluation mode CUPTI_METRIC_EVALUATION_MODE_PER_INSTANCE, then it assumes that the input event value is for one domain instance. If a metric has evaluation mode CUPTI_METRIC_EVALUATION_MODE_AGGREGATE, it assumes that input event values are normalized to represent all domain instances on a device. For the most accurate metric collection, the events required for the metric should be collected for all profiled domain instances. For example, to collect all instances of an event, set the CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES attribute on the group containing the event to 1. The normalized value for the event is then: (sum_event_values * totalInstanceCount) / instanceCount, where sum_event_values is the summation of the event values across
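The normalization expression in the excerpt above can be illustrated with a short helper. This is a minimal sketch in plain C: the function name and parameter names are hypothetical, and the per-instance values are assumed to have already been read from the event group.

```c
#include <stddef.h>
#include <stdint.h>

/* Normalize per-instance event values so that the result represents all
 * domain instances on the device:
 *   (sum_event_values * totalInstanceCount) / instanceCount
 * Hypothetical helper, not part of CUPTI. Returns 0 when no instances
 * were profiled. */
uint64_t example_normalize_event_value(const uint64_t *eventValues,
                                       uint32_t instanceCount,
                                       uint32_t totalInstanceCount)
{
    uint64_t sum = 0;

    if (eventValues == NULL || instanceCount == 0) {
        return 0;
    }
    /* Sum the values collected for each profiled domain instance. */
    for (uint32_t i = 0; i < instanceCount; ++i) {
        sum += eventValues[i];
    }
    /* Scale the sum up to the total number of instances on the device. */
    return (sum * totalInstanceCount) / instanceCount;
}
```

For example, two profiled instances with values 10 and 30 on a device with 8 instances give (40 * 8) / 2 = 160. The multiplication is done in 64 bits before the division, so very large sums could overflow; a production tool may want to guard against that.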
15. The source or destination memory is on the device.
    CUPTI_ACTIVITY_MEMORY_KIND_ARRAY = 4: The source or destination memory is an array.
    CUPTI_ACTIVITY_MEMORY_KIND_FORCE_INT = 0x7fffffff

    enum CUpti_ActivityObjectKind
    The kinds of activity objects. See also: CUpti_ActivityObjectKindId
    Values:
    CUPTI_ACTIVITY_OBJECT_UNKNOWN = 0: The object kind is not known.
    CUPTI_ACTIVITY_OBJECT_PROCESS = 1: A process.
    CUPTI_ACTIVITY_OBJECT_THREAD = 2: A thread.
    CUPTI_ACTIVITY_OBJECT_DEVICE = 3: A device.
    CUPTI_ACTIVITY_OBJECT_CONTEXT = 4: A context.
    CUPTI_ACTIVITY_OBJECT_STREAM = 5: A stream.
    CUPTI_ACTIVITY_OBJECT_FORCE_INT = 0x7fffffff

    enum CUpti_ActivityOverheadKind
    The kinds of activity overhead.
    Values:
    CUPTI_ACTIVITY_OVERHEAD_UNKNOWN = 0: The overhead kind is not known.
    CUPTI_ACTIVITY_OVERHEAD_DRIVER_COMPILER = 1: Compiler (JIT) overhead.
    CUPTI_ACTIVITY_OVERHEAD_CUPTI_BUFFER_FLUSH = 1<<16: Activity buffer flush overhead.
    CUPTI_ACTIVITY_OVERHEAD_CUPTI_INSTRUMENTATION = 2<<16: CUPTI instrumentation overhead.
    CUPTI_ACTIVITY_OVERHEAD_CUPTI_RESOURCE = 3<<16: CUPTI resource creation and destruction overhead.
    CUPTI_ACTIVITY_OVERHEAD_FORCE_INT = 0x7fffffff

    enum CUpti_ActivityPreemptionKind
    The kind of a preemption activity.
    Values:
    CUPTI_ACTIVITY_PREEMPTION_KIND_UNKNOWN = 0: The preemption kind is not known.
    CUPTI_ACTIVITY_PREEMPTION_KIND_SAVE = 1: Preemption to save CDP b
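As an illustration of how a tool might interpret the CUpti_ActivityOverheadKind values listed above, the sketch below mirrors those values in a local enum and classifies them. The EX_ names and both functions are hypothetical stand-ins so that the example compiles without the CUPTI headers; real code would use the constants from the CUPTI activity header instead.

```c
#include <stdint.h>

/* Local mirror of the overhead kind values quoted in the excerpt above. */
enum ExampleOverheadKind {
    EX_OVERHEAD_UNKNOWN               = 0,
    EX_OVERHEAD_DRIVER_COMPILER       = 1,
    EX_OVERHEAD_CUPTI_BUFFER_FLUSH    = 1 << 16,
    EX_OVERHEAD_CUPTI_INSTRUMENTATION = 2 << 16,
    EX_OVERHEAD_CUPTI_RESOURCE        = 3 << 16
};

/* Map an overhead kind to a printable name. */
const char *example_overhead_kind_name(uint32_t kind)
{
    switch (kind) {
    case EX_OVERHEAD_DRIVER_COMPILER:       return "DRIVER_COMPILER";
    case EX_OVERHEAD_CUPTI_BUFFER_FLUSH:    return "CUPTI_BUFFER_FLUSH";
    case EX_OVERHEAD_CUPTI_INSTRUMENTATION: return "CUPTI_INSTRUMENTATION";
    case EX_OVERHEAD_CUPTI_RESOURCE:        return "CUPTI_RESOURCE";
    default:                                return "UNKNOWN";
    }
}

/* Return 1 if the overhead is one of the CUPTI-originated kinds, 0 otherwise. */
int example_overhead_is_cupti(uint32_t kind)
{
    return kind == EX_OVERHEAD_CUPTI_BUFFER_FLUSH ||
           kind == EX_OVERHEAD_CUPTI_INSTRUMENTATION ||
           kind == EX_OVERHEAD_CUPTI_RESOURCE;
}
```

A trace viewer could use such a classification to separate profiler-induced overhead from driver JIT compilation time when reporting results.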
16. Unable to allocate enough memory to perform the requested operation.
    CUPTI_ERROR_HARDWARE = 9: An error occurred on the performance monitoring hardware.
    CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT = 10: The output buffer size is not sufficient to return all requested data.
    CUPTI_ERROR_API_NOT_IMPLEMENTED = 11: API is not implemented.
    CUPTI_ERROR_MAX_LIMIT_REACHED = 12: The maximum limit is reached.
    CUPTI_ERROR_NOT_READY = 13: The object is not yet ready to perform the requested operation.
    CUPTI_ERROR_NOT_COMPATIBLE = 14: The current operation is not compatible with the current state of the object.
    CUPTI_ERROR_NOT_INITIALIZED = 15: CUPTI is unable to initialize its connection to the CUDA driver.
    CUPTI_ERROR_INVALID_METRIC_ID = 16: The metric id is invalid.
    CUPTI_ERROR_INVALID_METRIC_NAME = 17: The metric name is invalid.
    CUPTI_ERROR_QUEUE_EMPTY = 18: The queue is empty.
    CUPTI_ERROR_INVALID_HANDLE = 19: Invalid handle (internal?).
    CUPTI_ERROR_INVALID_STREAM = 20: Invalid stream.
    CUPTI_ERROR_INVALID_KIND = 21: Invalid kind.
    CUPTI_ERROR_INVALID_EVENT_VALUE = 22: Invalid event value.
    CUPTI_ERROR_DISABLED = 23: CUPTI is disabled due to conflicts with other enabled profilers.
    CUPTI_ERROR_INVALID_MODULE = 24: Invalid module.
    CUPTI_ERROR_INVALID_METRIC_VALUE = 25: Invalid metric value.
    CUPTI_ERROR_HARDWARE_BUSY = 26: The performance monitoring hardware is in use by another client.
    CUPTI_ERROR_UNKNOWN = 999: An unknown internal error has occurred.
    CU
17. Using the callback API with the CUPTI_CB_DOMAIN_RESOURCE domain, you can associate a callback function with some CUDA resource creation and destruction events. For example, when a CUDA context is created, your callback function will be invoked with a callback ID equal to CUPTI_CBID_RESOURCE_CONTEXT_CREATED. For this domain, the cbdata argument to your callback function will be of the type CUpti_ResourceData.

    1.4.3. Synchronization Callbacks
    Using the callback API with the CUPTI_CB_DOMAIN_SYNCHRONIZE domain, you can associate a callback function with CUDA context and stream synchronizations. For example, when a CUDA context is synchronized, your callback function will be invoked with a callback ID equal to CUPTI_CBID_SYNCHRONIZE_CONTEXT_SYNCHRONIZED. For this domain, the cbdata argument to your callback function will be of the type CUpti_SynchronizeData.

    1.4.4. NVIDIA Tools Extension Callbacks
    Using the callback API with the CUPTI_CB_DOMAIN_NVTX domain, you can associate a callback function with NVIDIA Tools Extension (NVTX) API functions. When an NVTX function is invoked in the application, your callback function is invoked as well. For this domain, the cbdata argument to your callback function will be of the type CUpti_NvtxData. The NVTX library has its own convention for discovering the prof
18. computeApiKind
    Description: The compute API kind. See also: CUpti_ActivityComputeApiKind
    uint32_t CUpti_ActivityContext::contextId
    Description: The context ID.
    uint32_t CUpti_ActivityContext::deviceId
    Description: The device ID.
    CUpti_ActivityKind CUpti_ActivityContext::kind
    Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_CONTEXT.

    3.6. CUpti_ActivityDevice Struct Reference
    The activity record for a device. This activity record represents information about a GPU device (CUPTI_ACTIVITY_KIND_DEVICE).
    uint32_t CUpti_ActivityDevice::computeCapabilityMajor
    Description: Compute capability for the device, major number.
    uint32_t CUpti_ActivityDevice::computeCapabilityMinor
    Description: Compute capability for the device, minor number.
    uint32_t CUpti_ActivityDevice::constantMemorySize
    Description: The amount of constant memory on the device, in bytes.
    uint32_t CUpti_ActivityDevice::coreClockRate
    Description: The core clock rate of the device, in kHz.
    CUpti_ActivityFlag CUpti_ActivityDevice::flags
    Description: The flags associated with the device. See also: CUpti_ActivityFlag
    uint64_t CUpti_ActivityDevice::globalMemoryBandwidth
    Description: The global memory bandwidth available on the device, in kBytes/sec.
    uint64_t CUpti_ActivityDevice::globalMemorySize
    Description: The amount of global memory on the device, in byte
19. or NULL to query the global queue.
    streamId: The stream ID.
    validBufferSizeBytes: Returns the number of bytes in the buffer that contain activity records.
    Returns:
    > CUPTI_SUCCESS
    > CUPTI_ERROR_NOT_INITIALIZED
    > CUPTI_ERROR_INVALID_PARAMETER if buffer or validBufferSizeBytes are NULL
    > CUPTI_ERROR_MAX_LIMIT_REACHED if buffer is full
    > CUPTI_ERROR_QUEUE_EMPTY the queue is empty, validBufferSizeBytes returns 0
    Description: Query the status of the buffer at the head of the queue. See cuptiActivityEnqueueBuffer for a description of queues. Calling this function does not transfer ownership of the buffer.

    CUptiResult cuptiActivityRegisterCallbacks(CUpti_BuffersCallbackRequestFunc funcBufferRequested, CUpti_BuffersCallbackCompleteFunc funcBufferCompleted)
    Registers callback functions with CUPTI for activity buffer handling.
    Parameters:
    funcBufferRequested: callback which is invoked when an empty buffer is requested by CUPTI.
    funcBufferCompleted: callback which is invoked when a buffer containing activity records is available from CUPTI.
    Returns:
    > CUPTI_SUCCESS
    > CUPTI_ERROR_INVALID_PARAMETER if either funcBufferRequested or funcBufferCompleted is NULL
    Description: This function registers two callback functions to be used in asynchronous buffer handling. If registered, activity record buffers are handled using asynchrono
20. tools that target CUDA applications. CUPTI provides four APIs: the Activity API, the Callback API, the Event API, and the Metric API. Using these APIs, you can develop profiling tools that give insight into the CPU and GPU behavior of CUDA applications. CUPTI is delivered as a dynamic library on all platforms supported by CUDA.

    1.1. CUPTI Compatibility and Requirements
    New versions of the CUDA driver are backwards compatible with older versions of CUPTI. For example, a developer using a profiling tool based on CUPTI 4.1 can update to a more recently released CUDA driver. However, new versions of CUPTI are not backwards compatible with older versions of the CUDA driver. For example, a developer using a profiling tool based on CUPTI 4.1 must have a version of the CUDA driver released with CUDA Toolkit 4.1 (or later) installed as well. CUPTI calls will fail with CUPTI_ERROR_NOT_INITIALIZED if the CUDA driver version is not compatible with the CUPTI version.

    1.2. CUPTI Initialization
    CUPTI initialization occurs lazily the first time you invoke any CUPTI function. For the Event, Metric, and Callback APIs there are no requirements on when this initialization must occur (i.e. you can invoke the first CUPTI function at any point). For correct operation, the Activity API does require that CUPTI be initialized before any CUDA driver or runtime API is invoked. See the CUPTI Activity API section for more information on CUPTI initialization requirements for
21. to read and write attributes that control how the buffering API behaves. See the API documentation for more information. The activity_trace_async sample shows how to use the activity buffer API to collect a trace of CPU and GPU activity for a simple application.

    1.4. CUPTI Callback API
    The CUPTI Callback API allows you to register a callback into your own code. Your callback will be invoked when the application being profiled calls a CUDA runtime or driver function, or when certain events occur in the CUDA driver. The following terminology is used by the callback API.

    Callback Domain: Callbacks are grouped into domains to make it easier to associate your callback functions with groups of related CUDA functions or events. There are currently four callback domains, as defined by CUpti_CallbackDomain: a domain for CUDA runtime functions, a domain for CUDA driver functions, a domain for CUDA resource tracking, and a domain for CUDA synchronization notification.

    Callback ID: Each callback is given a unique ID within the corresponding callback domain so that you can identify it within your callback function. The CUDA driver API IDs are defined in cupti_driver_cbid.h and the CUDA runtime API IDs are defined in cupti_runtime_cbid.h. Both of these headers are included for you when you include cupti.h. The CUDA resource callback IDs are defined by CUpti_CallbackIdResource and the CUDA synchro
22. 1.4.1. Driver and Runtime API Callbacks
    1.4.2. Resource Callbacks
    1.4.3. Synchronization Callbacks
    1.4.4. NVIDIA Tools Extension Callbacks
    1.5. CUPTI Event API
    1.5.1. Collecting Kernel Execution Events
    1.5.2. Sampling Events
    1.6. CUPTI Metric API
    1.6.1. Metric Reference - Compute Capability 1.x
    1.6.2. Metric Reference - Compute Capability 2.x
    1.6.3. Metric Reference - Compute Capability 3.x
    1.7. Samples
    Chapter 2. Modules
    2.1. CUPTI Version
    cuptiGetVersion
    2.2. CUPTI Result Codes
    cuptiGetResultString
    2.3. CUPTI Activity API
    CUpti_Activity
    CUpti_ActivityAPI
    CUpti_ActivityBranch
    CUpti_ActivityCdpKernel
    CUpti_ActivityContext
    CUpti_ActivityDevice
    CUpti_ActivityEnvironment
23. CB_DOMAIN_RUNTIME_API && cbid == CUPTI_RUNTIME_TRACE_CBID_cudaMemcpy_v3020) {
        if (cbInfo->callbackSite == CUPTI_API_ENTER) {
            cudaMemcpy_v3020_params *funcParams =
                (cudaMemcpy_v3020_params *)(cbInfo->functionParams);
            size_t count = funcParams->count;
            enum cudaMemcpyKind kind = funcParams->kind;

    In your callback function, you use the CUpti_CallbackDomain and CUpti_CallbackId parameters to determine which CUDA API function invocation is causing this callback. In the example above, we are checking for the CUDA runtime cudaMemcpy function. The cbdata parameter holds a structure of useful information that can be used within the callback. In this case we use the callbackSite member of the structure to detect that the callback is occurring on entry to cudaMemcpy, and we use the functionParams member to access the parameters that were passed to cudaMemcpy. To access the parameters, we first cast functionParams to a structure type corresponding to the cudaMemcpy function. These parameter structures are contained in generated_cuda_runtime_api_meta.h, generated_cuda_meta.h, and a number of other files. When possible, these files are included for you by cupti.h. The callback_event and callback_timestamp samples described on the samples page both show how to use the callback API for the driver and runtime API domains.

    1.4.2. Resource Callbacks
24. uint64_t CUpti_ActivityAPI::end
    Description: The end timestamp for the function, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the function.
    CUpti_ActivityKind CUpti_ActivityAPI::kind
    Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_DRIVER or CUPTI_ACTIVITY_KIND_RUNTIME.
    uint32_t CUpti_ActivityAPI::processId
    Description: The ID of the process where the driver or runtime CUDA function is executing.
    uint32_t CUpti_ActivityAPI::returnValue
    Description: The return value for the function. For a CUDA driver function this will be a CUresult value, and for a CUDA runtime function this will be a cudaError_t value.
    uint64_t CUpti_ActivityAPI::start
    Description: The start timestamp for the function, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the function.
    uint32_t CUpti_ActivityAPI::threadId
    Description: The ID of the thread where the driver or runtime CUDA function is executing.

    3.3. CUpti_ActivityBranch Struct Reference
    The activity record for source-level result branch. This activity record represents the locations of the branches in the source (CUPTI_ACTIVITY_KIND_BRANCH).
    uint32_t CUpti_ActivityBranch::correlationId
    Description: The correl
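The start and end fields above follow a convention worth handling explicitly: when both are 0, no timestamp was collected. A minimal sketch of a duration helper that respects this convention is shown below; the function name is hypothetical and it takes the two raw fields rather than a CUPTI record, so it compiles without the CUPTI headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Compute end - start in nanoseconds for an activity record.
 * Returns 1 and stores the duration in *durationNs when the timestamps
 * are usable; returns 0 when both are 0 (timestamps not collected),
 * when end precedes start, or when durationNs is NULL. */
int example_activity_duration_ns(uint64_t start, uint64_t end,
                                 uint64_t *durationNs)
{
    if (durationNs == NULL) {
        return 0;
    }
    if (start == 0 && end == 0) {
        return 0; /* timestamp information could not be collected */
    }
    if (end < start) {
        return 0; /* inconsistent record; treated as unusable here */
    }
    *durationNs = end - start;
    return 1;
}
```

Rejecting records where end precedes start is a design choice of this sketch, not something the excerpt specifies.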
25. CUpti_ActivityEvent
    CUpti_ActivityEventInstance
    CUpti_ActivityGlobalAccess
    CUpti_ActivityKernel
    CUpti_ActivityKernel2
    CUpti_ActivityMarker
    CUpti_ActivityMarkerData
    CUpti_ActivityMemcpy
    CUpti_ActivityMemcpy2
    CUpti_ActivityMemset
    CUpti_ActivityMetric
    CUpti_ActivityMetricInstance
    CUpti_ActivityName
    CUpti_ActivityObjectKindId
    CUpti_ActivityOverhead
    CUpti_ActivityPreemption
    CUpti_ActivitySourceLocator
    CUpti_ActivityAttribute
    CUpti_ActivityComputeApiKind
26. Returns: CUPTI_SUCCESS on success; CUPTI_ERROR_INVALID_PARAMETER if version is NULL. Description: Return the API version in version. See also: CUPTI_API_VERSION. #define CUPTI_API_VERSION 4. The API version for this implementation of CUPTI. This define, along with cuptiGetVersion, can be used to dynamically detect if the version of CUPTI compiled against matches the version of the loaded CUPTI library. v1: CUDAToolsSDK 4.0; v2: CUDAToolsSDK 4.1; v3: CUDA Toolkit 5.0; v4: CUDA Toolkit 5.5. 2.2. CUPTI Result Codes. Error and result codes returned by CUPTI functions. enum CUptiResult: CUPTI result codes. Values: CUPTI_SUCCESS = 0: No error. CUPTI_ERROR_INVALID_PARAMETER = 1: One or more of the parameters is invalid. CUPTI_ERROR_INVALID_DEVICE = 2: The device does not correspond to a valid CUDA device. CUPTI_ERROR_INVALID_CONTEXT = 3: The context is NULL or not valid. CUPTI_ERROR_INVALID_EVENT_DOMAIN_ID = 4: The event domain id is invalid. CUPTI_ERROR_INVALID_EVENT_ID = 5: The event id is invalid. CUPTI_ERROR_INVALID_EVENT_NAME = 6: The event name is invalid. CUPTI_ERROR_INVALID_OPERATION = 7: The current operation cannot be performed due to dependency on other factors. CUPTI_ERROR_OUT_OF_MEMORY = 8
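The version table above (v1 through v4) and the compiled-versus-loaded comparison can be sketched as follows. This is a self-contained illustration that does not include cupti.h: both function names are hypothetical, and in a real program the two arguments of sketch_version_matches would come from CUPTI_API_VERSION and from the value written by cuptiGetVersion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: maps an API version number to the release named in
 * the version table above. Versions outside that table yield "unknown". */
const char *sketch_release_for_api_version(uint32_t version)
{
    switch (version) {
    case 1: return "CUDAToolsSDK 4.0";
    case 2: return "CUDAToolsSDK 4.1";
    case 3: return "CUDA Toolkit 5.0";
    case 4: return "CUDA Toolkit 5.5";
    default: return "unknown";
    }
}

/* Hypothetical helper: true when the version compiled against equals the
 * version reported by the loaded library. With CUPTI available, `compiled`
 * would be CUPTI_API_VERSION and `loaded` the output of cuptiGetVersion. */
bool sketch_version_matches(uint32_t compiled, uint32_t loaded)
{
    return compiled == loaded;
}
```

Reporting both release names on a mismatch tends to make the resulting error message easier to act on than printing the raw integers alone.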
27. UNKNOWN 0: An invalid/unknown correlation ID. A correlation ID of this value indicates that there is no correlation for the activity record. #define CUPTI_GRID_ID_UNKNOWN 0LL: An invalid/unknown grid ID. #define CUPTI_SOURCE_LOCATOR_ID_UNKNOWN 0: The source locator ID that indicates an unknown source location. There is not an actual CUpti_ActivitySourceLocator object corresponding to this value. #define CUPTI_TIMESTAMP_UNKNOWN 0LL: An invalid/unknown timestamp for a start, end, queued, submitted, or completed time. 2.4. CUPTI Callback API. Functions, types, and enums that implement the CUPTI Callback API. struct CUpti_CallbackData: Data passed into a runtime or driver API callback function. struct CUpti_NvtxData: Data passed into a NVTX callback function. struct CUpti_ResourceData: Data passed into a resource callback function. struct CUpti_SynchronizeData: Data passed into a synchronize callback function. enum CUpti_ApiCallbackSite: Specifies the point in an API call that a callback is issued. This value is communicated to the callback function via CUpti_CallbackData::callbackSite. Values: CUPTI_API_ENTER = 0: The callback is at the entry of the API call. CUPTI_API_EXIT = 1: The callback is at the exit of the API call. CUPTI_API_CBSITE_FORCE_INT = 0x7fffffff. enum CUpti_CallbackDomain: Callba
28. Undefined. Reserved for internal use. CUpti_ActivityPreemptionKind CUpti_ActivityPreemption::preemptionKind. Description: Kind of the preemption. uint64_t CUpti_ActivityPreemption::timestamp. Description: The timestamp of the preemption, in ns. A value of 0 indicates that timestamp information could not be collected for the preemption. 3.24. CUpti_ActivitySourceLocator Struct Reference. The activity record for source locator. This activity record represents a source locator (CUPTI_ACTIVITY_KIND_SOURCE_LOCATOR). const char *CUpti_ActivitySourceLocator::fileName. Description: The path for the file. uint32_t CUpti_ActivitySourceLocator::id. Description: The ID for the source path, will be used in all the source level results. CUpti_ActivityKind CUpti_ActivitySourceLocator::kind. Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_SOURCE_LOCATOR. uint32_t CUpti_ActivitySourceLocator::lineNumber. Description: The line number in the source. 3.25. CUpti_CallbackData Struct Reference. Data passed into a runtime or driver API callback function as the cbdata argument to CUpti_CallbackFunc. The cbdata will be this type for domain equal to CUPTI_CB_DOMAIN_DRIVER_API or CUPTI_CB_DOMAIN_RUNTIME_API. The callback data is valid only within the invocation of the callback function that is p
29. and destination targets of a memory copy. Targets are host, device, and array. Values: CUPTI_ACTIVITY_MEMCPY_KIND_UNKNOWN = 0: The memory copy kind is not known. CUPTI_ACTIVITY_MEMCPY_KIND_HTOD = 1: A host to device memory copy. CUPTI_ACTIVITY_MEMCPY_KIND_DTOH = 2: A device to host memory copy. CUPTI_ACTIVITY_MEMCPY_KIND_HTOA = 3: A host to device array memory copy. CUPTI_ACTIVITY_MEMCPY_KIND_ATOH = 4: A device array to host memory copy. CUPTI_ACTIVITY_MEMCPY_KIND_ATOA = 5: A device array to device array memory copy. CUPTI_ACTIVITY_MEMCPY_KIND_ATOD = 6: A device array to device memory copy. CUPTI_ACTIVITY_MEMCPY_KIND_DTOA = 7: A device to device array memory copy. CUPTI_ACTIVITY_MEMCPY_KIND_DTOD = 8: A device to device memory copy. CUPTI_ACTIVITY_MEMCPY_KIND_HTOH = 9: A host to host memory copy. CUPTI_ACTIVITY_MEMCPY_KIND_PTOP = 10: A peer to peer memory copy. CUPTI_ACTIVITY_MEMCPY_KIND_FORCE_INT = 0x7fffffff. enum CUpti_ActivityMemoryKind: The kinds of memory accessed by a memory copy. Each kind represents the type of the source or destination memory accessed by a memory copy. Values: CUPTI_ACTIVITY_MEMORY_KIND_UNKNOWN = 0: The source or destination memory kind is unknown. CUPTI_ACTIVITY_MEMORY_KIND_PAGEABLE = 1: The source or destination memory is pageable. CUPTI_ACTIVITY_MEMORY_KIND_PINNED = 2: The source or destination memory is pinned. CUPTI_ACTIVITY_MEMORY_KIND_DEVICE = 3
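The memory copy kinds listed above each encode a source and a destination target. As a rough illustration, the lookup below turns the numeric kind (0 through 10, as enumerated above) into a pair of human-readable endpoints. The SketchMemcpyEndpoints type and the function name are hypothetical and not part of CUPTI; the table is indexed by the enum values given in the text.

```c
#include <stddef.h>

/* Hypothetical result type: textual source and destination of a copy. */
typedef struct {
    const char *src;
    const char *dst;
} SketchMemcpyEndpoints;

/* Hypothetical helper: indexes a table by the numeric memcpy kind listed
 * above. Out-of-range kinds fall back to the "unknown" entry. */
SketchMemcpyEndpoints sketch_memcpy_endpoints(unsigned kind)
{
    static const SketchMemcpyEndpoints table[] = {
        { "unknown",      "unknown"      },  /*  0 UNKNOWN */
        { "host",         "device"       },  /*  1 HTOD    */
        { "device",       "host"         },  /*  2 DTOH    */
        { "host",         "device array" },  /*  3 HTOA    */
        { "device array", "host"         },  /*  4 ATOH    */
        { "device array", "device array" },  /*  5 ATOA    */
        { "device array", "device"       },  /*  6 ATOD    */
        { "device",       "device array" },  /*  7 DTOA    */
        { "device",       "device"       },  /*  8 DTOD    */
        { "host",         "host"         },  /*  9 HTOH    */
        { "peer",         "peer"         }   /* 10 PTOP    */
    };
    if (kind >= sizeof table / sizeof table[0]) {
        return table[0];
    }
    return table[kind];
}
```

A table keeps the mapping next to the enum order, so a newly added kind only needs one extra row.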
30. callback function. CUpti_ResourceData: Data passed into a resource callback function. CUpti_SynchronizeData: Data passed into a synchronize callback function. 3.1. CUpti_Activity Struct Reference. The base activity record. The activity API uses a CUpti_Activity as a generic representation for any activity. The kind field is used to determine the specific activity kind, and from that the CUpti_Activity object can be cast to the specific activity record type appropriate for that kind. Note that all activity record types are padded and aligned to ensure that each member of the record is naturally aligned. See also: CUpti_ActivityKind. CUpti_ActivityKind CUpti_Activity::kind. Description: The kind of this activity. 3.2. CUpti_ActivityAPI Struct Reference. The activity record for a driver or runtime API invocation. This activity record represents an invocation of a driver or runtime API (CUPTI_ACTIVITY_KIND_DRIVER and CUPTI_ACTIVITY_KIND_RUNTIME). CUpti_CallbackId CUpti_ActivityAPI::cbid. Description: The ID of the driver or runtime function. uint32_t CUpti_ActivityAPI::correlationId. Description: The correlation ID of the driver or runtime CUDA function. Each function invocation is assigned a unique correlation ID that is identical to the correlation ID in the memcpy, memset, or kernel activity record that is associated with this function.
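The "inspect kind, then cast" pattern described above for CUpti_Activity can be sketched without the CUPTI headers. Everything below is a stand-in: the Sketch* types, their field layouts, and the SKETCH_KIND_* values are hypothetical and only mirror the idea that every record type begins with the same kind field. Real code would switch on CUpti_ActivityKind values and cast to the CUPTI record types.

```c
#include <stdint.h>

/* Hypothetical kind values; real code uses CUpti_ActivityKind. */
enum { SKETCH_KIND_API = 1, SKETCH_KIND_KERNEL = 2 };

/* Stand-in for the generic record: only the kind field. */
typedef struct {
    uint32_t kind;
} SketchActivity;

/* Stand-ins for two specific record types. Each starts with kind. */
typedef struct {
    uint32_t kind;
    uint32_t correlationId;
    uint64_t start;
    uint64_t end;
} SketchActivityAPI;

typedef struct {
    uint32_t kind;
    uint32_t correlationId;
    int32_t gridX, gridY, gridZ;
} SketchActivityKernel;

/* Reads the kind from the generic view, then casts to the matching
 * specific type. Returns 0 for kinds this sketch does not handle. */
uint32_t sketch_correlation_id(const SketchActivity *record)
{
    switch (record->kind) {
    case SKETCH_KIND_API:
        return ((const SketchActivityAPI *)record)->correlationId;
    case SKETCH_KIND_KERNEL:
        return ((const SketchActivityKernel *)record)->correlationId;
    default:
        return 0;
    }
}
```

Because an API record and the kernel it launches carry the same correlation ID, a helper of this shape is one way to join the two kinds of record after collection.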
31. callbacks. subscriber: Handle to callback subscription. domain: The domain of the callback. Returns: CUPTI_SUCCESS on success; CUPTI_ERROR_NOT_INITIALIZED if unable to initialize CUPTI; CUPTI_ERROR_INVALID_PARAMETER if subscriber or domain is invalid. Description: Enable or disable all callbacks for a specific domain. Thread safety: a subscriber must serialize access to cuptiGetCallbackState, cuptiEnableCallback, cuptiEnableDomain, and cuptiEnableAllDomains. For example, if cuptiGetCallbackState and cuptiEnableDomain are called concurrently for the same subscriber sub and domain d, the results are undefined. CUptiResult cuptiGetCallbackName(CUpti_CallbackDomain domain, uint32_t cbid, const char **name). Get the name of a callback for a specific domain and callback ID. Parameters: domain: The domain of the callback. cbid: The ID of the callback. name: Returns pointer to the name string on success, NULL otherwise. Returns: CUPTI_SUCCESS on success; CUPTI_ERROR_INVALID_PARAMETER if name is NULL, or if domain or cbid is invalid. Description: Returns a pointer to the name c-string in name. Names are available only for the DRIVER and RUNTIME domains. CUptiResult cuptiGetCallbackState(uint32_t *enable, CUpti_SubscriberHandle subscriber, CUpti_CallbackDomain domain, CUpti_CallbackId cb
32. uint32_t CUpti_ActivityKernel::correlationId. Description: The correlation ID of the kernel. Each kernel execution is assigned a unique correlation ID that is identical to the correlation ID in the driver API activity record that launched the kernel. uint32_t CUpti_ActivityKernel::deviceId. Description: The ID of the device where the kernel is executing. int32_t CUpti_ActivityKernel::dynamicSharedMemory. Description: The dynamic shared memory reserved for the kernel, in bytes. uint64_t CUpti_ActivityKernel::end. Description: The end timestamp for the kernel execution, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the kernel. int32_t CUpti_ActivityKernel::gridX. Description: The X-dimension grid size for the kernel. int32_t CUpti_ActivityKernel::gridY. Description: The Y-dimension grid size for the kernel. int32_t CUpti_ActivityKernel::gridZ. Description: The Z-dimension grid size for the kernel. CUpti_ActivityKind CUpti_ActivityKernel::kind. Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_KERNEL or CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL. uint32_t CUpti_ActivityKernel::localMemoryPerThread. Description: The amount of local memory reserved for each thread, in bytes. uint32_t CUpti_ActivityKernel::local
33. completion callback. Parameters: flag: Reserved, must be 0. Returns: CUPTI_SUCCESS; CUPTI_ERROR_NOT_INITIALIZED; CUPTI_ERROR_INVALID_OPERATION if not preceded by a successful call to cuptiActivityRegisterCallbacks; CUPTI_ERROR_UNKNOWN if an internal error occurred. Description: This function does not return until all activity records associated with all contexts and streams (and the global buffers not associated with any stream) are returned to the CUPTI client using the callback registered in cuptiActivityRegisterCallbacks. To ensure that all activity records are complete, the requested stream(s), if any, are synchronized. Before calling this function, the buffer handling callback API must be activated by calling cuptiActivityRegisterCallbacks. CUptiResult cuptiActivityGetAttribute(CUpti_ActivityAttribute attr, size_t *valueSize, void *value). Read an activity API attribute. Parameters: attr: The attribute to read. valueSize: Size of the buffer pointed to by value; on return, holds the number of bytes written to value. value: Returns the value of the attribute. Returns: CUPTI_SUCCESS; CUPTI_ERROR_NOT_INITIALIZED; CUPTI_ERROR_INVALID_PARAMETER if valueSize or value is NULL, or if attr is not an activity attribute; CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT: Indicates that the value buffer is too small to hold the attribute
34. cuptiEnumMetrics function. CUPTI provides two functions for calculating a metric value. cuptiMetricGetValue2 can be used to calculate a metric value when the device is not available. All required event values and metric properties must be provided by the caller. cuptiMetricGetValue can be used to calculate a metric value when the device is available as a CUdevice object. All required event values must be provided by the caller, but CUPTI will determine the appropriate property values from the CUdevice object. Configuring and calculating metric values requires the following steps. First, determine the name of the metric that you want to collect, and then use cuptiMetricGetIdFromName to get the metric ID. Use cuptiMetricEnumEvents to get the events required to calculate the metric, and follow the instructions in the CUPTI Event API section to create the event groups for those events. Alternatively, you can use the cuptiMetricCreateEventGroupSets function to automatically create the event group(s) required for the metric's events. If you are using cuptiMetricGetValue2, then you must also collect the required metric property values using cuptiMetricEnumProperties. Collect event counts as described in the CUPTI Event API section, and then use either cuptiMetricGetValue or cuptiMetricGetValue2 to calculate the metric value from the collected event and property values. The callback_metric sample described on the samples p
36. here... } else if (status == CUPTI_ERROR_MAX_LIMIT_REACHED) break; else { goto Error; } } while (1); CUptiResult cuptiActivityGetNumDroppedRecords(CUcontext context, uint32_t streamId, size_t *dropped). Get the number of activity records that were dropped because of insufficient buffer space. Parameters: context: The context, or NULL to get the dropped count from the global queue. streamId: The stream ID. dropped: The number of records that were dropped since the last call to this function. Returns: CUPTI_SUCCESS; CUPTI_ERROR_NOT_INITIALIZED; CUPTI_ERROR_INVALID_PARAMETER if dropped is NULL. Description: Get the number of records that were dropped because of insufficient buffer space. The dropped count includes records that could not be recorded because CUPTI did not have activity buffer space available for the record (because the CUpti_BuffersCallbackRequestFunc callback did not return an empty buffer of sufficient size), and also CDP records that could not be recorded because the device-size buffer was full (size is controlled by the CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE_CDP attribute). The dropped count maintained for the queue is reset to zero when this function is called. CUptiResult cuptiActivityQueryBuffer(CUcontext context, uint32_t streamId, size_t *validBufferSizeBytes). Query the status of the buffer at the head of a queue. Parameters: context: The context
39. represents the general type of the metric. A metric's category is accessed using cuptiMetricGetAttribute and the CUPTI_METRIC_ATTR_CATEGORY attribute. Values: CUPTI_METRIC_CATEGORY_MEMORY = 0: A memory related metric. CUPTI_METRIC_CATEGORY_INSTRUCTION = 1: An instruction related metric. CUPTI_METRIC_CATEGORY_MULTIPROCESSOR = 2: A multiprocessor related metric. CUPTI_METRIC_CATEGORY_CACHE = 3: A cache related metric. CUPTI_METRIC_CATEGORY_TEXTURE = 4: A texture related metric. CUPTI_METRIC_CATEGORY_FORCE_INT = 0x7fffffff. enum CUpti_MetricEvaluationMode: A metric evaluation mode. A metric can be evaluated per hardware instance to know the load balancing across instances of a domain, or the metric can be evaluated in aggregate mode when the events involved in metric evaluation are from different event domains. It might be possible to evaluate some metrics in both modes for convenience. A metric's evaluation mode is accessed using CUpti_MetricEvaluationMode and the CUPTI_METRIC_ATTR_EVALUATION_MODE attribute. Values: CUPTI_METRIC_EVALUATION_MODE_PER_INSTANCE = 1: If this bit is set, the metric can be profiled for each instance of the domain. The event values passed to cuptiMetricGetValue can contain values for one instance of the domain, and cuptiMetricGetValue can be called for each instance. CUPTI_METRIC_EVALUATION_MODE_AGGREGATE = 1 << 1: If this bit is set, the metric can be profil
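Because the evaluation mode values above are bits rather than exclusive choices, a metric that supports both modes has both bits set. A minimal sketch of testing them follows. The two enum constants are local stand-ins that copy the values given above so the snippet compiles without cupti.h (drop them when building against CUPTI); the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring the values listed above. */
enum {
    CUPTI_METRIC_EVALUATION_MODE_PER_INSTANCE = 1,
    CUPTI_METRIC_EVALUATION_MODE_AGGREGATE = 1 << 1
};

/* Hypothetical helper: true if the metric can be profiled per instance. */
bool sketch_supports_per_instance(uint32_t mode)
{
    return (mode & CUPTI_METRIC_EVALUATION_MODE_PER_INSTANCE) != 0;
}

/* Hypothetical helper: true if the metric can be profiled in aggregate. */
bool sketch_supports_aggregate(uint32_t mode)
{
    return (mode & CUPTI_METRIC_EVALUATION_MODE_AGGREGATE) != 0;
}
```

Testing each bit independently, instead of comparing the whole value for equality, handles metrics that report both modes at once.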
41. the activity API. 1.3. CUPTI Activity API. The CUPTI Activity API allows you to asynchronously collect a trace of an application's CPU and GPU CUDA activity. The following terminology is used by the activity API. Activity Record: CPU and GPU activity is reported in C data structures called activity records. There is a different C structure type for each activity kind (e.g. CUpti_ActivityMemcpy). Records are generically referred to using the CUpti_Activity type. This type contains only a kind field that indicates the kind of the activity record. Using this kind, the object can be cast from the generic CUpti_Activity type to the specific type representing the activity. See the printActivity function in the activity_trace_async sample for an example. Activity Buffer: An activity buffer is used to transfer one or more activity records from CUPTI to the client. CUPTI fills activity buffers with activity records as the corresponding activities occur on the CPU and GPU. The CUPTI client is responsible for providing empty activity buffers as necessary to ensure that no records are dropped. This section describes the new asynchronous buffering API implemented by cuptiActivityRegisterCallbacks, cuptiActivityFlush, and cuptiActivityFlushAll. The old buffering API implemented by cuptiActivityEnqueueBuffer and cuptiActivityDequeueBuffer is still supported, but is deprecated and will be removed i
42. the memory being copied to. uint32_t CUpti_ActivityMemcpy2::dstDeviceId. Description: The ID of the device where memory is being copied to. uint8_t CUpti_ActivityMemcpy2::dstKind. Description: The destination memory kind read by the memory copy, stored as a byte to reduce record size. See also: CUpti_ActivityMemoryKind. uint64_t CUpti_ActivityMemcpy2::end. Description: The end timestamp for the memory copy, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the memory copy. uint8_t CUpti_ActivityMemcpy2::flags. Description: The flags associated with the memory copy. See also: CUpti_ActivityFlag. CUpti_ActivityKind CUpti_ActivityMemcpy2::kind. Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_MEMCPY2. uint32_t CUpti_ActivityMemcpy2::pad. Description: Undefined. Reserved for internal use. void *CUpti_ActivityMemcpy2::reserved0. Description: Undefined. Reserved for internal use. uint32_t CUpti_ActivityMemcpy2::srcContextId. Description: The ID of the context owning the memory being copied from. uint32_t CUpti_ActivityMemcpy2::srcDeviceId. Description: The ID of the device where memory is being copied from. uint8_t CUpti_ActivityMemcpy2::srcKind. Description: The source memory kind read
44. uint32_t CUpti_ActivityEventInstance::instance. Description: The event domain instance. CUpti_ActivityKind CUpti_ActivityEventInstance::kind. Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_EVENT_INSTANCE. uint32_t CUpti_ActivityEventInstance::pad. Description: Undefined. Reserved for internal use. uint64_t CUpti_ActivityEventInstance::value. Description: The event value. 3.10. CUpti_ActivityGlobalAccess Struct Reference. The activity record for source-level global access. This activity records the locations of the global accesses in the source (CUPTI_ACTIVITY_KIND_GLOBAL_ACCESS). uint32_t CUpti_ActivityGlobalAccess::correlationId. Description: The correlation ID of the kernel to which this result is associated. uint32_t CUpti_ActivityGlobalAccess::executed. Description: The number of times this instruction was executed. CUpti_ActivityFlag CUpti_ActivityGlobalAccess::flags. Description: The properties of this global access. CUpti_ActivityKind CUpti_ActivityGlobalAccess::kind. Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_GLOBAL_ACCESS. uint64_t CUpti_ActivityGlobalAccess::l2_transactions. Description: The total number of 32-byte transactions to L2 cache generated by this access. uint32_t CUpti_ActivityGlobalAccess::pcOffset. Description: The pc offset for the access. uin
45. 00: No clock throttling. CUPTI_CLOCKS_THROTTLE_REASON_FORCE_INT = 0x7fffffff. typedef void (*CUpti_BuffersCallbackCompleteFunc)(CUcontext context, uint32_t streamId, uint8_t *buffer, size_t size, size_t validSize). Function type for callback used by CUPTI to return a buffer of activity records. This callback function returns to the CUPTI client a buffer containing activity records. The buffer contains validSize bytes of activity records, which should be read using cuptiActivityGetNextRecord. The number of dropped records can be read using cuptiActivityGetNumDroppedRecords. After this call CUPTI relinquishes ownership of the buffer and will not use it anymore. The client may return the buffer to CUPTI using the CUpti_BuffersCallbackRequestFunc callback. typedef void (*CUpti_BuffersCallbackRequestFunc)(uint8_t **buffer, size_t *size, size_t *maxNumRecords). Function type for callback used by CUPTI to request an empty buffer for storing activity records. This callback function signals the CUPTI client that an activity buffer is needed by CUPTI. The activity buffer is used by CUPTI to store activity records. The callback function can decline the request by setting *buffer to NULL. In this case CUPTI may drop activity records. CUptiResult cuptiActivityDequeueBuffer(CUcontext context, uint32_t streamId, uint8_t **buffer, size_t *validBufferSizeBytes). Dequeue a buffer containing activity record
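The two buffer callbacks described above can be sketched as a matched pair: one hands CUPTI an empty buffer, the other takes it back. The sketch below compiles without cupti.h, so several things are assumptions: the function names and SKETCH_BUFFER_SIZE are hypothetical, void * stands in for CUcontext, and writing 0 to maxNumRecords reflects my understanding that 0 asks CUPTI to place as many records as fit, which the text above does not state.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Arbitrary buffer size chosen for this sketch. */
#define SKETCH_BUFFER_SIZE ((size_t)32 * 1024)

/* Same parameter shape as CUpti_BuffersCallbackRequestFunc. If malloc
 * fails, *buffer is NULL, which declines the request; CUPTI may then
 * drop activity records. */
void sketch_buffer_requested(uint8_t **buffer, size_t *size,
                             size_t *maxNumRecords)
{
    *buffer = malloc(SKETCH_BUFFER_SIZE);
    *size = (*buffer != NULL) ? SKETCH_BUFFER_SIZE : 0;
    *maxNumRecords = 0; /* assumption: 0 means "as many as fit" */
}

/* Same parameter shape as CUpti_BuffersCallbackCompleteFunc, with void *
 * standing in for CUcontext. Real code would walk the validSize bytes
 * with cuptiActivityGetNextRecord before releasing the buffer. */
void sketch_buffer_completed(void *context, uint32_t streamId,
                             uint8_t *buffer, size_t size, size_t validSize)
{
    (void)context;
    (void)streamId;
    (void)size;
    (void)validSize;
    free(buffer); /* CUPTI no longer owns the buffer at this point */
}
```

Freeing in the completion callback is the simplest ownership scheme; a client that wants to avoid repeated allocation could instead keep completed buffers in a pool and hand them back from the request callback.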
46. uint32_t CUpti_ActivityMemcpy::deviceId. Description: The ID of the device where the memory copy is occurring. uint8_t CUpti_ActivityMemcpy::dstKind. Description: The destination memory kind read by the memory copy, stored as a byte to reduce record size. See also: CUpti_ActivityMemoryKind. uint64_t CUpti_ActivityMemcpy::end. Description: The end timestamp for the memory copy, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the memory copy. uint8_t CUpti_ActivityMemcpy::flags. Description: The flags associated with the memory copy. See also: CUpti_ActivityFlag. CUpti_ActivityKind CUpti_ActivityMemcpy::kind. Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_MEMCPY. void *CUpti_ActivityMemcpy::reserved0. Description: Undefined. Reserved for internal use. uint32_t CUpti_ActivityMemcpy::runtimeCorrelationId. Description: The runtime correlation ID of the memory copy. Each memory copy is assigned a unique runtime correlation ID that is identical to the correlation ID in the runtime API activity record that launched the memory copy. uint8_t CUpti_ActivityMemcpy::srcKind. Description: The source memory kind read by the memory copy, stored as a byte to reduce record size. See also: CUpti_ActivityMemoryKind. uint64
47. uint64_t CUpti_ActivityKernel2::end. Description: The end timestamp for the kernel execution, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the kernel. uint8_t CUpti_ActivityKernel2::executed. Description: The cache configuration used for the kernel. The value is one of the CUfunc_cache enumeration values from cuda.h. int64_t CUpti_ActivityKernel2::gridId. Description: The grid ID of the kernel. Each kernel is assigned a unique grid ID at runtime. int32_t CUpti_ActivityKernel2::gridX. Description: The X-dimension grid size for the kernel. int32_t CUpti_ActivityKernel2::gridY. Description: The Y-dimension grid size for the kernel. int32_t CUpti_ActivityKernel2::gridZ. Description: The Z-dimension grid size for the kernel. CUpti_ActivityKind CUpti_ActivityKernel2::kind. Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_KERNEL or CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL. uint32_t CUpti_ActivityKernel2::localMemoryPerThread. Description: The amount of local memory reserved for each thread, in bytes. uint32_t CUpti_ActivityKernel2::localMemoryTotal. Description: The total amount of local memory reserved for the kernel, in bytes. const char *CUpti_ActivityKernel2::name. Description: The name of the kernel. This name is shared ac
48. ACTIVITY_KIND_MARKER_DATA. CUPTI_ACTIVITY_FLAG_MARKER_COLOR_ARGB = 1 << 1: Indicates the activity represents a marker that specifies a color in alpha-red-green-blue format. Valid for CUPTI_ACTIVITY_KIND_MARKER_DATA. CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_SIZE_MASK = 0xFF << 0: The number of bytes requested by each thread. Valid for CUpti_ActivityGlobalAccess. CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_LOAD = 1 << 8: If this bit in the flag is set, the access was a load; otherwise it is a store access. Valid for CUpti_ActivityGlobalAccess. CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_CACHED = 1 << 9: If this bit in the flag is set, the load access was cached; otherwise it is uncached. Valid for CUpti_ActivityGlobalAccess. CUPTI_ACTIVITY_FLAG_METRIC_OVERFLOWED = 1 << 0: If this bit in the flag is set, the metric value overflowed. Valid for CUpti_ActivityMetric. CUPTI_ACTIVITY_FLAG_METRIC_VALUE_INVALID = 1 << 1: If this bit in the flag is set, the metric value couldn't be calculated. This occurs when a value(s) required to calculate the metric is missing. Valid for CUpti_ActivityMetric. CUPTI_ACTIVITY_FLAG_FORCE_INT = 0x7fffffff. enum CUpti_ActivityKind: The kinds of activity records. Each activity record kind represents information about a GPU or an activity occurring on a CPU or GPU. Each kind is associated with an activity record structure that holds the information associated with the kind. See also: CUpti_Activity. CUpti_Activ
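The global access flags above pack three facts into one word: the per-thread request size in the low byte, a load/store bit, and a cached bit that only applies to loads. A minimal decoding sketch follows. The three enum constants are local stand-ins copying the values listed above so the snippet compiles without cupti.h (drop them when building against CUPTI); SketchGlobalAccess and the function name are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring the flag values listed above. */
enum {
    CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_SIZE_MASK = 0xFF << 0,
    CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_LOAD = 1 << 8,
    CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_CACHED = 1 << 9
};

/* Hypothetical decoded view of a global access flags word. */
typedef struct {
    uint32_t bytes_per_thread;
    bool is_load;   /* false means the access was a store */
    bool is_cached; /* only meaningful for loads */
} SketchGlobalAccess;

/* Splits the flags word into its three components. The cached bit is
 * reported only for loads, matching the description above. */
SketchGlobalAccess sketch_decode_global_access(uint32_t flags)
{
    SketchGlobalAccess info;
    info.bytes_per_thread =
        flags & CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_SIZE_MASK;
    info.is_load =
        (flags & CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_LOAD) != 0;
    info.is_cached = info.is_load &&
        (flags & CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_CACHED) != 0;
    return info;
}
```

For example, a flags word of 4 with the load and cached bits set would decode to a cached 4-byte load per thread.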
49. CUpti_EventDomainID CUpti_ActivityEvent::domain. Description: The event domain ID. CUpti_EventID CUpti_ActivityEvent::id. Description: The event ID. CUpti_ActivityKind CUpti_ActivityEvent::kind. Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_EVENT. uint64_t CUpti_ActivityEvent::value. Description: The event value. 3.9. CUpti_ActivityEventInstance Struct Reference. The activity record for a CUPTI event with instance information. This activity record represents a CUPTI event value for a specific event domain instance (CUPTI_ACTIVITY_KIND_EVENT_INSTANCE). This activity record kind is not produced by the activity API but is included for completeness and ease of use. Profile frameworks built on top of CUPTI that collect event data may choose to use this type to store the collected event data. This activity record should be used when event domain instance information needs to be associated with the event. uint32_t CUpti_ActivityEventInstance::correlationId. Description: The correlation ID of the event. Use of this ID is user-defined, but typically this ID value will equal the correlation ID of the kernel for which the event was gathered. CUpti_EventDomainID CUpti_ActivityEventInstance::domain. Description: The event domain ID. CUpti_EventID CUpti_ActivityEventInstance::id. Description: The event ID.
50. ERNEL = 18: A CDP (CUDA Dynamic Parallel) kernel executing on the GPU. The corresponding activity record structure is CUpti_ActivityCdpKernel. This activity cannot be directly enabled or disabled. It is enabled and disabled through concurrent kernel activity, CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL.
CUPTI_ACTIVITY_KIND_PREEMPTION = 19: Preemption activity record indicating a preemption of a CDP (CUDA Dynamic Parallel) kernel executing on the GPU. The corresponding activity record structure is CUpti_ActivityPreemption.
CUPTI_ACTIVITY_KIND_ENVIRONMENT = 20: Environment activity records indicating power, clock, thermal, etc. levels of the GPU. The corresponding activity record structure is CUpti_ActivityEnvironment.
CUPTI_ACTIVITY_KIND_EVENT_INSTANCE = 21: An event value associated with a specific event domain instance. The corresponding activity record structure is CUpti_ActivityEventInstance.
CUPTI_ACTIVITY_KIND_MEMCPY2 = 22: A peer-to-peer memory copy. The corresponding activity record structure is CUpti_ActivityMemcpy2.
CUPTI_ACTIVITY_KIND_METRIC_INSTANCE = 23: A metric value associated with a specific metric domain instance. The corresponding activity record structure is CUpti_ActivityMetricInstance.
CUPTI_ACTIVITY_KIND_FORCE_INT = 0x7fffffff
enum CUpti_ActivityMemcpyKind
The kind of a memory copy, indicating the source and destination targets of the copy. Each kind represents the source
51. Environment::clocksThrottleReasons: The clocks throttle reasons.
CUpti_ActivityEnvironment::cooling: Data returned for the CUPTI_ACTIVITY_ENVIRONMENT_COOLING environment kind.
uint32_t CUpti_ActivityEnvironment::deviceId: The ID of the device.
CUpti_ActivityEnvironmentKind CUpti_ActivityEnvironment::environmentKind: The kind of data reported in this record.
uint32_t CUpti_ActivityEnvironment::fanSpeed: The fan speed as a percentage of maximum.
uint32_t CUpti_ActivityEnvironment::gpuTemperature: The GPU temperature in degrees C.
CUpti_ActivityKind CUpti_ActivityEnvironment::kind: The activity record kind; must be CUPTI_ACTIVITY_KIND_ENVIRONMENT.
uint32_t CUpti_ActivityEnvironment::memoryClock: The memory frequency in MHz.
uint32_t CUpti_ActivityEnvironment::pcieLinkGen: The PCIe link generation.
uint32_t CUpti_ActivityEnvironment::pcieLinkWidth: The PCIe link width.
CUpti_ActivityEnvironment::power: Data returned for the CUPTI_ACTIVITY_ENVIRONMENT_POWER environment kind.
uint32_t CUpti_ActivityEnvironment::power: The power in milliwatts consumed by GPU and asso
52. INK_RATE = 7: Get the PCIE link rate in megabits/sec for the device. Returns 0 if the bus type is non-PCIE. Value is a uint64_t.
CUPTI_DEVICE_ATTR_PCIE_LINK_WIDTH = 8: Get the PCIE link width for the device. Returns 0 if the bus type is non-PCIE. Value is a uint64_t.
CUPTI_DEVICE_ATTR_PCIE_GEN = 9: Get the PCIE generation for the device. Returns 0 if the bus type is non-PCIE. Value is a uint64_t.
CUPTI_DEVICE_ATTR_DEVICE_CLASS = 10: Get the class for the device. Value is a CUpti_DeviceAttributeDeviceClass.
CUPTI_DEVICE_ATTR_FORCE_INT = 0x7fffffff
enum CUpti_DeviceAttributeDeviceClass
Device class. Enumeration of device classes for the device attribute CUPTI_DEVICE_ATTR_DEVICE_CLASS.
Values:
CUPTI_DEVICE_ATTR_DEVICE_CLASS_TESLA = 0
CUPTI_DEVICE_ATTR_DEVICE_CLASS_QUADRO = 1
CUPTI_DEVICE_ATTR_DEVICE_CLASS_GEFORCE = 2
enum CUpti_EventAttribute
Event attributes. These attributes can be read using cuptiEventGetAttribute.
Values:
CUPTI_EVENT_ATTR_NAME = 0: Event name. Value is a null-terminated const c-string.
CUPTI_EVENT_ATTR_SHORT_DESCRIPTION = 1: Short description of the event. Value is a null-terminated const c-string.
CUPTI_EVENT_ATTR_LONG_DESCRIPTION = 2: Long description of the event. Value is a null-terminated const c-string.
CUPTI_EVENT_ATTR_CATEGORY = 3: Category of the event. Value is a CUpti_EventCategory.
CUPTI_EVENT_ATTR_FORCE_INT = 0x7fffffff
enum CUpti_EventCategory
An event category. Each event is assigned to a category that represents the general type of the
53. MemoryTotal: The total amount of local memory reserved for the kernel, in bytes.
const char *CUpti_ActivityKernel::name: The name of the kernel. This name is shared across all activity records representing the same kernel, and so should not be modified.
uint32_t CUpti_ActivityKernel::pad: Undefined. Reserved for internal use.
uint16_t CUpti_ActivityKernel::registersPerThread: The number of registers required for each thread executing the kernel.
void *CUpti_ActivityKernel::reserved0: Undefined. Reserved for internal use.
uint32_t CUpti_ActivityKernel::runtimeCorrelationId: The runtime correlation ID of the kernel. Each kernel execution is assigned a unique runtime correlation ID that is identical to the correlation ID in the runtime API activity record that launched the kernel.
uint64_t CUpti_ActivityKernel::start: The start timestamp for the kernel execution, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the kernel.
int32_t CUpti_ActivityKernel::staticSharedMemory: The static shared memory allocated for the kernel, in bytes.
uint32_t CUpti_ActivityKernel::streamId: The ID of the stream where the kernel is executing.
3.12. CUpti_ActivityKernel2 Struct Referen
54. Metric: A characteristic of an application that is calculated from one or more event values.
Metric ID: Each metric is assigned a unique identifier. A named metric will represent the same characteristic on all device types, but the named metric may have different IDs on different device families. Use cuptiMetricGetIdFromName to get the ID for a named metric on a particular device.
Metric Category: Each metric is placed in one of the categories defined by CUpti_MetricCategory. The category indicates the general type of the characteristic measured by the metric.
Metric Property: Each metric is calculated from input values. These input values can be events or properties of the device or system. The available properties are defined by CUpti_MetricPropertyID.
Metric Value: Each metric has a value that represents one of the kinds defined by CUpti_MetricValueKind. For each value kind, there is a corresponding member of the CUpti_MetricValue union that is used to hold the metric's value.
The tables included in this section list the metrics available for each device, as determined by the device's compute capability. You can also determine the metrics available on a device using the cuptiDeviceEnumMetrics function. The cupti_query sample described on the samples page shows how to use this function. You can also enumerate all the CUPTI metrics available on any device using the
55. functionReturnValue. symbolName. CUpti_EventGroupSet. CUpti_EventGroupSets. CUpti_NvtxData. CUpti_ResourceData. resourceDescriptor. CUpti_SynchronizeData. Chapter 4. Data Fields.
LIST OF TABLES
Table 1. Capability 1.x Metrics
Table 2. Capability 2.x Metrics
Table 3. Capability 3.x Metrics
Chapter 1. INTRODUCTION
The CUDA Profiling Tools Interface (CUPTI) enables the creation of profiling and tracing
56. NOT_INITIALIZED
- CUPTI_ERROR_INVALID_OPERATION if the event group is enabled
- CUPTI_ERROR_INVALID_PARAMETER if eventGroup is NULL
Description: Destroy an eventGroup and free its resources. An event group cannot be destroyed if it is enabled. Thread safety: this function is thread safe.
CUptiResult cuptiEventGroupDisable(CUpti_EventGroup eventGroup)
Disable an event group.
Parameters:
eventGroup: The event group.
Returns:
- CUPTI_SUCCESS
- CUPTI_ERROR_NOT_INITIALIZED
- CUPTI_ERROR_HARDWARE
- CUPTI_ERROR_INVALID_PARAMETER if eventGroup is NULL
Description: Disable an event group. Disabling an event group stops collection of events contained in the group. Thread safety: this function is thread safe.
CUptiResult cuptiEventGroupEnable(CUpti_EventGroup eventGroup)
Enable an event group.
Parameters:
eventGroup: The event group.
Returns:
- CUPTI_SUCCESS
- CUPTI_ERROR_NOT_INITIALIZED
- CUPTI_ERROR_HARDWARE
- CUPTI_ERROR_NOT_READY if eventGroup does not contain any events
- CUPTI_ERROR_NOT_COMPATIBLE if eventGroup cannot be enabled due to other already enabled event groups
- CUPTI_ERROR_INVALID_PARAMETER if eventGroup is NULL
- CUPTI_ERROR_HARDWARE_BUSY if another client is profiling and hardware
57. NULL
Description: Disable a set of event groups. Disabling a set of event groups stops collection of events contained in the groups.
- Thread safety: this function is thread safe.
- If this call fails, some of the event groups in the set may be disabled and other event groups may remain enabled.
CUptiResult cuptiEventGroupSetEnable(CUpti_EventGroupSet *eventGroupSet)
Enable an event group set.
Parameters:
eventGroupSet: The pointer to the event group set.
Returns:
- CUPTI_SUCCESS
- CUPTI_ERROR_NOT_INITIALIZED
- CUPTI_ERROR_HARDWARE
- CUPTI_ERROR_NOT_READY if eventGroup does not contain any events
- CUPTI_ERROR_NOT_COMPATIBLE if eventGroup cannot be enabled due to other already enabled event groups
- CUPTI_ERROR_INVALID_PARAMETER if eventGroupSet is NULL
- CUPTI_ERROR_HARDWARE_BUSY if another client is profiling and hardware is busy
Description: Enable a set of event groups. Enabling a set of event groups zeros the value of all the events in all the groups and then starts collection of those events. Thread safety: this function is thread safe.
CUptiResult cuptiEventGroupSetsCreate(CUcontext context, size_t eventIdArraySizeBytes, CUpti_EventID *eventIdArray, CUpti_EventGroupSets **eventGroupPasses)
For a set of events, get the grouping that
58. WHAT'S NEW
CUPTI contains a number of changes and new features as part of the CUDA Toolkit 5.5 release.
Applications that use CUDA Dynamic Parallelism can now be profiled using CUPTI. Device-side kernel launches are reported using a new activity kind.
Device attributes such as power usage, clocks, thermals, etc. are now reported via a new activity kind.
A new activity buffer API uses callbacks to request and return buffers of activity records. The existing cuptiActivityEnqueueBuffer and cuptiActivityDequeueBuffer functions are still supported but are deprecated and will be removed in a future release.
The Event API supports kernel replay so that any number of events can be collected during a single run of the application.
A new metric API, cuptiMetricGetValue2, allows metric values to be calculated for any device, even if that device is not available on the system.
CUDA peer-to-peer memory copies are reported explicitly via the activity API. In previous releases these memory copies were only partially reported.
TABLE OF CONTENTS
Chapter 1. Introduction
1.1. CUPTI Compatibility and Requirements
1.2. CUPTI Initialization
1.3. CUPTI Activity API
1.4. CUPTI Callback API
59. PTI_ERROR_FORCE_INT = 0x7fffffff
CUptiResult cuptiGetResultString(CUptiResult result, const char **str)
Get the descriptive string for a CUptiResult.
Parameters:
result: The result to get the string for.
str: Returns the string.
Returns:
- CUPTI_SUCCESS on success
- CUPTI_ERROR_INVALID_PARAMETER if str is NULL or result is not a valid CUptiResult
Description: Return the descriptive string for a CUptiResult in str. Thread safety: this function is thread safe.
2.3. CUPTI Activity API
Functions, types, and enums that implement the CUPTI Activity API.
struct CUpti_Activity: The base activity record.
struct CUpti_ActivityAPI: The activity record for a driver or runtime API invocation.
struct CUpti_ActivityBranch: The activity record for source-level result branch.
struct CUpti_ActivityCdpKernel: The activity record for a CDP (CUDA Dynamic Parallelism) kernel.
struct CUpti_ActivityContext: The activity record for a context.
struct CUpti_ActivityDevice: The activity record for a device.
struct CUpti_ActivityEnvironment: The activity record for CUPTI environmental data.
struct CUpti_ActivityEvent: The activity record for a CUPTI event.
struct CUpti_ActivityEventInstance: The activity record for a CUPTI event with instance information.
struct CUpti_ActivityGlobalAccess: The ac
60. RROR_NOT_INITIALIZED
- CUPTI_ERROR_INVALID_EVENT_DOMAIN_ID
- CUPTI_ERROR_INVALID_PARAMETER if numEvents is NULL
Description: Returns the number of events in numEvents for a domain. Thread safety: this function is thread safe.
CUptiResult cuptiEventGetAttribute(CUpti_EventID event, CUpti_EventAttribute attrib, size_t *valueSize, void *value)
Get an event attribute.
Parameters:
event: ID of the event.
attrib: The event attribute to read.
valueSize: The size of the value buffer in bytes; on return, the number of bytes written to value.
value: Returns the attribute's value.
Returns:
- CUPTI_SUCCESS
- CUPTI_ERROR_NOT_INITIALIZED
- CUPTI_ERROR_INVALID_EVENT_ID
- CUPTI_ERROR_INVALID_PARAMETER if valueSize or value is NULL, or if attrib is not an event attribute
- CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT: for non-c-string attribute values, indicates that the value buffer is too small to hold the attribute value
Description: Returns an event attribute in value. The size of the value buffer is given by valueSize. The value returned in valueSize contains the number of bytes returned in value. If the attribute value is a c-string that is longer than valueSize, then only the first valueSize characters will be returned and there will be no terminating null byte. Thread safety: this function is thread safe.
CUp
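Because a c-string attribute longer than the buffer is returned without a terminating null byte, callers may want to guarantee termination before treating the buffer as a string. The following is a minimal sketch of one way to do that; `terminate_attr_string` is a hypothetical helper, not a CUPTI function, and `written` stands for the byte count reported back through valueSize.

```c
#include <stddef.h>

/* Hypothetical helper, not part of CUPTI.
 * buf:      buffer that was passed as `value`
 * capacity: size of buf in bytes
 * written:  byte count reported back through `valueSize`
 * Returns 1 if buf holds a complete terminated string, or 0 if the last
 * byte had to be overwritten to make room for the terminator. */
int terminate_attr_string(char *buf, size_t capacity, size_t written)
{
    size_t i;

    if (buf == NULL || capacity == 0)
        return 0;
    if (written > capacity)
        written = capacity;

    /* Already terminated inside the bytes that were written? */
    for (i = 0; i < written; ++i) {
        if (buf[i] == '\0')
            return 1;
    }

    /* Room left after the written bytes: append the terminator. */
    if (written < capacity) {
        buf[written] = '\0';
        return 1;
    }

    /* Buffer completely filled with characters: sacrifice the last one. */
    buf[capacity - 1] = '\0';
    return 0;
}
```

A return value of 0 signals that the attribute was truncated, so the caller can retry with a larger buffer.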
61. T
CUptiResult cuptiMetricGetValue2(CUpti_MetricID metric, size_t eventIdArraySizeBytes, CUpti_EventID *eventIdArray, size_t eventValueArraySizeBytes, uint64_t *eventValueArray, size_t propIdArraySizeBytes, CUpti_MetricPropertyID *propIdArray, size_t propValueArraySizeBytes, uint64_t *propValueArray, CUpti_MetricValue *metricValue)
Calculate the value for a metric.
Parameters:
metric: The metric ID.
eventIdArraySizeBytes: The size of eventIdArray in bytes.
eventIdArray: The event IDs required to calculate metric.
eventValueArraySizeBytes: The size of eventValueArray in bytes.
eventValueArray: The normalized event values required to calculate metric. The values must be ordered to match the order of events in eventIdArray.
propIdArraySizeBytes: The size of propIdArray in bytes.
propIdArray: The metric property IDs required to calculate metric.
propValueArraySizeBytes: The size of propValueArray in bytes.
propValueArray: The metric property values required to calculate metric. The values must be ordered to match the order of metric properties in propIdArray.
metricValue: Returns the value for the metric.
Returns:
- CUPTI_SUCCESS
- CUPTI_ERROR_NOT_INITIALIZED
- CUPTI_ERROR_INVALID_METRIC_ID
- CUPTI_ERROR_INVALID_OPERATION
- CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT if the eventIdArray does not contain all the events needed for metric
62. cuptiEventGroupDestroy. cuptiEventGroupDisable. cuptiEventGroupEnable. cuptiEventGroupGetAttribute. cuptiEventGroupReadAllEvents. cuptiEventGroupReadEvent. cuptiEventGroupRemoveAllEvents. cuptiEventGroupRemoveEvent. cuptiEventGroupResetAllEvents. cuptiEventGroupSetAttribute. cuptiEventGroupSetDisable. cuptiEventGroupSetEnable. cuptiEventGroupSetsCreate. cuptiEventGroupSetsDestroy. cuptiGetNumEventDomains. cuptiSetEventCollectionMode. CUPTI_EVENT_OVERFLOW.
2.6. CUPTI Metric API
CUpti_MetricAttribute
63. UPTI_EVENT_GROUP_ATTR_NUM_EVENTS
CUPTI_EVENT_GROUP_ATTR_INSTANCE_COUNT = 5: Number of instances of the domain bound to this event group that will be counted. Value is a uint32_t.
CUPTI_EVENT_GROUP_ATTR_FORCE_INT = 0x7fffffff
enum CUpti_ReadEventFlags
Flags for cuptiEventGroupReadEvent and cuptiEventGroupReadAllEvents.
Values:
CUPTI_EVENT_READ_FLAG_NONE = 0: No flags.
CUPTI_EVENT_READ_FLAG_FORCE_INT = 0x7fffffff
typedef uint32_t CUpti_EventDomainID
ID for an event domain. An event domain represents a group of related events. A device may have multiple instances of a domain, indicating that the device can simultaneously record multiple instances of each event within that domain.
typedef void *CUpti_EventGroup
A group of events. An event group is a collection of events that are managed together. All events in an event group must belong to the same domain.
typedef uint32_t CUpti_EventID
ID for an event. An event represents a countable activity, action, or occurrence on the device.
CUptiResult cuptiDeviceEnumEventDomains(CUdevice device, size_t *arraySizeBytes, CUpti_EventDomainID *domainArray)
Get the event domains for a device.
Parameters:
device: The CUDA device.
arraySizeBytes: The size of domainArray in bytes; on return, the number of bytes written to domainArray.
64. VALID_PARAMETER if eventIdArraySizeBytes or eventIdArray are NULL
Description: Gets the event IDs in eventIdArray required to calculate a metric. The size of the eventIdArray buffer is given by eventIdArraySizeBytes and must be at least numEvents * sizeof(CUpti_EventID), or not all events will be returned. The value returned in eventIdArraySizeBytes contains the number of bytes returned in eventIdArray.
CUptiResult cuptiMetricEnumProperties(CUpti_MetricID metric, size_t *propIdArraySizeBytes, CUpti_MetricPropertyID *propIdArray)
Get the properties required to calculate a metric.
Parameters:
metric: ID of the metric.
propIdArraySizeBytes: The size of propIdArray in bytes; on return, the number of bytes written to propIdArray.
propIdArray: Returns the IDs of the properties required to calculate metric.
Returns:
- CUPTI_SUCCESS
- CUPTI_ERROR_NOT_INITIALIZED
- CUPTI_ERROR_INVALID_METRIC_ID
- CUPTI_ERROR_INVALID_PARAMETER if propIdArraySizeBytes or propIdArray are NULL
Description: Gets the property IDs in propIdArray required to calculate a metric. The size of the propIdArray buffer is given by propIdArraySizeBytes and must be at least numProp * sizeof(CUpti_DeviceAttribute), or not all properties will be returned. The value returned in propIdArraySizeBytes contains the number of bytes returned in propIdArray.
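The enumeration functions above size their buffers in bytes, not elements, so callers convert in both directions. A minimal sketch of that arithmetic follows; `EventId` is a local stand-in for CUpti_EventID (which this guide defines as a uint32_t typedef), and both helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for CUpti_EventID, described in this guide as a
 * uint32_t typedef. Real code would use the CUPTI type directly. */
typedef uint32_t EventId;

/* Minimum buffer size, in bytes, needed to receive numEvents IDs. */
size_t event_id_buffer_bytes(size_t numEvents)
{
    return numEvents * sizeof(EventId);
}

/* Number of whole IDs contained in the byte count reported on return. */
size_t event_ids_returned(size_t bytesReturned)
{
    return bytesReturned / sizeof(EventId);
}
```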
65. ZED if unable to initialize CUPTI
- CUPTI_ERROR_INVALID_PARAMETER if subscriber is NULL or not initialized
Description: Removes a callback subscriber so that no future callbacks will be issued to that subscriber. Thread safety: this function is thread safe.
2.5. CUPTI Event API
Functions, types, and enums that implement the CUPTI Event API.
struct CUpti_EventGroupSet: A set of event groups.
struct CUpti_EventGroupSets: A set of event group sets.
enum CUpti_DeviceAttribute
Device attributes. CUPTI device attributes. These attributes can be read using cuptiDeviceGetAttribute.
Values:
CUPTI_DEVICE_ATTR_MAX_EVENT_ID = 1: Number of event IDs for a device. Value is a uint32_t.
CUPTI_DEVICE_ATTR_MAX_EVENT_DOMAIN_ID = 2: Number of event domain IDs for a device. Value is a uint32_t.
CUPTI_DEVICE_ATTR_GLOBAL_MEMORY_BANDWIDTH = 3: Get global memory bandwidth in Kbytes/sec. Value is a uint64_t.
CUPTI_DEVICE_ATTR_INSTRUCTION_PER_CYCLE = 4: Get theoretical maximum number of instructions per cycle. Value is a uint32_t.
CUPTI_DEVICE_ATTR_INSTRUCTION_THROUGHPUT_SINGLE_PRECISION = 5: Get theoretical maximum number of single-precision instructions that can be executed per second. Value is a uint64_t.
CUPTI_DEVICE_ATTR_MAX_FRAME_BUFFERS = 6: Get number of frame buffers for the device. Value is a uint64_t.
CUPTI_DEVICE_ATTR_PCIE_L
66. _t CUpti_ActivityMemcpy::start: The start timestamp for the memory copy, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the memory copy.
uint32_t CUpti_ActivityMemcpy::streamId: The ID of the stream where the memory copy is occurring.
3.16. CUpti_ActivityMemcpy2 Struct Reference
The activity record for peer-to-peer memory copies. This activity record represents a peer-to-peer memory copy (CUPTI_ACTIVITY_KIND_MEMCPY2).
uint64_t CUpti_ActivityMemcpy2::bytes: The number of bytes transferred by the memory copy.
uint32_t CUpti_ActivityMemcpy2::contextId: The ID of the context where the memory copy is occurring.
uint8_t CUpti_ActivityMemcpy2::copyKind: The kind of the memory copy, stored as a byte to reduce record size. See also: CUpti_ActivityMemcpyKind.
uint32_t CUpti_ActivityMemcpy2::correlationId: The correlation ID of the memory copy. Each memory copy is assigned a unique correlation ID that is identical to the correlation ID in the driver and runtime API activity record that launched the memory copy.
uint32_t CUpti_ActivityMemcpy2::deviceId: The ID of the device where the memory copy is occurring.
uint32_t CUpti_ActivityMemcpy2::dstContextId: The ID of the context owning
67. age shows how to use the functions to calculate event values and calculate a metric using cuptiMetricGetValue. Note that, as shown in the example, you should collect event counts from all domain instances and normalize the counts to get the most accurate metric values. It is necessary to normalize the event counts because the number of event counter instances varies by device and by the event being counted.
For example, a device might have 8 multiprocessors but only have event counters for 4 of the multiprocessors, and might have 3 memory units and only have event counters for one memory unit. When calculating a metric that requires a multiprocessor event and a memory unit event, the 4 multiprocessor counters should be summed and multiplied by 2 to normalize the event count across the entire device. Similarly, the one memory unit counter should be multiplied by 3 to normalize the event count across the entire device. The normalized values can then be passed to cuptiMetricGetValue or cuptiMetricGetValue2 to calculate the metric value.
As described, the normalization assumes the kernel executes a sufficient number of blocks to completely load the device. If the kernel has only a small number of blocks, normalizing across the entire device may skew the result.
1.6.1. Metric Reference - Compute Capability 1.x
Devices with compute capability less than 2.0 implement the metrics shown in the
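The normalization described in the example above (sum the counters that exist, then scale by the ratio of total instances to counted instances) can be sketched as follows. This is an illustrative helper, not a CUPTI function: `normalize_event_value` is a hypothetical name, and the caller is assumed to have already read the per-instance values and the two instance counts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of CUPTI.
 * instanceValues:     one event value per profiled domain instance
 * instanceCount:      number of profiled instances (entries in the array)
 * totalInstanceCount: number of domain instances on the whole device
 * Returns the event count scaled up to the entire device. */
uint64_t normalize_event_value(const uint64_t *instanceValues,
                               size_t instanceCount,
                               uint32_t totalInstanceCount)
{
    uint64_t sum = 0;
    size_t i;

    if (instanceValues == NULL || instanceCount == 0)
        return 0;

    for (i = 0; i < instanceCount; ++i)
        sum += instanceValues[i];

    /* Multiply before dividing so integer division does not lose precision. */
    return (sum * totalInstanceCount) / instanceCount;
}
```

With the numbers from the text, four multiprocessor counters reading 10, 20, 30, and 40 on an 8-multiprocessor device normalize to 200 (the sum of 100 multiplied by 2), and a single memory unit counter reading 7 on a device with 3 memory units normalizes to 21.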
68. al floating point instructions
The utilization level of the multiprocessor function units that execute miscellaneous instructions. Scope: Multi-context.
Single-precision floating point operations executed. Scope: Multi-context.
Single-precision floating point add operations executed. Scope: Multi-context.
Single-precision floating point multiply operations executed. Scope: Multi-context.
Single-precision floating point multiply-accumulate operations executed. Scope: Multi-context.
Double-precision floating point operations executed. Scope: Multi-context.
Double-precision floating point add operations executed. Scope: Multi-context.
Double-precision floating point multiply operations executed. Scope: Multi-context.
Double-precision floating point multiply-accumulate operations executed. Scope: Multi-context.
Single-precision floating point special operations executed. Scope: Multi-context.
stall_inst_fetch: Percentage of stalls occurring because the next assembly instruction has not yet been fetched. Scope: Multi-context.
stall_exec_dependency: Percentage of stalls occurring because an input required by the instruction is not yet available. Scope: Multi-context.
stall_data_request: Percentage of stalls occurring because a memory operation cannot be performed due to the required resources not being available or fully utilized, or because too many requests of a given type are outstanding. Scope: Multi-context.
stall_sync: Percentage of stalls occurring because the warp is blocked (scope: Multi-context)
69. all profiled domain instances. totalInstanceCount is obtained from querying CUPTI_EVENT_DOMAIN_ATTR_TOTAL_INSTANCE_COUNT, and instanceCount is obtained from querying CUPTI_EVENT_GROUP_ATTR_INSTANCE_COUNT (or CUPTI_EVENT_DOMAIN_ATTR_INSTANCE_COUNT).
Chapter 3. DATA STRUCTURES
Here are the data structures with brief descriptions:
CUpti_Activity: The base activity record.
CUpti_ActivityAPI: The activity record for a driver or runtime API invocation.
CUpti_ActivityBranch: The activity record for source-level result branch.
CUpti_ActivityCdpKernel: The activity record for a CDP (CUDA Dynamic Parallelism) kernel.
CUpti_ActivityContext: The activity record for a context.
CUpti_ActivityDevice: The activity record for a device.
CUpti_ActivityEnvironment: The activity record for CUPTI environmental data.
CUpti_ActivityEvent: The activity record for a CUPTI event.
CUpti_ActivityEventInstance: The activity record for a CUPTI event with instance information.
CUpti_ActivityGlobalAccess: The activity record for source-level global access.
CUpti_ActivityKernel: The activity record for a kernel (deprecated).
CUpti_ActivityKernel2: The activity record for a kernel (CUDA 5.5 onwards).
CUpti_ActivityMarker: The activity record providing a marker, which is an instantaneous point in time.
CUpti_ActivityMarkerData: The activity record providing detailed information for a marker.
70. ample, client must guard against simultaneous calls to cuptiEventGroupDestroy, cuptiEventGroupAddEvent, etc.), and must guard against simultaneous destruction of the context in which eventGroup was created (for example, client must guard against simultaneous calls to cudaDeviceReset, cuCtxDestroy, etc.).
CUptiResult cuptiEventGroupSetAttribute(CUpti_EventGroup eventGroup, CUpti_EventGroupAttribute attrib, size_t valueSize, void *value)
Write an event group attribute.
Parameters:
eventGroup: The event group.
attrib: The attribute to write.
valueSize: The size, in bytes, of the value.
value: The attribute value to write.
Returns:
- CUPTI_SUCCESS
- CUPTI_ERROR_NOT_INITIALIZED
- CUPTI_ERROR_INVALID_PARAMETER if valueSize or value is NULL, or if attrib is not an event group attribute, or if attrib is not a writable attribute
- CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT: indicates that the value buffer is too small to hold the attribute value
Description: Write an event group attribute. Thread safety: this function is thread safe.
CUptiResult cuptiEventGroupSetDisable(CUpti_EventGroupSet *eventGroupSet)
Disable an event group set.
Parameters:
eventGroupSet: The pointer to the event group set.
Returns:
- CUPTI_SUCCESS
- CUPTI_ERROR_NOT_INITIALIZED
- CUPTI_ERROR_HARDWARE
- CUPTI_ERROR_INVALID_PARAMETER if eventGroupSet is
71. applies to new allocations. Set this value before initializing CUDA or before creating a stream to ensure it is considered for the following allocations. Note: The actual amount of device memory per stream reserved by CUPTI might be larger.
CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT = 2: The maximum number of device memory buffers stored for reuse by CUPTI. The value is a size_t. Buffers can be reused by streams of the same context. Increasing this value reduces the profiling overhead when the application creates and destroys many streams. Setting this value will not modify the number of memory buffers currently stored. Set this value before initializing CUDA to ensure the limit is not exceeded.
enum CUpti_ActivityComputeApiKind
The kind of a compute API.
Values:
CUPTI_ACTIVITY_COMPUTE_API_UNKNOWN = 0: The compute API is not known.
CUPTI_ACTIVITY_COMPUTE_API_CUDA = 1: The compute APIs are for CUDA.
CUPTI_ACTIVITY_COMPUTE_API_FORCE_INT = 0x7fffffff
enum CUpti_ActivityEnvironmentKind
The kind of environment data. Used to indicate what type of data is being reported by an environment activity record.
Values:
CUPTI_ACTIVITY_ENVIRONMENT_UNKNOWN = 0: Unknown data.
CUPTI_ACTIVITY_ENVIRONMENT_SPEED = 1: The environment data is related to speed.
CUPTI_ACTIVITY_ENVIRONMENT_TEMPERATURE = 2: The environment data is related to temperature.
CUPTI_ACTIVITY_ENVIRONMENT_POWER = 3: The environment data i
72. assed the data. If you need to retain some data for use outside of the callback, you must make a copy of that data. For example, if you make a shallow copy of CUpti_CallbackData within a callback, you cannot dereference functionParams outside of that callback to access the function parameters. functionName is an exception: the string pointed to by functionName is a global constant and so may be accessed outside of the callback.
CUpti_ApiCallbackSite CUpti_CallbackData::callbackSite: Point in the runtime or driver function from where the callback was issued.
CUcontext CUpti_CallbackData::context: Driver context current to the thread, or null if no context is current. This value can change from the entry to exit callback of a runtime API function if the runtime initializes a context.
uint32_t CUpti_CallbackData::contextUid: Unique ID for the CUDA context associated with the thread. The UIDs are assigned sequentially as contexts are created and are unique within a process.
uint64_t *CUpti_CallbackData::correlationData: Pointer to data shared between the entry and exit callbacks of a given runtime or driver API function invocation. This field can be used to pass 64-bit values from the entry callback to the corresponding exit callback.
uint32_t CUpti_CallbackData::correlationId: The activity record correl
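The correlationData field described above is commonly used to carry a value from the entry callback to the matching exit callback. The sketch below illustrates the pattern with simplified stand-in types; `Site`, `CallbackInfo`, and `on_api_callback` are hypothetical names for illustration only, and real code would use CUpti_CallbackData and CUpti_ApiCallbackSite from the CUPTI headers inside a registered callback function.

```c
#include <stdint.h>

/* Simplified stand-ins for illustration only; not the CUPTI definitions. */
typedef enum { SITE_ENTER = 0, SITE_EXIT = 1 } Site;

typedef struct {
    Site      callbackSite;     /* entry or exit of the API call          */
    uint64_t *correlationData;  /* 64-bit slot shared by entry and exit   */
} CallbackInfo;

/* At entry, stash the current timestamp in the shared slot and return 0.
 * At exit, return the time elapsed since the matching entry callback. */
uint64_t on_api_callback(const CallbackInfo *info, uint64_t now)
{
    if (info->callbackSite == SITE_ENTER) {
        *info->correlationData = now;
        return 0;
    }
    return now - *info->correlationData;
}
```

Because the slot is shared only between the entry and exit callbacks of a single invocation, no global table keyed by thread or call is needed for this kind of per-call bookkeeping.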
73. assed the data. If you need to retain some data for use outside of the callback, you must make a copy of that data.
CUcontext CUpti_ResourceData::context: For CUPTI_CBID_RESOURCE_CONTEXT_CREATED and CUPTI_CBID_RESOURCE_CONTEXT_DESTROY_STARTING, the context being created or destroyed. For CUPTI_CBID_RESOURCE_STREAM_CREATED and CUPTI_CBID_RESOURCE_STREAM_DESTROY_STARTING, the context containing the stream being created or destroyed.
void *CUpti_ResourceData::resourceDescriptor: Reserved for future use.
CUstream CUpti_ResourceData::stream: For CUPTI_CBID_RESOURCE_STREAM_CREATED and CUPTI_CBID_RESOURCE_STREAM_DESTROY_STARTING, the stream being created or destroyed.
3.31. CUpti_SynchronizeData Struct Reference
Data passed into a synchronize callback function as the cbdata argument to CUpti_CallbackFunc. The cbdata will be this type for domain equal to CUPTI_CB_DOMAIN_SYNCHRONIZE. The callback data is valid only within the invocation of the callback function that is passed the data. If you need to retain some data for use outside of the callback, you must make a copy of that data.
CUcontext CUpti_SynchronizeData::context: The context of the stream being synchronized.
CUstream CUpti_SynchronizeData::stream: The stream being synchronized.
74. at a __syncthreads call
stall_texture: Percentage of stalls occurring because the texture sub-system is fully utilized or has too many outstanding requests. Scope: Multi-context.
stall_other: Percentage of stalls occurring due to miscellaneous reasons. Scope: Multi-context.

1.6.3. Metric Reference - Compute Capability 3.x
Devices with compute capability greater than or equal to 3.0 implement the metrics shown in the following table. A scope value of single-context indicates that the metric can only be accurately collected when a single context (CUDA or graphics) is executing on the GPU. A scope value of multi-context indicates that the metric can be accurately collected when multiple contexts are executing on the GPU.

Table 3. Capability 3.x Metrics
sm_efficiency: The percentage of time at least one warp is active on a multiprocessor, averaged over all multiprocessors on the GPU. Scope: Single-context.
sm_efficiency_instance: The percentage of time at least one warp is active on a specific multiprocessor. Scope: Single-context.
achieved_occupancy: Ratio of the average active warps per active cycle to the maximum number of warps supported on a multiprocessor. Scope: Multi-context.
issue_slot_utilization: Percentage of issue slots that issued at least one instruction, averaged across all cycles. Scope: Multi-context.
Instructions issued per cycle ipc_instance inst_per_warp Instructions executed per cycle for a si
75. ation ID for this callback. For a driver domain callback (i.e. domain CUPTI_CB_DOMAIN_DRIVER_API), this ID will equal the correlation ID in the CUpti_ActivityAPI record corresponding to the CUDA driver function call. For a runtime domain callback (i.e. domain CUPTI_CB_DOMAIN_RUNTIME_API), this ID will equal the correlation ID in the CUpti_ActivityAPI record corresponding to the CUDA runtime function call. Within the callback, this ID can be recorded to correlate user data with the activity record. This field is new in 4.1.

const char *CUpti_CallbackData::functionName
Description: Name of the runtime or driver API function which issued the callback. This string is a global constant and so may be accessed outside of the callback.

const void *CUpti_CallbackData::functionParams
Description: Pointer to the arguments passed to the runtime or driver API call. See generated_cuda_runtime_api_meta.h and generated_cuda_meta.h for structure definitions for the parameters for each runtime and driver API function.

void *CUpti_CallbackData::functionReturnValue
Description: Pointer to the return value of the runtime or driver API call. This field is only valid within the exit (CUPTI_API_EXIT) callback. For a runtime API, functionReturnValue points to a cudaError_t. For a driver API, functionReturnValue points to a CUresult.

const char *CUpti_CallbackData::symbolName
Description:
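Because functionParams is valid only inside the callback while functionName is a global constant, code that wants to inspect a call later has to copy the parameters but may keep the name pointer. The sketch below illustrates that rule under stated assumptions: MockCallbackData, SavedCall, save_call, and free_saved_call are hypothetical names, and the caller is assumed to know the size of the parameter structure (in real code this would come from the matching structure in the generated meta headers).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the callback data; only the fields used here. */
typedef struct {
    const char *functionName;   /* global constant string */
    const void *functionParams; /* valid only inside the callback */
} MockCallbackData;

/* What is kept after the callback returns. */
typedef struct {
    const char *functionName; /* pointer can be kept as is */
    void *paramsCopy;         /* owned copy of the parameter struct */
    size_t paramsSize;
} SavedCall;

/* Copies the parameter bytes so they outlive the callback.
 * Returns 0 on success, -1 if the allocation fails. */
static int save_call(SavedCall *out, const MockCallbackData *cb, size_t paramsSize)
{
    assert(out != NULL && cb != NULL);
    out->functionName = cb->functionName;
    out->paramsCopy = NULL;
    out->paramsSize = 0;
    if (cb->functionParams != NULL && paramsSize > 0) {
        out->paramsCopy = malloc(paramsSize);
        if (out->paramsCopy == NULL) {
            return -1;
        }
        memcpy(out->paramsCopy, cb->functionParams, paramsSize);
        out->paramsSize = paramsSize;
    }
    return 0;
}

/* Releases the owned copy. */
static void free_saved_call(SavedCall *saved)
{
    free(saved->paramsCopy);
    saved->paramsCopy = NULL;
    saved->paramsSize = 0;
}
```

Note that this copies only the parameter structure itself. Any pointers stored inside that structure still refer to memory owned by the application and would need their own copies if they are to be dereferenced later.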
76. ation ID of the kernel to which this result is associated.

uint32_t CUpti_ActivityBranch::diverged
Description: Number of times this branch diverged.

uint32_t CUpti_ActivityBranch::executed
Description: The number of times this branch was executed.

CUpti_ActivityKind CUpti_ActivityBranch::kind
Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_BRANCH.

uint32_t CUpti_ActivityBranch::pcOffset
Description: The pc offset for the branch.

uint32_t CUpti_ActivityBranch::sourceLocatorId
Description: The ID for the source locator.

uint64_t CUpti_ActivityBranch::threadsExecuted
Description: Each time this instruction is executed, this value is incremented by the number of threads that executed it.

3.4. CUpti_ActivityCdpKernel Struct Reference
The activity record for a CDP (CUDA Dynamic Parallelism) kernel. This activity record represents a CDP kernel execution.

int32_t CUpti_ActivityCdpKernel::blockX
Description: The X-dimension block size for the kernel.

int32_t CUpti_ActivityCdpKernel::blockY
Description: The Y-dimension block size for the kernel.

int32_t CUpti_ActivityCdpKernel::blockZ
Description: The Z-dimension block size for the kernel.

uint64_t CUpti_ActivityCdpKernel::completed
Description: The timestamp when the kernel is marked as completed, in ns. A value of CUPTI_TIMESTAMP_UNKNOWN indicates that the completion time is unknown.
77. be collected for the memory set.

CUpti_ActivityKind CUpti_ActivityMemset::kind
Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_MEMSET.

void *CUpti_ActivityMemset::reserved0
Description: Undefined. Reserved for internal use.

uint32_t CUpti_ActivityMemset::runtimeCorrelationId
Description: The runtime correlation ID of the memory set. Each memory set is assigned a unique runtime correlation ID that is identical to the correlation ID in the runtime API activity record that launched the memory set.

uint64_t CUpti_ActivityMemset::start
Description: The start timestamp for the memory set, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the memory set.

uint32_t CUpti_ActivityMemset::streamId
Description: The ID of the stream where the memory set is occurring.

uint32_t CUpti_ActivityMemset::value
Description: The value being assigned to memory by the memory set.

3.18. CUpti_ActivityMetric Struct Reference
The activity record for a CUPTI metric. This activity record represents the collection of a CUPTI metric value (CUPTI_ACTIVITY_KIND_METRIC). This activity record kind is not produced by the activity API but is included for completeness and ease of use. Profile frameworks built on top of CUPTI that collect metric data may choose to use this type to store the colle
78. by the memory copy, stored as a byte to reduce record size. See also: CUpti_ActivityMemoryKind.

uint64_t CUpti_ActivityMemcpy2::start
Description: The start timestamp for the memory copy, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the memory copy.

uint32_t CUpti_ActivityMemcpy2::streamId
Description: The ID of the stream where the memory copy is occurring.

3.17. CUpti_ActivityMemset Struct Reference
The activity record for memset. This activity record represents a memory set operation (CUPTI_ACTIVITY_KIND_MEMSET).

uint64_t CUpti_ActivityMemset::bytes
Description: The number of bytes being set by the memory set.

uint32_t CUpti_ActivityMemset::contextId
Description: The ID of the context where the memory set is occurring.

uint32_t CUpti_ActivityMemset::correlationId
Description: The correlation ID of the memory set. Each memory set is assigned a unique correlation ID that is identical to the correlation ID in the driver API activity record that launched the memory set.

uint32_t CUpti_ActivityMemset::deviceId
Description: The ID of the device where the memory set is occurring.

uint64_t CUpti_ActivityMemset::end
Description: The end timestamp for the memory set, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not
79. calls to cudaDeviceReset, cuCtxDestroy, etc.

CUptiResult cuptiEventGroupReadAllEvents(CUpti_EventGroup eventGroup, CUpti_ReadEventFlags flags, size_t *eventValueBufferSizeBytes, uint64_t *eventValueBuffer, size_t *eventIdArraySizeBytes, CUpti_EventID *eventIdArray, size_t *numEventIdsRead)
Read the values for all the events in an event group.

Parameters
eventGroup: The event group.
flags: Flags controlling the reading mode.
eventValueBufferSizeBytes: The size of eventValueBuffer in bytes, and returns the number of bytes written to eventValueBuffer.
eventValueBuffer: Returns the event values.
eventIdArraySizeBytes: The size of eventIdArray in bytes, and returns the number of bytes written to eventIdArray.
eventIdArray: Returns the IDs of the events in the same order as the values returned in eventValueBuffer.
numEventIdsRead: Returns the number of event IDs returned in eventIdArray.

Returns
CUPTI_SUCCESS
CUPTI_ERROR_NOT_INITIALIZED
CUPTI_ERROR_HARDWARE
CUPTI_ERROR_INVALID_OPERATION if eventGroup is disabled
CUPTI_ERROR_INVALID_PARAMETER if eventGroup, eventValueBufferSizeBytes, eventValueBuffer, eventIdArraySizeBytes, eventIdArray or numEventIdsRead is NULL

Description: Read the values for all the events in an event group. The event values are returned in the eventValueBuff
80. ce
The activity record for a kernel (CUDA 5.5 onwards). This activity record represents a kernel execution (CUPTI_ACTIVITY_KIND_KERNEL and CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL).

int32_t CUpti_ActivityKernel2::blockX
Description: The X-dimension block size for the kernel.

int32_t CUpti_ActivityKernel2::blockY
Description: The Y-dimension block size for the kernel.

int32_t CUpti_ActivityKernel2::blockZ
Description: The Z-dimension block size for the kernel.

uint64_t CUpti_ActivityKernel2::completed
Description: The completed timestamp for the kernel execution, in ns. It represents the completion of all its child kernels and the kernel itself. A value of CUPTI_TIMESTAMP_UNKNOWN indicates that the completion time is unknown.

uint32_t CUpti_ActivityKernel2::contextId
Description: The ID of the context where the kernel is executing.

uint32_t CUpti_ActivityKernel2::correlationId
Description: The correlation ID of the kernel. Each kernel execution is assigned a unique correlation ID that is identical to the correlation ID in the driver or runtime API activity record that launched the kernel.

uint32_t CUpti_ActivityKernel2::deviceId
Description: The ID of the device where the kernel is executing.

int32_t CUpti_ActivityKernel2::dynamicSharedMemory
Description: The dynamic shared memory reserved for the kernel, in bytes.
81. ciated circuitry.

uint32_t CUpti_ActivityEnvironment::powerLimit
Description: The power, in milliwatts, that will trigger the power management algorithm.

uint32_t CUpti_ActivityEnvironment::smClock
Description: The SM frequency, in MHz.

CUpti_ActivityEnvironment::speed
Description: Data returned for the CUPTI_ACTIVITY_ENVIRONMENT_SPEED environment kind.

CUpti_ActivityEnvironment::temperature
Description: Data returned for the CUPTI_ACTIVITY_ENVIRONMENT_TEMPERATURE environment kind.

uint64_t CUpti_ActivityEnvironment::timestamp
Description: The timestamp when this sample was retrieved, in ns. A value of 0 indicates that timestamp information could not be collected for the marker.

3.8. CUpti_ActivityEvent Struct Reference
The activity record for a CUPTI event. This activity record represents a CUPTI event value (CUPTI_ACTIVITY_KIND_EVENT). This activity record kind is not produced by the activity API but is included for completeness and ease of use. Profile frameworks built on top of CUPTI that collect event data may choose to use this type to store the collected event data.

uint32_t CUpti_ActivityEvent::correlationId
Description: The correlation ID of the event. Use of this ID is user-defined, but typically this ID value will equal the correlation ID of the kernel for which the event was gathered.
82. ck domains
Callback domains. Each domain represents callback points for a group of related API functions or CUDA driver activity.

Values
CUPTI_CB_DOMAIN_INVALID = 0: Invalid domain.
CUPTI_CB_DOMAIN_DRIVER_API = 1: Domain containing callback points for all driver API functions.
CUPTI_CB_DOMAIN_RUNTIME_API = 2: Domain containing callback points for all runtime API functions.
CUPTI_CB_DOMAIN_RESOURCE = 3: Domain containing callback points for CUDA resource tracking.
CUPTI_CB_DOMAIN_SYNCHRONIZE = 4: Domain containing callback points for CUDA synchronization.
CUPTI_CB_DOMAIN_NVTX = 5: Domain containing callback points for NVTX API functions.
CUPTI_CB_DOMAIN_SIZE = 6
CUPTI_CB_DOMAIN_FORCE_INT = 0x7fffffff

enum CUpti_CallbackIdResource
Callback IDs for the resource domain, CUPTI_CB_DOMAIN_RESOURCE. This value is communicated to the callback function via the cbid parameter.

Values
CUPTI_CBID_RESOURCE_INVALID = 0: Invalid resource callback ID.
CUPTI_CBID_RESOURCE_CONTEXT_CREATED = 1: A new context has been created.
CUPTI_CBID_RESOURCE_CONTEXT_DESTROY_STARTING = 2: A context is about to be destroyed.
CUPTI_CBID_RESOURCE_STREAM_CREATED = 3: A new stream has been created.
CUPTI_CBID_RESOURCE_STREAM_DESTROY_STARTING = 4: A stream is about to be destroyed.
CUPTI_CBID_RESOURCE_CU_INIT_FINISHED = 5: The driver has finished initializing.
CUPTI_CBID_RESOURCE_SIZE
83. cted metric data.

uint32_t CUpti_ActivityMetric::correlationId
Description: The correlation ID of the metric. Use of this ID is user-defined, but typically this ID value will equal the correlation ID of the kernel for which the metric was gathered.

uint8_t CUpti_ActivityMetric::flags
Description: The properties of this metric. See also: CUpti_ActivityFlag.

CUpti_MetricID CUpti_ActivityMetric::id
Description: The metric ID.

CUpti_ActivityKind CUpti_ActivityMetric::kind
Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_METRIC.

uint8_t CUpti_ActivityMetric::pad
Description: Undefined. Reserved for internal use.

CUpti_ActivityMetric::value
Description: The metric value.

3.19. CUpti_ActivityMetricInstance Struct Reference
The activity record for a CUPTI metric with instance information. This activity record represents a CUPTI metric value for a specific metric domain instance (CUPTI_ACTIVITY_KIND_METRIC_INSTANCE). This activity record kind is not produced by the activity API but is included for completeness and ease of use. Profile frameworks built on top of CUPTI that collect metric data may choose to use this type to store the collected metric data. This activity record should be used when metric domain instance information needs to be associated with the metric.
84. d
CUPTI_ERROR_INVALID_PARAMETER if eventGroup is NULL

Description: Remove all events from an event group. Events cannot be removed if the event group is enabled. Thread-safety: this function is thread safe.

CUptiResult cuptiEventGroupRemoveEvent(CUpti_EventGroup eventGroup, CUpti_EventID event)
Remove an event from an event group.

Parameters
eventGroup: The event group.
event: The event to remove from the group.

Returns
CUPTI_SUCCESS
CUPTI_ERROR_NOT_INITIALIZED
CUPTI_ERROR_INVALID_EVENT_ID
CUPTI_ERROR_INVALID_OPERATION if eventGroup is enabled
CUPTI_ERROR_INVALID_PARAMETER if eventGroup is NULL

Description: Remove an event from an event group. The event cannot be removed if the event group is enabled. Thread-safety: this function is thread safe.

CUptiResult cuptiEventGroupResetAllEvents(CUpti_EventGroup eventGroup)
Zero all the event counts in an event group.

Parameters
eventGroup: The event group.

Returns
CUPTI_SUCCESS
CUPTI_ERROR_NOT_INITIALIZED
CUPTI_ERROR_HARDWARE
CUPTI_ERROR_INVALID_PARAMETER if eventGroup is NULL

Description: Zero all the event counts in an event group. Thread-safety: this function is thread safe, but the client must guard against simultaneous destruction or modification of eventGroup (for ex
85. ding instances that cannot be profiled. Use CUPTI_EVENT_DOMAIN_ATTR_INSTANCE_COUNT to get the number of instances that can be profiled. Can be read only with cuptiDeviceGetEventDomainAttribute. Value is a uint32_t.
CUPTI_EVENT_DOMAIN_ATTR_COLLECTION_METHOD = 4: Collection method used for events contained in the event domain. Value is a CUpti_EventCollectionMethod.
CUPTI_EVENT_DOMAIN_ATTR_FORCE_INT = 0x7fffffff

enum CUpti_EventGroupAttribute
Event group attributes. These attributes can be read using cuptiEventGroupGetAttribute. Attributes marked [rw] can also be written using cuptiEventGroupSetAttribute.

Values
CUPTI_EVENT_GROUP_ATTR_EVENT_DOMAIN_ID = 0: The domain to which the event group is bound. This attribute is set when the first event is added to the group. Value is a CUpti_EventDomainID.
CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES = 1: [rw] Profile all the instances of the domain for this event group. This feature can be used to get load balancing across all instances of a domain. Value is an integer.
CUPTI_EVENT_GROUP_ATTR_USER_DATA = 2: [rw] Reserved for user data.
CUPTI_EVENT_GROUP_ATTR_NUM_EVENTS = 3: Number of events in the group. Value is a uint32_t.
CUPTI_EVENT_GROUP_ATTR_EVENTS = 4: Enumerates events in the group. Value is a pointer to a buffer of size sizeof(CUpti_EventID) * num_of_events in the event group. num_of_events can be queried using C
86. domainArray: Returns the IDs of the event domains for the device.

Returns
CUPTI_SUCCESS
CUPTI_ERROR_NOT_INITIALIZED
CUPTI_ERROR_INVALID_DEVICE
CUPTI_ERROR_INVALID_PARAMETER if arraySizeBytes or domainArray are NULL

Description: Returns the event domain IDs in domainArray for a device. The size of the domainArray buffer is given by *arraySizeBytes. The size of the domainArray buffer must be at least numdomains * sizeof(CUpti_EventDomainID); otherwise not all domains will be returned. The value returned in *arraySizeBytes contains the number of bytes returned in domainArray. Thread-safety: this function is thread safe.

CUptiResult cuptiDeviceGetAttribute(CUdevice device, CUpti_DeviceAttribute attrib, size_t *valueSize, void *value)
Read a device attribute.

Parameters
device: The CUDA device.
attrib: The attribute to read.
valueSize: Size of the buffer pointed to by value, and returns the number of bytes written to value.
value: Returns the value of the attribute.

Returns
CUPTI_SUCCESS
CUPTI_ERROR_NOT_INITIALIZED
CUPTI_ERROR_INVALID_DEVICE
CUPTI_ERROR_INVALID_PARAMETER if valueSize or value is NULL, or if attrib is not a device attribute
CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT: For non-c-string attribute values, indicates that the value buffer is too small to hold the attribute value.

D
87. due to global memory cache misses for each instruction executed. Scope: Single-context.
local_replay_overhead: Average number of replays due to local memory accesses for each instruction executed. Scope: Single-context.
gld_efficiency: Ratio of requested global memory load throughput to required global memory load throughput. Scope: Single-context.
gst_efficiency: Ratio of requested global memory store throughput to required global memory store throughput. Scope: Single-context.
gld_transactions: Number of global memory load transactions. Scope: Single-context.
gst_transactions: Number of global memory store transactions. Scope: Single-context.
gld_transactions_per_request: Average number of global memory load transactions performed for each global memory load. Scope: Single-context.
gst_transactions_per_request: Average number of global memory store transactions performed for each global memory store. Scope: Single-context.
local_load_transactions_per_request: Average number of local memory load transactions performed for each local memory load. Scope: Single-context.
local_store_transactions_per_request: Average number of local memory store transactions performed for each local memory store. Scope: Single-context.
shared_load_transactions_per_request: Average number of shared memory load transactions performed for each shared memory load. Scope: Single-context.
shared_store_transactions_per_request shared_efficiency Average nu
88. CUptiResult cuptiActivityFlush(CUcontext context, uint32_t streamId, uint32_t flag)
Wait for all activity records to be delivered via the completion callback.

Parameters
context: A valid CUcontext or NULL.
streamId: The stream ID.
flag: Reserved, must be 0.

Returns
CUPTI_SUCCESS
CUPTI_ERROR_NOT_INITIALIZED
CUPTI_ERROR_INVALID_OPERATION if not preceded by a successful call to cuptiActivityRegisterCallbacks
CUPTI_ERROR_UNKNOWN: an internal error occurred

Description: This function does not return until all activity records associated with the specified context/stream are returned to the CUPTI client using the callback registered in cuptiActivityRegisterCallbacks. To ensure that all activity records are complete, the requested stream(s), if any, are synchronized. If context is NULL, the global activity records (i.e. those not associated with a particular stream) are flushed; in this case no streams are synchronized. If context is a valid CUcontext and streamId is 0, the buffers of all streams of this context are flushed. Otherwise, the buffers of the specified stream in this context are flushed. Before calling this function, the buffer handling callback API must be activated by calling cuptiActivityRegisterCallbacks.

CUptiResult cuptiActivityFlushAll(uint32_t flag)
Wait for all activity records to be delivered via the
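The way cuptiActivityFlush chooses what to flush from its context and streamId arguments can be summarized as a small decision function. This is only an illustration of the rules stated in the description above, not part of CUPTI: FlushScope and flush_scope are hypothetical names, and the context is modeled as an opaque pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the three cases in the description. */
typedef enum {
    FLUSH_GLOBAL_RECORDS,         /* context is NULL, no stream synchronized */
    FLUSH_ALL_STREAMS_OF_CONTEXT, /* valid context, streamId is 0 */
    FLUSH_ONE_STREAM              /* valid context, nonzero streamId */
} FlushScope;

/* Hypothetical helper mirroring the documented selection rules. */
static FlushScope flush_scope(const void *context, uint32_t streamId, uint32_t flag)
{
    assert(flag == 0); /* documented as reserved, must be 0 */
    if (context == NULL) {
        return FLUSH_GLOBAL_RECORDS;
    }
    if (streamId == 0) {
        return FLUSH_ALL_STREAMS_OF_CONTEXT;
    }
    return FLUSH_ONE_STREAM;
}
```

Note that when context is NULL the streamId value does not change the outcome in this model, since the description ties the global case to the context argument alone.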
89. e and cuptiMetricCreateEventGroupSets.

You can determine the events available on a device using the cuptiDeviceEnumEventDomains and cuptiEventDomainEnumEvents functions. The cupti_query sample described on the samples page shows how to use these functions. You can also enumerate all the CUPTI events available on any device using the cuptiEnumEventDomains function.

Configuring and reading event counts requires the following steps. First, select your event collection mode. If you want to count events that occur during the execution of a kernel, use cuptiSetEventCollectionMode to set mode CUPTI_EVENT_COLLECTION_MODE_KERNEL. If you want to continuously sample the event counts, use mode CUPTI_EVENT_COLLECTION_MODE_CONTINUOUS.

Next, determine the names of the events that you want to count, and then use the cuptiEventGroupCreate, cuptiEventGetIdFromName, and cuptiEventGroupAddEvent functions to create and initialize an event group with those events. If you are unable to add all the events to a single event group, then you will need to create multiple event groups. Alternatively, you can use the cuptiEventGroupSetsCreate function to automatically create the event group(s) required for a set of events.

To begin counting a set of e
90. e to peak utilization
The utilization level of the L2 cache relative to the peak utilization.
The utilization level of the texture cache relative to the peak utilization.
The utilization level of the device memory relative to the peak utilization.
The utilization level of the system memory relative to the peak utilization.
The utilization level of the multiprocessor function units that execute load and store instructions.
The utilization level of the multiprocessor function units that execute integer instructions.
Scope column for this page, as extracted: Single-context (14 rows), Multi-context (2 rows).

Metric names on the following page, as extracted: cf_fu_utilization, tex_fu_utilization, tex_fu_utilization, fpspec_fu_utilization, misc_fu_utilization, flops_sp, flops_dp_fma, flops_sp_special
cf_fu_utilization: The utilization level of the multiprocessor function units that execute control flow instructions. Scope: Multi-context.
tex_fu_utilization: The utilization level of the multiprocessor function units that execute texture instructions. Scope: Multi-context.
The utilization level of the multiprocessor function units that execute floating point instructions. Scope: Multi-context.
The utilization level of the multiprocessor function units that execute speci
91. ecord providing a name.
union CUpti_ActivityObjectKindId: Identifiers for object kinds as specified by CUpti_ActivityObjectKind.
struct CUpti_ActivityOverhead: The activity record for CUPTI and driver overheads.
struct CUpti_ActivityPreemption: The activity record for a preemption of a CDP kernel.
struct CUpti_ActivitySourceLocator: The activity record for source locator.

enum CUpti_ActivityAttribute
Activity attributes. These attributes are used to control the behavior of the activity API.

Values
CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE = 0: The device memory reserved for storing profiling data for non-CDP operations for each stream. The value is a size_t. Larger buffers require fewer flush operations but consume more device memory. Small buffers might increase the risk of missing timestamps for concurrent kernel records in the asynchronous buffer handling mode if too many kernels are launched/replayed between context synchronizations. This value only applies to new allocations. Set this value before initializing CUDA or before creating a stream to ensure it is considered for the following allocations. Note: The actual amount of device memory per stream reserved by CUPTI might be larger.
CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE_CDP = 1: The device memory reserved for storing profiling data for CDP operations for each stream. The value is a size_t. Larger buffers require fewer flush operations but consume more device memory. This value only
92. ecord structure is CUpti_ActivityEvent.
CUPTI_ACTIVITY_KIND_METRIC = 7: A metric value. The corresponding activity record structure is CUpti_ActivityMetric.
CUPTI_ACTIVITY_KIND_DEVICE = 8: Information about a device. The corresponding activity record structure is CUpti_ActivityDevice.
CUPTI_ACTIVITY_KIND_CONTEXT = 9: Information about a context. The corresponding activity record structure is CUpti_ActivityContext.
CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL = 10: A (potentially concurrent) kernel executing on the GPU. The corresponding activity record structure is CUpti_ActivityKernel2.
CUPTI_ACTIVITY_KIND_NAME = 11: Thread, device, context, etc. name. The corresponding activity record structure is CUpti_ActivityName.
CUPTI_ACTIVITY_KIND_MARKER = 12: Instantaneous, start, or end marker.
CUPTI_ACTIVITY_KIND_MARKER_DATA = 13: Extended, optional, data about a marker.
CUPTI_ACTIVITY_KIND_SOURCE_LOCATOR = 14: Source information about a source-level result. The corresponding activity record structure is CUpti_ActivitySourceLocator.
CUPTI_ACTIVITY_KIND_GLOBAL_ACCESS = 15: Results for source-level global access. The corresponding activity record structure is CUpti_ActivityGlobalAccess.
CUPTI_ACTIVITY_KIND_BRANCH = 16: Results for source-level branch. The corresponding activity record structure is CUpti_ActivityBranch.
CUPTI_ACTIVITY_KIND_OVERHEAD = 17: Overhead activity records. The corresponding activity record structure is CUpti_ActivityOverhead.
CUPTI_ACTIVITY_KIND_CDP_K
93. ed over all instances. The event values passed to cuptiMetricGetValue can be aggregated values of events for all instances of the domain.
CUPTI_METRIC_EVALUATION_MODE_FORCE_INT = 0x7fffffff

enum CUpti_MetricPropertyDeviceClass
Device class. Enumeration of device classes for metric property CUPTI_METRIC_PROPERTY_DEVICE_CLASS.

Values
CUPTI_METRIC_PROPERTY_DEVICE_CLASS_TESLA = 0
CUPTI_METRIC_PROPERTY_DEVICE_CLASS_QUADRO = 1
CUPTI_METRIC_PROPERTY_DEVICE_CLASS_GEFORCE = 2

enum CUpti_MetricPropertyID
Metric device properties. Metric device properties describe device properties which are needed for a metric. Some of these properties can be collected using cuDeviceGetAttribute.

Values
CUPTI_METRIC_PROPERTY_MULTIPROCESSOR_COUNT
CUPTI_METRIC_PROPERTY_WARPS_PER_MULTIPROCESSOR
CUPTI_METRIC_PROPERTY_KERNEL_GPU_TIME
CUPTI_METRIC_PROPERTY_CLOCK_RATE
CUPTI_METRIC_PROPERTY_FRAME_BUFFER_COUNT
CUPTI_METRIC_PROPERTY_GLOBAL_MEMORY_BANDWIDTH
CUPTI_METRIC_PROPERTY_PCIE_LINK_RATE
CUPTI_METRIC_PROPERTY_PCIE_LINK_WIDTH
CUPTI_METRIC_PROPERTY_PCIE_GEN
CUPTI_METRIC_PROPERTY_DEVICE_CLASS

enum CUpti_MetricValueKind
Kinds of metric values. Metric values can be one of several different kinds. Corresponding to each kind is a member of the CUpti_MetricValue union. The metric value returned by cuptiMetricGetValue should be accessed using the appropriate member of that union based on its
94. CUpti_EventDomainAttribute
CUpti_EventGroupAttribute
CUpti_ReadEventFlags
CUpti_EventID
cuptiDeviceEnumEventDomains
cuptiDeviceGetAttribute
cuptiDeviceGetEventDomainAttribute
cuptiDeviceGetNumEventDomains
cuptiDeviceGetTimestamp
cuptiDisableKernelReplayMode
cuptiEnableKernelReplayMode
cuptiEnumEventDomains
cuptiEventDomainEnumEvents
cuptiEventDomainGetAttribute
cuptiEventDomainGetNumEvents
cuptiEventGetIdFromName
cuptiEventGroupAddEvent
cuptiEventGroupCreate
95. el of the texture cache relative to the peak utilization.
The utilization level of the device memory relative to the peak utilization.
The utilization level of the system memory relative to the peak utilization.
The utilization level of the multiprocessor function units that execute load and store instructions.
The utilization level of the multiprocessor function units that execute integer instructions.
The utilization level of the multiprocessor function units that execute control flow instructions.
The utilization level of the multiprocessor function units that execute texture instructions.
The utilization level of the multiprocessor function units that execute floating point instructions.
Scope column for this page, as extracted: Single-context (10 rows).

Metric names on the following page, as extracted: fpspec_fu_utilization, misc_fu_utilization, flops_dp_add, flops_dp_mul, flops_dp_fma, flops_sp_special, stall_inst_fetch, stall_exec_dependency, stall_data_request
fpspec_fu_utilization: The utilization level of the multiprocessor function units that execute special floating point instructions. Scope: Multi-context.
misc_fu_utilization: The utilization level of the multiprocessor function units that execute miscellaneous instructions. Scope: Multi-context.
Single precision floating point operations executed. Scope: Multi-context.
Single precision floating point add ope
96. ent counters on a CUDA-enabled device. The following terminology is used by the event API.

Event: An event is a countable activity, action, or occurrence on a device.

Event ID: Each event is assigned a unique identifier. A named event will represent the same activity, action, or occurrence on all device types. But the named event may have different IDs on different device families. Use cuptiEventGetIdFromName to get the ID for a named event on a particular device.

Event Category: Each event is placed in one of the categories defined by CUpti_EventCategory. The category indicates the general type of activity, action, or occurrence measured by the event.

Event Domain: A device exposes one or more event domains. Each event domain represents a group of related events available on that device. A device may have multiple instances of a domain, indicating that the device can simultaneously record multiple instances of each event within that domain.

Event Group: An event group is a collection of events that are managed together. The number and type of events that can be added to an event group are subject to device-specific limits. At any given time, a device may be configured to count events from a limited number of event groups. All events in an event group must belong to the same event domain.

Event Group Set: An event group set is a collection of event groups that can be enabled at the same time. Event group sets are created by cuptiEventGroupSetsCreat
97. er buffer. eventValueBufferSizeBytes indicates the size of eventValueBuffer. The buffer must be at least (sizeof(uint64_t) * number of events in group) if CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES is not set on the group containing the events. The buffer must be at least (sizeof(uint64_t) * number of domain instances * number of events in group) if CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES is set on the group.

The data format returned in eventValueBuffer is:
domain instance 0: event0 event1 ... eventN
domain instance 1: event0 event1 ... eventN
...
domain instance M: event0 event1 ... eventN

The event order in eventValueBuffer is returned in eventIdArray. The size of eventIdArray is specified in eventIdArraySizeBytes. The size should be at least (sizeof(CUpti_EventID) * number of events in group).

If any instance of any event counter overflows, the value returned for that event instance will be CUPTI_EVENT_OVERFLOW.

The only allowed value for flags is CUPTI_EVENT_READ_FLAG_NONE.

Reading events from a disabled event group is not allowed. After being read, an event's value is reset to zero. Thread-safety: this function is thread safe, but the client must guard against simultaneous destruction or modification of eventGroup (for example, the client must guard against simultaneous calls to cuptiEventGroupDestroy, cuptiEventGroupAddEvent, etc.), and must guard aga
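The buffer sizing rule and the instance-major layout described above for eventValueBuffer can be worked through with two small helpers. This is a sketch under stated assumptions: event_value_buffer_bytes and event_value_index are hypothetical names that do not exist in CUPTI, and the caller is assumed to have already queried the number of events and domain instances.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: minimum size in bytes of eventValueBuffer.
 * profile_all_instances is nonzero when the group has
 * CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES set. */
static size_t event_value_buffer_bytes(size_t num_events,
                                       size_t num_instances,
                                       int profile_all_instances)
{
    size_t rows = profile_all_instances ? num_instances : 1;
    return sizeof(uint64_t) * rows * num_events;
}

/* Hypothetical helper: position of one value in the instance-major
 * layout (all events of instance 0, then all events of instance 1...). */
static size_t event_value_index(size_t instance, size_t event, size_t num_events)
{
    assert(event < num_events);
    return instance * num_events + event;
}
```

For example, a group of 3 events on a domain with 4 instances needs 24 bytes when only one instance is profiled and 96 bytes when all instances are profiled, and the value of event 1 on instance 1 sits at index 4 of the returned array.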
98. ers required for each thread executing the kernel.

uint8_t CUpti_ActivityCdpKernel::requested
Description: The cache configuration requested by the kernel. The value is one of the CUfunc_cache enumeration values from cuda.h.

uint8_t CUpti_ActivityCdpKernel::sharedMemoryConfig
Description: The shared memory configuration used for the kernel. The value is one of the CUsharedconfig enumeration values from cuda.h.

uint64_t CUpti_ActivityCdpKernel::start
Description: The start timestamp for the kernel execution, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the kernel.

int32_t CUpti_ActivityCdpKernel::staticSharedMemory
Description: The static shared memory allocated for the kernel, in bytes.

uint32_t CUpti_ActivityCdpKernel::streamId
Description: The ID of the stream where the kernel is executing.

uint64_t CUpti_ActivityCdpKernel::submitted
Description: The timestamp when the kernel is submitted to the GPU, in ns. A value of CUPTI_TIMESTAMP_UNKNOWN indicates that the submission time is unknown.

3.5. CUpti_ActivityContext Struct Reference

The activity record for a context. This activity record represents information about a context (CUPTI_ACTIVITY_KIND_CONTEXT).

CUpti_ActivityComputeApiKind CUpti_ActivityContext
99. es. When a new activity record needs to be recorded, CUPTI searches for a non-empty queue to hold the record in this order: 1) the appropriate stream queue, 2) the appropriate context queue. If the search does not find any queue with a buffer, then the activity record is dropped. If the search finds a queue containing a buffer, but that buffer is full, then the activity record is dropped and the dropped record count for the queue is incremented. If the search finds a queue containing a buffer with space available to hold the record, then the record is recorded in the buffer.

At a minimum, one or more buffers must be queued in the global queue and context queue at all times to avoid dropping activity records. The global queue will not store any activity records for GPU activity (kernel, memcpy, memset). It is also necessary to enqueue at least one buffer in the context queue of each context as it is created. The stream queues are optional and can be used to reduce or eliminate application perturbations caused by the need to process or save the activity records returned in the buffers. For example, if a stream queue is used, that queue can be flushed when the stream is synchronized.

DEPRECATED: This method is deprecated and will be removed in a future release. The new asynchronous API implemented by cuptiActivityRegisterCallbacks, cuptiActivityFlush, and cuptiActivityFlushAll should be adopted.
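The search order described above can be modelled with a small toy in plain C. This is a sketch of the stated rules only, not CUPTI's implementation: `ToyQueue`, `RecordOutcome`, and `toy_record` are hypothetical names, and a "buffer" is reduced to a count of free record slots.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one activity queue (hypothetical, for illustration only). */
typedef struct {
    int    has_buffer;  /* non-zero if a buffer is currently enqueued   */
    size_t free_slots;  /* records that still fit in that buffer        */
    size_t dropped;     /* dropped record count for this queue          */
} ToyQueue;

typedef enum { REC_STORED, REC_DROPPED_NO_QUEUE, REC_DROPPED_FULL } RecordOutcome;

/* Search the stream queue first, then the context queue. The first queue
 * that holds a buffer decides the outcome: stored if there is room,
 * otherwise dropped and counted against that queue. */
RecordOutcome toy_record(ToyQueue *stream_q, ToyQueue *context_q)
{
    assert(context_q != NULL);           /* stream queue is optional */
    ToyQueue *order[2] = { stream_q, context_q };
    for (int i = 0; i < 2; ++i) {
        ToyQueue *q = order[i];
        if (q == NULL || !q->has_buffer)
            continue;                    /* empty queue: keep searching */
        if (q->free_slots == 0) {
            q->dropped++;
            return REC_DROPPED_FULL;
        }
        q->free_slots--;
        return REC_STORED;
    }
    return REC_DROPPED_NO_QUEUE;         /* dropped without being counted */
}
```

The toy makes the practical consequence visible: a record is lost silently when no queue holds a buffer, which is why at least one buffer should stay enqueued per context.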
100. escription: Read a device attribute and return it in value.
Thread safety: this function is thread-safe.

CUptiResult cuptiDeviceGetEventDomainAttribute(CUdevice device, CUpti_EventDomainID eventDomain, CUpti_EventDomainAttribute attrib, size_t *valueSize, void *value)

Read an event domain attribute.

Parameters
- device: The CUDA device.
- eventDomain: ID of the event domain.
- attrib: The event domain attribute to read.
- valueSize: The size of the value buffer in bytes; returns the number of bytes written to value.
- value: Returns the attribute's value.

Returns
- CUPTI_SUCCESS
- CUPTI_ERROR_NOT_INITIALIZED
- CUPTI_ERROR_INVALID_DEVICE
- CUPTI_ERROR_INVALID_EVENT_DOMAIN_ID
- CUPTI_ERROR_INVALID_PARAMETER if valueSize or value is NULL, or if attrib is not an event domain attribute
- CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT: for non-c-string attribute values, indicates that the value buffer is too small to hold the attribute value

Description: Returns an event domain attribute in value. The size of the value buffer is given by valueSize. The value returned in valueSize contains the number of bytes returned in value. If the attribute value is a c-string that is longer than valueSize, then only the first valueSize characters will be returned and there will be no terminating null byte.

Thread safety: this function is thread-safe.
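Because a c-string attribute longer than the buffer comes back without a terminating null byte, a caller may want to terminate it explicitly before treating it as a C string. The following is a minimal sketch in plain C; `terminate_attribute_string` is a hypothetical helper, and `value` and `written` stand for the buffer and the byte count reported back through valueSize.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most `written` bytes of a possibly
 * unterminated attribute buffer into dst and always null-terminate dst.
 * Returns the length of the resulting string. */
size_t terminate_attribute_string(char *dst, size_t dst_size,
                                  const char *value, size_t written)
{
    assert(dst != NULL && value != NULL && dst_size > 0);
    size_t n = written < dst_size - 1 ? written : dst_size - 1;

    /* Stop early if the returned bytes already contain a terminator. */
    const char *nul = memchr(value, '\0', n);
    if (nul != NULL)
        n = (size_t)(nul - value);

    memcpy(dst, value, n);
    dst[n] = '\0';
    return n;
}
```

Allocating the destination one byte larger than the attribute buffer keeps every returned character while leaving room for the terminator.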
101. event. An event's category is accessed using cuptiEventGetAttribute and the CUPTI_EVENT_ATTR_CATEGORY attribute.

Values
- CUPTI_EVENT_CATEGORY_INSTRUCTION = 0: An instruction related event.
- CUPTI_EVENT_CATEGORY_MEMORY = 1: A memory related event.
- CUPTI_EVENT_CATEGORY_CACHE = 2: A cache related event.
- CUPTI_EVENT_CATEGORY_PROFILE_TRIGGER = 3: A profile-trigger event.
- CUPTI_EVENT_CATEGORY_FORCE_INT = 0x7fffffff

enum CUpti_EventCollectionMethod

The collection method used for an event. The collection method indicates how an event is collected.

Values
- CUPTI_EVENT_COLLECTION_METHOD_PM = 0: Event is collected using a hardware global performance monitor.
- CUPTI_EVENT_COLLECTION_METHOD_SM = 1: Event is collected using a hardware SM performance monitor.
- CUPTI_EVENT_COLLECTION_METHOD_INSTRUMENTED = 2: Event is collected using software instrumentation.
- CUPTI_EVENT_COLLECTION_METHOD_FORCE_INT = 0x7fffffff

enum CUpti_EventCollectionMode

Event collection modes. The event collection mode determines the period over which the events within the enabled event groups will be collected.

Values
- CUPTI_EVENT_COLLECTION_MODE_CONTINUOUS = 0: Events are collected for the entire duration between the cuptiEventGroupEnable and cuptiEventGroupDisable calls. This is the default mode. For devices with compute capability less than 2.0, event values are reset when a kernel is launched. For all other devices even
102. example, if the metric value type is unsigned and the computed metric value is negative
- CUPTI_ERROR_INVALID_PARAMETER if metricValue, eventIdArray, or eventValueArray is NULL

Description: Use the events collected for a metric to calculate the metric value. Metric value evaluation depends on the evaluation mode (CUpti_MetricEvaluationMode) that the metric supports. If a metric has evaluation mode CUPTI_METRIC_EVALUATION_MODE_PER_INSTANCE, then it assumes that the input event value is for one domain instance. If a metric has evaluation mode CUPTI_METRIC_EVALUATION_MODE_AGGREGATE, it assumes that input event values are normalized to represent all domain instances on a device.

For the most accurate metric collection, the events required for the metric should be collected for all profiled domain instances. For example, to collect all instances of an event, set the CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES attribute on the group containing the event to 1. The normalized value for the event is then (sum_event_values * totalInstanceCount) / instanceCount, where sum_event_values is the summation of the event values across all profiled domain instances, totalInstanceCount is obtained from querying CUPTI_EVENT_DOMAIN_ATTR_TOTAL_INSTANCE_COUNT, and instanceCount is obtained from querying CUPTI_EVENT_GROUP_ATTR_INSTANCE_COUNT or CUPTI_EVENT_DOMAIN_ATTR_INSTANCE_COUN
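The normalization formula above can be written as a few lines of plain C. This is a sketch under assumptions: `normalize_event_value` is a hypothetical helper, the per-instance values are taken to be already read into an array, and 64-bit overflow of the intermediate product is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scale the sum of the profiled instances up to the
 * total number of domain instances on the device:
 *   (sum_event_values * totalInstanceCount) / instanceCount            */
uint64_t normalize_event_value(const uint64_t *values, size_t instance_count,
                               uint64_t total_instance_count)
{
    assert(values != NULL && instance_count > 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < instance_count; ++i)
        sum += values[i];               /* sum over profiled instances */
    return (sum * total_instance_count) / instance_count;
}
```

For instance, values of 10 and 20 collected from 2 profiled instances on a device with 8 instances in total normalize to (30 * 8) / 2 = 120.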
103. following table. A scope value of single-context indicates that the metric can only be accurately collected when a single context (CUDA or graphic) is executing on the GPU. A scope value of multi-context indicates that the metric can be accurately collected when multiple contexts are executing on the GPU.

Table 1. Capability 1.x Metrics
Metric Name | Description | Scope
branch_efficiency | Ratio of non-divergent branches to total branches | Single-context
gld_efficiency | Ratio of requested global memory load transactions to actual global memory load transactions | Single-context
gst_efficiency | Ratio of requested global memory store transactions to actual global memory store transactions | Single-context
gld_requested_throughput | Requested global memory load throughput |
gst_requested_throughput | Requested global memory store throughput |

1.6.2. Metric Reference - Compute Capability 2.x

Devices with compute capability between 2.0, inclusive, and 3.0 implement the metrics shown in the following table. A scope value of single-context indicates that the metric can only be accurately collected when a single context (CUDA or graphic) is executing on the GPU. A scope value of multi-context indicates that the metric can be accurately collected when multiple contexts are executing on the GPU.

Table 2. Capability 2.x Metrics
Metric Name | Description | Scope
sm_efficiency | The percentage of time at least one warp is active on a multiprocessor | Single-context
ave
104. functions cuptiEnableCallback and cuptiEnableAllDomains can also be used to associate NVTX API functions with a callback (see reference below for more information).

The following code shows a typical callback function.

    void CUPTIAPI
    my_callback(void *userdata, CUpti_CallbackDomain domain,
                CUpti_CallbackId cbid, const void *cbdata)
    {
      const CUpti_NvtxData *nvtxInfo = (CUpti_NvtxData *)cbdata;
      MyDataStruct *my_data = (MyDataStruct *)userdata;

      if ((domain == CUPTI_CB_DOMAIN_NVTX) &&
          (cbid == NVTX_CBID_CORE_NameOsThreadA)) {
        nvtxNameOsThreadA_params *params =
            (nvtxNameOsThreadA_params *)nvtxInfo->functionParams;
      }
    }

In your callback function, you use the CUpti_CallbackDomain and CUpti_CallbackId parameters to determine which NVTX API function invocation is causing this callback. In the example above, we are checking for the nvtxNameOsThreadA function. The cbdata parameter holds a structure of useful information that can be used within the callback. In this case, we use the functionParams member to access the parameters that were passed to nvtxNameOsThreadA. To access the parameters, we first cast functionParams to a structure type corresponding to the nvtxNameOsThreadA function. These parameter structures are contained in generated_nvtx_meta.h.

1.5. CUPTI Event API

The CUPTI Event API allows you to query, configure, start, stop, and read the ev
105. he buffer must be at least sizeof(uint64) if CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES is not set on the group containing the event. The buffer must be at least (sizeof(uint64) * number of domain instances) if CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES is set on the group.

If any instance of an event counter overflows, the value returned for that event instance will be CUPTI_EVENT_OVERFLOW.

The only allowed value for flags is CUPTI_EVENT_READ_FLAG_NONE.

Reading an event from a disabled event group is not allowed. After being read, an event's value is reset to zero.

Thread safety: this function is thread-safe, but the client must guard against simultaneous destruction or modification of eventGroup (for example, against simultaneous calls to cuptiEventGroupDestroy, cuptiEventGroupAddEvent, etc.), and must guard against simultaneous destruction of the context in which eventGroup was created (for example, against simultaneous calls to cudaDeviceReset, cuCtxDestroy, etc.). If cuptiEventGroupResetAllEvents is called simultaneously with this function, then returned event values are undefined.

CUptiResult cuptiEventGroupRemoveAllEvents(CUpti_EventGroup eventGroup)

Remove all events from an event group.

Parameters
- eventGroup: The event group.

Returns
- CUPTI_SUCCESS
- CUPTI_ERROR_NOT_INITIALIZED
- CUPTI_ERROR_INVALID_OPERATION if eventGroup is enable
106. hose metrics.

Parameters
- context: The context for event collection.
- metricIdArraySizeBytes: Size of the metricIdArray in bytes.
- metricIdArray: Array of metric IDs.
- eventGroupPasses: Returns a CUpti_EventGroupSets object that indicates the number of passes required to collect the events and the events to collect on each pass.

Returns
- CUPTI_SUCCESS
- CUPTI_ERROR_NOT_INITIALIZED
- CUPTI_ERROR_INVALID_CONTEXT
- CUPTI_ERROR_INVALID_METRIC_ID
- CUPTI_ERROR_INVALID_PARAMETER if metricIdArray or eventGroupPasses is NULL

Description: For a set of metrics, get the grouping that indicates the number of passes and the event groups necessary to collect the events required for those metrics.

See also: cuptiEventGroupSetsCreate for details on event group set creation.

CUptiResult cuptiMetricEnumEvents(CUpti_MetricID metric, size_t *eventIdArraySizeBytes, CUpti_EventID *eventIdArray)

Get the events required to calculate a metric.

Parameters
- metric: ID of the metric.
- eventIdArraySizeBytes: The size of eventIdArray in bytes; returns the number of bytes written to eventIdArray.
- eventIdArray: Returns the IDs of the events required to calculate metric.

Returns
- CUPTI_SUCCESS
- CUPTI_ERROR_NOT_INITIALIZED
- CUPTI_ERROR_INVALID_METRIC_ID
- CUPTI_ERROR_IN
107. i_ActivityBranch, CUpti_ActivityGlobalAccess, CUpti_ActivitySourceLocator, CUpti_ActivityMetricInstance, CUpti_ActivityMetric, CUpti_ActivityEventInstance, CUpti_ActivityEvent, CUpti_ActivityAPI, CUpti_ActivityPreemption, CUpti_ActivityCdpKernel, CUpti_ActivityKernel2, CUpti_ActivityKernel, CUpti_ActivityMemset, CUpti_ActivityMemcpy2, CUpti_ActivityMemcpy

L
l2_transactions: CUpti_ActivityGlobalAccess
l2CacheSize: CUpti_ActivityDevice
lineNumber: CUpti_ActivitySourceLocator
localMemoryPerThread: CUpti_ActivityKernel, CUpti_ActivityKernel2, CUpti_ActivityCdpKernel
localMemoryTotal: CUpti_ActivityCdpKernel, CUpti_ActivityKernel, CUpti_ActivityKernel2

M
maxBlockDimX: CUpti_ActivityDevice
maxBlockDimY: CUpti_ActivityDevice
maxBlockDimZ: CUpti_ActivityDevice
maxBlocksPerMultiprocessor: CUpti_ActivityDevice
maxGridDimX: CUpti_ActivityDevice
maxGridDimY: CUpti_ActivityDevice
maxGridDimZ: CUpti_ActivityDevice
maxIPC: CUpti_ActivityDevice
maxRegistersPerBlock: CUpti_ActivityDevice
maxSharedMemoryPerBlock: CUpti_ActivityDevice
maxThreadsPerBlock: CUpti_ActivityDevice
maxWarpsPerMultiprocessor: CUpti_ActivityDevice
memoryClock: CUpti_ActivityEnvironment

N
name: CUpti_ActivityKernel, CUpti_ActivityKernel2, CUpti_ActivityDevice, CUpti_ActivityName, CUpti_ActivityCdpKernel, CUpti_ActivityMarker
numEventGroups: CUpti_EventGroupSet
numMemcpyEngines: CUpti_ActivityDevice
numM
108. id

Get the current enabled/disabled state of a callback for a specific domain and function ID.

Parameters
- enable: Returns non-zero if the callback is enabled, zero if not enabled.
- subscriber: Handle to the initialized subscriber.
- domain: The domain of the callback.
- cbid: The ID of the callback.

Returns
- CUPTI_SUCCESS on success
- CUPTI_ERROR_NOT_INITIALIZED if unable to initialize CUPTI
- CUPTI_ERROR_INVALID_PARAMETER if enable is NULL, or if subscriber, domain, or cbid is invalid

Description: Returns non-zero in enable if the callback for a domain and callback ID is enabled, and zero if not enabled.

Thread safety: a subscriber must serialize access to cuptiGetCallbackState, cuptiEnableCallback, cuptiEnableDomain, and cuptiEnableAllDomains. For example, if cuptiGetCallbackState(sub, d, c) and cuptiEnableCallback(sub, d, c) are called concurrently, the results are undefined.

CUptiResult cuptiSubscribe(CUpti_SubscriberHandle *subscriber, CUpti_CallbackFunc callback, void *userdata)

Initialize a callback subscriber with a callback function and user data.

Parameters
- subscriber: Returns handle to the initialized subscriber.
- callback: The callback function.
- userdata: A pointer to user data. This data will be passed to the callback function via the userdata parameter.

Returns
- CUPTI_SUCCESS on success
- CUPTI_ERROR_NOT_INITIALIZED if unable t
109. iling library that will provide the implementation of the NVTX callbacks. To receive callbacks, you must set the NVTX environment variables appropriately so that when the application calls an NVTX function, your profiling library receives the callbacks. The following code sequence shows a typical initialization sequence to enable NVTX callbacks and activity records.

    /* Set env so CUPTI-based profiling library loads on first nvtx call. */
    char *inj32_path = "/path/to/32-bit/version/of/cupti/based/profiling/library";
    char *inj64_path = "/path/to/64-bit/version/of/cupti/based/profiling/library";
    setenv("NVTX_INJECTION32_PATH", inj32_path, 1);
    setenv("NVTX_INJECTION64_PATH", inj64_path, 1);

The following code shows a typical sequence used to associate a callback function with one or more NVTX functions. To simplify the presentation, error checking code has been removed.

    CUpti_SubscriberHandle subscriber;
    MyDataStruct *my_data = ...;
    ...
    cuptiSubscribe(&subscriber, (CUpti_CallbackFunc)my_callback, my_data);
    cuptiEnableDomain(1, subscriber, CUPTI_CB_DOMAIN_NVTX);

First, cuptiSubscribe is used to initialize a subscriber with the my_callback callback function. Next, cuptiEnableDomain is used to associate that callback with all the NVTX functions. Using this code sequence will cause my_callback to be called once each time any of the NVTX functions are invoked. CUPTI callback API
110. inAttribute attrib, size_t *valueSize, void *value)

Read an event domain attribute.

Parameters
- eventDomain: ID of the event domain.
- attrib: The event domain attribute to read.
- valueSize: The size of the value buffer in bytes; returns the number of bytes written to value.
- value: Returns the attribute's value.

Returns
- CUPTI_SUCCESS
- CUPTI_ERROR_NOT_INITIALIZED
- CUPTI_ERROR_INVALID_EVENT_DOMAIN_ID
- CUPTI_ERROR_INVALID_PARAMETER if valueSize or value is NULL, or if attrib is not an event domain attribute
- CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT: for non-c-string attribute values, indicates that the value buffer is too small to hold the attribute value

Description: Returns an event domain attribute in value. The size of the value buffer is given by valueSize. The value returned in valueSize contains the number of bytes returned in value. If the attribute value is a c-string that is longer than valueSize, then only the first valueSize characters will be returned and there will be no terminating null byte.

Thread safety: this function is thread-safe.

CUptiResult cuptiEventDomainGetNumEvents(CUpti_EventDomainID eventDomain, uint32_t *numEvents)

Get number of events in a domain.

Parameters
- eventDomain: ID of the event domain.
- numEvents: Returns the number of events in the domain.

Returns
- CUPTI_SUCCESS
- CUPTI_E
111. indicates the number of passes and the event groups necessary to collect the events.

Parameters
- context: The context for event collection.
- eventIdArraySizeBytes: Size of eventIdArray in bytes.
- eventIdArray: Array of event IDs that need to be grouped.
- eventGroupPasses: Returns a CUpti_EventGroupSets object that indicates the number of passes required to collect the events and the events to collect on each pass.

Returns
- CUPTI_SUCCESS
- CUPTI_ERROR_NOT_INITIALIZED
- CUPTI_ERROR_INVALID_CONTEXT
- CUPTI_ERROR_INVALID_EVENT_ID
- CUPTI_ERROR_INVALID_PARAMETER if eventIdArray or eventGroupPasses is NULL

Description: The number of events that can be collected simultaneously varies by device and by the type of the events. When events can be collected simultaneously, they may need to be grouped into multiple event groups because they are from different event domains. This function takes a set of events and determines how many passes are required to collect all those events, and which events can be collected simultaneously in each pass.

The CUpti_EventGroupSets returned in eventGroupPasses indicates how many passes are required to collect the events with the numSets field. Within each event group set, the sets array indicates the event groups that should be collected on each pass.

Thread safety: this function is thread-safe, but client must guard agai
112. inst simultaneous destruction of the context in which eventGroup was created (for example, the client must guard against simultaneous calls to cudaDeviceReset, cuCtxDestroy, etc.). If cuptiEventGroupResetAllEvents is called simultaneously with this function, then returned event values are undefined.

CUptiResult cuptiEventGroupReadEvent(CUpti_EventGroup eventGroup, CUpti_ReadEventFlags flags, CUpti_EventID event, size_t *eventValueBufferSizeBytes, uint64_t *eventValueBuffer)

Read the value for an event in an event group.

Parameters
- eventGroup: The event group.
- flags: Flags controlling the reading mode.
- event: The event to read.
- eventValueBufferSizeBytes: The size of eventValueBuffer in bytes; returns the number of bytes written to eventValueBuffer.
- eventValueBuffer: Returns the event value(s).

Returns
- CUPTI_SUCCESS
- CUPTI_ERROR_NOT_INITIALIZED
- CUPTI_ERROR_INVALID_EVENT_ID
- CUPTI_ERROR_HARDWARE
- CUPTI_ERROR_INVALID_OPERATION if eventGroup is disabled
- CUPTI_ERROR_INVALID_PARAMETER if eventGroup, eventValueBufferSizeBytes, or eventValueBuffer is NULL

Description: Read the value for an event in an event group. The event value is returned in the eventValueBuffer buffer. eventValueBufferSizeBytes indicates the size of the eventValueBuffer buffer. T
113. is busy

Description: Enable an event group. Enabling an event group zeros the value of all the events in the group and then starts collection of those events.

Thread safety: this function is thread-safe.

CUptiResult cuptiEventGroupGetAttribute(CUpti_EventGroup eventGroup, CUpti_EventGroupAttribute attrib, size_t *valueSize, void *value)

Read an event group attribute.

Parameters
- eventGroup: The event group.
- attrib: The attribute to read.
- valueSize: Size of the buffer pointed to by value; returns the number of bytes written to value.
- value: Returns the value of the attribute.

Returns
- CUPTI_SUCCESS
- CUPTI_ERROR_NOT_INITIALIZED
- CUPTI_ERROR_INVALID_PARAMETER if valueSize or value is NULL, or if attrib is not an event group attribute
- CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT: for non-c-string attribute values, indicates that the value buffer is too small to hold the attribute value

Description: Read an event group attribute and return it in value.

Thread safety: this function is thread-safe, but the client must guard against simultaneous destruction or modification of eventGroup (for example, against simultaneous calls to cuptiEventGroupDestroy, cuptiEventGroupAddEvent, etc.), and must guard against simultaneous destruction of the context in which eventGroup was created (for example, client must guard against simultaneous
114. ityAPI, CUpti_ActivityContext, CUpti_ActivityDevice, CUpti_ActivityEvent, CUpti_ActivityEventInstance, CUpti_ActivityKernel, CUpti_ActivityKernel2, CUpti_ActivityCdpKernel, CUpti_ActivityPreemption, CUpti_ActivityMemcpy, CUpti_ActivityMemcpy2, CUpti_ActivityMemset, CUpti_ActivityMetric, CUpti_ActivityMetricInstance, CUpti_ActivityName, CUpti_ActivityMarker, CUpti_ActivityMarkerData, CUpti_ActivitySourceLocator, CUpti_ActivityGlobalAccess, CUpti_ActivityBranch, CUpti_ActivityOverhead, CUpti_ActivityEnvironment

Values
- CUPTI_ACTIVITY_KIND_INVALID = 0: The activity record is invalid.
- CUPTI_ACTIVITY_KIND_MEMCPY = 1: A host<->host, host<->device, or device<->device memory copy. The corresponding activity record structure is CUpti_ActivityMemcpy.
- CUPTI_ACTIVITY_KIND_MEMSET = 2: A memory set executing on the GPU. The corresponding activity record structure is CUpti_ActivityMemset.
- CUPTI_ACTIVITY_KIND_KERNEL = 3: A kernel executing on the GPU. The corresponding activity record structure is CUpti_ActivityKernel2.
- CUPTI_ACTIVITY_KIND_DRIVER = 4: A CUDA driver API function execution. The corresponding activity record structure is CUpti_ActivityAPI.
- CUPTI_ACTIVITY_KIND_RUNTIME = 5: A CUDA runtime API function execution. The corresponding activity record structure is CUpti_ActivityAPI.
- CUPTI_ACTIVITY_KIND_EVENT = 6: An event value. The corresponding activity r
115. ityGetAttribute
cuptiActivityGetNextRecord
cuptiActivityGetNumDroppedRecords
cuptiActivityQueryBuffer
cuptiActivityRegisterCallbacks
cuptiActivitySetAttribute
cuptiGetDeviceId
[illegible entry]
cuptiGetTimestamp
CUPTI_CORRELATION_ID_UNKNOWN
CUPTI_GRID_ID_UNKNOWN
CUPTI_SOURCE_LOCATOR_ID_UNKNOWN
CUPTI_TIMESTAMP_UNKNOWN
2.4. CUPTI Callback API
CUpti_CallbackData
CUpti_NvtxData
CUpti_ResourceData
CUpti_SynchronizeData
CUpti_ApiCallbackSite
CUpti_CallbackDomain
CUpti_CallbackIdResource
116. ityMemcpy
coreClockRate: CUpti_ActivityDevice
correlationData: CUpti_CallbackData
correlationId: CUpti_ActivityMemset, CUpti_ActivityMetricInstance, CUpti_ActivityCdpKernel, CUpti_ActivityMemcpy, CUpti_ActivityBranch, CUpti_ActivityEventInstance, CUpti_ActivityMetric, CUpti_ActivityKernel2, CUpti_ActivityEvent, CUpti_ActivityGlobalAccess, CUpti_ActivityKernel, CUpti_ActivityAPI, CUpti_CallbackData, CUpti_ActivityMemcpy2

D
dcs: CUpti_ActivityObjectKindId
deviceId: CUpti_ActivityMemcpy, CUpti_ActivityMemset, CUpti_ActivityContext, CUpti_ActivityEnvironment, CUpti_ActivityKernel, CUpti_ActivityMemcpy2, CUpti_ActivityKernel2, CUpti_ActivityCdpKernel
diverged: CUpti_ActivityBranch
domain: CUpti_ActivityEvent, CUpti_ActivityEventInstance
dstContextId: CUpti_ActivityMemcpy2
dstDeviceId: CUpti_ActivityMemcpy2
dstKind: CUpti_ActivityMemcpy2, CUpti_ActivityMemcpy
dynamicSharedMemory: CUpti_ActivityCdpKernel, CUpti_ActivityKernel2, CUpti_ActivityKernel

E
end: CUpti_ActivityMemcpy, CUpti_ActivityMemcpy2, CUpti_ActivityKernel, CUpti_ActivityOverhead, CUpti_ActivityKernel2, CUpti_ActivityMemset, CUpti_ActivityCdpKernel, CUpti_ActivityAPI
environmentKind: CUpti_ActivityEnvironment
eventGroups: CUpti_EventGroupSet
executed: CUpti_ActivityGlobalAccess, CUpti_ActivityKernel2, CUpti_ActivityBranch, CUpti_ActivityCdpKernel
117. ivity kind is not supported

Description: Disable collection of a specific kind of activity record. Multiple kinds can be disabled by calling this function multiple times. By default, all activity kinds are disabled for collection.

CUptiResult cuptiActivityDisableContext(CUcontext context, CUpti_ActivityKind kind)

Disable collection of a specific kind of activity record for a context.

Parameters
- context: The context for which activity is to be disabled.
- kind: The kind of activity record to stop collecting.

Returns
- CUPTI_SUCCESS
- CUPTI_ERROR_NOT_INITIALIZED
- CUPTI_ERROR_INVALID_KIND if the activity kind is not supported

Description: Disable collection of a specific kind of activity record for a context. The setting made by this API supersedes the global settings for activity records. Multiple kinds can be disabled by calling this function multiple times.

CUptiResult cuptiActivityEnable(CUpti_ActivityKind kind)

Enable collection of a specific kind of activity record.

Parameters
- kind: The kind of activity record to collect.

Returns
- CUPTI_SUCCESS
- CUPTI_ERROR_NOT_INITIALIZED
- CUPTI_ERROR_NOT_COMPATIBLE if the activity kind cannot be enabled
- CUPTI_ERROR_INVALID_KIND if the activity kind is not supported

Description: Enable collection of a specific kind of activity record. Multiple kinds can be enabled by calling thi
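The interaction between the global and per-context settings described above can be pictured with a toy model in plain C. Everything here is an assumption made for illustration: the `Toy*` types and functions are hypothetical, activity kinds are reduced to small integers, and the model only encodes two statements from the text (all kinds start disabled, and a per-context setting supersedes the global one).

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t enabled; } ToyGlobal;      /* global settings      */
typedef struct {
    uint32_t overridden;  /* kinds with an explicit per-context setting      */
    uint32_t enabled;     /* the per-context setting for those kinds         */
} ToyContext;

static uint32_t kind_bit(unsigned kind)
{
    assert(kind < 32);
    return UINT32_C(1) << kind;
}

void toy_enable(ToyGlobal *g, unsigned kind)  { g->enabled |= kind_bit(kind); }
void toy_disable(ToyGlobal *g, unsigned kind) { g->enabled &= ~kind_bit(kind); }

void toy_enable_context(ToyContext *c, unsigned kind)
{
    c->overridden |= kind_bit(kind);
    c->enabled    |= kind_bit(kind);
}

void toy_disable_context(ToyContext *c, unsigned kind)
{
    c->overridden |= kind_bit(kind);
    c->enabled    &= ~kind_bit(kind);
}

/* A per-context setting, when present, wins over the global setting. */
int toy_is_collected(const ToyGlobal *g, const ToyContext *c, unsigned kind)
{
    uint32_t bit = kind_bit(kind);
    if (c->overridden & bit)
        return (c->enabled & bit) != 0;
    return (g->enabled & bit) != 0;
}
```

Zero-initialized structures correspond to the documented default in which no activity kind is collected.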
118. izeBytes. The size of the domainArray buffer must be at least (numDomains * sizeof(CUpti_EventDomainID)), or not all domains will be returned. The value returned in arraySizeBytes contains the number of bytes returned in domainArray.

Thread safety: this function is thread-safe.

CUptiResult cuptiEventDomainEnumEvents(CUpti_EventDomainID eventDomain, size_t *arraySizeBytes, CUpti_EventID *eventArray)

Get the events in a domain.

Parameters
- eventDomain: ID of the event domain.
- arraySizeBytes: The size of eventArray in bytes; returns the number of bytes written to eventArray.
- eventArray: Returns the IDs of the events in the domain.

Returns
- CUPTI_SUCCESS
- CUPTI_ERROR_NOT_INITIALIZED
- CUPTI_ERROR_INVALID_EVENT_DOMAIN_ID
- CUPTI_ERROR_INVALID_PARAMETER if arraySizeBytes or eventArray is NULL

Description: Returns the event IDs in eventArray for a domain. The size of the eventArray buffer is given by arraySizeBytes. The size of the eventArray buffer must be at least (numdomainevents * sizeof(CUpti_EventID)), or else not all events will be returned. The value returned in arraySizeBytes contains the number of bytes returned in eventArray.

Thread safety: this function is thread-safe.

CUptiResult cuptiEventDomainGetAttribute(CUpti_EventDomainID eventDomain, CUpti_EventDoma
119. k

uint32_t CUpti_ActivityDevice::maxSharedMemoryPerBlock
Description: Maximum amount of shared memory that can be assigned to a block, in bytes.

uint32_t CUpti_ActivityDevice::maxThreadsPerBlock
Description: Maximum number of threads allowed in a block.

uint32_t CUpti_ActivityDevice::maxWarpsPerMultiprocessor
Description: Maximum number of warps that can be present on a multiprocessor at any given time.

const char *CUpti_ActivityDevice::name
Description: The device name. This name is shared across all activity records representing instances of the device, and so should not be modified.

uint32_t CUpti_ActivityDevice::numMemcpyEngines
Description: Number of memory copy engines on the device.

uint32_t CUpti_ActivityDevice::numMultiprocessors
Description: Number of multiprocessors on the device.

uint32_t CUpti_ActivityDevice::numThreadsPerWarp
Description: The number of threads per warp on the device.

3.7. CUpti_ActivityEnvironment Struct Reference

The activity record for CUPTI environmental data. This activity record provides CUPTI environmental data, including power, clocks, and thermals. This information is sampled at various rates and returned in this activity record. The consumer of the record needs to check the environmentKind field to determine what kind of environmental record this is.

CUpti_EnvironmentClocksThrottleReason CUpti_Activity
120. lag CUpti_ActivityMarker::flags
Description: The flags associated with the marker.
See also: CUpti_ActivityFlag

uint32_t CUpti_ActivityMarker::id
Description: The marker ID.

CUpti_ActivityKind CUpti_ActivityMarker::kind
Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_MARKER.

const char *CUpti_ActivityMarker::name
Description: The marker name for an instantaneous or start marker. This will be NULL for an end marker.

CUpti_ActivityMarker::objectId
Description: The identifier for the activity object associated with this marker. objectKind indicates which ID is valid for this record.

CUpti_ActivityObjectKind CUpti_ActivityMarker::objectKind
Description: The kind of activity object associated with this marker.

uint64_t CUpti_ActivityMarker::timestamp
Description: The timestamp for the marker, in ns. A value of 0 indicates that timestamp information could not be collected for the marker.

3.14. CUpti_ActivityMarkerData Struct Reference

The activity record providing detailed information for a marker. The marker data contains color, payload, and category (CUPTI_ACTIVITY_KIND_MARKER_DATA).

uint32_t CUpti_ActivityMarkerData::category
Description: The category for the marker.

uint32_t CUpti_ActivityMarkerData::color
Description: The color for the marke
121. llbackState(sub, d) and cuptiEnableAllDomains(sub) are called concurrently, the results are undefined.

CUptiResult cuptiEnableCallback(uint32_t enable, CUpti_SubscriberHandle subscriber, CUpti_CallbackDomain domain, CUpti_CallbackId cbid)

Enable or disable callbacks for a specific domain and callback ID.

Parameters
- enable: New enable state for the callback. Zero disables the callback, non-zero enables the callback.
- subscriber: Handle to callback subscription.
- domain: The domain of the callback.
- cbid: The ID of the callback.

Returns
- CUPTI_SUCCESS on success
- CUPTI_ERROR_NOT_INITIALIZED if unable to initialize CUPTI
- CUPTI_ERROR_INVALID_PARAMETER if subscriber, domain, or cbid is invalid

Description: Enable or disable callbacks for a subscriber for a specific domain and callback ID.

Thread safety: a subscriber must serialize access to cuptiGetCallbackState, cuptiEnableCallback, cuptiEnableDomain, and cuptiEnableAllDomains. For example, if cuptiGetCallbackState(sub, d, c) and cuptiEnableCallback(sub, d, c) are called concurrently, the results are undefined.

CUptiResult cuptiEnableDomain(uint32_t enable, CUpti_SubscriberHandle subscriber, CUpti_CallbackDomain domain)

Enable or disable all callbacks for a specific domain.

Parameters
- enable: New enable state for all callbacks in the domain. Zero disables all callbacks, non-zero enables all
122. lock
CUPTI_ACTIVITY_PREEMPTION_KIND_RESTORE = 2
  Preemption to restore CDP block.
CUPTI_ACTIVITY_PREEMPTION_KIND_FORCE_INT = 0x7fffffff

enum CUpti_EnvironmentClocksThrottleReason
Reasons for clock throttling. The possible reasons that a clock can be throttled. There can be more than one reason that a clock is being throttled, so these values can be combined by bitwise OR. These are used in the clocksThrottleReasons field in the Environment Activity Record.

Values
CUPTI_CLOCKS_THROTTLE_REASON_GPU_IDLE = 0x00000001
  Nothing is running on the GPU and the clocks are dropping to idle state.
CUPTI_CLOCKS_THROTTLE_REASON_USER_DEFINED_CLOCKS = 0x00000002
  The GPU clocks are limited by a user-specified limit.
CUPTI_CLOCKS_THROTTLE_REASON_SW_POWER_CAP = 0x00000004
  A software power scaling algorithm is reducing the clocks below requested clocks.
CUPTI_CLOCKS_THROTTLE_REASON_HW_SLOWDOWN = 0x00000008
  Hardware slowdown (to reduce the clock by a factor of two or more) is engaged. This is an indicator of one of the following: 1) temperature is too high, 2) external power brake assertion is being triggered (e.g. by the system power supply), 3) change in power state.
CUPTI_CLOCKS_THROTTLE_REASON_UNKNOWN = 0x80000000
  Some unspecified factor is reducing the clocks.
CUPTI_CLOCKS_THROTTLE_REASON_UNSUPPORTED = 0x40000000
  Throttle reason is not supported for this GPU.
CUPTI_CLOCKS_THROTTLE_REASON_NONE = 0x000000
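Because the throttle reasons are combined by bitwise OR, a client usually tests each bit in turn. The following is a minimal sketch of that decoding, not code from CUPTI itself: the table holds local copies of the flag values listed above (real code would include the CUPTI headers), and the names describe_throttle_reasons and ThrottleFlag are hypothetical. The NONE value is left out because it is cut off in the listing.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local copies of the flag values from the listing above (assumption:
 * real code would take these from the CUPTI headers instead). */
typedef struct {
  uint32_t bit;
  const char *name;
} ThrottleFlag;

static const ThrottleFlag kThrottleFlags[] = {
  { 0x00000001u, "GPU_IDLE" },
  { 0x00000002u, "USER_DEFINED_CLOCKS" },
  { 0x00000004u, "SW_POWER_CAP" },
  { 0x00000008u, "HW_SLOWDOWN" },
  { 0x40000000u, "UNSUPPORTED" },
  { 0x80000000u, "UNKNOWN" },
};

/* Writes a '|'-separated list of the reasons set in mask into out and
 * returns how many names were written. Names that do not fit are dropped. */
int describe_throttle_reasons(uint32_t mask, char *out, size_t outSize)
{
  int count = 0;
  size_t used = 0;

  if (out == NULL || outSize == 0) {
    return 0;
  }
  out[0] = '\0';

  for (size_t i = 0; i < sizeof kThrottleFlags / sizeof kThrottleFlags[0]; ++i) {
    if ((mask & kThrottleFlags[i].bit) == 0) {
      continue;  /* this reason is not active */
    }
    int n = snprintf(out + used, outSize - used, "%s%s",
                     count > 0 ? "|" : "", kThrottleFlags[i].name);
    if (n < 0 || (size_t)n >= outSize - used) {
      out[used] = '\0';  /* discard the partially written name */
      break;
    }
    used += (size_t)n;
    ++count;
  }
  return count;
}
```

A caller would pass the value read from the environment record's throttle field together with a scratch buffer, for example describe_throttle_reasons(mask, buf, sizeof buf), and log the resulting string.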
123. mber of shared memory store Single context transactions performed for each shared memory store shared_load_throughput Shared memory load throughput shared_store_throughput Shared memory store throughput Ratio of requested shared memory throughput Single context to required shared memory throughput www nvidia com CUPTI DA 05679 001 _v5 5 14 Metric Name 2_read_transactions 2_write_transactions 2_read_throughput 2_write_throughput 2_11_read_hit_rate 12_11_read_throughput l2 texture read hit rate D texure read throughput local memory overhead D shared utilization D utilization tex utilization dram utilization sysmem utilization ldst fu utilization int fu utilization www nvidia com CUPTI Introduction Memory read transactions seen at L2 cache for all read requests Memory write transactions seen at L2 cache for all write requests Memory read throughput seen at L2 cache for all read requests Memory write throughput seen at L2 cache for all write requests Hit rate at L2 cache for all read requests from L1 cache Memory read throughput seen at L2 cache for read requests from L1 cache Hit rate at L2 cache for all read requests from texture cache Memory read throughput seen at L2 cache for read requests from the texture cache Ratio of local memory traffic to total memory traffic between the L1 and L2 caches The utilization level of the L1 shared memory relativ
124. mber of the CUpti_MetricValue union. The metric value returned by cuptiMetricGetValue() should be accessed using the appropriate member of that union based on its value kind.

3.29. CUpti_NvtxData Struct Reference
Data passed into a NVTX callback function.
Data passed into a NVTX callback function as the cbdata argument to CUpti_CallbackFunc. The cbdata will be this type for domain equal to CUPTI_CB_DOMAIN_NVTX. Unless otherwise noted, the callback data is valid only within the invocation of the callback function that is passed the data. If you need to retain some data for use outside of the callback, you must make a copy of that data.

const char *CUpti_NvtxData::functionName
Description: Name of the NVTX API function which issued the callback. This string is a global constant and so may be accessed outside of the callback.

const void *CUpti_NvtxData::functionParams
Description: Pointer to the arguments passed to the NVTX API call. See generated_nvtx_meta.h for structure definitions for the parameters for each NVTX API function.

3.30. CUpti_ResourceData Struct Reference
Data passed into a resource callback function.
Data passed into a resource callback function as the cbdata argument to CUpti_CallbackFunc. The cbdata will be this type for domain equal to CUPTI_CB_DOMAIN_RESOURCE. The callback data is valid only within the invocation of the callback function that is p
125. ment, CUpti_ActivityPreemption, CUpti_ActivityMarker

V
value: CUpti_ActivityMemset, CUpti_ActivityMetricInstance, CUpti_ActivityMetric, CUpti_ActivityEventInstance, CUpti_ActivityEvent

Notice
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.

Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all other information previously supplied. NVIDIA Corporation products are not authorized as critical components in life support devices or systems without express written approval of NVIDIA Corporation.

Trademarks
NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the U.S. and other countrie
126. n a future release, see the API documentation for information on these functions.

To ensure that all activity records are collected, CUPTI must be initialized before any CUDA driver or runtime API is invoked. Initialization can be done by enabling one or more activity kinds using cuptiActivityEnable or cuptiActivityEnableContext, as shown in the initTrace function of the activity_trace_async sample. Some activity kinds cannot be directly enabled; see the API documentation for CUpti_ActivityKind for details. Functions cuptiActivityEnable and cuptiActivityEnableContext will return CUPTI_ERROR_NOT_COMPATIBLE if the requested activity kind cannot be enabled.

The new activity buffer API uses callbacks to request and return buffers of activity records. To use the asynchronous buffering API, you must first register two callbacks using cuptiActivityRegisterCallbacks. One of these callbacks will be invoked whenever CUPTI needs an empty activity buffer. The other callback is used to deliver a buffer containing one or more activity records to the client. To minimize profiling overhead, the client should return as quickly as possible from these callbacks. Functions cuptiActivityFlush and cuptiActivityFlushAll can be used to force CUPTI to deliver any activity buffers that contain completed activity records. Functions cuptiActivityGetAttribute and cuptiActivitySetAttribute can be used
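As an illustration of the "empty buffer" callback described above, here is a minimal, CUPTI-free sketch of a function that hands back a fresh buffer and returns immediately. Several things are assumptions rather than statements from this guide: the three output parameters only approximate the shape of the real callback type, the buffer size is arbitrary, the 8-byte alignment is borrowed from the requirement this guide states for cuptiActivityEnqueueBuffer, and a maxNumRecords of 0 is taken to mean "no record limit".

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BUF_SIZE   (32u * 1024u)  /* hypothetical size; a multiple of ALIGN_SIZE */
#define ALIGN_SIZE 8u             /* assumed alignment requirement */

/* Sketch of a buffer-request callback: allocate an aligned, empty buffer
 * and report its size. The function does no I/O and takes no locks so
 * that it returns as quickly as possible. */
void buffer_requested(uint8_t **buffer, size_t *size, size_t *maxNumRecords)
{
  /* aligned_alloc (C11) returns memory that can later be released
   * with free(), unlike a manually adjusted malloc pointer. */
  uint8_t *mem = aligned_alloc(ALIGN_SIZE, BUF_SIZE);

  if (mem == NULL) {
    *buffer = NULL;
    *size = 0;
    *maxNumRecords = 0;
    return;
  }

  *buffer = mem;
  *size = BUF_SIZE;
  *maxNumRecords = 0;  /* assumption: 0 means no limit on records */
}
```

The matching "buffer completed" callback would walk the records in the returned buffer and then release it with free(); keeping both callbacks this small is one way to follow the advice about minimizing profiling overhead.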
127. n a similar manner to cudaGetDevice() or cuCtxGetDevice() but may be called from within callback functions.

CUptiResult cuptiGetStreamId(CUcontext context, CUstream stream, uint32_t *streamId)
Get the ID of a stream.
Parameters
  context: If non-NULL, then the stream is checked to ensure that it belongs to this context. Typically this parameter should be NULL.
  stream: The stream.
  streamId: Returns a context-unique ID for the stream.
Returns
  CUPTI_SUCCESS
  CUPTI_ERROR_NOT_INITIALIZED
  CUPTI_ERROR_INVALID_STREAM if unable to get stream ID, or if context is non-NULL and stream does not belong to the context
  CUPTI_ERROR_INVALID_PARAMETER if streamId is NULL
Description: Get the ID of a stream. The stream ID is unique within a context (i.e. all streams within a context will have unique stream IDs).
See also: cuptiActivityEnqueueBuffer, cuptiActivityDequeueBuffer

CUptiResult cuptiGetTimestamp(uint64_t *timestamp)
Get the CUPTI timestamp.
Parameters
  timestamp: Returns the CUPTI timestamp.
Returns
  CUPTI_SUCCESS
  CUPTI_ERROR_INVALID_PARAMETER if timestamp is NULL
Description: Returns a timestamp normalized to correspond with the start and end timestamps reported in the CUPTI activity records. The timestamp is reported in nanoseconds.

#define CUPTI_CORRELATION_ID
128. nel2 CUpti ActivityMemcpy CUpti ActivityMemcpy2 resourceDescriptor CUpti ResourceData return Value CUpti Activity API runtimeCorrelationId S CUpti ActivityMemset CUpti ActivityMemcpy CUpti ActivityKernel sets CUpti_EventGroupSets www nvidia com CUPTI Data Fields DA 05679 001 _v5 5 167 sharedMemoryConfig CUpti_ActivityKernel2 CUpti_ActivityCdpKernel smClock CUpti_ActivityEnvironment sourceLocatorld CUpti_ActivityGlobalAccess CUpti_ActivityBranch speed CUpti_ActivityEnvironment srcContextId CUpti ActivityMemcpy2 srcDeviceld CUpti_ActivityMemcpy2 srcKind CUpti ActivityMemcpy CUpti ActivityMemcpy2 start CUpti ActivityKernel2 CUpti_ActivityCdpKernel CUpti Activity API CUpti ActivityOverhead CUpti_ActivityMemcpy CUpti_ActivityMemcpy2 CUpti_ActivityMemset CUpti_ActivityKernel staticSharedMemory CUpti_ActivityKernel CUpti_ActivityKernel2 CUpti_ActivityCdpKernel stream CUpti_ResourceData CUpti_SynchronizeData streamId CUpti_ActivityCdpKernel CUpti_ActivityKernel2 CUpti_ActivityKernel CUpti_ActivityMemcpy2 CUpti_ActivityMemset CUpti_ActivityMemcpy submitted CUpti_ActivityCdpKernel www nvidia com CUPTI Data Fields DA 05679 001 _v5 5 168 Data Fields symbolName CUpti_CallbackData T temperature CUpti ActivityEnvironment threadId CUpti_ActivityAPI threadsExecuted CUpti_ActivityBranch CUpti_ActivityGlobalAccess timestamp CUpti_ActivityEnviron
129. ng code has been removed.

static void CUPTIAPI
getEventValueCallback(void *userdata,
                      CUpti_CallbackDomain domain,
                      CUpti_CallbackId cbid,
                      const void *cbdata)
{
  const CUpti_CallbackData *cbData = (CUpti_CallbackData *)cbdata;

  /* On entry to the launch API: drain the GPU, then start counting. */
  if (cbData->callbackSite == CUPTI_API_ENTER) {
    cudaThreadSynchronize();
    cuptiSetEventCollectionMode(cbInfo->context,
                                CUPTI_EVENT_COLLECTION_MODE_KERNEL);
    cuptiEventGroupEnable(eventGroup);
  }

  /* On exit: wait for the kernel to finish, then read the counter. */
  if (cbData->callbackSite == CUPTI_API_EXIT) {
    cudaThreadSynchronize();
    cuptiEventGroupReadEvent(eventGroup,
                             CUPTI_EVENT_READ_FLAG_NONE,
                             eventId,
                             &bytesRead, &eventVal);
    cuptiEventGroupDisable(eventGroup);
  }
}

Two synchronization points are used to ensure that events are counted only for the execution of the kernel. If the application contains other threads that launch kernels, then additional thread-level synchronization must also be introduced to ensure that those threads do not launch kernels while the callback is collecting events. When the cudaLaunch API is entered (that is, before the kernel is actually launched on the device), cudaThreadSynchronize is used to wait until the GPU is idle. The event collection mode is set to CUPTI_EVENT_COLLECTION_MODE_KERNEL so that the event counters are automatically s
130. ngle multiprocessor Average number of instructions executed by each warp Multi context Multi context Multi context Number of issued control flow instructions Multi context Number of executed control flow instructions Multi context ldst_issued Number of issued load and store instructions ldst_executed branch_efficiency warp_execution_efficiency warp_nonpred_execution_efficiency inst_replay_overhead shared_replay_overhead global cache replay overhead www nvidia com CUPTI Number of executed load and store instructions Ratio of non divergent branches to total branches Ratio of the average active threads per warp to the maximum number of threads per warp supported on a multiprocessor Ratio of the average active threads per warp executing non predicated instructions to the maximum number of threads per warp supported on a multiprocessor Average number of replays for each instruction executed Average number of replays due to shared memory conflicts for each instruction executed Average number of replays due to global memory cache misses for each instruction executed Multi context Multi context Multi context Multi context Multi context Multi context Single context Single context DA 05679 001 _v5 5 18 Introduction local_replay_overhead Average number of replays due to local Single context memory accesses for each instruction executed gld_efficiency Ratio of reque
131. nization callback IDs are defined by CUpti_CallbackIdSync.

Callback Function
Your callback function must be of type CUpti_CallbackFunc. This function type has two arguments that specify the callback domain and ID so that you know why the callback is occurring. The type also has a cbdata argument that is used to pass data specific to the callback.

Subscriber
A subscriber is used to associate each of your callback functions with one or more CUDA API functions. There can be at most one subscriber initialized with cuptiSubscribe() at any time. Before initializing a new subscriber, the existing subscriber must be finalized with cuptiUnsubscribe().

Each callback domain is described in detail below. Unless explicitly stated, calling any CUDA runtime or driver API from within a callback function is not supported. Doing so may cause the application to hang.

1.4.1. Driver and Runtime API Callbacks
Using the callback API with the CUPTI_CB_DOMAIN_DRIVER_API or CUPTI_CB_DOMAIN_RUNTIME_API domains, you can associate a callback function with one or more CUDA API functions. When those CUDA functions are invoked in the application, your callback function is invoked as well. For these domains, the cbdata argument to your callback function will be of the type CUpti_CallbackData. It is legal to call cudaThreadSynchronize c
132. nsion grid size for the kernel.

int32_t CUpti_ActivityCdpKernel::gridZ
Description: The Z-dimension grid size for the kernel.

CUpti_ActivityKind CUpti_ActivityCdpKernel::kind
Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_CDP_KERNEL.

uint32_t CUpti_ActivityCdpKernel::localMemoryPerThread
Description: The amount of local memory reserved for each thread, in bytes.

uint32_t CUpti_ActivityCdpKernel::localMemoryTotal
Description: The total amount of local memory reserved for the kernel, in bytes.

const char *CUpti_ActivityCdpKernel::name
Description: The name of the kernel. This name is shared across all activity records representing the same kernel, and so should not be modified.

uint32_t CUpti_ActivityCdpKernel::parentBlockX
Description: The X-dimension of the parent block.

uint32_t CUpti_ActivityCdpKernel::parentBlockY
Description: The Y-dimension of the parent block.

uint32_t CUpti_ActivityCdpKernel::parentBlockZ
Description: The Z-dimension of the parent block.

int64_t CUpti_ActivityCdpKernel::parentGridId
Description: The grid ID of the parent kernel.

uint64_t CUpti_ActivityCdpKernel::queued
Description: The timestamp when the kernel is queued up, in ns. A value of CUPTI_TIMESTAMP_UNKNOWN indicates that the queued time is unknown.

uint16_t CUpti_ActivityCdpKernel::registersPerThread
Description: The number of regist
133. nst another thread simultaneously destroying context.

CUptiResult cuptiEventGroupSetsDestroy(CUpti_EventGroupSets *eventGroupSets)
Destroy a CUpti_EventGroupSets object.
Parameters
  eventGroupSets: The object to destroy.
Returns
  CUPTI_SUCCESS
  CUPTI_ERROR_NOT_INITIALIZED
  CUPTI_ERROR_INVALID_OPERATION if any of the event groups contained in the sets is enabled
  CUPTI_ERROR_INVALID_PARAMETER if eventGroupSets is NULL
Description: Destroy a CUpti_EventGroupSets object.
Thread-safety: this function is thread safe.

CUptiResult cuptiGetNumEventDomains(uint32_t *numDomains)
Get the number of event domains available on any device.
Parameters
  numDomains: Returns the number of domains.
Returns
  CUPTI_SUCCESS
  CUPTI_ERROR_INVALID_PARAMETER if numDomains is NULL
Description: Returns the total number of event domains available on any CUDA-capable device.
Thread-safety: this function is thread safe.

CUptiResult cuptiSetEventCollectionMode(CUcontext context, CUpti_EventCollectionMode mode)
Set the event collection mode.
Parameters
  context: The context.
  mode: The event collection mode.
Returns
  CUPTI_SUCCESS
  CUPTI_ERROR_NOT_INITIALIZED
  CUPTI_ERROR_INVALID_CONTEXT
  CUPTI_ERROR_INVALID_OPERATION if called when replay mode is enabled
Description: Set the event collection mode for a contex
134. nt collection mode will be set to CUPTI_EVENT_COLLECTION_MODE_CONTINUOUS. All previously enabled event groups and event group sets will be disabled.
Thread-safety: this function is thread safe.

CUptiResult cuptiEnableKernelReplayMode(CUcontext context)
Enable kernel replay mode.
Parameters
  context: The context.
Returns
  CUPTI_SUCCESS
Description: Set profiling mode for the context to replay mode. In this mode, any number of events can be collected in one run of the kernel. The event collection mode will automatically switch to CUPTI_EVENT_COLLECTION_MODE_KERNEL. In this mode, cuptiSetEventCollectionMode will return CUPTI_ERROR_INVALID_OPERATION.
  Kernels might take longer to run if many events are enabled.
  Thread-safety: this function is thread safe.

CUptiResult cuptiEnumEventDomains(size_t *arraySizeBytes, CUpti_EventDomainID *domainArray)
Get the event domains available on any device.
Parameters
  arraySizeBytes: The size of domainArray in bytes; on return, holds the number of bytes written to domainArray.
  domainArray: Returns all the event domains.
Returns
  CUPTI_SUCCESS
  CUPTI_ERROR_INVALID_PARAMETER if arraySizeBytes or domainArray are NULL
Description: Returns all the event domains available on any CUDA-capable device. Event domain IDs are returned in domainArray. The size of the domainArray buffer is given by arrayS
135. o initialize CUPTI
  CUPTI_ERROR_MAX_LIMIT_REACHED if there is already a CUPTI subscriber
  CUPTI_ERROR_INVALID_PARAMETER if subscriber is NULL
Description: Initializes a callback subscriber with a callback function and (optionally) a pointer to user data. The returned subscriber handle can be used to enable and disable the callback for specific domains and callback IDs.
  Only a single subscriber can be registered at a time.
  This function does not enable any callbacks.
  Thread-safety: this function is thread safe.

CUptiResult cuptiSupportedDomains(size_t *domainCount, CUpti_DomainTable *domainTable)
Get the available callback domains.
Parameters
  domainCount: Returns the number of callback domains.
  domainTable: Returns a pointer to an array of available callback domains.
Returns
  CUPTI_SUCCESS on success
  CUPTI_ERROR_NOT_INITIALIZED if unable to initialize CUPTI
  CUPTI_ERROR_INVALID_PARAMETER if domainCount or domainTable are NULL
Description: Returns in domainTable an array of size domainCount of all the available callback domains.
Thread-safety: this function is thread safe.

CUptiResult cuptiUnsubscribe(CUpti_SubscriberHandle subscriber)
Unregister a callback subscriber.
Parameters
  subscriber: Handle to the initialized subscriber.
Returns
  CUPTI_SUCCESS on success
  CUPTI_ERROR_NOT_INITIALI
136. om CUPTI DA 05679 001 _v5 5 158 Chapter 4 DATA FIELDS Here is a list of all documented struct and union fields with links to the struct union documentation for each field B blockX CUpti_ActivityKernel CUpti_ActivityKernel2 CUpti_ActivityPreemption CUpti_ActivityCdpKernel blockY CUpti_ActivityPreemption CUpti_ActivityKernel CUpti_ActivityKernel2 CUpti_ActivityCdpKernel blockZ CUpti_ActivityCdpKernel CUpti_ActivityKernel CUpti_ActivityKernel2 CUpti_ActivityPreemption bytes CUpti_ActivityMemset CUpti_ActivityMemcpy CUpti_ActivityMemcpy2 C cacheConfigExecuted CUpti_ActivityKernel cacheConfigRequested CUpti_ActivityKernel www nvidia com CUPTI DA 05679 001 _v5 5 159 callbackSite CUpti_CallbackData category CUpti_ActivityMarkerData cbid CUpti_Activity API clocksThrottleReasons CUpti_ActivityEnvironment color CUpti_ActivityMarkerData completed CUpti_ActivityKernel2 CUpti_ActivityCdpKernel computeApiKind CUpti_ActivityContext computeCapabilityMajor CUpti_ActivityDevice computeCapabilityMinor CUpti_ActivityDevice constantMemorySize CUpti_ActivityDevice context CUpti_SynchronizeData CUpti_CallbackData CUpti_ResourceData contextld CUpti_ActivityMemcpy CUpti_ActivityMemcpy2 CUpti_ActivityMemset CUpti_ActivityKernel CUpti_ActivityKernel2 CUpti_ActivityCdpKernel CUpti_ActivityContext contextUid CUpti_CallbackData cooling CUpti_ActivityEnvironment copyKind CUpti_ActivityMemcpy2 CUpti_Activ
137. operties in numProp that are required to calculate a metric.

CUptiResult cuptiMetricGetValue(CUdevice device, CUpti_MetricID metric, size_t eventIdArraySizeBytes, CUpti_EventID *eventIdArray, size_t eventValueArraySizeBytes, uint64_t *eventValueArray, uint64_t timeDuration, CUpti_MetricValue *metricValue)
Calculate the value for a metric.
Parameters
  device: The CUDA device that the metric is being calculated for.
  metric: The metric ID.
  eventIdArraySizeBytes: The size of eventIdArray in bytes.
  eventIdArray: The event IDs required to calculate metric.
  eventValueArraySizeBytes: The size of eventValueArray in bytes.
  eventValueArray: The normalized event values required to calculate metric. The values must be ordered to match the order of events in eventIdArray.
  timeDuration: The duration over which the events were collected, in ns.
  metricValue: Returns the value for the metric.
Returns
  CUPTI_SUCCESS
  CUPTI_ERROR_NOT_INITIALIZED
  CUPTI_ERROR_INVALID_METRIC_ID
  CUPTI_ERROR_INVALID_OPERATION
  CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT if the eventIdArray does not contain all the events needed for metric
  CUPTI_ERROR_INVALID_EVENT_VALUE if any of the event values required for the metric is CUPTI_EVENT_OVERFLOW
  CUPTI_ERROR_INVALID_METRIC_VALUE if the computed metric value cannot be represented in the metric's value type. For
138. oughput Single context to required shared memory throughput 2_read_transactions 2_write_transactions 2_read_throughput 2_write_throughput www nvidia com CUPTI Memory read transactions seen at L2 cache Single context for all read requests Memory write transactions seen at L2 cache Single context for all write requests Memory read throughput seen at L2 cache for Single context all read requests Memory write throughput seen at L2 cache for Single context all write requests DA 05679 001 _v5 5 20 Metric Name 2_11_read_hit_rate 12_11_read_throughput l2 texture read hit rate D texure read throughput local memory overhead D shared utilization D utilization tex utilization dram utilization sysmem utilization ldst fu utilization int fu utilization cf fu utilization tex fu utilization tex fu utilization www nvidia com CUPTI Introduction Hit rate at L2 cache for all read requests from L1 cache Memory read throughput seen at L2 cache for read requests from L1 cache Hit rate at L2 cache for all read requests from texture cache Memory read throughput seen at L2 cache for read requests from the texture cache Ratio of local memory traffic to total memory traffic between the L1 and L2 caches The utilization level of the L1 shared memory relative to peak utilization The utilization level of the L2 cache relative to the peak utilization The utilization lev
139. r

CUpti_ActivityFlag CUpti_ActivityMarkerData::flags
Description: The flags associated with the marker. See also: CUpti_ActivityFlag.

uint32_t CUpti_ActivityMarkerData::id
Description: The marker ID.

CUpti_ActivityKind CUpti_ActivityMarkerData::kind
Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_MARKER_DATA.

CUpti_ActivityMarkerData::payload
Description: The payload value.

CUpti_MetricValueKind CUpti_ActivityMarkerData::payloadKind
Description: Defines the payload format for the value associated with the marker.

3.15. CUpti_ActivityMemcpy Struct Reference
The activity record for memory copies. This activity record represents a memory copy (CUPTI_ACTIVITY_KIND_MEMCPY).

uint64_t CUpti_ActivityMemcpy::bytes
Description: The number of bytes transferred by the memory copy.

uint32_t CUpti_ActivityMemcpy::contextId
Description: The ID of the context where the memory copy is occurring.

uint8_t CUpti_ActivityMemcpy::copyKind
Description: The kind of the memory copy, stored as a byte to reduce record size. See also: CUpti_ActivityMemcpyKind.

uint32_t CUpti_ActivityMemcpy::correlationId
Description: The correlation ID of the memory copy. Each memory copy is assigned a unique correlation ID that is identical to the correlation ID in the driver API activity record that launched the memory copy.
140. raged over all multiprocessors on the GPU

sm_efficiency_instance: The percentage of time at least one warp is active on a specific multiprocessor. Scope: Single context.
achieved_occupancy: Ratio of the average active warps per active cycle to the maximum number of warps supported on a multiprocessor. Scope: Multi context.
issue_slot_utilization: Percentage of issue slots that issued at least one instruction, averaged across all cycles. Scope: Multi context.
ipc_instance: Instructions executed per cycle for a single multiprocessor. Scope: Multi context.
inst_per_warp: Average number of instructions executed by each warp. Scope: Multi context.
cf_issued: Number of issued control-flow instructions. Scope: Multi context.
cf_executed: Number of executed control-flow instructions. Scope: Multi context.
ldst_issued: Number of issued load and store instructions. Scope: Multi context.
ldst_executed: Number of executed load and store instructions. Scope: Multi context.
branch_efficiency: Ratio of non-divergent branches to total branches. Scope: Multi context.
warp_execution_efficiency: Ratio of the average active threads per warp to the maximum number of threads per warp supported on a multiprocessor. Scope: Multi context.
inst_replay_overhead: Average number of replays for each instruction executed. Scope: Multi context.
shared_replay_overhead: Average number of replays due to shared memory conflicts for each instruction executed. Scope: Single context.
global_cache_replay_overhead: Average number of replays
141. rations Multi context executed Single precision floating point multiply Multi context operations executed Single precision floating point multiply Multi context accumulate operations executed Double precision floating point operations Multi context executed Double precision floating point add operations Multi context executed Double precision floating point multiply Multi context operations executed Double precision floating point multiply Multi context accumulate operations executed Single precision floating point special Multi context operations executed Percentage of stalls occurring because the Multi context next assembly instruction has not yet been fetched Percentage of stalls occurring because an Multi context input required by the instruction is not yet available Percentage of stalls occurring because a Multi context memory operation cannot be performed due to the required resources not being available or fully utilized or because too many requests of a given type are outstanding DA 05679 001 _v5 5 22 Introduction stall_sync Percentage of stalls occurring because the Multi context warp is blocked at a syncthreads call stall_texture Percentage of stalls occurring because the Multi context texture sub system is fully utilized or has too many outstanding requests stall_other Percentage of stalls occurring due to Multi context miscellaneous reasons 1 7 Samples The CUPTI installation includes
142. rce or synchronization callback. Within a driver API callback this should be interpreted as a CUpti_driver_api_trace_cbid value (these values are defined in cupti_driver_cbid.h). Within a runtime API callback this should be interpreted as a CUpti_runtime_api_trace_cbid value (these values are defined in cupti_runtime_cbid.h). Within a resource API callback this should be interpreted as a CUpti_CallbackIdResource value. Within a synchronize API callback this should be interpreted as a CUpti_CallbackIdSync value.

typedef CUpti_DomainTable
Pointer to an array of callback domains.

typedef struct CUpti_Subscriber_st *CUpti_SubscriberHandle
A callback subscriber.

CUptiResult cuptiEnableAllDomains(uint32_t enable, CUpti_SubscriberHandle subscriber)
Enable or disable all callbacks in all domains.
Parameters
  enable: New enable state for all callbacks in all domains. Zero disables all callbacks, non-zero enables all callbacks.
  subscriber: Handle to callback subscription.
Returns
  CUPTI_SUCCESS on success
  CUPTI_ERROR_NOT_INITIALIZED if unable to initialize CUPTI
  CUPTI_ERROR_INVALID_PARAMETER if subscriber is invalid
Description: Enable or disable all callbacks in all domains.
Thread-safety: a subscriber must serialize access to cuptiGetCallbackState, cuptiEnableCallback, cuptiEnableDomain, and cuptiEnableAllDomains. For example, if cuptiGetCa
143. ric, or undefined if unable to find the metric.
Returns
  CUPTI_SUCCESS
  CUPTI_ERROR_NOT_INITIALIZED
  CUPTI_ERROR_INVALID_DEVICE
  CUPTI_ERROR_INVALID_METRIC_NAME if unable to find a metric with name metricName. In this case metric is undefined.
  CUPTI_ERROR_INVALID_PARAMETER if metricName or metric are NULL
Description: Find a metric by name and return the metric ID in metric.

CUptiResult cuptiMetricGetNumEvents(CUpti_MetricID metric, uint32_t *numEvents)
Get number of events required to calculate a metric.
Parameters
  metric: ID of the metric.
  numEvents: Returns the number of events required for the metric.
Returns
  CUPTI_SUCCESS
  CUPTI_ERROR_NOT_INITIALIZED
  CUPTI_ERROR_INVALID_METRIC_ID
  CUPTI_ERROR_INVALID_PARAMETER if numEvents is NULL
Description: Returns the number of events in numEvents that are required to calculate a metric.

CUptiResult cuptiMetricGetNumProperties(CUpti_MetricID metric, uint32_t *numProp)
Get number of properties required to calculate a metric.
Parameters
  metric: ID of the metric.
  numProp: Returns the number of properties required for the metric.
Returns
  CUPTI_SUCCESS
  CUPTI_ERROR_NOT_INITIALIZED
  CUPTI_ERROR_INVALID_METRIC_ID
  CUPTI_ERROR_INVALID_PARAMETER if numProp is NULL
Description: Returns the number of pr
144. ross all activity records representing the same kernel, and so should not be modified.

uint16_t CUpti_ActivityKernel2::registersPerThread
Description: The number of registers required for each thread executing the kernel.

uint8_t CUpti_ActivityKernel2::requested
Description: The cache configuration requested by the kernel. The value is one of the CUfunc_cache enumeration values from cuda.h.

void *CUpti_ActivityKernel2::reserved0
Description: Undefined. Reserved for internal use.

uint8_t CUpti_ActivityKernel2::sharedMemoryConfig
Description: The shared memory configuration used for the kernel. The value is one of the CUsharedconfig enumeration values from cuda.h.

uint64_t CUpti_ActivityKernel2::start
Description: The start timestamp for the kernel execution, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the kernel.

int32_t CUpti_ActivityKernel2::staticSharedMemory
Description: The static shared memory allocated for the kernel, in bytes.

uint32_t CUpti_ActivityKernel2::streamId
Description: The ID of the stream where the kernel is executing.

3.13. CUpti_ActivityMarker Struct Reference
The activity record providing a marker which is an instantaneous point in time. The marker is specified with a descriptive name and unique id (CUPTI_ACTIVITY_KIND_MARKER).

CUpti_ActivityF
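The rule that a start and end timestamp of 0 together mean "not collected" is easy to overlook when computing kernel durations. The sketch below shows one way to guard for it. KernelTimes is a hypothetical stand-in that holds only the two timestamp fields; real code would read them from the kernel activity record, and the extra check for end earlier than start is a defensive assumption, not something this guide specifies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two timestamp fields of a kernel
 * activity record; both values are in nanoseconds. */
typedef struct {
  uint64_t start;
  uint64_t end;
} KernelTimes;

/* Stores the kernel duration in *durationNs and returns true, or returns
 * false when no usable timestamps are available. */
bool kernel_duration_ns(const KernelTimes *k, uint64_t *durationNs)
{
  if (k->start == 0 && k->end == 0) {
    return false;  /* timestamps could not be collected */
  }
  if (k->end < k->start) {
    return false;  /* defensive: avoid unsigned wrap-around */
  }
  *durationNs = k->end - k->start;
  return true;
}
```

Returning a flag instead of a duration of 0 keeps "no data" distinguishable from a very short kernel when the results are aggregated later.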
145. s
Parameters
  context: The context, or NULL to dequeue from the global queue.
  streamId: The stream ID.
  buffer: Returns the dequeued buffer.
  validBufferSizeBytes: Returns the number of bytes in the buffer that contain activity records.
Returns
  CUPTI_SUCCESS
  CUPTI_ERROR_NOT_INITIALIZED
  CUPTI_ERROR_INVALID_OPERATION if preceded by a successful call to cuptiActivityRegisterCallbacks
  CUPTI_ERROR_INVALID_PARAMETER if buffer or validBufferSizeBytes are NULL
  CUPTI_ERROR_QUEUE_EMPTY the queue is empty; buffer returns NULL and validBufferSizeBytes returns 0
Description: Remove the buffer from the head of the specified queue. See cuptiActivityEnqueueBuffer() for a description of queues. Calling this function transfers ownership of the buffer from CUPTI. CUPTI will not add any activity records to the buffer after it is dequeued.
DEPRECATED: This method is deprecated and will be removed in a future release. The new asynchronous API implemented by cuptiActivityRegisterCallbacks(), cuptiActivityFlush(), and cuptiActivityFlushAll() should be adopted.

CUptiResult cuptiActivityDisable(CUpti_ActivityKind kind)
Disable collection of a specific kind of activity record.
Parameters
  kind: The kind of activity record to stop collecting.
Returns
  CUPTI_SUCCESS
  CUPTI_ERROR_NOT_INITIALIZED
  CUPTI_ERROR_INVALID_KIND if the act
146. s: CUPTI_SUCCESS; CUPTI_ERROR_NOT_INITIALIZED; CUPTI_ERROR_INVALID_OPERATION if preceded by a successful call to cuptiActivityRegisterCallbacks; CUPTI_ERROR_INVALID_PARAMETER if buffer is NULL, does not have alignment of at least 8 bytes, or is not at least 1024 bytes in size. Description: Queue a buffer for activity record collection. Calling this function transfers ownership of the buffer to CUPTI. The buffer should not be accessed or modified until ownership is regained by calling cuptiActivityDequeueBuffer. There are three types of queues. Global Queue: The global queue collects all activity records that are not associated with a valid context. All device and API activity records are collected in the global queue. A buffer is enqueued in the global queue by specifying context == NULL. Context Queue: Each context queue collects activity records associated with that context that are not associated with a specific stream or that are associated with the default stream. A buffer is enqueued in a context queue by specifying the context and a streamId of 0. Stream Queue: Each stream queue collects memcpy, memset, and kernel activity records associated with the stream. A buffer is enqueued in a stream queue by specifying a context and a non-zero stream ID. Multiple buffers can be enqueued on each queue, and buffers can be enqueued on multiple queu
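The three queue types are selected purely by the context and streamId arguments. As a rough illustration, the selection rule can be written as a small classifier; the enum and function below are hypothetical names invented for this sketch (CUPTI defines no such type), and a plain pointer stands in for the CUcontext handle.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative labels only; not a CUPTI type. */
typedef enum {
    QUEUE_GLOBAL,
    QUEUE_CONTEXT,
    QUEUE_STREAM
} QueueKind;

/* `context` stands in for a CUcontext handle; only NULL versus non-NULL
   matters for choosing the queue. */
QueueKind classify_queue(const void *context, uint32_t streamId)
{
    if (context == NULL)
        return QUEUE_GLOBAL;      /* NULL context: global queue */
    if (streamId == 0)
        return QUEUE_CONTEXT;     /* context with streamId 0: context queue */
    return QUEUE_STREAM;          /* context with non-zero ID: stream queue */
}
```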
147. s. uint32_t CUpti_ActivityDevice::id. Description: The device ID. CUpti_ActivityKind CUpti_ActivityDevice::kind. Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_DEVICE. uint32_t CUpti_ActivityDevice::l2CacheSize. Description: The size of the L2 cache on the device, in bytes. uint32_t CUpti_ActivityDevice::maxBlockDimX. Description: Maximum allowed X dimension for a block. uint32_t CUpti_ActivityDevice::maxBlockDimY. Description: Maximum allowed Y dimension for a block. uint32_t CUpti_ActivityDevice::maxBlockDimZ. Description: Maximum allowed Z dimension for a block. uint32_t CUpti_ActivityDevice::maxBlocksPerMultiprocessor. Description: Maximum number of blocks that can be present on a multiprocessor at any given time. uint32_t CUpti_ActivityDevice::maxGridDimX. Description: Maximum allowed X dimension for a grid. uint32_t CUpti_ActivityDevice::maxGridDimY. Description: Maximum allowed Y dimension for a grid. uint32_t CUpti_ActivityDevice::maxGridDimZ. Description: Maximum allowed Z dimension for a grid. uint32_t CUpti_ActivityDevice::maxIPC. Description: The maximum instructions per cycle possible on each device multiprocessor. uint32_t CUpti_ActivityDevice::maxRegistersPerBlock. Description: Maximum number of registers that can be allocated to a bloc
148. s. Other company and product names may be trademarks of the respective companies with which they are associated. Copyright 2007-2013 NVIDIA Corporation. All rights reserved. www.nvidia.com
149. s function multiple times. By default all activity kinds are disabled for collection. CUptiResult cuptiActivityEnableContext(CUcontext context, CUpti_ActivityKind kind): Enable collection of a specific kind of activity record for a context. Parameters: context: The context for which activity is to be enabled. kind: The kind of activity record to collect. Returns: CUPTI_SUCCESS; CUPTI_ERROR_NOT_INITIALIZED; CUPTI_ERROR_NOT_COMPATIBLE if the activity kind cannot be enabled; CUPTI_ERROR_INVALID_KIND if the activity kind is not supported. Description: Enable collection of a specific kind of activity record for a context. The setting made by this API supersedes the global settings for activity records enabled by cuptiActivityEnable. Multiple kinds can be enabled by calling this function multiple times. CUptiResult cuptiActivityEnqueueBuffer(CUcontext context, uint32_t streamId, uint8_t *buffer, size_t bufferSizeBytes): Queue a buffer for activity record collection. Parameters: context: The context, or NULL to enqueue on the global queue. streamId: The stream ID. buffer: The pointer to the user-supplied buffer for storing activity records. The buffer must be at least 8-byte aligned, and the size of the buffer must be at least 1024 bytes. bufferSizeBytes: The size of the buffer, in bytes. The size of the buffer must be at least 1024 bytes. Return
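The buffer requirements stated for cuptiActivityEnqueueBuffer (non-NULL, at least 8-byte aligned, at least 1024 bytes) can be checked before the call is made. The sketch below shows one way to do that; the function and macro names are hypothetical and are not part of CUPTI.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Limits taken from the parameter description; the macro names are
   local to this sketch. */
#define MIN_ACTIVITY_BUFFER_BYTES 1024u
#define MIN_ACTIVITY_BUFFER_ALIGN 8u

/* Hypothetical pre-check mirroring the documented conditions under which
   enqueueing reports an invalid parameter. */
bool activity_buffer_ok(const uint8_t *buffer, size_t bufferSizeBytes)
{
    if (buffer == NULL)
        return false;
    if (((uintptr_t)buffer % MIN_ACTIVITY_BUFFER_ALIGN) != 0)
        return false;
    return bufferSizeBytes >= MIN_ACTIVITY_BUFFER_BYTES;
}
```

Checking up front keeps the error handling close to the allocation site instead of relying only on the returned result code.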
150. s related to power. CUPTI_ACTIVITY_ENVIRONMENT_COOLING = 4: The environment data is related to cooling. CUPTI_ACTIVITY_ENVIRONMENT_COUNT. CUPTI_ACTIVITY_ENVIRONMENT_KIND_FORCE_INT = 0x7fffffff. enum CUpti_ActivityFlag: Flags associated with activity records. Activity record flags. Flags can be combined by bitwise OR to associate multiple flags with an activity record. Each flag is specific to a certain activity kind, as noted below. Values: CUPTI_ACTIVITY_FLAG_NONE = 0: Indicates the activity record has no flags. CUPTI_ACTIVITY_FLAG_DEVICE_CONCURRENT_KERNELS = 1 << 0: Indicates the activity represents a device that supports concurrent kernel execution. Valid for CUPTI_ACTIVITY_KIND_DEVICE. CUPTI_ACTIVITY_FLAG_MEMCPY_ASYNC = 1 << 0: Indicates the activity represents an asynchronous memcpy operation. Valid for CUPTI_ACTIVITY_KIND_MEMCPY. CUPTI_ACTIVITY_FLAG_MARKER_INSTANTANEOUS = 1 << 0: Indicates the activity represents an instantaneous marker. Valid for CUPTI_ACTIVITY_KIND_MARKER. CUPTI_ACTIVITY_FLAG_MARKER_START = 1 << 1: Indicates the activity represents a region start marker. Valid for CUPTI_ACTIVITY_KIND_MARKER. CUPTI_ACTIVITY_FLAG_MARKER_END = 1 << 2: Indicates the activity represents a region end marker. Valid for CUPTI_ACTIVITY_KIND_MARKER. CUPTI_ACTIVITY_FLAG_MARKER_COLOR_NONE = 1 << 0: Indicates the activity represents a marker that does not specify a color. Valid for CUPTI
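Because flags are combined by bitwise OR, a client tests for an individual flag with a bitwise AND. The sketch below illustrates this for the marker flags, assuming their values are the bit shifts 1 << 0, 1 << 1 and 1 << 2; the enum is a local mirror written for this example, and real code would use the CUPTI_ACTIVITY_FLAG_* constants from cupti.h instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of the marker flag values (an assumption of this sketch);
   not the CUPTI definitions themselves. */
enum {
    MARKER_INSTANTANEOUS = 1 << 0,
    MARKER_START         = 1 << 1,
    MARKER_END           = 1 << 2
};

/* True when `flag` is set in the OR-combined `flags` word. */
bool has_flag(uint32_t flags, uint32_t flag)
{
    return (flags & flag) != 0;
}
```

Since several flags share the value 1 << 0 across different activity kinds, a flag should only be interpreted after the record kind has been checked.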
151. several samples that demonstrate the use of the CUPTI APIs. The samples are: activity_trace_async: This sample shows how to collect a trace of CPU and GPU activity using the new asynchronous activity buffer APIs. callback_event: This sample shows how to use both the callback and event APIs to record the events that occur during the execution of a simple kernel. The sample shows the required ordering for synchronization, and for event group enabling, disabling, and reading. callback_metric: This sample shows how to use both the callback and metric APIs to record the metric's events during the execution of a simple kernel, and then use those events to calculate the metric value. callback_timestamp: This sample shows how to use the callback API to record a trace of API start and stop times. cupti_query: This sample shows how to query CUDA-enabled devices for their event domains, events, and metrics. event_sampling: This sample shows how to use the event API to sample events using a separate host thread. Chapter 2. MODULES. Here is a list of all modules: CUPTI Version; CUPTI Result Codes; CUPTI Activity API; CUPTI Callback API; CUPTI Event API; CUPTI Metric API. 2.1 CUPTI Version. Function and macro to determine the CUPTI version. CUptiResult cuptiGetVersion(uint32_t *version): Get the CUPTI API version. Parameters: version: Returns the version
152. sted global memory load throughput to required global memory load throughput (scope: Single context). gst_efficiency: Ratio of requested global memory store throughput to required global memory store throughput (Single context). gld_transactions: Number of global memory load transactions (Single context). gst_transactions: Number of global memory store transactions (Single context). gld_transactions_per_request: Average number of global memory load transactions performed for each global memory load (Single context). gst_transactions_per_request: Average number of global memory store transactions performed for each global memory store (Single context). local_load_transactions_per_request: Average number of local memory load transactions performed for each local memory load (Single context). local_store_transactions_per_request: Average number of local memory store transactions performed for each local memory store (Single context). shared_load_transactions_per_request: Average number of shared memory load transactions performed for each shared memory load (Single context). shared_store_transactions_per_request: Average number of shared memory store transactions performed for each shared memory store (Single context). shared_load_throughput: Shared memory load throughput. shared_store_throughput: Shared memory store throughput. shared_efficiency: Ratio of requested shared memory thr
153. CUPTI_ERROR_MAX_LIMIT_REACHED if eventGroup is full; CUPTI_ERROR_INVALID_PARAMETER if eventGroup is NULL. Description: Add an event to an event group. The event add can fail for a number of reasons: the event group is enabled; the event does not belong to the same event domain as the events that are already in the event group; device limitations on the events that can belong to the same group; the event group is full. Thread-safety: this function is thread safe. CUptiResult cuptiEventGroupCreate(CUcontext context, CUpti_EventGroup *eventGroup, uint32_t flags): Create a new event group for a context. Parameters: context: The context for the event group. eventGroup: Returns the new event group. flags: Reserved, must be zero. Returns: CUPTI_SUCCESS; CUPTI_ERROR_NOT_INITIALIZED; CUPTI_ERROR_INVALID_CONTEXT; CUPTI_ERROR_OUT_OF_MEMORY; CUPTI_ERROR_INVALID_PARAMETER if eventGroup is NULL. Description: Creates a new event group for context and returns the new group in eventGroup. flags are reserved for future use and should be set to zero. Thread-safety: this function is thread safe. CUptiResult cuptiEventGroupDestroy(CUpti_EventGroup eventGroup): Destroy an event group. Parameters: eventGroup: The event group to destroy. Returns: CUPTI_SUCCESS; CUPTI_ERROR_
154. t. The mode controls the event collection behavior of all events in event groups created in the context. This API is invalid in kernel replay mode. Thread-safety: this function is thread safe. #define CUPTI_EVENT_OVERFLOW ((uint64_t)0xFFFFFFFFFFFFFFFFULL): The overflow value for a CUPTI event. The CUPTI event value that indicates an overflow. 2.6 CUPTI Metric API. Functions, types, and enums that implement the CUPTI Metric API. union CUpti_MetricValue: A metric value. enum CUpti_MetricAttribute: Metric attributes. Metric attributes describe properties of a metric. These attributes can be read using cuptiMetricGetAttribute. Values: CUPTI_METRIC_ATTR_NAME = 0: Metric name. Value is a null terminated const c-string. CUPTI_METRIC_ATTR_SHORT_DESCRIPTION = 1: Short description of metric. Value is a null terminated const c-string. CUPTI_METRIC_ATTR_LONG_DESCRIPTION = 2: Long description of metric. Value is a null terminated const c-string. CUPTI_METRIC_ATTR_CATEGORY = 3: Category of the metric. Value is of type CUpti_MetricCategory. CUPTI_METRIC_ATTR_VALUE_KIND = 4: Value type of the metric. Value is of type CUpti_MetricValueKind. CUPTI_METRIC_ATTR_EVALUATION_MODE = 5: Metric evaluation mode. Value is of type CUpti_MetricEvaluationMode. CUPTI_METRIC_ATTR_FORCE_INT = 0x7fffffff. enum CUpti_MetricCategory: A metric category. Each metric is assigned to a category that
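The overflow value defined above acts as a sentinel, so event readings should be compared against it before they are used in arithmetic. The following sketch sums a set of readings and reports failure if any of them equals the sentinel; the function name is hypothetical, the macro is a local copy of CUPTI_EVENT_OVERFLOW, and summing readings is only an example of downstream use, not something this reference prescribes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the documented sentinel; real code would use
   CUPTI_EVENT_OVERFLOW from cupti.h. */
#define EVENT_OVERFLOW ((uint64_t)0xFFFFFFFFFFFFFFFFULL)

/* Hypothetical helper: adds up `count` event readings into *total.
   Returns false, leaving *total untouched, if any reading overflowed. */
bool sum_event_values(const uint64_t *values, size_t count, uint64_t *total)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < count; ++i) {
        if (values[i] == EVENT_OVERFLOW)
            return false;
        sum += values[i];
    }
    *total = sum;
    return true;
}
```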
155. t values are only reset when the events are read. CUPTI_EVENT_COLLECTION_MODE_KERNEL = 1: Events are collected only for the durations of kernel executions that occur between the cuptiEventGroupEnable and cuptiEventGroupDisable calls. Event collection begins when a kernel execution begins, and stops when kernel execution completes. Event values are reset to zero when each kernel execution begins. If multiple kernel executions occur between the cuptiEventGroupEnable and cuptiEventGroupDisable calls, then the event values must be read after each kernel launch if those events need to be associated with the specific kernel launch. CUPTI_EVENT_COLLECTION_MODE_FORCE_INT = 0x7fffffff. enum CUpti_EventDomainAttribute: Event domain attributes. Except where noted, all the attributes can be read using either cuptiDeviceGetEventDomainAttribute or cuptiEventDomainGetAttribute. Values: CUPTI_EVENT_DOMAIN_ATTR_NAME = 0: Event domain name. Value is a null terminated const c-string. CUPTI_EVENT_DOMAIN_ATTR_INSTANCE_COUNT = 1: Number of instances of the domain for which event counts will be collected. The domain may have additional instances that cannot be profiled (see CUPTI_EVENT_DOMAIN_ATTR_TOTAL_INSTANCE_COUNT). Can be read only with cuptiDeviceGetEventDomainAttribute. Value is a uint32_t. CUPTI_EVENT_DOMAIN_ATTR_TOTAL_INSTANCE_COUNT = 3: Total number of instances of the domain, inclu
156. t32_t CUpti_ActivityGlobalAccess::sourceLocatorId. Description: The ID for the source locator. uint64_t CUpti_ActivityGlobalAccess::threadsExecuted. Description: Incremented, each time this instruction is executed, by the number of threads that executed it. 3.11 CUpti_ActivityKernel Struct Reference. The activity record for kernel (deprecated). This activity record represents a kernel execution (CUPTI_ACTIVITY_KIND_KERNEL and CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL) but is no longer generated by CUPTI. Kernel activities are now reported using the CUpti_ActivityKernel2 activity record. int32_t CUpti_ActivityKernel::blockX. Description: The X-dimension block size for the kernel. int32_t CUpti_ActivityKernel::blockY. Description: The Y-dimension block size for the kernel. int32_t CUpti_ActivityKernel::blockZ. Description: The Z-dimension block size for the kernel. uint8_t CUpti_ActivityKernel::cacheConfigExecuted. Description: The cache configuration used for the kernel. The value is one of the CUfunc_cache enumeration values from cuda.h. uint8_t CUpti_ActivityKernel::cacheConfigRequested. Description: The cache configuration requested by the kernel. The value is one of the CUfunc_cache enumeration values from cuda.h. uint32_t CUpti_ActivityKernel::contextId. Description: The ID of the context where the kernel is executing.
157. tarted and stopped just before and after the kernel executes. Then event collection is enabled with cuptiEventGroupEnable. When the cudaLaunch API is exited (that is, after the kernel is queued for execution on the GPU), another cudaThreadSynchronize is used to cause the CPU thread to wait for the kernel to finish execution. Finally, the event counts are read with cuptiEventGroupReadEvent. 1.5.2 Sampling Events. The event API can also be used to sample event values while a kernel or kernels are executing, as demonstrated by the event_sampling sample. The sample shows one possible way to perform the sampling. The event collection mode is set to CUPTI_EVENT_COLLECTION_MODE_CONTINUOUS so that the event counters run continuously. Two threads are used in event_sampling: one thread schedules the kernels and memcpys that perform the computation, while another thread wakes periodically to sample an event counter. In this sample there is no correlation of the event samples with what is happening on the GPU. To get some coarse correlation, you can use cuptiDeviceGetTimestamp to collect the GPU timestamp at the time of the sample and also at other interesting points in your application. 1.6 CUPTI Metric API. The CUPTI Metric API allows you to collect application metrics calculated from one or more event values. The following terminology is used by the metric API.
158. ti_ActivityName::objectKind. Description: The kind of activity object being named. 3.21 CUpti_ActivityObjectKindId Union Reference. Identifiers for object kinds as specified by CUpti_ActivityObjectKind. See also: CUpti_ActivityObjectKind. CUpti_ActivityObjectKindId::@1 CUpti_ActivityObjectKindId::dcs. Description: A device object requires that we identify the device ID. A context object requires that we identify both the device and context ID. A stream object requires that we identify device, context, and stream ID. CUpti_ActivityObjectKindId::@0 CUpti_ActivityObjectKindId::pt. Description: A process object requires that we identify the process ID. A thread object requires that we identify both the process and thread ID. 3.22 CUpti_ActivityOverhead Struct Reference. The activity record for CUPTI and driver overheads. This activity record provides CUPTI and driver overhead information (CUPTI_ACTIVITY_OVERHEAD). uint64_t CUpti_ActivityOverhead::end. Description: The end timestamp for the overhead, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the overhead. CUpti_ActivityKind CUpti_ActivityOverhead::kind. Description: The activity record kind, must be CUPTI_ACTIVITY_OVERHEAD. CUpti_ActivityOverhead::objectId
159. tiResult cuptiEventGetIdFromName(CUdevice device, const char *eventName, CUpti_EventID *event): Find an event by name. Parameters: device: The CUDA device. eventName: The name of the event to find. event: Returns the ID of the found event, or undefined if unable to find the event. Returns: CUPTI_SUCCESS; CUPTI_ERROR_NOT_INITIALIZED; CUPTI_ERROR_INVALID_DEVICE; CUPTI_ERROR_INVALID_EVENT_NAME if unable to find an event with name eventName (in this case event is undefined); CUPTI_ERROR_INVALID_PARAMETER if eventName or event are NULL. Description: Find an event by name and return the event ID in event. Thread-safety: this function is thread safe. CUptiResult cuptiEventGroupAddEvent(CUpti_EventGroup eventGroup, CUpti_EventID event): Add an event to an event group. Parameters: eventGroup: The event group. event: The event to add to the group. Returns: CUPTI_SUCCESS; CUPTI_ERROR_NOT_INITIALIZED; CUPTI_ERROR_INVALID_EVENT_ID; CUPTI_ERROR_OUT_OF_MEMORY; CUPTI_ERROR_INVALID_OPERATION if eventGroup is enabled; CUPTI_ERROR_NOT_COMPATIBLE if event belongs to a different event domain than the events already in eventGroup, or if a device limitation prevents event from being collected at the same time as the events already in eventGroup;
160. tivity record for source level global access. struct CUpti_ActivityKernel: The activity record for kernel (deprecated). struct CUpti_ActivityKernel2: The activity record for a kernel (CUDA 5.5 onwards). struct CUpti_ActivityMarker: The activity record providing a marker, which is an instantaneous point in time. struct CUpti_ActivityMarkerData: The activity record providing detailed information for a marker. struct CUpti_ActivityMemcpy: The activity record for memory copies. struct CUpti_ActivityMemcpy2: The activity record for peer-to-peer memory copies. struct CUpti_ActivityMemset: The activity record for memset. struct CUpti_ActivityMetric: The activity record for a CUPTI metric. struct CUpti_ActivityMetricInstance: The activity record for a CUPTI metric with instance information. This activity record represents a CUPTI metric value for a specific metric domain instance (CUPTI_ACTIVITY_KIND_METRIC_INSTANCE). This activity record kind is not produced by the activity API but is included for completeness and ease of use. Profile frameworks built on top of CUPTI that collect metric data may choose to use this type to store the collected metric data. This activity record should be used when metric domain instance information needs to be associated with the metric. struct CUpti_ActivityName: The activity r
161. udaDeviceSynchronize, cudaStreamSynchronize, cuCtxSynchronize, and cuStreamSynchronize from within a driver or runtime API callback function. The following code shows a typical sequence used to associate a callback function with one or more CUDA API functions. To simplify the presentation, error checking code has been removed. CUpti_SubscriberHandle subscriber; MyDataStruct *my_data = ...; ... cuptiSubscribe(&subscriber, (CUpti_CallbackFunc)my_callback, my_data); cuptiEnableDomain(1, subscriber, CUPTI_CB_DOMAIN_RUNTIME_API); First, cuptiSubscribe is used to initialize a subscriber with the my_callback callback function. Next, cuptiEnableDomain is used to associate that callback with all the CUDA runtime API functions. Using this code sequence will cause my_callback to be called twice each time any of the CUDA runtime API functions are invoked: once on entry to the CUDA function and once just before exit from the CUDA function. CUPTI callback API functions cuptiEnableCallback and cuptiEnableAllDomains can also be used to associate CUDA API functions with a callback (see reference below for more information). The following code shows a typical callback function. void CUPTIAPI my_callback(void *userdata, CUpti_CallbackDomain domain, CUpti_CallbackId cbid, const void *cbdata) { const CUpti_CallbackData *cbInfo = (CUpti_CallbackData *)cbdata; MyDataStruct *my_data = (MyDataStruct *)userdata; if (domain == CUPTI
162. uint32_t CUpti_ActivityMetricInstance::correlationId. Description: The correlation ID of the metric. Use of this ID is user-defined, but typically this ID value will equal the correlation ID of the kernel for which the metric was gathered. uint8_t CUpti_ActivityMetricInstance::flags. Description: The properties of this metric. See also: CUpti_ActivityFlag. CUpti_MetricID CUpti_ActivityMetricInstance::id. Description: The metric ID. uint32_t CUpti_ActivityMetricInstance::instance. Description: The metric domain instance. CUpti_ActivityKind CUpti_ActivityMetricInstance::kind. Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_METRIC_INSTANCE. uint8_t CUpti_ActivityMetricInstance::pad. Description: Undefined. Reserved for internal use. CUpti_ActivityMetricInstance::value. Description: The metric value. 3.20 CUpti_ActivityName Struct Reference. The activity record providing a name. This activity record provides a name for a device, context, thread, etc. (CUPTI_ACTIVITY_KIND_NAME). CUpti_ActivityKind CUpti_ActivityName::kind. Description: The activity record kind, must be CUPTI_ACTIVITY_KIND_NAME. const char *CUpti_ActivityName::name. Description: The name. CUpti_ActivityName::objectId. Description: The identifier for the activity object. objectKind indicates which ID is valid for this record. CUpti_ActivityObjectKind CUp
163. ult cuptiDeviceEnumMetrics(CUdevice device, size_t *arraySizeBytes, CUpti_MetricID *metricArray): Get the metrics for a device. Parameters: device: The CUDA device. arraySizeBytes: The size of metricArray in bytes, and returns the number of bytes written to metricArray. metricArray: Returns the IDs of the metrics for the device. Returns: CUPTI_SUCCESS; CUPTI_ERROR_NOT_INITIALIZED; CUPTI_ERROR_INVALID_DEVICE; CUPTI_ERROR_INVALID_PARAMETER if arraySizeBytes or metricArray are NULL. Description: Returns the metric IDs in metricArray for a device. The size of the metricArray buffer is given by *arraySizeBytes. The size of the metricArray buffer must be at least numMetrics * sizeof(CUpti_MetricID), or else not all metric IDs will be returned. The value returned in *arraySizeBytes contains the number of bytes returned in metricArray. CUptiResult cuptiDeviceGetNumMetrics(CUdevice device, uint32_t *numMetrics): Get the number of metrics for a device. Parameters: device: The CUDA device. numMetrics: Returns the number of metrics available for the device. Returns: CUPTI_SUCCESS; CUPTI_ERROR_NOT_INITIALIZED; CUPTI_ERROR_INVALID_DEVICE; CUPTI_ERROR_INVALID_PARAMETER if numMetrics is NULL. Description: Returns the number of metrics available for a device. CUptiResult cuptiEnumMetrics(size_t *arraySizeBytes
164. ultiprocessors CUpti_ActivityDevice numSets CUpti_EventGroupSets numThreadsPerWarp CUpti_ActivityDevice O objectId CUpti_ActivityName CUpti_ActivityMarker CUpti_ActivityOverhead objectKind CUpti_ActivityMarker CUpti_ActivityName CUpti_ActivityOverhead overheadKind CUpti_ActivityOverhead P pad CUpti_ActivityMemcpy2 CUpti_ActivityKernel CUpti_ActivityEventInstance CUpti_ActivityMetric CUpti_ActivityPreemption CUpti_ActivityMetricInstance parentBlockX CUpti_ActivityCdpKernel parentBlockY CUpti_ActivityCdpKernel parentBlockZ CUpti_ActivityCdpKernel parentGridId CUpti_ActivityCdpKernel payload CUpti_ActivityMarkerData payloadKind CUpti_ActivityMarkerData pcieLinkGen CUpti_ActivityEnvironment pcieLinkWidth CUpti_ActivityEnvironment pcOffset CUpti_ActivityBranch power CUpti_ActivityGlobalAccess CUpti_ActivityEnvironment powerLimit CUpti_ActivityEnvironment preemptionKind CUpti_ActivityPreemption processId pt Q CUpti_ActivityAPI CUpti_ActivityObjectKindId queued R CUpti_ActivityCdpKernel registersPerThread CUpti_ActivityKernel CUpti_ActivityKernel2 CUpti_ActivityCdpKernel requested CUpti_ActivityKernel2 CUpti_ActivityCdpKernel reserved0 CUpti_ActivityMemset CUpti_ActivityKernel CUpti_ActivityKer
165. us requested completed callbacks from CUPTI. Registering these callbacks prevents the client from using CUPTI's blocking enqueue/dequeue functions. CUptiResult cuptiActivitySetAttribute(CUpti_ActivityAttribute attr, size_t *valueSize, void *value): Write an activity API attribute. Parameters: attr: The attribute to write. valueSize: The size, in bytes, of the value. value: The attribute value to write. Returns: CUPTI_SUCCESS; CUPTI_ERROR_NOT_INITIALIZED; CUPTI_ERROR_INVALID_PARAMETER if valueSize or value is NULL, or if attr is not an activity attribute; CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT, which indicates that the value buffer is too small to hold the attribute value. Description: Write an activity API attribute. CUptiResult cuptiGetDeviceId(CUcontext context, uint32_t *deviceId): Get the ID of a device. Parameters: context: The context, or NULL to indicate the current context. deviceId: Returns the ID of the device that is current for the calling thread. Returns: CUPTI_SUCCESS; CUPTI_ERROR_NOT_INITIALIZED; CUPTI_ERROR_INVALID_DEVICE if unable to get device ID; CUPTI_ERROR_INVALID_PARAMETER if deviceId is NULL. Description: If context is NULL, returns the ID of the device that contains the currently active context. If context is non-NULL, returns the ID of the device which contains that context. Operates i
166. value. Description: Read an activity API attribute and return it in value. CUptiResult cuptiActivityGetNextRecord(uint8_t *buffer, size_t validBufferSizeBytes, CUpti_Activity **record): Iterate over the activity records in a buffer. Parameters: buffer: The buffer containing activity records. validBufferSizeBytes: The number of valid bytes in the buffer. record: Inputs the previous record returned by cuptiActivityGetNextRecord and returns the next activity record from the buffer. If the input value is NULL, returns the first activity record in the buffer. Records of kind CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL may contain invalid (0) timestamps, indicating that no timing information could be collected for lack of device memory. Returns: CUPTI_SUCCESS; CUPTI_ERROR_NOT_INITIALIZED; CUPTI_ERROR_MAX_LIMIT_REACHED if there are no more records in the buffer; CUPTI_ERROR_INVALID_PARAMETER if buffer is NULL. Description: This is a helper function to iterate over the activity records in a buffer. A buffer of activity records is typically obtained by using the cuptiActivityDequeueBuffer function or by receiving a CUpti_BuffersCallbackCompleteFunc callback. An example of typical usage: CUpti_Activity *record = NULL; ... do { status = cuptiActivityGetNextRecord(buffer, validSize, &record); if (status == CUPTI_SUCCESS) { // Use record
167. value kind. Values: CUPTI_METRIC_VALUE_KIND_DOUBLE = 0: The metric value is a 64-bit double. CUPTI_METRIC_VALUE_KIND_UINT64 = 1: The metric value is a 64-bit unsigned integer. CUPTI_METRIC_VALUE_KIND_PERCENT = 2: The metric value is a percentage represented by a 64-bit double. For example, 57.5% is represented by the value 57.5. CUPTI_METRIC_VALUE_KIND_THROUGHPUT = 3: The metric value is a throughput represented by a 64-bit integer. The unit for throughput values is bytes/second. CUPTI_METRIC_VALUE_KIND_INT64 = 4: The metric value is a 64-bit signed integer. CUPTI_METRIC_VALUE_KIND_UTILIZATION_LEVEL = 5: The metric value is a utilization level, as represented by CUpti_MetricValueUtilizationLevel. CUPTI_METRIC_VALUE_KIND_FORCE_INT = 0x7fffffff. enum CUpti_MetricValueUtilizationLevel: Enumeration of utilization levels for metric values of kind CUPTI_METRIC_VALUE_KIND_UTILIZATION_LEVEL. Utilization values can vary from IDLE (0) to MAX (10), but the enumeration only provides specific names for a few values. Values: CUPTI_METRIC_VALUE_UTILIZATION_IDLE = 0; CUPTI_METRIC_VALUE_UTILIZATION_LOW = 2; CUPTI_METRIC_VALUE_UTILIZATION_MID = 5; CUPTI_METRIC_VALUE_UTILIZATION_HIGH = 8; CUPTI_METRIC_VALUE_UTILIZATION_MAX = 10; CUPTI_METRIC_VALUE_UTILIZATION_FORCE_INT = 0x7fffffff. typedef uint32_t CUpti_MetricID: ID for a metric. A metric provides a measure of some aspect of the device. CUptiRes
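Since the value kind decides how the 64 bits of a metric value are interpreted, code that prints metric values usually switches on the kind. The sketch below shows that dispatch with locally defined stand-ins: the ValueKind enum copies the numeric kind values listed above, while the MetricValue union and its field names are hypothetical and should not be read as the layout of CUpti_MetricValue. Treating the throughput integer as unsigned is also an assumption.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-ins; numeric values follow the value-kind list above. */
typedef enum {
    KIND_DOUBLE = 0, KIND_UINT64 = 1, KIND_PERCENT = 2,
    KIND_THROUGHPUT = 3, KIND_INT64 = 4, KIND_UTILIZATION_LEVEL = 5
} ValueKind;

/* Hypothetical field names, not the CUPTI union members. */
typedef union {
    double d; uint64_t u; int64_t i; uint32_t level;
} MetricValue;

/* Formats `v` according to `kind`; returns the snprintf result,
   or -1 for an unknown kind. */
int format_metric_value(char *out, size_t outSize, ValueKind kind, MetricValue v)
{
    switch (kind) {
    case KIND_DOUBLE:     return snprintf(out, outSize, "%g", v.d);
    case KIND_UINT64:     return snprintf(out, outSize, "%" PRIu64, v.u);
    case KIND_PERCENT:    return snprintf(out, outSize, "%.1f%%", v.d);
    case KIND_THROUGHPUT: return snprintf(out, outSize, "%" PRIu64 " bytes/second", v.u);
    case KIND_INT64:      return snprintf(out, outSize, "%" PRId64, v.i);
    case KIND_UTILIZATION_LEVEL:
        return snprintf(out, outSize, "level %u of 10", (unsigned)v.level);
    }
    return -1;
}
```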
168. vents, enable the event group or groups that contain those events by using the cuptiEventGroupEnable function. If your events are contained in multiple event groups, you may be unable to enable all of the event groups at the same time due to device limitations. In this case, you can gather the events across multiple executions of the application, or you can enable kernel replay. If you enable kernel replay using cuptiEnableKernelReplayMode, you will be able to enable any number of event groups, and all the contained events will be collected. Use the cuptiEventGroupReadEvent and/or cuptiEventGroupReadAllEvents functions to read the event values. When you are done collecting events, use the cuptiEventGroupDisable function to stop counting of the events contained in an event group. The callback_event sample described on the samples page shows how to use these functions to create, enable, and disable event groups, and how to read event counts. 1.5.1 Collecting Kernel Execution Events. A common use of the event API is to count a set of events during the execution of a kernel (as demonstrated by the callback_event sample). The following code shows a typical callback used for this purpose. Assume that the callback was enabled only for a kernel launch using the CUDA runtime (i.e., by cuptiEnableCallback(1, subscriber, CUPTI_CB_DOMAIN_RUNTIME_API, CUPTI_RUNTIME_TRACE_CBID_cudaLaunch_v3020)). To simplify the presentation, error checki
169. CUptiResult cuptiMetricGetAttribute(CUpti_MetricID metric, CUpti_MetricAttribute attrib, size_t *valueSize, void *value): Get a metric attribute. Parameters: metric: ID of the metric. attrib: The metric attribute to read. valueSize: The size of the value buffer in bytes, and returns the number of bytes written to value. value: Returns the attribute's value. Returns: CUPTI_SUCCESS; CUPTI_ERROR_NOT_INITIALIZED; CUPTI_ERROR_INVALID_METRIC_ID; CUPTI_ERROR_INVALID_PARAMETER if valueSize or value is NULL, or if attrib is not a metric attribute; CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT, which for non-c-string attribute values indicates that the value buffer is too small to hold the attribute value. Description: Returns a metric attribute in value. The size of the value buffer is given by *valueSize. The value returned in *valueSize contains the number of bytes returned in value. If the attribute value is a c-string that is longer than *valueSize, then only the first *valueSize characters will be returned and there will be no terminating null byte. CUptiResult cuptiMetricGetIdFromName(CUdevice device, const char *metricName, CUpti_MetricID *metric): Find a metric by name. Parameters: device: The CUDA device. metricName: The name of the metric to find. metric: Returns the ID of the found met
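The note that a truncated c-string attribute comes back without a terminating null byte means the caller has to terminate it. One defensive pattern, sketched below, is to reserve one byte beyond the size passed in valueSize and write the terminator after the query returns. The helper name is hypothetical and not part of CUPTI; it only handles the termination step, not the query itself.

```c
#include <stddef.h>

/* Hypothetical helper: `buf` has `capacity` bytes and the query reported
   `bytesWritten` bytes. The caller is assumed to have passed at most
   capacity - 1 as the value size, leaving room for the terminator. */
void terminate_attribute_string(char *buf, size_t capacity, size_t bytesWritten)
{
    if (buf == NULL || capacity == 0)
        return;
    size_t end = bytesWritten < capacity ? bytesWritten : capacity - 1;
    buf[end] = '\0';
}
```

If the attribute already ended in a null byte inside the reported length, the extra terminator is harmless.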
