MX User's Guide
Contents
1. mx_status_t *status, uint32_t *result)

Parameters
IN endpoint: The MX endpoint on which the operation is pending.
IN request: The handle of the pending request.
IN timeout: The value of the timeout in milliseconds.
OUT status: A pointer to the status structure to be filled in case of completion, if any.
OUT result: Non-zero if the request is complete.

If the asynchronous pending request is complete, mx_wait returns MX_SUCCESS. Otherwise, the function may return one of the following error codes.

Error return codes
MX_NO_RESOURCES: Shortage of memory or other system resources.

This function blocks the current thread of execution in the kernel, waiting for an interrupt from the NIC. The arguments to mx_wait are the same as to mx_test, with the addition of a timeout. This timeout is the maximum time, in milliseconds, that the function will wait for the completion of the pending request. If the request is not yet completed at the expiration of the timeout, mx_wait will return to the application. If the request is completed before the expiration of the timeout, the function will return at that time and result will be non-zero.

VI.3 Querying for Any Completion

It may be required for the application to know if at least one request among all of the posted operations on an endpoint is ready to be completed. mx_ipeek and mx_peek provide this capability. These functions will return the
2. or mx_wait are required and used to recycle the request's resources.

Example V.2: Post of an asynchronous non-contiguous receive with a context value

    #include "myriexpress.h"

    int main(void)
    {
        mx_return_t rc;
        mx_endpoint_t endpoint;
        mx_request_t recv_handle;
        mx_segment_t buffer_desc[2];
        uint8_t workspace[256];
        uint64_t match_recv;
        uint64_t match_mask;
        mx_status_t status;
        some_private_struct my_context;
        uint32_t result;

        /* Init and open local endpoint */

        /* post an asynchronous non-contiguous receive with a wildcard
           for the middle 16 bits of the match data */
        buffer_desc[0].segment_ptr = &workspace[64];
        buffer_desc[0].segment_length = 17;
        buffer_desc[1].segment_ptr = &workspace[0];
        buffer_desc[1].segment_length = 50;
        match_recv = UINT64_C(0x1111111100003333);
        match_mask = UINT64_C(0xffffffff0000ffff);
        rc = mx_irecv(endpoint, buffer_desc, 2, match_recv, match_mask,
                      &my_context, &recv_handle);

        /* it is not yet safe to change values in workspace, though it
           is safe to modify buffer_desc */

        /* wait for receive completion */
        rc = mx_wait(endpoint, &recv_handle, MX_INFINITE, &status, &result);

        /* status.context now holds &my_context and it is now safe to
           write into workspace */

        /* endpoint closing and finalize */
    }

VI Request State Functio
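To make the segment layout of Example V.2 concrete, the following is a minimal sketch of how an incoming message would land in a two-segment receive. It does not call MX: `struct segment` and `scatter_message` are hypothetical stand-ins for `mx_segment_t` and the library's internal delivery, and the only behavior assumed is what the guide states, namely that data fills the segments in list order and any excess is discarded.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for mx_segment_t: one contiguous piece of a buffer. */
struct segment {
    void *ptr;
    uint32_t length;
};

/*
 * Copy msg into the segments in list order and return the number of bytes
 * delivered. Bytes beyond the total segment capacity are dropped, which is
 * the case a real receive would report as truncated.
 */
static uint32_t scatter_message(const struct segment *segs, uint32_t count,
                                const uint8_t *msg, uint32_t msg_len)
{
    uint32_t delivered = 0;

    assert(segs != NULL || count == 0);
    for (uint32_t i = 0; i < count && delivered < msg_len; i++) {
        uint32_t chunk = msg_len - delivered;
        if (chunk > segs[i].length)
            chunk = segs[i].length;
        memcpy(segs[i].ptr, msg + delivered, chunk);
        delivered += chunk;
    }
    return delivered;
}
```

With the segments from Example V.2 (17 bytes at workspace[64], then 50 bytes at workspace[0]), a 100-byte message delivers 67 bytes: the first 17 go to workspace[64..80], the next 50 to workspace[0..49], and the remaining 33 are discarded.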
3. MX_MAX_STR_LEN

Key: MX_PART_NUMBER
Description: The part number string for this Myrinet NIC.
Input: uint32_t, the board id.
Output: uint8_t[MX_MAX_STR_LEN]
Output Size: MX_MAX_STR_LEN

Key: MX_SERIAL_NUMBER
Description: The serial number for this Myrinet NIC.
Input: uint32_t, the board id.
Output: uint32_t
Output Size: sizeof(uint32_t)

Key: MX_PORT_COUNT
Description: The number of ports for this Myrinet NIC.
Input: uint32_t, the board id.
Output: uint32_t
Output Size: sizeof(uint32_t)

III.2.2 MX_NIC_IDS

Before calling mx_get_info with the key MX_NIC_IDS, an application should first call mx_get_info with the key MX_NIC_COUNT. A subsequent call with MX_NIC_IDS will fill in an array of NIC IDs, as uint64_t values, followed by a 0. Thus, the memory area passed to mx_get_info for MX_NIC_IDS should be large enough to hold N+1 64-bit integers, where N is the number returned by MX_NIC_COUNT. For example, if MX_NIC_COUNT indicates there are 2 NICs in the system, MX_NIC_IDS should be passed an array of size at least 3 * sizeof(uint64_t). The first two elements of the array will contain the two NIC IDs, and the third element will be the terminating zero.

III.2.3 MX_COUNTERS

Before calling mx_get_info with the key MX_COUNTERS_LABELS or the key MX_COUNTERS_VALUES, an application should first call m
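The N+1 sizing rule for MX_NIC_IDS can be sketched in plain C. The two helpers below are hypothetical (they are not part of MX) and only encode the arithmetic described above: the buffer must hold one more 64-bit integer than the NIC count, and the returned array is read up to its terminating zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes to allocate for the MX_NIC_IDS output: N IDs plus the trailing 0. */
static size_t nic_ids_buffer_size(uint32_t nic_count)
{
    return ((size_t)nic_count + 1) * sizeof(uint64_t);
}

/*
 * Count the NIC IDs in a zero-terminated array. The capacity argument
 * (in elements) keeps the scan from running past the buffer if the
 * terminator is missing.
 */
static size_t count_nic_ids(const uint64_t *ids, size_t capacity)
{
    size_t n = 0;

    assert(ids != NULL);
    while (n < capacity && ids[n] != 0)
        n++;
    return n;
}
```

An application would size its buffer with `nic_ids_buffer_size(count)` using the value obtained for MX_NIC_COUNT, pass that buffer and size as `out_val` and `out_len`, and then walk the result with `count_nic_ids`.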
4. IV.2 Board Numbers and NIC IDs
IV.2.1 mx_board_number_to_nic_id
IV.2.2 mx_nic_id_to_board_number
IV.3 Endpoint Addresses
IV.3.1 mx_connect
IV.3.2 mx_decompose_endpoint_addr
IV.4 Local Endpoint Address
IV.4.1 mx_get_endpoint_addr
V Point-to-Point Communication
V.1 Send Operations
V.1.1 mx_isend
V.1.2 mx_issend
V.2 Receive Operations
V.2.1 mx_irecv
VI Request State Functions
VI.1 Buffered State
VI.1.1 mx_ibuffered
VI.2 Request Completion
VI.2.1 mx_test
VI.2.2 mx_wait
5. 2 Request Completion

A successful return from a completion function like mx_test or mx_wait is required for each pending request in order to release the resources associated with the operation. If asynchronous requests are not successfully completed, the application will suffer a resource leak and MX operations will eventually fail. Using these functions is the only way for the application to query for the eventual success or failure of the requests.

VI.2.1 mx_test

The function used to check for the completion of a pending operation in a non-blocking way is mx_test.

mx_return_t mx_test(mx_endpoint_t endpoint, mx_request_t *request, mx_status_t *status, uint32_t *result)

Parameters
IN endpoint: The MX endpoint on which the operation is pending.
IN request: The handle to the pending request.
OUT status: The status structure to be filled in case of completion.
OUT result: Non-zero if the request is complete.

If the asynchronous pending request is complete, mx_test returns MX_SUCCESS. Otherwise, the function may return one of the following error codes.

Error return codes
MX_NO_RESOURCES: Shortage of memory or other system resources.

The argument request is the pointer to the handle referencing the pending MX operation. If the referenced operation is complete, the output parameter result is non-zero and the output parameter status, a pointer to an mx_status_t structure, is fille
6. Initialization of the MX library

    #include "myriexpress.h"

    int main(void)
    {
        mx_return_t return_code;

        /* Initialize the MX library */
        return_code = mx_init();

        /* do work here */

        /* Finalize the MX library */
        return_code = mx_finalize();

        return 0;
    }

III.2 Information Retrieval

III.2.1 mx_get_info

A variety of information about MX or about a specific endpoint can be retrieved using mx_get_info.

mx_return_t mx_get_info(mx_endpoint_t endpoint, mx_get_info_key_t key, void *in_val, uint32_t in_len, void *out_val, uint32_t out_len)

Parameters
IN endpoint: The MX endpoint to focus the scope of the information inquiry; NULL if information is global.
IN key: The enumeration value of the information key.
IN in_val: A pointer to the parameters needed for this call.
IN in_len: The length of the memory area referenced by in_val.
OUT out_val: A pointer to the memory area where the requested information will be placed.
IN out_len: The size of the out_val buffer.

If the value of the specified information key has been successfully retrieved, mx_get_info() returns MX_SUCCESS. Otherwise, the function may return one of the following error codes.

Error return codes
MX_BAD_INFO_KEY: Unknown key.
MX_BAD_INFO_LENGTH: The buffer length is too small.

mx_get_info provides a way to obtain information
7. about MX at the global level of the library or at the limited level of an MX endpoint. Such information is accessible in the form of key/value pairs, where the key is an enumeration and the value can be of multiple types. The size of the content of the value is specific to each information key. If the endpoint parameter is NULL, the information retrieval applies to the MX library itself. If information associated with a specific endpoint is requested, then the parameter endpoint must be set to the appropriate MX endpoint.

The argument key is one value of the enumeration referencing all the information keys. If this key is not recognized as one of the valid keys listed below, the return code MX_BAD_INFO_KEY is returned. The parameter in_val is a pointer to a memory area which contains any needed parameters for that key request. The parameter in_len is the length associated with in_val. The parameter out_val is the memory where the requested information will be returned. The argument out_len is the size of the memory area designated by out_val. If this size is not large enough to contain the value associated with the key, MX_BAD_INFO_LENGTH is returned and the contents of the memory referenced by out_val are undefined.

The following keys are recognized as valid:

Global Information (endpoint can be NULL)

Key: MX_MAX_NATIVE_ENDPOINTS
Description: The maximum number of endpoints interfaced directl
8. matches the specified matching information, the probe functions return a status structure updated with information about the message, including match data, message source, and message length. mx_probe blocks until a matching message is available. mx_iprobe returns immediately, indicating whether a matching incoming message is available or not.

VII.1.1 mx_iprobe

mx_return_t mx_iprobe(mx_endpoint_t endpoint, uint64_t match_recv, uint64_t match_mask, mx_status_t *status, uint32_t *result)

Parameters
IN endpoint: The MX endpoint on which to probe for incoming messages.
IN match_recv: The matching information to be matched by the incoming message after masking it by the match_mask.
IN match_mask: The mask applied to the matching information of the incoming message to match the match_recv argument.
OUT status: The status structure to be filled in case a matching incoming message is available.
OUT result: Non-zero if there is a message ready to be received.

If an incoming message matches the matching information, the status structure has been updated and mx_iprobe returns MX_SUCCESS. Otherwise, the function may return one of the following error codes.

Error return codes
MX_NO_RESOURCES: Shortage of memory or other system resources.

If the output parameter result is non-zero, the status structure has been updated with the information related to the incoming messag
9. millisecond

• Efficient support for multiple links per NIC: MX does segmentation and re-assembly of large messages at the NIC level and does not ensure in-order delivery or notification, only in-order matching. Support is provided to ensure reliability for a large number of out-of-order fragments (packets) without requiring retransmission, leading to high efficiency on all links. Intentional dropping of out-of-order packets and reliance on retransmission is used only when the number of out-of-order packets exceeds available buffering resources.

• Support for route dispersion: Multiple routes are computed for each destination. When necessary, different routes are chosen to avoid hot spots on the network fabric and to achieve resistance to individual link failures.

• Integrated Myrinet fabric mapping: Network topology discovery and route computation are performed automatically as soon as the MX driver is loaded, and the network is automatically remapped as necessary when a network connectivity issue is detected.

• Support for cancellation of pending requests: Pending operations can be cancelled in MX. The cancel operation will fail gracefully if the pending request has completed asynchronously while its cancel is attempted.

• Single-threaded and thread-safe libraries: MX provides both thread-safe and single-threaded libraries to allow users to select which is most appropriate for the application. There is no API differen
10. opened. This endpoint number must be in the 0 to MX_MAX_ENDPOINTS - 1 range. The value of MX_MAX_ENDPOINTS can be retrieved using mx_get_info. The application can let MX choose the best endpoint to open by using the MX_ANY_ENDPOINT constant.

The params_list argument is a pointer to an array of mx_param_t entries. This array specifies the user configuration of the requested endpoint. MX endpoint parameters are key/value pairs, where the keys are members of an enumeration and the values are pointers to memory areas allocated by the application and containing the values of the respective parameters. The params_count parameter specifies the number of entries in the list of endpoint parameters. The params_list argument may be NULL, along with a params_count of 0, in which case default values are used for all settings.

The following keys are recognized as valid:

Parameter Key: MX_PARAM_UNEXP_QUEUE_MAX
Description: Sets the maximum length of the unexpected queue.
Format: uint32_t
Size: sizeof(uint32_t)
Default Value: Value of MX_UNEXP_QUEUE_LENGTH_MAX

Parameter Key: MX_PARAM_ERROR_HANDLER
Description: Sets the error handler.
Format: mx_error_handler_t
Size: sizeof(mx_error_handler_t)
Default Value: No error handler

III.3.2 mx_close_endpoint

Once opened, an MX endpoint can be closed. This operation is performed by the function mx_close_endpoint.

mx_return_t mx_close_end
11. soon as a call to mx_test or mx_wait indicates that this pending operation is complete. Note that being complete also indicates that the send buffers are available to the application. As mx_isend follows the semantics of the MPI standard mode, a send request in the buffered state can be completed immediately by calling mx_test or mx_wait. Thus, there is no advantage to using mx_ibuffered before mx_test or mx_wait on requests initiated by mx_isend.

V.1.2 mx_issend

MX also supports the concept of a synchronous send, which means that the send request is not considered complete until it is successfully received by the destination endpoint, it is cancelled, or an unrecoverable error has occurred sending the message. The function to initiate a non-blocking synchronous send is mx_issend.

mx_return_t mx_issend(mx_endpoint_t endpoint, mx_segment_t *segments_list, uint32_t segments_count, mx_endpoint_addr_t destination, uint64_t match_send, void *context, mx_request_t *request)

Parameters
IN endpoint: The local MX endpoint used to post the send.
IN segments_list: The array of contiguous segments constituting the gather list describing the buffer to send.
IN segments_count: The number of segments in the gather list.
IN destination: The MX address of the destination of the message.
IN match_send: The matching information from the send side that will be used to find a
12. the request returned is non-deterministic. The output parameter request is only valid if the output parameter result is non-zero. The returned handle must subsequently be passed to mx_test or mx_wait in order to learn the success or failure of the request and to release the resources associated with the request. mx_test is preferred over mx_wait in this case, as the specified request is guaranteed to be complete.

VI.4 Obtaining the context

Functions that generate request handles take a context parameter. This parameter is made available to the user when the request is completed by mx_wait or mx_test, as part of the status output parameter. There can be cases, for example when handling requests returned by mx_peek or mx_ipeek, where it might be useful to extract the context field before the request is completed. mx_context is the function used to obtain the context.

mx_return_t mx_context(mx_request_t *request, void **context)

Parameters
IN request: The handle of the request from which the context is to be extracted.
OUT context: The user-defined pointer specified when the request was created.

The current implementation of mx_context always returns MX_SUCCESS.

VII Probing

The functions mx_iprobe and mx_probe can check for incoming messages without actually receiving them. If a message is ready to be received and
13. undocumented error codes as a way to report some programming errors that are easy to detect.
14. using the function mx_connect, which checks that the remote endpoint is open and accepts our filter value. An mx_endpoint_addr_t is endpoint-specific. If a process has multiple local endpoints open, it will need to call mx_connect for each local endpoint, even if all the local endpoints will be talking to the same remote endpoint.

mx_return_t mx_connect(mx_endpoint_t endpoint, uint64_t nic_id, uint32_t endpoint_id, uint32_t filter, uint32_t timeout, mx_endpoint_addr_t *endpoint_addr)

Parameters
IN endpoint: The local MX endpoint that will be used for communication.
IN nic_id: NIC ID of the remote node with which we wish to communicate.
IN endpoint_id: ID of the remote endpoint.
IN filter: Filter value for the remote endpoint.
IN timeout: Specifies the amount of time to wait for a connection, in seconds.
OUT endpoint_addr: The newly built endpoint address.

If the endpoint address has been successfully built, mx_connect returns MX_SUCCESS. Otherwise, the function may return one of the following error codes.

Error return codes
MX_NIC_NOT_FOUND: The target NIC was not found in the network peer table.
MX_CONNECTION_FAILED: The remote endpoint is closed.
MX_BAD_CONNECTION_KEY: Wrong credentials (key); the peer rejected the connection or message.
MX_TIMEOUT: The specified timeout was exceeded while waiting for the target to
15. ESS. Otherwise, the function may return one of the following error codes.

Error return codes
MX_BUSY: The endpoint (or all possible endpoints matching the requirements, if wildcards are used) is (are) busy.
MX_NO_DEV: Some OS-specific /dev/mx devices are not present.
MX_NO_DRIVER: The MX driver does not seem to be loaded.
MX_NO_PERM: No permission to access the mx device.
MX_BAD_BAD_BAD: Something bad happened with the driver, maybe the wrong driver. You need to check the kernel log.
MX_BOARD_UNKNOWN: Invalid board number.
MX_NO_RESOURCES: Shortage of memory or other system resources.

mx_open_endpoint opens a specific MX endpoint, if available. This function requires a pointer to an mx_endpoint_t object allocated by the application. This object should be passed to all of the other MX functions operating on an MX endpoint. Opening an endpoint also creates it, and these terms may be used interchangeably in this document. If the return value is not MX_SUCCESS, then the endpoint passed in to mx_open_endpoint remains unmodified.

A board number (board_num) is passed to specify with which NIC this endpoint should be associated. This is referred to as the primary NIC for the endpoint. The application can let MX choose the best NIC on which to open an endpoint by using the MX_ANY_NIC constant. The second input parameter is the index of the endpoint to be
16. IX and Windows parlance. MX provides a mechanism for processes on the same or different hosts to communicate with each other through a Myrinet communication fabric. Each individual Myrinet card in a host is called a physical NIC, or just NIC. Processes communicate using MX by opening endpoints that are associated with NICs.

All messages in MX contain matching information. This information is used to match incoming sends to receive requests. In order to receive a message, a receive request is posted. To send a message, a send request is posted. Both are MX calls that specify matching information, a destination endpoint, and a list of user memory segments and their respective lengths.

Specific code examples will follow, but the typical flow of a process wishing to communicate with another is thus: initialize the library; open an endpoint; connect to your target host(s); start sending and receiving messages, commingled with calls to collect status on these operations; close the endpoint; and finalize the library.

II.1 Naming Scheme

Many MX functions have both a blocking and a non-blocking variant. The naming scheme of the MX functions strongly reflects the MPI naming scheme, in order to make the semantics of the functions easier to understand. As in the MPI standard, the letter i prefixed in the function name denotes a non-blocking operation, which returns immediately. The blocking counterparts of these function names do not contain t
17. In this case, the complete state can be used as a synchronization point with the receiver.

There are five functions used to observe and wait on the state of requests. These are mx_ibuffered, mx_test, mx_wait, mx_ipeek, and mx_peek.

mx_ibuffered returns MX_SUCCESS if the referenced request has been safely buffered and the memory buffers associated with the request may be reused. If a request is in either the buffered or complete state, mx_ibuffered will return MX_SUCCESS. Calls to mx_ibuffered do not affect the state of the request in any way.

mx_test returns MX_SUCCESS if the referenced request has completed. In this case, the status structure referenced in the call will be updated with more detailed information about the request's completion. A successful return from mx_test does not mean that the request was successful, just that it is complete. The status structure will contain all codes related to the outcome of the request, such as successful, cancelled, or rejected. After a successful return from mx_test, the referenced request no longer exists as far as MX is concerned. If the return from mx_test is unsuccessful for any reason, the resources associated with the request are not released.

mx_wait is the blocking counterpart of mx_test.

mx_ipeek returns the handle of a request, for a specific endpoint, that is in the complete state. If mx_ipeek returns MX_SUCCESS, the returned request handle is guaranteed to be succes
18. Myrinet Express (MX): A High Performance, Low-Level, Message-Passing Interface for Myrinet
Version 1.0, July 01, 2005

Table of Contents
I Overview
I.1 Upcoming Features
II Concepts
II.1 Naming Scheme
II.2 (illegible)
II.3 Endpoint Addressing
II.4 Matching
II.5 Unexpected Messages
II.6 (illegible)
II.7 Request States
III Initialization
III.1 Library Initialization
III.1.1 mx_init
III.1.2 mx_finalize
III.2 Information Retrieval
III.2.1 mx_get_info
III.2.2 MX_NIC_IDS
III.2.3 MX_COUNTERS
III.3 Endpoint Opening and Closing
III.3.1 mx_open_endpoint
III.3.2 mx_close_endpoint
III.3.3 Wakeup (remainder illegible)
IV Specifying Endpoints
IV.1 Hostnames and NIC IDs
IV.1.1 mx_hostname_to_nic_id
IV.1.2 mx_nic_id_to_hostname
19. ORTED: Operation aborted on peer NIC.

• mx_addr_t source: This field represents the MX address of the source endpoint, from which the NIC ID and the endpoint ID can be extracted with mx_decompose_endpoint_addr. It can be used for identification purposes or to reply to the sender.

• uint32_t length: This is the effective length of the received message. It can be smaller than the length of the posted receive, but not greater. If the incoming message was larger than the length of the posted receive, this length is set to the length of the posted receive and the status code returned is MX_STATUS_TRUNCATED.

• void *context: The user-defined pointer which was passed to MX when posting the original request. It can be used to implement a callback functionality, or simply ignored. If a context argument was specified when the operation was posted, this value will be in the status structure returned. To implement callbacks, context could be a pointer to a structure containing a callback function address and an argument that the application code would arrange to call.

VI.2.2 mx_wait

It is sometimes useful to block the current thread when waiting for the completion of a pending operation. The blocked thread should not use any CPU cycles while waiting, thus yielding the processor to other threads. MX provides this capability via mx_wait.

mx_return_t mx_wait(mx_endpoint_t endpoint, mx_request_t *request, uint32_t timeout,
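The callback idea mentioned for the context field can be sketched without any MX calls. Everything below is hypothetical application-side code: `struct completion_ctx`, `completion_fn`, and `dispatch_completion` are illustrative names, and the `int status_code` parameter merely stands in for whatever the application reads from the status structure after a completion function returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback signature: user argument plus a completion code. */
typedef void (*completion_fn)(void *arg, int status_code);

/* What the application would pass as the context pointer when posting. */
struct completion_ctx {
    completion_fn fn;
    void *arg;
};

/*
 * Called by the application once a request has completed, with the context
 * pointer taken from the status structure. A NULL context means the request
 * was posted without a callback.
 */
static void dispatch_completion(void *context, int status_code)
{
    struct completion_ctx *ctx = context;

    if (ctx == NULL)
        return;
    assert(ctx->fn != NULL);
    ctx->fn(ctx->arg, status_code);
}

/* Example callback: count completions in an int owned by the caller. */
static void count_completion(void *arg, int status_code)
{
    (void)status_code;
    ++*(int *)arg;
}
```

The application keeps each `struct completion_ctx` alive until its request completes, since MX only stores and returns the pointer.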
20. VI.3 Querying for Any Completion
VI.3.1 mx_ipeek
VI.3.2 mx_peek
VI.4 Obtaining the Context
VII Probing
VII.1.1 mx_iprobe
VII.1.2 mx_probe
VIII Canceling MX Requests
VIII.1.1 mx_cancel
IX Application Programming Notes
IX.1 Completing Requests
IX.2 Multi-threaded Applications
X Error Handling

I Overview

Myrinet Express (MX) is a high-performance, low-level, message-passing software interface tailored for Myrinet. MX exploits the processing capabilities embedded in the Myrinet NIC to provide exceptional performance for modern middleware interfaces such as MPI or VI, enables low-overhead Ethernet emulation at link speed, and offers a simple API for third-party Myrinet software developments. The MX API package includes thes
21. blocked in mx_probe on the same endpoint, only one of them will return with success in case of a matching incoming message.

VIII Canceling MX requests

VIII.1.1 mx_cancel

Pending receive operations may be cancelled via mx_cancel. This function is required for cleanup. Posting a receive ties up user resources (receive buffers) and MX resources (in the library or in the NIC), and a cancel may be needed to free these resources gracefully.

mx_return_t mx_cancel(mx_endpoint_t endpoint, mx_request_t *request, uint32_t *result)

Parameters
IN endpoint: The MX endpoint on which the operation is pending.
IN request: The pointer to the handle of the pending request.
OUT result: Non-zero if the request was really cancelled.

mx_cancel always returns MX_SUCCESS in the current implementation. This function always returns immediately. If result is 1, then this request was cancelled successfully. If result is 0, then it was too late to cancel this request because the receive has already been matched. Thus, after a call to mx_cancel, either the request has been cancelled and the resources freed, or the request has been matched and a subsequent call to mx_test or mx_wait is guaranteed to complete quickly. In either case, mx_cancel provides a way for the application to safely free receive requests.

IX Application Programming Notes

This section discusses important points for wh
22. ce between the two libraries; they are completely source and binary compatible. Thread safety is transparent to the application: many threads can initiate sends or receives, or even wait on different handles, at the same time. The single-threaded MX library is provided so that applications may avoid the overhead of thread-safe operations if they are not needed.

I.1 Upcoming Features

Future releases of MX will include support for:

• One-sided communication and collective operations such as barrier, broadcast, and all-reduce.

• A recovery mechanism from the majority of SRAM parity errors. The MX software will have the capability of transparently reinitializing the NIC firmware and data structures. When an SRAM parity error occurs, in many cases it is recoverable. In a few cases, however, it will not be possible to recover, and a reboot of the host will still be required for security purposes.

• Native support for non-contiguous sends and receives. All MX communication primitives involving user-level data will accept scatter/gather lists with a reasonable limit on the number of contiguous segments. Different mappings can be used on the sender and receiver sides for the same communication, allowing spatial data transformations to be an implicit part of the communication.

II Concepts

This section describes the terms used in the MX API and how they relate to each other. Host and process have their usual meanings, as in UN
23. d with information about the completed operation. If the request is not in the complete state, the content of the output parameter status is unchanged and meaningless.

The information returned to the application upon completion is organized as a structure of type mx_status_t, defined below.

• mx_status_code_t code: This code defines the nature of the completion of the operation. It can take one of these values:
  o MX_STATUS_SUCCESS: Operation completed successfully.
  o MX_STATUS_PENDING: Request still pending.
  o MX_STATUS_BUFFERED: Request has been buffered, but still pending.
  o MX_STATUS_REJECTED: Filter key mismatch; message was rejected by the remote endpoint.
  o MX_STATUS_TIMEOUT: Posted operation timed out.
  o MX_STATUS_TRUNCATED: Operation completed, but received data was truncated due to undersized buffer or oversized message.
  o MX_STATUS_CANCELLED: Pending operation was cancelled.
  o MX_STATUS_ENDPOINT_UNKNOWN: Destination endpoint is unknown on the network fabric.
  o MX_STATUS_ENDPOINT_CLOSED: Remote endpoint is closed.
  o MX_STATUS_ENDPOINT_UNREACHABLE: Connectivity is broken between the source and the destination.
  o MX_STATUS_BAD_SESSION: Bad session (no mx_connect done).
  o MX_STATUS_BAD_KEY: Connection failed due to bad credentials.
  o MX_STATUS_BAD_ENDPOINT: Destination endpoint rank is out of range for the peer.
  o MX_STATUS_BAD_RDMAWIN: Invalid RDMA window given to the mcp.
  o MX_STATUS_AB
24. e The complementary routine mx_nic_id_to_hostname converts a NIC ID to a hostname.

mx_return_t mx_nic_id_to_hostname(uint64_t nic_id, char *hostname)

Parameters
IN nic_id: The NIC ID of the host whose name we want.
OUT hostname: The name of the host.

If the hostname for the specified NIC ID has been successfully returned, mx_nic_id_to_hostname returns MX_SUCCESS. Otherwise, the function may return one of the following error codes.

Error return codes
MX_HOST_NOT_FOUND: No such NIC is in the network peer table.

Note that MX_MAX_HOSTNAME_LEN includes a trailing 0, as used in C string representations.

IV.2 Board numbers and NIC IDs

IV.2.1 mx_board_number_to_nic_id

In order to facilitate identifying a specific NIC when there are multiple NICs in the same host, MX provides the utility functions mx_board_number_to_nic_id and mx_nic_id_to_board_number to convert from a board number to a NIC ID and vice versa. mx_board_number_to_nic_id returns the MAC address of a board with a given rank.

mx_return_t mx_board_number_to_nic_id(uint32_t board_number, uint64_t *nic_id)

Parameters
IN board_number: The board number whose NIC ID we want.
OUT nic_id: The NIC ID assigned to this board number.

If the NIC ID for the board number has been successfully retrieved, mx_board_number_to_nic_id returns MX_SUCCESS. Otherwise t
25. e design goals:

• Protected and independent access for user-level applications: MX endpoints virtualize the network interfaces at the process level, providing OS-bypass communication.

• Transparent memory registration: Most modern message-passing interfaces do not require explicit memory registration operations. In MX, explicit memory registration by the application or middleware is avoided altogether by the use of PIO or memory copies for small messages, and is made implicit and very low cost for larger messages.

• Very low total latency for small messages: In order to minimize latency for small messages, MX implements an extremely short critical path, without intermediate copies or memory registration.

• Fully asynchronous communication primitives: The initiation of any communication is separated from its completion. Once an operation has been initiated, the application is not involved until it checks or waits for it.

• Virtually unlimited number of pending sends and receives: While the number of pending sends and receives natively supported by MX at the NIC level is large, MX offers a multiplexing capability to provide an unlimited number of pending sends and receives, bounded by the amount of available host resources (memory).

• Generic matching mechanism: MX provides an efficient matching mechanism between incoming messages and posted receives. The matching field is large enough (64 bits) to support the matching requirements of all modern message passin
26. e that matches the matching information. The incoming message is not received yet: a call to mx_irecv is required to allow delivery of the message. One current application of the probe function is to allocate the exact amount of memory needed to receive a message before receiving it.

VII.1.2 mx_probe

mx_probe is the blocking counterpart of mx_iprobe.

    mx_return_t mx_probe(mx_endpoint_t endpoint, uint32_t timeout, uint64_t match_recv, uint64_t match_mask, mx_status_t *status, uint32_t *result);

Parameters:
    IN  endpoint    The MX endpoint on which to probe for incoming messages.
    IN  timeout     The value of the timeout in milliseconds.
    IN  match_recv  The matching information to be matched by the incoming message after masking it by the match_mask.
    IN  match_mask  The mask applied to the matching information of the incoming message to match the match_recv argument.
    OUT status      The status structure to be filled in case a matching incoming message is available.
    OUT result      Non-zero if there is a message ready to be received.

If an incoming message matches the matching information, the status structure has been updated and mx_probe returns MX_SUCCESS. Otherwise, the function may return one of the following error codes.

Error return codes:
    MX_NO_RESOURCES  Shortage of memory or other system resources.

If multiple threads are
27. eive operation is referenced in future calls. The address of the remote endpoint (mx_endpoint_addr_t) which sends the message ultimately matched to this receive will be included in the mx_status_t structure to be returned when this request completes.

mx_irecv differs from its send counterparts by specifying matching data: match_recv and match_mask. The match_send value of any incoming message will be first bitwise ANDed with match_mask, and the result then compared to match_recv. If the values are the same, the message matches the receive and the sent data is placed in the buffer(s) associated with this receive. Data in excess of the total buffer size provided is discarded, and the status of the receive operation will be MX_STATUS_TRUNCATED. The total amount of data delivered is specified in the mx_status_t structure returned from mx_test or mx_wait.

The rules for accessing data buffers are analogous to those for sending. The data in receive buffers is non-deterministic between the time the mx_irecv call returns and when mx_test or mx_wait indicates that the receive has been completed. Writing to the buffers after the receive has been posted, but before the status routine indicates completion, may corrupt the receive data. As with posting a send, the segment list may be reused as soon as the call to mx_irecv returns.

Inasmuch as receive requests cannot be buffered, calls to mx_ibuffered do not apply for receive requests. Only mx test
28. ests are ready to be completed, the request returned is non-deterministic. The output parameter request is only valid if the output parameter result is non-zero. The returned handle must be subsequently passed to mx_test or mx_wait in order to learn the success or failure of the request and to release the resources associated with the request. mx_test is preferred over mx_wait in this case, as the specified request is guaranteed to be complete.

VI.3.2 mx_peek

mx_peek is the same as mx_ipeek, except that it does not return until a complete request is available or the timeout specified in the call has expired.

    mx_return_t mx_peek(mx_endpoint_t endpoint, uint32_t timeout, mx_request_t *request, uint32_t *result);

Parameters:
    IN  endpoint  The MX endpoint on which the operations are pending.
    IN  timeout   The value of the timeout in milliseconds.
    OUT request   The handle of the completed operation, if any.
    OUT result    Non-zero if there is a request that can be completed.

If the asynchronous pending request is complete, mx_peek returns MX_SUCCESS. Otherwise, the function may return one of the following error codes.

Error return codes:
    MX_NO_RESOURCES  Shortage of memory or other system resources.

This function blocks until at least one pending operation on a specific MX endpoint is ready for completion. If multiple pending requests are ready to be completed
29. g interfaces.

• Overlap of communication and computation, even for large messages. As the MX communication primitives are fully asynchronous, it is possible for the application to continue its execution between the initiation of an operation and its completion. In the absence of unusual out-of-resource conditions, MX does not require the user-level application to be involved in the progression of the protocol, thus allowing overlap between communication and computation.

• Reliable in-order matching. MX provides an ordered matching protocol. Two messages sent from one endpoint in order will match posted receives in order, but may be delivered out of order, or their completion may be notified out of order. The in-order matching is sufficient to support all of the modern message-passing interfaces, and the out-of-order delivery allows MX to use multiple routes between NIC ports and multiple ports per NIC.

• Efficient support for unexpected messages. An unexpected message is one for which a matching receive request has not yet been posted. Unexpected messages are processed by receiving the entire message eagerly in an unexpected queue if it is small, and by receiving only its header if it is too large. The size threshold distinguishing the handling methods can be controlled by the application. MX guarantees in-order matching even if unexpected messages have been buffered.

• Transient network fault recovery and high availabilit
30. handle of the first request on this endpoint that is ready for completion, i.e., that can be successfully processed by mx_test or mx_wait. If there are no requests posted on the endpoint that can be completed at the time of the call, mx_peek will wait until one is ready for completion, and mx_ipeek will return immediately. If several requests are eligible for completion, the particular one returned by one of the peek functions is non-deterministic. These functions do not release any resources associated with the request: a call to mx_test or mx_wait is still required to release the resources.

VI.3.1 mx_ipeek

mx_ipeek looks for a request ready for completion on the specified endpoint and returns immediately.

    mx_return_t mx_ipeek(mx_endpoint_t endpoint, mx_request_t *request, uint32_t *result);

Parameters:
    IN  endpoint  The MX endpoint on which the operations are pending.
    OUT request   The handle of the completed operation, if any.
    OUT result    Non-zero if there is a request that can be completed.

If one asynchronous pending request is complete, mx_ipeek returns MX_SUCCESS. Otherwise, the function may return one of the following error codes.

Error return codes:
    MX_NO_RESOURCES  Shortage of memory or other system resources.

This function looks for completion of any pending operations on a specific MX endpoint. If multiple pending requ
31. he function may return one of the following error codes.

Error return codes:
    MX_BOARD_UNKNOWN  Invalid board number.

IV.2.2 mx_nic_id_to_board_number

The complementary routine mx_nic_id_to_board_number is used if an application wants to open an endpoint on a NIC with a given MAC address. It converts the MAC address into a board rank, as is required by mx_open_endpoint.

    mx_return_t mx_nic_id_to_board_number(uint64_t nic_id, uint32_t *board_number);

Parameters:
    IN  nic_id        The NIC ID assigned to the board number we want.
    OUT board_number  The board number.

If the board number for the specified NIC ID has been successfully returned, mx_nic_id_to_board_number returns MX_SUCCESS. Otherwise, the function may return one of the following error codes.

Error return codes:
    MX_BOARD_UNKNOWN  No local board was found with this nic_id.

IV.3 Endpoint Addresses

The function mx_connect is used to build an MX endpoint address, and the function mx_decompose_endpoint_addr is used to extract information from the MX endpoint address.

IV.3.1 mx_connect

Remote endpoints are specified through the use of an mx_endpoint_addr_t. An mx_endpoint_addr_t is formed by combining the NIC ID of a network interface on the node on which the remote endpoint resides, the ID of the endpoint, and a filter key. An mx_endpoint_addr_t is initialized from these elements
32. he letter 'i' and will block until completion or expiration of a timeout, effectively suspending the execution of the current process. As an example, mx_iprobe is a non-blocking function, whereas mx_probe is its blocking counterpart. However, the blocking variant of a function is not always defined in the MX API. For example, mx_isend is the non-blocking sending function, but mx_send is not provided by MX.

II.2 Endpoints

An MX endpoint is a virtualization of a network interface at the process level. A process is defined, from the UNIX point of view, as a collection of execution threads sharing the same virtual address space. An endpoint provides an entry point to the interconnect hardware, protected from other processes, with fairness relative to the other endpoints opened on the same NIC or collection of NICs.

An endpoint is also an instance of a software interface. It is referenced by a variable of type mx_endpoint_t, used by many of the MX operations. All operations on an open endpoint are restricted to this endpoint. MX objects such as send and receive request handles are relative to a specific endpoint and have no meaning to another endpoint, even one opened by the same process. Nothing prevents a process from simultaneously opening several endpoints on the same network interface.

An endpoint is created by a call to mx_open_endpoint, which returns a handle for referencing the endpoint in an mx_endpoint_t. If mx o
33. ich application programmers writing to the Myrinet Express API should be aware.

IX.1 Completing Requests

It is important to remember that every request posted must have a matching call to either mx_test or mx_wait to free the resources allocated for handling the request. These resources are not released until a call made with the handle for the request to mx_test or mx_wait returns successfully.

Remember also that calling mx_cancel on a request only releases its resources if result is 1. Otherwise, the call to mx_test or mx_wait is still needed to confirm the completion of the request and the release of the resources.

IX.2 Multi-threaded Applications

Thread safety in MX imposes special considerations. If one thread is already blocked in a blocking state function such as mx_wait for a single pending request, then no other thread can block on the same handle. It is an application error to have several threads waiting on the same operation. However, it is allowed to have several threads blocking on a whole MX endpoint through calls to mx_peek. In this case, a request on this endpoint reaching the complete state will awaken one of the blocked threads.

The user must not mix polling and blocking on the same handle. Concurrently calling mx_test and mx_wait on the same handle, for example, or concurrently calling mx_ipeek and mx_peek on the same endpoint, is not allowed. Such a mix would introduce race conditi
34. lap. If the total length of the message is 0, it is then allowed to pass NULL as a list of segments and 0 as the number of segments.

The destination is specified by the parameter destination. It is an mx_endpoint_addr_t object returned by mx_connect. The sender also provides the matching information for the message in the parameter match_send.

The parameter context specifies a user-defined pointer that will be included in the status structure returned when this post completes. When a pending send request is completed, either successfully or unsuccessfully, mx_test or mx_wait will return a status structure with the context field filled in with this user-supplied value. This mechanism may be used to implement callbacks on top of the status functions. The context can also be extracted from the request by mx_context.

Finally, the last argument, request, is a pointer to an mx_request_t object allocated by the application. This handle will be assigned by the library and used to reference the pending send operation when checking or blocking for its completion.

The data buffer(s) specified in a send operation must not be modified until the request is in the buffered state. This state is detected by a successful return from mx_ibuffered. The segment list itself may be modified immediately after mx_isend returns; however, the data buffers to which the list refers should not be modified until the operation is complete. The operation is complete as
35. matching receive on the remote side.
    IN  context  A user-defined pointer that will be passed back to the application as part of the status structure when this request completes.
    OUT request  The pointer to the MX Request object that references the pending send operation.

If the send operation has been successfully posted, mx_issend returns MX_SUCCESS. Otherwise, the function may return one of the following error codes.

Error return codes:
    MX_NO_RESOURCES  Shortage of memory or other system resources.

The arguments and return codes are identical to the previous function, mx_isend. The difference between mx_issend and mx_isend lies in the send completion semantics: a send request initiated by mx_issend can be completed by a call to mx_test or mx_wait only if the message has been safely delivered to a matching receive request on the destination, has been cancelled, or an error has occurred.

The request will be pending and will use resources in the MX library and in the NIC associated with the local endpoint as long as the message is not received. Posting too many synchronous sends with mx_issend when no matching receives are posted on the receive side will lead to resource exhaustion on the send side.

The data buffer(s) specified in a send operation must not be modified until the request enters the buffered state. This state is detected by a successful return from
36. mx_ibuffered. The segments list itself may be modified immediately after mx_issend returns, just not the data buffers to which it refers. The operation is complete as soon as a call to mx_test or mx_wait indicates that this pending operation is complete. Note that being complete also indicates that the buffers are available for use.

In the specific case of a send request initiated by mx_issend, it may be useful for the application to know when the send buffers can be reused before the message is effectively received on the remote side and the send request is ready to be completed. Indeed, data can be buffered on the send or receive side with the synchronous send request still pending. mx_ibuffered is used to check if the send request is in the buffered state but not yet in the complete state.

Example V.1: Post of a non-blocking, non-contiguous synchronous send.

    #include "myriexpress.h"

    int main(void)
    {
        mx_return_t rc;
        mx_endpoint_t endpoint;
        mx_endpoint_addr_t destination;
        uint64_t nic_id;
        mx_request_t send_handle;
        mx_segment_t buffer_desc[2];
        uint8_t workspace[256];
        uint64_t match_send;
        mx_status_t status;
        uint32_t result;

        /* Init and open endpoint */

        /* Build address of remote endpoint
           (hostname "remotehost", Endpoint ID 6, Filter key 0x12345678) */
        rc = mx_hostname_to_nic_id("remotehost", &nic_id);
        rc = mx_connect(endpoint, nic_id, 6, 0x12345678, MX_INFINITE, &destination);

        /* post an sy
37. nchronous non-contiguous send composed of 2 contiguous segments */
        buffer_desc[0].segment_ptr = &workspace[64];
        buffer_desc[0].segment_length = 17;
        buffer_desc[1].segment_ptr = &workspace[0];
        buffer_desc[1].segment_length = 50;
        match_send = 0x1111111122223333L;
        rc = mx_issend(endpoint, buffer_desc, 2, destination, match_send, NULL, &send_handle);

        /* safe to modify segment list here, not safe to change values in workspace */

        /* wait for it to be buffered */
        do {
            rc = mx_ibuffered(endpoint, &send_handle, &result);
        } while (rc == MX_SUCCESS && !result);

        /* Now OK to modify data buffer (workspace) */

        /* wait for send completion; mx_wait could be used to release the
           CPU instead of looping on mx_test (see section VII.2) */
        do {
            rc = mx_test(endpoint, &send_handle, &status, &result);
        } while (rc == MX_SUCCESS && !result);

        /* endpoint closing and finalize */
    }

V.2 Receive Operations

V.2.1 mx_irecv

The receive operation has arguments similar to the send operations. MX provides mx_irecv to post asynchronous receives.

    mx_return_t mx_irecv(mx_endpoint_t endpoint, mx_segment_t *segments_list, uint32_t segments_count, uint64_t match_recv, uint64_t match_mask, void *context, mx_request_t *request);

Parameters:
    IN  endpoint  The MX Endpoint
38. ndpoint_addr_t *endpoint_addr);

Parameters:
    IN  endpoint       The handle of the open local endpoint whose address we wish to know.
    OUT endpoint_addr  A pointer to an mx_endpoint_addr_t where the address of this endpoint is to be stored.

The current implementation of mx_get_endpoint_addr always returns MX_SUCCESS.

V. Point-to-point Communication

V.1 Send Operations

MX provides two functions to initiate asynchronous sends: mx_isend and mx_issend.

V.1.1 mx_isend

mx_isend follows the semantics of the standard mode in MPI: the request will be completed when the send buffer described by the gather list can be reused by the application. Completion of the operation does not give any indication of the fate of the message: it may have been buffered on the send or receive side, matched by a posted receive on the receive side, or even lost due to fatal errors in the network.

    mx_return_t mx_isend(mx_endpoint_t endpoint, mx_segment_t *segments_list, uint32_t segments_count, mx_endpoint_addr_t destination, uint64_t match_send, void *context, mx_request_t *request);

Parameters:
    IN  endpoint        The local MX endpoint used to post the send.
    IN  segments_list   The array of contiguous segments constituting the gather list describing the buffer to send.
    IN  segments_count  The number of segments in the gather list.
    IN  destination     The MX Addr of the destination
39. ns

Since all communication requests within MX are non-blocking, applications must be able to check for the completion, or the intermediate buffered state, of these requests. mx_ibuffered, mx_test, and mx_ipeek are used to check the state of requests without blocking. mx_wait and mx_peek are used to block waiting for a request to complete or for the associated buffer(s) to be reusable by the application, effectively releasing the CPU for use by other threads or processes in the meantime. mx_context is used to extract the context associated with a particular request.

VI.1 Buffered State

VI.1.1 mx_ibuffered

The function used to check if the application can reuse the buffer(s) committed to a pending operation is mx_ibuffered.

    mx_return_t mx_ibuffered(mx_endpoint_t endpoint, mx_request_t *request, uint32_t *result);

Parameters:
    IN  endpoint  The MX endpoint on which the operation is pending.
    IN  request   The handle of the pending request.
    OUT result    Filled in with a non-zero value if the request is buffered.

mx_ibuffered always returns MX_SUCCESS in the current implementation.

The argument request is the handle referencing the pending MX operation. If the value returned in result is non-zero, the buffer(s) involved in the pending operation can be recycled by the application. Otherwise, the data is not buffered yet and the application cannot safely reuse the buffer(s).

VI
40. of the message.
    IN  match_send  The matching information from the send side that will be used to find a matching receive on the remote side.
    IN  context     A user-defined pointer that will be passed back to the application as part of the status structure when this request completes.
    OUT request     The pointer to the MX Request object that references the pending send operation.

If the send operation has been successfully posted, mx_isend returns MX_SUCCESS. Otherwise, the function may return one of the following error codes.

Error return codes:
    MX_NO_RESOURCES  Shortage of memory or other system resources.

This function notifies the network interface that a new send is pending and returns to the application as soon as possible.

The send buffer is described by a gather list via the parameter segments_list and a number of segments, segments_count. The gather list is an array of mx_segment_t structures. Each segment describes a contiguous memory area using a pointer and a length. The maximum number of segments for a specific endpoint is available under the key MX_MAX_SEGMENTS via the mx_get_info mechanism. Thus it is possible to send a contiguous buffer using only one segment, or a non-contiguous buffer without any constraints other than the maximum number of segments. Segments of length 0 are allowed but ignored. Results are non-deterministic if segments within a segment list over
41. ons and the result would be undefined. However, it is safe to poll and block on different handles or endpoints at the same time.

X. Error Handling

Each MX program has an error handler, either the default one or one explicitly given by the application. This handler is invoked each time an MX function is unable to complete successfully. The error handler may terminate the application or, if it returns, the error code is simply passed back to the application as the return value of the MX function call.

The default error handler will print some details about the error and terminate the application. Consequently, unless the application installs a specific error handler, MX functions will always return MX_SUCCESS, never an error code. This behavior is similar to the MPI default error handling. Most applications that would abort upon a fatal network error or memory exhaustion can rely on this default behavior and do not need to check the return value of MX primitives.

Applications can change the error handler with mx_set_error_handler.

    mx_error_handler_t mx_set_error_handler(mx_error_handler_t handler);

Parameters:
    IN  handler  The error handler chosen by the application.

mx_set_error_handler is the only function allowed to be called before mx_init. It would be necessary to do so if the application wanted to handle mx_init errors itself. An application can change the error handler at any point in the course of the applica
42. pected messages. Inasmuch as it is often the case that the receive operation might be a little bit late and be posted just after the incoming message arrives, it is appropriate to copy the unexpected message into a temporary area. Then, when a matching receive is posted by the application, the message can be delivered immediately. This unexpected buffer is limited in size, so only small messages will be buffered in this way. Larger messages will leave their matching information, along with information about the sending endpoint. The threshold in message size between a full copy in the unexpected buffer and a copy of only the header (matching information and sender endpoint address) is specified by the application when the endpoint is opened.

Unexpected message handling is transparent at the MX API level. When a receive is posted, the application does not need to know if the incoming matching send has already been saved in the unexpected queue. If this is the case, and if the message was small enough to fit in its entirety, the message is delivered and the receive is completed immediately. If the message was larger than the unexpected threshold set by the application, MX will notify the sending side that a matching receive has been posted, and this will trigger an immediate transmission from the sender, without involvement of the application on the send side. If no matching unexpected messages are found, the receive information is recorded for matching again
43. pen_endpoint does not return MX_SUCCESS, then the mx_endpoint_t passed in will remain unmodified. A value of NULL is guaranteed never to be a valid endpoint.

II.3 Endpoint Addressing

In order to communicate with a remote endpoint, an application must have the endpoint address of that remote endpoint, represented by an mx_endpoint_addr_t. An endpoint address can be constructed from three pieces of information: the NIC ID of a NIC on the remote host (which may be the same as the local host), an endpoint ID, and an endpoint filter value. An mx_endpoint_addr_t is created by a call to mx_connect and can only be used with the local endpoint used in the mx_connect call.

Each NIC has a unique 64-bit NIC ID (for Myrinet NICs, this is the MAC address encoded as a 64-bit number), and it is this NIC ID that is used to create an endpoint address. The IDs of the NICs within a given host can be queried from any application on the Myrinet using mx_board_number_to_nic_id.

The endpoint ID is an integer associated with each open endpoint that can be assigned either by the application that opens it or by the MX library. This value is an index and must be within the range 0 to MX_MAX_ENDPOINTS - 1.

The endpoint filter is an integer that is assigned by the application to distinguish between different instances of the application. Through careful use of this parameter, the application can filter out MX messages from lingering or zombie processes a
44. point(MX_ANY_NIC, MX_ANY_ENDPOINT, filter, 0, 0, &endpoint);

        /* do work here */

        /* close the MX endpoint */
        rc = mx_close_endpoint(endpoint);

        /* Finalize the MX library */
        rc = mx_finalize();
        return 0;
    }

Once an endpoint has successfully been opened, it can be used to post asynchronous send and receive operations and test or wait for their completion.

IV. Specifying Endpoints

IV.1 Hostnames and NIC IDs

IV.1.1 mx_hostname_to_nic_id

In order to facilitate identifying remote hosts, MX provides utility functions mx_hostname_to_nic_id and mx_nic_id_to_hostname to convert from a hostname to a NIC ID and vice versa. The hostname in the context of these functions is actually a NIC name: it is different for each NIC on multi-NIC hosts and is initialized by default to hostname:board_rank. mx_hostname_to_nic_id returns the NIC ID given the name of a NIC.

    mx_return_t mx_hostname_to_nic_id(char *hostname, uint64_t *nic_id);

Parameters:
    IN  hostname  The name of the host whose NIC ID we want.
    OUT nic_id    The NIC ID of the host.

If the NIC ID for the specified host has been successfully retrieved, mx_hostname_to_nic_id returns MX_SUCCESS. Otherwise, the function may return one of the following error codes.

Error return codes:
    MX_HOST_NOT_FOUND  The hostname was not found in the network peer table.

IV.1.2 mx_nic_id_to_hostnam
45. point(mx_endpoint_t endpoint);

Parameters:
    IN  endpoint  The MX endpoint to close.

The current implementation of mx_close_endpoint always returns MX_SUCCESS.

mx_close_endpoint closes an opened MX endpoint. All pending operations are cancelled and the endpoint is deregistered from the NIC. This function requires a pointer to the mx_endpoint_t that references the MX endpoint to close. The endpoint is closed immediately but cannot be reopened until all messages in flight have been dropped. To satisfy this condition, the endpoint may remain unusable for a brief period of time.

III.3.3 mx_wakeup

The function mx_wakeup causes blocking functions to abort their wait.

    mx_return_t mx_wakeup(mx_endpoint_t endpoint);

Parameters:
    IN  endpoint  The MX endpoint associated with the blocking call.

The current implementation of mx_wakeup always returns MX_SUCCESS.

mx_wakeup is useful in multithreaded applications where it may be necessary to notify a thread that the current blocking operation will never be satisfied.

Example III.2: Allocation and release of an MX endpoint (error checking excised for brevity).

    #include "myriexpress.h"

    int main(void)
    {
        mx_return_t rc;
        mx_endpoint_t endpoint;
        uint32_t filter;

        /* Initialize the MX library */
        rc = mx_init();

        /* open an MX endpoint */
        filter = 0xcafebabe; /* app-specific unique value */
        rc = mx_open_end
46. reply.
    MX_NO_RESOURCES  Shortage of memory or other system resources.

The mx_endpoint_addr_t returned by this function can be passed either to mx_isend or mx_issend for point-to-point communications.

IV.3.2 mx_decompose_endpoint_addr

The function mx_decompose_endpoint_addr can be used to extract the information associated with an mx_endpoint_addr_t, for instance to identify the source of a message from the mx_status_t source field returned at receive completion.

    mx_return_t mx_decompose_endpoint_addr(mx_endpoint_addr_t endpoint_addr, uint64_t *nic_id, uint32_t *endpoint_id);

Parameters:
    IN  endpoint_addr  An mx_endpoint_addr_t from which we wish to extract component parts.
    OUT nic_id         NIC ID of the remote node to which this endpoint address refers.
    OUT endpoint_id    ID of the remote endpoint.

The current implementation of mx_decompose_endpoint_addr always returns MX_SUCCESS.

IV.4 Local Endpoint Address

IV.4.1 mx_get_endpoint_addr

It is frequently useful to know the endpoint address of a local endpoint, either to send a message to oneself or to extract the NIC ID and endpoint ID when using MX_ANY_NIC or MX_ANY_ENDPOINT, in order to communicate it to others. The function mx_get_endpoint_addr returns the endpoint address of an opened endpoint.

    mx_return_t mx_get_endpoint_addr(mx_endpoint_t endpoint, mx e
47. request has been delivered to the MX subsystem and it is in progress. Once the buffers associated with a request can be safely used by the posting application, the request enters the buffered state. At this time, the buffers can be read or written without affecting the outcome of the request. Finally, when all activity needed for a request has finished, the request enters the complete state.

The progression through the various states is different for different request types. A receive request enters the pending state when issued and remains there until a matching message has been placed in the associated buffers. At this time, the request changes directly to the complete state.

A send request enters the pending state when issued, but the subsequent state transitions are slightly different for mx_isend and mx_issend. For requests posted with mx_isend, once the data has been copied out of the associated buffers (possibly into a queue of unexpected messages on the receiving node), the request changes directly to the complete state. For requests posted with mx_issend (the second "s" is for synchronous), the buffered state is used. Once the data being sent has been copied out of the buffers (possibly into a queue of unexpected messages on the receiving node), the request enters the buffered state. Only after a matching receive has been issued on the receiving side does the request enter the complete state.
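The life cycle described above can be sketched as a small transition function. This is only an illustrative model, not part of the MX API: the type and function names (req_state_t, req_kind_t, req_event_t, req_next_state, req_buffers_reusable) are hypothetical, and the model covers only the sequences spelled out in the text, where data is copied out before a matching receive is seen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; none of these belong to the MX API. */
typedef enum { REQ_PENDING, REQ_BUFFERED, REQ_COMPLETE } req_state_t;
typedef enum { REQ_RECV, REQ_ISEND, REQ_ISSEND } req_kind_t;
typedef enum {
    EV_DATA_PLACED,      /* a matching message was placed in the receive buffers */
    EV_DATA_COPIED_OUT,  /* send data was copied out of the send buffers */
    EV_RECEIVE_MATCHED   /* a matching receive was issued on the receiving side */
} req_event_t;

/* Return the state a request moves to when an event occurs.
   Events that do not apply in the current state leave it unchanged. */
req_state_t req_next_state(req_kind_t kind, req_state_t state, req_event_t ev)
{
    assert(state == REQ_PENDING || state == REQ_BUFFERED || state == REQ_COMPLETE);

    switch (kind) {
    case REQ_RECV:
        /* receive: pending -> complete, no buffered state */
        if (state == REQ_PENDING && ev == EV_DATA_PLACED)
            return REQ_COMPLETE;
        break;
    case REQ_ISEND:
        /* mx_isend: pending -> complete once the data has left the buffers */
        if (state == REQ_PENDING && ev == EV_DATA_COPIED_OUT)
            return REQ_COMPLETE;
        break;
    case REQ_ISSEND:
        /* mx_issend: pending -> buffered -> complete */
        if (state == REQ_PENDING && ev == EV_DATA_COPIED_OUT)
            return REQ_BUFFERED;
        if (state == REQ_BUFFERED && ev == EV_RECEIVE_MATCHED)
            return REQ_COMPLETE;
        break;
    }
    return state;
}

/* Buffers may be touched again once the request has left the pending state. */
bool req_buffers_reusable(req_state_t state)
{
    return state != REQ_PENDING;
}
```

In this model only a synchronous send ever reports the buffered state, which mirrors why mx_ibuffered is meaningful for mx_issend requests: the application can reuse its buffers while the request is still waiting for the remote receive.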
48. sfully completed in a call to mx_test or mx_wait. If multiple requests are in the complete state in the endpoint, only one of them will be returned by mx_ipeek, but which one is non-deterministic. mx_peek is the blocking variant of mx_ipeek.

III. Initialization

III.1 Library Initialization

III.1.1 mx_init

Before any other MX calls may be made, the library must be initialized by a call to mx_init.

    mx_return_t mx_init(void);

If the MX library has been successfully initialized, mx_init returns MX_SUCCESS. Otherwise, the function may return one of the following error codes.

Error return codes:
    MX_NO_DEV               The OS-specific /dev/mx devices are not present.
    MX_NO_DRIVER            The MX driver does not seem to be loaded.
    MX_NO_PERM              No permission to access the mx device.
    MX_BAD_BAD_BAD          Something bad happened with the driver, maybe the wrong driver. You need to check the kernel log.
    MX_ALREADY_INITIALIZED  mx_init has already been called.
    MX_NO_RESOURCES         Shortage of memory or other system resources.

This function allocates and initializes all data structures used by the MX API library.

III.1.2 mx_finalize

The complement of the mx_init function is mx_finalize.

    mx_return_t mx_finalize(void);

The current implementation of mx_finalize always returns MX_SUCCESS.

This function cleans up the MX library and releases any resources previously allocated.

Example III.1
49. st future incoming messages. This mechanism provides an efficient eager protocol for small messages and a loose rendezvous protocol for larger messages, allowing overlap of communication and computation even in the case of unexpected messages.

II.6 Requests

Requests are identifiers used to specify particular instances of pending asynchronous operations. All asynchronous MX operations fill in an mx_request_t object passed in by the application, which is used to specify this pending operation to subsequent interface calls. These handles are generated by mx_isend, mx_issend, and mx_irecv, and can be passed as arguments to mx_test, mx_wait, mx_ibuffered, and mx_cancel.

If any of the functions that fill in an mx_request_t object does not return MX_SUCCESS, the mx_request_t object passed in will remain unchanged. Every posted receive must have a matching successful call to mx_test or mx_wait in order to release and recycle the resources associated with the request. The request handle of a completed operation ceases to be a valid argument to any subsequent MX function calls, unless and until the same value is assigned to a newly posted request. A value of NULL is guaranteed not to be a valid request.

II.7 Request States

Once a request has been posted, it enters a three-state life cycle. The states of this life cycle are pending, buffered (for send requests only), and complete. Pending means that the
50. tion. The mx_set_error_handler function returns the previous error handler that was installed. An application can either install its own handler of type mx_error_handler_t, or it can install the predefined MX_ERRORS_RETURN handler. This predefined error handler does nothing and returns immediately. This is the handler to use to have all errors passed back as the return value of MX functions; the application then has the responsibility of checking the return value of MX functions and handling any error condition. An application can restore the default error handler at any time by using MX_ERRORS_ARE_FATAL as the error parameter.

All MX functions return MX_SUCCESS when no error occurs. A list of possible errors (if a non-aborting error handler is used) is given with each function description. For compatibility with future revisions, applications should not assume that this list is exhaustive and should always have a default case for unknown errors; mx_strerror can give a string describing the error in this case.

MX behavior in the case of programming errors is undefined. Examples of programming errors are passing an invalid endpoint, request, or pointer to any MX function, calling any MX primitive without having called mx_init first, waiting for the same request twice, etc. Undefined behavior includes the possibility of generating an undocumented error code with explicative text given by mx_strerror; the MX implementation might use such
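The advice to keep a default case for unknown errors can be sketched as follows. This is only an illustration: the enum is a stand-in for mx_return_t so that the fragment compiles without myriexpress.h, its numeric values are assumptions, and classify_return is a hypothetical helper rather than an MX function.

```c
/* Stand-in for mx_return_t so the sketch builds without myriexpress.h.
 * The names echo codes mentioned in the guide; the values are assumed. */
typedef enum {
    SKETCH_SUCCESS = 0,
    SKETCH_NO_RESOURCES,
    SKETCH_ALREADY_INITIALIZED
} sketch_return_t;

/* Hypothetical helper: returns 0 on success, 1 for an error code the
 * application knows how to handle, and 2 for any code it does not know. */
int classify_return(int rc)
{
    switch (rc) {
    case SKETCH_SUCCESS:
        return 0;
    case SKETCH_NO_RESOURCES:
    case SKETCH_ALREADY_INITIALIZED:
        return 1;
    default:
        /* Codes added by a later library revision end up here; a real
         * application could print the text from mx_strerror at this point. */
        return 2;
    }
}
```

The parameter is a plain int on purpose, so that values outside the known list still reach the default branch instead of being assumed impossible.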
51. ttempting to communicate with the previous owner of a particular endpoint ID.

II.4 Matching

The matching in MX is the process of associating an incoming message to a pending receive. Each message-passing interface defines its own matching rules based on elements provided by the sending side and/or the receiving side. A rich matching capability is required to build a complex message-passing interface on top of a low-level interface, or to directly implement applications on top of it. MX provides a flexible yet powerful matching interface. Each message in MX contains 64 bits of matching information. The sender specifies this information (match_send) as part of the sending operation, and the receiving side provides match_recv and a match_mask when posting a receive. An incoming message will be associated with a pending receive if and only if the incoming match_send data, masked with the match_mask, matches the match_recv information of the posted receive.

II.5 Unexpected Messages

A sub-optimal yet common occurrence in message passing is to send a message before a matching receive is posted on the receive side. This occurrence can be due to a slight timing drift or, more simply, to poor application programming methods. A message that arrives on the receive side without a matching receive is, in many low-level interfaces, dropped and retransmitted later. In MX, a buffer is allocated at endpoint opening time to handle such unex
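The matching rule of Section II.4 can be written as a one-line predicate. The fragment below is a minimal sketch of that rule only; match_accepts is a hypothetical name and is not part of the MX API, which performs this test internally.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not an MX function: true when the sender's 64 bits
 * of matching information, masked with the receiver's match_mask, equal
 * the match_recv value of the posted receive. */
bool match_accepts(uint64_t match_send, uint64_t match_recv, uint64_t match_mask)
{
    return (match_send & match_mask) == match_recv;
}
```

With match_recv 0x1111111100003333 and match_mask 0xffffffff0000ffff, the values used in Example V.2, a message sent with 0x11111111abcd3333 is accepted because the mask clears the middle 16 bits before the comparison, whereas 0x2111111100003333 is rejected.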
52. used to post the receive.

IN   segments_list   The array of contiguous segments constituting the scatter list describing the receive buffer.
IN   segments_count  The number of segments in the scatter list.
IN   match_recv      The matching information to be matched by the incoming message after masking it by the match_mask.
IN   match_mask      The mask applied to the matching information of the incoming message to match the match_recv associated to the pending receive.
IN   context         A user-defined pointer that will be passed back to the application as part of the status structure when this request completes or fails.
OUT  request         The pointer to the MX Request object that references the pending receive operation.

If the receive operation has been successfully posted, mx_irecv returns MX_SUCCESS. Otherwise, the function may return one of the following error codes.

Error return codes:
MX_NO_RESOURCES  Shortage of memory or other system resources.

The application describes the receive buffer in the same way as in the send case, using a scatter list (segments_list) composed of segments_count entries, which are mx_segment_t structures. A user-defined pointer (context) can be associated to the receive request; it will be returned in the mx_status_t structure when this request completes. The caller specifies request, a pointer to an mx_request_t object allocated by the application, to receive a handle by which this rec
53. x_get_info with the key MX_COUNTERS_COUNT. The memory area passed to the out_val parameter of mx_get_info should be large enough to hold the data returned. For MX_COUNTERS_LABELS this should be N * MX_MAX_STR_LEN, and for MX_COUNTERS_VALUES this should be N * sizeof(uint32_t), where N is the number returned by MX_COUNTERS_COUNT.

III.3 Endpoint Opening and Closing

III.3.1 mx_open_endpoint

Once the MX library is initialized, the application needs to open an endpoint to be able to send or receive messages. This operation is performed by the function mx_open_endpoint.

mx_return_t mx_open_endpoint(uint32_t board_num, uint32_t endpoint_id, uint32_t filter, mx_param_t *params_list, uint32_t params_count, mx_endpoint_t *endpoint);

Parameters:
IN   board_num     The local board rank of the NIC on which MX will try to open an endpoint.
IN   endpoint_id   The index of the endpoint to open on the specified NIC.
IN   filter        A user-assigned value used to filter incoming messages and reject mx_connect or any unauthorized messages.
IN   params_list   The array of parameters that specifies the configuration of the endpoint to open (NULL if no parameters).
IN   params_count  The number of entries in the array of parameters (0 if no parameters).
OUT  endpoint      The MX endpoint successfully opened.

If the endpoint has been successfully opened, mx_open_endpoint returns MX_SUCC
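The buffer-size rule for the counter keys, stated at the start of this excerpt, amounts to two multiplications. The sketch below shows the arithmetic only; the helper names are hypothetical, and SKETCH_MAX_STR_LEN is an assumed placeholder for MX_MAX_STR_LEN, whose real value comes from myriexpress.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed placeholder value; use MX_MAX_STR_LEN from myriexpress.h in
 * real code. */
#define SKETCH_MAX_STR_LEN 128

/* Bytes needed for the MX_COUNTERS_LABELS output when
 * MX_COUNTERS_COUNT reports n counters. */
size_t counters_labels_bytes(uint32_t n)
{
    return (size_t)n * SKETCH_MAX_STR_LEN;
}

/* Bytes needed for the MX_COUNTERS_VALUES output for the same n. */
size_t counters_values_bytes(uint32_t n)
{
    return (size_t)n * sizeof(uint32_t);
}
```

An application would query MX_COUNTERS_COUNT first, allocate buffers of these sizes, and only then query the labels and values.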
54. y support. No network fabric is perfect, and transient errors (corruption, loss of packets) may occur, although not frequently, in Myrinet fabrics. MX automatically recovers from any faults where recovery is possible, through means such as retransmitting packets or routing around dead links. Catastrophic or unrecoverable errors due to hardware or software failure will be communicated to the application for handling by a higher-level recovery strategy.

• Basic per-message authentication mechanism. Messages in MX include a user-supplied identifier, called a filter, that provides a basic authentication mechanism between the source and the destination endpoints. Messages sent with a filter value that does not match that of the destination endpoint will be rejected at the NIC level.

• Per-message or per-endpoint polling or blocking completion functions. MX provides functions to check the completion of a specific pending operation or the completion of any of the pending operations related to an endpoint. Similarly, there are functions to block waiting for completion of a specific pending operation or the completion of all of the pending operations related to an endpoint. These blocking semantics release the processor for other application computation.

• Per-call timeouts on blocking completion functions. MX functions used to block on a specific operation or on all operations of an endpoint take a timeout as an argument. The granularity of this timeout is one
55. y with the NIC, thus providing OS bypass.
Input: None
Output: uint32_t
Output Size: sizeof(uint32_t)

Key: MX_NIC_COUNT
Description: The number of NICs available to this application.
Input: None
Output: uint32_t
Output Size: sizeof(uint32_t)

Key: MX_NIC_IDS
Description: Identifier (MAC address) of all NICs in the system, in a 0-terminated array (see Section III.2.2).
Input: None
Output: uint64_t
Output Size: variable * sizeof(uint64_t)

Key: MX_NATIVE_REQUESTS
Description: The number of requests that can be handled natively by the NIC.
Input: None
Output: uint32_t
Output Size: sizeof(uint32_t)

Key: MX_COUNTERS_COUNT
Description: The number of counters in the count table.
Input: uint32_t (the board id)
Output: uint32_t
Output Size: sizeof(uint32_t)

Key: MX_COUNTERS_LABELS
Description: The text names for each counter.
Input: uint32_t (the board id)
Output: uint8_t[MX_MAX_STR_LEN]
Output Size: variable * MX_MAX_STR_LEN

Key: MX_COUNTERS_VALUES
Description: The counters values.
Input: uint32_t (the board id)
Output: uint32_t
Output Size: variable * sizeof(uint32_t)

Key: MX_PRODUCT_CODE
Description: The product string for this Myrinet NIC.
Input: uint32_t (the board id)
Output: uint8_t[MX_MAX_STR_LEN]
Output Size