MLNX VPI WinOF 4.70 User Manual
Contents
1. sminfo flags (continued):
-L, --Lid               Uses LID address argument
-u, --usage             Usage message
-e, --errors            Shows send and receive errors (timeouts and others)
-G, --Guid              Uses GUID address argument. In most cases, it is the Port GUID. Example: "0x08f1040023"
-h, --help              Shows the usage message
-v, --verbose           Increases the application verbosity level (-vv, -v -v -v)
-V, --version           Shows the version info
-C, --Ca <ca_name>      Uses the specified ca_name
-P, --Port <ca_port>    Uses the specified ca_port
-t, --timeout <ms>      Overrides the default timeout for the solicited MADs
Mellanox Technologies 126 | Rev 4.70
Examples:
sminfo                  # local port's sminfo
sminfo 32               # show sminfo of LID 32
sminfo -G 0x8f1040023   # same, but using GUID address
14.3.12 ibclearerrors
ibclearerrors is a script which clears the PMA error counters in PortCounters by either walking the InfiniBand subnet topology or using an already saved topology file.
14.3.12.1 ibclearerrors Synopsis
ibclearerrors [-h] [-N | -nocolor] [<topology-file> | -C ca_name -P ca_port -t(imeout) timeout_ms]
14.3.12.2 ibclearerrors Options
The table below lists the various flags of the command.
Table 25: ibclearerrors Flags and Options
-C <ca_name>            Use the specified ca_name
-P <ca_port>            Use the specified ca_port
-t <timeout_ms>         Override the default timeout for the solicited MADs
14.3.13 ibstat
ibstat is a binary which displays b
2. Step 5. Reboot the system.
8.10.3.3 Verifying SR-IOV Support within the Host Operating System
To verify that the system is properly configured for SR-IOV:
Step 1. Go to: Start > Windows PowerShell.
Step 2. Run the following PowerShell commands:
(Get-VmHost).IovSupport
(Get-VmHost).IovSupportReasons
If SR-IOV is supported by the OS, the output in PowerShell is as shown in Figure 9.
[Figure 9: Operating System Supports SR-IOV]
Step 3. If the BIOS was updated according to the BIOS vendor's instructions and you see the message shown in the figure below, update the registry configuration as described in the (Get-VmHost).IovSupportReasons message.
[Figure 10: SR-IOV Support]
Step 4. Reboot.
Step 5. Verify the system is configured correctly for SR-IOV as described in Steps 1 and 2.
8.10.3.4 Creating a Virtual Machine
To create a virtual machine:
Step 1. Open the Hyper-V Manager.
Step 2. Go to New Virtual Machine and set the following:
  Name: <name>
  Startup memory: 4096 MB
  Connection: Not Connected
[Figure 11: Hyper-V Manager]
3. Examples:
1. Query PortInfo by LID, with port modifier:
> smpquery portinfo 1 1
# Port info: Lid 1 port 1
Mkey:............................0x0000000000000000
GidPrefix:.......................0xfe80000000000000
Lid:.............................0x0001
SMLid:...........................0x0001
CapMask:.........................0x251086a
        IsSM
        IsTrapSupported
        IsAutomaticMigrationSupported
        IsSLMappingSupported
        IsSystemImageGUIDsupported
        IsCommunicatonManagementSupported
        IsVendorClassSupported
        IsCapabilityMaskNoticeSupported
        IsClientRegistrationSupported
DiagCode:........................0x0000
MkeyLeasePeriod:.................0
LocalPort:.......................…
LinkWidthEnabled:................1X or 4X
LinkWidthSupported:..............1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps
LinkState:.......................Active
PhysLinkState:...................LinkUp
LinkDownDefState:................Polling
ProtectBits:.....................0
LMC:.............................0
LinkSpeedActive:.................5.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps
NeighborMTU:.....................2048
SMSL:............................0
VLCap:...........................VL0-7
InitType:........................0x00
VLHighLimit:.....................4
VLArbHighCap:....................8
VLArbLowCap:.....................8
InitReply:.......................0x00
2. Query SwitchInfo by GUID
3. Query NodeInfo by direct route
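The PortInfo output above uses a "Field:....value" layout, and the flag names printed under CapMask are the set bits of that mask. The sketch below is a hypothetical helper (parse_portinfo and decode_capmask are not part of smpquery). It assumes that layout, and it covers only the nine CapabilityMask bits that appear in the example. Their positions follow the InfiniBand PortInfo definition and match 0x251086a exactly (bits 1, 3, 5, 6, 11, 16, 20, 22 and 25). The names are spelled as this example prints them.

```python
import re

# CapabilityMask bit positions for the flags shown in the example output.
# Spelling follows the example output, not necessarily the IBA spec.
CAPMASK_BITS = {
    1: "IsSM",
    3: "IsTrapSupported",
    5: "IsAutomaticMigrationSupported",
    6: "IsSLMappingSupported",
    11: "IsSystemImageGUIDsupported",
    16: "IsCommunicatonManagementSupported",
    20: "IsVendorClassSupported",
    22: "IsCapabilityMaskNoticeSupported",
    25: "IsClientRegistrationSupported",
}

# A field line is: name, a colon, a run of dots, then the value.
FIELD_RE = re.compile(r"^\s*(\w+):\.+(.*)$")


def parse_portinfo(text):
    """Turn 'Field:....value' lines into a {field: value} dict."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def decode_capmask(value):
    """Return the names of the known flags whose bits are set in value."""
    return [name for bit, name in sorted(CAPMASK_BITS.items()) if (value >> bit) & 1]


sample = """Lid:.............................0x0001
CapMask:.........................0x251086a
LinkSpeedActive:.................5.0 Gbps"""

info = parse_portinfo(sample)
print(decode_capmask(int(info["CapMask"], 16)))
```

Bits that are not in the table are ignored. Extend CAPMASK_BITS from the specification before relying on it for other adapters.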
4. Flag | Description
-p, --port=<port>          Listens on/connects to port <port> (default: 18515)
-d, --ib-dev=<dev>         Uses IB device <device GUID> (default: first device found)
-i, --ib-port=<port>       Uses port <port> of IB device (default: 1)
-m, --mtu=<mtu>            The MTU size (default: 1024)
-o, --outs=<num>           The number of outstanding read/atomic operations (default: 16 for ConnectX, 4 for others)
-s, --size=<size>          The size of message to exchange (default: 65536)
-a, --all                  Runs sizes from 2 till 2^23
-t, --tx-depth=<dep>       The size of the TX queue (default: 100)
-n, --iters=<iters>        The number of exchanges (at least 2, default: 1000)
-u, --qp-timeout=<timeout> QP timeout. The timeout value is 4 usec * 2^timeout (default: 14)
-S, --sl=<sl>              The service level (default: 0)
Table 43: ibv_read_lat Flags and Options
-x, --gid-index=<index>    Test uses the GID with the GID index taken from the command line (for RDMAoE, the index should be 0)
-C, --report-cycles        Reports times in CPU cycle units (default: microseconds)
-H, --report-histogram     Prints out all results (default: print summary only)
-U, --report-unsorted      (implies -H) Prints out unsorted results (default: sorted)
-V, --version              Displays version number
-e, --events               Sleeps on CQ events (default: poll)
-F, --CPU-freq             The CPU frequency test. It is active
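The -u formula is easier to judge with a worked number. The helper below, qp_timeout_usec, is hypothetical and not part of the tool. By default it uses the 4 usec base that this table states. The InfiniBand specification defines the unit as 4.096 usec, so the base is a parameter.

```python
def qp_timeout_usec(timeout, base_usec=4.0):
    """QP timeout for a given -u value: base_usec * 2**timeout microseconds."""
    if not 0 <= timeout <= 31:
        # The local ACK timeout is a 5-bit field in the IB spec.
        raise ValueError("timeout must be in 0..31")
    return base_usec * (2 ** timeout)


# Default -u 14: 4 usec * 16384 = 65536 usec, roughly 65.5 ms.
print(qp_timeout_usec(14))
# With the spec's 4.096 usec unit, the same setting is about 67.1 ms.
print(qp_timeout_usec(14, base_usec=4.096))
```

Each increment of -u doubles the timeout. Small changes to this value therefore have a large effect on how quickly a lost ACK is detected.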
5. Flag | Description
-G, --Guid                 Uses GUID address argument. In most cases, it is the Port GUID. Example: "0x08f1040023"
-s, --sm_port <smlid>      Uses <smlid> as the target LID for SM/SA queries
-C, --Ca <ca_name>         Uses the specified channel adapter or router
-P, --Port <ca_port>       Uses the specified port
-u, --usage                Usage message
-t, --timeout <ms>         Overrides the default timeout for the solicited MADs (msec)
<dest dr_path|lid|guid>    Destination's directed path, LID, or GUID
<portnum>                  Destination's port number
<op> [<value>]             Defines the allowed port operations: enable, disable, reset, speed, and query
If there are multiple channel adapters (CAs) or multiple ports and no CA/port is specified, the utility chooses a port according to the following criteria:
1. The first ACTIVE port that is found.
2. If not found, the first port that is UP (physical link state is LinkUp).
Examples:
1. Query the status of Port 1 of CA mlx4_0 (using ibstat), and use its output (the LID, 4 in this case) to obtain additional link information using ibportstate:
> ibstat
CA type: MT4099
Number of ports: 2
Firmware version: 2.11.536
Hardware version: 0
Node GUID: 0x0002c903002e6670
System image GUID: 0x0002c903002e6673
Port 1:
  Physical state: Disabled
  Rate: 10
  Base lid: 4
  LMC: 0
  SM lid: 2
  Capability mask: 0x0251486a
  Port GUID: 0x0002c903002e6671
  Link layer: InfiniBand
> ibportstate -C mlx4_0 4 1 query
PortInfo:
# Port inf
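The two-step port-selection rule above can be written as a short function. This is a minimal sketch and not ibportstate's code. The Port type and choose_default_port are hypothetical. The sketch assumes that ports are scanned in the order the utility discovers them and that the logical state is compared case-insensitively ("Active" as printed by ibstat, "ACTIVE" as written above).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Port:
    ca_name: str
    port_num: int
    state: str        # logical state, e.g. "Active", "Down"
    phys_state: str   # physical link state, e.g. "LinkUp", "Disabled"


def choose_default_port(ports: Sequence[Port]) -> Optional[Port]:
    """Pick a port the way the selection criteria above describe."""
    # 1. The first ACTIVE port that is found.
    for port in ports:
        if port.state.upper() == "ACTIVE":
            return port
    # 2. Otherwise, the first port whose physical link state is LinkUp.
    for port in ports:
        if port.phys_state == "LinkUp":
            return port
    return None  # no usable port; the caller should require -C/-P


ports = [
    Port("mlx4_0", 1, "Down", "Polling"),
    Port("mlx4_0", 2, "Initializing", "LinkUp"),
    Port("mlx4_1", 1, "Active", "LinkUp"),
]
print(choose_default_port(ports))  # the Active port on mlx4_1 wins
```

An ACTIVE port is preferred even when an earlier port already has a physical link. This is why the function uses two passes rather than one.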
6. 8.8.1 Enabling/Disabling NVGRE Offloading
8.8.2 Configuring the NVGRE using PowerShell
8.8.3 Verifying the Encapsulation of the Traffic
8.8.4 Removing NVGRE configuration
8.9 Differentiated Services Code Point (DSCP)
8.9.1 Setting the DSCP in the IP Header
8.9.2 Configuring Quality of Service for TCP and RDMA Traffic
8.9.3 Configuring DSCP for TCP Traffic
8.9.4 Configuring DSCP for RDMA Traffic
8.9.5 Registry Settings
8.9.6 DSCP Sanity Testing
8.10 SR-IOV
8.10.1 System Requirements
8.10.2 SR-IOV Feature Limitations
8.10.3 Configuring SR-IOV Host Machine
8.10.4 Configuring Mellanox Network Adapter for SR-IOV
8.10.5 Configuring Virtual Machine Networking
8.11 Virtual Ethernet Adapter
8.11.1 System Requirements
8.11.2 VEA Feature Limitations
8.11.3 Adding a New Virtual Adapter
8.11.4 Removing a Virtual Ethernet Adapter
8.11
7. • If the firmware upgrade and the restore of the network configuration failed, the following message will be displayed:
[Screenshot: "InstallShield Wizard Completed" dialog. It reports that MLNX_VPI was installed and that the performance-tuning log is at C:\Windows\System32\LogFiles\PerformanceTuning.log. It also reports "Firmware upgrade failed with error 8" (refer to the UM for instructions on how to manually burn firmware) and "Failed to restore the network configuration with error code 3".]
4.2 Unattended Installation
The following is an example of an MLNX_WinOF win2012 x64 unattended installation session.
Step 1. Open a CMD console:
  Windows Server 2008 R2: Click Start > Run, and enter CMD.
  Windows Server 2012/2012 R2: Click Start > Task Manager > File > Run new task, and enter CMD.
Step 2. Install the driver. Run:
> MLNX_VPI_WinOF-4_70_All_win2012_x64.exe /S /v/qn
Step 3. [Optional] Manually configure your setup to contain the logs option:
> MLNX_VPI_WinOF-4_70_All_win2012_x64.exe /S /v/qn /v"/l*vx [LogFile]"
Starting from MLNX_WinOF v4.55, the log option is enabled automatically. The default path of the log is: %LOCALAPPDATA%\MLNX_WinOF.log
Step 4.
8. including the loss of data or a system hang, and you may need to reinstall Windows. It is therefore recommended to back up the registry on your system before implementing the recommendations in this section. If the modifications you apply lead to serious problems, you will be able to restore the original registry state. For more details about backing up and restoring the registry, please visit www.microsoft.com.
11.1 General Performance Optimization and Tuning
To achieve the best performance for Windows, you may need to modify some of the Windows registries.
11.1.1 Registry Tuning
The registry entries that may be added/changed by this General Tuning procedure are:
• Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\:
  Disable the TCP selective acks option for better CPU utilization:
  SackOpts, type REG_DWORD, value set to 0.
• Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters\:
  Enable fast datagram sending for UDP traffic:
  FastSendDatagramThreshold, type REG_DWORD, value set to 64K.
• Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Ndis\Parameters\:
  Set RSS parameters:
  RssBaseCpu, type REG_DWORD, value set to 1.
11.1.2 Enable RSS
Enabling Receive Side Scaling (RSS) is performed by means of the following command:
netsh int tcp set global rss=enabled
11.1.3 Tuning the IPoIB Network Adapter
The IPoIB Network Adapter tuning can be performed either during install
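Because the section recommends backing up the registry first, it can help to write the three General Tuning values to a .reg file and review them before importing. The sketch below is a hypothetical helper (to_reg_file is not a Mellanox tool). It assumes that "64K" means 65536, and it uses the standard .reg notation in which a REG_DWORD is written as dword: followed by eight hex digits.

```python
# (key, value name, REG_DWORD value) from the General Tuning list above.
# Assumption: "64K" is 64 * 1024 = 65536.
TUNING = [
    (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters",
     "SackOpts", 0),
    (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters",
     "FastSendDatagramThreshold", 64 * 1024),
    (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Ndis\Parameters",
     "RssBaseCpu", 1),
]


def to_reg_file(entries):
    """Render (key, name, dword) triples as .reg file text for review."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, name, value in entries:
        lines.append(f"[{key}]")
        # REG_DWORD values are written as eight lowercase hex digits.
        lines.append(f'"{name}"=dword:{value:08x}')
        lines.append("")
    return "\r\n".join(lines)  # .reg files conventionally use CRLF


print(to_reg_file(TUNING))
```

After backing up the registry, you could import the reviewed file with `reg import <file>.reg` from an elevated prompt. Keeping the values in one list makes the change easy to audit and to reverse.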
9. Note: This registry value is not exposed via the UI.
Value Name (Default Value): Description
RSS (default: 1)
  Sets the driver to use Receive Side Scaling (RSS) mode to improve the performance of handling incoming packets. This mode allows the adapter port to utilize the multiple CPUs in a multi-core system for receiving incoming packets and steering them to their destination. RSS can significantly improve the number of transactions per second, the number of connections per second, and the network throughput.
  This parameter can be set to one of two values:
  • 1: enable (default). Sets RSS Mode.
  • 0: disable. The hardware is configured once to use the Toeplitz hash function, and the indirection table is never changed.
  Note: The I/O Acceleration Technology (IOAT) is not functional in this mode.
TxHashDistribution (default: 3)
  Sets the algorithm which is used to distribute the send packets on different send rings. The adapter uses 3 methods:
  • 1: Size. In this method, only 2 Tx rings are used. The send packets are distributed based on the packet size: packets that are smaller than 128 bytes use one ring, while the larger packets use the other ring.
  • 2: Hash. In this method, the adapter calculates a hash value based on the destination IP, the TCP source port, and the destination port. If the packet type is not IP, the packet uses ring number 0.
  • 3: Hash and size. In this method, for each hash value 2 r
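The first two TxHashDistribution methods can be modeled as a ring-selection function. This is a hypothetical sketch, not the driver's logic. The manual does not say which of the two rings gets small packets, so mapping small packets to ring 0 is an assumption. zlib.crc32 stands in for the adapter's unspecified hash. Method 3 is truncated in the source and is therefore not modeled.

```python
import zlib

TX_SIZE = 1               # TxHashDistribution = 1
TX_HASH = 2               # TxHashDistribution = 2
SMALL_PACKET_LIMIT = 128  # bytes, from the Size method description


def pick_tx_ring(mode, packet_len, num_rings, is_ip=True,
                 dst_ip="", src_port=0, dst_port=0):
    """Return a send-ring index for one packet (illustrative only)."""
    if mode == TX_SIZE:
        # Only 2 rings: assumed ring 0 for small packets, ring 1 for the rest.
        return 0 if packet_len < SMALL_PACKET_LIMIT else 1
    if mode == TX_HASH:
        if not is_ip:
            return 0  # non-IP packets use ring number 0
        # Hash over destination IP, TCP source port and destination port.
        key = f"{dst_ip}:{src_port}:{dst_port}".encode()
        return zlib.crc32(key) % num_rings  # stand-in hash, not the driver's
    raise ValueError("only methods 1 (size) and 2 (hash) are modeled")


print(pick_tx_ring(TX_SIZE, 64, 2), pick_tx_ring(TX_SIZE, 1500, 2))
print(pick_tx_ring(TX_HASH, 1500, 8, dst_ip="10.0.0.2", src_port=5001, dst_port=80))
```

With the hash method, all packets of one TCP flow map to the same ring, so packet order within a flow is preserved while different flows spread across rings.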
10. Step 2. Right-click a Mellanox network adapter (under the "Network adapters" list) and left-click Properties. Select the Advanced tab from the Properties sheet.
[Screenshot: adapter Properties sheet, Advanced tab, listing properties such as Flow Control, Interrupt Moderation, Jumbo Packet, and Large Send Offload]
Step 3. Modify the configuration parameters to suit your system.
Please note the following:
a. For help on a specific parameter/option, check the help button at the bottom of the dialog.
b. If you select one of the entries Offload Options, Performance Options, or Flow Control Options, you'll need to click the Properties button to modify parameters via a pop-up dialog.
7.4 Configuring Quality of Service (QoS)
Prior to configuring Quality of Service, you must install Data Center Bridging u
11. 14.4.18 nd_send_lat
14.4.19 NTE
Chapter 15 Troubleshooting
15.1 InfiniBand Troubleshooting
15.2 Ethernet Troubleshooting
15.3 Performance Troubleshooting
15.4 General Troubleshooting
15.5 Installation Error Codes and Troubleshooting
15.5.1 Setup Return Codes
15.5.2 Firmware Burning Warning Codes
15.5.3 Restore Configuration Warnings
List of Tables
Table 1: Revision History
Table 2: Documentation Conventions
Table 3: Abbreviations and Acronyms
Table 4: Related Documents
Table 5: Hardware and Software Requirements
Table 6: Registry Keys Setting
Table 7: DSCP Registry Keys Settings
Table 8: DSCP Default Registry Keys Settings
Table 9: Lossless TCP Associated Events
Table 10: Reserved IP Address Options
12. Flag | Description
-m, --max_lid          This option specifies the maximal LID number to be searched for during inventory file build (default: 100).
-g, --guid             This option specifies the local port GUID value with which OpenSM should bind. OpenSM may be bound to one port at a time. If the GUID given is 0, OpenSM displays a list of possible port GUIDs and waits for user input. Without -g, OpenSM tries to use the default port.
-p, --port             This option displays a menu of possible local port GUID values with which osmtest could bind.
-i, --inventory        This option specifies the name of the inventory file. Normally, osmtest expects to find an inventory file, which osmtest uses to validate real-time information received from the SA during testing. If -i is not specified, osmtest defaults to the file osmtest.dat. See the -c option for related information.
-s, --stress           This option runs the specified stress test instead of the normal test suite. Stress test options are as follows:
                       -s1  Single-MAD RMPP response SA queries
                       -s2  Multi-MAD RMPP response SA queries
                       -s3  Multi-MAD RMPP Path Record SA queries
                       -s4  Single-MAD non-RMPP get Path Record SA queries
                       Without -s, stress testing is not performed.
-M, --Multicast_Mode   This option specifies the length of the Multicast test:
                       -M1  Short Multicast Flow (default), single mode
                       -M2  Short Multicast Flow, multiple mode
                       -M3  Long Multicast Flow, sin
13. • New-NetQosPolicy "UDP" -IPProtocolMatchCondition UDP -PriorityValue8021Action 1
Enable-NetQosFlowControl 3
Disable-NetQosFlowControl 0,1,2,4,5,6,7
Enable-NetAdapterQos -Name A
A.5.2 Running MPI Command Examples
Running the MPI pallas test over ND:
> mpiexec.exe -p 19020 -hosts 4 11.11.146.101 11.21.147.101 11.21.147.51 11.11.145.101 -env MPICH_NETMASK 11.0.0.0/255.0.0.0 -env MPICH_ND_ZCOPY_THRESHOLD -1 -env MPICH_DISABLE_ND 0 -env MPICH_DISABLE_SOCK 1 -affinity c:\test1.exe
Running the MPI pallas test over ETH:
> mpiexec.exe -p 19020 -hosts 4 11.11.146.101 11.21.147.101 11.21.147.51 11.11.145.101 -env MPICH_NETMASK 11.0.0.0/255.0.0.0 -env MPICH_ND_ZCOPY_THRESHOLD -1 -env MPICH_DISABLE_ND 1 -env MPICH_DISABLE_SOCK 0 -affinity c:\test1.exe
Appendix B: NVGRE Configuration Scripts Examples
The setup is as follows for both examples below:
Hypervisor mtlae14: Port1, 192.168.20.114/24
  VM on mtlae14: mtlae14-005, 172.16.14.5/16, MAC 00155D720100
  VM on mtlae14: mtlae14-006, 172.16.14.6/16, MAC 00155D720101
Hypervisor mtlae15: Port1, 192.168.20.115/24
  VM on mtlae15: mtlae15-005, 172.16.15.5/16, MAC 00155D730100
  VM on mtlae15: mtlae15-006, 172.16.15.6/16, MAC 00155D730101
B.1 Adding NVGRE Configuration to Host 14 Example
The following is an example of adding NVGRE to Host 14.
On both sides: vSwitch create command. Note
14. NOTE: Most OpenIB diagnostics take the following common flags. The exact list of supported flags per utility can be found in the usage message and can be shown using the util_name -h syntax.
-h   Shows the usage message
-V   Shows the version info
14.3.18 iblinkinfo
iblinkinfo reports link info for each port in an IB fabric, node by node. Optionally, iblinkinfo can do partial scans and limit its output to parts of a fabric.
14.3.18.1 iblinkinfo Synopsis
iblinkinfo [-hcdl -C <ca_name> -P <ca_port> -p -S <port_guid> -G <port_guid> -D <direct_route> --load-cache <filename>]
14.3.18.2 iblinkinfo Flags and Options
Table 31: iblinkinfo Flags and Options
-S <port_guid>, -G <port_guid>, --port-guid   Starts a partial scan at the port specified by <port_guid> (hex format)
-D <direct_route>   Starts a partial scan at the port specified by the direct route path
-l                  Prints all information for each link on one line. The default is to print a header with the node information and then a list for each port (useful for grep'ing output).
-d                  Prints only nodes which have a port in the "Down" state
-p                  Prints additional port settings (<LifeTime>, <HoqLife>, <VLStallCount>)
-C <ca_name>        Uses the specified ca_name for the search
-P <ca_port>        Uses the specified ca_port for the search
-R                  This option is obs
15. • Added section "Advanced Configuration for Ethernet Driver"
• Added section
• Updated section "Tunable Performance Parameters"
• Added section
• Merged Ethernet and InfiniBand features sections
• Removed section "Sockets Direct Protocol" and its subsections
• Removed section "Winsock Direct and Protocol" and its subsections
• Removed section
• Added ConnectX-3 support
• Removed section "IPoIB Drivers Overview"
• Removed section "Booting Windows from an iSCSI Target"
Rev 2.1.3 (January 28, 2011):
• Complete restructure
Rev 2.1.2 (October 10, 2010):
• Removed section "Debug Options"
• Updated Section 3, "Uninstalling Mellanox VPI Driver", on page 11
• Added Section 6, "InfiniBand Fabric", on page 38 and its subsections
• Added Section 6.3, "InfiniBand Fabric Performance Utilities", on page 71 and its subsections
Rev 2.1.1.1 (July 14, 2010):
• Removed all references of InfiniHost adapter, since it is not supported starting with WinOF VPI v2.1.1
Rev 2.1.1 (May 2010):
• First release
About this Manual
Scope
The document describes WinOF Rev 4.70 features, performance, InfiniBand diagnostic tools, content, and configuration. Additionally, this document provides information on various performance tools supplied with this version.
Intended Audience
This manual is intended for system administrators responsible for the installation, configuration, management, and maintenan
16. All devices on the same physical network, or on the same logical network, must have the same MTU.
Receive Buffers: The number of receive buffers (default: 1024).
Send Buffers: The number of send buffers (default: 2048).
Performance Options: Configures parameters that can improve adapter performance.
Interrupt Moderation: Moderates or delays the generation of interrupts, and so optimizes network throughput and CPU utilization (default: Enabled).
• When interrupt moderation is enabled, the system accumulates interrupts and sends a single interrupt rather than a series of interrupts. An interrupt is generated after receiving 5 packets, or 10 ms after the first packet is received. This improves performance and reduces CPU load, but it increases latency.
• When interrupt moderation is disabled, the system generates an interrupt each time a packet is received or sent. In this mode, CPU utilization increases because the system handles a larger number of interrupts. However, latency decreases because each packet is handled faster.
Receive Side Scaling (RSS Mode): Improves incoming packet processing performance. RSS enables the adapter port to utilize the multiple CPUs in a multi-core system for receiving incoming packets and steering them to the designated destination. RSS can significantly improve the number of transactions, the number of connections per second, and the net
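The interrupt moderation rule above (interrupt after 5 packets, or 10 ms after the first pending packet) can be simulated to show where the extra latency comes from. This is a minimal model built only on the two thresholds stated here. The function moderated_interrupts is hypothetical and does not reflect how the adapter firmware is implemented.

```python
def moderated_interrupts(arrivals_ms, max_packets=5, max_delay_ms=10.0):
    """Return interrupt times for sorted packet arrival times (in ms).

    An interrupt fires when max_packets are pending, or max_delay_ms
    after the first pending packet, whichever happens first.
    """
    interrupts = []
    batch_start = None
    pending = 0
    for t in arrivals_ms:
        # The timer expired before this packet arrived: flush the batch.
        if pending and t > batch_start + max_delay_ms:
            interrupts.append(batch_start + max_delay_ms)
            pending = 0
        if pending == 0:
            batch_start = t  # first packet of a new batch starts the timer
        pending += 1
        if pending == max_packets:
            interrupts.append(t)  # packet-count threshold reached
            pending = 0
    if pending:
        interrupts.append(batch_start + max_delay_ms)
    return interrupts


# A burst of 6 packets: one interrupt at the 5th packet, then the timer
# fires for the leftover 6th packet.
print(moderated_interrupts([0, 1, 2, 3, 4, 5]))
```

In the second call, a lone packet can wait the full 10 ms before it is processed. This is the latency cost the text describes, traded for fewer interrupts during bursts.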
17. Table 11: Mellanox Adapter Traffic Counters
Table 12: Mellanox Adapter Diagnostics Counters
Table 13: Mellanox QoS Counters
Table 14: ibdiagnet Options
Table 15: ibdiagnet Output Files
Table 16: ibportstate Flags and Options
Table 17: ibroute Flags and Options
Table 18: ibdump Flags and Options
Table 19: smpquery Flags and Options
Table 20: perfquery Flags and Options
Table 21: ibping Flags and Options
Table 22: ibnetdiscover Flags and Options
Table 23: ibtracert Flags and Options
Table 24: sminfo Flags and Options
Table 25: ibclearerrors Flags and Options
Table 26: ibstat Flags and Options
Table 27: vstat Flags and Options
Table 28: osmtest Flags and Options
Table 29: ibaddr Flags and Options
18. [-r] [-o <out-dir>] [-t <topo-file>] [-s <sys-name>] [-i <dev-index>] [-p <port-num>] [-pm] [-pc] [-P <<PM counter>=<Trash Limit>>] [-lw <1x|4x|12x>] [-ls <2.5|5|10>] [-skip <dup_guids|zero_guids|pm|logical_state>]
14.3.2.1 Link Level Retransmission (LLR) in FDR Links
With the introduction of FDR 56 Gbps technology, Mellanox enabled a proprietary technology called LLR (Link Level Retransmission) to improve the reliability of FDR links. This proprietary LLR technology adds additional CRC checking to the data stream and retransmits portions of packets with CRC errors at the local link level.
Customers should be aware of the following facts associated with LLR technology:
• Traditional methods of checking the link health can be masked because the LLR technology automatically fixes errors. The traditional IB symbol error counter will show no errors when LLR is active.
• Latency of the fabric can be impacted slightly due to LLR retransmissions. Traditional IB performance utilities can be used to monitor any latency impact.
• Bandwidth of links can be reduced if cable performance degrades and LLR retransmissions become too numerous. Traditional IB bandwidth performance utilities can be used to monitor any bandwidth impact.
Due to these factors, an LLR retransmission rate counter has been added to the ibdiagnet utility that can give end users an indication of the link health.
To monitor
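Because LLR hides errors from the symbol error counter, a retransmission rate is the more useful health signal. The sketch below shows the arithmetic for turning two samples of a cumulative retransmission count into a rate and flagging links above a threshold. The function names, the link names and the threshold value are all hypothetical. The sketch does not read ibdiagnet output, and Mellanox does not specify a threshold here.

```python
def retransmission_rate(prev_count, curr_count, interval_s):
    """Retransmissions per second between two samples of a cumulative counter."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if curr_count < prev_count:
        raise ValueError("counter decreased (cleared or wrapped?)")
    return (curr_count - prev_count) / interval_s


def links_to_check(samples, interval_s, threshold_per_s):
    """samples maps a link name to (previous count, current count)."""
    return sorted(
        name for name, (prev, curr) in samples.items()
        if retransmission_rate(prev, curr, interval_s) > threshold_per_s
    )


# Hypothetical samples taken 10 s apart; threshold chosen for illustration.
samples = {"sw1/p3": (0, 10), "sw1/p7": (0, 1000)}
print(links_to_check(samples, interval_s=10, threshold_per_s=5))
```

Comparing rates rather than raw totals keeps links with long uptimes from looking worse than they are.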
19. Step 3. Connect the virtual hard disk in the New Virtual Machine Wizard.
Step 4. Go to: Connect Virtual Hard Disk > Use an existing virtual hard disk.
Step 5. Select the location of the .vhd file.
[Figure 12: Connect Virtual Hard Disk]
8.10.4 Configuring Mellanox Network Adapter for SR-IOV
The
20. U options
Supported query names (and aliases):
ClassPortInfo (CPI)
NodeRecord (NR) [lid]
PortInfoRecord (PIR) [[lid]/[port]/[options]]
SL2VLTableRecord (SL2VL) [[lid]/[in_port]/[out_port]]
PKeyTableRecord (PKTR) [[lid]/[port]/[block]]
VLArbitrationTableRecord (VLAR) [[lid]/[port]/[block]]
InformInfoRecord (IIR)
LinkRecord (LR) [[from_lid]/[from_port]] [[to_lid]/[to_port]]
ServiceRecord (SR)
PathRecord (PR)
MCMemberRecord (MCMR)
LFTRecord (LFTR) [[lid]/[block]]
MFTRecord (MFTR) [[mlid]/[position]/[block]]
GUIDInfoRecord (GIR) [[lid]/[block]]
-d   Enables debugging
-h   Shows help
14.3.22 smpdump
smpdump is a general-purpose SMP utility which gets SM attributes from a specified SMA. The result is dumped in hex by default.
14.3.22.1 smpdump Synopsis
smpdump [-s(ring) -D(irect) -C ca_name -P ca_port -t(imeout) timeout_ms -V(ersion) -h(elp)] <dlid|dr_path> <attr> [mod]
14.3.22.2 smpdump Options
Table 35: smpdump Flags and Options
attr   IBA attribute ID for SM attribute
mod    IBA modifier for SM attribute
Debugging Flags:
NOTE: Most OpenIB diagnostics take the following common flags. The exact list of supported flags per utility can be found in the usage message and can be shown using the util_name -h syntax.
-d     Raises the IB debugging level. Can be used several times (-ddd or
21. -a, --all                  Runs sizes from 2 till 2^23
-t, --tx-depth=<dep>       The size of the TX queue (default: 100)
-n, --iters=<iters>        The number of exchanges (at least 2, default: 1000)
Table 36: ib_read_bw Flags and Options
-b, --bidirectional        Measures bidirectional bandwidth (default: unidirectional)
-V, --version              Displays version number
-g, --grh                  Uses GRH with packets (mandatory for RoCE)
14.4.2 ib_read_lat
ib_read_lat calculates the latency of an RDMA read operation of message_size between a pair of machines. One acts as a server and the other as a client. They perform a ping-pong benchmark in which one side RDMA-reads the memory of the other side only after the other side has read its memory. Each side samples the CPU clock each time it reads the other side's memory, in order to calculate latency.
Read is available only in RC connection mode (as specified in the IB spec).
14.4.2.1 ib_read_lat Synopsis
ib_read_lat [-i(b-port) ib_port] [-m(tu) mtu_size] [-s(ize) message_size] [-t(x-depth) tx_size] [-n iteration_num] [-p(ort) PDT_port] [-o(uts) outstanding_reads] [-a(ll)] [-V(ersion)] [-C report_cycles] [-H report_histogram] [-U report_unsorted]
14.4.2.2 ib_read_lat Options
The table below lists the various flags of the command.
Table 37: ib_read_lat Flags and Options
-p, --port=<port>          Listens on co
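Two of the options above involve simple arithmetic: -a sweeps message sizes, and -C reports cycles instead of microseconds. This sketch assumes that the -a sweep doubles the size at each step (2, 4, ..., 2^23 bytes). It also assumes that a cycle count converts to microseconds by dividing by the CPU frequency in MHz, since a 1 MHz clock ticks once per microsecond. all_sizes and cycles_to_usec are hypothetical helpers, not part of the benchmark.

```python
def all_sizes(max_exp=23):
    """Message sizes for an -a/--all sweep, assumed to be 2, 4, ..., 2**max_exp bytes."""
    return [2 ** exp for exp in range(1, max_exp + 1)]


def cycles_to_usec(cycles, cpu_mhz):
    """Convert a -C (CPU cycles) figure to microseconds."""
    return cycles / cpu_mhz


sizes = all_sizes()
print(len(sizes), sizes[0], sizes[-1])   # 23 sizes, from 2 B to 8 MiB
print(cycles_to_usec(6000, 3000.0))      # 6000 cycles at 3 GHz is 2 usec
```

Comparing cycle-based results across machines only makes sense after this conversion, and only if the CPU frequency is stable during the run.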
22. -m, --mtu=<mtu>            The MTU size (default: 1024)
-o, --outs=<num>           The number of outstanding read/atomic operations (default: 16 for ConnectX, 4 for others)
-s, --size=<size>          The size of message to exchange (default: 65536)
-a, --all                  Runs sizes from 2 till 2^23
-t, --tx-depth=<dep>       The size of the TX queue (default: 100)
-n, --iters=<iters>        The number of exchanges (at least 2, default: 1000)
-u, --qp-timeout=<timeout> QP timeout. The timeout value is 4 usec * 2^timeout (default: 14)
-S, --sl=<sl>              The service level (default: 0)
-x, --gid-index=<index>    Test uses the GID with the GID index taken from the command line (for RDMAoE, the index should be 0)
-b, --bidirectional        Measures bidirectional bandwidth (default: unidirectional)
-V, --version              Displays version number
-g, --post=<num of posts>  The number of posts for each QP in the chain (default: tx_depth)
-e, --events               Sleeps on CQ events (default: poll)
-F, --CPU-freq             The CPU frequency test. It is active even if the cpufreq_ondemand module is loaded
-R, --rdma_cm              Connects QPs with rdma_cm and runs the test on those QPs
-Z, --com_rdma_cm          Communicates with the rdma_cm module to exchange data (uses regular QPs)
-c, --connection=<RC/UC>   Connection type RC/UC (default: RC)
-I, --inline_size=<size>   Max size of message to be sent inline (default: 0)
23. put. This option may be particularly useful for environments where switches are not fully populated, so much of the default iblinkinfo info is considered unuseful. See ibnetdiscover for information on caching ibnetdiscover output.
14.3.19 ibqueryerrors
The default behavior is to report the port error counters which exceed a threshold for each port in the fabric. The default threshold is zero (0). Error fields can also be suppressed entirely.
In addition to reporting errors on every port, ibqueryerrors can report the port transmit and receive data, as well as full link information to the remote port if available.
14.3.19.1 ibqueryerrors Synopsis
ibqueryerrors [options]
14.3.19.2 ibqueryerrors Options
Table 32: ibqueryerrors Flags and Options
-s <err1,err2,...>    Suppresses the errors listed in the comma-separated list provided
-c                    Suppresses some of the common "side effect" counters. These counters usually do not indicate an error condition and can usually be safely ignored.
-G <port_guid>, -S <port_guid>, --port-guid   Reports results for the port specified. For switches, results are printed for all ports, not just switch port 0.
-S                    Same as -G. Provided only for backward compatibility.
-D <direct_route>     Reports results for the port specified. For switches, results are printed for all ports, not just switch port 0.
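The default reporting rule above can be expressed as a small filter: report counters that exceed the threshold (zero by default) and drop anything named in the -s list. This is a hypothetical sketch. report_errors is not part of ibqueryerrors. The counter names are standard PortCounters fields used for illustration, and the sample values are made up.

```python
def report_errors(counters, threshold=0, suppress=""):
    """Return {counter: value} entries above threshold, minus suppressed ones.

    suppress mirrors the -s argument: a comma-separated list of names.
    """
    suppressed = {name.strip() for name in suppress.split(",") if name.strip()}
    return {
        name: value
        for name, value in counters.items()
        if name not in suppressed and value > threshold
    }


# Hypothetical counter values for one port.
port_counters = {
    "SymbolErrorCounter": 0,
    "LinkDownedCounter": 3,
    "PortRcvErrors": 7,
}
print(report_errors(port_counters))                                # threshold 0
print(report_errors(port_counters, suppress="LinkDownedCounter"))  # like -s
```

With the default threshold of zero, any non-zero counter is reported. This is why suppression (-s, -c) matters on fabrics where some counters increment harmlessly.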
24. 00155D730101
Set-VMNetworkAdapter -VirtualSubnetID 5001
Appendix C: Registry Keys
Mellanox IPoIB and Ethernet drivers use registry keys to control the NIC operations. The registry keys receive default values during the installation of the Mellanox adapters. Most of the parameters are visible in the registry by default; however, certain parameters must be created in order to modify the default behavior of the Mellanox driver.
The adapter can be configured either from the User Interface (Device Manager > Mellanox Adapter > Right-click > Properties) or by setting the registry directly.
All Mellanox adapter parameters are located in the registry under the following registry key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E972-E325-11CE-BFC1-08002bE10318}\<Index>
The registry key can be divided into 4 different groups:
Group                  Description
Basic                  Contains the basic configuration
Offload Options        Controls the offloading operation that the NIC supports
Performance Options    Controls the NIC operation in different environments and scenarios
Flow Control Options   Controls the TCP/IP traffic
Any registry key that starts with an asterisk ("*") is a well-known registry key. For more details regarding the registries, pl
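The registry location and the asterisk convention above are easy to apply in scripts. The sketch below builds the per-adapter key from an index and tests whether a value name is a well-known key. It assumes the index subkeys are four-digit, zero-padded names (0000, 0001, ...), which is how Windows names device class instances. adapter_key and is_well_known are hypothetical helpers.

```python
NET_CLASS_GUID = "{4D36E972-E325-11CE-BFC1-08002bE10318}"
CLASS_ROOT = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class"


def adapter_key(index):
    """Registry key for one adapter instance (index subkeys assumed 4-digit)."""
    return f"{CLASS_ROOT}\\{NET_CLASS_GUID}\\{index:04d}"


def is_well_known(value_name):
    """Keys whose names start with '*' are the well-known registry keys."""
    return value_name.startswith("*")


print(adapter_key(7))
print(is_well_known("*RSS"), is_well_known("TxHashDistribution"))
```

The index for a specific Mellanox interface still has to be found on the target system. The helper only formats the path once you know it.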
25. 1 (Moderate): Interrupt moderation is set to midrange defaults to allow maximum throughput at minimum CPU utilization for common scenarios.
2 (Aggressive): Interrupt moderation is set to maximal values to allow maximum throughput at minimum CPU utilization for more intensive, multi-stream scenarios.
C.4 Offload Registry Keys
This group of registry keys allows the administrator to specify which TCP/IP offload settings are handled by the adapter rather than by the operating system.
Enabling offloading services increases transmission performance. Offload tasks (such as checksum calculations) are performed by the adapter hardware rather than by the operating system, and therefore with lower latency. In addition, more CPU resources become available for other tasks.
Value Name (Default Value): Description
*LsoV1IPv4 (default: 1)  Large Send Offload Version 1 (IPv4). The valid values are: 0 (disable), 1 (enable).
*LsoV2IPv4 (default: 1)  Large Send Offload Version 2 (IPv4). The valid values are: 0 (disable), 1 (enable).
*LsoV2IPv6 (default: 1)  Large Send Offload Version 2 (IPv6). The valid values are: 0 (disable), 1 (enable).
LSOSize (default: 32000)  The maximum number of bytes that the TCP/IP stack can pass to an adapter in a single packet. This value affects the memory consumption and the NIC performance. The valid values are MTU 102
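A quick calculation shows what LSOSize means on the wire. The adapter, not the CPU, cuts one large send into MTU-sized TCP segments. This worked example assumes the LSOSize bytes are TCP payload and that the IPv4 and TCP headers are 20 bytes each with no options. lso_segments is a hypothetical helper.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def lso_segments(lso_bytes, mtu):
    """Number of wire segments the adapter produces for one large send."""
    mss = mtu - IPV4_HEADER - TCP_HEADER  # payload carried per segment
    return math.ceil(lso_bytes / mss)


print(lso_segments(32000, 1500))  # MSS 1460: 21.9 rounds up to 22 segments
print(lso_segments(32000, 9000))  # MSS 8960: 3.6 rounds up to 4 segments
```

With the default LSOSize of 32000, each send hands the adapter roughly 22 standard-MTU segments of work. This is the per-packet CPU cost that offloading removes.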
[Figure: Device Manager, System devices, listing the Mellanox ConnectX-3 VPI (MT04100) PCIe 3.0 5GT/s, IB FDR / 40GigE Virtual Network Adapter]

To achieve the best performance on an SR-IOV VF, run the following PowerShell commands on the host:
For 10GbE:
Set-VMNetworkAdapter -Name "Network Adapter" -VMName vm1 -IovQueuePairsRequested 4
For 40GbE:
Set-VMNetworkAdapter -Name "Network Adapter" -VMName vm1 -IovQueuePairsRequested 8

8.11 Virtual Ethernet Adapter
The Virtual Ethernet Adapter (VEA) provides a mechanism enabling multiple Ethernet adapters on the same physical port. Each of these adapters is referred to as a virtual Ethernet adapter (VEA). At present, a port can have a total of two VEAs. The first VEA (normally the only adapter for the physical port) is referred to as the physical VEA. The second VEA, if present, is called a virtual VEA; currently only a single virtual VEA is supported. The difference between a virtual and a physical VEA is that RDMA is available only through the physical VEA. In addition, certain settings for the port can only be…
By default, the adapter works in Drop Mode. The adapter reverts to this mode upon initialization/restart.

8.13.5 Known Limitations
• The feature is not available for SR-IOV Virtual Functions.
• It is recommended that the feature be used only when the port is configured to maintain flow control.
• It is recommended that the timeout not exceed the typical timeout values of management protocols (usually on the order of several seconds).
• In order for the feature to effectively prevent packet drops, the DPC load duration needs to be lower than the TCP retransmission timeout.
• The feature is only activated if neither of the ports is IB.

8.13.6 System Requirements
• Operating System: Windows 2012 or Windows 2012 R2
• Firmware: 2.31.5050

8.13.7 Enabling/Disabling Lossless TCP
This feature is controlled by two mechanisms:
• The registry key DelayDropTimeout, which enables the Lossless TCP capability in hardware.
• The Set OID OID_MLX_DROPLESS_MODE, which triggers the transition to/from Lossless poll mode.

8.13.7.1 Enabling Lossless TCP Using the Registry Key DelayDropTimeout
Registry key location:
HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\<nn>\DelayDropTimeout
For instructions on how to find the interface index (<nn>) in the registry, please refer to Section C.2, "Finding the Index Value of the Network Interface", on page 177.

Key Name | Key Type | Values | Description
DelayDropTi…
If the primary adapter fails, the secondary adapter (currently in standby mode) takes over. Fault Tolerance is the basis for each of the following teaming types and is inherent in all teaming modes.
2. Switch Fault Tolerance: Provides a failover relationship between two adapters when each adapter is connected to a separate switch.
3. Send Load Balancing: Provides load balancing of transmit traffic and fault tolerance. The load balancing is performed only on the send port.
4. Load Balancing (Send & Receive): Provides load balancing of transmit and receive traffic and fault tolerance. The load balancing splits the transmit and receive traffic statically among the team adapters, based on the source/destination MAC and IP addresses, without changing the base of the traffic loading.
5. Adaptive Load Balancing: The same functionality as Load Balancing (Send & Receive). If one of the adapters is under traffic load, the load balancing channels the traffic to the other team adapter.
6. Dynamic Link Aggregation (802.3ad): Provides dynamic link aggregation, allowing creation of one or more channel groups using same-speed or mixed-speed server adapters.
7. Static Link Aggregation (802.3ad): Provides increased transmission and reception throughput in a team comprised of two to eight adapter ports through static configuration.

If the switch connected to the HCA supports 802.3ad, the recommended setting is teaming mode 6.

8.5.2 Creating a Load Bala…
C.2 Finding the Index Value of the Network Interface
To find the index value of your Network Interface from the Device Manager, please perform the following steps:
Step 1. Open Device Manager, and go to Network Adapters.
Step 2. Right-click > Properties on the Mellanox ConnectX Ethernet Adapter.
Step 3. Go to the Details tab.
Step 4. Select the Driver key, and obtain the nn number.
In the example below, the index equals 0010.
[Figure: Device Manager, Mellanox ConnectX-3 Ethernet Adapter Properties > Details tab, with the Driver key property selected]
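The Driver key value shown on the Details tab has the form {class GUID}\nn. If you copy that value into a deployment script, a small parser can extract the index. The sketch below is hypothetical. It assumes the value consists of exactly a braced 36-character GUID, a backslash, and a four-digit index.

```python
import re

# Assumed format of the Details-tab "Driver key" value:
#   {4d36e972-e325-11ce-bfc1-08002be10318}\0010
DRIVER_KEY_RE = re.compile(r"^\{[0-9A-Fa-f-]{36}\}\\(\d{4})$")


def index_from_driver_key(driver_key: str) -> str:
    """Return the four-digit <nn> index from a Driver key string."""
    match = DRIVER_KEY_RE.match(driver_key.strip())
    if match is None:
        raise ValueError(f"unexpected Driver key format: {driver_key!r}")
    return match.group(1)


if __name__ == "__main__":
    print(index_from_driver_key(r"{4d36e972-e325-11ce-bfc1-08002be10318}\0010"))
```

The returned string can be substituted directly for <nn> in registry paths such as the DelayDropTimeout location.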
Step 4. [Optional] If you do not want to upgrade your firmware version:
MLNX_VPI_WinOF-4_70_All_win2012_x64.exe /v"MT_SKIPFWUPGRD=1"
Step 5. [Optional] If you want to control the installation of the WMI/CIM provider:
MLNX_VPI_WinOF-4_70_All_win2012_x64.exe /v"MT_WMI=1"
Step 6. [Optional] If you want to control whether to restore the network configuration or not:
MLNX_VPI_WinOF-4_70_All_win2012_x64.exe /v"MT_RESTORECONF=1"
For further help, please run:
MLNX_VPI_WinOF-4_70_All_win2012_x64.exe /v"/h"

4.3 Installation Results
Upon installation completion, you can verify the successful addition of the network card(s) through the Device Manager.
Upon installation completion, the inf files can be located at:
• %ProgramFiles%\Mellanox\MLNX_VPI\ETH
• %ProgramFiles%\Mellanox\MLNX_VPI\HW\mlx4_bus
• %ProgramFiles%\Mellanox\MLNX_VPI\IB\IPoIB
To see the Mellanox network adapter device, and the Ethernet or IPoIB network device (depending on the card used) for each port, display the Device Manager and expand "System devices" or "Network adapters".

1. MT_SKIPFWUPGRD default value is False.
2. MT_WMI default value is True.
3. MT_RESTORECONF default value is True.

Figure 1: Installation Results
[Device Manager, Network adapters: IBM USB Remote NDIS Network Device, Mellanox ConnectX-3 Ethernet Adapter, Mellanox ConnectX-3 Ethernet Adapter #2, Microsoft Kernel Debug Network Adapter]
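When scripting an unattended deployment, it can help to generate the command line from the three optional properties instead of editing it by hand. The sketch below is a hypothetical helper. It emits only the properties that differ from the defaults listed in the footnotes, and writes them as 0/1. Several points are assumptions, not statements of this manual; confirm them against the /v"/h" help output before relying on them:
• several properties can be combined, space-separated, inside one quoted /v argument (usual for MSI properties);
• a value of 0 turns an option off;
• flags required by the earlier steps of this procedure are not added.

```python
INSTALLER = "MLNX_VPI_WinOF-4_70_All_win2012_x64.exe"

# Defaults as listed in the footnotes of this section.
DEFAULTS = {"MT_SKIPFWUPGRD": False, "MT_WMI": True, "MT_RESTORECONF": True}


def unattended_command(skip_fw_upgrade: bool = False,
                       install_wmi: bool = True,
                       restore_conf: bool = True) -> str:
    """Build the installer command line, passing only non-default properties."""
    chosen = {
        "MT_SKIPFWUPGRD": skip_fw_upgrade,
        "MT_WMI": install_wmi,
        "MT_RESTORECONF": restore_conf,
    }
    # Assumption: boolean properties are written as 0/1.
    props = [f"{name}={int(value)}"
             for name, value in chosen.items() if value != DEFAULTS[name]]
    if not props:
        return INSTALLER
    # Assumption: several properties may share one quoted /v argument.
    return INSTALLER + ' /v"' + " ".join(props) + '"'


if __name__ == "__main__":
    print(unattended_command(skip_fw_upgrade=True))
```

Calling unattended_command(skip_fw_upgrade=True) reproduces the Step 4 command above.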
…a set of development tools that allows the creation of InfiniBand applications for the MLNX_VPI software package. The SDK package contains header files, libraries, and code examples. To compile the examples provided with the SDK, you must install Windows Driver Kit (WDK) version 8.1 or higher over Visual Studio 2013. To open the SDK package, run the sdk.exe file to get the complete list of files. The SDK package can be found under <installation_directory>\IB\SDK.

It is highly recommended to program applications over the ND API and not over the IBAL API.

14 InfiniBand Fabric Utilities
14.1 Network Direct Interface
The Network Direct Interface (NDI) architecture provides application developers with a networking interface that enables:
• zero-copy data transfers between applications;
• kernel-bypass I/O generation and completion processing;
• one-sided data transfer operations.
NDI is supported by Microsoft and is the recommended method to write InfiniBand applications. NDI exposes the advanced capabilities of the Mellanox networking devices and allows applications to leverage advances of InfiniBand.
For further information, please refer to: http://msdn.microsoft.com/en-us/library/cc904397(v=vs.85).aspx

14.2 part_man - Virtual IPoIB Port Creation Utility
part_man is used to add/remove virtual IPoIB ports. Currently, each Mellanox IPoIB port can have a single virtual IPoIB only…
…ib-port] [-s(ize) <message size>] [-t(x-depth) <tx size>] [-p PDT port] [-a(ll)] [-V(ersion)] [-c(onnection) <type RC/UC>] [-m(tu) <mtu size>] [-n <iteration num>] [-p(ort)] [-C (report cycles)] [-H (report histogram)] [-U (report unsorted)]

14.4.6.2 ib_write_lat Options
The table below lists the various flags of the command.
Table 41: ib_write_lat Flags and Options
Flag: Description
-p, --port=<port>: Listens on/connects to port <port> (default: 18515)
-d, --ib-dev=<dev>: Uses IB device <device guid> (default: first device found)
-i, --ib-port=<port>: Uses port <port> of IB device (default: 1)
-m, --mtu=<mtu>: The MTU size (default: 1024)
-c, --connection=<RC/UC>: Connection type RC/UC (default: RC)
-s, --size=<size>: The size of message to exchange (default: 65536)
-f, --freq=<dep>: How often the time stamp is taken
-a, --all: Runs sizes from 2 till 2^23
-t, --tx-depth=<dep>: The size of the TX queue (default: 100)
-n, --iters=<iters>: The number of exchanges (at least 2, default: 1000)
-C, --report-cycles: Reports times in CPU cycle units (default: microseconds)
-H, --report-histogram: Prints out all results (default: print summary only)
-U, --report-unsorted: (implies -H) Prints out unsorted results (default: sorted)
-V, --version: Displays version number
-g, --grh: Uses GRH with packets (m…
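The -a (--all) option sweeps message sizes from 2 bytes up to 2^23 bytes. The sketch below lists the sizes such a sweep covers. It assumes the sweep doubles the size at each step, which is the usual perftest behaviour but is not spelled out in this table. The function name is illustrative.

```python
def all_message_sizes(max_exponent: int = 23) -> list:
    """Sizes (in bytes) swept by -a/--all: 2, 4, 8, ..., 2**max_exponent.

    Assumption: the sweep doubles the message size at every step.
    """
    return [2 ** exponent for exponent in range(1, max_exponent + 1)]


if __name__ == "__main__":
    sizes = all_message_sizes()
    print(len(sizes), sizes[0], sizes[-1])
```

Under that assumption, a full -a run measures 23 message sizes, the largest being 8 MiB (8388608 bytes).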
…that the vSwitch configuration is persistent (no need to configure it after each reboot).
New-VMSwitch VSwMLNX -NetAdapterName "Port1" -AllowManagementOS $true

Shut down the VMs:
Stop-VM -Name "mtlae14-005" -Force -Confirm
Stop-VM -Name "mtlae14-006" -Force -Confirm

Connect the VMs to the vSwitch (you may have to switch off the VM before doing so; connecting it manually also works):
Connect-VMNetworkAdapter -VMName "mtlae14-005" -SwitchName "VSwMLNX"
Add-VMNetworkAdapter -VMName "mtlae14-005" -SwitchName "VSwMLNX" -StaticMacAddress "00155D720100"
Add-VMNetworkAdapter -VMName "mtlae14-006" -SwitchName "VSwMLNX" -StaticMacAddress "00155D720101"

Note: The commands from Steps 2-4 are not persistent. It is suggested to create a script that runs after each OS reboot.

Step 2. Configure a Subnet Locator and Route records on each Hyper-V Host (Host 1 and Host 2), mtlae14 & mtlae15:
New-NetVirtualizationLookupRecord -CustomerAddress 172.16.14.5 -ProviderAddress 192.168.20.114 -VirtualSubnetID 5001 -MACAddress "00155D720100" -Rule "TranslationMethodEncap"
New-NetVirtualizationLookupRecord -CustomerAddress 172.16.14.6 -ProviderAddress 192.168.20.114 -VirtualSubnetID 5001 -MACAddress "00155D720101" -Rule "TranslationMethodEncap"
New-NetVirtualizationLookupRecord -CustomerAddress 172.16.15.5 -ProviderAddress 192.168.20.115 -VirtualSubnetID 5001 -MACAddress "00155D730100" -Rule "TranslationMethodEncap"
New-NetVirtualizationLookupRecord -CustomerAddress 172.16.15…
…1200000. Note: This registry value is not exposed via the UI.
• EnableQPR (0): Enables query path record. The valid values are: 0 = disable, 1 = enable.
• McastQueryResponseInterval (2): The number of runs of the multicast monitor (which runs every 30 seconds) allowed until a response to the IGMP/MLD queries is received. If no response is received within this period, the driver leaves the multicast group. The valid values are 1 up to 10. Note: This registry value is not exposed via the UI.

C.8 General Registry Values
This section provides information on general registry keys that affect Mellanox driver operation.
Value Name (Default Value): Description
• MaxNumRssCpus (4): The number of CPUs that participate in the RSS. The Mellanox adapter can open multiple receive rings, and each ring can be processed by a different processor. When RSS is disabled, the system opens a single Rx ring. The configured number of Rx rings should be a power of two and less than the number of processors on the system. Value Type: DWORD. The valid values are up to the number of processors on the system.
• RssBaseCpu (1): The CPU number of the first CPU that the RSS can use. NDIS uses the default value of 0 for the base CPU number; however, this value is configurable and can be changed. The Mellanox adapter reads this value from the registry a…
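Two of the values above lend themselves to a quick check before editing the registry. The sketch below uses hypothetical helper names and does two things.
• It tests a MaxNumRssCpus candidate against the stated rule: a power of two, no larger than the number of processors. The text says both "less than" and "up to" the processor count; the sketch follows the looser "up to".
• It converts a McastQueryResponseInterval value into the approximate time before the driver leaves a multicast group, using the 30-second monitor period.

```python
import os

MCAST_MONITOR_PERIOD_S = 30  # the multicast monitor runs every 30 seconds


def is_valid_max_num_rss_cpus(value: int, cpu_count: int) -> bool:
    """True if value is a power of two that does not exceed cpu_count."""
    is_power_of_two = value > 0 and (value & (value - 1)) == 0
    return is_power_of_two and value <= cpu_count


def mcast_leave_after_s(runs: int) -> int:
    """Approximate seconds without an IGMP/MLD response before the group is left."""
    if not 1 <= runs <= 10:
        raise ValueError("McastQueryResponseInterval must be between 1 and 10")
    return runs * MCAST_MONITOR_PERIOD_S


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    print(is_valid_max_num_rss_cpus(4, cpus), mcast_leave_after_s(2))
```

With the default McastQueryResponseInterval of 2, the driver waits roughly 60 seconds for a response.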
Table 32: ibqueryerrors Flags and Options
Flag: Description
-r: Reports the port information. This includes LID, port, external port (if applicable), link speed setting, remote GUID, remote port, remote external port (if applicable), and remote node description information.
--data: Includes the optional transmit and receive data counters.
--threshold-file: Specifies an alternate threshold file. The default is /opt/ufm/files/conf/infiniband-diags/error_thresholds.
--switch: Prints data for switches only.
--ca: Prints data for CAs only.
--router: Prints data for routers only.
--clear-errors, -k: Clears error counters after read. -k and -K can be used together to clear both errors and counters.
--clear-counts, -K: Clears data counters after read.
CAUTION: Data counters are cleared regardless of whether they are printed. Data counters are only printed on ports that have errors. So if a port has 0 errors and the -K option is specified, its data counters are cleared without any printed output.
--details: Includes receive error and transmit discard details.
--load-cache <filename>: Loads and uses the cached ibnetdiscover data stored in the specified filename. May be useful for outputting and learning about other fabrics or a previous state of a fabric. Cannot be used if the user specifies a direct route path. See ibnetdiscover for information on cac…
C.3 Basic Registry Keys
This group contains the registry keys that control the basic operations of the NIC.
Value Name (Default Value): Description
• *JumboPacket (1500): The maximum size of a frame (or a packet) that can be sent over the wire. This is also known as the maximum transmission unit (MTU). The MTU may have a significant impact on the network's performance, as a large packet can cause high latency. However, it can also reduce the CPU utilization and improve the wire efficiency. The standard Ethernet frame size is 1514 bytes, but Mellanox drivers support a wide range of packet sizes.
The valid values are:
- Ethernet: 600 up to 9600
- IPoIB: 1500 up to 4092
Note: All the devices across the network (switches and routers) should support the same frame size. Be aware that different network devices calculate the frame size differently. Some devices include the header information in the frame size, while others do not. Mellanox adapters do not include Ethernet header information in the frame size (i.e., when setting *JumboPacket to 1500, the actual frame size is 1514).
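The note above says *JumboPacket excludes the 14-byte Ethernet header, a detail that is easy to get wrong when matching MTUs with switches that count differently. The sketch below checks a value against the documented ranges and computes the resulting Ethernet frame size. It assumes an untagged frame and does not count the 4-byte FCS or a VLAN tag, which is consistent with the 1500-to-1514 example. The names are illustrative.

```python
ETH_HEADER_BYTES = 14  # destination MAC (6) + source MAC (6) + EtherType (2)

# Valid *JumboPacket ranges from the table above, inclusive.
VALID_RANGES = {"ethernet": (600, 9600), "ipoib": (1500, 4092)}


def is_valid_jumbo_packet(value: int, link_type: str) -> bool:
    """Check a *JumboPacket value against the range for 'ethernet' or 'ipoib'."""
    low, high = VALID_RANGES[link_type]
    return low <= value <= high


def ethernet_frame_size(jumbo_packet: int) -> int:
    """Frame size for an Ethernet port: *JumboPacket plus the Ethernet header.

    Excludes the FCS and any 802.1Q tag (assumption: untagged frames).
    """
    if not is_valid_jumbo_packet(jumbo_packet, "ethernet"):
        raise ValueError("*JumboPacket out of range for Ethernet")
    return jumbo_packet + ETH_HEADER_BYTES


if __name__ == "__main__":
    print(ethernet_frame_size(1500), ethernet_frame_size(9000))
```

A switch that counts the header in its MTU therefore needs at least 9014 bytes to pass frames from an adapter set to 9000.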
8.5.4 Removing a Port VLAN in Windows 2008 R2
To remove a port VLAN, perform the following steps:
Step 1. In the Device Manager window, right-click the network adapter from which the port VLAN was created.
Step 2. Left-click Properties.
Step 3. Select the VLAN tab from the Properties sheet.
Step 4. Select the VLAN to be removed.
Step 5. Click Remove, and confirm the operation.

8.5.5 Configuring a Port to Work with VLAN in Windows 2012 and Above
In this procedure you DO NOT create a VLAN; rather, you use an existing VLAN ID.
To configure a port to work with VLAN using the Device Manager:
Step 1. Open the Device Manager.
Step 2. Go to Network adapters.
Step 3. Right-click > Properties on the Mellanox ConnectX-3 Ethernet Adapter card.
Step 4. Go to the Advanced tab.
Step 5. Choose the VLAN ID in the Property window.
Step 6. Set its value in the Value window.
…-3 Pro
• Ports TX arbitration/Bandwidth allocation per port

For the complete list of Ethernet and InfiniBand Known Issues and Limitations, see the WinOF Release Notes (www.mellanox.com > Products > InfiniBand/VPI Drivers > Windows SW/Drivers).

8.1 Hyper-V with VMQ
Mellanox WinOF Rev 4.70 includes a Virtual Machine Queue (VMQ) interface to support Microsoft Hyper-V network performance improvements and security enhancements.
The VMQ interface supports:
• Classification of received packets by using the destination MAC address to route the packets to different receive queues
• NIC ability to use DMA to transfer packets directly to a Hyper-V child partition's shared memory
• Scaling to multiple processors, by processing packets for different virtual machines on different processors

To enable Hyper-V with VMQ using the UI:
Step 1. Open Hyper-V Manager.
Step 2. Right-click the desired Virtual Machine (VM), and left-click Settings in the pop-up menu.
Step 3. In the Settings window, under the relevant network adapter, select Hardware Acceleration.
Step 4. Check/uncheck the box "Enable virtual machine queue" to enable/disable VMQ on that specific network adapter.

To enable Hyper-V with VMQ using PowerShell:
Step 1. Enable VMQ on a specific VM:
Set-VMNetworkAdapter <VM Name> -VmqWeight 100
Step 2. Disable VMQ on a specific VM:
Set-VMNetworkAdapter <VM Name> -VmqWeight 0

8.2 Header…
…6 -ProviderAddress 192.168.20.115 -VirtualSubnetID 5001 -MACAddress "00155D730101" -Rule "TranslationMethodEncap"

Add the customer route:
New-NetVirtualizationCustomerRoute -RoutingDomainID "{11111111-2222-3333-4444-000000005001}" -VirtualSubnetID 5001 -DestinationPrefix "172.16.0.0/16" -NextHop "0.0.0.0" -Metric 255

Step 3. Configure the Provider Address and Route records on Hyper-V Host 1 (Host 1 only), mtlae14:
$NIC = Get-NetAdapter "Port1"
New-NetVirtualizationProviderAddress -InterfaceIndex $NIC.InterfaceIndex -ProviderAddress 192.168.20.114 -PrefixLength 24
New-NetVirtualizationProviderRoute -InterfaceIndex $NIC.InterfaceIndex -DestinationPrefix "0.0.0.0/0" -NextHop 192.168.20.1

Step 5. Configure the Virtual Subnet ID on the Hyper-V Network Switch Ports for each Virtual Machine on each Hyper-V Host (Host 1 and Host 2).
Run the command below for each VM on the host where that VM is running: for mtlae14-005 and mtlae14-006 on host 192.168.20.114, and for mtlae15-005 and mtlae15-006 on host 192.168.20.115.
mtlae14 only:
Get-VMNetworkAdapter -VMName mtlae14-005 | where {$_.MacAddress -eq "00155D720100"} | Set-VMNetworkAdapter -VirtualSubnetID 5001
Get-VMNetworkAdapter -VMName mtlae14-006 | where {$_.MacAddress -eq "00155D720101"} | Set-VMNetworkAdapter -VirtualSubnetI…
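Step 2 enters the same four lookup records on every host, and the note in Step 1 recommends re-running Steps 2-4 from a script after each reboot. One way to keep the records consistent across hosts is to generate the commands from a single table. The sketch below does that for the four records in this example. It is a hypothetical helper that only formats strings; it does not run PowerShell. It assumes every record uses VSID 5001 and the TranslationMethodEncap rule, as in this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LookupRecord:
    customer_address: str  # VM (customer) IP address
    provider_address: str  # Hyper-V host (provider) IP address
    mac: str               # VM MAC address, 12 hex digits, no separators
    vsid: int = 5001       # virtual subnet ID used throughout this example

    def to_powershell(self) -> str:
        """Format the record as a New-NetVirtualizationLookupRecord command."""
        return (
            "New-NetVirtualizationLookupRecord"
            f" -CustomerAddress {self.customer_address}"
            f" -ProviderAddress {self.provider_address}"
            f" -VirtualSubnetID {self.vsid}"
            f' -MACAddress "{self.mac}"'
            ' -Rule "TranslationMethodEncap"'
        )


# The four records from Step 2 (identical on mtlae14 and mtlae15).
RECORDS = [
    LookupRecord("172.16.14.5", "192.168.20.114", "00155D720100"),
    LookupRecord("172.16.14.6", "192.168.20.114", "00155D720101"),
    LookupRecord("172.16.15.5", "192.168.20.115", "00155D730100"),
    LookupRecord("172.16.15.6", "192.168.20.115", "00155D730101"),
]

if __name__ == "__main__":
    for record in RECORDS:
        print(record.to_powershell())
```

Redirecting the output to a .ps1 file gives the per-reboot script for Step 2; the customer route and provider address commands would still be added separately.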
…IB1 Rack 11 spine 1 ISR9288 Voltaire sFB-12D
0x0008f10400400e2f  IB1 Rack 11 spine 1 ISR9288 Voltaire sFB-12D
0x0008f10400400e31  IB1 Rack 11 spine 2 ISR9288 Voltaire sFB-12D
0x0008f10400400e32  IB1 Rack 11 spine 2 ISR9288 Voltaire sFB-12D

GUID                Node Name
0x0008f10400411a08  SW Rack 3 ISR9024 Voltaire 9024D
0x0008f10400411a28  SW2 Rack 3 ISR9024 Voltaire 9024D
0x0008f10400411a34  SW3 Rack 3 ISR9024 Voltaire 9024D
0x0008f104004119d0  SW4 Rack 3 ISR9024 Voltaire 9024D

14.3.10 ibtracert
ibtracert uses SMPs to trace the path from a source GID/LID to a destination GID/LID. Each hop along the path is displayed until the destination is reached or a hop does not respond. By using the -m option, multicast path tracing can be performed between source and destination nodes.

14.3.10.1 ibtracert Synopsis
ibtracert [-d(ebug)] [-v(erbose)] [-D(irect)] [-L(id)] [-e(rrors)] [-u(sage)] [-G(uids)] [-f(orce)] [-n(o_info)] [-m mlid] [-s smlid] [-C ca_name] [-P ca_port] [-t(imeout) timeout_ms] [-V(ersion)] [--node-name-map node-name-map] [-h(elp)] <dest dr_path|lid|guid> [<startlid> [<endlid>]]

14.3.10.2 ibtracert Options
The table below lists the various flags of the command.
Most OpenIB diagnostics take the following common flags. The exact list of supported flags per utility can be found in the usage message and can be shown using the util_name -h syntax.
Table 23: ibtracert Flags an…
…LLR retransmission rate:
Step 1. Run ibdiagnet (no special flags required).
Step 2. If the LLR retransmission rate limit is exceeded, it is printed to the screen. The default limit is set to 500, and any link that exceeds it requires further investigation. The LLR retransmission rate is reflected in the results file /var/tmp/ibdiagnet2/ibdiagnet2.pm.

The default value of 500 retransmissions/sec has been determined by Mellanox based on extensive simulations and testing. Links exhibiting a lower LLR retransmission rate should not raise special concern.

14.3.2.2 ibdiagnet Options
Table 14: ibdiagnet Options
Flag: Description
-c <count>: Min number of packets to be sent across each link (default = 10)
-v: Enables verbose mode
-r: Provides a report of the fabric qualities
-o <out-dir>: Specifies the directory where the output files will be placed (default = /tmp)
-t <topo-file>: Specifies the topology file name
-s <sys-name>: Specifies the local system name. Meaningful only if a topology file is specified
-i <dev-index>: Specifies the index of the device of the port used to connect to the IB fabric (in case of multiple devices on the local system)
-p <port-num>: Specifies the local device's port number used to connect to the IB fabric
-pm: Dumps all the fabric links' pm Counters into ibdiagnet.pm
-pc: Resets all the fabric links' pm…
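When many links are involved, the 500 retransmissions/sec rule of thumb can be applied mechanically. The sketch below assumes you have already extracted a per-link LLR retransmission rate, for example from /var/tmp/ibdiagnet2/ibdiagnet2.pm. That file's format is not described here, so parsing it is left out. The link names in the example are hypothetical.

```python
# Default limit from the text: 500 LLR retransmissions per second.
LLR_RATE_LIMIT = 500.0


def links_over_llr_limit(rates, limit=LLR_RATE_LIMIT):
    """Return the names of links whose LLR retransmission rate exceeds the limit.

    `rates` maps a link name to its measured rate in retransmissions/sec.
    A rate exactly at the limit is not flagged.
    """
    return sorted(link for link, rate in rates.items() if rate > limit)


if __name__ == "__main__":
    sample = {"sw1/p3<->hca2/p1": 12.0, "sw1/p7<->sw2/p1": 740.0}  # hypothetical
    print(links_over_llr_limit(sample))
```

Only the links returned by this check need further investigation; the rest fall under "should not raise special concern".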
…LID specified.
-U: Returns the name for the GUID specified.
-c: Gets the SA's class port info.
-s: Returns the PortInfoRecords with the isSM or isSMdisabled capability mask bit on.
-g: Gets multicast group info.
-m: Gets multicast member info. If a group is specified, limits the output to that group and prints one line containing only the GUID and node description for each entry. Example: saquery -m 0xc000
-x: Gets LinkRecord info.
--src-to-dst: Gets a PathRecord for <src:dst>, where src and dst are either node names or LIDs.
--sgid-to-dgid: Gets a PathRecord for sgid to dgid, where both GIDs are in an IPv6 format acceptable to inet_pton(3).
-C <ca_name>: Uses the specified ca_name.
-P <ca_port>: Uses the specified ca_port.
--smkey <val>: Uses the SM_Key value for the query. Used only with "trusted" queries. If a non-numeric value (like "x") is specified, saquery will prompt for a value.

Table 34: saquery Flags and Options (continued)
-t, --timeout <msec>: Specifies the SA query response timeout in milliseconds. Default is 100 milliseconds. You may want to use this option if IB_TIMEOUT is indicated.
--node-name-map <node-name-map>: Specifies a node name map. The node name map file maps GUIDs to more user-friendly names. See ibnetdiscover(8) for the node name map file format. Only used with the -O and…
PhysLinkState:...................LinkUp
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps
LinkSpeedActive:.................5.0 Gbps

Now change the enabled link speed:
> ibportstate -C mlx4_0 -D 0 1 speed 2
ibportstate -C mlx4_0 -D 0 1 speed 2
Initial PortInfo:
# Port info: DR path slid 65535; dlid 65535; 0 port 1
LinkSpeedEnabled:................2.5 Gbps
After PortInfo set:
# Port info: DR path slid 65535; dlid 65535; 0 port 1
LinkSpeedEnabled:................5.0 Gbps (IBA extension)

Show the new configuration:
> ibportstate -C mlx4_0 -D 0 1
PortInfo:
# Port info: DR path slid 65535; dlid 65535; 0 port 1
LinkState:.......................Initialize
PhysLinkState:...................LinkUp
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps
LinkSpeedEnabled:................5.0 Gbps (IBA extension)
LinkSpeedActive:.................5.0 Gbps

14.3.4 ibroute
Uses SMPs to display the forwarding tables for unicast (LinearForwardingTable or LFT) or multicast (MulticastForwardingTable or MFT) for the specified switch LID and the optional lid (mlid) range. The default range is all valid entries in the range of 1 to FDBTop.

14.3.4.1 ibroute Applicable Hardware
InfiniBand switches.
1…
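The speed argument in the example is written into the PortInfo LinkSpeedEnabled field. The InfiniBand Architecture specification defines that field as a bit mask:
• bit 0 = 2.5 Gbps
• bit 1 = 5.0 Gbps
• bit 2 = 10.0 Gbps
• 15 = "set to LinkSpeedSupported"

This is why speed 2 shows up as 5.0 Gbps, and a mask of 3 as "2.5 Gbps or 5.0 Gbps". The decoder below is a sketch based on that encoding, taken from the specification rather than this manual. It does not cover the extended speeds that are carried in a separate LinkSpeedExtEnabled field.

```python
# IBA PortInfo LinkSpeedEnabled bits (per-lane signalling rate).
SPEED_BITS = {0x1: "2.5 Gbps", 0x2: "5.0 Gbps", 0x4: "10.0 Gbps"}


def decode_link_speed_enabled(value: int) -> str:
    """Render a LinkSpeedEnabled value in the style of the output above."""
    if value == 15:
        return "set to LinkSpeedSupported"
    if not 1 <= value <= 7:
        raise ValueError(f"unsupported LinkSpeedEnabled value: {value}")
    return " or ".join(name for bit, name in SPEED_BITS.items() if value & bit)


if __name__ == "__main__":
    for v in (1, 2, 3):
        print(v, decode_link_speed_enabled(v))
```

A value of 3 allows both speeds, so the link negotiates; writing 2, as in the example, restricts the port to 5.0 Gbps.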
PS New-NetQosPolicy "SMB" -Policystore Activestore -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
PS New-NetQosPolicy "DEFAULT" -Policystore Activestore -Default -PriorityValue8021Action 3
PS New-NetQosPolicy "TCP" -Policystore Activestore -IPProtocolMatchCondition TCP -PriorityValue8021Action 1
PS New-NetQosPolicy "UDP" -Policystore Activestore -IPProtocolMatchCondition UDP -PriorityValue8021Action 1
PS Disable-NetQosFlowControl 0,1,2,4,5,6,7
PS Enable-NetAdapterQos -InterfaceAlias "port1"
PS Enable-NetAdapterQos -InterfaceAlias "port2"
PS Enable-NetQosFlowControl -Priority 3
6. Browse for the script's location.
7. Click OK.
8. To confirm the settings applied after boot, run:
PS get-netqospolicy -policystore activestore

8 Driver Features
The Mellanox VPI WinOF driver release introduces the following capabilities:
• Support for single and dual port adapters
• Up to 16 Rx queues per port
• Rx steering mode (RSS)
• Hardware Tx/Rx checksum calculation
• Large Send Offload (i.e., TCP Segmentation Offload)
• Hardware multicast filtering
• Adaptive interrupt moderation
• Support for MSI-X interrupts
• Support for Auto-Sensing of Link level protocol
• Ethernet only:
  • Hardware VLAN filtering
  • Header Data Split
  • RDMA over Converged Ethernet
    • RoCE DSCP over IPv4
    • RoCEv2 in ConnectX-3 Pro
  • NVGRE hardware offload in ConnectX…
8.8.4 Removing NVGRE Configuration
Step 1. Set the VSID back to 0 (on each Hyper-V host, for each Virtual Machine where the VSID was set):
Get-VMNetworkAdapter <VMName> | where {$_.MacAddress -eq <VMMacAddress>} | Set-VMNetworkAdapter -VirtualSubnetID 0
• VMName: the name of the virtual machine
• VMMacAddress: the MAC address of the VM's network interface associated with the vSwitch that was connected to the Mellanox device
Step 2. Remove all lookup records (same command on all Hyper-V hosts):
Remove-NetVirtualizationLookupRecord
Step 3. Remove the customer route (same command on all Hyper-V hosts):
Remove-NetVirtualizationCustomerRoute
Step 4. Remove the provider address (same command on all Hyper-V hosts):
Remove-NetVirtualizationProviderAddress
Step 5. For Hyper-V running Windows 2012 only, disable the network adapter binding to the ms_netwnv service:
Disable-NetAdapterBinding <EthInterfaceName> -ComponentID ms_netwnv
• <EthInterfaceName>: the physical NIC name

8.9 Differentiated Services Code Point (DSCP)
DSCP is a mechanism used for classifying network traffic on IP networks. It uses the 6-bit Differentiated Services Field (DS or DSCP field) in the IP header for packet classification purposes. Using Layer 3 classification enables you to maintain the same classification semantics beyond the local network, across routers. Every transmitted packet holds the information allowing network devices to map the packet to the appropriate 802…
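DSCP occupies the upper 6 bits of the 8-bit DS field (the former IPv4 ToS / IPv6 Traffic Class byte), and the 2 low-order bits carry ECN. A DSCP value can therefore be read from or packed into that byte with simple shifts. The sketch below shows that arithmetic. It does not model how the driver then maps DSCP to an 802.1p priority. The function names are illustrative.

```python
def split_ds_field(ds_byte: int) -> tuple:
    """Split the 8-bit DS field into (dscp, ecn)."""
    if not 0 <= ds_byte <= 0xFF:
        raise ValueError("DS field is a single byte")
    return ds_byte >> 2, ds_byte & 0b11


def make_ds_field(dscp: int, ecn: int = 0) -> int:
    """Pack a 6-bit DSCP value and a 2-bit ECN value into one byte."""
    if not 0 <= dscp <= 63 or not 0 <= ecn <= 3:
        raise ValueError("DSCP must be 0-63 and ECN 0-3")
    return (dscp << 2) | ecn


if __name__ == "__main__":
    # DSCP 46 (Expedited Forwarding) corresponds to a ToS byte of 0xB8.
    print(hex(make_ds_field(46)), split_ds_field(0xB8))
```

The same shift explains why packet captures often show ToS values such as 0xB8 rather than the DSCP value itself.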
The QoS counter set consists of flow statistics per VLAN priority. Each QoS policy is associated with a priority. The counter presents the priority's traffic and pause statistics.

Table 13: Mellanox QoS Counters
Mellanox QoS Counter: Description
Bytes/Packets IN
• Bytes Received: The number of bytes received that are covered by this priority. The counted bytes include framing characters (modulo 2^64).
• Bytes Received/Sec: The number of bytes received per second that are covered by this priority. The counted bytes include framing characters.
• Packets Received: The number of packets received that are covered by this priority (modulo 2^64).
• Packets Received/Sec: The number of packets received per second that are covered by this priority.
Bytes/Packets OUT
• Bytes Sent: The number of bytes sent that are covered by this priority. The counted bytes include framing characters (modulo 2^64).
• Bytes Sent/Sec: The number of bytes sent per second that are covered by this priority. The counted bytes include framing characters.
• Packets Sent: The number of packets sent that are covered by this priority (modulo 2^64).
• Packets Sent/Sec: The number of packets sent per second that are covered by this priority.
Bytes and Packets TOTAL
• Bytes Total: The total number of b…
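The cumulative byte and packet counters above are reported modulo 2^64, so a monitoring script that computes rates from two samples should tolerate a wrap between them. The sketch below is a generic way to do that; it is not part of the Mellanox tools. It assumes at most one wrap occurs between consecutive samples.

```python
COUNTER_MODULUS = 2 ** 64  # the cumulative counters wrap modulo 2^64


def counter_delta(previous: int, current: int) -> int:
    """Increment between two samples of a modulo-2^64 counter (at most one wrap)."""
    return (current - previous) % COUNTER_MODULUS


def rate_per_second(previous: int, current: int, interval_s: float) -> float:
    """Average rate over the sampling interval, e.g. bytes/sec."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return counter_delta(previous, current) / interval_s


if __name__ == "__main__":
    # A counter that wrapped from 2^64 - 10 to 5 advanced by 15.
    print(counter_delta(2 ** 64 - 10, 5))
```

Python's modulo of a negative difference is non-negative, which is what makes the single expression handle the wrap.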
-c, --connection=<RC/UC>: Connection type RC/UC (default: RC)
-s, --size=<size>: The size of message to exchange (default: 65536)
-a, --all: Runs sizes from 2 till 2^23
-t, --tx-depth=<dep>: The size of the TX queue (default: 100)
-n, --iters=<iters>: The number of exchanges (at least 2, default: 1000)
-u, --qp-timeout=<timeout>: QP timeout. The timeout value is 4 usec * 2^timeout (default: 14)
-S, --sl=<sl>: The service level (default: 0)
-x, --gid-index=<index>: The test uses the GID with the GID index taken from the command line (for RDMAoE, the index should be 0)
-C, --report-cycles: Reports times in CPU cycle units (default: microseconds)
-H, --report-histogram: Prints out all results (default: print summary only)
-U, --report-unsorted: (implies -H) Prints out unsorted results (default: sorted)
-V, --version: Displays version number
-F, --CPU-freq: The CPU frequency test. It is active even if the cpufreq_ondemand module is loaded
-I, --inline_size=<size>: The maximum size of message to be sent in inline mode (default: 0)
-R, --rdma_cm: Connects QPs with rdma_cm and runs the test on those QPs
-z, --com_rdma_cm: Communicates with the rdma_cm module to exchange data (uses regular QPs)

14.4.13 nd_write_bw
This test is used for performance measuring of RDMA Write requests in Microsoft Windows Operating S…
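The sketch below evaluates the formula given for --qp-timeout (4 usec * 2^timeout), to show what the -u default of 14 means in practice. For comparison, the InfiniBand specification defines the local ACK timeout with a base of 4.096 usec. The timeout field is 5 bits wide, and a value of 0 disables the timer. The sketch therefore accepts 1-31 and lets you pass either base. The helper is illustrative.

```python
import math


def qp_timeout_us(timeout: int, base_us: float = 4.0) -> float:
    """QP timeout in microseconds: base_us * 2**timeout.

    base_us=4.0 follows the option description above; pass 4.096 for the
    InfiniBand specification's base. timeout=0 (timer disabled) is rejected.
    """
    if not 1 <= timeout <= 31:
        raise ValueError("timeout must be in the range 1-31")
    return base_us * 2 ** timeout


if __name__ == "__main__":
    # Default of 14: about 65.5 ms with the 4 usec base, about 67.1 ms with 4.096.
    print(qp_timeout_us(14), qp_timeout_us(14, base_us=4.096))
```

Each increment of -u doubles the timeout, so small changes to the value have a large effect on recovery time after a lost packet.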
[Figure: Device Manager, Network adapters, listing the physical Mellanox ConnectX adapters and the new Mellanox Virtual Miniport Driver - Team adapter]

To modify an existing bundle, perform the following:
a. Select the desired bundle, and click Modify.
b. Modify the bundle name, its type, and/or the participating adapters in the bundle.
c. Click the Commit button.
To remove an existing bundle, select the desired bundle and click Remove. You will be prompted to approve this action.
Notes on this step:
a. Each adapter that participates in a bundle has two properties:
• Status: Connected/Disconnected/Disabled
• Role: Active or Backup
b. Each network adapter that is added to or removed from a bundle ge…
…Roles Selection
[Figure: Add Roles and Features Wizard, "Select server roles" page, with Hyper-V selected]

Step 3. Install the Hyper-V Management Tools:
a. Go to Features > Remote Server Administration Tools > Role Administration Tools > Hyper-V Administration Tool.
Figure 7: Hyper-V Features Selection
[Figure: Add Roles and Features Wizard, "Select features" page, with Hyper-V Management Tools selected under Remote Server Administration Tools > Role Administration Tools]
Step 2. Right-click a Mellanox network adapter under the Network adapters list, and left-click Properties. Select the VLAN tab from the Properties sheet.
[Figure: VLAN tab of the Properties sheet, shown for a physical adapter (Mellanox ConnectX 10Gb Ethernet Adapter) and for a virtual bundle (Mellanox Virtual Miniport Driver - Team)]
Note: After configuring a VLAN, the adapter associated with the VLAN may experience a momentary loss of connectivity.
If a physical adapter has been added to a bundle (team), the VLAN tab will not be displayed.
Step 3. Click New to open a VLAN dialog window. Enter the desired VLAN Name and VLAN ID, and select the VLAN Priority.
[Figure: VLAN dialog, for example VLAN ID 101 and VLAN Priority 2]
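The VLAN ID and VLAN Priority entered in this dialog end up in the 16-bit Tag Control Information (TCI) field of the IEEE 802.1Q tag. That field holds 3 bits of priority (PCP), 1 drop-eligible bit (DEI), and a 12-bit VLAN ID. The sketch below packs them, using the dialog's example values (VLAN ID 101, priority 2). It follows the 802.1Q layout rather than anything specific to the Mellanox driver. It rejects VLAN IDs 0 and 4095, which the standard reserves. The dialog's own limits are not stated in this manual.

```python
def vlan_tci(vlan_id: int, priority: int, dei: int = 0) -> int:
    """Pack an IEEE 802.1Q Tag Control Information field (16 bits)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be 1-4094 (0 and 4095 are reserved)")
    if not 0 <= priority <= 7:
        raise ValueError("priority (PCP) must be 0-7")
    if dei not in (0, 1):
        raise ValueError("DEI is a single bit")
    return (priority << 13) | (dei << 12) | vlan_id


if __name__ == "__main__":
    print(hex(vlan_tci(101, 2)))  # VLAN ID 101, VLAN Priority 2
```

Reading the resulting 0x4065 back, the top three bits give priority 2 and the low twelve bits give VLAN ID 101.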
51. ...and Software Requirements 16
1.2 Supplied Packages 16
1.3 WinOF Set of Documentation 16
Chapter 2 Downloading Mellanox WinOF Driver 17
Chapter 3 Extracting Files Without Running Installation 18
Chapter 4 Installing Mellanox WinOF Driver 20
4.1 Attended Installation 20
4.2 Unattended Installation 25
4.3 Installation Results 25
Chapter 5 Uninstalling Mellanox WinOF Driver 27
5.1 Attended Uninstall 27
5.2 Unattended Uninstall 27
5.3 Firmware Upgrade 27
Chapter 6 Upgrading Mellanox WinOF Driver 28
Chapter 7 Advanced Driver Configuration 29
7.1 Assigning Port IP After Installation 29
7.2 Configuring the InfiniBand Driver 31
7.2.1 Modifying IPoIB Configuration 31
7.2.2 Displaying Adapter Related Information 32
7.3 Configuring the Ethernet Driver 33
7.4 Configuring Quality of Service (QoS) 34
Chapter 8 Driver Features 37
8.1 Hyper-V w
52. Step 4: Confirm the installation.
Figure 8: Hyper-V Confirming Installation
53. ...and receive errors.
--verbose, -v: Increase verbosity level.
--usage, -u: Usage message.
Table 20: perfquery Flags and Options (continued)
Flag: Description
--loop_ports, -l: Loop ports.
--reset_after_read, -r: Reset the counters after reading them.
--Ca, -C <ca_name>: Use the specified channel adapter or router.
--Port, -P <ca_port>: Use the specified port.
--Reset_only, -R: Reset the counters.
--timeout, -t <timeout_ms>: Override the default timeout for the solicited MADs (msec).
--version, -V: Show version info.
<lid|guid> [[port] [reset_mask]]: LID or GUID, port, and reset mask.
--extended, -x: Show extended port counters.
--extended_speeds, -T: Show port extended speeds counters.
--oprcvcounters: Show Rcv Counters per Op code.
--flowctlcounters: Show flow control counters.
--vloppackets: Show packets received per Op code per VL.
--vlopdata: Show data received per Op code per VL.
--vlxmitflowctlerrors: Show flow control update errors per VL.
--vlxmitcounters: Show ticks waiting to transmit counters per VL.
--swportvlcong: Show sw port VL congestion.
--rcvcc: Show Rcv congestion control counters.
--slrcvfecn: Show SL Rcv FECN counters.
--slrcvbecn: Show SL Rcv BECN counters.
--xmitcc: Show Xmit congestion control counters.
--vlxmittimecc: Show VL Xmit Time congestion control counters.
Examples:
perfquery -r 32 1    read performance counters a
54. ...chosen priorities, where PFC is enabled on those priorities.
Configuring the Windows host requires configuring QoS. To configure QoS, please follow the procedure described in Section 7.4, Configuring Quality of Service (QoS), on page 34.
8.7.2.2.1 Using Global Pause Flow Control (GFC)
To use Global Pause Flow Control (GFC) mode, disable QoS and Priority:
PS $ Disable-NetQosFlowControl
PS $ Disable-NetAdapterQos <interface name>
8.7.3 Configuring SwitchX® Based Switch System
To enable RoCE, the SwitchX should be configured as follows:
• Ports facing the host should be configured as access ports, and either use global pause or Priority Code Point (PCP) for priority flow control.
• Ports facing the network should be configured as trunk ports, and use Priority Code Point (PCP) for priority flow control.
For further information on how to configure SwitchX, please refer to the SwitchX User Manual.
8.7.4 Configuring Arista Switch
Step 1: Set the ports that face the hosts as trunk.
(config)# interface et10
(config-if-Et10)# switchport mode trunk
Step 2: Set the VID allowed on the trunk port to match the host VID.
(config-if-Et10)# switchport trunk allowed vlan 100
Step 3: Set the ports that face the network as trunk.
(config)# interface et20
(config-if-Et20)# switchport mode trunk
Step 4: Assign the relevant ports to LAG.
(config)# interface et10
(config-if-Et10)# dcbx mode ieee
55. ...database, run the following command:
vea_man -q
8.11.6 Help Message
To view the help message, run the following command:
vea_man -h
If your adapter name has spaces in it, you need to surround it with quotes.
Examples:
vea_man -a "Ethernet 9"    Adds a new adapter as a virtual duplicate of "Ethernet 9".
vea_man -r "Ethernet 13"   Removes the virtual Ethernet adapter "Ethernet 13".
8.12 IPoIB SR-IOV over KVM
This feature is in Beta quality. For more details on how to configure IPoIB SR-IOV over KVM, please contact Mellanox support.
8.13 Lossless TCP
8.13.1 Introduction
Inbound packets are stored in the data buffers. They are split into Lossy and Lossless buffers according to the priority field in the 802.1Q VLAN tag. In DSCP-based PFC, all traffic is directed to the Lossless buffer.
Packets are taken out of the packet buffer in the same order they were stored, and moved into processing, where a destination descriptor ring is selected. The packet is then scattered into the appropriate memory buffer pointed to by the first free descriptor.
Figure 18: Lossless TCP (labels: Packet Buffer, Lossy, XOFF threshold, Packet Process)
When the Lossless packet buffer crosses the XOFF threshold, the adapter sends 802.3x pause frames according to the port configuration: Global pause or per-priority 802.1Qbb pause (PFC), where on
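The buffer selection and XOFF behavior described above can be modeled in a few lines of Python. This is a simplified sketch, not the adapter's logic: the `Packet` type and the `select_buffer()` and `should_send_pause()` helpers are hypothetical, untagged packets are assumed to land in the Lossy buffer, and buffer occupancy is assumed to be compared against the XOFF threshold in bytes.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Packet:
    # 802.1Q priority (PCP) from the VLAN tag; None for an untagged packet.
    # Hypothetical model type, not a driver structure.
    pcp: Optional[int]


def select_buffer(pkt: Packet, lossless_priorities: Set[int], dscp_based_pfc: bool) -> str:
    """Pick the receive buffer for an inbound packet (simplified model)."""
    if dscp_based_pfc:
        # In DSCP-based PFC, all traffic is directed to the Lossless buffer.
        return "lossless"
    if pkt.pcp is not None and pkt.pcp in lossless_priorities:
        # Priority-based split using the 802.1Q VLAN tag priority field.
        return "lossless"
    # Assumption: untagged packets and non-PFC priorities use the Lossy buffer.
    return "lossy"


def should_send_pause(lossless_occupancy: int, xoff_threshold: int) -> bool:
    """True once the Lossless buffer occupancy has reached the XOFF threshold."""
    return lossless_occupancy >= xoff_threshold


if __name__ == "__main__":
    pfc_priorities = {3}
    for p in (Packet(pcp=3), Packet(pcp=0), Packet(pcp=None)):
        print(p, select_buffer(p, pfc_priorities, dscp_based_pfc=False))
    print(should_send_pause(lossless_occupancy=90_000, xoff_threshold=80_000))
```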
56. ...default 65536B; it must not be combined with the -a flag.
-a: Runs all the message sizes from 1B to 8MB; it must not be combined with the -s flag.
-n <num of iterations>: The number of exchanges (at least 2); the default is 100000.
-I <max inline size>: The maximum size of message to send inline. The default is 128B.
-D <test duration in seconds>: Test duration in seconds.
-f <margin time in seconds>: The margin time to exclude from the calculation; it must be less than half of the duration time.
-S <server interface IP>: Server side only; must be the last parameter.
-C <server interface IP>: Client side only; must be the last parameter.
-h: Shows the Help screen.
14.4.15 nd_read_bw
This test is used for performance measuring of RDMA Read requests in Microsoft Windows Operating Systems. nd_read_bw is performance oriented for RDMA Read with maximum throughput, and runs over Microsoft's NetworkDirect standard. The level of customizing for the user is relatively high. The user may choose to run with a customized message size, a customized number of iterations, or alternatively a customized test duration time. nd_read_bw runs with all message sizes from 1B to 4MB (powers of 2), message inlining, and CQ moderation.
14.4.15.1 nd_read_bw Synopsis
(running on a specific single core)
Server side: start
57. ...even if the cpufreq_ondemand module is loaded.
-R, --rdma_cm: Connects QPs with rdma_cm and runs the test on those QPs.
-z, --com_rdma_cm: Communicates with the rdma_cm module to exchange data; uses regular QPs.
-c, --connection=<RC/UC>: Connection type RC/UC (default RC).
-I, --inline_size=<size>: Max size of message to be sent in inline mode (default 400).
14.4.9 ibv_send_bw
This is a more advanced version of ib_send_bw; it contains more flags and features than the older version, and also improved algorithms. ibv_send_bw calculates the BW of SEND between a pair of machines. One acts as a server and the other as a client. The server receives packets from the client, and they both calculate the throughput of the operation. The test supports a large variety of features as described below, and has better performance than ib_send_bw in Nehalem systems.
14.4.9.1 ibv_send_bw Synopsis
ibv_send_bw [-i|--ib-port <ib port>] [-d|--ib-dev <ib device>] [-c|--connection <RC|UC|UD>] [-m|--mtu <mtu size>] [-s|--size <message size>] [-t|--tx-depth <tx size>] [-r|--rx-depth <rx size>] [-n|--iters <iteration num>] [-p|--port <PDT port>] [-I|--inline_size <inline size>] [-u|--qp-timeout <timeout>] [-S|--sl <sl type>] [-x|--gid-index <gid index>] [-e|--events] [-N] [-F|--CPU-freq] [-g <num of qps in mcast group>] [-M <mcast gid>] [-b|--bidirectional] [-a|--all] [-V|--version]
14.4.9.2 ibv_send_bw Options
The table below lists the various flags of the command.
Table 44: ibv
58. ...hold the standard Windows CounterSet API, which includes:
• Network Interface
• RDMA activity
• SMB Direct Connection
11.4.1 Supported Standard Performance Counters
11.4.1.1 Proprietary Mellanox Adapter Traffic Counters
The proprietary Mellanox adapter traffic counter set consists of global traffic statistics, which gather information from ConnectX-3 and ConnectX-3 Pro network adapters, and include traffic statistics and various types of errors and indications from both the Physical Function and the Virtual Function.
Table 11: Mellanox Adapter Traffic Counters
Mellanox Adapter Traffic Counters: Description
Bytes/Packets IN
Bytes Received: Shows the number of bytes received by the adapter. The counted bytes include framing characters.
Bytes Received/Sec: Shows the rate at which bytes are received by the adapter. The counted bytes include framing characters.
Packets Received: Shows the number of packets received by the ConnectX-3 and ConnectX-3 Pro network interface.
Packets Received/Sec: Shows the rate at which packets are received by the ConnectX-3 and ConnectX-3 Pro network interface.
Bytes/Packets OUT
Bytes Sent: Shows the number of bytes sent by the adapter. The counted bytes include framing characters.
Bytes Sent/Sec: Shows the rate at which bytes are sent by the adapter. The counted bytes include framing characters.
Packets Sent: Shows the
59. ...<dev>: Uses IB device <device guid> (default: first device found).
-i, --ib-port=<port>: Uses port <port> of the IB device (default 1).
-m, --mtu=<mtu>: The MTU size (default 1024).
-c, --connection=<RC/UC>: Connection type RC/UC (default RC).
-s, --size=<size>: The size of message to exchange (default 65536).
-a, --all: Runs sizes from 2 till 2^23.
-t, --tx-depth=<dep>: The size of the tx queue (default 100).
-n, --iters=<iters>: The number of exchanges (at least 2, default 1000).
-b, --bidirectional: Measures bidirectional bandwidth (default unidirectional).
-V, --version: Displays the version number.
-o, --post=<num of posts>: The number of posts for each QP in the chain (default tx-depth).
-q, --qp=<num of qp's>: The number of QPs (default 1).
--grh: Use GRH with packets (mandatory for RoCE).
14.4.6 ib_write_lat
ib_write_lat calculates the latency of an RDMA write operation of a given message size between a pair of machines. One acts as a server and the other as a client. They perform a ping-pong benchmark in which one side RDMA-writes to the other side's memory only after the other side has written to its memory. Each of the sides samples the CPU clock each time it writes to the other side's memory, in order to calculate latency.
14.4.6.1 ib_write_lat Synopsis
ib_write_lat [-i|--ib-port
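The ping-pong measurement above can be illustrated with a short Python sketch that turns one side's CPU-cycle timestamps into latencies. It assumes, as an illustration rather than as the tool's exact arithmetic, that the gap between two consecutive samples on one side covers a full round trip, so the one-way latency is half of it, and that cycles are converted to microseconds by dividing by the CPU frequency in MHz. The function name is hypothetical.

```python
from statistics import median
from typing import List, Sequence


def one_way_latencies_usec(cycle_stamps: Sequence[int], cpu_mhz: float) -> List[float]:
    """Convert consecutive CPU-cycle samples from one side into one-way latencies.

    Assumes each gap between consecutive samples spans one full round trip
    (our write, then the peer's write back), so one-way latency = RTT / 2.
    """
    if len(cycle_stamps) < 2:
        raise ValueError("need at least two samples")
    if cpu_mhz <= 0:
        raise ValueError("cpu_mhz must be positive")
    latencies = []
    for start, end in zip(cycle_stamps, cycle_stamps[1:]):
        rtt_usec = (end - start) / cpu_mhz  # cycles / (cycles per usec)
        latencies.append(rtt_usec / 2)
    return latencies


if __name__ == "__main__":
    stamps = [0, 6000, 12000, 18600]  # hypothetical samples on a 3000 MHz CPU
    lat = one_way_latencies_usec(stamps, cpu_mhz=3000.0)
    print(lat)
    print("median one-way latency (usec):", median(lat))
```

Reporting a median alongside the raw list is a common way to keep occasional slow round trips from skewing the summary.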
60. ...number of packets sent by the ConnectX-3 and ConnectX-3 Pro network interface.
Packets Sent/Sec: Shows the rate at which packets are sent by the ConnectX-3 and ConnectX-3 Pro network interface.
Bytes/Packets TOTAL
Bytes Total: Shows the total of bytes handled by the adapter. The counted bytes include framing characters.
Bytes Total/Sec: Shows the total rate of bytes that are sent and received by the adapter. The counted bytes include framing characters.
Packets Total: Shows the total of packets handled by the ConnectX-3 and ConnectX-3 Pro network interface.
Packets Total/Sec: Shows the rate at which packets are sent and received by the ConnectX-3 and ConnectX-3 Pro network interface.
Control Packets: The total number of successfully received control frames.
ERRORS, DROP AND MISC. INDICATIONS
Table 11: Mellanox Adapter Traffic Counters (continued)
Packets Outbound Errors: Shows the number of outbound packets that could not be transmitted because of errors.
Packets Outbound Discarded: Shows the number of outbound packets to be discarded even though no errors had been detected to prevent transmission. One possible reason for discarding packets could be to free up buffer space.
Packets Received Errors: Shows the total number of inbound packets that contained errors preventing them from being deliverable to a hi
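Each /Sec counter in Table 11 is a rate over its cumulative counterpart, and the Total counters combine the sent and received directions. The Python sketch below shows that relationship by differencing two samples; the `CounterSample` type and the helper names are hypothetical, and this is not how the Windows performance counter infrastructure is implemented internally.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CounterSample:
    # Hypothetical snapshot of cumulative byte counters at a point in time.
    timestamp_sec: float
    bytes_received: int
    bytes_sent: int


def per_second(prev: int, curr: int, interval_sec: float) -> float:
    """Rate of a cumulative counter between two samples."""
    if interval_sec <= 0:
        raise ValueError("interval must be positive")
    if curr < prev:
        raise ValueError("counter went backwards (reset or wrap?)")
    return (curr - prev) / interval_sec


def rates(a: CounterSample, b: CounterSample) -> Dict[str, float]:
    dt = b.timestamp_sec - a.timestamp_sec
    rx = per_second(a.bytes_received, b.bytes_received, dt)
    tx = per_second(a.bytes_sent, b.bytes_sent, dt)
    # "Total" counters cover both directions: sent plus received.
    return {"Bytes Received/Sec": rx, "Bytes Sent/Sec": tx, "Bytes Total/Sec": rx + tx}


if __name__ == "__main__":
    first = CounterSample(timestamp_sec=0.0, bytes_received=1_000, bytes_sent=500)
    second = CounterSample(timestamp_sec=2.0, bytes_received=6_000, bytes_sent=2_500)
    print(rates(first, second))
```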
61. ...or modify the following VLAN properties:
• VLAN Name: The name can be any unique alphanumeric string.
• VLAN ID: The ID is a number between 1 and 4095.
• VLAN Priority: The priority is a number between 0 and 7 (0 = lowest, 7 = highest).
NOTE: After creating a new VLAN, the adapter associated with the VLAN may experience a momentary loss of connectivity.
After installing the first virtual adapter (VLAN) on a specific port, the port becomes disabled. This means that it is not possible to bind to this port until all the virtual adapters associated with it are removed.
When using a VLAN, the network address is configured using the VLAN ID. Therefore, the VLAN ID on both ends of the connection must be the same.
Step 4: Verify the new VLAN(s) by opening the Device Manager window or the Network Connections window. The newly created VLAN will be displayed in the following format:
Mellanox Virtual Miniport Driver - VLAN <name>
(Figure: Device Manager, Network adapters list)
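The VLAN property limits listed above translate directly into a validation check. A minimal Python sketch, assuming "alphanumeric" means ASCII letters and digits only; `validate_vlan()` is a hypothetical helper, not part of the driver or its UI:

```python
import re
from typing import Iterable, List

_ALNUM = re.compile(r"[A-Za-z0-9]+")


def validate_vlan(name: str, vlan_id: int, priority: int,
                  existing_names: Iterable[str] = ()) -> List[str]:
    """Return a list of problems; an empty list means the settings are acceptable."""
    problems = []
    if not _ALNUM.fullmatch(name):
        problems.append("VLAN Name must be a non-empty alphanumeric string")
    if name in set(existing_names):
        problems.append("VLAN Name must be unique")
    if not 1 <= vlan_id <= 4095:
        problems.append("VLAN ID must be between 1 and 4095")
    if not 0 <= priority <= 7:
        problems.append("VLAN Priority must be between 0 (lowest) and 7 (highest)")
    return problems


if __name__ == "__main__":
    print(validate_vlan("vlan101", 101, 2))   # acceptable settings
    print(validate_vlan("my vlan", 0, 9))     # three problems reported
```

Remember that a passing check on one host is not enough: the same VLAN ID must also be configured on the other end of the connection.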
62. ...priority 3:
PS $ Enable-NetQosFlowControl -Priority 3
To add the script to the local machine startup scripts:
Step 1: From PowerShell, invoke gpedit.msc.
Step 2: In the pop-up window, under the Computer Configuration section, perform the following:
1. Select Windows Settings.
2. Select Scripts (Startup/Shutdown).
3. Double-click Startup to open the Startup Properties.
4. Move to the PowerShell Scripts tab.
(Figure: Local Group Policy Editor, Startup Properties, PowerShell Scripts tab)
5. Click Add. The script should include only the following commands:
PS $ Remove-NetQosTrafficClass
PS $ Remove-NetQosPolicy -Confirm:$False
PS $ Set-NetQosDcbxSetting -Willing 0
63. send_bw Flags and Options
Flag: Description
-p, --port=<port>: Listens on/connects to port <port> (default 18515).
-d, --ib-dev=<dev>: Uses IB device <device guid> (default: first device found).
Table 44: ibv_send_bw Flags and Options (continued)
-i, --ib-port=<port>: Uses port <port> of the IB device (default 1).
-m, --mtu=<mtu>: The MTU size (default 1024).
-c, --connection=<RC/UC/UD>: Connection type RC/UC/UD (default RC).
-s, --size=<size>: The size of message to exchange (default 65536).
-a, --all: Runs sizes from 2 till 2^23.
-t, --tx-depth=<dep>: The size of the tx queue (default 100).
-n, --iters=<iters>: The number of exchanges (at least 2, default 1000).
-u, --qp-timeout=<timeout>: QP timeout. The timeout value is 4.096 usec * 2^timeout (default 14).
-S, --sl=<sl>: The service level (default 0).
-x, --gid-index=<index>: The test uses the GID with the GID index taken from the command line (for RDMAoE, the index should be 0).
-b, --bidirectional: Measures bidirectional bandwidth (default unidirectional).
-V, --version: Displays the version number.
-g, --post=<num of posts>: The number of posts for each QP in the chain (default tx-depth).
-e, --events: Inactive during CQ events (default poll).
-F, --CPU-freq: The CPU frequency test. It is active even if the cpufreq_ondem
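The -u value is an exponent rather than a time. The InfiniBand specification defines the unit as 4.096 usec (often rounded to 4 usec in tool help text), uses a 5-bit field, and treats 0 as "no timeout". The Python helper below, whose name is hypothetical, converts the exponent into microseconds under those assumptions:

```python
IB_TIMEOUT_UNIT_USEC = 4.096  # InfiniBand local ACK timeout unit


def qp_timeout_usec(timeout: int) -> float:
    """Convert a QP timeout exponent into microseconds (4.096 usec * 2**timeout)."""
    if not 0 <= timeout <= 31:
        raise ValueError("timeout is a 5-bit exponent (0-31)")
    if timeout == 0:
        return float("inf")  # per the IB specification, 0 disables the timeout
    return IB_TIMEOUT_UNIT_USEC * 2 ** timeout


if __name__ == "__main__":
    for t in (14, 18, 20):
        print(f"-u {t}: {qp_timeout_usec(t) / 1000:.1f} ms")
```

With the default of 14, the timeout is about 67.1 ms, and each increment of -u doubles it.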
64. ...that attempts to alleviate the scalability problems associated with large cloud computing deployments. It uses Generic Routing Encapsulation (GRE) to tunnel layer 2 packets across an IP fabric, and uses 24 bits of the GRE key as a logical network discriminator, which is called a tenant network ID.
Configuring Hyper-V Network Virtualization requires two types of IP addresses:
• Provider Addresses (PA): unique IP addresses assigned to each Hyper-V host that are routable across the physical network infrastructure. Each Hyper-V host requires at least one PA to be assigned.
• Customer Addresses (CA): unique IP addresses assigned to each Virtual Machine that participates in a virtualized network. Using NVGRE, multiple CAs for VMs running on a Hyper-V host can be tunneled using a single PA on that Hyper-V host. CAs must be unique across all VMs on the same virtual network, but they do not need to be unique across virtual networks with different Virtual Subnet IDs.
The VM generates a packet with the addresses of the sender and the recipient within the CA space. The Hyper-V host then encapsulates the packet with the addresses of the sender and the recipient in the PA space.
PA addresses are determined by using a virtualization table. The receiving Hyper-V host retrieves the received packet, identifies the recipient, and forwards the original packet with the CA addresses to the desired VM.
NVGRE can be implemented across an exis
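The split between the CA and PA address spaces, and the 24-bit tenant identifier in the GRE key, can be sketched in Python. This assumes the NVGRE layout from RFC 7637, in which the 32-bit GRE Key carries the 24-bit Virtual Subnet ID (VSID) followed by an 8-bit FlowID; the virtualization table contents and all function names are hypothetical, and real Hyper-V policy lookup is more involved.

```python
from typing import Dict, Tuple

VSID_BITS = 24
FLOW_ID_BITS = 8

# Hypothetical virtualization table: (VSID, Customer Address) -> Provider Address.
VirtTable = Dict[Tuple[int, str], str]


def make_gre_key(vsid: int, flow_id: int = 0) -> int:
    """Pack a 24-bit Virtual Subnet ID and an 8-bit FlowID into the 32-bit GRE key."""
    if not 0 <= vsid < (1 << VSID_BITS):
        raise ValueError("VSID must fit in 24 bits")
    if not 0 <= flow_id < (1 << FLOW_ID_BITS):
        raise ValueError("FlowID must fit in 8 bits")
    return (vsid << FLOW_ID_BITS) | flow_id


def vsid_from_gre_key(key: int) -> int:
    return key >> FLOW_ID_BITS


def encapsulate(table: VirtTable, vsid: int, src_ca: str, dst_ca: str) -> dict:
    """Describe the outer (PA) header added around a CA-to-CA packet."""
    return {
        "outer_src_pa": table[(vsid, src_ca)],
        "outer_dst_pa": table[(vsid, dst_ca)],
        "gre_key": make_gre_key(vsid),
        "inner": (src_ca, dst_ca),  # the original CA addresses are carried unchanged
    }


if __name__ == "__main__":
    table = {
        (5001, "10.0.0.5"): "192.168.1.10",
        (5001, "10.0.0.7"): "192.168.1.20",
        # The same CA reused in a different virtual subnet, hosted elsewhere.
        (6001, "10.0.0.5"): "192.168.1.30",
    }
    print(encapsulate(table, 5001, "10.0.0.5", "10.0.0.7"))
```

Keying the table on (VSID, CA) is what lets two tenants reuse the same customer address: in the example, 10.0.0.5 resolves to different provider addresses in subnets 5001 and 6001.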
65. ...the iSCSI clients.
9.1.3 Configuring the DHCP Server
To configure the DHCP server:
1. Install a DHCP server.
2. Add a new scope to IPv4.
3. Add the iSCSI boot client identifier (MAC/GUID) to the DHCP reservation.
Note: Use index 2 for Windows setup and index 1 for WinPE.
Note: When adding the Mellanox driver to install.wim, verify you are using the appropriate index for your OS flavor. To check the OS, run imagex /info install.wim.
4. Add the following options to the reserved IP address:
Table 10: Reserved IP Address Options
Option 017, Root Path: iscsi:11.4.12.65::::iqn:2011-01:iscsiboot (assuming the iSCSI target IP is 11.4.12.65 and the Target Name is iqn:2011-01:iscsiboot)
Option 060, PXEClient: PXEClient
Option 066, Boot Server Host Name: WDS server IP address
Option 067, Boot File Name: boot\x86\wdsnbp.com
9.2 Configuring the Client Machine
To configure your client:
1. Verify the Mellanox adapter card is burned with the correct Mellanox FlexBoot version. For boot over Ethernet, when using adapter cards with a firmware version older than 2.30.8000, you need to burn the adapter card with Ethernet FlexBoot; otherwise, use the VPI FlexBoot.
2. Verify the Mellanox adapter card is burned with the correct firmware version.
3. Set the Mellanox adapter card as the first boot device in the BIOS settings boot order.
9.3 Installing iSCSI
1. Reboot your i
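The option 017 value in Table 10 follows the iSCSI root path layout from RFC 4173, iscsi:<servername>:<protocol>:<port>:<LUN>:<targetname>, where empty fields fall back to their defaults (TCP, port 3260, LUN 0). A small Python sketch for building and splitting such strings, using a target modeled on the Table 10 example; the helper names are hypothetical, and bracketed IPv6 server addresses are not handled:

```python
from typing import NamedTuple


class RootPath(NamedTuple):
    server: str
    protocol: str
    port: str
    lun: str
    target_name: str


def build_root_path(server: str, target_name: str,
                    protocol: str = "", port: str = "", lun: str = "") -> str:
    # Empty protocol/port/LUN fields leave the initiator's defaults in effect.
    return f"iscsi:{server}:{protocol}:{port}:{lun}:{target_name}"


def parse_root_path(value: str) -> RootPath:
    prefix = "iscsi:"
    if not value.startswith(prefix):
        raise ValueError("root path must start with 'iscsi:'")
    # maxsplit=4 keeps any ':' inside the target name intact.
    parts = value[len(prefix):].split(":", 4)
    if len(parts) != 5:
        raise ValueError("expected server:protocol:port:lun:targetname")
    return RootPath(*parts)


if __name__ == "__main__":
    rp = build_root_path("11.4.12.65", "iqn:2011-01:iscsiboot")
    print(rp)
    print(parse_root_path(rp).target_name)
```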
66. ...0
-C, --report-cycles: Reports times in CPU cycle units (default microseconds).
-H, --report-histogram: Prints out all results (default: print summary only).
-U, --report-unsorted: (implies -H) Prints out unsorted results (default sorted).
-V, --version: Displays the version number.
-F, --CPU-freq: The CPU frequency test. It is active even if the cpufreq_ondemand module is loaded.
Table 45: ibv_send_lat Flags and Options (continued)
-g, --post=<num of posts>: The number of posts for each QP in the chain (default tx-depth).
-I, --inline_size=<size>: The maximum size of message to be sent in inline mode (default 0).
-e, --events: Inactive during CQ events (default poll).
-g, --mcg=<num_of_qps>: Sends messages to a multicast group with <num_of_qps> QPs attached to it.
-M, --MGID=<multicast_gid>: In case of multicast, uses <multicast_gid> as the group MGID. The format must be '255:1:X:X:X:X:X:X:X:X:X:X:X:X:X:X', where X is a value within [0,255]. You must specify a different MGID on both sides to avoid loopback.
-R, --rdma_cm: Connects QPs with rdma_cm and runs the test on those QPs.
-z, --com_rdma_cm: Communicates with the rdma_cm module to exchange data; uses regular QPs.
14.4.11 ibv_write_bw
This is a more advanced version of ib_write_bw; it contains more flags and features than the older version, and also
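The MGID rules for -M (16 fields, a leading 255:1, each field within [0,255], and different values on the two sides) can be checked before launching the test. A Python sketch, assuming colon-separated decimal fields; `parse_mgid()` and `check_peer_mgids()` are hypothetical helpers, not part of ibv_send_lat:

```python
from typing import List


def parse_mgid(mgid: str) -> bytes:
    """Validate an MGID written as 16 colon-separated decimal bytes starting 255:1."""
    fields: List[str] = mgid.split(":")
    if len(fields) != 16:
        raise ValueError("MGID must have exactly 16 fields")
    values = [int(f, 10) for f in fields]  # raises ValueError on non-numeric fields
    if any(not 0 <= v <= 255 for v in values):
        raise ValueError("each field must be within [0, 255]")
    if values[:2] != [255, 1]:
        raise ValueError("MGID must start with 255:1")
    return bytes(values)


def check_peer_mgids(local: str, remote: str) -> None:
    # Different MGIDs on both sides are required to avoid loopback.
    if parse_mgid(local) == parse_mgid(remote):
        raise ValueError("use a different MGID on each side to avoid loopback")


if __name__ == "__main__":
    side_a = "255:1:" + ":".join(["0"] * 13) + ":1"
    side_b = "255:1:" + ":".join(["0"] * 13) + ":2"
    check_peer_mgids(side_a, side_b)
    print(parse_mgid(side_a).hex())
```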
67. ...00004900"  # "MT23108 InfiniHost Mellanox Technologies"
[1](5442b100004901)  "S-0008f10400410015"[4]  # lid 12 lmc 1 "SW-6IB4 Voltaire" lid 3 4xSDR

vendid=0x2c9
devid=0x5a44
caguid=0x8f10403961354
Ca  2 "H-0008f10403961354"  # "MT23108 InfiniHost Mellanox Technologies"
[1](8f10403961355)  "S-005442ba00003080"[22]  # lid 4 lmc 1 "ISR9024 Voltaire" lid 6 4xSDR

vendid=0x2c9
devid=0x5a44
caguid=0x8f10403960558
Ca  2 "H-0008f10403960558"  # "MT23108 InfiniHost Mellanox Technologies"
[2](8f1040396055a)  "S-005442ba00003080"[8]  # lid 14 lmc 1 "ISR9024 Voltaire" lid 6 4xSDR
[1](8f10403960559)  "S-005442ba00003080"[12]  # lid 10 lmc 1 "ISR9024 Voltaire" lid 6 1xSDR

Node Name Map File Format
The node name map is used to specify user-friendly names for nodes in the output. GUIDs are used to perform the lookup.
# comment
<guid> "<name>"
Example:
# IB1
# Line cards
0x0008f104003f125c "IB1 (Rack 11 slot 1) ISR9288/ISR9096 Voltaire sLB-24D"
0x0008f104003f125d "IB1 (Rack 11 slot 1) ISR9288/ISR9096 Voltaire sLB-24D"
0x0008f104003f10d2 "IB1 (Rack 11 slot 2) ISR9288/ISR9096 Voltaire sLB-24D"
0x0008f104003f10d3 "IB1 (Rack 11 slot 2) ISR9288/ISR9096 Voltaire sLB-24D"
0x0008f104003f10bf "IB1 (Rack 11 slot 12) ISR9288/ISR9096 Voltaire sLB-24D"
# Spines
0x0008f10400400e2d "IB1 (Rack 11 spine 1) ISR9288 Voltaire sFB-12D"
0x0008f10400400e2e
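The node name map format above is simple enough to parse in a few lines. The Python sketch below accepts comment lines, blank lines and <guid> "<name>" entries, and falls back to the raw GUID when no name is mapped; it is an illustration with hypothetical function names, not the parser used by the OFED diagnostics, which may accept additional variations.

```python
import re
from typing import Dict, Optional

# One mapping per line: a hex GUID, whitespace, then the name in double quotes.
_ENTRY = re.compile(r'^\s*(0x[0-9A-Fa-f]+)\s+"([^"]*)"\s*$')


def parse_node_name_map(text: str) -> Dict[int, str]:
    names: Dict[int, str] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are ignored
        match = _ENTRY.match(line)
        if match is None:
            raise ValueError(f'line {lineno}: expected <guid> "<name>"')
        names[int(match.group(1), 16)] = match.group(2)
    return names


def display_name(names: Dict[int, str], guid: int, fallback: Optional[str] = None) -> str:
    # Fall back to the raw 64-bit GUID when the map has no entry for it.
    return names.get(guid, fallback or f"0x{guid:016x}")


SAMPLE = '''
# IB1
# Line cards
0x0008f104003f125c "IB1 (Rack 11 slot 1) ISR9288/ISR9096 Voltaire sLB-24D"
0x0008f104003f10d2 "IB1 (Rack 11 slot 2) ISR9288/ISR9096 Voltaire sLB-24D"
'''

if __name__ == "__main__":
    nmap = parse_node_name_map(SAMPLE)
    print(display_name(nmap, 0x0008F104003F125C))
    print(display_name(nmap, 0x1234))
```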
69. ...0x1 nd_send_lat -s1048576 -D10 -S 11.137.53.1
Client side:
start /b /wait /affinity 0x1 nd_send_lat -s1048576 -D10 -C 11.137.53.1
14.4.18.2 nd_send_lat Options
The table below lists the various flags of the command.
Table 53: nd_send_lat Options
Flag: Description
-h: Shows the Help screen.
-V: Shows the version number.
-p <port>: Connects to the port <port> (default 68307).
-s <msg size>: Exchanges messages of <msg size> (default 65536B); it must not be combined with the -a flag.
-a: Runs all the message sizes from 1B to 8MB; it must not be combined with the -s flag.
-n <num of iterations>: The number of exchanges (at least 2); the default is 100000.
-I <max inline size>: The maximum size of message to send inline. The default is 128B.
-D <test duration in seconds>: Test duration in seconds.
-f <margin time in seconds>: The margin time to exclude from the calculation; it must be less than half of the duration time.
-S <server interface IP>: Server side only; must be the last parameter.
-C <server interface IP>: Client side only; must be the last parameter.
14.4.19 NTttcp
NTttcp is a Windows-based testing application that sends and receives TCP data between two or more endpoints. It is a Winsock-based port of the ttcp tool that measures networking performance (bytes/second).
To download the latest version of NTttcp 5
70. 0x20 - FRAMES (dumps all SMP and GMP frames)
0x40 - ROUTING (dumps FDB routing information)
0x80 - currently unused
Without -vf, osmtest defaults to ERROR + INFO (0x3). Specifying -vf 0 disables all messages. Specifying -vf 0xFF enables all messages (see -V). High verbosity levels may require increasing the transaction timeout with the -t option.
-h, --help: Display this usage info, then exit.
14.3.16 ibaddr
Displays the LID (and range) as well as the GID address of the port specified by DR path, LID, or GUID, or of the local port by default. This utility can be used as a simple address resolver.
14.3.16.1 ibaddr Synopsis
ibaddr [-d(ebug)] [-D(irect)] [-G(uid)] [-l(id_show)] [-g(id_show)] [-C ca_name] [-P ca_port] [-t(imeout) timeout_ms] [-V(ersion)] [-h(elp)] [<lid | dr_path | guid>]
14.3.16.2 ibaddr Options
Table 29: ibaddr Flags and Options
Flags: Description
-G, --Guid: Shows LID range and GID for a GUID address.
-l, --lid_show: Shows LID range only.
-L, --Lid_show: Shows LID range (in decimal) only.
-g, --gid_show: Shows GID address only.
Table 29: ibaddr Flags and Options (continued)
Debugging Flags: Description
NOTE: Most OpenIB diagnostics take the following common flags. The exact list of supported flags per utility can be found in the usage message and can be shown using the util
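The -vf value is a bit mask. The Python sketch below decodes and builds masks; the FRAMES, ROUTING and unused bits and the ERROR+INFO default of 0x3 come from the text above, while the names for 0x01 through 0x10 (ERROR, INFO, VERBOSE, DEBUG, FUNCS) are assumed from the usual OpenSM log-level conventions. Function names are hypothetical.

```python
from typing import List

VERBOSITY_BITS = {
    0x01: "ERROR",
    0x02: "INFO",
    0x04: "VERBOSE",  # assumed name (OpenSM convention)
    0x08: "DEBUG",    # assumed name (OpenSM convention)
    0x10: "FUNCS",    # assumed name (OpenSM convention)
    0x20: "FRAMES",
    0x40: "ROUTING",
    0x80: "UNUSED",
}
DEFAULT_VF = 0x03  # ERROR + INFO, used when -vf is not given


def decode_vf(mask: int) -> List[str]:
    """List the message classes enabled by a -vf mask, lowest bit first."""
    if not 0 <= mask <= 0xFF:
        raise ValueError("-vf takes an 8-bit mask")
    return [name for bit, name in sorted(VERBOSITY_BITS.items()) if mask & bit]


def encode_vf(*names: str) -> int:
    """Build a -vf mask from class names (KeyError for unknown names)."""
    lookup = {name: bit for bit, name in VERBOSITY_BITS.items()}
    mask = 0
    for name in names:
        mask |= lookup[name]
    return mask


if __name__ == "__main__":
    print(decode_vf(DEFAULT_VF))
    print(hex(encode_vf("ERROR", "INFO", "FRAMES")))
```

For example, -vf 0x23 keeps the default ERROR and INFO messages and adds frame dumps.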
71. ...1Qbb CoS. For DSCP-based PFC, the packet is marked with a DSCP value in the Differentiated Services (DS) field of the IP header.
8.9.1 Setting the DSCP in the IP Header
Marking the DSCP value in the IP header is done differently for IP packets constructed by the NIC (e.g., RDMA traffic) and for packets constructed by the IP stack (e.g., TCP traffic).
• For IP packets generated by the IP stack, the DSCP value is provided by the IP stack. The NIC does not validate the match between DSCP and Class of Service (CoS) values. CoS and DSCP values are expected to be set through standard tools, such as the PowerShell command New-NetQosPolicy using the PriorityValue8021Action and DSCPAction flags, respectively.
• For IP packets generated by the NIC (RDMA), the DSCP value is generated according to the CoS value programmed for the interface. The CoS value is set through standard tools, such as the PowerShell command New-NetQosPolicy using the PriorityValue8021Action flag. The NIC uses a mapping table between the CoS value and the DSCP value, configured through the RroceDscpMarkPriorityFlowControl_[0-7] registry keys.
8.9.2 Configuring Quality of Service for TCP and RDMA Traffic
Step 1: Verify that DCB is installed and enabled (it is not installed by default).
PS $ Install-WindowsFeature Data-Center-Bridging
Step 2: Import the PowerShell modules that are required to configure DCB.
PS $ import-module NetQos
PS $ import-module DcbQos
PS $ import-module Ne
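The DS field mentioned above is the former IPv4 TOS byte: the DSCP occupies its upper six bits and ECN the lower two (RFC 2474, RFC 3168). A Python sketch of how a DSCP value lands in that byte, and of a CoS-to-DSCP lookup in the spirit of the per-priority registry mapping; the table values and function names are hypothetical examples, not driver defaults:

```python
from typing import Dict


def ds_field(dscp: int, ecn: int = 0) -> int:
    """Build the 8-bit DS field: DSCP in the upper 6 bits, ECN in the lower 2."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN is a 2-bit value (0-3)")
    return (dscp << 2) | ecn


def dscp_from_ds_field(ds: int) -> int:
    return (ds >> 2) & 0x3F


# Hypothetical CoS -> DSCP table standing in for the per-priority registry values;
# the numbers are examples only.
EXAMPLE_COS_TO_DSCP: Dict[int, int] = {0: 0, 3: 26, 5: 46}


def mark_nic_generated_packet(cos: int, cos_to_dscp: Dict[int, int]) -> int:
    """Return the DS byte for a NIC-generated packet of the given CoS."""
    return ds_field(cos_to_dscp[cos])


if __name__ == "__main__":
    for cos in sorted(EXAMPLE_COS_TO_DSCP):
        ds = mark_nic_generated_packet(cos, EXAMPLE_COS_TO_DSCP)
        print(f"CoS {cos} -> DSCP {dscp_from_ds_field(ds)} -> DS byte 0x{ds:02x}")
```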
72. 28, please refer to the Microsoft website, following the link below:
http://gallery.technet.microsoft.com/NTttcp-Version-528-Now-f8b12769
This tool should be run from cmd only.
14.4.19.1 NTttcp Synopsis
Server:
ntttcp_x64.exe -r -t 15 -m 16,*,<interface IP>
Client:
ntttcp_x64.exe -s -t 15 -m 16,*,<same address as above>
14.4.19.2 NTttcp Options
The table below lists the various flags of the command.
Table 54: NTttcp Options
Flags: Description
-s: Works as a sender.
-r: Works as a receiver.
-l <Length of buffer>: default TCP: 64K, UDP: 128.
-n <Number of buffers>: default 20K.
-p <port base>: default 5001.
-sp: Synchronizes data ports; if used, -p should be the same on every instance.
-a <outstanding I/O>: default 2.
-x <PacketArray size>: default 1.
-rb <Receive buffer size>: default 64K.
-sb <Send buffer size>: default 8K.
-u: UDP send/recv.
-w: WSARecv/WSASend.
-d: Verifies flag.
-t <Runtime>: in seconds.
-cd <Cool down>: in seconds.
-wu <Warm up>: in seconds.
-nic <NIC IP>: Use the NIC with <NIC IP> for sending data (sender only).
-m <mapping> [mapping]
15 Troubleshooting
15.1 InfiniBand Troubleshooting
Issue 1: The InfiniBand interfaces are not up after the first reboot after the installation process is comple
73. 14.3.4.2 ibroute Synopsis
ibroute [-h] [-d] [-v] [-V] [-a] [-n] [-D] [-G] [-M] [-L] [-e] [-u] [-s <smlid>] [-C ca_name] [-P ca_port] [-t timeout_ms] [<dest dr_path|lid|guid> [<startlid> [<endlid>]]]
14.3.4.3 ibroute Options
The table below lists the various ibroute flags of the command.
Table 17: ibroute Flags and Options
Flag: Description
-h, --help: Print the help menu.
-d, --debug: Raise the IB debug level. May be used several times for higher debug levels (-ddd or -d -d -d).
-a, --all: Show all LIDs in range, including invalid entries.
-v, --verbose: Increase the verbosity level. May be used several times for additional verbosity (-vvv or -v -v -v).
-V, --version: Show version info.
Table 17: ibroute Flags and Options (continued)
-n, --no_dests: Do not try to resolve destinations.
-D, --Direct: Use directed path address arguments. The path is a comma-separated list of out ports. Examples: "0" (self port), "0,1,2,1,4" (out via port 1, then 2, ...).
-G, --Guid: Use GUID address argument. In most cases, it is the Port GUID. Example: "0x08f1040023".
-M, --Multicast: Show multicast forwarding tables. The parameters <startlid> and <endlid> specify the MLID range.
-L, --Lid: Use LID address argument.
-u, --usage: Usage message.
-e, --errors: Show send and receive errors, timeouts a
74. (config-if-Et10)# speed forced 40gfull
(config-if-Et10)# channel-group 11 mode active
Step 5: Enable PFC on ports that face the network.
(config)# interface et20
(config-if-Et20)# load-interval 5
(config-if-Et20)# speed forced 40gfull
(config-if-Et20)# switchport trunk native vlan tag
(config-if-Et20)# dcbx mode ieee
(config-if-Et20)# priority-flow-control mode on
(config-if-Et20)# switchport trunk allowed vlan 11
(config-if-Et20)# switchport mode trunk
(config-if-Et20)# priority-flow-control priority 3 no-drop
8.7.4.1 Using Global Pause Flow Control (GFC)
To enable GFC on ports that face the hosts, perform the following:
(config)# interface et10
(config-if-Et10)# flowcontrol receive on
(config-if-Et10)# flowcontrol send on
8.7.4.2 Using Priority Flow Control (PFC)
To enable PFC on ports that face the hosts, perform the following:
(config)# interface et10
(config-if-Et10)# dcbx mode ieee
(config-if-Et10)# priority-flow-control mode on
(config-if-Et10)# priority-flow-control priority 3 no-drop
8.7.5 Configuring Router (PFC only)
The router uses the L3 DSCP value to mark the L2 PCP of egress traffic. The required mapping maps the three most significant bits of the DSCP into the PCP. This is the default behavior, and no additional configuration is required.
8.7.5.1 Copying Priority Code Point (PCP) between Subnets
The captured PCP option from the Ethernet
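The router mapping described above (the three most significant bits of the 6-bit DSCP become the 3-bit PCP) is a right shift by three. A Python sketch with a hypothetical helper name:

```python
from typing import Dict


def pcp_from_dscp(dscp: int) -> int:
    """Map a 6-bit DSCP to a 3-bit PCP by keeping its three most significant bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    return dscp >> 3


# Full table: each PCP value covers a block of eight consecutive DSCP values.
DSCP_TO_PCP: Dict[int, int] = {dscp: pcp_from_dscp(dscp) for dscp in range(64)}

if __name__ == "__main__":
    for dscp in (0, 10, 26, 46):
        print(f"DSCP {dscp:2d} -> PCP {DSCP_TO_PCP[dscp]}")
```

Under this mapping, traffic marked DSCP 26 (AF31) is carried with PCP 3, which matches the no-drop priority 3 used in the switch examples.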
75. ...4 up to 64000.
Note: This registry key is not exposed to the user via the UI. If LSOSize is smaller than MTU + 1024, LSO will be disabled.
LSOMinSegment (default 2): The minimum number of segments that a large TCP packet must be divisible by before the transport can offload it to a NIC for segmentation. The valid values are 2 up to 32.
Note: This registry key is not exposed to the user via the UI.
LSOTcpOptions (default 1): Enables the miniport driver to segment a large TCP packet whose TCP header contains TCP options. The valid values are:
• 0: disable
• 1: enable
Note: This registry key is not exposed to the user via the UI.
LSOIpOptions (default 1): Enables its NIC to segment a large TCP packet whose IP header contains IP options. The valid values are:
• 0: disable
• 1: enable
Note: This registry key is not exposed to the user via the UI.
*IPChecksumOffloadIPv4 (default 3): Specifies whether the device performs the calculation of IPv4 checksums. The valid values are:
• 0: disable
• 1: Tx enable
• 2: Rx enable
• 3: Tx and Rx enable
*TCPUDPChecksumOffloadIPv4 (default 3): Specifies whether the device performs the calculation of the TCP or UDP checksum over IPv4. The valid values are:
• 0: disable
• 1: Tx enable
• 2: Rx enable
• 3: Tx and Rx enable
TCPUDPChecksumO (default 3): Specifies whether the device performs the calculat
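The checksum-offload values above encode Tx and Rx as two independent bits (1 = Tx, 2 = Rx, 3 = both), and LSOMinSegment sets a lower bound on how many segments a large send must produce. A Python sketch of both rules; the helper names are hypothetical, and reading "divisible by" as the segment count ceil(payload / MSS) is an assumption:

```python
from typing import Dict, Tuple

# Meaning of the checksum-offload values: (Tx enabled, Rx enabled).
CHECKSUM_OFFLOAD: Dict[int, Tuple[bool, bool]] = {
    0: (False, False),
    1: (True, False),
    2: (False, True),
    3: (True, True),
}


def checksum_offload(value: int) -> Tuple[bool, bool]:
    if value not in CHECKSUM_OFFLOAD:
        raise ValueError("valid values are 0, 1, 2 and 3")
    return CHECKSUM_OFFLOAD[value]


def lso_eligible(payload_bytes: int, mss: int, lso_min_segment: int = 2) -> bool:
    """True if the payload splits into at least LSOMinSegment MSS-sized segments."""
    if not 2 <= lso_min_segment <= 32:
        raise ValueError("LSOMinSegment valid range is 2 to 32")
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = -(-payload_bytes // mss)  # ceiling division
    return segments >= lso_min_segment


if __name__ == "__main__":
    print(checksum_offload(3))
    print(lso_eligible(3000, 1460), lso_eligible(1000, 1460))
```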
76. Table 30 ibcacheedit Flags and Options 133
Table 31 iblinkinfo Flags and Options 134
Table 32 ibqueryerrors Flags and Options 135
Table 33 ibsysstat Flags and Options 137
Table 34 saquery Flags and Options 140
Table 35 smpdump Flags and Options 142
Table 36 ib_read_bw Flags and Options 143
Table 37 ib_read_lat Flags and Options 144
Table 38 ib_send_bw Flags and Options 145
Table 39 ib_send_lat Flags and Options 146
Table 40 ib_write_bw Flags and Options 147
Table 41 ib_write_lat Flags and Options 148
Table 42 ibv_read_bw Flags and Options 149
Table 43 ibv_read_lat Flags and Options 150
Table 44 ibv_send_bw Flags and Options 151
Table 45 ibv_send_lat Flags and Options 153
Table 46 ibv_write_bw Flags and Options 154
Table 47 ibv_write_lat Flags and Options 156
Table 48 nd_write_bw F
77. 5 Querying the Virtual Ethernet Database 74
8.11.6 Help Message 74
8.12 IPoIB SR-IOV over KVM 75
8.13 Lossless TCP 75
8.13.1 Introduction 75
8.13.2 Drop Mode 76
8.13.3 … 76
8.13.4 Default Behavior 76
8.13.5 Known Limitations 76
8.13.6 System Requirements 76
8.13.7 Enabling/Disabling Lossless TCP 76
8.13.8 Monitoring Lossless TCP State 77
Chapter 9 Booting Windows from an iSCSI Target 78
9.1 Configuring the WDS, DHCP and iSCSI Servers 78
9.1.1 Configuring the WDS Server 78
9.1.2 Configuring iSCSI Target 78
9.1.3 Configuring the DHCP Server 78
9.2 Configuring the Client Machine 79
9.3 Installing iSCSI 79
Chapter 10 Deploying Windows Server 2012 and Above with SMB Direct 81
10.1 Overview 81
10.2 Har
78. 6
sysimgguid=0x5442ba00003000
switchguid=0x5442ba00003080(5442ba00003080)
Switch 24 "S-005442ba00003080" # "ISR9024 Voltaire" base port 0 lid 6 lmc 0
[22] "H-0008f10403961354"[1](8f10403961355) # "MT23108 InfiniHost Mellanox Technologies" lid 4 4xSDR
[10] "S-0008f10400410015"[1] # "SW-6IB4 Voltaire" lid 3 4xSDR
[8] "H-0008f10403960558"[2](8f1040396055a) # "MT23108 InfiniHost Mellanox Technologies" lid 14 4xSDR
[6] "S-0008f10400410015"[3] # "SW-6IB4 Voltaire" lid 3 4xSDR
[12] "H-0008f10403960558"[1](8f10403960559) # "MT23108 InfiniHost Mellanox Technologies" lid 10 4xSDR

vendid=0x8f1
devid=0x5a05
switchguid=0x8f10400410015(8f10400410015)
Switch 8 "S-0008f10400410015" # "SW-6IB4 Voltaire" base port 0 lid 3 lmc 0
[6] "H-0008f10403960984"[1](8f10403960985) # "MT23108 InfiniHost Mellanox Technologies" lid 16 4xSDR
[4] "H-005442b100004900"[1](5442b100004901) # "MT23108 InfiniHost Mellanox Technologies" lid 12 4xSDR
[1] "S-005442ba00003080"[10] # "ISR9024 Voltaire" lid 6 1xSDR
[3] "S-005442ba00003080"[6] # "ISR9024 Voltaire" lid 6 4xSDR

vendid=0x2c9
devid=0x5a44
caguid=0x8f10403960984
Ca 2 "H-0008f10403960984" # "MT23108 InfiniHost Mellanox Technologies"
[1](8f10403960985) "S-0008f10400410015"[6] # lid 16 lmc 1 "SW-6IB4 Voltaire" lid 3 4xSDR

vendid=0x2c9
devid=0x5a44
caguid=0x5442b100004900
Ca 2 "H-005442b1
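When post-processing a topology dump like the one above, a small parser for the per-port link lines can help. The Python sketch below assumes the line layout shown in this example (`[local port] "TYPE-GUID"[remote port](optional port GUID) # "description" lid N WIDTHxSPEED`); it is not an official format specification, the `Link` type is hypothetical, and lines from the `Ca` section, which put the local port GUID first, are deliberately not matched.

```python
import re
from typing import NamedTuple, Optional


class Link(NamedTuple):
    local_port: int
    remote_type: str                 # "H" for a host/CA, "S" for a switch
    remote_guid: str
    remote_port: int
    remote_port_guid: Optional[str]  # present only for CA ports in this example
    description: str
    lid: int
    width: str                       # e.g. "4x"
    speed: str                       # e.g. "SDR"


LINK_RE = re.compile(
    r'\[(?P<lport>\d+)\]\s+'
    r'"(?P<type>[HS])-(?P<guid>[0-9a-fA-F]+)"\[(?P<rport>\d+)\]'
    r'(?:\((?P<pguid>[0-9a-fA-F]+)\))?\s+'
    r'#\s+"(?P<desc>[^"]*)"\s+lid\s+(?P<lid>\d+)\s+'
    r'(?P<width>\d+x)(?P<speed>[A-Z0-9]+)'
)


def parse_link(line: str) -> Optional[Link]:
    """Return the link described by a switch port line, or None if it does not match."""
    m = LINK_RE.search(line)
    if m is None:
        return None
    return Link(
        local_port=int(m["lport"]),
        remote_type=m["type"],
        remote_guid=m["guid"],
        remote_port=int(m["rport"]),
        remote_port_guid=m["pguid"],
        description=m["desc"],
        lid=int(m["lid"]),
        width=m["width"],
        speed=m["speed"],
    )


if __name__ == "__main__":
    sample = '[4] "H-005442b100004900"[1](5442b100004901) # "MT23108 InfiniHost Mellanox Technologies" lid 12 4xSDR'
    print(parse_link(sample))
```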
79. 64 up to 1024.
Note: This registry value is not exposed via the UI.
QOS (default: 1): Enables NDIS Quality of Service (QoS). The valid values are: 1 - enable; 0 - disable.
Note: This keyword is only valid for ConnectX-3 when using Windows Server 2012 and above.
RxIntModerationProfile (default: 1): Enables the assignment of different interrupt moderation profiles for receive completions. Interrupt moderation can have a great effect on optimizing network throughput and CPU utilization. The valid values are:
0 - Low Latency: Implies a higher rate of interrupts to achieve better latency, or to handle scenarios where only a small number of streams are used.
1 - Moderate: Interrupt moderation is set to midrange defaults to allow maximum throughput at minimum CPU utilization for common scenarios.
2 - Aggressive: Interrupt moderation is set to maximal values to allow maximum throughput at minimum CPU utilization for more intensive, multi-stream scenarios.
TxIntModerationProfile (default: 1): Enables the assignment of different interrupt moderation profiles for send completions. Interrupt moderation can have a great effect on optimizing network throughput and CPU utilization. The valid values are:
0 - Low Latency: Implies a higher rate of interrupts to achieve better latency, or to handle scenarios where only a small number of streams are used.
80. 96 entries).
-s, --silent: Do not print progress indication.
--mem-mode <size>: When specified, packets are written to file only after the capture is stopped. It is faster than the default mode (less chance of packet loss) but takes more memory. In this mode, ibdump stops after <size> bytes are captured.
--decap: Decapsulate port mirroring headers. Should be used when capturing RSPAN traffic.
-h, --help: Display this help screen.
-V, --version: Print version information.
14.3.6 smpquery
Provides a basic subset of standard SMP queries to query subnet management attributes such as node info, node description, switch info, and port info.
14.3.6.1 smpquery Applicable Hardware
All InfiniBand devices.
14.3.6.2 smpquery Synopsis
smpquery [-h] [-d] [-e] [-c] [-v] [-D] [-G] [-s <smlid>] [-L] [-u] [-V] [-C ca_name] [-P ca_port] [-t timeout_ms] [--node-name-map <node-name-map>] <op> <dest dr_path|lid|guid> [op params]
14.3.6.3 smpquery Options
The table below lists the various flags of the command.
Table 19 smpquery Flags and Options
-h, --help: Print the help menu.
-d, --debug: Raise the IB debug level. May be used several times for higher debug levels (-ddd or -d -d -d).
-e, --errors: Show send and receive errors (timeouts and others).
-v, --verbose: Increase verbosity level. May be used sever
81. B protocol is used on top of the TCP/IP protocol or other network protocols. Using the SMB protocol allows applications to access files or other resources on a remote server, and to read, create, and update them. In addition, it enables communication with any server program that is set up to receive an SMB client request.
10.2 Hardware and Software Prerequisites
The following are hardware and software prerequisites:
• Two or more machines running Windows Server 2012 and above
• One or more Mellanox ConnectX-2, ConnectX-3, or ConnectX-3 Pro adapters for each server
• One or more Mellanox InfiniBand switches
• Two or more QSFP cables required for InfiniBand
10.3 SMB Configuration Verification
10.3.1 Verifying Network Adapter Configuration
Use the following PowerShell cmdlets to verify that Network Direct is globally enabled and that you have NICs with the RDMA capability.
• Run on both the SMB server and the SMB client:
Get-NetOffloadGlobalSetting | Select NetworkDirect
Get-NetAdapterRDMA
Get-NetAdapterHardwareInfo
10.3.2 Verifying SMB Configuration
Use the following PowerShell cmdlets to verify that SMB Multichannel is enabled, that the adapters are recognized by SMB, and that their RDMA capability is properly identified.
• On the SMB client, run the following PowerShell cmdlets:
Get-SmbClientConfiguration | Select EnableMultichannel
Get-SmbClientNetworkInterface
• On the SMB server, run the following PowerShell cmdlets:
Get-SmbS
82. Counters.
-P <<PM>=<Trash>>: If any of the provided PM counters is greater than its provided value, print it to screen.
-lw <1x|4x|12x>: Specifies the expected link width.
-ls <2.5|5|10>: Specifies the expected link speed.
-skip <skip-option(s)>: Skip the execution of the selected checks. Skip options (one or more can be specified): dup_guids, zero_guids, pm, logical_state, part, ipoib, all.
14.3.2.3 ibdiagnet Output Files
Table 15 ibdiagnet Output Files
ibdiagnet.log: A dump of all the application reports generated according to the provided flags.
ibdiagnet.lst: List of all the nodes, ports and links in the fabric.
ibdiagnet.fdbs: A dump of the unicast forwarding tables of the fabric switches.
ibdiagnet.mcfdbs: A dump of the multicast forwarding tables of the fabric switches.
ibdiagnet.masks: In case of duplicate port/node GUIDs, this file includes the map between masked GUIDs and real GUIDs.
ibdiagnet.sm: List of all the SMs, with their state and priority, in the fabric.
ibdiagnet.pm: A dump of the PM counter values of the fabric links.
ibdiagnet.pkey: A dump of the existing partitions and their member host ports.
ibdiagnet.mcg: A dump of the multicast groups, their properties, and member host ports.
ibdiagnet.db: A dump of the internal subnet database. This file can be loaded in later runs using the load_d
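The expected-width, expected-speed and skip flags above can be assembled programmatically, for example from a test harness. The Python sketch below only builds the argument list and does not run ibdiagnet; the helper name is hypothetical, and passing several skip options as one space-separated value after -skip is an assumption, so confirm the exact form with `ibdiagnet -h` on your build.

```python
from typing import Iterable, List, Optional

LINK_WIDTHS = {"1x", "4x", "12x"}
LINK_SPEEDS = {"2.5", "5", "10"}
SKIP_OPTIONS = {"dup_guids", "zero_guids", "pm", "logical_state", "part", "ipoib", "all"}


def ibdiagnet_args(width: Optional[str] = None,
                   speed: Optional[str] = None,
                   skip: Iterable[str] = ()) -> List[str]:
    """Build an ibdiagnet argv list, validating values against the documented sets."""
    argv = ["ibdiagnet"]
    if width is not None:
        if width not in LINK_WIDTHS:
            raise ValueError(f"unsupported link width: {width}")
        argv += ["-lw", width]
    if speed is not None:
        if speed not in LINK_SPEEDS:
            raise ValueError(f"unsupported link speed: {speed}")
        argv += ["-ls", speed]
    skip_list = list(skip)
    unknown = sorted(set(skip_list) - SKIP_OPTIONS)
    if unknown:
        raise ValueError(f"unknown skip option(s): {', '.join(unknown)}")
    if skip_list:
        # Assumption: several checks are passed as one space-separated value.
        argv += ["-skip", " ".join(skip_list)]
    return argv


if __name__ == "__main__":
    print(ibdiagnet_args(width="4x", speed="10", skip=["pm", "ipoib"]))
```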
83. D 5001
B.2 Adding NVGRE Configuration to Host 15 Example
The following is an example of adding NVGRE to Host 15.
• On both sides, create the vSwitch. Note that the vSwitch configuration is persistent, so there is no need to configure it after each reboot:
New-VMSwitch VSwMLNX -NetAdapterName "Port1" -AllowManagementOS $true
• Shut down the VMs:
Stop-VM -Name mtlae15-005 -Force -Confirm
Stop-VM -Name mtlae15-006 -Force -Confirm
• Connect the VMs to the vSwitch (you may have to switch off the VM first; doing this manually also works):
Connect-VMNetworkAdapter -VMName mtlae14-005 -SwitchName VSwMLNX
Add-VMNetworkAdapter -VMName mtlae15-005 -SwitchName VSwMLNX -StaticMacAddress 00155D730100
Add-VMNetworkAdapter -VMName mtlae15-006 -SwitchName VSwMLNX -StaticMacAddress 00155D730101
Note: The commands from Steps 2-4 are not persistent. It is suggested to create a script that runs after each OS reboot.
Step 2. Configure a Subnet Locator and Route records on each Hyper-V host (Host 1 and Host 2), mtlae14 and mtlae15:
New-NetVirtualizationLookupRecord -CustomerAddress 172.16.14.5 -ProviderAddress 192.168.20.114 -VirtualSubnetID 5001 -MACAddress 00155D720100 -Rule TranslationMethodEncap
New-NetVirtualizationLookupRecord -CustomerAddress 172.16.14.6 -ProviderAddress 192.168.20.114 -VirtualSubnetID 5001 -MACAddress 00155D720101 -Rule TranslationMethodEncap
New-NetVirtualizationLook
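Because the Step 2-4 commands are not persistent, one option is to generate them into a script that runs after each reboot. The Python sketch below renders New-NetVirtualizationLookupRecord lines from a list of records, using the parameter names and values from the example above. The `LookupRecord` type and the helper names are hypothetical, and the generated script still has to be registered to run at startup by whatever mechanism you normally use.

```python
import re
from typing import Iterable, NamedTuple

MAC_RE = re.compile(r"[0-9A-Fa-f]{12}")


class LookupRecord(NamedTuple):
    customer_address: str   # CA: the VM's address inside the virtual subnet
    provider_address: str   # PA: the Hyper-V host's physical address
    virtual_subnet_id: int
    mac_address: str        # 12 hex digits, no separators, as in the example


def lookup_record_command(rec: LookupRecord) -> str:
    """Render one PowerShell command line for a lookup record."""
    if not MAC_RE.fullmatch(rec.mac_address):
        raise ValueError(f"bad MAC address: {rec.mac_address}")
    return ("New-NetVirtualizationLookupRecord"
            f" -CustomerAddress {rec.customer_address}"
            f" -ProviderAddress {rec.provider_address}"
            f" -VirtualSubnetID {rec.virtual_subnet_id}"
            f" -MACAddress {rec.mac_address}"
            " -Rule TranslationMethodEncap")


def render_script(records: Iterable[LookupRecord]) -> str:
    """One command per line, suitable for saving as a .ps1 file."""
    return "".join(lookup_record_command(r) + "\n" for r in records)


if __name__ == "__main__":
    host_records = [
        LookupRecord("172.16.14.5", "192.168.20.114", 5001, "00155D720100"),
        LookupRecord("172.16.14.6", "192.168.20.114", 5001, "00155D720101"),
    ]
    print(render_script(host_records), end="")
```

Saving the output to a file (for example a hypothetical nvgre-lookup.ps1) keeps the records in one place, so they can be re-applied after every reboot instead of being retyped.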
84. Data Split
The header-data split feature improves network performance by splitting the headers and data in received Ethernet frames into separate buffers. The feature is disabled by default and can be enabled in the Advanced tab (Performance Options) of the Properties window.
For further information, please refer to the MSDN library: http://msdn.microsoft.com/en-us/library/windows/hardware/ff553723(v=vs.85).aspx
8.3 Receive Side Scaling (RSS)
Mellanox WinOF Rev 4.70 IPoIB and Ethernet drivers use the new NDIS 6.30 RSS capabilities. The main changes are:
• Removed the previous limitation of 64 CPU cores
• Individual network adapter RSS configuration usage
RSS capabilities can be set per individual adapter as well as globally. To do so, set the registry keys listed below.
For instructions on how to find the interface index <nn> in the registry, please refer to Section C.2, "Finding the Index Value of the Network Interface", on page 177.
Table 6 Registry Keys Setting
HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\<nn>\MaxRSSProcessors: Maximum number of CPUs allotted. Sets the desired maximum number of processors for each interface. The number can be different for each interface.
Note: Restart the network adapter after you change this registry key.
HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1: Base CPU number. Sets the desired base CPU num
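To locate the key for a given interface, the <nn> index is combined with the network adapter class GUID shown above. The Python sketch below builds that path and a matching `reg add` command line without touching the registry. It assumes <nn> is written as a four-digit, zero-padded index (for example 0007) and that the value is stored as REG_SZ, as is typical for NDIS keywords; both are assumptions to check against Section C.2 and your system. The helper names are hypothetical.

```python
from typing import List

NET_CLASS_GUID = "{4d36e972-e325-11ce-bfc1-08002be10318}"  # network adapter class


def adapter_key_path(index: int) -> str:
    """Registry path of one adapter instance, assuming a 4-digit index."""
    if not 0 <= index <= 9999:
        raise ValueError(f"index out of range: {index}")
    parts = ["HKLM", "SYSTEM", "CurrentControlSet", "Control", "Class",
             NET_CLASS_GUID, f"{index:04d}"]
    return "\\".join(parts)


def reg_add_command(index: int, value_name: str, data: str) -> List[str]:
    """reg.exe syntax: reg add <KeyName> /v <ValueName> /t <Type> /d <Data> /f"""
    return ["reg", "add", adapter_key_path(index),
            "/v", value_name, "/t", "REG_SZ", "/d", data, "/f"]


if __name__ == "__main__":
    # Hypothetical: limit interface 0007 to 4 RSS processors.
    print(" ".join(reg_add_command(7, "MaxRSSProcessors", "4")))
```

As the note above says, restart the network adapter after changing the value.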
85. InfiniBand, Ethernet and all RoCE versions traffic that flows to and from the ports of Mellanox ConnectX-3/ConnectX-3 Pro NICs. It provides functionality similar to the tcpdump tool on a standard Ethernet port.
The ibdump tool generates a packet dump file in pcap format. This file can be loaded by the Wireshark tool (www.wireshark.org) for graphical traffic analysis. This makes it possible to analyze network behavior and performance, and to debug applications that send or receive RDMA network traffic.
Run "ibdump -h" to display a help message which details the tool's options.
14.3.5.1 ibdump Synopsis
ibdump
14.3.5.2 ibdump Options
The table below lists the various flags of the command.
Table 18 ibdump Flags and Options
-d, --ib-dev=<dev>: Use RDMA device <dev> (default: first device found). The relevant devices can be listed by running the ibv_devinfo command.
-i, --ib-port=<port>: Use port <port> of the IB device (default: 1).
-w, --write=<file>: Dump file name (default: sniffer.pcap). "-" stands for stdout and enables piping to tcpdump or tshark.
-o, --output=<file>: Alias for the -w option. Do not use; kept for backward compatibility.
-b, --max-burst=<log2 burst>: log2 of the maximal burst size that can be captured with no packet loss. Each entry takes MTU bytes of memory (default: 12 (40
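The -b option trades memory for loss-free capture: each of the 2^b entries takes MTU bytes. As a worked example, the Python sketch below computes that buffer size; the 4096-byte MTU is an assumption for illustration (use your port's actual MTU), and the figure ignores any per-entry bookkeeping overhead.

```python
def burst_buffer_bytes(log2_burst: int = 12, mtu: int = 4096) -> int:
    """Memory needed for 2**log2_burst capture entries of `mtu` bytes each."""
    if log2_burst < 0 or mtu <= 0:
        raise ValueError("log2_burst must be >= 0 and mtu > 0")
    return (1 << log2_burst) * mtu


if __name__ == "__main__":
    for b in (10, 12, 14):
        size = burst_buffer_bytes(b)
        print(f"-b {b}: {1 << b} entries, {size / 2**20:.0f} MiB")
```

With the default of 12 and a 4096-byte MTU, this comes to 4096 entries and 16 MiB.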
86. (Screenshot: Event Viewer, Event 50: "MLX4_SF_131_0_0 SRIOV was successfully enabled. Running in master mode.")
8.10.5 Configuring Virtual Machine Networking
To configure Virtual Machine networking:
Step 1. Create an SR-IOV-enabled Virtual Switch over the Mellanox Ethernet Adapter. Go to Hyper-V Manager > Actions > Virtual Switch > External > Create Virtual Switch > Apply.
Figure 14: Virtual Switch with SR-IOV
(Screenshot: a new external virtual switch named "Mellanox SRIOV Virtual Switch" connected to the Mellanox ConnectX-3 Ethernet Adapter, with "Allow management operating system to share this network adapter" and "Enable single-root I/O virtualization (SR-IOV)" selected.)
87. (Screenshot: Device Manager, System devices, listing "Mellanox ConnectX-3 VPI (MT04099) Network Adapter".)
5 Uninstalling Mellanox WinOF Driver
5.1 Attended Uninstall
To uninstall MLNX_WinOF on a single node:
1. Click Start > Control Panel > Programs and Features > MLNX_VPI > Uninstall.
NOTE: This requires elevated administrator privileges (see Section 1.1, "Hardware and Software Requirements", on page 16 for details).
2. Double-click the .exe and follow the instructions of the install wizard.
3. Click Start > All Programs > Mellanox Technologies > MLNX_WinOF > Uninstall MLNX_WinOF.
5.2 Unattended Uninstall
To uninstall MLNX_WinOF in unattended mode:
Step 1. Open a CMD console (Windows S
88. Mellanox Technologies
Mellanox WinOF VPI User Manual
Rev 4.70
www.mellanox.com
NOTE:
THIS HARDWARE, SOFTWARE OR TEST SUITE PRODUCT ("PRODUCT(S)") AND ITS RELATED DOCUMENTATION ARE PROVIDED BY MELLANOX TECHNOLOGIES "AS-IS" WITH ALL FAULTS OF ANY KIND AND SOLELY FOR THE PURPOSE OF AIDING THE CUSTOMER IN TESTING APPLICATIONS THAT USE THE PRODUCTS IN DESIGNATED SOLUTIONS. THE CUSTOMER'S MANUFACTURING TEST ENVIRONMENT HAS NOT MET THE STANDARDS SET BY MELLANOX TECHNOLOGIES TO FULLY QUALIFY THE PRODUCT(S) AND/OR THE SYSTEM USING IT. THEREFORE, MELLANOX TECHNOLOGIES CANNOT AND DOES NOT GUARANTEE OR WARRANT THAT THE PRODUCTS WILL OPERATE WITH THE HIGHEST QUALITY. ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT ARE DISCLAIMED. IN NO EVENT SHALL MELLANOX BE LIABLE TO CUSTOMER OR ANY THIRD PARTIES FOR ANY DIRECT, INDIRECT, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES OF ANY KIND (INCLUDING, BUT NOT LIMITED TO, PAYMENT FOR PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY FROM THE USE OF THE PRODUCT(S) AND RELATED DOCUMENTATION EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Mellanox Technologies
Mellanox Tec
89. Options
The table below lists the various flags of the command.
Table 27 vstat Flags and Options
-V: Verbose mode.
-C: HCA error/statistic counters.
-m: More verbose mode.
-pN: Repeat every N seconds.
14.3.15 osmtest
osmtest is a test program to validate the InfiniBand subnet manager and administration (SM/SA). The default is to run all flows with the exception of the QoS flow. osmtest provides a test suite for opensm.
osmtest has the following capabilities and testing flows:
• It creates an inventory file of all available nodes, ports and PathRecords, including all their fields.
• It verifies the existing inventory, with all the object fields, and matches it to a pre-saved one.
• A Multicast Compliancy test
• An Event Forwarding test
• A Service Record registration test
• An RMPP stress test
• A Small SA Queries stress test
It is recommended that, after installing opensm, the user run "osmtest -f c" to generate the inventory file, and immediately afterwards run "osmtest -f a" to test OpenSM.
Additionally, it is recommended to create the inventory when the IB fabric is stable, and to occasionally run "osmtest -v" to verify that nothing has changed.
14.3.15.1 osmtest Synopsis
osmtest [-f(low) <c|a|v|s|e|f|m|q|t>] [-w(ait) <trap_wait_time>] [-d(ebug) <number>] [-m(ax_lid) <LID in hex>] [-g(uid)[=]<GUID in hex>] [-p(ort)] [-i(n
90. Pause (default: 0): When Per Priority Rx Pause is configured, the receiving adapter generates a flow control frame when its priority receive queue reaches a pre-defined limit. The flow control frame is sent to the sending adapter.
Notes: This registry value is not exposed via the UI. RxPause and PerPriRxPause are mutually exclusive, i.e., at most one of them can be set.
PerPriTxPause (default: 0): When Per Priority TX Pause is configured, the sending adapter pauses the transmission of a specific priority if it receives a flow control frame from a link partner.
Notes: This registry value is not exposed via the UI. TxPause and PerPriTxPause are mutually exclusive, i.e., at most one of them can be set.
C.6.2 VMQ Options
This section describes the registry keys that are used to control the NDIS Virtual Machine Queue (VMQ). VMQ supports Microsoft Hyper-V network performance and is supported on Windows Server 2008 R2 and above. For more details about VMQ, please refer to the Microsoft web site: http://msdn.microsoft.com/en-us/library/windows/hardware/ff571034(v=vs.85).aspx
VMQ (default: 1): The support for the virtual machine queue (VMQ) features of the network adapter. The valid values are: 1 - enable; 0 - disable.
RssOrVmqPreference (default: 0): Specifies whether VMQ capabilities should
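Since RxPause/PerPriRxPause and TxPause/PerPriTxPause are mutually exclusive, a configuration script may want to reject conflicting settings before applying them. The Python sketch below is a hypothetical check over a dictionary of value names. It treats a missing key as 0 and any nonzero value as "set"; both are assumptions rather than documented driver behavior.

```python
from typing import Dict, List

# Pairs of registry values of which at most one may be set.
EXCLUSIVE_PAIRS = (("RxPause", "PerPriRxPause"), ("TxPause", "PerPriTxPause"))


def pause_conflicts(settings: Dict[str, int]) -> List[str]:
    """Return one message per mutually exclusive pair that is both set."""
    conflicts = []
    for global_key, per_prio_key in EXCLUSIVE_PAIRS:
        # Assumption: missing means 0, and any nonzero value means "set".
        if settings.get(global_key, 0) and settings.get(per_prio_key, 0):
            conflicts.append(f"{global_key} and {per_prio_key} are mutually exclusive")
    return conflicts


if __name__ == "__main__":
    print(pause_conflicts({"RxPause": 1, "PerPriRxPause": 8, "TxPause": 1}))
```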
91. (Screenshot: Device Manager and the Mellanox ConnectX-3 Ethernet Adapter Properties window, Advanced tab.)
8.6 Ports TX Arbitration
On a setup with a dual-port NIC with both ports at a link speed of 40GbE, each individual port can achieve maximum line rate. When both ports run simultaneously in a high-throughput scenario, the total throughput is bottlenecked by the PCIe bus, and in this case each port may not achieve its maximum of 40GbE.
Ports TX Arbitration ensures bandwidth precedence is giv
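A rough calculation shows why both ports cannot run at 40GbE at once. The Python sketch below assumes the adapter sits in a PCIe Gen3 x8 slot (8 GT/s per lane with 128b/130b encoding); that slot width is an assumption for illustration. It also ignores TLP and DLLP protocol overhead, so real usable bandwidth is lower still.

```python
def pcie_gen3_raw_gbps(lanes: int = 8) -> float:
    """Per-direction line rate after 128b/130b encoding for a PCIe Gen3 link, in Gb/s."""
    gt_per_lane = 8.0  # PCIe Gen3 signaling rate per lane, in GT/s
    return gt_per_lane * lanes * 128 / 130


if __name__ == "__main__":
    pcie = pcie_gen3_raw_gbps(8)
    demand = 2 * 40.0  # two 40GbE ports transmitting at line rate
    print(f"PCIe Gen3 x8: ~{pcie:.1f} Gb/s per direction, dual-port demand: {demand:.0f} Gb/s")
```

Roughly 63 Gb/s per direction is below the 80 Gb/s that two saturated 40GbE ports would need, which is the situation Ports TX Arbitration addresses.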
92. Remote access errors: Number of remote access errors when the local machine receives inbound traffic (i.e., the local machine received an RDMA request with a wrong rkey).
Requester RNR NAK: Number of RNR (Receiver Not Ready) NAKs received when the local machine generates outbound traffic.
Responder RNR NAK: Number of RNR (Receiver Not Ready) NAKs sent when the local machine receives inbound traffic.
Requester out-of-order sequence NAK: Number of Out-of-Sequence NAKs received when the local machine generates outbound traffic (i.e., the number of times the local machine received NAKs indicating OOS on the receiving side).
Responder out-of-order sequence received: Number of Out-of-Sequence packets received when the local machine receives inbound traffic (i.e., the number of times the local machine received messages that are not consecutive).
Requester resync: Number of resync operations when the local machine generates outbound traffic.
Responder resync: Number of resync operations when the local machine receives inbound traffic.
Requester Remote operation errors: Number of remote operation errors when the local machine generates outbound traffic (i.e., a NAK was received indicating that the other end encountered an error that prevented it from completing the request).
Requester transport retries exceeded errors: Number of transport retries exceeded errors when the local machine generates outbound traffic.
Re
93. SCSI client.
2. Press F12 when asked to proceed to iSCSI boot.
(Screenshot: the client's network-boot console output; the client contacts server 11.0.0.83, downloads WDSNBP over TFTP, and prompts "Press F12 for network service boot".)
3. Choose the relevant boot image from the list of all available boot images presented.
(Screenshot: Windows Boot Manager, Server IP 11.0.0.83, listing the available "Microsoft Windows Setup 2012 x64" and "Microsoft Windows PE x64 2012" boot images for 4.60RC10 Eth, IB and VPI.)
4. Choose the Operating System you wish to install.
94. WinOF web page at http://www.mellanox.com > Products > InfiniBand/VPI Drivers > Windows SW/Drivers.
Step 3. Download the .exe image according to the architecture of your machine (see Step 1) and the operating system. The name of the .exe is in the following format: MLNX_VPI_WinOF-<version>_All_<OS>_<arch>.exe.
Installing the incorrect .exe file is prohibited. If you do so, an error message will be displayed. For example, if you try to install a 64-bit .exe on a 32-bit machine, the wizard will display the following (or a similar) error message: "This installation package is not supported by this processor type. Contact your product vendor."
3 Extracting Files Without Running Installation
To extract the files without running installation, perform the following steps:
Step 1. Open a CMD console:
• Windows Server 2008 R2: Click Start > Run and enter CMD.
• Windows Server 2012/2012 R2: Click Start > Task Manager > File > Run new task, and enter CMD.
Step 2. Extract the driver and the tools:
MLNX_VPI_WinOF-<version>_All_<OS>_<arch>.exe /a
To extract only the driver files:
MLNX_VPI_WinOF-<version>_All_<OS>_<arch>.exe /a /vMT_DRIVERS_ONLY=1
Step 3. Click Next to create a server image.
(Screenshot: "Welcome to the InstallShield Wizard for MLNX_VPI. The InstallShield(R) Wizard will install MLNX_VPI on your c")
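The package naming format and the extraction switches above can be combined as follows. This Python sketch only builds the file name and the command line; the version, OS and architecture tokens used in the example (4_70, win2012, x64) are hypothetical placeholders, so copy the real values from the download page.

```python
from typing import List


def package_name(version: str, os_tag: str, arch: str) -> str:
    """Build a name of the form MLNX_VPI_WinOF-<version>_All_<OS>_<arch>.exe."""
    return f"MLNX_VPI_WinOF-{version}_All_{os_tag}_{arch}.exe"


def extract_command(package: str, drivers_only: bool = False) -> List[str]:
    """/a extracts the driver and tools; /vMT_DRIVERS_ONLY=1 limits it to drivers."""
    cmd = [package, "/a"]
    if drivers_only:
        cmd.append("/vMT_DRIVERS_ONLY=1")
    return cmd


if __name__ == "__main__":
    pkg = package_name("4_70", "win2012", "x64")   # hypothetical tokens
    print(" ".join(extract_command(pkg, drivers_only=True)))
```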
95. _name h syntax
-d: Raises the IB debugging level. Can be used several times (-ddd or -d -d -d).
-e: Shows send and receive errors (timeouts and others).
-h: Shows the usage message.
-v: Increases the application verbosity level. Can be used several times (-vv or -v -v -v).
-V: Shows the version info.
Addressing Flags:
-D: Uses directed path address arguments. The path is a comma-separated list of out ports. Examples: "0" (self port), "0,1,2,1,4" (out via port 1, then 2, ...).
-G: Uses GUID address argument. In most cases, it is the Port GUID. Example: "0x08f1040023".
-s <smlid>: Uses <smlid> as the target LID for SM/SA queries.
Other Common Flags:
-C <ca_name>: Uses the specified ca_name.
-P <ca_port>: Uses the specified ca_port.
-t <timeout_ms>: Overrides the default timeout for the solicited MADs.
14.3.16.3 Multiple CA/Multiple Port Support
When no IB device or port is specified, the port to use is selected by the following criteria:
1. The first port that is ACTIVE.
2. If not found, the first port that is UP (physical link up).
If a port and/or CA name is specified, the tool attempts to fulfill the user request and fails if it is not possible.
Examples:
ibaddr                   # local port's address
ibaddr 32                # show lid range and gid of lid 32
ibaddr -G 0x8f1040023    # same but using guid address
ibaddr -l 32             # show lid range only
ibaddr -L 32             # show
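The default port selection rule above can be expressed as a short function. The Python sketch below models each port with a logical state and a physical-link flag; the `Port` type and CA names are hypothetical. How the real tools rank several matches when only a CA name is given is not stated here, so the sketch applies the same ACTIVE-then-UP preference to the matches and otherwise returns the first one.

```python
from typing import List, NamedTuple, Optional


class Port(NamedTuple):
    ca: str          # CA name, e.g. "mlx4_0" (hypothetical)
    num: int         # port number on that CA
    state: str       # logical state: "ACTIVE", "INIT", "DOWN", ...
    phys_up: bool    # True when the physical link is up


def _preferred(ports: List[Port]) -> Optional[Port]:
    # 1. The first port that is ACTIVE.
    for p in ports:
        if p.state == "ACTIVE":
            return p
    # 2. If not found, the first port that is UP (physical link up).
    for p in ports:
        if p.phys_up:
            return p
    return None


def select_port(ports: List[Port], ca: Optional[str] = None,
                port: Optional[int] = None) -> Port:
    if ca is None and port is None:
        chosen = _preferred(ports)
        if chosen is None:
            raise LookupError("no ACTIVE or physically UP port found")
        return chosen
    matches = [p for p in ports
               if (ca is None or p.ca == ca) and (port is None or p.num == port)]
    if not matches:
        # An explicit request that cannot be fulfilled fails.
        raise LookupError(f"requested port not found: ca={ca!r} port={port!r}")
    # Hypothetical tie-break among explicit matches: ACTIVE, then UP, else first.
    return _preferred(matches) or matches[0]
```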
96. _port>: Uses the specified ca_port.
-t <timeout_ms>: Overrides the default timeout for the solicited MADs.
14.3.20.3 Multiple CA/Multiple Port Support
When no IB device or port is specified, the port to use is selected by the following criteria:
1. The first port that is ACTIVE.
2. If not found, the first port that is UP (physical link up).
If a port and/or CA name is specified, the tool attempts to fulfill the user request and fails if it is not possible.
14.3.21 saquery
saquery issues the selected SA query. Node records are queried by default.
14.3.21.1 saquery Synopsis
saquery [-h] [-d] [-p] [-N] [--list | -D] [-S] [-I] [-L] [-l] [-G] [-O] [-U] [-c] [-s] [-g] [-m] [-x] [-C ca_name] [-P ca_port] [--smkey val] [-t(imeout) <msec>] [--src-to-dst <src:dst>] [--sgid-to-dgid <sgid-dgid>] [--node-name-map <node-name-map>] [<name> | <lid> | <guid>]
14.3.21.2 saquery Options
Table 34 saquery Flags and Options
-p: Gets PathRecord info.
-N: Gets NodeRecord info.
--list, -D: Gets NodeDescriptions of CAs only.
-S: Gets ServiceRecord info.
-I: Gets InformInfoRecord (subscription) info.
-L: Returns the LIDs of the name specified.
-l: Returns the unique LID of the name specified.
-G: Returns the GUIDs of the name specified.
-O: Returns the name for the
97. Revision:........................0x000000a0
LocalPort:.......................1
VendorId:........................0x0002c9
14.3.7 perfquery
Queries the performance and error counters of InfiniBand ports. Optionally, it displays aggregated counters for all ports of a node. It can also reset counters after reading them, or simply reset them.
14.3.7.1 perfquery Applicable Hardware
All InfiniBand devices.
14.3.7.2 perfquery Synopsis
perfquery [-h] [-d] [-G] [--xmtsl, -X] [--xmtdisc, -D] [--rcvsl, -S] [--rcverr, -E] [--smplctl, -c] … [-v] [-u] [-l] [-r] [-C ca_name] [-P ca_port] [-R] [-t timeout_ms] [-V] [<lid|guid> [[port] [reset_mask]]]
The table below lists the various flags of the command.
Table 20 perfquery Flags and Options
--help, -h: Print the help menu.
--debug, -d: Raise the IB debug level. May be used several times for higher debug levels (-ddd or -d -d -d).
--Guid, -G: Use GUID address argument. In most cases, it is the Port GUID. Example: "0x08f1040023".
--xmtsl, -X: Show Xmt SL port counters.
--rcvsl, -S: Show Rcv SL port counters.
--xmtdisc, -D: Show Xmt Discard Details.
--rcverr, -E: Show Rcv Error Details.
--smplctl, -c: Show samples control.
--all_ports, -a: Apply query to all ports.
--Lid, -L: Use LID address argument.
--sm_port, -s <lid>: SM port LID.
--errors: Show send
98. age 27
• Section 8.8.4, "Removing NVGRE configuration", on page 56
• Section 8.10, "SR-IOV", on page 59
• Section 8.11, "Virtual Ethernet Adapter", on page 73
• Section 8.12, "IPoIB SR-IOV over KVM", on page 75
• Section 8.13, "Lossless TCP", on page 75
• Section 9, "Booting Windows from an iSCSI Target", on page 78
• Section 15.4, "General Troubleshooting", on page 168
• Section C, "Registry Keys", on page 176
Removed the following sections:
• Documentation
Rev 4.60, February 13, 2014
Updated the following sections:
• Section 8.1, "Hyper-V with VMQ", on page 37
• Section 8.8.1, "Enabling/Disabling NVGRE Offloading", on page 54
Added the following sections:
• Section 8.8.3, "Verifying the Encapsulation of the Traffic", on page 56
• Section 8.11, "Virtual Ethernet Adapter", on page 73
Table 1 Revision History
Document Revision | Date | Changes
December 30, 2013
Updated the following sections:
• Section 8.7.2.2, "Configuring Windows Host", on page 51: Updated the example in Step 5
• Section 11.1.4.1, "Performance Tuning Tool Application", on page 86: Updated the Options table
• Section 11.2, "Application Specific Optimization and Tuning", on page 90: Removed the Bus-master DMA Operations
• Section 12, "OpenSM Subnet Manager", on page 99: Added an option of how to register OpenSM via PowerShell
• Section 8.8.2, "Configuring the NVGRE us
99. al times for additional verbosity (-vvv or -v -v -v).
-D, --Direct: Use directed path address arguments. The path is a comma-separated list of out ports. Examples: "0" (self port), "0,1,2,1,4" (out via port 1, then 2, ...).
-G, --Guid: Use GUID address argument. In most cases, it is the Port GUID. Example: "0x08f1040023".
-s, --sm_port <smlid>: Use <smlid> as the target LID for SM/SA queries.
-V, --version: Show version info.
-L, --Lid: Use LID address argument.
-c, --combined: Use combined route address argument.
-u, --usage: Usage message.
-C, --Ca <ca_name>: Use the specified channel adapter or router.
-P, --Port <ca_port>: Use the specified port.
-t, --timeout <timeout_ms>: Override the default timeout for the solicited MADs (msec).
op: Supported operations:
• NodeInfo (NI) <addr>
• NodeDesc (ND) <addr>
• PortInfo (PI) <addr> <portnum>
• SwitchInfo (SI) <addr>
• PKeyTable (PKeys) <addr> <portnum>
• SL2VLTable (SL2VL) <addr> <portnum>
• VLArbitration (VLArb) <addr> <portnum>
• GUIDInfo (GI) <addr>
Table 19 smpquery Flags and Options
<dest dr_path|lid|guid>: Destination's directed path, LID, or GUID.
--node-name-map <file>: Node name map file.
-x, --extended: Use extended speeds
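To catch typos before calling smpquery from a script, the supported operations can be kept in a table. The Python sketch below mirrors the operation list above (names, short aliases, and whether a port number follows the address). The helper is hypothetical: it emits the operation name as spelled in the table, and treating <portnum> as optional is an assumption.

```python
from typing import List, Optional, Union

# Operation name -> (short alias, whether a <portnum> may follow <addr>)
SMPQUERY_OPS = {
    "NodeInfo": ("NI", False),
    "NodeDesc": ("ND", False),
    "PortInfo": ("PI", True),
    "SwitchInfo": ("SI", False),
    "PKeyTable": ("PKeys", True),
    "SL2VLTable": ("SL2VL", True),
    "VLArbitration": ("VLArb", True),
    "GUIDInfo": ("GI", False),
}


def resolve_op(op: str) -> str:
    """Map a name or alias (case-insensitive) to the name used in the table."""
    for name, (alias, _) in SMPQUERY_OPS.items():
        if op.lower() in (name.lower(), alias.lower()):
            return name
    raise ValueError(f"unsupported smpquery operation: {op}")


def smpquery_args(op: str, addr: Union[int, str], portnum: Optional[int] = None,
                  guid: bool = False) -> List[str]:
    name = resolve_op(op)
    _, takes_port = SMPQUERY_OPS[name]
    argv = ["smpquery"]
    if guid:
        argv.append("-G")  # <addr> is a GUID rather than a LID
    argv += [name, str(addr)]
    if portnum is not None:
        if not takes_port:
            raise ValueError(f"{name} does not take a port number")
        argv.append(str(portnum))
    return argv


if __name__ == "__main__":
    print(smpquery_args("PI", 3, 1))
```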
100. alled in the guest operating system.
Step 3. Start and connect to the Virtual Machine: select the newly created Virtual Machine and go to the Actions panel > Connect. In the virtual machine window, go to Actions > Start.
Step 4. Assign an IP address to the Mellanox VMNIC:
1. To go to Network Connections, enter the following command in the command prompt: ncpa.cpl
2. Right-click the Hyper-V adapter and choose Properties.
3. Mark the "Use the following IP address" checkbox.
4. Enter the IP address.
Step 5. Copy the WinOF driver package to the VM using the Mellanox VMNIC IP address.
Step 6. Install the WinOF driver package on the VM.
Step 7. Reboot the VM at the end of installation.
Step 8. Enable SR-IOV for the Mellanox VMNIC:
1. Open the VM Settings Wizard.
2. Right-click the Network Adapter and choose Hardware Acceleration Settings.
Figure 16: Enable SR-IOV on VMNIC
(Screenshot: VM Settings, Network Adapter > Hardware Acceleration page.)
101. anced version of ib_write_lat. It contains more flags and features than the older version, as well as improved algorithms. ibv_write_lat calculates the latency of an RDMA write operation of a given message size between a pair of machines. One machine acts as a server and the other as a client. They perform a ping-pong benchmark in which one side RDMA-writes to the other side's memory only after the other side has written to its memory. Each side samples the CPU clock each time it writes to the other side's memory, in order to calculate the latency.
14.4.12.1 ibv_write_lat Synopsis
ibv_write_lat [-i|--ib-port <ib_port>] [-s|--size <message_size>] [-t|--tx-depth <tx_size>] [-u|--qp-timeout] [-S|--sl <sl_type>] [-n|--iters <iteration_num>] [-V|--version] [-C|--report-cycles] [-U|--report-unsorted] [-c|--connection <RC/UC/UD>] [-m|--mtu <mtu_size>] [-I|--inline_size <inline>] [-x|--gid-index] [-p|--port <PDT_port>] [-a|--all] [-H|--report-histogram] [-d|--ib-dev <ib_device_name>]
14.4.12.2 ibv_write_lat Options
The table below lists the various flags of the command.
Table 47 ibv_write_lat Flags and Options
-p, --port=<port>  Listens on/connects to port <port> (default 18515)
-d, --ib-dev=<dev>  Uses IB device <device guid> (default: first device found)
-i, --ib-port=<port>  Uses port <port> of the IB device (default 1)
-m, --mtu=<mtu>  The MTU size (default 1024)
-c, --connection <
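The following minimal Python sketch turns CPU-cycle samples from a ping-pong latency test into latencies. It assumes that one sample is taken each time a side issues its write. Under that assumption, two consecutive samples bracket a full round trip, and the one-way latency is estimated as half of it. The function name, the 2000 MHz clock and the cycle values are hypothetical.

```python
from statistics import median

def one_way_latencies_usec(cycle_stamps: list[int], cpu_mhz: float) -> list[float]:
    """Turn per-iteration CPU-cycle timestamps into one-way latencies in usec.

    Assumption: one timestamp is taken each time this side issues its RDMA
    write, so two consecutive stamps bracket one full ping-pong round trip.
    The one-way latency is estimated as half of that round trip.
    """
    if len(cycle_stamps) < 2:
        raise ValueError("need at least two timestamps")
    if cpu_mhz <= 0:
        raise ValueError("cpu_mhz must be positive")
    # cycles / (cycles per usec) = usec; halve the round trip for one way.
    return [(end - start) / cpu_mhz / 2
            for start, end in zip(cycle_stamps, cycle_stamps[1:])]

# Hypothetical cycle counts sampled on a 2000 MHz CPU.
lat = one_way_latencies_usec([0, 6000, 12400, 18200], cpu_mhz=2000.0)
print("min", min(lat), "typical (median)", median(lat), "max", max(lat))
```

Reporting the median as the "typical" value keeps a single outlier iteration from skewing the result.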
102. and module is loaded.
-r, --rx-depth=<dep>  Makes the RX queue bigger than the TX queue (default 600)
-I, --inline_size=<size>  The maximum size of a message to be sent in inline mode (default 0)
-N, --no_peak-bw  Cancels the peak BW calculation (default: with peak BW)
-g, --mcg=<num_of_qps>  Sends messages to a multicast group with <num_of_qps> QPs attached to it
-M, --MGID=<multicast_gid>  In multicast mode, uses <multicast_gid> as the group MGID. The format must be '255:1:X:X:X:X:X:X:X:X:X:X:X:X:X:X', where X is a value within [0, 255]
-R, --rdma_cm  Connects QPs with rdma_cm and runs the test on those QPs
-Z, --com_rdma_cm  Communicates with the rdma_cm module to exchange data; uses regular QPs
-Q, --cq-mod  Generates a CQE only after <cq-mod> completions
14.4.10 ibv_send_lat
This is a more advanced version of ib_send_lat. It contains more flags and features than the older version, as well as improved algorithms. ibv_send_lat calculates the latency of sending a packet of a given message size between a pair of machines. One machine acts as a server and the other as a client. They perform a ping-pong benchmark in which each side sends a packet only after it has received one. Each side samples the CPU clock each time it receives a send packet, in order to calculate the latency.
14.4.10.1 ibv_send_lat Synopsis
ibv_send_lat [-i|--ib-port <ib_port>] [-m|--mtu <mtu_size>] [-s|--size <message_size>] [-I|--inli
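The MGID format required by -M can be checked before a run. The Python sketch below follows the stated rule: 16 colon-separated decimal fields, starting with 255:1, each within [0, 255]. The function name parse_mgid is hypothetical. The example group value is made up.

```python
def parse_mgid(mgid: str) -> bytes:
    """Validate an MGID written as 16 colon-separated decimal bytes.

    Follows the format stated for -M/--MGID: '255:1:X:...:X', i.e. the first
    two bytes are 255 and 1 and each of the remaining 14 values is in [0, 255].
    """
    parts = mgid.split(":")
    if len(parts) != 16:
        raise ValueError(f"expected 16 fields, got {len(parts)}")
    values = [int(p) for p in parts]
    if values[0] != 255 or values[1] != 1:
        raise ValueError("MGID must start with 255:1")
    if any(not 0 <= v <= 255 for v in values):
        raise ValueError("every field must be within [0, 255]")
    return bytes(values)

# Hypothetical group: 255:1 followed by thirteen zeros and a final 7.
gid = parse_mgid("255:1:" + ":".join(["0"] * 13 + ["7"]))
print(gid.hex())
```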
103. andatory for RoCE.
14.4.7 ibv_read_bw
This is a more advanced version of ib_read_bw. It contains more flags and features than the older version, as well as improved algorithms. ibv_read_bw calculates the BW of RDMA read between a pair of machines. One machine acts as a server and the other as a client. The client RDMA-reads the server memory and calculates the BW by sampling the CPU each time it receives a successful completion. The test supports a large variety of features, as described below, and performs better than ib_read_bw on Nehalem systems. Read is available only in RC connection mode, as specified in the InfiniBand spec.
14.4.7.1 ibv_read_bw Synopsis
ibv_read_bw [-i|--ib-port <ib_port>] [-d|--ib-dev <ib_device>] [-o|--outs <outstanding_reads>] [-m|--mtu <mtu_size>] [-s|--size <message_size>] [-t|--tx-depth <tx_size>] [-n|--iters <iteration_num>] [-p|--port <PDT_port>] [-u|--qp-timeout <timeout>] [-S|--sl <sl_type>] [-x|--gid-index <gid_index>] [-e|--events] [-F|--CPU-freq] [-b|--bidirectional] [-a|--all] [-V|--version]
14.4.7.2 ibv_read_bw Options
The table below lists the various flags of the command.
Table 42 ibv_read_bw Flags and Options
-p, --port=<port>  Listens on/connects to port <port> (default 18515)
-d, --ib-dev=<dev>  Uses IB device <device guid> (default: first device found)
-i, --ib-port=<port>  Uses port <port> of the IB device (default 1)
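The BW measurement described for ibv_read_bw (sampling the CPU clock at each successful completion) can be sketched in Python. The sketch assumes that every completion moved one message of msg_size bytes. It reports MB/sec with 1 MB = 10^6 bytes. The unit choice, the function name and the sample values are assumptions and do not describe the tool's exact arithmetic.

```python
def bandwidth_mb_per_sec(msg_size: int, completion_cycles: list[int],
                         start_cycle: int, cpu_mhz: float) -> tuple[float, float]:
    """Estimate average and peak bandwidth (MB/sec, 1 MB = 10**6 bytes).

    Assumptions for this sketch: completion_cycles holds the CPU cycle count
    sampled at each successful completion, every completion moved msg_size
    bytes, and start_cycle was sampled when the first read was posted.
    """
    if not completion_cycles:
        raise ValueError("no completions")
    stamps = [start_cycle, *completion_cycles]
    if any(b <= a for a, b in zip(stamps, stamps[1:])):
        raise ValueError("cycle samples must be strictly increasing")
    total_usec = (stamps[-1] - stamps[0]) / cpu_mhz
    # Bytes per microsecond is numerically equal to MB/sec (decimal MB).
    average = msg_size * len(completion_cycles) / total_usec
    # Peak: the best single interval between consecutive samples.
    peak = max(msg_size * cpu_mhz / (b - a) for a, b in zip(stamps, stamps[1:]))
    return average, peak

avg, peak = bandwidth_mb_per_sec(65536, [32768, 98304, 163840], 0, 1000.0)
print(f"average {avg:.1f} MB/sec, peak {peak:.1f} MB/sec")
```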
104. ange of MTU size, TX size, number of iterations, message size and more. Using -a provides results for all message sizes.
14.4.3.1 ib_send_bw Synopsis
ib_send_bw [-i|--ib-port <ib_port>] [-c|--connection <RC/UC/UD>] [-m|--mtu <mtu_size>] [-s|--size <message_size>] [-t|--tx-depth <tx_size>] [-n|--iters <iteration_num>] [-p|--port <PDT_port>] [-b|--bidirectional] [-a|--all] [-V|--version]
14.4.3.2 ib_send_bw Options
The table below lists the various flags of the command.
Table 38 ib_send_bw Flags and Options
-p, --port=<port>  Listens on/connects to port <port> (default 18515)
-d, --ib-dev=<dev>  Uses IB device <device guid> (default: first device found)
-i, --ib-port=<port>  Uses port <port> of the IB device (default 1)
-m, --mtu=<mtu>  The MTU size (default 1024)
-c, --connection=<RC/UC/UD>  Connection type RC/UC/UD (default RC)
-s, --size=<size>  The size of the message to exchange (default 65536)
-a, --all  Runs sizes from 2 till 2^23
-t, --tx-depth=<dep>  The size of the TX queue (default 100)
-n, --iters=<iters>  The number of exchanges (at least 2, default 1000)
-b, --bidirectional  Measures bidirectional bandwidth (default unidirectional)
-V, --version  Displays the version number
-g, --grh  Uses GRH with packets (mandatory for RoCE)
14.4.4 ib_send_lat
ib_send_lat calculates the latency of sendin
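As an illustration of the -a sweep, the sizes from 2 to 2^23 can be enumerated in Python. The sketch assumes the sweep steps through powers of two. The text states only the end points, and the helper name all_sizes is hypothetical.

```python
def all_sizes(max_exp: int = 23) -> list[int]:
    """Message sizes assumed for -a/--all: powers of two from 2 up to 2**max_exp."""
    return [1 << exp for exp in range(1, max_exp + 1)]

sizes = all_sizes()
print(len(sizes), "sizes from", sizes[0], "to", sizes[-1], "bytes")
```

Under this assumption, a full sweep runs 23 message sizes. The largest one is 8 MB (8388608 bytes).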
105. anox Technologies
Lid    Out   Destination
       Port  Info
0x0003 021 : (Switch portguid 0x000b8cffff004016: 'MT47396 Infiniscale-III Mellanox Technologies')
0x0006 007 : (Channel Adapter portguid 0x0002c90300001039: 'sw137 HCA-1')
0x0007 021 : (Channel Adapter portguid 0x0002c9020025874a: 'sw157 HCA-1')
3 valid lids dumped
3. Dump all LIDs with valid out ports of the switch with portguid 0x000b8cffff004016:
> ibroute -G 0x000b8cffff004016
Unicast lids [0x0-0x8] of switch Lid 3 guid 0x000b8cffff004016 (MT47396 Infiniscale-III Mellanox Technologies):
Lid    Out   Destination
       Port  Info
0x0002 023 : (Switch portguid 0x0002c902fffff00a: 'MT47396 Infiniscale-III Mellanox Technologies')
0x0003 000 : (Switch portguid 0x000b8cffff004016: 'MT47396 Infiniscale-III Mellanox Technologies')
0x0006 023 : (Channel Adapter portguid 0x0002c90300001039: 'sw137 HCA-1')
0x0007 020 : (Channel Adapter portguid 0x0002c9020025874a: 'sw157 HCA-1')
0x0008 024 : (Channel Adapter portguid 0x0002c902002582cd: 'sw136 HCA-1')
5 valid lids dumped
4. Dump all non-empty MLIDs of the switch with LID 3:
> ibroute -M 3
Multicast mlids [0xc000-0xc3ff] of switch Lid 3 guid 0x000b8cffff004016 (MT47396 Infiniscale-III Mellanox Technologies):
MLid
0xc000
0xc001
0xc002
0xc003
0xc020
0xc021
0xc022
0xc023
0xc024
0xc040
0xc041
0xc042
12 valid mlids dumped
14.3.5 ibdump
The ibdump tool dumps
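To post-process a dump of unicast LIDs, the entries can be parsed with a regular expression. This Python sketch assumes the entry layout "LID port : (kind portguid GUID: 'description')" shown in the ibroute examples. If a given ibroute build prints a different layout, the pattern would need adjusting. parse_unicast is a hypothetical helper.

```python
import re

# Matches unicast entries in the assumed ibroute layout, e.g.
# 0x0006 023 : (Channel Adapter portguid 0x0002c90300001039: 'sw137 HCA-1')
ENTRY = re.compile(
    r"^(?P<lid>0x[0-9a-fA-F]+)\s+(?P<port>\d+)\s*:\s*"
    r"\((?P<kind>Switch|Channel Adapter)\s+portguid\s+(?P<guid>0x[0-9a-fA-F]+):\s*"
    r"'(?P<desc>[^']*)'\)"
)

def parse_unicast(lines):
    """Return {lid: (out_port, kind, guid, desc)} for every matching line."""
    table = {}
    for line in lines:
        m = ENTRY.match(line.strip())
        if m:  # headers and the "N valid lids dumped" summary are skipped
            table[int(m["lid"], 16)] = (int(m["port"]), m["kind"], m["guid"], m["desc"])
    return table

sample = [
    "0x0003 000 : (Switch portguid 0x000b8cffff004016: 'MT47396 Infiniscale-III Mellanox Technologies')",
    "0x0006 023 : (Channel Adapter portguid 0x0002c90300001039: 'sw137 HCA-1')",
    "5 valid lids dumped",
]
routes = parse_unicast(sample)
print(routes[6])
```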
106. asic information obtained from the local IB driver. The output includes LID, SMLID, port state, link width active, and port physical state.
14.3.13.1 ibstat Synopsis
ibstat [-d(ebug) -l(ist_of_cas) -s(hort) -p(ort_list) -V(ersion)] [-h] <ca_name> [portnum]
14.3.13.2 ibstat Options
The table below lists the various flags of the command. Most OpenIB diagnostics take the following common flags. The exact list of supported flags per utility can be found in the usage message and can be shown using the <util_name> -h syntax.
Table 26 ibstat Flags and Options
-l, --list_of_cas  Lists all IB devices
-s, --short  Short output
-p, --port_list  Shows the port list
ca_name  InfiniBand device name
portnum  Port number of the InfiniBand device
--debug, -d/-ddd/-d -d -d  Raises the IB debugging level
--help, -h  Shows the usage message
--verbose, -v/-vv/-v -v -v  Increases the application verbosity level
--version, -V  Shows the version info
--usage, -u  Usage message
Examples:
ibstat  # display status of all ports on all IB devices
ibstat -l  # list all IB devices
ibstat -p  # show port guids
ibstat mthca0 2  # show status of port 2 of 'mthca0'
14.3.14 vstat
vstat is a binary which displays information on the HCA attributes.
14.3.14.1 vstat Synopsis
vstat [-v] [-c] [-m] [-p] [-N]
107. ation by modifying some of the Windows registries, as explained in Section 11.1.1 "Registry Tuning" on page 84, or can be set manually after installation.
> To improve the network adapter performance, activate the performance tuning tool as follows:
Step 1. Start the Device Manager (open a command line window and enter: devmgmt.msc).
Step 2. Open Network Adapters.
Step 3. Select the Mellanox IPoIB adapter, right-click it and select Properties.
Step 4. Select the Performance tab.
Step 5. Choose one of the tuning scenarios:
• Single port traffic: Improves performance for running single port traffic each time.
• Dual port traffic: Improves performance for running traffic on both ports simultaneously.
• Forwarding traffic: Improves performance for running scenarios that involve both ports (for example, via IXIA).
• Multicast traffic: Improves performance when the main traffic runs on multicast.
Step 6. Click the Run Tuning button.
Clicking the Run Tuning button changes several registry entries (described below) and checks for system services that may decrease network performance. It also generates a log of the applied changes, which users can view to restore the previous values. The log path is:
%HOMEDRIVE%\Windows\System32\LogFiles\PerformanceTunning.log
This tuning is required to be performed only once after the installation is completed, and on one adapter only a
108. b /affinity 0x1 nd_read_bw -s1048576 -D10 -S 11.137.53.1
Client side: start /b /wait /affinity 0x1 nd_read_bw -s1048576 -D10 -C 11.137.53.1
14.4.15.2 nd_read_bw Options
The table below lists the various flags of the command.
Table 50 nd_read_bw Options
-h  Shows the Help screen
-V  Shows the version number
-p  Connects to the port <port> (default 6830)
-s <msg_size>  Exchanges messages of the given size (default 65536B); must not be combined with the -a flag
-a  Runs all message sizes from 1B to 8MB; must not be combined with the -s flag
-n <num_of_iterations>  The number of exchanges (at least 2; default 100000)
-I <max_inline_size>  The maximum size of a message to send inline (default 128B)
-D <test_duration_in_seconds>  Test duration in seconds
-f <margin_time_in_seconds>  The margin time to avoid calculation; must be less than half of the duration time
-Q <CQ_moderation_value>  CQ moderation (default 100)
-S <server_interface_IP>  Server side only; must be the last parameter
-C <server_interface_IP>  Client side only; must be the last parameter
14.4.16 nd_read_lat
This test is used for performance measuring of RDMA Read requests in Microsoft Windows operating systems. nd_read_lat is performance-orie
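The option rules in the nd_read_bw table can be expressed as a small pre-flight check. In this Python sketch, -s and -a are mutually exclusive and -n must be at least 2. The class and field names are hypothetical. The list of -a sizes assumes powers of two between the documented end points of 1B and 8MB.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NdBwOptions:
    """Subset of the documented nd_read_bw options (field names are hypothetical)."""
    msg_size: Optional[int] = None   # -s; 65536 bytes when unset
    run_all: bool = False            # -a; sweeps 1B..8MB
    iterations: int = 100000         # -n; at least 2

    def validate(self) -> "NdBwOptions":
        if self.msg_size is not None and self.run_all:
            raise ValueError("-s and -a must not be combined")
        if self.iterations < 2:
            raise ValueError("-n must be at least 2")
        return self

    def sizes(self) -> list[int]:
        """Message sizes the run would cover (powers of two assumed for -a)."""
        if self.run_all:
            return [1 << exp for exp in range(24)]  # 1B .. 8MB
        return [self.msg_size if self.msg_size is not None else 65536]

print(NdBwOptions().validate().sizes())
print(len(NdBwOptions(run_all=True).validate().sizes()), "sizes with -a")
```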
109. b option. In addition to generating the files above, the discovery phase also checks for duplicate node/port GUIDs in the IB fabric. If such an error is detected, it is displayed on the standard output. After the discovery phase is completed, directed route packets are sent multiple times (according to the -c option) to detect possible problematic paths on which packets may be lost. Such paths are explored, and a report of the suspected bad links is displayed on the standard output.
After scanning the fabric, if the -r option is provided, a full report of the fabric qualities is displayed. This report includes:
• SM report
• Number of nodes and systems
• Hop count information: maximal hop count, an example path, and a hop count histogram
• All CA-to-CA paths traced
• Credit loop report
• mgid-mlid-HCAs multicast group and report
• Partitions report
• IPoIB report
If the IB fabric includes only one CA, CA-to-CA paths are not reported. Furthermore, if a topology file is provided, ibdiagnet uses the names defined in it for the output reports.
14.3.2.4 ibdiagnet Error Codes
1 - Failed to fully discover the fabric
2 - Failed to parse command line options
3 - Failed to interact with IB fabric
4 - Failed to use local device or local port
5 - Failed to use Topology File
6 - Failed to load required Package
14.3.3 ibportstate
Enables querying the logical (link) and ph
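The hop count part of the -r report (maximal hop count, an example path, and a histogram) can be sketched in Python. The input format, a mapping from a "src->dst" label to the list of traversed nodes, is an assumption made for illustration. It is not ibdiagnet's internal data model.

```python
from collections import Counter

def hop_report(paths: dict[str, list[str]]):
    """Summarize hop counts: (max_hops, an example path with that count, histogram).

    `paths` maps a hypothetical "src->dst" label to the list of nodes traversed,
    source and destination included, so hops = len(nodes) - 1.
    """
    hops = {label: len(nodes) - 1 for label, nodes in paths.items()}
    histogram = Counter(hops.values())
    max_hops = max(histogram)
    example = next(label for label, h in hops.items() if h == max_hops)
    return max_hops, example, dict(sorted(histogram.items()))

paths = {
    "ca1->ca2": ["ca1", "sw1", "ca2"],
    "ca1->ca3": ["ca1", "sw1", "sw2", "ca3"],
    "ca2->ca3": ["ca2", "sw1", "sw2", "ca3"],
}
print(hop_report(paths))
```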
110. be enabled instead of receive side scaling (RSS) capabilities. The valid values are:
• 0: Report RSS capabilities
• 1: Report VMQ capabilities
Note: This registry value is not exposed via the UI.
VMQLookaheadSplit (default 1)  Specifies whether the driver enables or disables the ability to split the receive buffers into lookahead and post-lookahead buffers. The valid values are:
• 0: disable
• 1: enable
VMQVlanFiltering (default 1)  Specifies whether the device enables or disables the ability to filter network packets by using the VLAN identifier in the media access control (MAC) header. The valid values are:
• 0: disable
• 1: enable
MaxNumVmqs (default 127)  The number of VMQs that the device supports in parallel. This parameter can affect the memory consumption of the interface, since for each VMQ the driver creates a separate receive ring and allocates buffers for it. To minimize memory consumption, one can reduce the number of VMs that use VMQ in parallel; however, this can affect performance. The valid values are 1 up to 127.
Note: This registry value is not exposed via the UI.
MaxNumMacAddrFilters (default 127)  The number of different MAC addresses that the physical port supports. This registry key affects the number of supported MAC addresses that is reported to the OS. The valid values are 1 up to 127.
Note: This registry value is not exposed via the UI.
MaxNumVlanFilters (default 127)  The number of VLANs that are support
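Before writing any of these values, a script can check them against the documented ranges. The Python sketch below encodes only the ranges stated in the table. The dictionary and helper name are hypothetical. Keys whose ranges are not shown here are deliberately left out.

```python
# Valid ranges as documented for these registry values (name -> (min, max)).
VALID_RANGES = {
    "VMQLookaheadSplit": (0, 1),
    "VMQVlanFiltering": (0, 1),
    "MaxNumVmqs": (1, 127),
    "MaxNumMacAddrFilters": (1, 127),
}

def check_registry_value(name: str, value: int) -> int:
    """Return value unchanged, or raise if it is outside the documented range."""
    try:
        low, high = VALID_RANGES[name]
    except KeyError:
        raise KeyError(f"no documented range for {name!r}") from None
    if not low <= value <= high:
        raise ValueError(f"{name} must be within [{low}, {high}], got {value}")
    return value

print(check_registry_value("MaxNumVmqs", 64))
```

Lowering MaxNumVmqs this way trades VMQ parallelism for a smaller memory footprint, as described for that key.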
111. ber for each interface (registry key: HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\<nn>\RssBaseProcNumber). The number can be different for each interface. This allows partitioning of CPUs across network adapters.
Note: Restart the network adapter when you change this registry key.
HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\<nn>\NumaNodeID: NUMA node affinitization.
HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\<nn>\RssBaseProcGroup: Sets the RSS base processor group for systems with more than 64 processors.
8.4 Port Configuration
8.4.1 Auto Sensing
Auto Sensing enables the NIC to automatically sense the link type (InfiniBand or Ethernet) based on the cable connected to the port, and to load the appropriate driver stack (InfiniBand or Ethernet). Auto Sensing is performed only when rebooting the machine or after disabling/enabling the mlx4_bus interface from the Device Manager. Hence, if you replace cables during runtime, the NIC will not perform Auto Sensing.
For further information on how to configure it, please refer to Section 8.4.2 "Port Protocol Configuration" on page 39.
8.4.2 Port Protocol Configuration
Step 1. Display the Device Manager and expand System devices.
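Partitioning CPUs across adapters through RssBaseProcNumber amounts to giving each adapter its own non-overlapping block of processors. This Python sketch computes such base numbers. The helper name, the adapter names and the equal block size per adapter are assumptions for illustration.

```python
def partition_rss(adapters: list[str], total_cpus: int, procs_per_adapter: int) -> dict[str, int]:
    """Assign a distinct RssBaseProcNumber to each adapter (hypothetical helper).

    Each adapter gets a contiguous, non-overlapping block of procs_per_adapter
    processors starting at CPU 0, so the RSS traffic of different adapters
    lands on different cores.
    """
    needed = len(adapters) * procs_per_adapter
    if needed > total_cpus:
        raise ValueError(f"need {needed} CPUs but only {total_cpus} are available")
    return {name: i * procs_per_adapter for i, name in enumerate(adapters)}

print(partition_rss(["Ethernet 1", "Ethernet 2"], total_cpus=16, procs_per_adapter=8))
```

With 16 logical processors and 8 per adapter, the first adapter starts at processor 0 and the second at processor 8. The sketch does not account for hyper-threaded sibling cores, which a real deployment may want to skip.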
112. Chapter 13 Software Development Kit (SDK) 100
Chapter 14 InfiniBand Fabric Utilities 101
14.1 Network Direct Interface 101
14.2 part_man - Virtual IPoIB Port Creation Utility 101
14.3 InfiniBand Fabric Diagnostic Utilities 101
14.3.1 Utilities Usage 101
14.3.2 ibdiagnet 103
14.3.3 ibportstate 106
14.3.4 ibroute 109
14.3.5 ibdump 111
14.3.6 smpquery 112
14.3.7 perfquery 116
14.3.8 ibping 119
14.3.9 ibnetdiscover 120
14.3.10 ibtracert 124
14.3.11 sminfo 125
14.3.12 ibclearerrors 127
14.3.13 ibstat 127
14.3.14 vstat 128
14.3.15 osmtest 128
14.3.16 ibaddr
113. ce of the software and hardware of VPI (InfiniBand, Ethernet) adapter cards. It is also intended for application developers.
Documentation Conventions
Table 2 Documentation Conventions
File names: file.extension
Directory names: directory
Commands and their parameters: command param1 (example: mts3610-1 > show hosts)
Required item: < >
Optional item: [ ]
Mutually exclusive parameters: { p1, p2, p3 } or {p1 | p2 | p3}
Optional mutually exclusive parameters: [ p1 | p2 | p3 ]
Variables for which users supply specific values: Italic font (example: enable)
Emphasized words: Italic font (example: These are emphasized words)
Note: <text> (example: This is a note.)
Warning: <text> (example: May result in system instability.)
Common Abbreviations and Acronyms
Table 3 Abbreviations and Acronyms
B: (Capital) 'B' is used to indicate size in bytes or multiples of bytes (e.g., 1KB = 1024 bytes, and 1MB = 1048576 bytes)
b: (Small) 'b' is used to indicate size in bits or multiples of bits (e.g., 1Kb = 1024 bits)
FW: Firmware
HCA: Host Channel Adapter
HW: Hardware
IB: InfiniBand
LSB: Least significant byte
lsb: Least significant bit
MSB: Most significant byte
msb: Most significant bit
NIC: Network Interface Card
SW: Sof
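The size convention (capital B for bytes, small b for bits, and binary multiples such as 1KB = 1024 bytes) can be captured in a short Python parser. parse_size is a hypothetical helper. It covers only the K and M prefixes used in the table.

```python
import re

# Binary multipliers, following the convention that 1KB = 1024 bytes.
_MULT = {"": 1, "K": 1024, "M": 1024 ** 2}

def parse_size(text: str) -> tuple[int, str]:
    """Parse '1KB' or '1Kb' into (count, unit): capital B = bytes, small b = bits."""
    m = re.fullmatch(r"(\d+)([KM]?)([Bb])", text.strip())
    if not m:
        raise ValueError(f"unrecognized size: {text!r}")
    number, prefix, unit = m.groups()
    return int(number) * _MULT[prefix], "bytes" if unit == "B" else "bits"

for example in ("1KB", "1MB", "1Kb"):
    print(example, "->", parse_size(example))
```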
114. configured on the physical VEA (see Section 8.11.2 on page 74).
The user can manage VEAs using the vea_man tool. The vea_man set of commands allows you to add or remove a VEA, or to query the existing Mellanox Ethernet adapters and see which are virtual and which are physical.
8.11.1 System Requirements
• Operating System: Windows 2012 and Windows 2012 R2
• Firmware version: 2.31.5050
8.11.2 VEA Feature Limitations
• RoCE (RDMA) is supported only on the physical VEA.
• MTU (JumboFrame registry key), QoS and Flow Control are only configured from the physical VEA.
• There is no bandwidth allocation between the two interfaces; both interfaces share the same link speed.
• SR-IOV and VEA are not supported simultaneously. Only one of the features can be used at any given time.
8.11.3 Adding a New Virtual Adapter
> To add a new virtual adapter, run the following command:
vea_man -a <adapter_name>
where <adapter_name> is the name of the existing physical adapter, which will essentially be cloned. The new adapter will be named according to the system's default naming rules.
8.11.4 Removing a Virtual Ethernet Adapter
> To remove a virtual Ethernet adapter, run the following command:
vea_man -r <adapter_name>
8.11.5 Querying the Virtual Ethernet Database
Querying the virtual Ethernet database reports all physical and virtual Ethernet adapters on all Mellanox cards in the system.
To query the virtual ethernet
115. d
0x0058  <Device Name>: Dropless mode exited on port X. Drop mode entered; packets may now be dropped.
0x0059  <Device Name>: Delay drop timeout occurred on port X. Drop mode entered; packets may now be dropped.
9 Booting Windows from an iSCSI Target
9.1 Configuring the WDS, DHCP and iSCSI Servers
9.1.1 Configuring the WDS Server
> To configure the WDS server:
1. Install the WDS server.
2. Extract the Mellanox drivers to a local directory using the /a parameter.
For boot over Ethernet, when using adapter cards with a firmware version older than 2.30.8000, you need to extract the PXE package; otherwise, use the Mellanox WinOF VPI package.
Example:
Mellanox.msi.exe /a
3. Add the Mellanox driver to boot.wim:
dism /Mount-Wim /WimFile:boot.wim /index:2 /MountDir:mnt
dism /Image:mnt /Add-Driver /Driver:drivers /recurse
dism /Unmount-Wim /MountDir:mnt /commit
4. Add the Mellanox driver to install.wim:
dism /Mount-Wim /WimFile:install.wim /index:4 /MountDir:mnt
dism /Image:mnt /Add-Driver /Driver:drivers /recurse
dism /Unmount-Wim /MountDir:mnt /commit
5. Add the new boot and install images to WDS.
For additional details on WDS, please refer to:
http://technet.microsoft.com/en-us/library/jj648426.aspx
9.1.2 Configuring iSCSI Target
> To configure iSCSI Target:
1. Install iSCSI Target (e.g., StarWind).
2. Add to the iSCSI target initiators the IP addresses of
116. d RssOrVmqPreference, where the former is controlled by PowerShell and the latter is controlled by the virtual switch.
For further information on these registry keys, please refer to:
http://msdn.microsoft.com/en-us/library/windows/hardware/hh451362(v=vs.85).aspx
Issue 3: Installation/Setup fails when the Remote Desktop Host service is installed, due to a known issue in Windows when using the chain MSI feature.
Suggestion: Disable the service before the installation and enable it at the end.
15.5 Installation Error Codes and Troubleshooting
15.5.1 Setup Return Codes
Table 55 Setup Return Codes
1603  Fatal error during installation.  Troubleshooting: Contact support.
1633  The installation package is not supported on this platform.  Troubleshooting: Make sure you are installing the right package for your platform.
For additional details on Windows Installer return codes, please refer to:
http://support.microsoft.com/kb/229683
15.5.2 Firmware Burning Warning Codes
Table 56 Firmware Burning Warning Codes
1004  Failed to open the device.  Troubleshooting: Contact support.
1005  Could not find an image for at least one device.  Troubleshooting: The firmware for your device was not found. Please try to manually burn the firmware.
1006  Found one device that has multiple images.  Troubleshooting: Burn the firmware manually and select the image you wa
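When installation is scripted, a lookup table can translate a returned code into the documented advice. The Python sketch below copies the entries from Tables 55 and 56. It leaves out 1006, whose advice is truncated here. The function name explain and the dictionary layout are illustrative only.

```python
# Codes and advice from Tables 55 and 56; the layout is illustrative.
SETUP_CODES = {
    1603: ("Fatal error during installation", "Contact support"),
    1633: ("The installation package is not supported on this platform",
           "Make sure you are installing the right package for your platform"),
}
FW_BURN_WARNINGS = {
    1004: ("Failed to open the device", "Contact support"),
    1005: ("Could not find an image for at least one device",
           "The firmware for your device was not found; try to manually burn the firmware"),
}

def explain(code: int) -> str:
    """Return a one-line description and suggested action for a known code."""
    for table in (SETUP_CODES, FW_BURN_WARNINGS):
        if code in table:
            description, action = table[code]
            return f"{code}: {description}. Suggested action: {action}."
    return f"{code}: unknown code; see the Windows Installer return code reference"

print(explain(1633))
```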
117. d -d -d
-e  Shows send and receive errors (timeouts and others)
-h  Shows the usage message
-v  Increases the application verbosity level. Can be used several times (-vv or -v -v -v)
-V  Shows the version info
Addressing Flags
-D  Uses directed path address arguments. The path is a comma-separated list of out ports. Examples: "0" (self port), "0,1,2,1,4" (out via port 1, then 2, ...)
-G  Uses GUID address argument. In most cases, it is the Port GUID. Example: "0x08f1040023"
-s <smlid>  Uses <smlid> as the target LID for SM/SA queries
-C <ca_name>  Uses the specified ca_name
-P <ca_port>  Uses the specified ca_port
-t <timeout_ms>  Overrides the default timeout for the solicited MADs
14.3.22.3 Multiple CA/Multiple Port Support
When no IB device or port is specified, the port to use is selected by the following criteria:
1. The first port that is ACTIVE.
2. If not found, the first port that is UP (physical link up).
If a port and/or CA name is specified, the tool attempts to fulfill the user request and fails if it is not possible.
Examples:
Direct Routed Examples:
smpdump -D 0,1,2,3,5 16  # NODE DESC
smpdump -D 0,1,2 0x15 2  # PORT INFO, port 2
LID Routed Examples:
smpdump 3 0x15 2  # PORT INFO, lid 3 port 2
smpdump 0xa0 0x11  # NODE INFO, lid 0xa0
14.4 InfiniBand Fabric Performance Utiliti
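The default port selection rule (first ACTIVE port, otherwise the first port whose physical link is up) is simple enough to sketch in Python. The Port record, its field names and the device names such as mlx4_0 are hypothetical stand-ins for whatever port information a caller has.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Port:
    ca: str          # hypothetical device name, e.g. "mlx4_0"
    port: int
    state: str       # logical state, e.g. "ACTIVE", "INIT", "DOWN"
    phys_up: bool    # True when the physical link is up

def select_default_port(ports: list[Port]) -> Optional[Port]:
    """First ACTIVE port; otherwise the first port with physical link up; else None."""
    for p in ports:
        if p.state == "ACTIVE":
            return p
    for p in ports:
        if p.phys_up:
            return p
    return None

ports = [Port("mlx4_0", 1, "DOWN", False), Port("mlx4_0", 2, "INIT", True),
         Port("mlx4_1", 1, "ACTIVE", True)]
print(select_default_port(ports))
```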
118. d Options
--force, -f  Force
--no_info, -n  Simple format; do not show additional information
--mlid, -m <mlid>  Shows the multicast trace of the specified mlid
--node-name-map <node-name-map>  Specifies a node name map. The node name map file maps GUIDs to more user-friendly names. See "Topology File Format" on page 122.
--debug, -d/-ddd/-d -d -d  Raises the IB debugging level
Table 23 ibtracert Flags and Options
--Lid, -L  Uses LID address argument
--errors, -e  Shows send and receive errors
--usage, -u  Usage message
--Guid, -G  Uses GUID address argument. In most cases, it is the Port GUID. Example: "0x08f1040023"
--sm_port, -s <smlid>  Uses <smlid> as the target LID for SM/SA queries
--help, -h  Shows the usage message
--verbose, -v/-vv/-v -v -v  Increases the application verbosity level
--version, -V  Shows the version info
--Ca, -C <ca_name>  Uses the specified ca_name
--Port, -P <ca_port>  Uses the specified ca_port
--timeout, -t <timeout_ms>  Overrides the default timeout for the solicited MADs
Examples:
Unicast examples:
ibtracert 4 16  # show path between lids 4 and 16
ibtracert -n 4 16  # same, but using simple output format
ibtracert -G 0x8f1040396522d 0x002c9000100d051  # use guid addresses
Multicast example:
ibt
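To show what a node name map does (mapping GUIDs to friendlier names), here is a minimal Python loader. The file format it accepts is an assumption: one entry per line, a GUID followed by a double-quoted name, with '#' starting a comment. Check the "Topology File Format" section for the authoritative syntax.

```python
import shlex

def load_node_name_map(text: str) -> dict[int, str]:
    """Parse node-name-map text into {guid: name} (assumed format, see lead-in)."""
    names = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        parts = shlex.split(line)             # keeps the quoted name together
        if len(parts) < 2:
            raise ValueError(f"missing name in line: {raw!r}")
        names[int(parts[0], 16)] = parts[1]
    return names

sample = '# hypothetical map\n0x0002c90300001039 "sw137 HCA-1"\n0x0002c9020025874a "sw157 HCA-1"\n'
print(load_node_name_map(sample))
```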
119. Step 7. Read, then accept the license agreement, and click Next.
Step 8. Select the target folder for the installation (default: C:\Program Files\Mellanox\MLNX_VPI), and click Next.
Step 9. The firmware upgrade screen will be displayed in the following cases:
• If the user has an OEM card. In this case, the firmware will not be updated.
• If the user has a standard Mellanox card with an older firmware version. In this case, the firmware will be updated accordingly. However, if the user has both an OEM card and a Mellanox card, only the Mellanox card will be
120. d RoCEv2 packets use a well-known UDP destination port value that unequivocally distinguishes the datagram. Similar to other protocols that use UDP encapsulation, the UDP source port field is used to carry an opaque flow identifier that allows network devices to implement packet forwarding optimizations (e.g., ECMP) while staying agnostic to the specifics of the protocol header format.
Furthermore, since this change exclusively affects the packet format on the wire, and since with RDMA semantics packets are generated and consumed below the API, applications can seamlessly operate over any form of RDMA service, including the routable version of RoCE, as shown in Figure 2, in a completely transparent way.
Figure 4: RoCEv2 Protocol Stack (InfiniBand, RoCE and RoCEv2 stacks side by side)
8.7.2 RoCE Configuration
In order to function reliably, RoCE requires a form of flow control. While it is possible to use global flow control, this is normally undesirable for performance reasons. The normal and optimal way to use RoCE is to use Priority Flow Control (PFC). To use PFC, it must be enabled on all endpoints and switches in the flow path.
In the following section we present in
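The following Python sketch illustrates the UDP port scheme. The destination port is the fixed IANA-assigned RoCEv2 port, 4791. The source port is derived from a hash of the flow's identifiers, so every packet of a flow carries the same value and ECMP keeps the flow on one path, while different flows tend to spread across paths. The CRC32 hash, the chosen inputs and the 49152-65535 range are assumptions for illustration. Adapters use their own algorithms.

```python
import zlib

ROCEV2_UDP_DPORT = 4791  # IANA-assigned RoCEv2 destination port

def rocev2_udp_ports(src_qpn: int, dst_qpn: int, src_ip: str, dst_ip: str) -> tuple[int, int]:
    """Return (source_port, destination_port) for a RoCEv2 flow.

    Hypothetical scheme: hash the flow identifiers and fold 14 bits of the
    result into the 49152-65535 range, so the same flow always maps to the
    same source port while different flows usually do not.
    """
    key = f"{src_ip}|{dst_ip}|{src_qpn}|{dst_qpn}".encode()
    entropy = zlib.crc32(key) & 0x3FFF   # 14 bits of flow entropy
    return 0xC000 | entropy, ROCEV2_UDP_DPORT

print(rocev2_udp_ports(10, 20, "10.0.0.1", "10.0.0.2"))
```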
121. d calculation; must be less than half of the duration time
-Q <CQ_moderation_value>  CQ moderation (default 100)
-S <server_interface_IP>  Server side only; must be the last parameter
-C <server_interface_IP>  Client side only; must be the last parameter
14.4.14 nd_write_lat
This test is used for performance measuring of RDMA Write requests in Microsoft Windows operating systems. nd_write_lat is performance-oriented for RDMA Write with minimum latency and runs over Microsoft's NetworkDirect standard. The level of customization for the user is relatively high. Users may choose to run with a customized message size, a customized number of iterations, or alternatively a customized test duration time. nd_write_lat runs with all message sizes from 1B to 4MB (powers of 2), message inlining, and CQ moderation.
14.4.14.1 nd_write_lat Synopsis
Running on a specific single core:
Server side: start /b /affinity 0x1 nd_write_lat -s1048576 -D10 -S 11.137.53.1
Client side: start /b /wait /affinity 0x1 nd_write_lat -s1048576 -D10 -C 11.137.53.1
14.4.14.2 nd_write_lat Options
The table below lists the various flags of the command.
Table 49 nd_write_lat Options
-h  Shows the Help screen
-V  Shows the version number
-p  Connects to the port <port> (default 6830)
-s <msg_size>  Exchanges the message size with
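The /affinity value in the single-core examples is a hexadecimal processor mask, in which bit n selects logical processor n. Therefore 0x1 pins the test to processor 0. This small Python helper, whose name is hypothetical, builds such a mask for any set of processors.

```python
def affinity_mask(cores: list[int]) -> str:
    """Build the hexadecimal processor mask used by `start /affinity`.

    Bit n of the mask selects logical processor n, so 0x1 means processor 0.
    """
    if not cores or any(c < 0 for c in cores):
        raise ValueError("give at least one non-negative core index")
    mask = 0
    for core in cores:
        mask |= 1 << core
    return hex(mask)

print(affinity_mask([0]), affinity_mask([0, 2]))
```

For example, pinning to processors 0 and 2 gives the mask 0x5.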
122. decimal lid range only
ibaddr -g 32  # show gid address only
14.3.17 ibcacheedit
ibcacheedit allows users to edit an ibnetdiscover cache created through the --cache option in ibnetdiscover(8).
14.3.17.1 ibcacheedit Synopsis
ibcacheedit [--switchguid BEFOREGUID:AFTERGUID] [--caguid BEFORE:AFTER] [--sysimgguid BEFOREGUID:AFTERGUID] [--portguid NODEGUID:BEFOREGUID:AFTERGUID] [-h(elp)] <orig.cache> <new.cache>
14.3.17.2 ibcacheedit Options
Table 30 ibcacheedit Flags and Options
--switchguid BEFOREGUID:AFTERGUID  Specifies a switchguid that should be changed. The before and after GUIDs should be separated by a colon. On switches, port GUIDs are identical to the switch GUID, so port GUIDs will be adjusted as well on switches.
--caguid BEFOREGUID:AFTERGUID  Specifies a caguid that should be changed. The before and after GUIDs should be separated by a colon.
--sysimgguid BEFOREGUID:AFTERGUID  Specifies a sysimgguid that should be changed. The before and after GUIDs should be separated by a colon.
--portguid NODEGUID:BEFOREGUID:AFTERGUID  Specifies a portguid that should be changed. The node GUID of the port (e.g., switchguid or caguid) should be specified first, followed by a colon, the before port GUID, another colon, and then the after port GUID. On switches, port GUIDs are identical to the switch GUID, so the switch GUID will be adjusted as well on switches.
Debugging Flags
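The colon-separated GUID arguments of ibcacheedit can be split and validated before the tool is invoked. This Python sketch covers both forms: BEFOREGUID:AFTERGUID, and NODEGUID:BEFOREGUID:AFTERGUID for --portguid. The helper name and the example GUID values are hypothetical.

```python
def parse_guid_change(arg: str, with_node: bool = False) -> tuple[int, ...]:
    """Split an ibcacheedit GUID argument on colons and return the GUIDs as ints.

    --switchguid/--caguid/--sysimgguid take BEFOREGUID:AFTERGUID;
    --portguid takes NODEGUID:BEFOREGUID:AFTERGUID (use with_node=True).
    """
    parts = arg.split(":")
    expected = 3 if with_node else 2
    if len(parts) != expected:
        raise ValueError(f"expected {expected} colon-separated GUIDs, got {len(parts)}")
    return tuple(int(p, 16) for p in parts)

print(parse_guid_change("0x0002c902fffff00a:0x0002c902fffff00b"))
print(parse_guid_change("0xa:0x1:0x2", with_node=True))
```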
123. dware and Software Prerequisites 81
10.3 SMB Configuration Verification 81
10.3.1 Verifying Network Adapter Configuration 81
10.3.2 Verifying SMB Configuration 81
10.3.3 Verifying SMB Connection 83
10.4 Verifying SMB Events that Confirm RDMA Connection 83
Chapter 11 Performance Tuning 84
11.1 General Performance Optimization and Tuning 84
11.1.1 Registry Tuning 84
11.1.2 Enable RSS 84
11.1.3 Tuning the IPoIB Network Adapter 84
11.1.4 Tuning the Ethernet Network Adapter 85
11.1.5 SR-IOV Tuning 90
11.1.6 Improving Live Migration 90
11.2 Application Specific Optimization and Tuning 90
11.2.1 Ethernet Performance Tuning 90
11.2.2 IPoIB Performance Tuning 90
11.3 Tunable Performance Parameters 91
11.4 Adapter Proprietary Performance Counters 93
11.4.1 Supported Standard Performance Counters 94
Chapter 12 OpenSM Subnet Manager
124. Figure: Performance Tuning Tool (tuning scenarios, Restore Default Setting and Run Tuning buttons)
Clicking the Run Tuning button activates the general tuning as explained above, and changes several driver registry entries for the current adapter and its sibling device, provided that the sibling is also an Ethernet device. It also generates a log of the applied changes, which users can view to restore the previous values. The log path is:
%HOMEDRIVE%\Windows\System32\LogFiles\PerformanceTunning.log
This tuning is required to be performed only once after the installation is completed, and on one adapter only, as long as these entries are not changed directly in the registry or by some other installation or script.
Please note that a reboot may be required for the changes to take effect.
11.1.4.1 Performance Tuning Tool Application
You can also activate the performance tuning through a script called perf_tuning.exe. This script has 4 options: the 3 scenarios described above, and an additional manual tuning option through which you can set the RSS base and number of processors for each Ethernet adapter. The adapters you wish to tun
125. e Windows Network Virtualization binding on the physical NIC of each Hyper-V host (Host 1 and Host 2):
Enable-NetAdapterBinding <EthInterfaceName>(a) -ComponentID ms_netwnv
(a) <EthInterfaceName>: Physical NIC name.
Step 2. Create a vSwitch:
New-VMSwitch <vSwitchName> -NetAdapterName <EthInterfaceName> -AllowManagementOS $true
Step 3. Shut down the VMs:
Stop-VM -Name <VM Name> -Force -Confirm
Step 4. Configure the Virtual Subnet ID on the Hyper-V Network Switch Ports for each virtual machine on each Hyper-V host (Host 1 and Host 2):
Add-VMNetworkAdapter -VMName <VMName> -SwitchName <vSwitchName> -StaticMacAddress <StaticMAC Address>
Step 5. Configure a Subnet Locator and Route records on all Hyper-V hosts (same command on all Hyper-V hosts):
New-NetVirtualizationLookupRecord -CustomerAddress <VMInterfaceIPAddress 1/n> -ProviderAddress <HypervisorInterfaceIPAddress1> -VirtualSubnetID <virtualsubnetID> -MACAddress <VMmacaddress1>(a) -Rule "TranslationMethodEncap"
New-NetVirtualizationLookupRecord -CustomerAddress <VMInterfaceIPAddress 2/n> -ProviderAddress <HypervisorInterfaceIPAddress2> -VirtualSubnetID <virtualsubnetID> -MACAddress <VMmacaddress2>(a) -Rule "TranslationMethodEncap"
(a) This is the VM's MAC address associated with the vSwitch connected to the Mellanox device.
Step 6. Add a customer route on all Hyper-V hosts (same command on all Hyper-V hosts):
New-NetVirtualizationCusto
126. ...are supplied to the script by their name, according to the Network Connections.
Synopsis
perf_tuning.exe -s -c1 <first connection name> -c2 <second connection name>
perf_tuning.exe -d -c1 <first connection name> -c2 <second connection name>
perf_tuning.exe -f -c1 <first connection name> -c2 <second connection name>
perf_tuning.exe -m -c1 <first connection name> -b <base RSS processor number> -n <number of RSS processors>
perf_tuning.exe -st -c1 <first connection name> -c2 <second connection name>
Options
Flag | Description
-s | Single port traffic scenario. This option can be followed by one or two connection names. The tuning restores the default settings on the second connection and is performed on the first connection.
  This option automatically sets:
  - SendCompletionMethod = 0
  - RecvCompletionMethod = 2
  - ReceiveBuffers = 1024
  - In operating systems that support NDIS 6.3: RssProfile = 4
  Additionally, this option chooses the best processors to assign to:
  - DefaultRecvRingProcessor
  - TxInterruptProcessor
  - TxForwardingProcessor
  - In operating systems that support NDIS 6.2: RssBaseProcNumber, MaxRssProcessors
  - In operating systems that support NDIS 6.3: NumRSSQueues, RssMaxProcNumber
-d | Dual port traffic scenario. This option must be followed by two connection names. The tuning in this case is co-dependent. This option automatically sets SendComple
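The connection-name rules in the Options table (-s takes one or two connection names, -d requires exactly two, and the -m synopsis takes one connection plus a base RSS processor and a processor count) can be checked before the tool is invoked. The Python sketch below only builds an argument list and never runs perf_tuning.exe; the helper name is hypothetical, and it covers just the -s, -d and -m modes because the rules for -f and -st are not spelled out here.

```python
def perf_tuning_args(mode, connections, base_proc=None, num_procs=None):
    """Build a perf_tuning.exe argument list (sketch; does not execute it)."""
    if mode == "s":
        if len(connections) not in (1, 2):
            raise ValueError("-s takes one or two connection names")
    elif mode == "d":
        if len(connections) != 2:
            raise ValueError("-d requires exactly two connection names")
    elif mode == "m":
        if len(connections) != 1 or base_proc is None or num_procs is None:
            raise ValueError("-m takes one connection name plus -b and -n")
    else:
        raise ValueError(f"mode -{mode} is not covered by this sketch")

    args = ["perf_tuning.exe", f"-{mode}"]
    # Connection names are numbered -c1, -c2 in the order given.
    for index, name in enumerate(connections, start=1):
        args += [f"-c{index}", name]
    if mode == "m":
        args += ["-b", str(base_proc), "-n", str(num_procs)]
    return args


if __name__ == "__main__":
    print(perf_tuning_args("s", ["Ethernet 3"]))
    print(perf_tuning_args("m", ["Ethernet 3"], base_proc=2, num_procs=4))
```

Returning a list rather than a single string means connection names that contain spaces (such as "Ethernet 3") stay intact if the list is later handed to a process launcher.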
127. ...combined with the -a flag.
-a | Runs all the message sizes from 1B to 8MB. Must not be combined with the -s flag.
-n <num of iterations> | The number of exchanges (at least 2). The default is 100000.
-I <max inline size> | The maximum size of message to send inline. The default is 128B.
-D <test duration in seconds> | Test duration in seconds.
-f <margin time in seconds> | The margin time to avoid calculation. Must be less than half of the duration time.
-Q <value> | CQ Moderation. The default is 100.
-S <server interface IP> | <server side only, must be last parameter>
-C <server interface IP> | <client side only, must be last parameter>
14.4.18 nd_send_lat
This test is used for performance measuring of Send requests in Microsoft Windows operating systems. nd_send_lat is performance oriented for Send with minimum latency, and runs over Microsoft's NetworkDirect standard. The level of customization for the user is relatively high: the user may choose to run with a customized message size, a customized number of iterations, or alternatively a customized test duration time. nd_send_lat runs with all message sizes from 1B to 4MB (powers of 2), message inlining and CQ moderation.
14.4.18.1 nd_send_lat Synopsis
Running on a specific single core:
Server side: start /b /affinity
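Two rules from the options above lend themselves to a quick check: -a sweeps message sizes in powers of 2 (1B to 8MB, or 1B to 4MB for nd_send_lat), and the -f margin must be less than half of the -D duration. The Python sketch below assumes the sweep includes both endpoints and that MB means 2^20 bytes; neither detail is stated explicitly in the manual.

```python
MB = 1024 * 1024  # assumption: MB = 2**20 bytes


def sweep_sizes(max_bytes):
    """Powers-of-2 message sizes from 1 byte up to and including max_bytes."""
    sizes = []
    size = 1
    while size <= max_bytes:
        sizes.append(size)
        size *= 2
    return sizes


def margin_is_valid(duration_s, margin_s):
    """-f (margin) must be strictly less than half of -D (duration)."""
    return margin_s < duration_s / 2


if __name__ == "__main__":
    print(len(sweep_sizes(8 * MB)))   # 24 sizes: 2**0 .. 2**23
    print(len(sweep_sizes(4 * MB)))   # 23 sizes: 2**0 .. 2**22
    print(margin_is_valid(10, 4))     # True
```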
128. ...effect. For more information, please contact Mellanox Support.
> To enable SR-IOV using the mlxconfig tool (beta):
mlxconfig is part of the MFT tools and is used to simplify firmware configuration. The tool is available with MFT tools 3.6.0 or higher, in beta version.
Step 1. Download MFT from www.mellanox.com > Products > Software > Firmware Tools.
Step 2. Check the current SR-IOV configuration.
mlxconfig -d mt4099_pciconf0 q
Example output:
Device #1:
Device type: ConnectX3
PCI device: mt4099_pciconf0
Configurations:     Current
SRIOV_EN            N/A
NUM_OF_VFS          N/A
WOL_MAGIC_EN_P2     N/A
Step 3. Enable SR-IOV with 8 VFs.
mlxconfig -d mt4099_pciconf0 s SRIOV_EN=1 NUM_OF_VFS=8
Example output:
Device #1:
Device type: ConnectX3
PCI device: mt4099_pciconf0
Configurations:     Current     New
SRIOV_EN            N/A         1
NUM_OF_VFS          N/A         8
WOL_MAGIC_EN_P2     N/A         N/A
Apply new Configuration? (y/n) [n] :
8.10.4.2 Enabling SR-IOV in Mellanox WinOF Package
> To enable SR-IOV in Mellanox WinOF Package:
Step 1. Install a Mellanox WinOF package that supports SR-IOV.
Step 2. Configure the HCA ports type to Ethernet. SR-IOV cannot be enabled if one of the ports is InfiniBand.
Step 3. Query the SR-IOV configuration with PowerShell.
PS $ Get-MlnxPCIDeviceSriovSetting
Example output:
Caption : MLNX_PCIDeviceSriovSettingData 'Mellanox ConnectX-3 VPI (MT04099) Network Adapter'
Description : Mellanox ConnectX-3 VPI (MT04099) Network Adapt
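Step 3 passes the new values as NAME=VALUE pairs after the s (set) command. The Python sketch below assembles that command line from a dictionary. The device name mt4099_pciconf0 comes from the example above; the 0 to 126 range check on NUM_OF_VFS is borrowed from the total_vfs range given for the firmware .ini method and is an assumption for mlxconfig, and the helper name is hypothetical.

```python
def mlxconfig_set_args(device, settings):
    """Build 'mlxconfig -d <device> s NAME=VALUE ...' as an argument list."""
    num_vfs = settings.get("NUM_OF_VFS")
    # Assumption: reuse the <0-126> total_vfs range from the .ini method.
    if num_vfs is not None and not 0 <= num_vfs <= 126:
        raise ValueError("NUM_OF_VFS outside the assumed 0-126 range")
    pairs = [f"{name}={value}" for name, value in settings.items()]
    return ["mlxconfig", "-d", device, "s", *pairs]


if __name__ == "__main__":
    args = mlxconfig_set_args("mt4099_pciconf0", {"SRIOV_EN": 1, "NUM_OF_VFS": 8})
    print(" ".join(args))  # mlxconfig -d mt4099_pciconf0 s SRIOV_EN=1 NUM_OF_VFS=8
```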
129. ...receives inbound traffic.
Requester QP operation errors | Number of local QP operation errors when the local machine generates outbound traffic.
Responder QP operation errors | Number of local QP operation errors when the local machine receives inbound traffic.
Requester protection errors | Number of local protection errors when the local machine generates outbound traffic.
Responder protection errors | Number of local protection errors when the local machine receives inbound traffic.
Table 12 - Mellanox Adapter Diagnostics Counters
Mellanox Adapter Diagnostics Counters | Description
Requester CQE errors | Number of local CQEs with errors when the local machine generates outbound traffic.
Responder CQE errors | Number of local CQEs with errors when the local machine receives inbound traffic.
Requester Invalid request errors | Number of remote invalid request errors when the local machine generates outbound traffic, i.e. a NAK was received indicating that the other end detected an invalid OpCode request.
Responder Invalid request errors | Number of remote invalid request errors when the local machine receives inbound traffic.
Requester Remote access errors | Number of remote access errors when the local machine generates outbound traffic, i.e. a NAK was received indicating that the other end detected a wrong rkey.
Responder
130. ...registry keys may affect IPoIB performance. For the complete list of registry entries that may be added or changed by the performance tuning procedure, see MLNX_VPI_WinOF Registry Keys at the following path:
http://www.mellanox.com/page/products_dyn?product_family=32&mtag=windows_sw_drivers
To improve performance, activate the performance tuning tool as follows:
Step 1. Start the Device Manager (open a command line window and enter: devmgmt.msc).
Step 2. Open Network Adapters.
Step 3. Right-click the relevant IPoIB adapter and select Properties.
Step 4. Select the Advanced tab.
Step 5. Modify the performance parameters (properties) as desired.
11.3 Tunable Performance Parameters
The following is a list of key parameters for performance tuning.
- Jumbo Packet
The maximum available size of the transfer unit, also known as the Maximum Transmission Unit (MTU). For IPoIB, the MTU should not include the size of the IPoIB header (4B). For example, if the network adapter card supports a 4K MTU, the upper threshold for the payload MTU is 4092B and not 4096B.
The MTU of a network can have a substantial impact on performance. A 4K MTU size improves performance for short messages, since it allows the OS to coalesce many small messages into a large one.
- The valid MTU range for an Ethernet driver is 614 to 9614.
- The valid MTU range for an IPoIB driver is 1500 to 4092.
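The Jumbo Packet arithmetic above (payload MTU = link MTU minus the 4B IPoIB header) and the valid ranges per driver type can be captured in a few lines. This Python sketch treats the quoted ranges as inclusive, which is an assumption; the function names are hypothetical.

```python
IPOIB_HEADER_BYTES = 4  # IPoIB header size stated above

# Valid Jumbo Packet ranges quoted above, in bytes (assumed inclusive).
VALID_MTU = {"ethernet": (614, 9614), "ipoib": (1500, 4092)}


def ipoib_payload_mtu(link_mtu):
    """Largest IPoIB payload MTU for a given IB link MTU."""
    return link_mtu - IPOIB_HEADER_BYTES


def is_valid_mtu(driver, mtu):
    """Check an MTU value against the range for 'ethernet' or 'ipoib'."""
    low, high = VALID_MTU[driver]
    return low <= mtu <= high


if __name__ == "__main__":
    print(ipoib_payload_mtu(4096))        # 4092
    print(is_valid_mtu("ipoib", 4096))    # False: 4096 includes the header
    print(is_valid_mtu("ethernet", 9614)) # True
```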
131. ...please refer to http://msdn.microsoft.com/en-us/library/ff570865(v=VS.85).aspx
C.1 Finding the Index Value of the HCA
To find the nn value of your HCA from the Device Manager, please perform the following steps:
Step 1. Open Device Manager and go to System devices.
Step 2. Right-click > Properties on the Mellanox ConnectX card.
Step 3. Go to the Details tab.
Step 4. Select the Driver key, and obtain the nn number. In the example below, the index equals 0041.
[Figure: Device Manager with the Mellanox ConnectX-3 (MT04099) Network Adapter Properties dialog open on the Details tab, showing the Driver key property]
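On Windows, the Driver key value shown in the Details tab normally has the form {device class GUID}\nnnn, and the trailing four digits are the nn index. The Python sketch below splits that value and builds the corresponding path under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class. The sample value is hypothetical, and both the value format and the registry location are general Windows behavior assumed here rather than details stated in this section.

```python
import re

# Assumed Driver key format: '{8-4-4-4-12 GUID}\nnnn'
DRIVER_KEY = re.compile(r"(\{[0-9A-Fa-f-]{36}\})\\(\d{4})")


def split_driver_key(driver_key):
    """Return (class GUID, nn index) from a Driver key value."""
    match = DRIVER_KEY.fullmatch(driver_key.strip())
    if match is None:
        raise ValueError(f"unexpected Driver key format: {driver_key!r}")
    return match.group(1), match.group(2)


def class_registry_path(driver_key):
    """Registry path of the device instance under the Class key."""
    class_guid, index = split_driver_key(driver_key)
    return ("HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Control\\Class\\"
            f"{class_guid}\\{index}")


if __name__ == "__main__":
    sample = "{4d36e97d-e325-11ce-bfc1-08002be10318}\\0041"  # hypothetical
    print(split_driver_key(sample)[1])  # 0041
    print(class_registry_path(sample))
```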
132. ...for each port. The valid values are 1 up to 127.
Note: This registry value is not exposed via the UI.
C.6.3 RoCE Options
This section describes the registry keys that are used to control RoCE mode.
Value Name | Default Value | Description
roce_mode | 0 - RoCE | The RoCE mode. The valid values are: 0 - RoCE; 4 - NoRoCE. Note: The default value depends on the WinOF package used.
C.7 IPoIB Registry Keys
The following section describes the registry keys that are unique to IPoIB.
Value Name | Default Value | Description
GUIDMask | 0 | Controls the way the MAC is generated for the IPoIB interface. The driver uses the 8-byte GUID to generate a 6-byte MAC. This value should be either 0 or contain exactly 6 non-zero digits in its binary representation. A zero (0) mask indicates the default value, 0xE7 (binary 11100111), that is, take all bytes of the GUID except the intermediate ones to form the MAC address. In case of an improper mask, the driver uses the default one. For more details, please refer to http://mellanox.com/related-docs/prod_software/guid2mac_checker_user_manual.txt. Note: This registry value is not exposed via the UI.
MediumType802_3 | 0 | Controls the way the interface is exposed to an upper level. By default, IPoIB is exposed as an InfiniBand interface. The user can change it and cause the interface to be an Ethernet interface by setting thi
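The GUIDMask rule (keep exactly six of the eight GUID bytes; 0 or an improper mask falls back to 0xE7, which drops the two intermediate bytes) can be illustrated in Python. The bit ordering is an assumption: mask bit 7 is taken to select the most significant GUID byte. The sample GUID is hypothetical, modeled on the serial number in the InstanceID example output of the SR-IOV section.

```python
DEFAULT_GUID_MASK = 0xE7  # binary 11100111: drop the two intermediate bytes


def guid_to_mac(guid, mask=0):
    """Derive a 6-byte MAC from an 8-byte GUID using a GUIDMask-style mask.

    Assumption: mask bit 7 (MSB) selects GUID byte 0 (most significant)
    and mask bit 0 selects GUID byte 7.
    """
    # 0, or anything that is not an 8-bit value with exactly six 1-bits,
    # falls back to the default mask.
    if mask == 0 or mask > 0xFF or bin(mask).count("1") != 6:
        mask = DEFAULT_GUID_MASK
    guid_bytes = guid.to_bytes(8, "big")
    mac = [b for i, b in enumerate(guid_bytes) if mask & (0x80 >> i)]
    return ":".join(f"{b:02x}" for b in mac)


if __name__ == "__main__":
    print(guid_to_mac(0x0002C90300A0AA60))        # 00:02:c9:a0:aa:60
    print(guid_to_mac(0x0002C90300A0AA60, 0xFF))  # improper mask, default used
```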
133. ...identify devices in a fabric, or even in one switch system, each device is given a GUID (a MAC equivalent). Since a GUID is a non-user-friendly string of characters, it is better to alias it to a meaningful, user-given name. For this purpose, the IB Diagnostic Tools can be provided with a topology file, which is an optional configuration file specifying the IB fabric topology in user-given names.
For diagnostic tools to fully support the topology file, the user may need to provide the local system name (if the local hostname is not used in the topology file).
To specify a topology file to a diagnostic tool, use one of the following two options:
1. On the command line, specify the file name using the option '-t <topology file name>'
2. Define the environment variable IBDIAG_TOPO_FILE
To specify the local system name to a diagnostic tool, use one of the following two options:
1. On the command line, specify the system name using the option '-s <local system name>'
2. Define the environment variable IBDIAG_SYS_NAME
14.3.1.2 IB Interface Definition
The diagnostic tools installed on a machine connect to the IB fabric by means of an HCA port through which they send MADs. To specify this port to an IB diagnostic tool, use one of the following options:
1. On the command line, specify the port number using the option '-p <local port number>' (see below)
2. Define the environment variable IBDIAG_PORT_NUM
In case more t
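Each of these settings can come either from a command-line option or from an IBDIAG_* environment variable. The manual does not say which one wins when both are set; the Python sketch below assumes the command-line option takes precedence. The function name and the dictionary of parsed options are hypothetical.

```python
import os

# Command-line option -> environment variable, as listed above.
ENV_FALLBACK = {
    "t": "IBDIAG_TOPO_FILE",  # topology file
    "s": "IBDIAG_SYS_NAME",   # local system name
    "p": "IBDIAG_PORT_NUM",   # local port number
}


def resolve_diag_settings(cli_options, environ=None):
    """Merge parsed CLI options with IBDIAG_* variables (CLI assumed to win)."""
    environ = os.environ if environ is None else environ
    resolved = {}
    for flag, env_name in ENV_FALLBACK.items():
        if flag in cli_options:
            resolved[flag] = cli_options[flag]
        elif env_name in environ:
            resolved[flag] = environ[env_name]
    return resolved


if __name__ == "__main__":
    env = {"IBDIAG_TOPO_FILE": "fabric.topo", "IBDIAG_PORT_NUM": "2"}
    print(resolve_diag_settings({"p": "1"}, env))
```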
134. ...to one of the ports on a dual-port NIC, enabling the preferred port to achieve the maximum throughput while the other port takes up the remaining bandwidth.
To configure Ports TX Arbitration:
Step 1. Open the Device Manager.
Step 2. Go to Network adapters.
Step 3. Right-click > Properties on the Mellanox ConnectX-3 Ethernet Adapter card.
Step 4. Go to the Advanced tab.
Step 5. Choose the "Tx Throughput Port Arbiter" option.
Step 6. Set one of the following values:
- Best Effort (Default): Default behavior. No precedence is given to this port over the other.
- Guaranteed: Give higher precedence to this port.
- Not Present: No configuration exists; defaults are used.
8.7 RDMA over Converged Ethernet (RoCE)
8.7.1 RoCE Overview
Remote Direct Memory Access (RDMA) is the remote memory management capability that allows server-to-server data movement directly between application memory without any CPU involvement. RDMA over Converged Ethernet (RoCE) is a mechanism that provides this efficient data transfer with very low latencies on lossless Ethernet networks. With advances in data center convergence over reliable Ethernet, ConnectX EN with RoCE uses the proven and efficient RDMA transport to provide the platform for deploying RDMA technology in mainstream data center applications at 10GigE and 40GigE link speeds. ConnectX EN, with its hardware offload support, takes advantage of th
135. ...
ElementName : HCA 0
InstanceID : PCI\VEN_15B3&DEV_1003&SUBSYS_002815B3&REV_00\0002C90300A0AA6000
Name : HCA 0
Source : 238
SystemName : L DEV W068
SriovEnable : False
SriovPort1NumVFs :
SriovPort2NumVFs :
SriovPortMode :
PSComputerName :
Step 4. Enable SR-IOV through PowerShell.
Set-MlnxPCIDeviceSriovSetting -Name "HCA 0" -SriovEnable $true
SR-IOV mode configuration parameters
Parameter Name | Values | Description
SriovEnable | 0 - RoCE (default); 1 - SR-IOV | Configures the RDMA or SR-IOV mode. Note: RDMA is not supported in SR-IOV mode.
SriovPortMode | 0 - auto_port1 (default); 1 - auto_port2; 2 - manual | Configures the number of VFs to be enabled by the bus driver to each port. Note: In auto_portX mode, port X will have the number of VFs according to the burnt value in the device, and the other port will have no SR-IOV and will support native Ethernet (i.e. no RoCE). Setting this parameter to manual will configure the number of VFs for each port according to the registry key MaxVFPortX.
Note: The number of VFs can be configured both at the Mellanox bus driver level and at the Network Interface level (i.e. using the Set-NetAdapterSriov PowerShell cmdlet). The number of VFs actually available to the Network Interface is the minimum of the Mellanox bus driver configuration and the Network Interface configuration. For example, if 8 VFs support was burnt in firmware, SriovP
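The VF-count rules above can be summarized as a small function: in auto_portX mode port X gets the firmware-burnt count and the other port gets none, in manual mode each port uses its MaxVFPortX value, and the network interface sees the minimum of the bus-driver count and the Set-NetAdapterSriov count. This Python sketch is illustrative only; the function and argument names are hypothetical.

```python
AUTO_PORT1, AUTO_PORT2, MANUAL = 0, 1, 2  # SriovPortMode values


def bus_driver_vfs(port, port_mode, burnt_vfs, max_vf_port=None):
    """VFs the bus driver enables on port 1 or port 2."""
    if port_mode == MANUAL:
        # Per-port values from the MaxVFPort1 / MaxVFPort2 registry keys.
        return max_vf_port[port]
    active_port = 1 if port_mode == AUTO_PORT1 else 2
    # auto_portX: port X gets the burnt value, the other port gets no SR-IOV.
    return burnt_vfs if port == active_port else 0


def effective_vfs(bus_vfs, netif_vfs):
    """VFs actually available to the network interface."""
    return min(bus_vfs, netif_vfs)


if __name__ == "__main__":
    # 8 VFs burnt in firmware, default auto_port1 mode, interface limited to 4.
    print(effective_vfs(bus_driver_vfs(1, AUTO_PORT1, 8), 4))  # 4
```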
136. ...Server 2008 R2: Click Start > Run and enter CMD.
Windows Server 2012/2012 R2: Click Start > Task Manager > File > Run new task, and enter CMD.
Step 2. Uninstall the driver. Run:
MLNX_VPI_WinOF-4_70_All_win2012_x64.exe /S /x /v"/qn"
5.3 Firmware Upgrade
For information on how to upgrade firmware, please refer to the MFT User Manual (www.mellanox.com > Products > Adapter IB/VPI SW > Firmware Tools).
6 Upgrading Mellanox WinOF Driver
The upgrade process differs between the various operating systems.
- Windows Server 2008 R2: When upgrading from WinOF version 3.2.0 to version 4.40 and above, the MLNX_WinOF driver upgrades automatically by uninstalling the previous version and installing the new driver. The existing configuration files are not saved upon driver upgrade.
- Windows Server 2012 and above: When upgrading from WinOF version 4.2 to version 4.40 and above, the MLNX_WinOF driver does not completely uninstall the previous version, but rather upgrades only the components that require an upgrade. The network configuration is saved upon driver upgrade. When upgrading from Inbox or any other version, the network configuration is automatically saved upon driver upgrade.
7 Advanced Driver Configuration
Once you have installed the Mellanox WinOF VPI package, you can perform various modifications to your dri
137. ...ServerConfiguration | Select EnableMultichannel
Get-SmbServerNetworkInterface
netstat.exe -xan | ? {$_ -match "445"} (1)
(1) The NETSTAT command confirms whether the File Server is listening on the RDMA interfaces.
10.3.3 Verifying SMB Connection
To verify the SMB connection on the SMB client:
Step 1. Copy a large file to create a new session with the SMB Server.
Step 2. Open a PowerShell window while the copy is ongoing.
Step 3. Verify that SMB Direct is working properly and that the correct SMB dialect is used.
Get-SmbConnection
Get-SmbMultichannelConnection
netstat.exe -xan | ? {$_ -match "445"}
If there is no activity while you run the commands above, you might get an empty list due to session expiration and the absence of current connections.
10.4 Verifying SMB Events that Confirm RDMA Connection
To confirm the RDMA connection, verify the SMB events:
Step 1. Open a PowerShell window on the SMB client.
Step 2. Run the following cmdlets.
NOTE: Any RDMA-related connection errors will be displayed as well.
Get-WinEvent -LogName Microsoft-Windows-SMBClient/Operational | ? Message -match "RDMA"
11 Performance Tuning
This section describes how to modify Windows registry parameters in order to improve performance. Please note that modifying the registry incorrectly might lead to serious problems
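The filter ? {$_ -match "445"} keeps every netstat line that contains the characters 445 anywhere, so it can also match unrelated entries such as port 4450. If you save the netstat output and post-process it elsewhere, a stricter pattern that requires ':445' not followed by another digit is one option. The Python sketch below shows that idea; the sample lines are made up and do not reproduce the exact netstat -xan layout.

```python
import re

# ':445' only when it is not followed by another digit (so ':4450' is skipped).
SMB_PORT = re.compile(r":445(?!\d)")


def smb_lines(netstat_output):
    """Return the lines of saved 'netstat -xan' output that refer to port 445."""
    return [line for line in netstat_output.splitlines()
            if SMB_PORT.search(line)]


if __name__ == "__main__":
    sample = "\n".join([
        "  Kernel  192.168.1.10:445   0.0.0.0:0   Listener",   # made-up line
        "  TCP     192.168.1.10:4450  0.0.0.0:0   LISTENING",  # made-up line
    ])
    for line in smb_lines(sample):
        print(line)
```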
138. ...The performance utilities described in this chapter are intended to be used as a performance micro-benchmark.
14.4.1 ib_read_bw
ib_read_bw calculates the BW of RDMA read between a pair of machines. One acts as a server and the other as a client. The client RDMA reads the server memory and calculates the BW by sampling the CPU each time it receives a successful completion. The test supports features such as Bidirectional, in which both sides RDMA read from each other's memory at the same time, change of MTU size, tx size, number of iterations, message size and more. Read is available only in RC connection mode (as specified in the IB spec).
14.4.1.1 ib_read_bw Synopsis
ib_read_bw [-i(b_port) ib_port] [-m(tu) mtu_size] [-s(ize) message_size] [-n iteration_num] [-p(ort) PDT_port] [-b(idirectional)] [-o(uts) outstanding_reads] [-a(ll)] [-V(ersion)]
14.4.1.2 ib_read_bw Options
The table below lists the various flags of the command.
Table 36 - ib_read_bw Flags and Options
Flag | Description
-p, --port=<port> | Listens on/connects to port <port> (default: 18515)
-d, --ib-dev=<dev> | Uses IB device <device guid> (default: first device found)
-i, --ib-port=<port> | Uses port <port> of IB device (default: 1)
-m, --mtu=<mtu> | The MTU size (default: 1024)
-o, --outs=<num> | The number of outstanding read/atomic operations (default: 4)
-s, --size=<size> | The size of message to exchange (default: 65536)
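ib_read_bw derives bandwidth from CPU time stamps taken around completions. Assuming the time stamps come from a cycle counter of known frequency, the arithmetic is total bytes moved divided by elapsed seconds. The Python sketch below shows only that arithmetic, not the tool's actual code, and it reports MB/sec with MB taken as 2^20 bytes (also an assumption).

```python
def bandwidth_mb_per_s(msg_size, iterations, start_cycles, end_cycles, cpu_hz):
    """Average bandwidth (MB/sec, MB = 2**20 bytes) from a cycle-counter interval."""
    elapsed_s = (end_cycles - start_cycles) / cpu_hz
    total_bytes = msg_size * iterations
    return total_bytes / elapsed_s / 2**20


if __name__ == "__main__":
    # Hypothetical run: 1000 x 65536-byte reads over 2.4e8 cycles at 2.4 GHz (0.1 s).
    bw = bandwidth_mb_per_s(65536, 1000, 0, 240_000_000, 2.4e9)
    print(round(bw, 1))  # 625.0
```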
139. ...of exchanges (at least 2, default: 1000)
-C, --report-cycles | Reports times in CPU cycle units (default: microseconds)
-H, --report-histogram | Prints out all results (default: prints summary only)
-U, --report-unsorted | (implies -H) Prints out unsorted results (default: sorted)
-V, --version | Displays the version number
-g, --grh | Uses GRH with packets (mandatory for RoCE)
14.4.5 ib_write_bw
ib_write_bw calculates the BW of RDMA write between a pair of machines. One acts as a server and the other as a client. The client RDMA writes to the server memory and calculates the BW by sampling the CPU each time it receives a successful completion. The test supports features such as Bidirectional, in which both sides RDMA write to each other at the same time, change of MTU size, tx size, number of iterations, message size and more. Using the -a flag provides results for all message sizes.
14.4.5.1 ib_write_bw Synopsis
ib_write_bw [-q num_of_qps] [-c(onnection_type) RC|UC] [-i(b_port) ib_port] [-m(tu) mtu_size] [-s(ize) message_size] [-t(x-depth) tx_size] [-n iteration_num] [-p(ort) PDT_port] [-b(idirectional)] [-a(ll)] [-V(ersion)]
14.4.5.2 ib_write_bw Options
The table below lists the various flags of the command.
Table 40 - ib_write_bw Flags and Options
Flag | Description
-p, --port=<port> | Listens on/connects to port <port> (default: 18515)
-d, --ib-dev
140. The following are the steps for configuring the Mellanox Network Adapter for SR-IOV.
8.10.4.1 Enabling SR-IOV in Firmware
SR-IOV can be enabled and managed by using one of the following methods:
> To burn firmware with SR-IOV support:
Step 1. Verify that the HCA is configured for SR-IOV by dumping the device configuration file to a user-chosen location, <ini device file>.ini.
flint -d <device> dc > <ini device file>.ini
Step 2. Verify that the following fields appear in the [HCA] section of the .ini file:
[HCA]
num_pfs = 1
total_vfs = 16
sriov_en = true
Step 3. If the fields do not appear, edit the .ini file and add them manually.
Parameter | Recommended Value
num_pfs | 1. Note: This field is optional and might not always appear.
total_vfs | <0-126>. The chosen value should be within the BIOS limit of MMIO available address space.
sriov_en | true
Warning: Care should be taken when increasing the number of VFs. All servers are guaranteed to support 16 VFs. More VFs can lead to exceeding the BIOS limit of MMIO available address space.
Step 4. Create a binary image using the modified .ini file.
mlxburn -fw <fw name>.mlx -conf <ini device file>.ini -wrimage <file name>.bin
Step 5. Burn the firmware. The file <file name>.bin is a firmware binary file with SR-IOV enabled that has 16 VFs.
flint -dev <PCI device> -image <file name>.bin b
Step 6. Reboot the system for the changes to tak
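Steps 2 and 3 amount to checking the [HCA] section of the dumped .ini file. The Python sketch below performs those checks with the standard configparser module, under the assumption that the flint dump parses as a plain INI file; num_pfs is not checked because the manual marks it as optional, and the function name is hypothetical.

```python
import configparser


def check_hca_sriov(ini_text, guaranteed_vfs=16):
    """Return a list of SR-IOV problems found in the [HCA] section."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("HCA"):
        return ["missing [HCA] section"]
    hca = parser["HCA"]
    problems = []
    if hca.get("sriov_en", "").lower() != "true":
        problems.append("sriov_en is not true")
    if "total_vfs" not in hca:
        problems.append("total_vfs is missing")
    else:
        total_vfs = int(hca["total_vfs"])
        if not 0 <= total_vfs <= 126:
            problems.append("total_vfs outside 0-126")
        elif total_vfs > guaranteed_vfs:
            # Only 16 VFs are guaranteed; more may exceed the BIOS MMIO limit.
            problems.append("total_vfs above 16 may exceed the BIOS MMIO limit")
    return problems


if __name__ == "__main__":
    sample = "[HCA]\nnum_pfs = 1\ntotal_vfs = 16\nsriov_en = true\n"
    print(check_hca_sriov(sample))  # []
```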
141. ...a packet of message size between a pair of machines. One acts as a server and the other as a client. They perform a ping-pong benchmark in which a side sends a packet only after it receives one. Each side samples the CPU each time it receives a packet in order to calculate the latency.
14.4.4.1 ib_send_lat Synopsis
ib_send_lat [-i(b_port) ib_port] [-c(onnection_type) RC|UC|UD] [-m(tu) mtu_size] [-s(ize) message_size] [-t(x-depth) tx_size] [-n iteration_num] [-p(ort) PDT_port] [-a(ll)] [-V(ersion)] [-C report cycles] [-H report histogram] [-U report unsorted]
14.4.4.2 ib_send_lat Options
The table below lists the various flags of the command.
Table 39 - ib_send_lat Flags and Options
Flag | Description
-p, --port=<port> | Listens on/connects to port <port> (default: 18515)
-d, --ib-dev=<dev> | Uses IB device <device guid> (default: first device found)
-i, --ib-port=<port> | Uses port <port> of IB device (default: 1)
-m, --mtu=<mtu> | The MTU size (default: 1024)
-c, --connection=<RC/UC/UD> | Connection type RC/UC/UD (default: RC)
-s, --size=<size> | The size of message to exchange (default: 65536)
-l, --signal | Signals completion on each message
-a, --all | Runs sizes from 2 till 2^23
-t, --tx-depth=<dep> | The size of the tx queue (default: 100)
-n, --iters=<iters> | The number o
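In a ping-pong test, the interval between consecutive receives on one side covers a full round trip, so one-way latency is commonly reported as half of it. The Python sketch below assumes cycle-counter time stamps and a known CPU frequency in MHz; it computes per-iteration values (loosely like -H) and a min/median/max summary (loosely like the default). It illustrates the arithmetic only and is not ib_send_lat's code; the time stamps are hypothetical.

```python
import statistics


def one_way_latencies_us(rx_cycles, cpu_mhz):
    """Half round-trip latency (usec) between consecutive receive time stamps."""
    return [(later - earlier) / cpu_mhz / 2
            for earlier, later in zip(rx_cycles, rx_cycles[1:])]


def summary(latencies):
    """Summary view of per-iteration latencies."""
    return {"min": min(latencies),
            "typical": statistics.median(latencies),
            "max": max(latencies)}


if __name__ == "__main__":
    stamps = [0, 4000, 8200, 12200]   # hypothetical cycle counts
    lat = one_way_latencies_us(stamps, cpu_mhz=2000)
    print(lat)            # [1.0, 1.05, 1.0]
    print(summary(lat))
```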
142. ...higher-layer protocol.
Packets Received with Frame Length Error | Shows the number of inbound packets that contained an error where the frame has a length error. Packets received with frame length error are a subset of packets received errors.
Packets Received with Symbol Error | Shows the number of inbound packets that contained a symbol error or an invalid block. Packets received with symbol error are a subset of packets received errors.
Packets Received with Bad CRC Error | Shows the number of inbound packets that failed the CRC check. Packets received with bad CRC error are a subset of packets received errors.
Packets Received Discarded | Shows the number of inbound packets that were chosen to be discarded even though no errors had been detected, to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space.
11.4.1.2 Proprietary Mellanox Adapter Diagnostics Counters
The proprietary Mellanox adapter diagnostics counter set consists of the NIC diagnostics. These counters collect information from ConnectX-3 and ConnectX-3 Pro firmware flows.
Table 12 - Mellanox Adapter Diagnostics Counters
Mellanox Adapter Diagnostics Counters | Description
Requester length errors | Number of local length errors when the local machine generates outbound traffic.
Responder length errors | Number of local length errors when the local machin
143. ...single mode)
-M4 | Long Multicast Flow (multiple mode)
  Single mode: osmtest is tested alone, with no other apps that interact with OpenSM MC.
  Multiple mode: can be run with other apps using MC with OpenSM.
  Without -M, default flow testing is performed.
-t | This option specifies the time in milliseconds used for transaction timeouts. Specifying -t 0 disables timeouts. Without -t, OpenSM defaults to a timeout value of 200 milliseconds.
-l, --log_file | This option defines the log to be the given file. By default, the log goes to stdout.
-v | This option increases the log verbosity level. The -v option may be specified multiple times to further increase the verbosity level. See the -vf option for more information about log verbosity.
-V | This option sets the maximum verbosity level and forces log flushing. -V is equivalent to '-vf 0xFF -d 2'. See the -vf option for more information about log verbosity.
Table 28 - osmtest Flags and Options
Flag | Description
-vf | This option sets the log verbosity level. A flags field must follow the -D option. A bit set/clear in the flags enables/disables a specific log level as follows:
  BIT | LOG LEVEL ENABLED
  0x01 | ERROR (error messages)
  0x02 | INFO (basic messages, low volume)
  0x04 | VERBOSE (interesting stuff, moderate volume)
  0x08 | DEBUG (diagnostic, high volume)
  0x10 | FUNCS (function entry/exit, very high volume)
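The -vf flags value is a bit mask, so decoding it is a matter of testing each bit in the table. The Python sketch below covers only the bits listed in this excerpt (0x01 to 0x10); the full table may define further bits, which this sketch ignores. The function name is hypothetical.

```python
# Bit -> log level, from the -vf table above (bits beyond 0x10 not covered).
LOG_LEVELS = {
    0x01: "ERROR",
    0x02: "INFO",
    0x04: "VERBOSE",
    0x08: "DEBUG",
    0x10: "FUNCS",
}


def enabled_levels(flags):
    """Names of the log levels enabled by a -vf flags value."""
    return [name for bit, name in LOG_LEVELS.items() if flags & bit]


if __name__ == "__main__":
    print(enabled_levels(0x03))  # ['ERROR', 'INFO']
    print(enabled_levels(0xFF))  # all five levels listed above
```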
144. ...than one HCA device is installed on the local machine, it is necessary to specify the device's index to the tool as well. For this, use one of the following options:
1. On the command line, specify the index of the local device using the option '-i <index of local device>'
2. Define the environment variable IBDIAG_DEV_IDX
14.3.1.3 Addressing
Note: This section applies to the ibdiagpath tool only.
A tool command may require defining the destination device or port to which it applies. The following addressing modes can be used to define the IB ports:
- Using a Directed Route to the destination (tool option '-d'): This option defines a directed route of output port numbers from the local port to the destination.
- Using port LIDs (tool option '-l'): In this mode, the source and destination ports are defined by means of their LIDs. If the fabric is configured to allow multiple LIDs per port, then any of them is valid for defining a port.
- Using port names defined in the topology file (tool option '-n'): This option refers to the source and destination ports by the names defined in the topology file. Therefore, this option is relevant only if a topology file is specified to the tool. In this mode, the tool uses the names to extract the port LIDs from the matched topology, then operates as with the '-l' option.
14.3.2 ibdiagnet
ibdiagnet [-c <count>] [-v]
145. ...the performance. The valid values are: 0 - disable; 1 - enable. Note: This registry value is not exposed via the UI.
Value Name | Default Value | Description
VlanId | 0 | Enables packets with VlanId. It is used when no LBFO intermediate driver is used. The valid values are: 0 - disable (no VLAN ID is passed); 1-4095 - valid VLAN ID that will be passed. Note: This registry value is only valid for Ethernet.
TxForwardingProcessor | Automatically selected based on RSS configuration | The processor that will be used to forward the packets sent by the forwarding thread. The default is based on the number of rings and the number of cores on the machine. Note: This registry value is not exposed via the UI.
DefaultRecvRingProcessor | Automatically selected based on RSS configuration | The type of processor that will be used for the default Receive ring. This variable handles packets that are not handled by RSS. These can be non-TCP/UDP packets, or even UDP packets if they are configured to use the default ring. Note: This registry value is not exposed via the UI.
TxInterruptProcessor | Automatically selected based on RSS configuration | The type of processor that will be used to handle the TX completions. The default is based on the number of rings and the number of cores on the machine. Note: This registry value is not exposed via the UI.
NumRSSQueues | 8 | The maximum number of RSS queues that the device should use. Note: This registry key is only in Wind
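The VlanId value has two meanings: 0 disables VLAN tagging and 1 to 4095 is the VLAN ID to pass. A minimal Python sketch of that interpretation, useful as a sanity check before writing the value; the function name is hypothetical.

```python
def vlan_id_setting(value):
    """Interpret a VlanId registry value as described above."""
    if value == 0:
        return None          # disabled: no VLAN ID is passed
    if 1 <= value <= 4095:
        return value         # valid VLAN ID that will be passed
    raise ValueError(f"VlanId {value} is outside 0-4095")


if __name__ == "__main__":
    print(vlan_id_setting(0))    # None
    print(vlan_id_setting(100))  # 100
```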
146. ...header of the incoming packet can be used to set the PCP bits on the outgoing Ethernet header.
8.7.6 Configuring the RoCE Mode
Configuring the RoCE mode requires the following:
- RoCE mode is configured per driver and is enforced on all the devices in the system.
Note: The supported RoCE modes depend on the firmware installed. If the firmware does not support the needed mode, the fallback mode is the maximum supported RoCE mode of the installed NIC.
RoCE mode can be enabled and disabled via PowerShell.
> To enable RoCE using PowerShell:
Open PowerShell and run:
Set-MlnxDriverCoreSetting -RoceMode 1
> To enable RoCEv2 using PowerShell:
Open PowerShell and run:
Set-MlnxDriverCoreSetting -RoceMode 2
> To disable any version of RoCE using PowerShell:
Open PowerShell and run:
Set-MlnxDriverCoreSetting -RoceMode 0
> To check the current version of RoCE using PowerShell:
Open PowerShell and run:
Get-MlnxDriverCoreSetting
Example output:
Caption : DriverCoreSettingData 'mlx4_bus'
Description : Mellanox Driver Option Settings
...
RoceMode 20
8.8 Network Virtualization using Generic Routing Encapsulation
Note: Network Virtualization using Generic Routing Encapsulation (NVGRE) offload is currently supported in Windows Server 2012 R2 only.
Network Virtualization using Generic Routing Encapsulation (NVGRE) is a network virtualization technology
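The fallback rule (if the firmware lacks the requested mode, the driver uses the highest RoCE mode the NIC supports) can be expressed as a small function. The mode numbers follow the Set-MlnxDriverCoreSetting -RoceMode values above (0 disabled, 1 RoCE, 2 RoCEv2). How the driver learns which modes the firmware supports is not described here, so the supported set is passed in, and treating "disabled" as always allowed is an assumption.

```python
ROCE_DISABLED, ROCE_V1, ROCE_V2 = 0, 1, 2  # -RoceMode values


def effective_roce_mode(requested, supported_modes):
    """Mode the driver ends up using, per the fallback rule above."""
    # Assumption: disabling RoCE is always possible.
    if requested == ROCE_DISABLED or requested in supported_modes:
        return requested
    # Fallback: the maximum RoCE mode supported by the installed NIC.
    return max(supported_modes)


if __name__ == "__main__":
    print(effective_roce_mode(ROCE_V2, {ROCE_V1}))  # 1: falls back to RoCE
```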
147. [Figure: BIOS Setup Utility processor configuration screen, including the Intel(R) Virtualization Tech setting]
For further details, please refer to the vendor's website.
8.10.3.2 Installing Hypervisor Operating System
To install the Hypervisor Operating System:
Step 1. Install Windows Server 2012 R2 or above.
Step 2. Install the Hyper-V role:
a. Go to Server Manager > Manage > Add Roles and Features > Installation Type > Role-based or Feature-based Installation > Server Selection > local server.
b. Go to Server Roles > Hyper-V.
Figure 6: Hyper-V Server
148. ...ibnetdiscover output.
-R | This option is obsolete and has no effect.
-d | Raises the IB debugging level. May be used several times (-ddd or -d -d -d).
-e | Shows send and receive errors (timeouts and others).
-h | Shows the usage message.
-v | Increases the application verbosity level. May be used several times (-vv or -v -v -v).
-C <ca_name> | Uses the specified ca_name.
-P <ca_port> | Uses the specified ca_port.
Table 32 - ibqueryerrors Flags and Options
Flags | Description
-t <timeout_ms> | Overrides the default timeout for the solicited MADs.
14.3.19.3 ibqueryerrors Exit Status
If a failure to scan the fabric occurs, return -1. If the scan succeeds without errors beyond thresholds, return 0. If errors are found on ports beyond thresholds, return 1.
14.3.19.4 ibqueryerrors Files
/opt/ufm/files/conf/infiniband-diags/error_thresholds
Defines threshold values for errors. The file format is simple "name=val". Comments begin with '#'.
Example:
# Define thresholds for error counters
SymbolErrorCounter=10
LinkErrorRecoveryCounter=10
VL15Dropped=100
14.3.20 ibsysstat
ibsysstat uses vendor MADs to validate connectivity between InfiniBand nodes and obtain other information about the InfiniBand node. ibsysstat is run as client/server. The default is to run as client.
14.3.20.1 ibsysstat Synopsis
ibsysstat [-d(ebug)] [-e(rr_show)]
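The error_thresholds format ("name=val" lines, '#' comments) and the documented exit codes fit together naturally. The Python sketch below parses the file text and mirrors the exit-status rule; it treats "beyond threshold" as strictly greater than the value and ignores counters that have no threshold entry, both of which are assumptions. The function names are hypothetical.

```python
def parse_thresholds(text):
    """Parse 'name=val' lines; lines starting with '#' are comments."""
    thresholds = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        thresholds[name.strip()] = int(value.strip())
    return thresholds


def exit_status(scan_ok, counters, thresholds):
    """-1 on scan failure, 1 if any counter exceeds its threshold, else 0."""
    if not scan_ok:
        return -1
    for name, value in counters.items():
        if name in thresholds and value > thresholds[name]:
            return 1
    return 0


if __name__ == "__main__":
    sample = ("# Define thresholds for error counters\n"
              "SymbolErrorCounter=10\nLinkErrorRecoveryCounter=10\nVL15Dropped=100\n")
    limits = parse_thresholds(sample)
    print(exit_status(True, {"SymbolErrorCounter": 12}, limits))  # 1
```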
149. ...Technologies
350 Oakmead Parkway Suite 100, Sunnyvale, CA 94085, U.S.A.
www.mellanox.com
Tel: (408) 970-3400
Fax: (408) 970-3403
Mellanox Technologies, Ltd.
Beit Mellanox, PO Box 586, Yokneam 20692, Israel
www.mellanox.com
Tel: +972 (0)74 723 7200
Fax: +972 (0)4 959 3245
Copyright 2014, Mellanox Technologies. All Rights Reserved.
Mellanox, Mellanox logo, BridgeX, ConnectX, Connect-IB, CoolBox, CORE-Direct, InfiniBridge, InfiniHost, InfiniScale, MetroX, MLNX-OS, PhyX, ScalableHPC, SwitchX, UFM, Virtual Protocol Interconnect and Voltaire are registered trademarks of Mellanox Technologies, Ltd. ExtendX, FabricIT, Mellanox Open Ethernet, Mellanox Virtual Modular Switch, MetroDX, TestX and Unbreakable-Link are trademarks of Mellanox Technologies, Ltd. All other trademarks are property of their respective owners.
Document Number: MLNX-15-3280, Rev 4.70
Table of Contents
Revision History
About this Manual
  Scope
  Intended Audience
  Documentation Conventions
  Common Abbreviations and Acronyms
  Related Documents
Chapter 1 Introduction
  1.1 Hardware
150. ical PCIe device to present itself multiple times through the PCIe bus. This technology enables multiple virtual instances of the device with separate resources. Mellanox ConnectX-3/ConnectX-3 Pro adapter cards can expose up to 126 virtual instances called Virtual Functions (VFs). These virtual functions can then be provisioned separately. Each VF can be seen as an additional device connected to the Physical Function, and it shares resources with the Physical Function.
SR-IOV is commonly used in conjunction with an SR-IOV-enabled hypervisor to provide virtual machines with direct hardware access to network resources, thereby increasing their performance.
This guide demonstrates the setup and configuration of SR-IOV using the Mellanox ConnectX VPI adapter card family in Windows Server 2012 R2 and above.
An SR-IOV VF is a single-port device.
8.10.1 System Requirements
- A server/blade with an SR-IOV-capable motherboard and BIOS. BIOS settings might need to be updated to enable virtualization support and SR-IOV support.
- Hypervisor OS: Windows Server 2012 R2 and above
- Virtual Machine (VM) OS: Windows Server 2012 or Windows Server 2012 R2
- Mellanox ConnectX-3/ConnectX-3 Pro VPI adapter card family with SR-IOV capability
- Mellanox WinOF 4.61 or higher
8.10.2 SR-IOV Feature Limitations
SR-IOV is supported only in Ethernet ports and can be enab
151. [Figure: Device Manager, System devices list including the Mellanox ConnectX-3 VPI MT04099 Network Adapter]
Step 2. Select the Information tab from the Properties sheet.
[Figure: Mellanox adapter Properties sheet, Information tab, listing driver version, firmware version, port number, bus type, link speed, part number, device and revision IDs, MAC addresses, network status, adapter friendly name and IPv4 address]
To save this information for debug purposes, click Save to File and provide the output file name.
7.3 Configuring the Ethernet Driver
The following steps describe how to configure advanced features.
Step 1. Display the Device Manager.
[Figure: Device Manager]
152. improved algorithms.
ibv_write_bw calculates the bandwidth (BW) of RDMA write between a pair of machines. One machine acts as a server and the other as a client. The client RDMA-writes to the server memory and calculates the BW by sampling the CPU each time it receives a successful completion. The test supports a large variety of features, as described below, and has better performance than ib_write_bw on Nehalem systems.
14.4.11.1 ibv_write_bw Synopsis
ibv_write_bw [-i(b_port) <ib port>] [-d(ib_device) <ib device>] [-c(onnection_type) <RC/UC>] [-m(tu) <mtu size>] [-s(ize) <message size>] [-t(x-depth) <tx size>] [-n <iteration num>] [-p(ort) <PDT port>] [-I(nline_size) <inline size>] [-u(qp_timeout) <timeout>] [-S(l) <sl type>] [-x <gid index>] [-e(vents) <use events>] [-N(o_peak) <use peak calc>] [-F <CPU freq fail>] [-g <num of posts>] [-q <num of qps>] [-b(idirectional)] [-a(ll)] [-V(ersion)]
14.4.11.2 ibv_write_bw Options
The table below lists the various flags of the command.
Table 46: ibv_write_bw Flags and Options
Flag  Description
-p, --port=<port>  Listens on/connects to port <port> (default: 18515)
-d, --ib-dev=<dev>  Uses IB device <device guid> (default: first device found)
-i, --ib-port=<port>  Uses port <port> of IB device (default: 1)
-m, --mtu=<mtu>  The MTU size (default: 1024)
-c, --connection=<RC/UC>  Connection
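The bandwidth figure described above is derived from cycle-counter samples. The following Python sketch shows the arithmetic under stated assumptions: the start and end cycle counts and the CPU frequency in MHz are inputs you supply, the function name write_bw_mb_per_sec is hypothetical, and 1 MB is taken as 2^20 bytes. It is not the tool's actual code.

```python
def write_bw_mb_per_sec(msg_size: int, iterations: int,
                        start_cycles: int, end_cycles: int, cpu_mhz: float) -> float:
    """Bandwidth in MB/s (1 MB = 2**20 bytes) from two cycle-counter samples."""
    if end_cycles <= start_cycles or cpu_mhz <= 0:
        raise ValueError("need end_cycles > start_cycles and a positive CPU frequency")
    elapsed_sec = (end_cycles - start_cycles) / (cpu_mhz * 1_000_000)
    total_bytes = msg_size * iterations
    return total_bytes / elapsed_sec / 2**20


# 1000 writes of 64 KiB measured over 2e9 cycles on a 2000 MHz CPU (1 second)
print(write_bw_mb_per_sec(65536, 1000, 0, 2_000_000_000, 2000.0))  # 62.5
```

Because the elapsed time comes from the cycle counter, an incorrect CPU frequency value directly skews the reported bandwidth.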
153. in time to avoid calculation, and it must be less than half of the duration time.
-S <server interface IP>  Server side only; must be the last parameter.
-C <server interface IP>  Client side only; must be the last parameter.
-h  Shows the Help screen.
14.4.17 nd_send_bw
This test is used for performance measurement of Send requests in Microsoft Windows operating systems. nd_send_bw is performance-oriented for Send with maximum throughput, and runs over Microsoft's NetworkDirect standard. The level of customization available to the user is relatively high: the user may choose to run with a customized message size, a customized number of iterations or, alternatively, a customized test duration. nd_send_bw runs with all message sizes from 1B to 4MB (powers of 2), message inlining and CQ moderation.
14.4.17.1 nd_send_bw Synopsis
<running on specific single core>
Server side: start /b /affinity 0X1 nd_send_bw -s1048576 -D10 -S 11.137.53.1
Client side: start /b /wait /affinity 0X1 nd_send_bw -s1048576 -D10 -C 11.137.53.1
14.4.17.2 nd_send_bw Options
The table below lists the various flags of the command.
Table 52: nd_send_bw Flags and Options
Flag  Description
-h  Shows the Help screen.
-V  Shows the version number.
-p  Connects to the port <port> (default: 68307).
-s <msg size>  Exchanges the message size with <default 65536B>, and it must not b
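To make the "all message sizes from 1B to 4MB (powers of 2)" sweep concrete, the Python sketch below enumerates the sizes such a run would cover. It only lists sizes and does not invoke nd_send_bw; the function name is hypothetical, and reading 4MB as 4 × 2^20 bytes is an assumption.

```python
def all_message_sizes(max_bytes: int = 4 * 2**20) -> list[int]:
    """Powers of two from 1 byte up to and including max_bytes."""
    sizes: list[int] = []
    size = 1
    while size <= max_bytes:
        sizes.append(size)
        size *= 2
    return sizes


sizes = all_message_sizes()
print(len(sizes), sizes[0], sizes[-1])  # 23 1 4194304
```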
154. [Figure: Local Area Connection Properties dialog listing the items used by the connection, including Internet Protocol Version 4 (TCP/IPv4)]
Step 3. Select Internet Protocol Version 4 (TCP/IPv4) from the scroll list and click Properties.
Step 4. Select the "Use the following IP address:" radio button and enter the desired IP information.
[Figure: Internet Protocol Version 4 (TCP/IPv4) Properties dialog, General tab]
Step 5. Click OK.
Step 6. Close the Local Area Connection dialog.
155. ing PowerShell on page 55
  Added the following sections:
  - Section 7.4 Configuring Quality of Service (QoS) on page 34
  - Appendix B NVGRE Configuration Scripts Examples on page 173
Rev 4.55
  December 15, 2013: Updated the following sections:
  - Section 8.8 Network Virtualization using Generic Routing Encapsulation on page 53
  - Section 8.8.2 Configuring the NVGRE using PowerShell on page 55
  November 07, 2013: Updated the following sections:
  - Section 8.7.2.2 Configuring Windows Host on page 51
  - Section 14.4.19.1 NTttcp Synopsis on page 163
  October 03, 2013: Added support for Windows Server 2012 R2
Rev 4.40
  July 17, 2013: Updated the following sections:
  - Section 8.7.1 RoCE Overview on page 49
  - Section 12 OpenSM Subnet Manager on page 99
  - Section 14.4.19 NTttcp on page 162
  - Section 15 Troubleshooting on page 164
  Added the following sections:
  - Appendix A Windows MPI (MS-MPI) on page 170
  June 10, 2013: Updated the following sections:
  - Section 6.2 Downloading Mellanox Firmware Tools on page 27
  - Section 14 InfiniBand Fabric Utilities on page 101
  - Section 15 Troubleshooting on page 164
  - Section 1.3 WinOF Set of Documentation on page 16
  - Section Options on page 87
  Added the following secti
156. ings are used, one for small packets and another one for larger packets. The valid values are:
  - 1: size
  - 2: hash
  - 3: hash and size
  Note: This registry value is not exposed via the UI.
RxSmallPacketBypass (default: 0)
  Specifies whether received small packets bypass larger packets when indicating received packets to NDIS. This mode is useful in bi-directional applications: enabling it ensures that the ACK packet bypasses the regular packet, so the TCP/IP stack issues the next packet more quickly. The valid values are:
  - 0: disable
  - 1: enable
  Note: This registry value is not exposed via the UI.
ReturnPacketThreshold (default: 0)
  The allowed number of free received packets on the rings. Any number above it causes the driver to return the packet to the hardware immediately. When the value is set to 0, the adapter uses 2/3 of the received ring size. The valid values are 0 to 4096.
  Note: This registry value is not exposed via the UI.
NumTcb (default: 16)
  The number of send buffers that the driver allocates for sending purposes. Each buffer is LSO-sized if LSO is enabled, or MTU-sized otherwise. The valid values are 1 up to 64.
  Note: This registry value is not exposed via the UI.
ThreadPoll (default: 10000)
  The number of cycles that should pass without receiving any packet before the polling mechanism stops when using polli
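The ReturnPacketThreshold rule (0 means 2/3 of the receive ring size) can be expressed as a small helper. This Python sketch is illustrative only: the function name is hypothetical, the receive ring size is assumed to be the ReceiveBuffers value, and rounding 2/3 down to a whole packet is an assumption the manual does not state.

```python
def effective_return_packet_threshold(configured: int, rx_ring_size: int) -> int:
    """Free-packet threshold above which received packets go back to the hardware."""
    if not 0 <= configured <= 4096:
        raise ValueError("ReturnPacketThreshold must be between 0 and 4096")
    if configured == 0:
        return rx_ring_size * 2 // 3  # assumption: round down
    return configured


print(effective_return_packet_threshold(0, 1024))    # 682
print(effective_return_packet_threshold(500, 1024))  # 500
```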
157. ion of TCP or UDP checksum over IPv6. The valid values are:
  - 0: disable
  - 1: Tx Enable
  - 2: Rx Enable
  - 3: Tx and Rx enable
ParentBusRegPath
  HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e97d-e325-11ce-bfc1-08002be10318}\0073
C.5 Performance Registry Keys
This group of registry keys configures parameters that can improve adapter performance.
RecvCompletionMethod (default: 1)
  Sets the completion method of the receive packets, which affects network throughput and CPU utilization. The supported methods are:
  - Polling: increases CPU utilization, because the system polls the receive rings for incoming packets; however, it may increase network bandwidth, since incoming packets are handled faster.
  - Adaptive: combines the interrupt and polling methods dynamically, depending on traffic type and network usage.
  The valid values are:
  - 0: polling
  - 1: adaptive
InterruptModeration (default: 1)
  Sets the rate at which the controller moderates or delays the generation of interrupts, making it possible to optimize network throughput and CPU utilization. When disabled, the interrupt moderation of the system generates an interrupt when the packet is received. In this mode, the CPU utilization is increased a
158. is efficient RDMA transport (InfiniBand services) over Ethernet to deliver ultra-low latency for performance-critical and transaction-intensive applications such as financial, database, storage, and content delivery networks.
RoCE encapsulates IB transport and GRH headers in Ethernet packets bearing a dedicated EtherType. While the use of GRH is optional within InfiniBand subnets, it is mandatory when using RoCE. Applications written over IB verbs should work seamlessly, but they require provisioning of GRH information when creating address vectors. The library and driver are modified to provide the mapping from GID to MAC addresses required by the hardware.
8.7.1.1 IP Routable (RoCEv2)
A straightforward extension of the RoCE protocol enables traffic to operate in layer 3 environments. This capability is obtained via a simple modification of the RoCE packet format. Instead of the GRH used in RoCE, routable RoCE packets carry an IP header, which allows traversal of IP L3 routers, and a UDP header that serves as a stateless encapsulation layer for the RDMA Transport Protocol packets over IP.
Figure 3: RoCEv2 and RoCE Frame Format Differences
(In RoCE, the EtherType indicates that the packet is RoCE, i.e., the next header is the IB GRH. In RoCEv2, the EtherType indicates that the packet is IP, the IP protocol number indicates that the packet is UDP, and the UDP dport number indicates that the next header is the IB BTH.)
The propose
159. ith VMQ ... 37
8.2 Header Data Split ... 38
8.3 Receive Side Scaling (RSS) ... 38
8.4 Port Configuration ... 39
8.4.1 Auto Sensing ... 39
8.4.2 Port Protocol Configuration ... 39
8.5 Load Balancing, Fail-Over (LBFO) and VLAN ... 40
8.5.1 Adapter Teaming ... 40
8.5.2 Creating a Load Balancing and Fail-Over (LBFO) Bundle ... 41
8.5.3 Creating a Port VLAN in Windows 2008 R2 ... 44
8.5.4 Removing a Port VLAN in Windows 2008 R2 ... 47
8.5.5 Configuring a Port to Work with VLAN in Windows 2012 and Above ... 48
8.6 Ports TX Arbitration ... 48
8.7 RDMA over Converged Ethernet (RoCE) ... 49
8.7.1 RoCE Overview ... 49
8.7.2 RoCE Configuration ... 50
8.7.3 Configuring SwitchX Based Switch System ... 51
8.7.4 Configuring Arista Switch ... 51
8.7.5 Configuring Router (PFC only) ... 52
8.7.6 Configuring the RoCE Mode ... 52
8.8 Network Virtualization using Generic Routing Encapsulation
160. ivity to a Fault Tolerance bundle while using network capture tools (e.g., Wireshark).
Suggestion: This can happen if the network capture tool captures the network traffic of the non-active adapter in the bundle. This is not allowed, since the tool sets the packet filter to "promiscuous", causing traffic to be transferred on multiple interfaces. Close the network capture tool on the physical adapter card, and set it on the LBFO interface instead.
Issue 7: No Ethernet connectivity on 10Gb adapters after activating Performance Tuning (part of the installation).
Suggestion: This can happen due to adding a TcpWindowSize registry value. To resolve this issue, remove the value key under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpWindowSize or set its value to 0xFFFF.
Issue 8: Packets are being lost.
Suggestion: This may occur if the port MTU has been set to a value higher than the maximum MTU supported by the switch.
Issue 9: Issues not listed above.
The MLNX_EN for Windows driver records events in the system log of the Windows event system. Using the event log, you will be able to identify, diagnose, and predict sources of system problems.
Suggestion: To see the log of events, open System Event Viewer as follows:
1. Right-click on My Computer, click Manage, and then click Event Viewer.
OR
1. Click Start > Run and enter "eventvwr.exe".
2. In Event Viewer, select the system log.
The following events are reco
161. lags and Options ... 157
Table 49 nd_write_lat Options ... 158
Table 50 nd_read_bw Options ... 159
Table 51 nd_read_lat Options ... 160
Table 52 nd_send_bw Flags and Options ... 161
Table 53 nd_send_lat Options ... 162
Table 54 NTttcp Options ... 163
Table 55 Setup Return Codes ... 169
Table 56 Firmware Burning Warning Codes ... 169
Table 57 Restore Configuration Warnings ... 169

Revision History
Table 1: Revision History
Rev 4.70
  June 29, 2014: Updated the following section:
  - Section 8.7.2.2.1 Using Global Pause Flow Control (GFC) on page 51
  May 4, 2014: Updated the following sections:
  - Section 1.3 WinOF Set of Documentation on page 16
  - Section 5.3 Firmware Upgrade on page 27
  - Section 8.10.4.2 Enabling SR-IOV in Mellanox WinOF Package on page 67
  - Section 10.3.1 Verifying Network Adapter Configuration on page 81
  - Section 15.2 Ethernet Troubleshooting on page 164
  Added the following sections:
  - Section 4 Installing Mellanox WinOF Driver on page 20
  - Section 5 Uninstalling Mellanox WinOF Driver on p
162. le if all ports are set as Ethernet. RDMA capability is not available in SR-IOV mode on either port.
8.10.3 Configuring SR-IOV Host Machine
The following are the necessary steps for configuring the host machine.
8.10.3.1 Enabling SR-IOV in BIOS
Depending on your system, perform the steps below to set up your BIOS. The figures used in this section are for illustration purposes only. For further information, please refer to the appropriate BIOS User Manual.
> To enable SR-IOV in BIOS:
Step 1. Make sure the machine's BIOS supports SR-IOV. Please consult the BIOS vendor website for the list of BIOS versions that support SR-IOV. Update the BIOS version if necessary.
Step 2. Follow the BIOS vendor guidelines to enable SR-IOV according to the BIOS User Manual. For example:
a. Enable SR-IOV.
[Figure: BIOS Setup Utility, Advanced > PCI/PnP Settings screen]
163. ler moderates or delays the generation of interrupts, making it possible to optimize network throughput and CPU utilization. The default setting (Adaptive) adjusts the interrupt rates dynamically, depending on the traffic type and network usage. Choosing a different setting may improve network and system performance in certain configurations.
Send completion method
  Sets the completion method of the Send packets, which may affect network throughput and CPU utilization.
Interrupt Moderation TX Packet Count
  Number of packets that need to be sent before an interrupt is generated on the send side (default: 0).
Interrupt Moderation TX Packet Time
  Maximum elapsed time (in usec) between the sending of a packet and the generation of an interrupt, even if the moderation count has not been reached (default: 0).
Offload Options
  Allows you to specify which TCP/IP offload settings are handled by the adapter rather than the operating system. Enabling offloading services increases transmission performance, as the offload tasks are performed by the adapter hardware rather than the operating system, thus freeing CPU resources to work on other tasks.
IPv4 Checksums Offload
  Enables the adapter to compute the IPv4 checksum upon transmit and/or receive instead of the CPU (default: Enabled).
TCP/UDP Checksum Offload for IPv4 packets
  Enables the adapter to compute the TCP/UDP checksum over IPv4 packets u
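The two TX moderation parameters above interact as an "either limit" rule: an interrupt is raised when enough packets have been sent or when the oldest un-signaled packet has waited long enough. The Python sketch below models that decision. It is a simplified model, not the driver's logic; the function and parameter names are hypothetical, and treating a limit of 0 as "interrupt immediately" is an assumption.

```python
def should_raise_tx_interrupt(pending_packets: int, oldest_wait_usec: float,
                              tx_packet_count: int, tx_packet_time_usec: float) -> bool:
    """Decide whether to signal a send-side interrupt (simplified model).

    pending_packets: packets sent since the last interrupt.
    oldest_wait_usec: time since the oldest of those packets was sent.
    """
    if pending_packets == 0:
        return False  # nothing to report
    count_reached = pending_packets >= tx_packet_count
    time_expired = oldest_wait_usec >= tx_packet_time_usec
    return count_reached or time_expired


print(should_raise_tx_interrupt(1, 0.0, 0, 0))    # True: limits of 0
print(should_raise_tx_interrupt(3, 10.0, 8, 50))  # False
print(should_raise_tx_interrupt(3, 60.0, 8, 50))  # True: time limit reached
```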
164. Step 7. Verify the IP configuration by running 'ipconfig' from a CMD console.
> ipconfig
Ethernet adapter Local Area Connection 4:
   Connection-specific DNS Suffix  . :
   IP Address. . . . . . . . . . . . :
   Subnet Mask . . . . . . . . . . . :
   Default Gateway . . . . . . . . . :
7.2 Configuring the InfiniBand Driver
7.2.1 Modifying IPoIB Configuration
> To modify the IPoIB configuration after installation, perform the following steps:
Step 1. Open Device Manager and expand Network Adapters in the device display pane.
Step 2. Right-click the Mellanox IPoIB Adapter entry and left-click Properties.
Step 3. Click the Advanced tab and modify the desired properties.
The IPoIB network interface is automatically restarted once you finish modifying IPoIB parameters. Consequently, it might affect any running traffic.
7.2.2 Displaying Adapter Related Information
To display a summary of network adapter software-, firmware- and hardware-related information, such as driver version, firmware version, bus interface, adapter identity, and network port link information, perform the following steps:
Step 1. Display the Device Manager.
[Figure: Device Manager]
165. ReceiveBuffers (default: 1024)
  The number of packets each ring receives. This parameter affects the memory consumption and the performance. Increasing this value can enhance receive performance, but also consumes more system memory. In case of a lack of received buffers (dropped packets or out-of-order received packets), you can increase the number of received buffers. The valid values are 256 up to 4096.
  Note: On 32-bit systems, the non-pageable memory is limited. As a result, when the MTU is higher than 5000 and the ring size is 2048 or more, the initialization can fail due to a lack of memory. If the MTU is more than 5000, the driver limits the ring size on 32-bit systems to 1024.
TransmitBuffers (default: 2048)
  The number of packets each ring sends. Increasing this value can enhance transmission performance, but also consumes system memory. The valid values are 256 up to 4096.
SpeedDuplex (default: 7)
  The Speed and Duplex settings that a device supports. This registry key should not be changed; it can be used to query the device capability. The Mellanox ConnectX device is set to 7, meaning 10Gbps and Full Duplex.
  Note: The default value should not be modified.
MaxNumOfMCList (default: 128)
  The number of multicast addresses that are filtered by the NIC. If the OS uses more multicast addresses than were defined, it sets the port to multicast promiscuous mode, and the multicast addresses are filtered by the OS at the protocol level. The valid values are
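The 32-bit note above amounts to a cap on the receive ring size. A minimal Python sketch of that rule follows; the function name is hypothetical, and it assumes the driver simply caps the requested value at 1024 (leaving smaller values unchanged) when the MTU exceeds 5000 on a 32-bit system.

```python
def effective_receive_buffers(requested: int, mtu: int, is_32bit_os: bool) -> int:
    """Receive ring size after applying the documented 32-bit limitation."""
    if not 256 <= requested <= 4096:
        raise ValueError("ReceiveBuffers must be between 256 and 4096")
    if is_32bit_os and mtu > 5000:
        return min(requested, 1024)  # 32-bit non-pageable memory limit
    return requested


print(effective_receive_buffers(2048, 9000, is_32bit_os=True))   # 1024
print(effective_receive_buffers(2048, 9000, is_32bit_os=False))  # 2048
```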
166. ly be configured when the virtual switch is created. An external virtual switch with SR-IOV enabled cannot be converted to an internal or private switch.
Step 2. Add a VMNIC connected to a Mellanox vSwitch.
Go to: Hyper-V Manager > Settings > Add New Hardware > Network Adapter > OK.
In the "Virtual Switch" dropdown box, choose Mellanox SR-IOV Virtual Switch.
Figure 15: Adding a VMNIC to a Mellanox V-switch
167. ly the priorities configured as Lossless will be noted in the pause frame. Packets arriving while the buffer is full are dropped immediately.
During packet processing, if the selected descriptor ring has no free descriptors, two modes of handling are available.
8.13.2 Drop Mode
In this mode, a packet arriving at a descriptor ring with no free descriptors is dropped, after verifying that there really are no free descriptors. This isolates host driver execution delays from the network, and also isolates the different SW entities sharing the adapter (e.g., SR-IOV VMs) from one another.
8.13.3 Poll Mode
In this mode, a packet arriving at a descriptor ring with no free descriptors waits until a free descriptor is posted. All processing for this packet and the following packets is halted while the free descriptor status is polled. This behavior propagates the backpressure into the Rx buffer, which accumulates incoming packets. When the XOFF threshold is crossed, the Flow Control mechanisms mentioned earlier stop the remote transmitters, thus preventing packets from being dropped.
Since this mode breaks the aforementioned isolation, the adapter offers a mitigation mechanism that limits the amount of time a packet may wait for a free descriptor while all packet processing is halted. When the allowed time expires, the adapter reverts to the Drop Mode behavior.
8.13.4 Default behavior
168. make sure Flow Control is enabled for both TX and RX.
4. To eliminate QoS and Flow Control as the performance-degrading factor, set all devices to run with Global Pause and rerun the tests:
- Set Global Pause on the switches/routers.
- Run "Disable-NetAdapterQos" on all of the hosts in a PowerShell window.
15.4 General Troubleshooting
Issue 1: Running Windows as a VM over ESX with Mellanox HCAs.
Virtual machines with Windows 2008 and later guest OS fail to power on (ConnectX adapter network cards are connected as Direct I/O PCI devices).
To solve this issue, perform the following steps:
1. Right-click the virtual machine and select Edit Settings.
2. Click the Options tab and expand Advanced.
3. Click Edit Configuration.
4. Click Add Row.
5. Add the parameter to the new row:
   a. In the Name column, add pciPassthru0.maxMSIXvectors.
   b. In the Value column, add 31.
6. Click OK, and click OK again.
For further details, please refer to:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2032981
Issue 2: Hyper-V environment.
In a Hyper-V environment, the Enable-NetAdapterVmq PowerShell command can enable VMQ on a network adapter only if a virtual switch that does not have SR-IOV enabled is defined over the corresponding network adapter.
Suggestion: This is because the result of the PowerShell command depends on 2 registry fields: VMQ an
169. mand on MPI server:
mpiexec.exe -p <smpd_port> -hosts <num_of_hosts> <hosts_ip_list> -env MPICH_NETMASK <network_ip/subnet> -env MPICH_ND_ZCOPY_THRESHOLD -1 -env MPICH_DISABLE_ND <0/1> -env MPICH_DISABLE_SOCK <0/1> -affinity <process>
A.3 Directing MSMPI Traffic
Directing MPI traffic to a specific QoS priority may be delayed due to the following:
- Except for NetDirectPortMatchCondition, the QoS PowerShell cmdlet for NetworkDirect traffic does not support port ranges. Therefore, NetworkDirect traffic cannot be directed to ports 1-65536.
- The MSMPI directive to control the port range (namely: MPICH_PORT_RANGE 3000,3030) does not work for ND, and MSMPI chooses a random port.
A.4 Running MSMPI on the Desired Priority
Step 1. Set the default QoS policy to be the desired priority. (Note: this priority should be lossless all the way in the switches.)
Step 2. Set the SMB policy to a desired priority only if SMB traffic is running.
Step 3. (Recommended) Direct ALL TCP/UDP traffic to a lossy priority by using the "IPProtocolMatchCondition".
TCP is used for the MPI control channel (smpd), while UDP is used for other services such as remote desktop.
Arista switches forward the PCP bits (e.g., the 802.1p priority within the VLAN tag) from ingress to egress, enabling any two end nodes in the fabric to maintain the priority along the route.
In this case, the packet from the sender goe
170. me (REG_SZ)
  Values:
  - 0: disabled (default)
  - 1-65534: enables the feature, but the chosen value limits the amount of time a packet may wait for a free descriptor
  - 65535: enabled
  The value is in units of 100 microseconds, with an inaccuracy of up to 2 units. The chosen time ranges between 100 microseconds and 6.5 seconds. For example, DelayDropTimeout=3000 limits the wait time to 300 milliseconds ± 200 microseconds. Choosing the value 65535 enables the feature, but the amount of time a packet may wait for a free descriptor is unlimited.
  Note: Changing the value of the DelayDropTimeout registry key requires a restart of the network interface.
8.13.7.2 Entering/Exiting Lossless Mode Using Set OID (OID_MLX_DROPLESS_MODE)
To enter poll mode, the DelayDropTimeout registry value should be non-zero, and the OID_MLX_DROPLESS_MODE Set OID should be called with an Information Buffer containing 1.
- OID_MLX_DROPLESS_MODE value: 0xFFA0C932
- OID Information Buffer Size: 1 byte
- OID Information Buffer Contents:
  - 0: exit poll mode
  - 1: enter poll mode
8.13.8 Monitoring Lossless TCP State
To allow monitoring of state transitions, events are written to the event log with mlx4_bus as the source. The associated events are listed in Table 9.
Table 9: Lossless TCP Associated Events
Event ID  Event Description
0x0057  Device Name: Dropless mode entered on port X. Packets will not be droppe
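Because DelayDropTimeout is expressed in 100-microsecond units, converting a registry value to a wait limit is simple arithmetic. The Python sketch below performs that conversion; the function name is hypothetical, the registry value (stored as REG_SZ) is assumed to have already been parsed into an integer, and returning 0.0 for the disabled case (packets are dropped rather than held) is a modeling choice.

```python
import math

UNIT_US = 100               # one DelayDropTimeout unit is 100 microseconds
TOLERANCE_US = 2 * UNIT_US  # documented inaccuracy of up to 2 units


def delay_drop_wait_limit_ms(value: int) -> float:
    """Maximum time (ms) a packet may wait for a free descriptor.

    0        -> feature disabled (drop mode), so 0.0
    1..65534 -> value * 100 us
    65535    -> no limit (math.inf)
    """
    if not 0 <= value <= 65535:
        raise ValueError("DelayDropTimeout must be between 0 and 65535")
    if value == 65535:
        return math.inf
    return value * UNIT_US / 1000


print(delay_drop_wait_limit_ms(3000))   # 300.0 (+/- 0.2 ms)
print(delay_drop_wait_limit_ms(65534))  # 6553.4, i.e. about 6.5 seconds
```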
171. merRoute -RoutingDomainID "{11111111-2222-3333-4444-000000005001}" -VirtualSubnetID <virtualsubnetID> -DestinationPrefix <VMInterfaceIPAddress/Mask> -NextHop "0.0.0.0" -Metric 255
Step 7. Configure the Provider Address and Route records on each Hyper-V Host using an appropriate interface name and IP address:
$NIC = Get-NetAdapter <EthInterfaceName>
New-NetVirtualizationProviderAddress -InterfaceIndex $NIC.InterfaceIndex -ProviderAddress <HypervisorInterfaceIPAddress> -PrefixLength 24
New-NetVirtualizationProviderRoute -InterfaceIndex $NIC.InterfaceIndex -DestinationPrefix "0.0.0.0/0" -NextHop <HypervisorInterfaceIPAddress>
Step 8. Configure the Virtual Subnet ID on the Hyper-V Network Switch Ports for each Virtual Machine on each Hyper-V Host (Host 1 and Host 2):
Get-VMNetworkAdapter -VMName <VMName> | where {$_.MacAddress -eq "<VMmacaddress1>"} | Set-VMNetworkAdapter -VirtualSubnetID <virtualsubnetID>
Please repeat steps 5 to 8 on each Hyper-V host after rebooting the hypervisor.
8.8.3 Verifying the Encapsulation of the Traffic
Once the configuration using PowerShell is completed, you can verify that packets are indeed encapsulated as configured with any packet capturing utility. If configured correctly, an encapsulated packet should appear as a packet consisting of the following headers:
Outer ETH Header, Outer IP, GRE Header, Inner ETH Header, Original Ethernet
172. [Figure: Device Manager, System devices list (Intel 5000 Series chipset entries), including two Mellanox ConnectX-3 MT04099 Network Adapter entries]
173. mode.
Note: If the total number of VFs requested is larger than the number of VFs burnt in firmware, each port X (1/2) will have the number of VFs given by the following formula:
$$\mathrm{VFs}_X = \frac{\mathrm{MaxVFPort}X}{\mathrm{MaxVFPort1} + \mathrm{MaxVFPort2}} \times N_{\mathrm{FW}}$$
where $N_{\mathrm{FW}}$ is the number of VFs burnt in firmware.
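The note's proportional split can be sketched in Python as follows. The function name is hypothetical, MaxVFPort1/MaxVFPort2 are treated as the per-port VF counts requested, and rounding each port's share down to a whole VF is an assumption, since the manual does not say how fractional results are handled.

```python
def vfs_per_port(max_vf_port1: int, max_vf_port2: int, vfs_in_firmware: int) -> tuple[int, int]:
    """Split the firmware VF budget between the two ports."""
    requested = max_vf_port1 + max_vf_port2
    if requested <= vfs_in_firmware:
        return max_vf_port1, max_vf_port2  # request fits, no scaling needed
    # Scale each port's request by its share of the total (rounded down).
    port1 = max_vf_port1 * vfs_in_firmware // requested
    port2 = max_vf_port2 * vfs_in_firmware // requested
    return port1, port2


print(vfs_per_port(8, 8, 8))    # (4, 4)
print(vfs_per_port(10, 5, 12))  # (8, 4)
```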
174. n names. This option automatically sets the driver registry values back to their default values:
  - SendCompletionMethod: 0 (IPoIB), 1 (ETH)
  - RecvCompletionMethod: 2
  - ReceiveBuffers: 1024
  - UseRSSForRawIP: 1
  - DefaultRecvRingProcessor: 1
  - CxInterruptProcessor: 1
  - TxForwardingProcessor: 1
  - UseRSSForUDP: 1
  - In operating systems that support NDIS 6.2: MaxRssProcessors: 8
  - In operating systems that support NDIS 6.3: NumRSSQueues: 8
-c1  Specifies the first connection name. See examples.
-c2  Specifies the second connection name. See examples.
-b  Specifies the base RSS processor number. See examples. Used for the manual option (-m) only.
-n  Specifies the number of RSS processors. See examples. Used for the manual option (-m) only.
-st  Single stream traffic scenario. This option must be followed by one or two connection names for an Ethernet adapter. The tuning restores the default settings on the second connection and is performed on the first connection. This option automatically sets:
  - SendCompletionMethod: 0
  - RecvCompletionMethod: 2
  - ReceiveBuffers: 1024
  - In operating systems that support NDIS 6.3: RssProfile: 4
  Additionally, this option chooses the best processors to assign to:
  - DefaultRecvRingProcessor
  - CxInterruptProcessor
  - TxForwardingProcessor
  - In operating systems that support NDIS 6.2: RssBaseProcNumber, MaxRssProcessors
  - In Operating System
175. …n the connectivity lines. The IB node is identified, followed by the number of ports and the node GUID. On the right of this line is a comment (#) followed by the NodeDescription in quotes. If the node is a switch, this line also states whether switch port 0 is base or enhanced, and gives the LID and LMC of port 0. Subsequent lines pertaining to this node show the connectivity. On the left is the port number of the current node. On the right is the peer node (the node at the other end of the link). It is identified in quotes by its node type, followed by a hyphen, followed by its NodeGUID, with the port number in square brackets. Further to the right is a comment (#). What follows the comment depends on the node type. For a switch node, it is the NodeDescription in quotes and the LID of the peer node. For a CA or router node, it is the local LID and LMC, followed by the NodeDescription in quotes and the LID of the peer node. The active link width and speed are then appended to the end of the line.
When grouping is used, InfiniBand nodes are organized into chassis, which are numbered. Nodes that cannot be determined to be in a chassis are displayed as "Non-Chassis Nodes". External ports are also shown on the connectivity lines.
Example:
# Topology file: generated on Tue Jun 5 14:15:10 2007
# Max of 3 hops discovered
# Initiated from node 0008f10403960558 port 0008f10403960559
Non-Chassis Nodes
vendid=0x8f1
devid=0x5a0…
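The connectivity-line layout described above lends itself to a small parser. The following Python sketch assumes the line shape `[local port] "<type>-<NodeGUID>"[peer port](optional port GUID)  # comment`, inferred from the description. The sample line's description, LID and speed are hypothetical.

```python
import re

# Assumed layout of a connectivity line, following the description above:
# [local port] "<type>-<NodeGUID>"[peer port](optional peer port GUID)  # comment
LINK_RE = re.compile(
    r'^\[(?P<local_port>\d+)\]\s*'
    r'"(?P<peer_type>[A-Z])-(?P<peer_guid>[0-9a-fA-F]+)"'
    r'\[(?P<peer_port>\d+)\]'
    r'(?:\([0-9a-fA-F]+\))?'
    r'\s*#\s*(?P<comment>.*)$'
)
QUOTED_RE = re.compile(r'"([^"]*)"')


def parse_link(line):
    """Return the fields of one connectivity line, or None for other lines."""
    match = LINK_RE.match(line.strip())
    if match is None:
        return None
    comment = match.group("comment")
    desc = QUOTED_RE.search(comment)  # peer NodeDescription, if quoted
    return {
        "local_port": int(match.group("local_port")),
        "peer_type": match.group("peer_type"),
        "peer_guid": match.group("peer_guid").lower(),
        "peer_port": int(match.group("peer_port")),
        "peer_desc": desc.group(1) if desc else None,
        "comment": comment,
    }


if __name__ == "__main__":
    # Illustrative line; the description and trailing fields are hypothetical.
    sample = '[1]\t"H-0008f10403960558"[1](8f10403960559)\t# "host-a HCA-1" lid 4 4xSDR'
    print(parse_link(sample))
```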
176. …n zero VLAN ID (i.e., priority tagged). Default: 0x0 (for DSCP-based PFC, set to 0x1).
RxUntaggedMapToLossless: If 0x1, all untagged traffic is mapped to the lossless receive queue. Default: 0x0 (for DSCP-based PFC, set to 0x1).
RroceDscpMarkPriorityFlowControl_<ID>: A value to mark DSCP for RoCE v2 packets assigned to CoS ID when priority flow control is enabled. The valid values range from 0 to 63. Default: the ID value (e.g., RroceDscpMarkPriorityFlowControl_3 is 3). ID values range from 0 to 7.
For changes to take effect, restart the network adapter after changing this registry key.
8.9.5.1 Default Settings
When DSCP configuration registry keys are missing in the miniport registry, the following defaults are assigned:
Table 8: DSCP Default Registry Keys Settings
Registry Key                          Default Value
TxUntagPriorityTag                    0
RxUntaggedMapToLossless               0
RroceDscpMarkPriorityFlowControl_0    0
RroceDscpMarkPriorityFlowControl_1    1
RroceDscpMarkPriorityFlowControl_2    2
RroceDscpMarkPriorityFlowControl_3    3
RroceDscpMarkPriorityFlowControl_4    4
RroceDscpMarkPriorityFlowControl_5    5
RroceDscpMarkPriorityFlowControl_6    6
RroceDscpMarkPriorityFlowControl_7    7
8.9.6 DSCP Sanity Testing
To verify that all QoS and DSCP settings were correct…
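The DSCP defaults in Table 8 follow a simple rule: priority ID n is marked with DSCP n unless a registry key overrides it. The following Python sketch models that merge and the documented value ranges (ID 0-7, DSCP 0-63). It does not read the registry. The function names and the DSCP 26 override are hypothetical.

```python
def default_dscp_marks():
    """Table 8 defaults: priority ID n is marked with DSCP value n."""
    return {priority: priority for priority in range(8)}


def effective_dscp_marks(overrides):
    """Merge RroceDscpMarkPriorityFlowControl_<ID> overrides into the defaults.

    `overrides` maps a priority ID (0-7) to a DSCP value (0-63).
    """
    marks = default_dscp_marks()
    for priority, dscp in overrides.items():
        if not 0 <= priority <= 7:
            raise ValueError(f"priority ID {priority} is outside 0-7")
        if not 0 <= dscp <= 63:
            raise ValueError(f"DSCP value {dscp} is outside 0-63")
        marks[priority] = dscp
    return marks


def as_registry_values(marks):
    """Render the marks under the registry key names used in this section."""
    return {f"RroceDscpMarkPriorityFlowControl_{p}": v for p, v in sorted(marks.items())}


if __name__ == "__main__":
    # Hypothetical override: mark priority 3 with DSCP 26.
    for name, value in as_registry_values(effective_dscp_marks({3: 26})).items():
        print(name, value)
```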
177. …ncing and Fail Over (LBFO) Bundle
LBFO balances the workload of packet transfers by distributing it over a bundle of network instances. It also sets a secondary network instance to take over packet indications and information requests if the primary network instance fails.
The following steps describe the process of creating an LBFO bundle.
Step 1. Display the Device Manager.
[Figure: Device Manager with Network adapters expanded, listing Broadcom NetXtreme Gigabit Ethernet, Broadcom NetXtreme Gigabit Ethernet #2, Mellanox ConnectX-3 Ethernet Adapter and Mellanox ConnectX-3 Ethernet Adapter #2]
Step 2. Right-click a Mellanox ConnectX 10Gb Ethernet adapter under the Network adapters list and left-click Properties. Select the LBFO tab from the Properties window.
To create a new bundle, perform the following:
Step 1. Click Create.
Step 2. Enter a unique bundle name.
Step 3. Select a bundle type.
Step 4. Select the adapters to be included in the bundle (adapters that have not been associated with a VLAN).
Step 5. (Optional) Select Primary Adapter. An active-passive scenari…
178. …nd others).
-s, --sm_port <smlid>: Use <smlid> as the target LID for SM/SA queries.
-C, --Ca <ca_name>: Use the specified channel adapter or router.
-P, --Port <ca_port>: Use the specified port.
-t, --timeout <timeout_ms>: Override the default timeout for the solicited MADs (msec).
<dest dr_path|lid|guid>: Destination's directed path, LID or GUID.
<startlid>: Starting LID in an MLID range.
<endlid>: Ending LID in an MLID range.
Examples:
1. Dump all LIDs with valid out ports of the switch with LID 2:
> ibroute 2
Unicast lids [0x0-0x8] of switch Lid 2 guid 0x0002c902fffff00a (MT47396 Infiniscale-III Mellanox Technologies):
  Lid  Out   Destination
       Port  Info
0x0002 000 : (Switch portguid 0x0002c902fffff00a: 'MT47396 Infiniscale-III Mellanox Technologies')
0x0003 021 : (Switch portguid 0x000b8cffff004016: 'MT47396 Infiniscale-III Mellanox Technologies')
0x0006 007 : (Channel Adapter portguid 0x0002c90300001039: 'sw137 HCA-1')
0x0007 021 : (Channel Adapter portguid 0x0002c9020025874a: 'sw157 HCA-1')
0x0008 008 : (Channel Adapter portguid 0x0002c902002582cd: 'sw136 HCA-1')
5 valid lids dumped
2. Dump all LIDs in the range 3 to 7 with valid out ports of the switch with LID 2:
> ibroute 2 3 7
Unicast lids [0x3-0x7] of switch Lid 2 guid 0x0002c902fffff00a (MT47396 Infiniscale-III Mell…
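An ibroute unicast dump can be turned into a LID-to-port map for scripting. The following Python sketch assumes each entry line has the form `<lid> <out port> : (<destination info>)`, as in the example dump. The function name `parse_unicast_dump` is hypothetical.

```python
import re

# One routing entry: "<lid> <out port> : (<destination info>)".
ENTRY_RE = re.compile(r"^(0x[0-9a-fA-F]+)\s+(\d+)\s*:?\s*(.*)$")


def parse_unicast_dump(text):
    """Map each LID in an ibroute unicast dump to (out_port, destination_info)."""
    routes = {}
    for line in text.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match is None:
            continue  # header lines and the "N valid lids dumped" trailer
        lid = int(match.group(1), 16)
        info = match.group(3).strip().strip("()")
        routes[lid] = (int(match.group(2)), info)
    return routes


if __name__ == "__main__":
    sample = "\n".join([
        "0x0002 000 : (Switch portguid 0x0002c902fffff00a: 'MT47396 Infiniscale-III Mellanox Technologies')",
        "0x0006 007 : (Channel Adapter portguid 0x0002c90300001039: 'sw137 HCA-1')",
        "2 valid lids dumped",
    ])
    for lid, (port, info) in parse_unicast_dump(sample).items():
        print(f"LID {lid:#06x} -> port {port}: {info}")
```

Port 000 in the dump is the switch's own management port, so the entry for the switch's own LID routes to port 0.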
179. …nd reset
perfquery -e -r 32 1        # read extended performance counters and reset
perfquery -R 0x20 1         # reset performance counters of port 1 only
perfquery -e -R 0x20 1      # reset extended performance counters of port 1 only
perfquery -R -a 32          # reset performance counters of all ports
perfquery -R 32 2 0x0fff    # reset only error counters of port 2
perfquery -R 32 2 0xf000    # reset only non-error counters of port 2
1. Read the local port's performance counters:
> perfquery
2. Read performance counters from LID 2, all ports.
3. Read, then reset performance counters from LID 2, port 1.
[Figure: sample perfquery output listing port counters such as RcvErrors, RcvRemotePhysErrors, RcvSwRelayErrors, XmtDiscards, XmtConstraintErrors, RcvConstraintErrors, ExcBufOverrunErrors, VL15Dropped, XmtData, RcvData, XmtPkts and RcvPkts]
14.3.8 ibping
ibping uses vendor MADs to validate connectivity between IB nodes. On exit, (IP) ping-like output is shown. ibping runs as a client/server pair; however, the default is to run it as a client. Note also that, in addition to ibping, a default server is implemented within the kernel.
14.3.8.1 ibping Synopsis
ibping [-d(ebug)] [-e(rr_show)] [-v(erbose)] [-G(uid)] [-C ca_name] [-P ca_port] [-s smlid] [-t(imeout) timeout_ms] [-V(er…
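When scripting counter resets, it can help to build the perfquery argument list programmatically. The following Python sketch uses only the flags shown in the examples above (-e, -r, -R, -a). The helper name `perfquery_args` is hypothetical, and it only builds the list without running anything. The two masks split the counters into error (0x0fff) and non-error (0xf000) groups that do not overlap.

```python
ERROR_COUNTERS = 0x0FFF      # mask used above to reset only error counters
NON_ERROR_COUNTERS = 0xF000  # mask used above to reset only non-error counters


def perfquery_args(lid, port=None, *, extended=False, read_then_reset=False,
                   reset_only=False, all_ports=False, mask=None):
    """Build a perfquery argument list from the flags shown in the examples.

    Nothing is executed; pass the list to subprocess.run() to run it.
    """
    args = ["perfquery"]
    if extended:
        args.append("-e")
    if read_then_reset:
        args.append("-r")
    if reset_only:
        args.append("-R")
    if all_ports:
        args.append("-a")
    args.append(str(lid))  # 0x20 and 32 address the same LID
    if port is not None:
        args.append(str(port))
    if mask is not None:
        if port is None:
            raise ValueError("a counter mask follows the port number")
        args.append(f"0x{mask:04x}")
    return args


if __name__ == "__main__":
    print(" ".join(perfquery_args(32, 2, reset_only=True, mask=ERROR_COUNTERS)))
```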
180. …nd sets it to NDIS on driver start-up. Value Type: DWORD. The valid values are 0 up to the number of processors on the system.
CheckFwVersion (default 1): Configures the Mellanox driver to skip validation of the FW compatibility with the driver version. Skipping this check is not recommended and can cause unexpected behavior. Use it for testing purposes only. Value Type: DWORD. The valid values are: 0 - Don't check; 1 - Check.
MaximumWorkingThreads (default 2): The number of working threads that can work simultaneously on receive polling. By default, the Mellanox driver creates a working thread for each Rx ring if polling or adaptive receive completion is set. Value Type: DWORD. The valid values are 1 up to the number of Rx rings.
C.9 SR-IOV Registry Keys
The SR-IOV feature can be controlled at the machine level or per device, using the same set of registry keys. However, only one level should be used, consistently, to control the SR-IOV feature. If both levels are used, the driver enforces the per-machine configuration.
Registry keys location for machine configuration:
HKLM\SYSTEM\CurrentControlSet\Services\mlx4_bus\Parameters
Registry keys location for device configuration:
HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e97d-e325-11ce-bfc1-08002be10318}\<nn>\Parameters
For more information on how to find the device index nn, please refer to C.1 "Findi…
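The following Python sketch models the precedence rule above: machine-level keys, when present, override the per-device keys. It also builds the two key paths. It works on plain dictionaries instead of the real registry, since reading the registry (for example with the Windows-only `winreg` module) is outside its scope. The function names and the device index "0001" are hypothetical.

```python
MACHINE_KEY = "\\".join(
    ["HKLM", "SYSTEM", "CurrentControlSet", "Services", "mlx4_bus", "Parameters"])
SYSTEM_CLASS_GUID = "{4d36e97d-e325-11ce-bfc1-08002be10318}"


def device_key(nn):
    """Per-device Parameters key; `nn` is the device index described in C.1."""
    return "\\".join(["HKLM", "SYSTEM", "CurrentControlSet", "Control", "Class",
                      SYSTEM_CLASS_GUID, nn, "Parameters"])


def effective_sriov_config(machine_values, device_values):
    """Return (level, values) for the configuration the driver enforces.

    Machine-level keys win whenever any are present; otherwise the
    per-device keys apply.
    """
    if machine_values:
        return "machine", dict(machine_values)
    return "device", dict(device_values)


if __name__ == "__main__":
    print(MACHINE_KEY)
    print(device_key("0001"))  # hypothetical device index
    print(effective_sriov_config({"SriovEnable": 1}, {"SriovEnable": 0}))
```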
181. …ne size inline_size] [type] [-x gid_index] [iteration_num] [group] [-p(ort) PDT_port] [-a(ll)] [cycles] [-H report_histogram] [CPU freq fail] [-c(onnection_type) RC/UC/UD] [-d ib_device_name] [-t(x-depth) tx_size] [-u(qp-timeout)] [-S(L) sl] [-e(vents) use_events] [-g num_of_qps_in_mcast_group] [-V(ersion)] [-C report_cycles] [-U report_unsorted] [-F]
14.4.10.2 ibv_send_lat Options
The table below lists the various flags of the command.
Table 45: ibv_send_lat Flags and Options
-p, --port=<port>: Listens on/connects to port <port> (default 18515)
-d, --ib-dev=<dev>: Uses IB device <device guid> (default: first device found)
-i, --ib-port=<port>: Uses port <port> of the IB device (default 1)
-m, --mtu=<mtu>: The MTU size (default 1024)
-c, --connection=<RC/UC/UD>: Connection type RC/UC/UD (default RC)
-s, --size=<size>: The size of message to exchange (default 65536)
-a, --all: Runs sizes from 2 till 2^23
-t, --tx-depth=<dep>: The size of the tx queue (default 100)
-n, --iters=<iters>: The number of exchanges (at least 2, default 1000)
-u, --qp-timeout=<timeout>: QP timeout. The timeout value is 4 usec * 2^timeout (default 14)
-S, --sl=<sl>: The service level (default 0)
-x, --gid-index=<index>: Test uses GID with GID index taken from command line (for RDMAoE index should be…
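Two of the options above encode values as powers of two: the QP timeout (4 usec * 2^timeout) and the --all size sweep (2 to 2^23 bytes). The following Python sketch works both out. The function names are hypothetical.

```python
def qp_timeout_usec(timeout):
    """QP timeout from the -u/--qp-timeout exponent: 4 usec * 2^timeout."""
    if timeout < 0:
        raise ValueError("timeout exponent must be non-negative")
    return 4 * 2 ** timeout


def all_message_sizes(max_exponent=23):
    """Message sizes covered by -a/--all: powers of two from 2 to 2^23 bytes."""
    return [2 ** e for e in range(1, max_exponent + 1)]


if __name__ == "__main__":
    print(qp_timeout_usec(14))              # 65536 usec, about 65.5 ms (default)
    sizes = all_message_sizes()
    print(len(sizes), sizes[0], sizes[-1])  # 23 2 8388608
```

With the default exponent of 14, a QP waits roughly 65.5 ms before timing out.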
182. …ng completion method for receiving. Afterwards, receiving new packets generates an interrupt that reschedules the polling mechanism. The valid values are 0 up to 200000. Note: This registry value is not exposed via the UI.
AverageFactor (default 16): The weight of the last polling in the decision whether to continue polling or give up when using the polling completion method for receiving. The valid values are 0 up to 256. Note: This registry value is not exposed via the UI.
AveragePollThreshold (default 10): The average threshold polling number when using the polling completion method for receiving. If the average number is higher than this value, the adapter continues to poll. The valid values are 0 up to 1000. Note: This registry value is not exposed via the UI.
ThisPollThreshold (default 100): The threshold number of the last polling cycle when using the polling completion method for receiving. If the number of packets received in the last polling cycle is higher than this value, the adapter continues to poll. The valid values are 0 up to 1000. Note: This registry value is not exposed via the UI.
HeaderDataSplit (default 0): Enables the driver to use header data split. In this mode, the adapter uses two buffers to receive the packet. The first buffer holds the header, while the second buffer holds the data. This method reduces the cache hits and improves t…
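The following Python sketch shows how the three polling thresholds could combine. The continue-polling test follows the two documented conditions (average above AveragePollThreshold, or last poll above ThisPollThreshold). The averaging formula is hypothetical: it assumes AverageFactor/256 is the weight given to the last poll, which the manual does not specify.

```python
def update_average(average, packets, average_factor=16):
    """Blend the last poll into the running average (hypothetical weighting).

    Assumption: AverageFactor/256 is the weight given to the last poll.
    """
    weight = average_factor / 256
    return average + weight * (packets - average)


def keep_polling(average, packets, average_poll_threshold=10, this_poll_threshold=100):
    """Continue polling if either documented threshold is exceeded."""
    return average > average_poll_threshold or packets > this_poll_threshold


if __name__ == "__main__":
    avg = 0.0
    for packets in (150, 40, 5):
        avg = update_average(avg, packets)
        print(packets, round(avg, 3), keep_polling(avg, packets))
```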
183. …ng the Index Value of the HCA" on page 176.
SriovEnable (REG_DWORD): 0 = RoCE (default); 1 = SR-IOV. Configures the RDMA or SR-IOV mode. Note: RDMA is not supported in SR-IOV mode.
SriovPortMode (REG_DWORD): 0 = auto_port1 (default); 1 = auto_port2; 2 = manual. Configures the number of VFs to be enabled by the bus driver for each port. Note: In auto_portX mode, port X will have the number of VFs according to the burnt value in the device, and the other port will have no SR-IOV and will support native Ethernet (i.e., no RoCE). Setting this parameter to manual configures the number of VFs for each port according to the registry key MaxVFPortX. Note: The number of VFs can be configured both at the Mellanox bus driver level and at the Network Interface level (i.e., using the Set-NetAdapterSriov PowerShell cmdlet). The number of VFs actually available to the Network Interface is the minimum of the Mellanox bus driver configuration and the Network Interface configuration. For example, if support for 8 VFs was burnt in firmware, SriovPortMode is auto_port1, and the Network Interface was allowed 32 VFs using the Set-NetAdapterSriov PowerShell cmdlet, the actual number of VFs available to the Network Interface will be 8.
MaxVFPort1, MaxVFPort2 (MaxVFPort<i>): 16 (default). The maximum number of VFs that are allowed per port. This is the number of VFs the bus driver will open when working in manual…
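The VF count the Network Interface finally sees depends on two things: the SriovPortMode value and the minimum rule. The following Python sketch reproduces the worked example (8 VFs burnt, auto_port1, 32 allowed through Set-NetAdapterSriov). The function names are hypothetical. Manual mode is modelled only when the request fits the burnt-in count; the oversubscribed case follows the formula in the MaxVFPort note and is left out.

```python
PORT_MODES = {0: "auto_port1", 1: "auto_port2", 2: "manual"}


def bus_driver_vfs(sriov_port_mode, burnt_vfs, max_vf_port1=16, max_vf_port2=16):
    """VFs the bus driver opens per port, as (port 1, port 2)."""
    mode = PORT_MODES[sriov_port_mode]
    if mode == "auto_port1":
        return burnt_vfs, 0  # port 2 has no SR-IOV (native Ethernet)
    if mode == "auto_port2":
        return 0, burnt_vfs
    if max_vf_port1 + max_vf_port2 > burnt_vfs:
        # Oversubscribed manual mode follows the formula in the MaxVFPort note;
        # it is not modelled in this sketch.
        raise NotImplementedError("requested VFs exceed the burnt-in count")
    return max_vf_port1, max_vf_port2


def vfs_available_to_nic(bus_vfs, nic_allowed_vfs):
    """The Network Interface sees the smaller of the two configured limits."""
    return min(bus_vfs, nic_allowed_vfs)


if __name__ == "__main__":
    # The example above: 8 VFs burnt, auto_port1, Set-NetAdapterSriov allows 32.
    port1, _ = bus_driver_vfs(0, burnt_vfs=8)
    print(vfs_available_to_nic(port1, 32))  # 8
```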
184. …nnect to port <port> (default 18515)
-d, --ib-dev=<dev>: Uses IB device <device guid> (default: first device found)
-i, --ib-port=<port>: Uses port <port> of the IB device (default 1)
-m, --mtu=<mtu>: The MTU size (default 1024)
-o, --outs=<num>: The number of outstanding read/atom operations (default 4)
-s, --size=<size>: The size of message to exchange (default 65536)
-a, --all: Runs sizes from 2 till 2^23
-t, --tx-depth=<dep>: The size of the tx queue (default 100)
-n, --iters=<iters>: The number of exchanges (at least 2, default 1000)
-C, --report_cycles: Reports times in CPU cycle units (default: microseconds)
-H, --report_histogram: Prints out all results (default: print summary only)
-U, --report_unsorted: (implies -H) Prints out unsorted results (default: sorted)
-V, --version: Displays the version number
Table 37: ib_read_lat Flags and Options (continued)
--grh: Use GRH with packets (mandatory for RoCE)
14.4.3 ib_send_bw
ib_send_bw calculates the BW of SEND between a pair of machines. One acts as a server and the other as a client. The server receives packets from the client, and they both calculate the throughput of the operation. The test supports features such as Bidirectional, in which they both send and receive at the same time, ch…
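The throughput that ib_send_bw reports is, in essence, bytes moved divided by elapsed time. The following Python sketch shows that arithmetic for the table defaults (65536-byte messages, 1000 iterations). It assumes MB means 2^20 bytes, and the function name and elapsed times are hypothetical. It is not the tool's actual measurement code.

```python
def bandwidth_mb_per_sec(msg_size, iterations, elapsed_sec, megabyte=2 ** 20):
    """Average SEND bandwidth: bytes moved divided by elapsed time.

    Assumption: MB means 2^20 bytes; pass megabyte=10**6 for decimal units.
    """
    if elapsed_sec <= 0:
        raise ValueError("elapsed time must be positive")
    return msg_size * iterations / elapsed_sec / megabyte


if __name__ == "__main__":
    # Hypothetical run using the table defaults (65536-byte messages, 1000 iterations).
    print(bandwidth_mb_per_sec(65536, 1000, 0.5))
```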
185. …nt to burn.
1007: Found one device for which force update is required. Troubleshooting: Burn the firmware manually with the force flag.
1008: Found one device that has mixed versions. Troubleshooting: The firmware version or the expansion ROM version does not match.
For additional details, please refer to the MFT User Manual: http://www.mellanox.com > Products > Firmware Tools.
15.5.3 Restore Configuration Warnings
Table 57: Restore Configuration Warnings
3: Failed to restore the configuration. Troubleshooting: Please see the log for more details and contact the support team.
Appendix A: Windows MPI (MS-MPI)
A.1 Overview
Message Passing Interface (MPI) is meant to provide virtual topology, synchronization and communication functionality between a set of processes. With MPI, you can run one process on several hosts.
Windows MPI runs over the following protocols:
- Sockets (Ethernet)
- Network Direct (ND)
A.1.1 Prerequisites
- Install HPC (Build: 4.0.3906.0).
- Validate traffic (ping) between all the MPI hosts.
- Every MPI client needs to run the smpd process, which opens the MPI channel.
- The MPI Initiator Server needs to run mpiexec. If the initiator is also a client, it should also run smpd.
A.2 Running MPI
Step 1. Run the following command on each MPI client:
start smpd -d -p <port>
Step 2. Install the ND provider on each MPI client in MPI ND.
Step 3. Run the following com…
186. …nted for RDMA Read with minimum latency, and runs over Microsoft's NetworkDirect standard. The level of customization available to the user is relatively high. The user may choose to run with a customized message size, a customized number of iterations or, alternatively, a customized test duration. nd_read_lat runs with all message sizes from 1B to 4MB (powers of 2), message inlining and CQ moderation.
14.4.16.1 nd_read_lat Synopsis
<running on specific single core>
Server side: start /b /affinity 0X1 nd_read_lat -s1048576 -D10 -S 11.137.53.1
Client side: start /b /wait /affinity 0X1 nd_read_lat -s1048576 -D10 -C 11.137.53.1
14.4.16.2 nd_read_lat Options
The table below lists the various flags of the command.
Table 51: nd_read_lat Options
-h: Shows the Help screen.
-V: Shows the version number.
-p: Connects to the port <port> (default 6830).
-s <msg size>: Exchanges messages of the given size (default 65536B); must not be combined with the -a flag.
-a: Runs all the message sizes from 1B to 8MB; must not be combined with the -s flag.
-n <num of iterations>: The number of exchanges (at least 2; the default is 100000).
-I <max inline size>: The maximum size of message to send inline. The default is 128B.
-D <test duration in seconds>: Test duration in seconds.
-f <margin time in seconds>: The marg…
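The synopsis pins each side to one core with `start /affinity`, where bit n of the hex mask selects logical processor n. The following Python sketch composes the server and client command lines in the same format and lists the 1B-4MB power-of-two sweep. It only builds strings and does not launch anything. The helper names are hypothetical.

```python
def affinity_mask(cores):
    """Hex mask for `start /affinity`: bit n selects logical processor n."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << core
    return "0X" + format(mask, "X")


def nd_read_lat_command(role, address, size=1048576, duration=10, cores=(0,)):
    """Compose the server (-S) or client (-C) command line shown in the synopsis."""
    if role not in ("server", "client"):
        raise ValueError("role must be 'server' or 'client'")
    start = "start /b /wait" if role == "client" else "start /b"
    flag = "-C" if role == "client" else "-S"
    return (f"{start} /affinity {affinity_mask(cores)} nd_read_lat "
            f"-s{size} -D{duration} {flag} {address}")


def default_sweep_sizes(max_bytes=4 * 2 ** 20):
    """Powers of two from 1 B up to 4 MB, as described for nd_read_lat."""
    sizes, size = [], 1
    while size <= max_bytes:
        sizes.append(size)
        size *= 2
    return sizes


if __name__ == "__main__":
    print(nd_read_lat_command("server", "11.137.53.1"))
    print(nd_read_lat_command("client", "11.137.53.1"))
```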
187. …o: Lid 3 port 1
LinkState:.......................Initialize
PhysLinkState:...................LinkUp
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps
LinkSpeedActive:.................5.0 Gbps
2. Query the status of two channel adapters using directed paths:
> ibportstate -C mlx4_0 -D 0 1
PortInfo:
# Port info: DR path slid 65535; dlid 65535; 0 port 1
LinkState:.......................Initialize
PhysLinkState:...................LinkUp
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps
LinkSpeedActive:.................5.0 Gbps
> ibportstate -C mthca0 -D 0 1
PortInfo:
# Port info: DR path slid 65535; dlid 65535; 0 port 1
LinkState:.......................Down
PhysLinkState:...................Polling
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps
LinkSpeedEnabled:................2.5 Gbps
LinkSpeedActive:.................2.5 Gbps
3. Change the speed of a port. First, query for the current configuration:
> ibportstate -C mlx4_0 -D 0 1
PortInfo:
# Port info: DR path slid 65535; dlid 65535; 0 port 1
LinkState:.......................Initialize
PhysLinkState:…
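The PortInfo output above uses `Name:....value` lines, so a script can read fields such as PhysLinkState or LinkSpeedActive into a dictionary. The following Python sketch assumes exactly that line shape and skips the `PortInfo:` heading and `#` comment lines. The function names are hypothetical.

```python
import re

FIELD_RE = re.compile(r"^(\w+):\.*\s*(.*)$")


def parse_portinfo(text):
    """Collect 'Name:....value' lines of ibportstate output into a dict."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match and match.group(2):  # skips "PortInfo:" (no value) and "#" lines
            fields[match.group(1)] = match.group(2).strip()
    return fields


def physical_link_up(fields):
    """True when PhysLinkState reports LinkUp."""
    return fields.get("PhysLinkState") == "LinkUp"


if __name__ == "__main__":
    sample = "\n".join([
        "PortInfo:",
        "# Port info: DR path slid 65535; dlid 65535; 0 port 1",
        "LinkState:.......................Initialize",
        "PhysLinkState:...................LinkUp",
        "LinkSpeedActive:.................5.0 Gbps",
    ])
    info = parse_portinfo(sample)
    print(info["LinkSpeedActive"], physical_link_up(info))
```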
188. …o used for data transfer. In case of a link disconnection, the system uses one of the other interfaces. When the primary link comes up, the LBFO interface returns to transferring data over the primary interface. If no primary adapter is selected, the primary interface is selected randomly.
Step 6. (Optional) Failback to Primary.
Step 7. Check the checkbox.
[Figure: Mellanox ConnectX-3 Ethernet Adapter Properties, LBFO tab, showing Bundle Type "Fault Tolerance", the Primary adapter selection, the "Failback to Primary" checkbox, the adapters in the bundle (Mellanox ConnectX-3 Ethernet Adapter and Mellanox ConnectX-3 Ethernet Adapter #2), and the Create, Modify and Remove buttons. The tab text reads: "LBFO stands for Load Balancing and Fail Over. The administrator can configure a bundle of adapters and associate up to 8 Mellanox ConnectX adapters to this bundle. LBFO should be used to increase the system reliability upon a link failure and to balance the workload."]
The newly created virtual Mellanox adapter representing the bundle is displayed by the Device Manager under Network adapters in the following format (see the figure below):
Mellanox Virtual Miniport Driver Team <bundle_name>
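The active-passive behaviour described above (failover to another interface, optional failback to the primary, random primary when none is selected) can be modelled in a few lines. The following Python sketch is a toy model for reasoning about the steps, not the driver's implementation. The class name and adapter names are hypothetical.

```python
import random


class FaultToleranceBundle:
    """Toy model of an active-passive LBFO bundle (not driver code)."""

    MAX_ADAPTERS = 8  # "up to 8 Mellanox ConnectX adapters" per bundle

    def __init__(self, adapters, primary=None, failback=False):
        if not 1 <= len(adapters) <= self.MAX_ADAPTERS:
            raise ValueError("a bundle holds 1 to 8 adapters")
        self.adapters = list(adapters)
        if primary is not None and primary not in self.adapters:
            raise ValueError("primary must be one of the bundled adapters")
        # With no primary selected, one is picked at random.
        self.primary = primary if primary is not None else random.choice(self.adapters)
        self.failback = failback
        self.link_up = {name: True for name in self.adapters}
        self.active = self.primary

    def link_changed(self, adapter, up):
        """Record a link event and return the adapter now carrying traffic."""
        self.link_up[adapter] = up
        if not self.link_up[self.active]:
            # Fail over to the first adapter whose link is up.
            for candidate in self.adapters:
                if self.link_up[candidate]:
                    self.active = candidate
                    break
        elif self.failback and self.link_up[self.primary]:
            # "Failback to Primary": return to the primary once its link is up.
            self.active = self.primary
        return self.active


if __name__ == "__main__":
    bundle = FaultToleranceBundle(["eth1", "eth2"], primary="eth1", failback=True)
    print(bundle.link_changed("eth1", False))  # eth2
    print(bundle.link_changed("eth1", True))   # eth1
```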
189. …olete and does nothing.
--load-cache <filename>: Loads and uses the cached ibnetdiscover data stored in the specified filename. May be useful for outputting and learning about other fabrics or a previous state of a fabric. Cannot be used if the user specifies a direct route path. See ibnetdiscover for information on caching ibnetdiscover output.
--diff <filename>: Loads cached ibnetdiscover data and does a diff comparison to the current network or another cache. A special diff output for iblinkinfo output is displayed, showing differences between the old and current fabric links. By default, the following are compared for differences: port connections and port state. See ibnetdiscover for information on caching ibnetdiscover output.
--diffcheck <key(s)>: Specifies what diff checks should be done in the --diff option above. Separate multiple diff check keys with commas. The available diff checks are: port = port connections, state = port state, lid = LIDs, nodedesc = node descriptions. If port is specified alongside lid or nodedesc, remote port LIDs and node descriptions are also compared.
--filterdownports <filename>: Filters downports indicated in an ibnetdiscover cache. If a port was previously indicated as down in the specified cache and is still down, do not output it in the resulting out…
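The --diff/--diffcheck behaviour can be pictured as comparing two link snapshots field by field. The following Python sketch uses a hypothetical data model: a dict keyed by (node GUID, port) with "peer", "state", "lid" and "nodedesc" fields. It mirrors the documented keys and the port/state default. It does not model the extra remote-port comparison that applies when port is combined with lid or nodedesc.

```python
DIFF_FIELDS = {"port": "peer", "state": "state", "lid": "lid", "nodedesc": "nodedesc"}
DEFAULT_CHECKS = ("port", "state")  # compared by default, per --diff above


def parse_diffcheck(value):
    """Split a comma-separated --diffcheck argument and validate the keys."""
    keys = [k.strip() for k in value.split(",") if k.strip()]
    unknown = [k for k in keys if k not in DIFF_FIELDS]
    if unknown:
        raise ValueError(f"unknown diff check key(s): {', '.join(unknown)}")
    return keys


def diff_links(old, new, checks=DEFAULT_CHECKS):
    """Compare two {(node_guid, port): link_record} snapshots."""
    changes = []
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before is None or after is None:
            changes.append((key, "added" if before is None else "removed"))
            continue
        for check in checks:
            field = DIFF_FIELDS[check]
            if before.get(field) != after.get(field):
                changes.append((key, check))
    return changes


if __name__ == "__main__":
    old = {("0x1", 1): {"peer": ("0x2", 3), "state": "Active", "lid": 4, "nodedesc": "sw1"}}
    new = {("0x1", 1): {"peer": ("0x2", 3), "state": "Down", "lid": 5, "nodedesc": "sw1"}}
    print(diff_links(old, new))
    print(diff_links(old, new, parse_diffcheck("state,lid")))
```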
190. …omputer. To continue, click Next.
[Figure: InstallShield Wizard welcome screen]
Step 4. Click Change and specify the location to which the files are extracted.
[Figure: InstallShield "Network Location" dialog: "Specify a network location for the server image of the product", with Change, Install and Cancel buttons]
Step 5. Click Install to extract this folder, or click Change to install to a different folder.
[Figure: InstallShield "Network Location" dialog, followed by the "InstallShield Wizard Completed" screen: "The InstallShield Wizard has successfully installed MLNX_VPI. Click Finish to exit the wizard."]
4 Installing Mellanox WinOF Driver
This section provides instructions for two types of installation procedures:
- Attended Installation: An installation procedure that requires frequent user intervention.
- Unattended Ins…
191. …on over SMB Direct performance, set the following registry key to 0 and reboot the machine:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\LanmanServer\Parameters\RequireSecuritySignature
For further details, please refer to http://blogs.technet.com/b/josebda/archive/2010/12/01/the-basics-of-smb-signing-covering-both-smb1-and-smb2.aspx
11.2 Application Specific Optimization and Tuning
11.2.1 Ethernet Performance Tuning
The user can configure the Ethernet adapter by setting some registry keys. The registry keys may affect Ethernet performance.
To improve performance, activate the performance tuning tool as follows:
Step 1. Start the Device Manager (open a command line window and enter: devmgmt.msc).
Step 2. Open Network Adapters.
Step 3. Right-click the relevant Ethernet adapter and select Properties.
Step 4. Select the Advanced tab.
Step 5. Modify performance parameters (properties) as desired.
11.2.1.1 Performance Known Issues
On Intel I/OAT supported systems, it is highly recommended to install and enable the latest I/OAT driver (download from www.intel.com).
With I/OAT enabled, sending messages of 256 bytes or larger activates I/OAT. This causes a significant latency increase due to the I/OAT algorithms. On the other hand, throughput increases significantly when using I/OAT.
11.2.2 IPoIB Performance Tuning
The user can configure the IPoIB adapter by setting some registry keys. Th…
192. [Figure: Device Manager, System devices list, ending with a Mellanox ConnectX-3 MT04099 Network Adapter entry and Microsoft ACPI-Compliant System]
Step 2. Right-click the Mellanox ConnectX Ethernet network adapter and left-click Properties. Select the Port Protocol tab from the Properties window.
Note: The Port Protocol tab is displayed only if the NIC is a VPI (IB and ETH).
The figure below is an example of the displayed Port Protocol window for a dual port VPI adapter card.
[Figure: Port Protocol tab. Current Setting: Port 1 IB, Port 2 Eth. HCA Port Type Configuration: HW Defaults, with IB, ETH and AUTO options for Port 1 and Port 2. The tab text reads: "This menu displays the adapter's port type and enables you to set the network protocols for the network adapter ports. The network protocol is determined according to the NIC's Hardware Defaults port type. You can choose the protocol explicitly by selecting the port type to InfiniBand (IB) or Ethernet (Eth)." followed by a note on enabling Auto Sensing]
Step 3. In this step, you can perform the following functions:
- If you choose the HW Defaults option, the port protocols are determined according to the NIC's hardware default values.
- Choose the desired port protocol for the available port(s). If you choose IB o…
193. …ons perf_tuning Appendix Synopsis on page 87; Section 6.3.1 "Upgrading Firmware Manually" on page 28; Section 8.7.2 "RoCE Configuration" on page 50; Section 11.4 "Adapter Proprietary Performance Counters" on page 93.
Rev 4.2 (October 20, 2012):
- Added the following sections: Section 10 "Deploying Windows Server 2012 and Above with SMB Direct" on page 81 and its subsections; Section 8.2 "Header Data Split" on page 38; Section 14.2 "part_man - Virtual IPoIB Port Creation Utility" on page 101.
- Updated Section 11 "Performance Tuning" on page 84.
Rev 3.2.0 (July 23, 2012):
- No changes.
Rev 3.1.0 (May 21, 2012):
- Added section Tuning the IPoIB Network Adapter.
- Added section Tuning the Ethernet Network Adapter.
- Added section Performance tuning tool application.
- Removed section Tuning the Network Adapter.
- Removed section part_man.
- Removed section ibdiagnet.
Rev 3.0.0 (February 08, 2012):
- Added section RDMA over Converged Ethernet (RoCE) and its subsections.
- Added section Hyper-V with VMQ.
- Added section Network Driver Interface Specification (NDIS).
- Added section Header Data Split.
- Added section Auto Sensing.
- Added section Adapter Teaming.
- Added section Port Protocol Configuration.
- Added section Advanced Configuration for InfiniBand Driver
194. [Figure: Device Manager with System devices expanded: ACPI Fixed Feature Button, Composite Bus Enumerator, Direct memory access controller, Generic Bus, Intel 5000 Series chipset registers and PCI Express ports, Intel 5000X Chipset Memory Controller Hub, Intel 6311ESB/631xESB bridges, ports and controllers, Intel 82801 PCI Bridge, and Mellanox ConnectX-3 MT04099 Network Adapter entries]
195. …ools: install the performance tools that are used to measure InfiniBand performance in the user environment.
- Analyze tools: install the tools that can be used to diagnose or analyze the InfiniBand environment.
- SDK: contains the libraries and DLLs for developing InfiniBand applications over IBAL.
- Documentation: contains the User Manual and Installation Guide.
- ND FLTR DLLs: contains the files for standalone installation of the mlx4nd provider.
[Figure: Custom Setup dialog, "Select the program features you want installed"; the Subnet manager (SM) feature is shown as requiring 856KB on the hard drive]
b. Click Install to start the installation.
[Figure: "Ready to Install the Program" dialog]
[Figure: "InstallShield Wizard Completed" dialog: "The InstallShield Wizard has successfully installed MLNX_VPI. Click Finish to exit the wizard." It also shows "You chose to run performance tuning. The log file can be found at C:\windows\System32\LogFiles\PerformanceTuning.log" and a "Show release notes" checkbox]
196. …or vice versa.
Issue 2: The performance is low.
Suggestion: This can be due to a non-optimal system configuration. See the section "Performance Tuning" to take advantage of Mellanox 40/10 GBit NIC performance.
Issue 3: The driver does not start.
Suggestion 1: This can happen due to an RSS configuration mismatch between the TCP stack and the Mellanox adapter. To confirm this scenario, open the event log and look under System for the "mlx4ethX" source. If found, enable RSS as follows:
1. Run the following command: "netsh int tcp set global rss = enabled".
Suggestion 2: This suggestion is less recommended and will cause low performance. To disable RSS on the adapter, run the following command: "netsh int tcp set global rss = no dynamic balancing".
Issue 4: The Ethernet driver fails to start. In the Event log, under the mlx4_bus source, the following error message appears: RUN_FW command failed with error 22
Suggestion: The error message indicates that the wrong firmware image has been programmed on the adapter card. See Section 2 "Downloading Mellanox WinOF Driver" on page 17.
Issue 5: The Ethernet driver fails to start. A yellow sign appears near the "Mellanox ConnectX 10Gb Ethernet Adapter" in the Device Manager display.
Suggestion: This can happen due to a hardware error. Try to disable and re-enable the "Mellanox ConnectX Adapter" from the Device Manager display.
Issue 6: No connect…
197. …ortMode is auto_port1, and the Network Interface was allowed 32 VFs using the Set-NetAdapterSriov PowerShell cmdlet, the actual number of VFs available to the Network Interface will be 8.
SriovPort1NumVFs, SriovPort2NumVFs (SriovPort<i>NumVFs): 16 (default). The maximum number of VFs that are allowed per port. This is the number of VFs the bus driver will open when working in manual mode. Note: If the total number of VFs requested is larger than the number of VFs burnt in firmware, each port X (1, 2) will have the number of VFs according to the following formula: $\dfrac{\mathrm{SriovPortXNumVFs}}{\mathrm{SriovPort1NumVFs} + \mathrm{SriovPort2NumVFs}} \times (\text{number of VFs burnt in firmware})$
Step 5. Check in the System Event Log that SR-IOV is enabled. Go to Start > Control Panel > System and Security > Administrative Tools > View Event Logs.
Figure 13: System Event Log
[Figure: Event Viewer, Windows Logs > System, listing Warning and Information events from the mlx4_bus and Hyper-V hypervisor sources]
198. …ow] [-v(erbose)] [-G(uid)] [-C ca_name] [-P ca_port] [-s smlid] [-t(imeout) timeout_ms] [-V(ersion)] [-o oui] [-S(erver)] [-h(elp)] <dest lid | guid> [<op>]
14.3.20.2 ibsysstat Options
Table 33: ibsysstat Flags and Options
ping: Verifies connectivity to the server (default).
host: Obtains host information from the server.
cpu: Obtains CPU information from the server.
-o <oui>: Uses the specified OUI number to multiplex vendor MADs.
-S: Server. Starts in server mode (does not return).
Debugging Flags
NOTE: Most OpenIB diagnostics take the following common flags. The exact list of supported flags per utility can be found in the usage message, which can be shown using the util_name -h syntax.
-d: Raises the IB debugging level. Can be used several times (-ddd or -d -d -d).
-e: Shows send and receive errors (timeouts and others).
-h: Shows the usage message.
-v: Increases the application verbosity level. Can be used several times (-vv or -v -v -v).
-V: Shows the version info.
Addressing Flags
-G: Uses the GUID address argument. In most cases, it is the Port GUID. Example: "0x08f1040023"
-s <smlid>: Uses 'smlid' as the target LID for SM/SA queries.
Other Common Flags
-C <ca_name>: Uses the specified ca_name.
-P <ca…
199. …ow: Shows progress information during discovery.
--node-name-map <node-name-map>: Specifies a node name map. The node name map file maps GUIDs to more user-friendly names. See "Topology File Format" on page 122.
--cache <filename>: Caches the ibnetdiscover network data in the specified filename. This cache may be used by other tools for later analysis.
--load-cache <filename>: Loads and uses the cached ibnetdiscover data stored in the specified filename. May be useful for outputting and learning about other fabrics or a previous state of a fabric.
--diff <filename>: Loads cached ibnetdiscover data and does a diff comparison to the current network or another cache. A special diff output for ibnetdiscover output is displayed, showing differences between the old and current fabric. By default, the following are compared for differences: switches, channel adapters, routers and port connections.
--diffcheck <key(s)>: Specifies what diff checks should be done in the --diff option above. Separate multiple diff check keys with commas. The available diff checks are: sw = switches, ca = channel adapters, router = routers, port = port connections, lid = LIDs, nodedesc = node descriptions. Note that port, lid and nodedesc are checked only for the node types that are specified (e.g., sw, ca, router). If por…
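A node name map lets tools print friendly names instead of raw GUIDs. The following Python sketch loads such a map and falls back to the NodeDescription for unmapped GUIDs. It assumes a line format of `<GUID> "<name>"`, with blank lines and lines starting with `#` ignored. The helper names and the sample names are hypothetical; the GUIDs are taken from the examples in this manual.

```python
import re

# Assumed line format: <GUID> "<friendly name>"; '#' starts a comment line.
MAP_LINE_RE = re.compile(r'^(0x[0-9a-fA-F]+)\s+"([^"]*)"')


def load_node_name_map(text):
    """Return {guid_int: name} from node-name-map text."""
    names = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = MAP_LINE_RE.match(stripped)
        if match:
            names[int(match.group(1), 16)] = match.group(2)
    return names


def display_name(guid, names, node_description):
    """Prefer the mapped name; fall back to the NodeDescription."""
    return names.get(guid, node_description)


if __name__ == "__main__":
    text = '# lab fabric\n0x0002c902fffff00a "core-switch-1"\n\n0x0008f10403960558 "host-a"\n'
    names = load_node_name_map(text)
    print(display_name(0x0002c902fffff00a, names, "MT47396 Infiniscale-III Mellanox Technologies"))
```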
200. ows Server 2012 and above.
     BlueFlame (default: 1): Controls whether latency-critical Send WQEs are written directly to the device. When BlueFlame is used, the WQEs are written directly to the PCI BAR of the device (in addition to memory), so that the device may handle them without having to access memory, thus shortening the execution latency. For best performance, it is recommended to use BlueFlame when the HCA is lightly loaded. For high-bandwidth scenarios, it is recommended to use regular posting (without BlueFlame). The valid values are: 0 - disable; 1 - enable. Note: This registry value is not exposed via the UI.
     MaxRSSProcessors (default: 8): The maximum number of RSS processors. Note: This registry key is only available in Windows Server 2012 and above.
     C.6 Ethernet Registry Keys
     The following section describes the registry keys that are only relevant to the Ethernet driver.
     RoceMaxFrameSize (default: 1024): The maximum size of a frame (or a packet) that can be sent by the RoCE protocol (a.k.a. Maximum Transmission Unit (MTU)). Using a larger RoCE MTU will improve the performance. However, one must ensure that the entire system, including switches, supports the defined MTU. Ethernet packets use the general MTU value, whereas RoCE packets use the RoCE MTU. The valid values are: 256, 512, 1024, 2048. Note: This regis
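The following Python sketch illustrates the documented value ranges. It uses hypothetical helpers, not a Mellanox tool, to validate BlueFlame and RoceMaxFrameSize values and to pick the largest valid RoCE MTU under a limit. The limit is assumed to be the largest RoCE MTU that every switch and host on the path supports. Determining that value is outside the sketch.

```python
# Valid values copied from the registry key tables above; the dictionary and
# helpers are illustrative only, not a Mellanox utility.
VALID_VALUES = {
    "BlueFlame": {0, 1},                         # 0 = disable, 1 = enable
    "RoceMaxFrameSize": {256, 512, 1024, 2048},  # RoCE MTU in bytes
}


def check_registry_value(name: str, value: int) -> int:
    """Return value unchanged if it is allowed for the named key, else raise."""
    allowed = VALID_VALUES.get(name)
    if allowed is None:
        raise KeyError(f"no rule for {name}")
    if value not in allowed:
        raise ValueError(f"{name}={value} not in {sorted(allowed)}")
    return value


def largest_roce_mtu(path_limit: int) -> int:
    """Largest valid RoceMaxFrameSize not above path_limit (assumed to be the
    largest RoCE MTU supported by every device on the path)."""
    fits = [v for v in VALID_VALUES["RoceMaxFrameSize"] if v <= path_limit]
    if not fits:
        raise ValueError("no valid RoCE MTU fits under the limit")
    return max(fits)


print(largest_roce_mtu(1500))
```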
201. Table 42 ibv_read_bw Flags and Options
     -Q (cq-mod): Generates a CQE only after <cq-mod> completions
     -N (no peak-bw): Cancels the peak-bw calculation (default: with peak)
     14.4.8 ibv_read_lat
     This is a more advanced version of ib_read_lat. It contains more flags and features than the older version, as well as improved algorithms. ibv_read_lat calculates the latency of an RDMA read operation of message_size between a pair of machines. One acts as a server and the other as a client. They perform a ping-pong benchmark in which one side RDMA-reads the memory of the other side only after the other side has read its memory. Each side samples the CPU clock each time it reads the other side's memory, in order to calculate latency. Read is available only in RC connection mode (as specified in the InfiniBand spec).
     14.4.8.1 ibv_read_lat Synopsis
     ibv_read_lat [-i(b_port) ib_port] [-m(tu) mtu_size] [-s(ize) message_size] [-t(x-depth) tx_size] [-I(nline_size) inline_size] [-u (qp-timeout)] [-S(L) sl_type] [-d ib_device_name] [-x gid_index] [-n iteration_num] [-o(uts) outstanding_reads] [-e(vents) use_events] [-p(ort) PDT_port] [-a(ll)] [-V(ersion)] [-C (report cycles)] [-H (report histogram)] [-U (report unsorted)] [-F (CPU freq fail)]
     14.4.8.2 ibv_read_lat Options
     The table below lists the various flags of the command.
     Table 43 ibv_read_lat Flags and Options
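The latency calculation described above can be sketched in Python under three assumptions:
- The inputs are raw CPU cycle-counter readings taken once per iteration.
- The core runs at a known, fixed frequency in MHz.
- A median is used as the "typical" value.

This is not the perftest implementation. It only illustrates how clock samples become latencies.

```python
import statistics


def latencies_us(cycle_stamps: list[int], cpu_mhz: float) -> list[float]:
    """Turn consecutive CPU cycle-counter samples into per-iteration times.
    At cpu_mhz MHz the counter advances cpu_mhz ticks per microsecond."""
    if len(cycle_stamps) < 2:
        raise ValueError("need at least two samples")
    return [(b - a) / cpu_mhz for a, b in zip(cycle_stamps, cycle_stamps[1:])]


def summarize(samples: list[float]) -> dict[str, float]:
    """Minimum, median ("typical") and maximum of the per-iteration times."""
    return {
        "min": min(samples),
        "typical": statistics.median(samples),
        "max": max(samples),
    }


# Hypothetical samples from a 3000 MHz core.
times = latencies_us([0, 6000, 12600, 18600], cpu_mhz=3000.0)
print(summarize(times))
```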
202. packets is done on the processor that was defined in the DefaultRecvRingProcessor registry key. The valid values are: 0 - disabled; 1 - enabled. NOTE: This registry value is not exposed via the UI.
     SingleStream (default: 0): Used to achieve the maximum bandwidth with single-stream traffic. When this registry key is enabled, the driver forwards the packet being sent to another CPU. This decreases the CPU utilization of the sender and allows sending at a higher rate. The valid values are: 0 - disabled; 1 - enabled. NOTE: Only relevant for Ethernet and IPoIB.
     C.6.1 Flow Control Options
     This group of registry keys allows the administrator to control the TCP/IP traffic by pausing frame transmitting and/or receiving operations. By enabling the Flow Control mechanism, the adapters can overcome TCP/IP issues and eliminate the risk of data loss.
     FlowControl (default: 3): When Rx Pause is enabled, the receiving adapter generates a flow control frame when its receive queue reaches a pre-defined limit. The flow control frame is sent to the sending adapter. When Tx Pause is enabled, the sending adapter pauses the transmission if it receives a flow control frame from a link partner. The valid values are: 0 - Flow control is disabled; 1 - Tx Flow control is enabled; 2 - Rx Flow control is enabled; 3 - Rx & Tx Flow control is enabled.
     PerPriRx
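The four FlowControl values read naturally as a two-bit mask. The Python sketch below assumes bit 0 means Tx pause and bit 1 means Rx pause. That is consistent with the table (1 = Tx, 2 = Rx, 3 = both), but it is an interpretation rather than a documented encoding.

```python
# Assumption: the four documented FlowControl values form a two-bit mask,
# bit 0 = Tx pause and bit 1 = Rx pause (1 = Tx, 2 = Rx, 3 = both).
TX_PAUSE = 0b01
RX_PAUSE = 0b10


def decode_flow_control(value: int) -> dict[str, bool]:
    if value not in (0, 1, 2, 3):
        raise ValueError("FlowControl must be 0, 1, 2 or 3")
    return {"tx_pause": bool(value & TX_PAUSE), "rx_pause": bool(value & RX_PAUSE)}


def encode_flow_control(tx_pause: bool, rx_pause: bool) -> int:
    return (TX_PAUSE if tx_pause else 0) | (RX_PAUSE if rx_pause else 0)


print(decode_flow_control(3))  # the default: Rx and Tx pause both enabled
```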
203. pon transmit and/or receive instead of the CPU (default: Enabled).
     TCP/UDP Checksum Offload for IPv6 packets: Enables the adapter to compute the TCP/UDP checksum over IPv6 packets upon transmit and/or receive instead of the CPU (default: Enabled).
     Large Send Offload (LSO): Allows the TCP stack to build a TCP message up to 64KB long and send it in one call down the stack. The adapter then re-segments the message into multiple TCP packets for transmission on the wire, with each packet sized according to the MTU. This option offloads a large amount of kernel processing time from the host CPU to the adapter.
     IB Options: Configures parameters related to InfiniBand functionality.
     SA Query Retry Count: Sets the number of SA query retries once a query fails. The valid values are 1 - 64 (default: 10).
     SA Query Timeout: Sets the waiting timeout (in milliseconds) of an SA query completion. The valid values are 500 - 60000 (default: 1000 ms).
     11.4 Adapter Proprietary Performance Counters
     Proprietary Performance Counters are used to provide information on the performance of the Operating System, applications, services or drivers. Counters can be used for different system debugging purposes, to help determine system bottlenecks, and to fine-tune system and application performance. The Operating System, network, and devices provide counter data that an application can consume to give users a graphical view of the system's performance quality. WinOF counters
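The following Python sketch puts the LSO description into numbers by counting how many wire packets one large send becomes. It assumes IPv4 and TCP headers without options (40 bytes), so the payload per packet (MSS) is MTU - 40. Real traffic with TCP options or IPv6 carries less payload per packet.

```python
import math


def lso_packet_count(message_bytes: int, mtu: int = 1500, header_bytes: int = 40) -> int:
    """Packets the adapter sends for one LSO message, assuming each packet
    carries at most MTU - header_bytes of TCP payload (IPv4 + TCP, no options)."""
    mss = mtu - header_bytes
    if mss <= 0:
        raise ValueError("MTU too small for the headers")
    return math.ceil(message_bytes / mss)


print(lso_packet_count(64 * 1024))            # 45 packets at MTU 1500
print(lso_packet_count(64 * 1024, mtu=9000))  # 8 packets with jumbo frames
```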
204. quester RNR NAK retries exceeded errors: Number of RNR (Receiver Not Ready) NAK retries-exceeded errors when the local machine generates outbound traffic.
     Bad multicast received: Number of bad multicast packets received.
     Table 12 Mellanox Adapter Diagnostics Counters
     Discarded UD packets: Number of UD packets silently discarded on the receive queue due to lack of receive descriptors.
     Discarded UC packets: Number of UC packets silently discarded on the receive queue due to lack of receive descriptors.
     CQ overflows: Number of CQ overflows. NOTE: This value is evaluated for the entire NIC, since there are cases where a CQ might be associated with both ports (i.e., the value on all ports is identical).
     EQ overflows: Number of EQ overflows. NOTE: This value is evaluated for the entire NIC, since there are cases where an EQ might be associated with both ports (i.e., the value on all ports is identical).
     Bad doorbells: Number of bad DoorBells.
     Responder duplicate request received: Number of duplicate requests received when the local machine receives inbound traffic (pending firmware implementation).
     Requester time out received: Number of time-outs received when the local machine generates outbound traffic (pending firmware implementation).
     11.4.1.3 Proprietary Mellanox QoS Counters
     Proprietary Mellanox
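When reading these counters, the change between two samples usually tells more than the absolute value. The Python sketch below reports every counter that grew between two snapshots; the snapshots in the test are hypothetical. CQ overflows and EQ overflows are NIC-wide, so the sketch assumes they are read from a single port and not counted twice.

```python
def counter_increases(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Map each counter that grew between two samples to its increase.
    A counter missing from `before` is treated as starting at 0."""
    return {
        name: value - before.get(name, 0)
        for name, value in after.items()
        if value > before.get(name, 0)
    }
```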
205. r ETH, both ends of the connection must be of the same type (IB or ETH). Enable Auto Sensing by checking the AUTO checkbox. If the NIC does not support Auto Sensing, the AUTO option will be grayed out. If you choose AUTO, the current setting will indicate the actual port settings: IB or ETH.
     8.5 Load Balancing, Fail-Over (LBFO) and VLAN
     Windows Server 2012 and above supports load balancing as part of the operating system. Please refer to the Microsoft guide "NIC Teaming in Windows Server 2012" at the link below:
     http://social.technet.microsoft.com/wiki/contents/articles/14951.nic-teaming-in-windows-server-2012.aspx
     For earlier operating systems, please refer to the sections below.
     8.5.1 Adapter Teaming
     Adapter teaming can group a set of ports inside a network adapter, or a number of physical network adapters, into virtual adapters that provide fault-tolerance and load-balancing functions. Depending on the teaming mode, one or more interfaces can be active. The non-active interfaces in a team are in standby mode and will take over the network traffic in the event of a link failure in the active interfaces. All of the active interfaces in a team participate in load-balancing operations by sending and receiving a portion of the total network traffic.
     8.5.1.1 Teaming (Bundle) Modes
     1. Fault Tolerance: Provides automatic redundancy for the server's network connection
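The standby and takeover behavior described above can be modeled in a few lines of Python. The sketch covers only a fault-tolerance style team. It assumes members are listed in priority order and that the first member with link up carries the traffic. Real teaming drivers also apply load-balancing and failback policies, which are not modeled here. The member names are hypothetical.

```python
from typing import Optional


def active_member(team: list[str], link_up: dict[str, bool]) -> Optional[str]:
    """Fault-tolerance sketch: the first team member (in priority order) whose
    link is up carries the traffic; all other members stay in standby."""
    for nic in team:
        if link_up.get(nic, False):
            return nic
    return None  # no member has a link, so the team is down


team = ["Port1", "Port2"]  # hypothetical member names
print(active_member(team, {"Port1": False, "Port2": True}))  # Port2 takes over
```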
206. racert -m 0xc000 4 16     # show multicast path of mlid 0xc000 between lids 4 and 16
     14.3.11 sminfo
     Optionally sets, and displays the output of, an sminfo query in a readable format. The target SM is the one listed in the local port info, or the SM specified by the optional SM LID or by the SM direct-routed path.
     Note: Using sminfo for any purpose other than a simple query may result in a malfunction of the target SM.
     14.3.11.1 sminfo Synopsis
     sminfo [-d(ebug)] [-e(rr_show)] [-s state] [-p prio] [-a activity] [-D(irect)] [-L(id)] [-u(sage)] [-G(uid)] [-C ca_name] [-P ca_port] [-t(imeout) timeout_ms] [-V(ersion)] [-h(elp)] sm_lid | sm_dr_path [modifier]
     14.3.11.2 sminfo Options
     The table below lists the various flags of the command. Most OpenIB diagnostics take the following common flags. The exact list of supported flags per utility can be found in the usage message and can be shown using the util_name -h syntax.
     Table 24 sminfo Flags and Options
     state, -s: Sets the SM state: 0 - not active; 1 - discovering; 2 - standby; 3 - master
     priority, -p: Sets the priority (0-15)
     activity, -a: Sets the activity count
     debug, -d: Raises the IB debugging level (-ddd or -d -d -d)
     Direct, -D: Uses directed path address arguments. The path is a comma-separated list of out ports. Examples: "0" - self port; "0,1,2,1,4" - out via port 1, then 2
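The sminfo state codes and the directed-path syntax fit in a small Python sketch. Following the examples above, it assumes that a directed path always starts with 0 (the local port) and that each later entry is the out port at the next hop. The helper names are hypothetical.

```python
SM_STATES = {0: "not active", 1: "discovering", 2: "standby", 3: "master"}


def describe_sm_state(state: int) -> str:
    """Name of an SM state code as listed in Table 24."""
    return SM_STATES[state]


def valid_sm_priority(priority: int) -> bool:
    """SM priorities range from 0 to 15."""
    return 0 <= priority <= 15


def parse_dr_path(path: str) -> list[int]:
    """'0,1,2,1,4' -> [1, 2, 1, 4]. The leading 0 is the local (self) port;
    each remaining entry is the out port used at the next hop."""
    hops = [int(p) for p in path.split(",")]
    if hops[0] != 0:
        raise ValueError("a directed path starts with 0 (self port)")
    return hops[1:]
```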
207. rded
     - Mellanox ConnectX EN 10Gbit Ethernet Adapter <X> has been successfully initialized and enabled.
     - Failed to initialize Mellanox ConnectX EN 10Gbit Ethernet Adapter.
     - Mellanox ConnectX EN 10Gbit Ethernet Adapter <X> has been successfully initialized and enabled. The port's network address is <MAC Address>.
     - The Mellanox ConnectX EN 10Gbit Ethernet was reset.
     - Failed to reset the Mellanox ConnectX EN 10Gbit Ethernet NIC. Try disabling then re-enabling the "Mellanox Ethernet Bus Driver" device via the Windows device manager.
     - Mellanox ConnectX EN 10Gbit Ethernet Adapter <X> has been successfully stopped.
     - Failed to initialize the Mellanox ConnectX EN 10Gbit Ethernet Adapter <X> because it uses old firmware version (<old firmware version>). You need to burn firmware version (<new firmware version>) or higher, and to restart your computer.
     - Mellanox ConnectX EN 10Gbit Ethernet Adapter <X> device detected that the link connected to port <Y> is up, and has initiated normal operation.
     - Mellanox ConnectX EN 10Gbit Ethernet Adapter <X> device detected that the link connected to port <Y> is down. This can occur if the physical link is disconnected or damaged, or if the other end-port is down.
     - Mismatch in the configurations between the two ports may affect the performance. When using MSI-X, both ports should use the same RSS mode. To fix the problem, config
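A regular expression built from the documented wording is enough to pick out the link up/down messages above when scanning the event log. This Python sketch assumes the adapter number <X> and port <Y> appear as single tokens with no spaces. The sample message in the test is hypothetical.

```python
import re
from typing import Optional

# Pattern built from the link up/down message text above; assumes the
# adapter number <X> and port <Y> are single tokens without spaces.
LINK_EVENT = re.compile(
    r"Mellanox ConnectX EN 10Gbit Ethernet Adapter (?P<adapter>\S+) device detected "
    r"that the link connected to port (?P<port>\S+) is (?P<state>up|down)"
)


def parse_link_event(message: str) -> Optional[tuple[str, str, str]]:
    """Return (adapter, port, 'up' or 'down'), or None for other messages."""
    m = LINK_EVENT.search(message)
    return (m.group("adapter"), m.group("port"), m.group("state")) if m else None
```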
208. s 131
     14.3.17 ibcacheedit   133
     14.3.18 iblinkinfo   134
     14.3.19 ibqueryerrors   135
     14.3.20 ibsysstat   137
     14.3.21 saquery   139
     14.3.22 smpdump   141
     14.4 InfiniBand Fabric Performance Utilities   143
     14.4.1 ib_read_bw   143
     14.4.2 ib_read_lat   144
     14.4.3 ib_send_bw   145
     14.4.4 ib_send_lat   145
     14.4.5 ib_write_bw   146
     14.4.6 ib_write_lat   147
     14.4.7 ibv_read_bw   148
     14.4.8 ibv_read_lat   150
     14.4.9 ibv_send_bw   151
     14.4.10 ibv_send_lat   152
     14.4.11 ibv_write_bw   154
     14.4.12 ibv_write_lat   155
     14.4.13 nd_write_bw   157
     14.4.14 nd_write_lat   157
     14.4.15 nd_read_bw   158
     14.4.16 nd_read_lat   159
     14.4.17 nd_send_bw
209. s Settings
     Suggestion 1: In Windows 2012 and above, when a kernel debugger is configured (not necessarily physically connected), flow control is disabled unless the following registry key is set (a reboot is required after setting it):
     Registry Path: HKLM\SYSTEM\CurrentControlSet\Services\NDIS\Parameters
     Type: REG_DWORD
     Key name: AllowFlowControlUnderDebugger
     Value: 1
     Suggestion 2: Go to "Power Options" in the "Control Panel". Make sure "Maximum Performance" is set as the power scheme (a reboot is needed).
     Issue 2: General Diagnostic
     Suggestion 1: Go to Device Manager, locate the Mellanox adapter that you are debugging, right-click it and go to "Information":
     PCI Gen 2: should appear as "PCI-E 5.0 GT/s"
     PCI Gen 3: should appear as "PCI-E 8.0 GT/s"
     Link Speed: 40.0Gbps / 10.0Gbps
     Suggestion 2: To determine if the Mellanox NIC and PCI bus can achieve their maximum speed, it is best to run ib_send_bw in a loopback. On the same machine:
     1. Run "start /b /affinity 0x1 ibv_write_bw"
     2. Run "start /b /affinity 0x2 ibv_write_bw 127.0.0.1"
     3. Repeat for port 2 with an additional -p2, and for other cards if necessary.
     4. On PCI Gen3, the expected result is around 5700MB/s. On PCI Gen2, the expected result is around 3300MB/s.
     Any number lower than that points to a bad configuration or to installation in the wrong PCI slot. Malfunctioning QoS settings and Flow Control can be the cause as well.
     Suggestion 3: To de
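The expected loopback figures can be turned into a simple pass/fail check. In the Python sketch below, the 3300 and 5700 MB/s figures come from the text. The 5% tolerance is an assumption, since the text only says "around".

```python
# Approximate loopback results quoted above, in MB/s.
EXPECTED_LOOPBACK_MBPS = {2: 3300, 3: 5700}


def loopback_ok(measured_mbps: float, pci_gen: int, tolerance: float = 0.05) -> bool:
    """True if the result is within `tolerance` of the expected figure.
    The 5% default tolerance is an assumption, not a documented threshold."""
    expected = EXPECTED_LOOPBACK_MBPS[pci_gen]
    return measured_mbps >= expected * (1 - tolerance)


print(loopback_ok(5650, pci_gen=3))  # True
print(loopback_ok(3000, pci_gen=2))  # False
```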
210. s a readable topology file. GUIDs, node types, and port numbers are displayed, as well as port LIDs and NodeDescriptions. All nodes (and links) are displayed (full topology). Optionally, this utility can be used to list the currently connected nodes by node type. The output is printed to standard output unless a topology file is specified.
     14.3.9.1 ibnetdiscover Synopsis
     ibnetdiscover [-d(ebug)] [-e(rr_show)] [-v(erbose)] [-s(how)] [-l(ist)] [-g(rouping)] [-H(ca_list)] [-S(witch_list)] [-R(outer_list)] [-C ca_name] [-P ca_port] [-t(imeout) timeout_ms] [-V(ersion)] [--outstanding_smps -o <val>] [-u(sage)] [--node-name-map <node-name-map>] [--cache <filename>] [--load-cache <filename>] [-p(orts)] [-m(ax_hops)] [-h(elp)] [<topology-file>]
     14.3.9.2 ibnetdiscover Options
     The table below lists the various flags of the command. Most OpenIB diagnostics take the following common flags. The exact list of supported flags per utility can be found in the usage message and can be shown using the util_name -h syntax.
     Table 22 ibnetdiscover Flags and Options
     list, -l: Lists the connected nodes
     grouping, -g: Shows grouping. Grouping correlates InfiniBand nodes by different vendor-specific schemes. It may also show the switch external ports correspondence.
     Hca_list, -H: Lists the connected CAs
     Switch_list, -S: Lists the connected switches
     Router_list, -R: Lists the connected routers
     s, sh
211. s long as these entries are not changed directly in the registry or by some other installation or script. A reboot may be required for the changes to take effect.
     11.1.4 Tuning the Ethernet Network Adapter
     General tuning of the Ethernet Network Adapter can be performed during installation by modifying some of the Windows registry entries, as explained in section "Registry Tuning" on page 32. Tuning for specific scenarios can be set manually after installation.
     To improve the network adapter performance, activate the performance tuning tool as follows:
     Step 1: Start the Device Manager (open a command line window and enter: devmgmt.msc).
     Step 2: Open "Network Adapters".
     Step 3: Select the Mellanox Ethernet adapter, right-click it and select Properties.
     Step 4: Select the "Performance" tab.
     Step 5: Choose one of the tuning scenarios:
       Single port traffic: Improves performance for running single port traffic each time.
       Single stream traffic: Optimizes tuning for applications with a single connection.
       Dual port traffic: Improves performance for running traffic on both ports simultaneously.
       Forwarding traffic: Improves performance for running scenarios that involve both ports (for example: via IXIA).
       Multicast traffic: Improves performance when the main traffic runs on multicast.
     Step 6: Click on the "Run Tuning" button.
212. s out with priority X and reaches the far-end node with the same priority X. The priority should be lossless in the switches.
     To force MSMPI to work over ND and not over sockets, add the following to the mpiexec command:
     -env MPICH_DISABLE_ND 0 -env MPICH_DISABLE_SOCK 1
     A.5 Configuring MPI
     Step 1: Configure all the hosts in the cluster with identical PFC (see the PFC example below).
     Step 2: Run the WHCK ND-based traffic tests to check PFC (ndrping, ndping, ndrpingpong, ndpingpong).
     Step 3: Validate the PFC counters during the run time of the ND tests with "Mellanox Adapter QoS Counters" in perfmon.
     Step 4: Install the same version of HPC Pack in the entire cluster. NOTE: A version mismatch in HPC Pack 2012 can cause MPI to hang.
     Step 5: Validate the MPI base infrastructure with simple commands, such as "hostname".
     A.5.1 PFC Example
     In the example below, ND and NDK go to priority 3, which is configured as no-drop in the switches. The TCP/UDP traffic directs ALL traffic to priority 1.
     Install dcbx and remove any previous settings:
     Install-WindowsFeature Data-Center-Bridging
     Remove-NetQosTrafficClass
     Remove-NetQosPolicy -Confirm:$False
     Set-NetQosDcbxSetting -Willing 0
     New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
     New-NetQosPolicy "DEFAULT" -Default -PriorityValue8021Action 3
     New-NetQosPolicy "TCP" -IPProtocolMatchCondition TCP -PriorityValue8021Action
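Step 1 requires identical PFC settings on every host. The Python sketch below shows a minimal version of that consistency check. It assumes the PFC-enabled priorities per host have already been collected, for example by reading Get-NetQosFlowControl on each host. It treats the most common setting as the reference. The host names are hypothetical.

```python
from collections import Counter


def hosts_with_different_pfc(pfc_by_host: dict[str, set[int]]) -> list[str]:
    """Hosts whose PFC-enabled priorities differ from the most common setting."""
    if not pfc_by_host:
        return []
    counts = Counter(frozenset(p) for p in pfc_by_host.values())
    reference = counts.most_common(1)[0][0]
    return sorted(h for h, p in pfc_by_host.items() if frozenset(p) != reference)


# Hypothetical cluster: node3 also enables PFC on priority 5.
print(hosts_with_different_pfc({"node1": {3}, "node2": {3}, "node3": {3, 5}}))
```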
213. s registry key. The valid values are: 0 - the interface is exposed as NdisPhysicalMediumInfiniband; 1 - the interface is exposed as NdisPhysicalMedium802_3. Note: This registry value is not exposed via the UI.
     SaTimeout (default: 1000): The time, in milliseconds, before retransmitting an SA query request. The valid values are 250 up to 60000.
     SaRetries (default: 10): The number of times to retry an SA query request. The valid values are 1 up to 64.
     McastIgmpMldGeneralQueryInterval: The number of runs of the multicast monitor before a general query is initiated. This monitor runs every 30 seconds. The valid values are 1 up to 10.
     LocalEndpointMaxAge (default: 5): The maximum number of runs of the local endpoint DB monitor before an unused local endpoint is removed. The endpoint age is zeroed when it is used as a source in the send flow or as a destination in the receive flow. Each monitor run increments the age of all non-VMQ local endpoints. When LocalEndpointMaxAge is reached, the endpoint is removed. The valid values are 1 up to 20. Note: This registry value is not exposed via the UI.
     LocalEndpointMonitorInterval (default: 60000): The time interval, in ms, between two consecutive runs of the local endpoint DB monitor for aging unused local endpoints. Each run increments the age of all non-VMQ local endpoints. The valid values are 10000 up to
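The aging rule for local endpoints can be simulated directly. The Python sketch below models the documented behavior; it is not driver code.
- Each monitor run ages every endpoint.
- Using an endpoint resets its age to zero.
- An endpoint is removed when its age reaches LocalEndpointMaxAge.

Non-VMQ filtering is left out. With the defaults (5 runs, 60000 ms interval), an idle endpoint is removed after roughly five minutes.

```python
class LocalEndpointTable:
    """Model of the local endpoint aging described above (illustrative only)."""

    def __init__(self, max_age: int = 5):
        self.max_age = max_age          # LocalEndpointMaxAge
        self.ages: dict[str, int] = {}  # endpoint -> number of idle monitor runs

    def use(self, endpoint: str) -> None:
        """Endpoint used as a send source or receive destination: age resets to 0."""
        self.ages[endpoint] = 0

    def monitor_run(self) -> list[str]:
        """One monitor run: age every endpoint and remove those reaching max_age."""
        removed = []
        for endpoint in list(self.ages):
            self.ages[endpoint] += 1
            if self.ages[endpoint] >= self.max_age:
                del self.ages[endpoint]
                removed.append(endpoint)
        return removed
```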
214. s support NDIS 6.3: NumRSSQueues, RssMaxProcNumber
     Examples
     For example, if the adapter is represented by "Local Area Connection 6" and "Local Area Connection 7":
     For single port stream tuning, type:
     perf_tuning.exe -s -c1 "Local Area Connection 6" -c2 "Local Area Connection 7"
     or, to set one adapter only:
     perf_tuning.exe -s -c1 "Local Area Connection 6"
     For single stream tuning, type:
     perf_tuning.exe -st -c1 "Local Area Connection 6" -c2 "Local Area Connection 7"
     or, to set one adapter only:
     perf_tuning.exe -st -c1 "Local Area Connection 6"
     For dual port streams tuning, type:
     perf_tuning.exe -d -c1 "Local Area Connection 6" -c2 "Local Area Connection 7"
     For forwarding streams tuning, type:
     perf_tuning.exe -f -c1 "Local Area Connection 6" -c2 "Local Area Connection 7"
     For manual tuning of the first adapter to use RSS on CPUs 0-3:
     perf_tuning.exe -m -c1 "Local Area Connection 6" -b 0 -n 4
     In order to restore defaults, type:
     perf_tuning.exe -r -c1 "Local Area Connection 6" -c2 "Local Area Connection 7"
     11.1.5 SR-IOV Tuning
     To achieve the best performance on an SR-IOV VF, please run the following PowerShell commands on the host:
     Set-VMNetworkAdapter -Name "Network Adapter" -VMName vm1 -IovQueuePairsRequested 4
     OR
     Set-VMNetworkAdapter -Name "Network Adapter" -VMName vm1 -IovQueuePairsRequested 8 (for 40GbE)
     11.1.6 Improving Live Migration
     In order to improve live migrati
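Scripts that apply these tuning profiles can assemble the perf_tuning.exe arguments programmatically. The Python sketch below uses only the flags shown in the examples. The helper names and scenario keys are hypothetical. subprocess.list2cmdline is used only to show the quoting Windows expects.

```python
import subprocess

# Scenario flags as listed in the examples above; helper names are hypothetical.
SCENARIO_FLAGS = {
    "single_port": "-s",
    "single_stream": "-st",
    "dual_port": "-d",
    "forwarding": "-f",
    "restore": "-r",
}


def perf_tuning_args(scenario: str, adapters: list[str]) -> list[str]:
    """Argument list for perf_tuning.exe; adapters are connection names (-c1, -c2)."""
    if not 1 <= len(adapters) <= 2:
        raise ValueError("pass one or two adapter names")
    args = ["perf_tuning.exe", SCENARIO_FLAGS[scenario]]
    for index, name in enumerate(adapters, start=1):
        args += [f"-c{index}", name]
    return args


def manual_rss_args(adapter: str, base_cpu: int, num_cpus: int) -> list[str]:
    """Manual tuning: RSS on num_cpus processors starting at base_cpu (-b, -n)."""
    return ["perf_tuning.exe", "-m", "-c1", adapter, "-b", str(base_cpu), "-n", str(num_cpus)]


def as_command_line(args: list[str]) -> str:
    """Quote arguments the way the Windows command line expects."""
    return subprocess.list2cmdline(args)


print(as_command_line(manual_rss_args("Local Area Connection 6", 0, 4)))
```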
215. se in small a MLNX-OS
     OpenSM can run as a Windows service and can be started manually from the following directory: <installation_directory>\tools.
     OpenSM, as a service, will use the first active port, unless it receives a specific GUID.
     OpenSM can be registered as a service from either the Command Line Interface (CLI) or PowerShell.
     The following are commands used from the CLI:
     To register OpenSM as a service, execute:
     sc create OpenSM binPath= "c:\Program Files\Mellanox\MLNX_VPI\IB\Tools\opensm.exe --service" start= auto
     To start OpenSM as a service:
     sc start OpenSM
     To run OpenSM manually:
     opensm.exe
     For additional run options, enter:
     opensm.exe -h
     The following are commands used from PowerShell:
     To register OpenSM as a service, execute:
     New-Service -Name "OpenSM" -BinaryPathName "`"C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\opensm.exe`" --service -L 128" -DisplayName "OpenSM" -Description "OpenSM for IB subnet" -StartupType Automatic
     To start OpenSM as a service, run:
     Start-Service OpenSM
     Notes:
     For long-term running, please avoid using the -v (verbosity) option, to avoid exceeding the disk quota.
     Running OpenSM on multiple servers may lead to incorrect OpenSM behavior. Please do not run more than two instances of OpenSM in the subnet.
     13 Software Development Kit (SDK)
     Software Development Kit (SDK)
216. sing one of the following methods:
     To install Data Center Bridging using the Server Manager:
     Step 1: Open the Server Manager.
     Step 2: Select Add Roles and Features.
     Step 3: Click Next.
     Step 4: Select Features on the left panel.
     Step 5: Check the Data Center Bridging checkbox.
     Step 6: Click Install.
     To install Data Center Bridging using PowerShell:
     Step 1: Enable Data Center Bridging (DCB):
     PS $ Install-WindowsFeature Data-Center-Bridging
     To configure QoS on the host:
     Note: The procedure below is not saved after you reboot your system. Hence, we recommend that you create a script using the steps below and run it on the local machine. Please see the procedure below on how to add the script to the local machine startup scripts.
     Step 1: Change the Windows PowerShell execution policy:
     PS $ Set-ExecutionPolicy AllSigned
     Step 2: Remove the entire previous QoS configuration:
     PS $ Remove-NetQosTrafficClass
     PS $ Remove-NetQosPolicy -Confirm:$False
     Step 3: Set the DCBX Willing parameter to false, as Mellanox drivers do not support this feature:
     PS $ Set-NetQosDcbxSetting -Willing 0
     Step 4: Create a Quality of Service (QoS) policy and tag each type of traffic with the relevant priority. In this example, we used TCP/UDP priority 1 and ND/NDK priority 3.
     PS $ New-NetQosPolicy "SMB" -PolicyStore ActiveStore -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
     PS $ New-NetQosPolicy "DEFAULT" -PolicyStore ActiveStore -Defa
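Because the QoS settings are not persistent, the steps above are meant to live in a startup script. The Python sketch below generates that script text from a list of policies. It is illustrative only and emits the same cmdlets as Steps 2-4. Signing the script (required by the AllSigned policy) and registering it as a startup script are left to the procedure referenced above.

```python
def qos_startup_script(policies: list[tuple[str, str, int]]) -> str:
    """Build the PowerShell lines from Steps 2-4 as one script.
    Each policy is (name, match clause, 802.1p priority)."""
    lines = [
        "Remove-NetQosTrafficClass",
        "Remove-NetQosPolicy -Confirm:$False",
        "Set-NetQosDcbxSetting -Willing 0",
    ]
    for name, match, priority in policies:
        if not 0 <= priority <= 7:
            raise ValueError("802.1p priorities range from 0 to 7")
        lines.append(
            f'New-NetQosPolicy "{name}" -PolicyStore ActiveStore {match} '
            f"-PriorityValue8021Action {priority}"
        )
    return "\n".join(lines) + "\n"


print(qos_startup_script([
    ("SMB", "-NetDirectPortMatchCondition 445", 3),  # ND/NDK traffic, priority 3
    ("TCP", "-IPProtocolMatchCondition TCP", 1),     # TCP traffic, priority 1
]))
```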
217. sion)] [-L(id)] [-u(sage)] [-c ping_count] [-f(lood)] [-o oui] [-S(erver)] [-h(elp)] <dest lid | guid>
     14.3.8.2 ibping Options
     The table below lists the various flags of the command.
     Table 21 ibping Flags and Options
     count, -c <num>: Stops after count packets
     flood, -f: Floods the destination: sends packets back-to-back without delay
     oui, -o: Uses the specified OUI number to multiplex vendor MADs
     Server, -S: Starts in server mode (does not return)
     debug, -d: Raises the IB debugging level (-ddd or -d -d -d)
     errors, -e: Shows send and receive errors (timeouts and others)
     help, -h: Shows the usage message
     verbose, -v: Increases the application verbosity level (-vvv or -v -v -v)
     version, -V: Shows the version info
     Lid, -L: Uses LID address argument
     usage, -u: Usage message
     Guid, -G: Uses GUID address argument. In most cases, it is the Port GUID. For example: "0x08f1040023"
     sm_port, -s <smlid>: Uses smlid as the target LID for SM/SA queries
     Ca, -C <ca_name>: Uses the specified ca_name
     Port, -P <ca_port>: Uses the specified ca_port
     timeout, -t <timeout_ms>: Overrides the default timeout for the solicited MADs
     14.3.9 ibnetdiscover
     ibnetdiscover performs IB subnet discovery and output
218. structions to configure PFC on Mellanox ConnectX™ cards. There are multiple configuration steps required, all of which may be performed via PowerShell. Therefore, although we present each step individually, you may ultimately choose to write a PowerShell script to do them all in one step. Note that administrator privileges are required for these steps.
     For further information, please refer to:
     http://blogs.technet.com/b/josebda/archive/2012/07/31/deploying-windows-server-2012-with-smb-direct-smb-over-rdma-and-the-mellanox-connectx-3-using-10gbe-40gbe-roce-step-by-step.aspx
     (1) Standard RDMA APIs are IP-based already for all existing RDMA technologies.
     8.7.2.1 Prerequisites
     The following are the driver's prerequisites in order to set or configure RoCE:
     ConnectX-3 and ConnectX-3 Pro firmware version 2.30.3000 or higher.
     All InfiniBand verbs applications which run over InfiniBand verbs should work on RoCE links if they use GRH headers.
     Set the HCA to use the Ethernet protocol: Display the Device Manager and expand "System Devices". Please refer to Section 8.4.2 "Port Protocol Configuration" on page 39.
     8.7.2.2 Configuring Windows Host
     Since PFC is responsible for flow controlling at the granularity of traffic priority, it is necessary to assign different priorities to different types of network traffic. As per RoCE configuration, all ND/NDK traffic is assigned to one or more
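The firmware prerequisite can be checked mechanically. This Python sketch assumes the version is reported as a dotted string such as 2.30.3000. It compares the version numerically, part by part, against the documented minimum.

```python
MIN_ROCE_FW = (2, 30, 3000)  # ConnectX-3 / ConnectX-3 Pro minimum from the list above


def parse_fw_version(version: str) -> tuple[int, ...]:
    """'2.30.3000' -> (2, 30, 3000)."""
    return tuple(int(part) for part in version.strip().split("."))


def firmware_supports_roce(version: str) -> bool:
    # Tuples compare element by element, so 2.30.3000 < 2.31.0 < 2.31.5050.
    return parse_fw_version(version) >= MIN_ROCE_FW


print(firmware_supports_roce("2.31.5050"))
```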
219. t higher data rates, because the system must handle a larger number of interrupts. However, the latency is decreased, since each packet is processed more quickly. When interrupt moderation is enabled, the system accumulates interrupts and sends a single interrupt rather than a series of interrupts. An interrupt is generated after receiving 5 packets, or after 10 microseconds have passed since the first packet was received. The valid values are: 0 - disable; 1 - enable.
     RxIntModeration (default: 2): Sets the rate at which the controller moderates or delays the generation of interrupts, making it possible to optimize network throughput and CPU utilization. The default setting (Adaptive) adjusts the interrupt rates dynamically, depending on traffic type and network usage. Choosing a different setting may improve network and system performance in certain configurations. The valid values are: 1 - static; 2 - adaptive (the interrupt moderation count and time are configured dynamically, based on traffic types and rate).
     pkt_rate_low (default: 150000): Sets the packet rate below which the traffic is considered latency traffic when using adaptive interrupt moderation. The valid values are 100 up to 1000000. Note: This registry value is not exposed via the UI.
     pkt_rate_high (default: 170000): Sets the packet rate above which the traffic is considered bandwidth traffic when using adaptive interrupt moderation. The valid values are 100 up to 1000000.
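The static moderation rule (5 packets or 10 microseconds) can be illustrated with a small simulation. The Python sketch below assumes sorted arrival times in microseconds. It models only the non-adaptive rule described above, not the adaptive mode controlled by RxIntModeration, pkt_rate_low and pkt_rate_high.

```python
def interrupt_times(arrivals_us: list[float], max_packets: int = 5,
                    max_delay_us: float = 10.0) -> list[float]:
    """Simulate the moderation rule above on sorted packet arrival times (us):
    one interrupt fires when `max_packets` packets have arrived, or
    `max_delay_us` after the first pending packet, whichever comes first."""
    fires = []
    i, n = 0, len(arrivals_us)
    while i < n:
        deadline = arrivals_us[i] + max_delay_us
        j = i
        while j < n and arrivals_us[j] <= deadline and j - i < max_packets:
            j += 1
        # Packet count reached first: fire on the last packet; else fire at the timeout.
        fires.append(arrivals_us[j - 1] if j - i == max_packets else deadline)
        i = j
    return fires


print(interrupt_times([0, 1, 2, 3, 4, 50]))  # 6 packets -> 2 interrupts
```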
220. t is specified alongside lid or nodedesc, remote port LIDs and node descriptions will also be compared.
     ports, -p: Obtains a ports report, which is a list of connected ports with relevant information (like LID, port number, GUID, width, speed, and NodeDescription)
     max_hops, -m: Reports the max hops discovered
     debug, -d: Raises the IB debugging level (-ddd or -d -d -d)
     errors, -e: Shows send and receive errors (timeouts and others)
     help, -h: Shows the usage message
     verbose, -v: Increases the application verbosity level (-vv or -v -v -v)
     version, -V: Shows the version info
     outstanding_smps, -o <val>: Specifies the number of outstanding SMPs which should be issued during the scan
     usage, -u: Usage message
     Ca, -C <ca_name>: Uses the specified ca_name
     Port, -P <ca_port>: Uses the specified ca_port
     timeout, -t <timeout_ms>: Overrides the default timeout for the solicited MADs
     full, -f: Shows full information (ports' speed and width)
     show, -s: Shows more information
     14.3.9.3 Topology File Format
     The topology file format is largely intuitive. Most identifiers are given textual names like vendor ID (vendid), device ID (devid), and GUIDs of various types (sysimgguid, caguid, switchguid, etc.). PortGUIDs are shown in parentheses (). For switches, this is shown on the switchguid line. For CA and router ports, it is shown o
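Identifier lines in the topology file can be split apart with a regular expression. The Python sketch below assumes lines of the form key=0x..., with an optional PortGUID in parentheses immediately after the value. The sample lines in the test are hypothetical, not captured from a real fabric.

```python
import re
from typing import Optional

# Matches identifier lines such as "vendid=0x2c9" or, for switches,
# "switchguid=0x...(...)" where the parenthesized value is the PortGUID.
ID_LINE = re.compile(
    r"^(?P<key>[a-z]+)=(?P<value>0x[0-9a-fA-F]+)(?:\((?P<port_guid>[0-9a-fA-Fx]+)\))?"
)


def parse_id_line(line: str) -> Optional[tuple[str, str, Optional[str]]]:
    """Return (key, value, port_guid or None), or None if the line does not match."""
    m = ID_LINE.match(line.strip())
    if m is None:
        return None
    return m.group("key"), m.group("value"), m.group("port_guid")
```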
221. tAdapter
     Step 3: Configure DCB:
     Set-NetQosDcbxSetting -Willing 0
     Step 4: Enable Network Adapter QoS:
     Set-NetAdapterQos -Name "Cx3Pro ETH P1" -Enabled 1
     Step 5: Enable Priority Flow Control (PFC) on the specific priorities 3,5:
     Enable-NetQosFlowControl 3,5
     8.9.3 Configuring DSCP for TCP Traffic
     Create a QoS policy to tag all TCP/UDP traffic with CoS value 1 and DSCP value 9:
     New-NetQosPolicy "DEFAULT" -PriorityValue8021Action 3 -DSCPAction 9
     DSCP can also be configured per protocol:
     New-NetQosPolicy "TCP" -IPProtocolMatchCondition TCP -PriorityValue8021Action 3 -DSCPAction 16
     New-NetQosPolicy "UDP" -IPProtocolMatchCondition UDP -PriorityValue8021Action 3 -DSCPAction 32
     8.9.4 Configuring DSCP for RDMA Traffic
     Create a QoS policy to tag the ND traffic for port 10000 with CoS value 3:
     New-NetQosPolicy "ND10000" -NetDirectPortMatchCondition 10000 -PriorityValue8021Action 3
     Related Commands:
     Get-NetAdapterQos: Gets the QoS properties of the network adapter
     Get-NetQosPolicy: Retrieves network QoS policies
     Get-NetQosFlowControl: Gets the QoS status per priority
     8.9.5 Registry Settings
     The following attributes must be set manually and will be added to the miniport registry.
     Table 7 DSCP Registry Keys Settings
     TxUntagPriorityTag: If 0x1, do not add an 802.1Q tag to transmitted packets which are assigned an 802.1p priority, but are not assigned a no
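For background, the DSCPAction values above end up in the IP header's TOS byte (IPv4) or Traffic Class byte (IPv6), where DSCP occupies the upper six bits (RFC 2474). This Python sketch shows the bit arithmetic. It is not part of the Mellanox configuration.

```python
def dscp_to_tos(dscp: int) -> int:
    """DSCP is the upper six bits of the IPv4 TOS / IPv6 Traffic Class byte;
    the low two bits (ECN) are left at 0 here."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    return dscp << 2


def tos_to_dscp(tos: int) -> int:
    """Recover the DSCP value from a TOS / Traffic Class byte."""
    return (tos >> 2) & 0x3F


for dscp in (9, 16, 32):  # the DSCPAction values used above
    print(dscp, hex(dscp_to_tos(dscp)))
```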
222. tallation: An automated installation procedure that requires no user intervention.
     Note: Both Attended and Unattended installations require administrator privileges.
     4.1 Attended Installation
     The following is an example of a MLNX_WinOF win2012 x64 installation session.
     Step 1: Double-click the .exe and follow the GUI instructions to install MLNX_WinOF.
     Starting from MLNX_WinOF v4.55, the log option is enabled automatically. The default path of the log is: %LOCALAPPDATA%\MLNX_WinOF.log
     Step 2: [Optional] Manually configure your setup to contain the logs option:
     MLNX_VPI_WinOF-4_70_All_win2012_x64.exe /v"/l*vx [LogFile]"
     Step 3: [Optional] If you do not want to upgrade your firmware version:
     MLNX_VPI_WinOF-4_70_All_win2012_x64.exe /v"MT_SKIPFWUPGRD=1"
     Step 4: [Optional] If you want to control the installation of the WMI/CIM provider:
     MLNX_VPI_WinOF-4_70_All_win2012_x64.exe /v"MT_WMI=1"
     Step 5: [Optional] If you want to control whether to restore the network configuration or not:
     MLNX_VPI_WinOF-4_70_All_win2012_x64.exe /v"MT_RESTORECONF=1"
     For further help, please run:
     MLNX_VPI_WinOF-4_70_All_win2012_x64.exe /v"/h"
     Notes: The MT_SKIPFWUPGRD default value is False. The MT_WMI default value is True. The MT_RESTORECONF default value is True.
     Step 6: Click Next in the Welcome screen.
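The optional installer arguments can be combined programmatically. The Python sketch below is a hypothetical helper that collects the MSI properties from Steps 2-5 into a single /v"..." argument. It assumes, as is usual for InstallShield packages, that several properties can be passed in one /v string. It also assumes the log path contains no spaces.

```python
from typing import Optional

# File name from the example above; the helper itself is hypothetical.
INSTALLER = "MLNX_VPI_WinOF-4_70_All_win2012_x64.exe"


def installer_command(skip_fw_upgrade: Optional[bool] = None,
                      wmi: Optional[bool] = None,
                      restore_conf: Optional[bool] = None,
                      log_file: Optional[str] = None) -> str:
    """Collect the optional /v arguments from Steps 2-5 into one command line.
    None leaves a property at its default (False, True, True respectively)."""
    parts = []
    if log_file:  # assumed to contain no spaces, which would need extra quoting
        parts.append(f"/l*vx {log_file}")
    for prop, value in (("MT_SKIPFWUPGRD", skip_fw_upgrade),
                        ("MT_WMI", wmi),
                        ("MT_RESTORECONF", restore_conf)):
        if value is not None:
            parts.append(f"{prop}={int(value)}")
    if not parts:
        return INSTALLER
    return f'{INSTALLER} /v"{" ".join(parts)}"'


print(installer_command(skip_fw_upgrade=True, wmi=False))
```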
223. ted
     Suggestion: To troubleshoot this issue, follow the steps below:
     1. Check that the InfiniBand driver is running on all nodes by using vstat. The vstat utility, located at <installation_directory>\tools, displays the status and capabilities of the network adapter card(s).
     2. On the command line, enter "vstat" (use -h for options) to retrieve information about one or more adapter ports. The port_state field will be equal to:
        PORT_DOWN: when there is no InfiniBand cable (no link);
        PORT_INITIALIZED: when the port is connected to some other port (physical link);
        PORT_ACTIVE: when the port is connected and OpenSM is running (logical link);
        PORT_ARMED: when the port is connected to some other port (physical link).
     3. Run sminfo and verify that OpenSM is running. In case OpenSM is not running, please see the OpenSM operation instructions in Section 12 "OpenSM - Subnet Manager" on page 99 above.
     4. Verify the status of the ports by using vstat: all connected ports should report the "PORT_ACTIVE" state.
     15.2 Ethernet Troubleshooting
     Issue 1: The installation of WinOFED VPI for Windows fails with the following error message: "This installation package is not supported by this processor type. Contact your product vendor."
     Suggestion: This message is printed if you have downloaded and attempted to install an incorrect driver version, for example, if you are trying to install a 64-bit driver on a 32-bit machine
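The port_state check in steps 2 and 4 can be automated. This Python sketch assumes vstat prints lines containing port_state=PORT_...; the exact output layout is an assumption, and the sample text in the test is hypothetical. The sketch reports whether every port found is PORT_ACTIVE.

```python
import re

# Meanings as described in step 2 above.
PORT_STATE_MEANING = {
    "PORT_DOWN": "no InfiniBand cable (no link)",
    "PORT_INITIALIZED": "connected to another port (physical link)",
    "PORT_ARMED": "connected to another port (physical link)",
    "PORT_ACTIVE": "connected and OpenSM is running (logical link)",
}


def port_states(vstat_output: str) -> list[str]:
    """Extract every port_state value, assuming lines like 'port_state=PORT_ACTIVE'."""
    return re.findall(r"port_state\s*=\s*(PORT_[A-Z]+)", vstat_output)


def all_ports_active(vstat_output: str) -> bool:
    """True only if at least one port was found and all of them are PORT_ACTIVE."""
    states = port_states(vstat_output)
    return bool(states) and all(s == "PORT_ACTIVE" for s in states)
```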
224. termine the maximum speed between the two sides with the most basic test:
1. Run ib_send_bw on machine 1.
2. Run ib_send_bw <host1> on machine 2, where <host1> is the hostname of machine 1.
3. Results appear in MB/s (Megabytes, 2^20 bytes) and reflect the actual data that was transferred, excluding headers.
4. If these results are not as expected, the problem is most probably one or more of the following:
- Old firmware version.
- Misconfigured flow control: Global Pause or PFC is configured incorrectly on the hosts, routers and switches. See Section 8.7, "RDMA over Converged Ethernet (RoCE)", on page 49.
- CPU power options are not set to "Maximum Performance".
Issue 3. QoS and Flow Control
Flow control settings can greatly affect results. To see the configured settings for all of the QoS options, open a PowerShell prompt and use Get-NetAdapterQos.
To achieve maximum performance, all of the following must hold:
1. All of the hosts, switches and routers should use the same, matching flow control settings. If Global Pause is used, all devices must be configured for it. If PFC (Priority Flow Control) is used, all devices must have matching settings for all priorities.
2. ETS settings that limit the speed of some priorities will greatly affect the output results.
3. Make sure Flow Control is enabled on the Mellanox interfaces (it is enabled by default). Go to the Device Manager, right-click the Mellanox interface, go to Advanced and
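Because ib_send_bw reports MB/s with 1 MB = 2^20 bytes, comparing its result against a link speed quoted in Gb/s needs a unit conversion. A small Python sketch (the helper names are illustrative, not part of any Mellanox tool):

```python
# ib_send_bw reports MB/s where 1 MB = 2**20 bytes (per the steps above).
MEGABYTE = 2 ** 20


def mbytes_per_s_to_gbits_per_s(mb_per_s: float) -> float:
    """Convert an ib_send_bw MB/s figure to Gb/s (10**9 bits per second)."""
    return mb_per_s * MEGABYTE * 8 / 1e9


def gbits_per_s_to_mbytes_per_s(gb_per_s: float) -> float:
    """Convert a nominal link rate in Gb/s to the same 2**20-byte MB/s unit."""
    return gb_per_s * 1e9 / 8 / MEGABYTE
```

For example, 40 Gb/s corresponds to roughly 4768 MB/s in this unit; since the reported figure excludes headers, a healthy result still sits below that number.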
225. ting physical IP network without requiring changes to the physical network switch architecture. Since NVGRE tunnels terminate at each Hyper-V host, the hosts handle all encapsulation and de-encapsulation of the network traffic. Firewalls that block GRE tunnels between sites have to be configured to support forwarding GRE (IP Protocol 47) tunnel traffic.
Figure 5: NVGRE Packet Structure (GRE header, inner IP, TCP, TCP user data)
8.8.1 Enabling/Disabling NVGRE Offloading
To leverage NVGRE to virtualize heavy network I/O workloads, the Mellanox ConnectX-3 Pro NIC provides hardware support for GRE offload within the NIC, enabled by default.
To enable/disable NVGRE offloading:
Step 1. Open the Device Manager.
Step 2. Go to Network adapters.
Step 3. Right-click Properties on the Mellanox ConnectX-3 Pro Ethernet Adapter card.
Step 4. Go to the Advanced tab.
Step 5. Choose the "Encapsulate Task Offload" option.
Step 6. Set one of the following values:
- Enable: GRE offloading is Enabled by default.
- Disabled: When disabled, the Hyper-V host will still be able to transfer NVGRE traffic, but TCP and inner IP checksums will be calculated by software, which significantly reduces performance.
8.8.2 Configuring NVGRE using PowerShell
Hyper-V Network Virtualization policies can be centrally configured using PowerShell 3.0 and PowerShell Remoting.
Step 1. [Windows Server 2012 Only] Enable th
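The GRE header inside the NVGRE packet structure is what the offload has to parse. Per RFC 7637 (not this manual), NVGRE uses a GRE header with the Key Present bit set, protocol type 0x6558 (Transparent Ethernet Bridging), and a 32-bit key holding a 24-bit Virtual Subnet ID (VSID) followed by an 8-bit FlowID. A minimal Python sketch of that header layout; nvgre_header is a hypothetical name:

```python
import struct

GRE_KEY_PRESENT = 0x2000  # K bit in the GRE flags/version word (C=0, S=0)
ETH_P_TEB = 0x6558        # protocol type: Transparent Ethernet Bridging


def nvgre_header(vsid: int, flow_id: int = 0) -> bytes:
    """Pack the 8-byte NVGRE GRE header for a given VSID and FlowID."""
    if not 0 <= vsid < 1 << 24:
        raise ValueError("VSID must fit in 24 bits")
    if not 0 <= flow_id < 1 << 8:
        raise ValueError("FlowID must fit in 8 bits")
    key = (vsid << 8) | flow_id  # VSID in the upper 24 bits, FlowID in the low 8
    return struct.pack("!HHI", GRE_KEY_PRESENT, ETH_P_TEB, key)
```

With VirtualSubnetID 5001 (the value used in the PowerShell examples of this section), the header bytes are 20 00 65 58 00 13 89 00.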
226. tion
Required Disk Space for Installation: 100MB
Operating Systems:
- Windows Server 2008 R2 (64 bit only)
- Windows Server 2012 (64 bit only)
- Windows Server 2012 R2 (64 bit only)
Note: The Operating Systems listed above must run with administrator privileges.
1.2 Supplied Packages
Mellanox WinOF driver Rev 4.70 includes the following package:
- MLNX_VPI_WinOF-<version>_All_<OS>_<arch>.exe
In this package, the port default is auto and RoCE is enabled.
1.3 WinOF Set of Documentation
Under <installation_directory>\Documentation:
- License file
- User Manual (this document)
- MLNX_VPI_WinOF Release Notes
2 Downloading Mellanox WinOF Driver
Follow these steps to download the .exe according to your Operating System.
Step 1. Verify the machine architecture.
For Windows Server 2008 R2:
1. Open a CMD console (click Start > Run and enter CMD).
2. Enter the following command: > echo %PROCESSOR_ARCHITECTURE%
On an x64 (64-bit) machine, the output will be "AMD64".
For Windows Server 2012/2012 R2:
1. To go to the Start menu, position your mouse in the bottom-right corner of the Remote Desktop of your screen.
2. Open a CMD console (click Task Manager > File > Run new task > and enter CMD).
3. Enter the following command: > echo %PROCESSOR_ARCHITECTURE%
On an x64 (64-bit) machine, the output will be "AMD64".
Step 2. Go to the Mellanox
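The architecture check in Step 1 can also be scripted. A minimal Python sketch; machine_is_x64 is a hypothetical name. It also consults PROCESSOR_ARCHITEW6432, which Windows sets when a 32-bit process runs on 64-bit Windows (in that case PROCESSOR_ARCHITECTURE reports x86 even on an x64 machine):

```python
import os


def machine_is_x64(environ=os.environ) -> bool:
    """True if the Windows environment reports a 64-bit (AMD64) machine.

    PROCESSOR_ARCHITEW6432 is present only for 32-bit processes on 64-bit
    Windows and holds the real machine architecture, so it takes priority.
    """
    arch = environ.get("PROCESSOR_ARCHITEW6432") or environ.get("PROCESSOR_ARCHITECTURE", "")
    return arch.upper() == "AMD64"
```

A True result means the x64 package (the _x64.exe file) is the one to download.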
227. tionMethod = 0, RecvCompletionMethod = 2, ReceiveBuffers = 1024.
In operating systems that support NDIS 6.3: RssProfile = 4.
Additionally, this option chooses the best processors to assign to:
- DefaultRecvRingProcessor, TxForwardingProcessor
- In operating systems that support NDIS 6.2: RssBaseProcNumber, MaxRssProcessors
- In operating systems that support NDIS 6.3: NumRSSQueues, RssMaxProcNumber
-f: Forwarding traffic scenario. This option must be followed by two connection names. The tuning in this case is co-dependent. This option automatically sets: SendCompletionMethod = 1, RecvCompletionMethod = 0, ReceiveBuffers = 4096, UseRSSForRawIP = 0, UseRSSForUDP = 0. Additionally, this option chooses the best processors to assign to: DefaultRecvRingProcessor, TxInterruptProcessor, TxForwardingProcessor; in operating systems that support NDIS 6.2: RssBaseProcNumber, MaxRssProcessors; in operating systems that support NDIS 6.3: NumRSSQueues, RssMaxProcNumber.
-m: Manual configuration. This option must be followed by one connection name. This option assigns the provided base and number of CPUs to: RssBaseProcNumber, MaxRssProcessors. Additionally, this option assigns the following with processors inside that range: DefaultRecvRingProcessor, TxInterruptProcessor.
-r: Restore default settings. This option can be followed by one or two connectio
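The -m option ties four registry values to one CPU range. The Python sketch below only models that relationship and is not the tuning tool's actual logic. manual_rss_config is a hypothetical name, and because the manual does not say which CPUs inside the range are chosen, the sketch simply uses the first CPU for DefaultRecvRingProcessor and the next one (if any) for TxInterruptProcessor. That choice is an assumption.

```python
from typing import Dict


def manual_rss_config(base_cpu: int, num_cpus: int) -> Dict[str, int]:
    """Model of the -m option: base CPU and CPU count, plus two in-range picks.

    Assumption: the first CPU of the range receives DefaultRecvRingProcessor
    and the second (or the first, if the range has one CPU) TxInterruptProcessor.
    """
    if base_cpu < 0 or num_cpus < 1:
        raise ValueError("base_cpu must be >= 0 and num_cpus >= 1")
    last_cpu = base_cpu + num_cpus - 1
    return {
        "RssBaseProcNumber": base_cpu,
        "MaxRssProcessors": num_cpus,
        "DefaultRecvRingProcessor": base_cpu,
        "TxInterruptProcessor": min(base_cpu + 1, last_cpu),
    }
```

The invariant this captures is the one stated in the table: every processor the -m option assigns stays inside [base, base + count - 1].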
228. try key is supported only in Ethernet drivers.
*PriorityVLANTag (default: 3, Packet Priority & VLAN Enabled): Enables sending and receiving IEEE 802.3ac tagged frames, which include:
- 802.1p QoS (Quality of Service) tags for priority-tagged packets
- 802.1Q tags for VLANs
When this feature is enabled, the Mellanox driver supports sending and receiving a packet with VLAN and QoS tags.
PromiscuousVlan (default: 0): Specifies whether a promiscuous VLAN is enabled or not. When this parameter is set, all the packets with VLAN tags are passed to an upper level without executing any filtering. The valid values are: 0 (disable), 1 (enable). Note: This registry value is not exposed via the UI.
UseRSSForRawIP (default: 1): The execution of RSS on UDP and Raw IP packets. In a forwarding scenario, one can improve the performance by disabling RSS on UDP or raw packets. In such a case, the entire receive processing of these packets is done on the processor that was defined in the DefaultRecvRingProcessor registry key. The valid values are: 0 (disable), 1 (enable). This is also relevant for IPoIB. Note: This registry value is not exposed via the UI.
UseRSSForUDP (default: 1): Used to execute RSS on UDP and Raw IP packets. In a forwarding scenario, you can improve the performance by disabling RSS on UDP or raw packets. In such a case, all the receive processing of these
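With *PriorityVLANTag enabled, the driver sends and receives frames whose 802.1Q tag carries both the 802.1p priority and the VLAN ID. A minimal Python sketch of that 4-byte tag layout (taken from IEEE 802.1Q, not from this manual). vlan_tag is a hypothetical name, and priority 3 with VLAN 55 are illustrative values only:

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def vlan_tag(priority: int, vlan_id: int, dei: int = 0) -> bytes:
    """Build the 802.1Q tag: TPID followed by the Tag Control Information (TCI).

    TCI layout: 3-bit 802.1p priority | 1-bit DEI | 12-bit VLAN ID.
    """
    if not 0 <= priority <= 7:
        raise ValueError("802.1p priority is 3 bits (0-7)")
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID is 12 bits (0-4095)")
    if dei not in (0, 1):
        raise ValueError("DEI is a single bit")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)
```

A priority-tagged packet without VLAN membership uses VLAN ID 0, so only the priority bits carry information.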
229. ts refreshed (i.e., disabled, then enabled). This may cause a temporary loss of connection to the adapter.
c. In case a bundle loses one or more network adapters by a create or modify operation, the remaining adapters in the bundle are automatically notified of the change.
8.5.3 Creating a Port VLAN in Windows 2008 R2
You can create a Port VLAN either on a physical Mellanox ConnectX EN adapter or on a virtual bundle (team). The following steps describe how to create a port VLAN.
Step 1. Display the Device Manager.
[Screenshot: Device Manager > Network adapters, showing the physical adapters (Broadcom BCM5708C NetXtreme II GigE, Mellanox ConnectX MT25418 DDR Channel, Mellanox ConnectX 10Gb Ethernet Adapter and Adapter #2) and the virtual bundle "Mellanox Virtual Miniport Driver - Team A"]
230. tware
VPI: Virtual Protocol Interconnect
IPoIB: IP over InfiniBand
PFC: Priority Flow Control
PR: Path Record
RDS: Reliable Datagram Sockets
RoCE: RDMA over Converged Ethernet
SL: Service Level
MPI: Message Passing Interface
EoIB: Ethernet over InfiniBand
QoS: Quality of Service
ULP: Upper Level Protocol
VL: Virtual Lane
Related Documents
Table 4: Related Documents
- MFT User Manual: Describes the set of firmware management tools for a single InfiniBand node. MFT can be used for: generating a standard or customized Mellanox firmware image; querying for firmware information; burning a firmware image to a single InfiniBand node.
- WinOF Release Notes: For possible software issues, please refer to the WinOF Release Notes.
1 Introduction
This User Manual describes the installation, configuration and operation of the Mellanox WinOF driver Rev 4.70 package.
Mellanox WinOF is composed of several software modules that contain InfiniBand and Ethernet drivers. The Mellanox WinOF driver supports 10 or 40 Gb/s Ethernet, and 40 or 56 Gb/s InfiniBand network ports. The port type is determined upon boot based on card capabilities and user settings. For more details, please refer to the MFT User Manual.
1.1 Hardware and Software Requirements
Table 5: Hardware and Software Requirements
Requirements | Descrip
231. type (RC/UC, default RC)
-s, --size=<size>: The size of message to exchange (default 65536)
-a, --all: Runs sizes from 2 till 2^23
-t, --tx-depth=<dep>: The size of tx queue (default 100)
-n, --iters=<iters>: The number of exchanges (at least 2, default 1000)
-u, --qp-timeout=<timeout>: QP timeout. The timeout value is 4 usec * 2^timeout (default 14)
-S, --sl=<sl>: The service level (default 0)
-x, --gid-index=<index>: Test uses GID with GID index taken from command line (for RDMAoE, index should be 0)
-b, --bidirectional: Measures bidirectional bandwidth (default unidirectional)
-V, --version: Displays version number
-g, --post=<num of posts>: The number of posts for each QP in the chain (default tx_depth)
-F, --CPU-freq: The CPU frequency test. It is active even if the cpufreq_ondemand module is loaded
-q, --qp=<num of qp's>: The number of QPs (default 1)
-I, --inline_size=<size>: The maximum size of message to be sent in inline mode (default 0)
-N, --no_peak-bw: Cancels peak-bw calculation (default with peak-bw)
-R, --rdma_cm: Connect QPs with rdma_cm and run test on those QPs
-Z, --com_rdma_cm: Communicate with rdma_cm module to exchange data; use regular QPs
-Q, --cq-mod: Generate CQE only after <cq-mod> completions
14.4.12 ibv_write_lat
This is a more adv
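The -u and -a descriptions above are both formulas, and a short Python calculation makes them concrete. qp_timeout_usec and all_message_sizes are hypothetical helper names. The table rounds the timeout unit to 4 usec, while the InfiniBand specification defines it as 4.096 usec, so the unit is a parameter. Treating the -a sweep as consecutive powers of two is an assumption.

```python
from typing import List


def qp_timeout_usec(timeout: int, unit_usec: float = 4.0) -> float:
    """QP timeout per the -u description: unit_usec * 2**timeout microseconds."""
    if not 0 <= timeout <= 31:
        raise ValueError("the timeout exponent is a 5-bit field (0-31)")
    return unit_usec * 2 ** timeout


def all_message_sizes(max_exp: int = 23) -> List[int]:
    """Sizes swept by -a, assuming powers of two from 2 up to 2**max_exp bytes."""
    return [2 ** exp for exp in range(1, max_exp + 1)]
```

With the default -u 14, the timeout is 4 x 16384 = 65536 usec (about 65.5 ms), or about 67.1 ms with the 4.096 usec unit. The -a sweep ends at 2^23 = 8388608 bytes.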
232. u want to install
[Screenshot: Windows Setup operating system selection, "Windows Server 2012 SERVERDATACENTER" (English)]
5. Run the Windows Setup Wizard.
6. Choose the iSCSI target drive to install Windows on and follow the instructions presented by the installation Wizard.
[Screenshot: Windows Setup, "Where do you want to install Windows?", listing the Drive 0 partitions and an unallocated 55.0 GB Drive 1]
The installation process will start once all the required steps in the Wizard are completed; the client will then reboot and boot from the iSCSI target.
10 Deploying Windows Server 2012 and Above with SMB Direct
10.1 Overview
The Server Message Block (SMB) protocol is a network file sharing protocol implemented in Microsoft Windows. The set of message packets that defines a particular version of the protocol is called a dialect.
The Microsoft SMB protocol is a client-server implementation and consists of a set of data packets, each containing a request sent by the client or a response sent by the server. SM
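Since a dialect is a protocol version, client and server must agree on one before SMB Direct can be used. The Python sketch below shows that agreement: the server picks the highest dialect both sides support. The dialect codes come from Microsoft's MS-SMB2 specification, not from this manual, and the function names are hypothetical. SMB Direct (RDMA) requires SMB 3.0 or later, which Windows Server 2012 introduced; Windows Server 2012 R2 adds SMB 3.0.2.

```python
# Dialect revision codes from the MS-SMB2 specification (not from this manual).
SMB_DIALECTS = {
    0x0202: "SMB 2.0.2",
    0x0210: "SMB 2.1",
    0x0300: "SMB 3.0",
    0x0302: "SMB 3.0.2",
}
SMB3_MINIMUM = 0x0300  # SMB Direct needs an SMB 3.x dialect


def negotiate_dialect(client_dialects, server_dialects):
    """Return the highest dialect both sides support, or None if there is none."""
    common = set(client_dialects) & set(server_dialects)
    return max(common) if common else None


def can_use_smb_direct(dialect) -> bool:
    """True if the negotiated dialect is new enough for SMB Direct."""
    return dialect is not None and dialect >= SMB3_MINIMUM
```

For example, a Windows Server 2012 client talking to a 2012 R2 server settles on SMB 3.0 (0x0300), which is enough for SMB Direct.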
233. ult -PriorityValue8021Action 3
PS $ New-NetQosPolicy "TCP" -PolicyStore ActiveStore -IPProtocolMatchCondition TCP -PriorityValue8021Action 1
PS $ New-NetQosPolicy "UDP" -PolicyStore ActiveStore -IPProtocolMatchCondition UDP -PriorityValue8021Action 1
Step 5. [Optional] If VLANs are used, mark the egress traffic with the relevant VlanID. The NIC is referred to as "Ethernet 4" in the examples below.
PS $ Set-NetAdapterAdvancedProperty -Name "Ethernet 4" -RegistryKeyword "VlanID" -RegistryValue "55"
Step 6. [Optional] Configure the IP address for the NIC. If DHCP is used, the IP address will be assigned automatically.
PS $ Set-NetIPInterface -InterfaceAlias "Ethernet 4" -DHCP Disabled
PS $ Remove-NetIPAddress -InterfaceAlias "Ethernet 4" -AddressFamily IPv4 -Confirm:$false
PS $ New-NetIPAddress -InterfaceAlias "Ethernet 4" -IPAddress 192.168.1.10 -PrefixLength 24 -Type Unicast
Step 7. [Optional] Set the DNS server (assuming its IP address is 192.168.1.2).
PS $ Set-DnsClientServerAddress -InterfaceAlias "Ethernet 4" -ServerAddresses 192.168.1.2
After establishing the priorities of ND/NDK traffic, the priorities must have PFC enabled on them.
Step 8. Disable Priority Flow Control (PFC) for all other priorities except for 3.
PS $ Disable-NetQosFlowControl 0,1,2,4,5,6,7
Step 9. Enable QoS on the relevant interface.
PS $ Enable-NetAdapterQos -InterfaceAlias "Ethernet 4"
Step 10. Enable PFC on
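Steps 8 to 10 follow one rule: PFC is disabled on every 802.1p priority except the ones carrying ND/NDK traffic, then QoS and PFC are enabled. A Python sketch that derives those command strings from the set of PFC priorities. pfc_commands is a hypothetical name, and the positional priority-list syntax mirrors Step 8:

```python
from typing import Iterable, List

ALL_PRIORITIES = range(8)  # IEEE 802.1p priorities 0-7


def _join(priorities) -> str:
    return ",".join(str(p) for p in priorities)


def pfc_commands(pfc_priorities: Iterable[int], interface_alias: str) -> List[str]:
    """Return PowerShell command strings that leave PFC on only pfc_priorities."""
    enabled = sorted(set(pfc_priorities))
    if not enabled or any(p not in ALL_PRIORITIES for p in enabled):
        raise ValueError("priorities must be a non-empty subset of 0-7")
    disabled = [p for p in ALL_PRIORITIES if p not in enabled]
    commands = []
    if disabled:
        commands.append(f"Disable-NetQosFlowControl {_join(disabled)}")
    commands.append(f'Enable-NetAdapterQos -InterfaceAlias "{interface_alias}"')
    commands.append(f"Enable-NetQosFlowControl {_join(enabled)}")
    return commands
```

Generating the disable list as the complement of the enable list keeps the two from drifting apart when the ND/NDK priority changes.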
234. upRecord -CustomerAddress 172.16.15.5 -ProviderAddress 192.168.20.115 -VirtualSubnetID 5001 -MACAddress "00155D730100" -Rule "TranslationMethodEncap"
New-NetVirtualizationLookupRecord -CustomerAddress 172.16.15.6 -ProviderAddress 192.168.20.115 -VirtualSubnetID 5001 -MACAddress "00155D730101" -Rule "TranslationMethodEncap"
# Add customer route
New-NetVirtualizationCustomerRoute -RoutingDomainID "{11111111-2222-3333-4444-000000005001}" -VirtualSubnetID 5001 -DestinationPrefix "172.16.0.0/16" -NextHop "0.0.0.0" -Metric 255
Step 4. Configure the Provider Address and Route records on Hyper-V Host 2 (Host 2 only).
mtlae15:
$NIC = Get-NetAdapter "Port1"
New-NetVirtualizationProviderAddress -InterfaceIndex $NIC.InterfaceIndex -ProviderAddress 192.168.20.115 -PrefixLength 24
New-NetVirtualizationProviderRoute -InterfaceIndex $NIC.InterfaceIndex -DestinationPrefix "0.0.0.0/0" -NextHop 192.168.20.1
Step 5. Configure the Virtual Subnet ID on the Hyper-V Network Switch Ports for each Virtual Machine on each Hyper-V Host (Host 1 and Host 2).
Run the command below for each VM on the host the VM is running on, i.e. for mtlae14-005 and mtlae14-006 on host 192.168.20.114, and for VMs mtlae15-005 and mtlae15-006 on host 192.168.20.115.
mtlae15 only:
Get-VMNetworkAdapter -VMName mtlae15-005 | where {$_.MacAddress -eq "00155D730100"} | Set-VMNetworkAdapter -VirtualSubnetID 5001
Get-VMNetworkAdapter -VMName mtlae15-006 | where {$_.MacAddress -eq
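Before running lookup-record and route commands like these, it can help to check that the addresses are consistent with each other. The Python sketch below uses the standard ipaddress module on the values from the commands above. It checks that each customer address falls inside the customer route prefix, that each provider address and the provider next hop lie on Host 2's provider subnet, and that each MAC is 12 hex digits. The constants and function names are hypothetical.

```python
import ipaddress

# Values copied from the NVGRE commands above (Host 2, VirtualSubnetID 5001).
CUSTOMER_ROUTE = ipaddress.ip_network("172.16.0.0/16")
PROVIDER_INTERFACE = ipaddress.ip_interface("192.168.20.115/24")
PROVIDER_NEXT_HOP = ipaddress.ip_address("192.168.20.1")
LOOKUP_RECORDS = [
    # (customer address, provider address, MAC as passed to -MACAddress)
    ("172.16.15.5", "192.168.20.115", "00155D730100"),
    ("172.16.15.6", "192.168.20.115", "00155D730101"),
]
HEX_DIGITS = set("0123456789abcdefABCDEF")


def mac_dashed(mac: str) -> str:
    """Render a 12-hex-digit MAC as XX-XX-XX-XX-XX-XX."""
    if len(mac) != 12 or not set(mac) <= HEX_DIGITS:
        raise ValueError(f"not a 12-digit hex MAC: {mac!r}")
    return "-".join(mac[i:i + 2] for i in range(0, 12, 2)).upper()


def check_records():
    """Return a list of problems found; an empty list means the records agree."""
    problems = []
    for ca, pa, mac in LOOKUP_RECORDS:
        if ipaddress.ip_address(ca) not in CUSTOMER_ROUTE:
            problems.append(f"{ca} is outside the customer route {CUSTOMER_ROUTE}")
        if ipaddress.ip_address(pa) not in PROVIDER_INTERFACE.network:
            problems.append(f"{pa} is outside the provider subnet {PROVIDER_INTERFACE.network}")
        try:
            mac_dashed(mac)
        except ValueError as err:
            problems.append(str(err))
    if PROVIDER_NEXT_HOP not in PROVIDER_INTERFACE.network:
        problems.append("the provider next hop is not on the provider subnet")
    return problems
```

mac_dashed also shows the same MAC in the dashed form that Get-VMNetworkAdapter and ipconfig display.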
235. updated.
[Screenshot: InstallShield "Maximum Performance" page with the option "Check this box to configure your system for maximum performance (Recommended)". Note: This step requires you to reboot the machine at the end of the installation process.]
Step 10. Configure your system for maximum performance by checking the maximum performance box.
This step requires rebooting your machine at the end of the installation.
Step 11. Select a Complete or Custom installation, and follow Step a onward on page 23.
[Screenshot: InstallShield "Setup Type" page. Complete: all program features will be installed (requires the most disk space). Custom: choose which program features you want installed and where they will be installed (recommended for advanced users).]
a. Select the desired feature to install:
- OpenSM: installs Windows OpenSM, which is required to manage the subnet from a host. OpenSM is part of the driver and is installed automatically.
- Performances t
236. ure the RSS mode of both ports to be the same in the driver GUI.
- "Mellanox ConnectX EN 10Gbit Ethernet Adapter <X> device failed to create enough MSI-X vectors. The Network interface will not use MSI-X interrupts. This may affect the performance. To fix the problem, configure the number of MSI-X vectors in the registry to be at least <Y>."
Issue 10. SR-IOV Environment
In an SR-IOV environment, the Mellanox driver fails to load on the host machine.
Symptom: In an SR-IOV environment, the Mellanox bus driver fails to load and appears with a yellow bang in the Device Manager. Clicking on the properties shows the reason: "This device cannot find enough free resources that it can use. (Code 12) If you want to use this device, you will need to disable one of the other devices on this system."
Suggestion: This happens because the BAR space required by the device cannot be satisfied by the system. To resolve this issue, follow these steps:
Step 1. Boot to BIOS and disable SR-IOV.
Step 2. Burn firmware with a lower number of VFs.
Step 3. Re-enable SR-IOV in BIOS.
For more information, please contact Mellanox support.
Issue 11. NVGRE configuration
Due to an OS issue, NVGRE changes done on a running VM are not propagated and do not take effect until the OS is restarted.
Suggestion: NVGRE configuration changes on a VM connected to the SR-IOV-enabled virtual switch can be done only when the VM is stopped.
15.3 Performance Troubleshooting
Issue 1. Window
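The Issue 10 fix works because each Virtual Function adds its own BAR space on top of the Physical Function's. The Python sketch below is rough arithmetic under a simplified, hypothetical model: total space = PF BAR + number of VFs x per-VF BAR, ignoring alignment and firmware-specific sizing. It shows why burning firmware with fewer VFs can make the requirement fit. The sizes are invented for illustration, and max_vfs_that_fit is a hypothetical name.

```python
def max_vfs_that_fit(available_bytes: int, pf_bar_bytes: int, vf_bar_bytes: int) -> int:
    """Largest VF count whose BAR space fits, under a simplified linear model.

    Hypothetical model: total = pf_bar_bytes + num_vfs * vf_bar_bytes.
    Real BAR sizing also involves power-of-two alignment and firmware settings.
    """
    if vf_bar_bytes <= 0:
        raise ValueError("vf_bar_bytes must be positive")
    spare = available_bytes - pf_bar_bytes
    return max(spare // vf_bar_bytes, 0)
```

For example, with an invented 64 MiB window, a 32 MiB PF BAR and 2 MiB per VF, at most 16 VFs fit. A firmware image configured for more VFs than that would hit the Code 12 error.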
237. ventory <filename> -s(tress) -M(ulticast_Mode) -t(imeout) <milliseconds> -l | --log_file -v -vf <flags> -h(elp)
14.3.15.2 osmtest Options
The table below lists the various flags of the command.
Table 28: osmtest Flags and Options
-f, --flow: This option directs osmtest to run a specific flow. The flows are:
  c: create an inventory file with all nodes, ports and paths
  a: run all validation tests (expecting an input inventory)
  v: only validate the given inventory file
  s: run service registration, deregistration, and lease test
  e: run event forwarding test
  f: flood the SA with queries according to the stress mode
  m: multicast flow
  q: QoS info: dump VLArb and SLtoVL tables
  t: run trap 64/65 flow (this flow requires running of an external tool)
  (default is all flows except QoS)
-w, --wait: This option specifies the wait time for trap 64/65 in seconds. It is used only when running -f t (the trap 64/65 flow). Default: 10 sec.
-d, --debug: This option specifies a debug option. These options are not normally needed. The number following -d selects the debug option to enable, as follows:
  -d0: Ignore other SM nodes
  -d1: Force single threaded dispatching
  -d2: Force log flushing after each log message
  -d3: Disable multicast support
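A common sequence is to create an inventory with flow c and later validate the fabric against it with flow a. The Python sketch below builds osmtest argument lists and enforces two rules from the table: the flow letter must be one of those listed, and -w applies only to the trap flow. It assumes the inventory file is passed with -i, inferred from the truncated "-i(nventory) <filename>" entry in the synopsis. The helper name osmtest_args is hypothetical.

```python
from typing import List, Optional

# Flow letters accepted by "osmtest -f", as listed in Table 28.
OSMTEST_FLOWS = {
    "c": "create an inventory file",
    "a": "run all validation tests against an inventory",
    "v": "only validate the given inventory file",
    "s": "service registration, deregistration and lease test",
    "e": "event forwarding test",
    "f": "flood the SA with queries (stress mode)",
    "m": "multicast flow",
    "q": "QoS info: dump VLArb and SLtoVL tables",
    "t": "trap 64/65 flow",
}


def osmtest_args(flow: str, inventory: Optional[str] = None,
                 wait_s: Optional[int] = None) -> List[str]:
    """Build an osmtest argument list (assumption: -i names the inventory file)."""
    if flow not in OSMTEST_FLOWS:
        raise ValueError(f"unknown osmtest flow {flow!r}")
    args = ["osmtest", "-f", flow]
    if inventory is not None:
        args += ["-i", inventory]
    if wait_s is not None:
        if flow != "t":
            raise ValueError("-w applies only to the trap 64/65 flow (-f t)")
        args += ["-w", str(wait_s)]
    return args
```

The resulting lists can be passed directly to subprocess.run on the host that runs osmtest.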
238. ver to make it suitable for your system's needs.
Changes made to the Windows registry happen immediately, and no backup is automatically made. Do not edit the Windows registry unless you are confident regarding the changes.
7.1.4.1 Assigning Port IP After Installation
By default, your machine is configured to obtain an automatic IP address via a DHCP server. In some cases, the DHCP server may require the MAC address of the network adapter installed in your machine.
To obtain the MAC address:
Step 1. Open a CMD console.
- Windows Server 2008 R2: Click Start > Run and enter CMD.
- Windows Server 2012/2012 R2: Click Start > Task Manager > File > Run new task > and enter CMD.
Step 2. Display the MAC address as "Physical Address":
ipconfig /all
Configuring a static IP is the same for both IPoIB and Ethernet adapters.
To assign a static IP address to a network port after installation:
Step 1. Open the Network Connections window. Locate Local Area Connections with Mellanox devices.
[Screenshot: Network Connections window listing Ethernet (Network cable unplugged), Ethernet 2, Ethernet 3, Ethernet 4 and Local Area Connection (Unidentified network); 5 items]
Step 2. Right-click a Mellanox Local Area Connection and left-click Properties.
[Screenshot: connection Properties dialog, Networking and Sharing tabs]
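When the DHCP administrator needs the MAC addresses of several ports, the ipconfig /all output can be scanned instead of read by eye. A minimal Python sketch, assuming the usual English-language ipconfig layout: an unindented "... adapter <name>:" header, followed by an indented "Physical Address. . . : XX-XX-..." line. The function name mac_addresses is hypothetical.

```python
import re
from typing import Dict

# Assumption: English ipconfig /all layout, as described above.
ADAPTER_RE = re.compile(r"^\S.*adapter (.+):\s*$")
MAC_RE = re.compile(r"^\s*Physical Address[ .]*:\s*([0-9A-Fa-f]{2}(?:-[0-9A-Fa-f]{2})+)\s*$")


def mac_addresses(ipconfig_output: str) -> Dict[str, str]:
    """Map each adapter name to its Physical Address, upper-cased."""
    result = {}
    current = None
    for line in ipconfig_output.splitlines():
        header = ADAPTER_RE.match(line)
        if header:
            current = header.group(1)
            continue
        mac = MAC_RE.match(line)
        if mac and current is not None:
            result[current] = mac.group(1).upper()
    return result
```

The text can be captured with subprocess.run(["ipconfig", "/all"], capture_output=True, text=True).stdout. Localized Windows installations print different labels, so the two patterns would need adjusting there.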
239. which is created with a default PKey value of 0xffff.
Usage:
part_man.exe [-v] <show|add|rem> ["Local area connection name"]
- -v: Increases the verbosity level.
- show: Shows the currently configured virtual IPoIB ports.
- add: Adds a new virtual IPoIB port. add should be used with the interface name as it appears in Network Connections in the Control Panel.
- Name: Any printable name, without quotation marks or commas, and starting with i.
- rem: Removes an existing virtual IPoIB port. It therefore requires running show first and then copying the parameters.
Example: Adding and removing a virtual port
part_man add "Ethernet 4" ipoib_4_1
part_man show
"Ethernet 6" ipoib_4_1
Done.
14.3 InfiniBand Fabric Diagnostic Utilities
The diagnostic utilities described in this chapter provide means for debugging the connectivity and status of InfiniBand (IB) devices in a fabric.
14.3.1 Utilities Usage
This section first describes the common configuration, interface, and addressing for all the tools in the package. Then it provides detailed descriptions of the tools themselves, including operation, synopsis and options descriptions, error codes, and examples.
14.3.1.1 Common Configuration, Interface and Addressing
Topology File (Optional)
An InfiniBand fabric is composed of switches and channel adapter (HCA/TCA) devices. To id
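The default PKey value 0xffff packs two fields. Per the InfiniBand specification (not this manual), bit 15 of a P_Key is the membership type (1 = full member, 0 = limited member), and the low 15 bits are the partition key. So 0xffff means full membership in the default partition 0x7fff. A small Python sketch; pkey_fields is a hypothetical name:

```python
from typing import Tuple


def pkey_fields(pkey: int) -> Tuple[bool, int]:
    """Split a 16-bit InfiniBand P_Key into (full_member, 15-bit partition key)."""
    if not 0 <= pkey <= 0xFFFF:
        raise ValueError("a P_Key is 16 bits")
    full_member = bool(pkey & 0x8000)  # bit 15: membership type
    return full_member, pkey & 0x7FFF  # bits 0-14: partition key
```

Two ports can communicate only if they share the 15-bit key and at least one of them is a full member.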
240. work throughput. This parameter can be set to one of the following values:
- Enabled (default): Set RSS Mode
- Disabled: The hardware is configured once to use the Toeplitz hash function, and the indirection table is never changed.
Note: IOAT is not used while in RSS mode.
Receive Completion Method: Sets the completion methods of the received packets, and can affect network throughput and CPU utilization.
- Polling Method: Increases the CPU utilization, as the system polls the received rings for incoming packets; however, it may increase the network performance, as the incoming packet is handled faster.
- Interrupt Method: Optimizes the CPU, as it uses interrupts for handling incoming messages; however, in certain scenarios it can decrease the network throughput.
- Adaptive (Default Settings): A combination of the interrupt and polling methods, applied dynamically depending on traffic type and network usage.
Choosing a different setting may improve network and/or system performance in certain configurations.
Interrupt Moderation RX Packet Count: Number of packets that need to be received before an interrupt is generated on the receive side (default 5).
Interrupt Moderation RX Packet Time: Maximum elapsed time (in usec) between the receiving of a packet and the generation of an interrupt, even if the moderation count has not been reached (default 10).
Rx Interrupt Moderation Type: Sets the rate at which the control
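The RSS settings above refer to the Toeplitz hash, which spreads flows across receive queues. The Python sketch below follows the algorithm as Microsoft documents it for RSS: for every set input bit, XOR in the 32-bit window of the secret key that starts at that bit position. It is not Mellanox driver code, and toeplitz_hash is a hypothetical name. The 40-byte key and the check values in the test come from Microsoft's RSS verification example.

```python
def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Toeplitz hash as used by RSS.

    For each bit of data (most significant first) that is 1, XOR into the
    result the 32 key bits that start at that bit's position.
    """
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 32:
        raise ValueError("key too short for this input")
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                bit_index = i * 8 + b
                # Key bits [bit_index, bit_index + 32) as a 32-bit integer.
                result ^= (key_int >> (key_bits - 32 - bit_index)) & 0xFFFFFFFF
    return result
```

For IPv4 the input is the source address followed by the destination address (then source port and destination port for TCP). The low-order bits of the hash index the indirection table that selects the receive processor.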
241. y associations are not offloaded and are handled in software by the guest operating system.
[Screenshot: VM Settings > Network Adapter (virtual switch: Mellanox SRIOV Virtual Switch) > Hardware Acceleration, with "Enable virtual machine queue", "Enable IPsec task offloading" (maximum number of offloaded SAs: 512) and "Enable SR-IOV" selected]
Step 9. Verify that the Mellanox Virtual Function appears in the Device Manager.
The Virtual Function is configured with a DHCP IP address. It can also be assigned a static IP address.
Figure 17: Virtual Function in the VM (Device Manager > Network adapters: Mellanox ConnectX-3 Ethernet Adapter, Microsoft Hyper-V Network Adapter, Microsoft Hyper-V Network Adapter)
242. you can capture incoming and outgoing traffic by using the ibdump tool and see the DSCP value in the captured packets, as displayed in the figure below.
[Figure: Wireshark capture of a UDP packet from 11.7.33.148 (source port 49153) to 11.7.33.149; the IPv4 header shows Differentiated Services Field 0x0e (DSCP 0x03; ECN 0x02, ECT(0), ECN-Capable Transport)]
8.10 SR-IOV
Single Root I/O Virtualization (SR-IOV) is a technology that allows a phys
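The captured value 0x0e is the whole one-byte Differentiated Services field. It splits into a 6-bit DSCP (the upper bits) and a 2-bit ECN field (the lower bits), which is why the capture reads DSCP 0x03 and ECN 0x02. A small Python sketch of that split; the codepoint names follow RFC 3168, and the function names are hypothetical:

```python
from typing import Tuple

ECN_NAMES = {0: "Not-ECT", 1: "ECT(1)", 2: "ECT(0)", 3: "CE"}  # RFC 3168 codepoints


def split_ds_field(ds: int) -> Tuple[int, int]:
    """Split the IPv4 DS byte into (DSCP, ECN): DSCP is the top 6 bits, ECN the low 2."""
    if not 0 <= ds <= 0xFF:
        raise ValueError("the DS field is one byte")
    return ds >> 2, ds & 0x3


def make_ds_field(dscp: int, ecn: int = 0) -> int:
    """Inverse of split_ds_field."""
    if not 0 <= dscp <= 63 or not 0 <= ecn <= 3:
        raise ValueError("DSCP is 6 bits and ECN is 2 bits")
    return (dscp << 2) | ecn
```

Reading a capture this way confirms which DSCP the sender actually set, independent of how the ECN bits happen to be marked.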
243. ysical port states of an InfiniBand port. It also allows adjusting the link speed that is enabled on any InfiniBand port.
If the queried port is a switch port, then ibportstate can be used to:
- Disable, enable or reset the port
- Validate the port's link width and speed against the peer port
14.3.3.1 ibportstate Applicable Hardware
All InfiniBand devices.
14.3.3.2 ibportstate Synopsis
ibportstate [-d] [-e] [-v] [-V] [-D|-L|-G] [-s <smlid>] [-C <ca_name>] [-P <ca_port>] [-u] [-t <timeout_ms>] <dest dr_path|lid|guid> <portnum> [<op>] [<value>]
14.3.3.3 ibportstate Options
The table below lists the various flags of the command.
Table 16: ibportstate Flags and Options
-h, --help: Print the help menu.
-d, --debug: Raise the IB debug level. May be used several times for higher debug levels (-ddd or -d -d -d).
-e, --errors: Show send and receive errors (timeouts and others).
-v, --verbose: Increase the verbosity level. May be used several times for additional verbosity (-vvv or -v -v -v).
-V, --version: Show version info.
-D, --Direct: Use directed path address arguments. The path is a comma-separated list of out ports. Examples: "0" (self port), "0,1,2,1,4" (out via port 1, then 2, ...).
-L, --Lid: Use LID address argument.
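A directed-route path such as "0,1,2,1,4" is just a list of out-ports, one per hop, after the leading 0 that denotes the local port. The Python sketch below parses and validates such a path before it is handed to ibportstate -D. parse_dr_path and hop_count are hypothetical names. Requiring a leading 0 follows the examples above, and limiting ports to one byte (0-255) is an assumption about the valid range.

```python
from typing import List


def parse_dr_path(path: str) -> List[int]:
    """Parse a directed path such as "0,1,2,1,4" into a list of port numbers.

    The leading 0 denotes the local (self) port; each following number is the
    out-port taken at the next hop.
    """
    ports = [int(part) for part in path.split(",")]
    if ports[0] != 0:
        raise ValueError("a directed path starts at port 0 (self)")
    if any(not 0 <= p <= 255 for p in ports):
        raise ValueError("port numbers must fit in one byte (0-255)")
    return ports


def hop_count(path: str) -> int:
    """Number of hops the path traverses beyond the local port."""
    return len(parse_dr_path(path)) - 1
```

For the example "0,1,2,1,4", the query leaves through port 1, then 2, 1 and 4, so it traverses four hops.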
244. ystems.
nd_write_bw is performance-oriented for RDMA Write with maximum throughput, and runs over Microsoft's NetworkDirect standard. The level of customization for the user is relatively high: the user may choose to run with a customized message size, a customized number of iterations or, alternatively, a customized test duration time. nd_write_bw runs with all message sizes from 1B to 4MB (powers of 2), message inlining, and CQ moderation.
14.4.13.1 nd_write_bw Synopsis
(running on a specific single core)
Server side: start /b /affinity 0X1 nd_write_bw -s1048576 -D10 -S 11.137.53.1
Client side: start /b /wait /affinity 0X1 nd_write_bw -s1048576 -D10 -C 11.137.53.1
14.4.13.2 nd_write_bw Options
The table below lists the various flags of the command.
Table 48: nd_write_bw Flags and Options
-h: Shows the Help screen.
-V: Shows the version number.
-p: Connects to the port <port> (default 68307).
-s <msg size>: Exchanges the message size (default 65536B); it must not be combined with the -a flag.
-a: Runs all the message sizes from 1B to 8MB; it must not be combined with the -s flag.
-n <num of iterations>: The number of exchanges (at least 2; the default is 100000).
-I <max inline size>: The maximum size of message to send inline (default 128B).
-D <test duration in seconds>: Test duration in seconds.
-f <margin time in seconds>: The margin time to avoi
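The /affinity argument of start is a hexadecimal CPU bitmask in which bit n allows CPU n, so 0X1 pins the test to CPU 0. A Python sketch that derives the mask and rebuilds the server and client lines above. affinity_mask and nd_write_bw_commands are hypothetical helpers, and the defaults simply mirror the example values.

```python
from typing import Iterable, Tuple


def affinity_mask(cpus: Iterable[int]) -> str:
    """Hex CPU bitmask for "start /affinity": bit n set means CPU n may be used."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers are non-negative")
        mask |= 1 << cpu
    if mask == 0:
        raise ValueError("at least one CPU is required")
    return hex(mask)


def nd_write_bw_commands(server_ip: str, size: int = 1048576, duration_s: int = 10,
                         cpus: Iterable[int] = (0,)) -> Tuple[str, str]:
    """Rebuild the server and client command lines from the synopsis above."""
    mask = affinity_mask(cpus)
    common = f"nd_write_bw -s{size} -D{duration_s}"
    server = f"start /b /affinity {mask} {common} -S {server_ip}"
    client = f"start /b /wait /affinity {mask} {common} -C {server_ip}"
    return server, client
```

Pinning to a single core this way keeps run-to-run results comparable. Choosing a core close to the adapter is a common tuning step, though the manual does not prescribe which one.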
245. ytes that are covered by this priority. The counted bytes include framing characters (modulo 2^64).
Bytes Total/Sec: The total number of bytes per second that are covered by this priority. The counted bytes include framing characters.
Packets Total: The total number of packets that are covered by this priority (modulo 2^64).
Packets Total/Sec: The total number of packets per second that are covered by this priority.
PAUSE INDICATION
Per prio sent pause frames: The number of pause frames that were sent to priority i. The untagged instance indicates global pause frames that were sent.
Per prio sent pause duration: The total duration, in microseconds, of pause that was sent to the other end to freeze the transmission on priority i.
Per prio rcv pause frames: The number of pause frames that were received for priority i. The untagged instance indicates global pause frames that were received.
Per prio rcv pause duration: The total duration, in microseconds, of pause that was requested by the other end to freeze transmission on priority i.
12 OpenSM - Subnet Manager
OpenSM v3.3.11 is an InfiniBand Subnet Manager. In order to operate one or more host machines in the InfiniBand cluster, at least one Subnet Manager is required in the fabric/cluster. Otherwise, we recommend using OpenSM from FabricIT EFM(TM) or UFM(R). Please use the embedded OpenSM in the WinOF package for testing purpo
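Because the byte and packet totals wrap modulo 2^64, a per-second rate taken from two samples must compute the difference modulo 2^64 as well. The Python sketch below does this, assuming at most one wrap between the two samples; rate_per_second is a hypothetical name.

```python
COUNTER_MODULUS = 2 ** 64  # the byte/packet totals above wrap modulo 2**64


def rate_per_second(previous: int, current: int, interval_s: float) -> float:
    """Per-second rate between two samples of a wrapping 64-bit counter.

    Assumes the counter wrapped at most once between the two samples.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = (current - previous) % COUNTER_MODULUS
    return delta / interval_s
```

The same calculation applies to the pause-frame counters. A steadily rising "rcv pause" rate on one priority indicates that the peer is throttling that priority.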